CHOSUN

Silhouette Edge-based Log-polar Descriptor for Human Action Representation and Recognition

Author(s)
오도요 오냥고 윌프레드
Issued Date
2012
Abstract
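Human action recognition is an application area in sustained demand across numerous image processing techniques and computer vision, and it has become essential research for improving the way humans interact with intelligent machines.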
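This dissertation presents a method that tracks human body movement in video, extracts silhouettes, and combines the motion of body parts to recognize the action in the video. A video containing a human action can be divided into frames, each containing a posture that expresses the action. Detecting the key frames that represent each action allows fast recognition with less data to process, so this dissertation uses information entropy to detect key frames. The information entropy of the frames belonging to one action in a video sequence is compared, and a frame whose change is at or above a threshold is selected as a key frame for recognition because it reflects a large change in posture.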
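The key-frame selection described above can be illustrated with a minimal Python sketch. It assumes grayscale frames given as nested lists of pixel values, measures each frame by the Shannon entropy of its pixel histogram, and compares each frame against the most recently selected key frame. The function names, the comparison rule, and the threshold value are illustrative assumptions, not the dissertation's implementation.

```python
import math
from collections import Counter


def frame_entropy(frame):
    """Shannon entropy (in bits) of the pixel-value histogram of one frame."""
    pixels = [p for row in frame for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_key_frames(frames, threshold):
    """Return indices of frames whose entropy differs from that of the
    last selected key frame by at least `threshold`.

    Assumption: the first frame is always kept as the initial key frame.
    """
    if not frames:
        return []
    keys = [0]
    last = frame_entropy(frames[0])
    for i in range(1, len(frames)):
        h = frame_entropy(frames[i])
        if abs(h - last) >= threshold:
            keys.append(i)
            last = h
    return keys


# Tiny hypothetical 2x2 "frames" standing in for video frames.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 255]],
    [[0, 255], [0, 255]],
    [[0, 255], [0, 255]],
]
key_indices = select_key_frames(frames, threshold=0.5)
```

Comparing against the last key frame rather than the immediately preceding frame is one possible design choice: it keeps slow, gradual posture changes from being missed, at the cost of depending on which frame is chosen first.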
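For feature extraction, the edges that form the silhouette boundaries of the selected frames are then detected. Edges preserve important shape information and can therefore be used to model and represent human actions. Combining the silhouette edges of the selected frames yields an image with a distinctive pattern for each action, and the combined form of these pattern images shows the similarities and dissimilarities between actions. With edge-combined images, however, differences in silhouette size and rotation make recognition difficult and cause misrecognition even for the same action. A technique robust to scale and rotation is therefore required so that the same action is not classified as different actions. To address this, this dissertation proposes applying a distance transform to the edge-combined image and then converting it to log-polar form.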
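The distance transform assigns to each pixel of the image the distance to its nearest black pixel. Because the resulting pixel values clearly express the pattern of the image, they serve as features for distinguishing two images. The log-polar transform maps the edge-combined image of each action from Cartesian coordinate space to polar coordinate space and works with radius and angle, so the same actions show similar distributions regardless of shape size. The images obtained through the log-polar transform are recognized by template matching, with the Hausdorff distance as the similarity measure.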
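As a hedged illustration of the matching step, the sketch below computes the standard symmetric Hausdorff distance between two point sets and picks the nearest template. It assumes each image has already been reduced to a list of 2-D feature points; the point-set representation, the function names, and the nearest-template rule are assumptions made for this example, and the dissertation may use a different variant of the measure.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def nearest_template(query, templates):
    """Return the label of the template with the smallest Hausdorff
    distance to `query`. `templates` maps a label to a point list."""
    return min(templates, key=lambda label: hausdorff(query, templates[label]))


# Hypothetical point sets standing in for log-polar template features.
templates = {
    "wave": [(0, 0), (1, 0), (2, 0)],
    "bend": [(0, 0), (0, 5), (0, 9)],
}
query = [(0, 0), (1, 0), (2, 1)]
label = nearest_template(query, templates)
```

This brute-force version compares every pair of points, which is adequate for small point sets but grows quadratically with the number of points.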
The experiments used six actions from the KTH database, eight actions from the Weizmann database, and eight actions from the OGD database. Applying the proposed log-polar transform with the Hausdorff distance gave distances of zero or very small values between instances of the same action; the more similar two actions were, the smaller the measured distance, and different actions gave large distances. Cross distances were measured to assess the similarity of human actions between the Weizmann and OGD databases, and the experimental results show that the proposed method performs well for action recognition.
With numerous emerging applications in computer vision and many other image processing technologies, human action recognition has become essential to improving the way humans interact with intelligent machines.
A method is presented for tracking human body movement in videos, extracting the human silhouette, observing the movement of body parts, and modeling the action for activity recognition in videos.

We propose to use the log-polar transform of silhouette boundary edges to model and represent the perceived action. This non-trivial task requires several intermediate image processing techniques on the way to the final recognition template. In a nutshell, the video is sliced into frames, information entropy is used to select the key postures for template modeling, and the silhouette boundaries are detected, extracted, and summed to form a unique pattern for each action to be identified. A distance transform is then applied as a smoothing step before the models are finally transformed into log-polar form. The result is a compact representation of actions as log-polar templates. We chose this representation for several advantages, among them invariance to scale and rotation, which solves the problem of size registration during the recognition step. The Hausdorff distance measure is used to discriminate between actions: it compares the input with the trained templates stored in the database. We demonstrate in this dissertation that the proposed method can accurately model, represent, and recognize human activities in videos. The method has been validated through experiments on several publicly available human action databases in addition to our own generated action videos. A considerably high recognition rate has been achieved, as demonstrated in the experimental results and analysis. The challenges in recognizing activities that are close to each other in characteristics are also discussed. A final comparison of two databases that share similar actions performed by different actors shows that the modeled actions can be clearly separated with our method.
The averages from the Weizmann database and OGD actions reveal the differences between individually modeled actions and, in general, show the effectiveness of our silhouette edge-based log-polar descriptor.
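The scale and rotation behavior attributed to the log-polar representation can be sketched in a few lines of Python. The example maps edge points to (log radius, angle) coordinates about a center and shows that uniform scaling becomes a constant shift along the log-radius axis, while rotation becomes a shift along the angle axis. The point-list input, the function name, and the default choice of the centroid as the center are assumptions for illustration only.

```python
import math


def to_log_polar(points, center=None):
    """Map (x, y) points to (log r, theta) about `center`.

    Assumption: if no center is given, the centroid of the points is used.
    A point lying exactly on the center is skipped, since log(0) is undefined.
    """
    if center is None:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
    else:
        cx, cy = center
    result = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if r == 0:
            continue
        result.append((math.log(r), math.atan2(dy, dx)))
    return result


# Hypothetical edge points, the same shape scaled by 2, and rotated by 90 degrees.
shape = [(1, 0), (0, 2), (-3, 0), (0, -4)]
scaled = [(2 * x, 2 * y) for x, y in shape]
rotated = [(-y, x) for x, y in shape]

lp_shape = to_log_polar(shape, center=(0, 0))
lp_scaled = to_log_polar(scaled, center=(0, 0))
lp_rotated = to_log_polar(rotated, center=(0, 0))

# Scaling by s adds log(s) to every log-radius and leaves the angles unchanged.
radius_shifts = [b[0] - a[0] for a, b in zip(lp_shape, lp_scaled)]
```

Because both transformations reduce to shifts, a matcher that tolerates or normalizes shifts along the two axes does not need a separate size-registration step.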
Alternative Title
실루엣 윤곽선 기반의 Log-polar Descriptor를 이용한 동작 인식
Alternative Author(s)
Wilfred Onyango Odoyo
Affiliation
Chosun University, College of Electronics and Information Engineering, Department of Computer Engineering
Department
General Graduate School, Computer Engineering
Advisor
조범준
Awarded Date
2012-08
Table Of Contents
1 Introduction
1.1 Research motivation
1.2 Research context
1.3 Contributions
1.3.1 Inside the proposed method

2 Related Works
2.1 Introduction
2.2 Model representations
2.2.1 Body models
2.2.2 Sparse features
2.2.3 Template / Image models
2.2.3.1 Motion Energy Images (MEI)
2.2.3.2 Motion History Images (MHI)

3 Posture Selection and Recognition Methods
3.1 Information entropy
3.2 Frame blocking
3.3 Shape contexts
3.4 Mathematical morphology
3.4.1 Sobel edge detector
3.5 Log-polar transformation
3.6 Action recognition methods
3.6.1 Hausdorff distance measure
3.6.2 Chamfer matching distance measure
3.7 Distance transformation

4 The Silhouette Edge-based Log-polar Descriptor
4.1 Introduction
4.2 Framework of the proposed method
4.2.1 Preprocessing
4.2.2 Feature extraction
4.2.3 Representation
4.2.4 Recognition
4.3 The proposed feature descriptor step-by-step

5 Experimental Results and Analysis
5.1 Introduction to databases
5.1.1 KTH human motion dataset
5.1.2 Weizmann dataset
5.1.3 Own-generated dataset
5.2 Results and analysis
5.3 Challenges

6 Conclusions

References

List of Figures
1.1 Snapshots of videos of typical actions
1.2 Some feature representation images by three different methods
2.1 Illustration of moving light displays (MLD)
2.2 Extraction of space-time cuboids at interest points
2.3 Motion energy images (MEI)
2.4 Motion history images (MHI)
3.1 Key posture frame list formation framework
3.2 Frame blocking
3.3 Bend action entropy images
3.4 Posture matching using shape context
3.5 Skeleton method of shape description
3.6 Illustration of Sobel edge operator
3.7 Two-dimensional mapping in polar coordinates
3.8 Log-polar transformation illustration
4.1 Framework flow
4.2 Original images from OGD and background processed images
4.3 Raw data from Weizmann database and edge detection
4.4 Sequences of boundary detected images from OGD
4.5 Histogram of oriented rectangles
4.6 Boundary and block models comparison
4.7 Two-hands wave representation by log-polar
4.8 Two transformed images matching
4.9 Proposed feature descriptor pseudo-algorithm
5.1 Hollywood and UCF dataset snapshot images
5.2 KTH dataset example images
5.3 Selected images from Weizmann dataset
5.4 Raw data from Own-generated Data
5.5 Silhouette-edge model and DT images
5.6 Log-polar images of DT action images
5.7 Sampled comparison graph data
5.8 Minimax distance comparison graph in OGD
5.9 Average differences on OGD in both bar and line graphs
5.10 Average of OGD dataset compared to Weizmann dataset for six actions
5.11 Weizmann database distance transform images of four individuals compared and log transform of Daria’s actions of two-hands wave, bend, jump-jack, and one-hand wave
5.12 Similarity between four actions performed by four different people
5.13 Euclidean distance graph comparison of KTH database with OGD and Weizmann databases
5.14 KTH database processed images under different environments

List of Tables
Table 5.1 Experiment where Hausdorff distance value is the maximum of the two directed distances
Table 5.2 Minimax distance values based on Hausdorff distance measure
Table 5.3 Sampled trained data from OGD and Weizmann databases
Table 5.4 Distance measure from each other from Figure 5.11
Table 5.5 Euclidean distance between KTH against (a) OGD and (b) Weizmann databases
Degree
Doctor
Publisher
Chosun University Graduate School
Citation
오도요 오냥고 윌프레드. (2012). Silhouette Edge-based Log-polar Descriptor for Human Action Representation and Recognition.
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/9528
http://chosun.dcollection.net/common/orgView/200000263325
Appears in Collections:
General Graduate School > 4. Theses(Ph.D)
Authorize & License
  • Authorize: Open
  • Embargo: 2012-08-09
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.