CHOSUN

Visual Object Recognition and Retrieval Using Patch and Semantic Structure Features

Author(s)
Ahmad Nishat
Issued Date
2009
Abstract
Content-based image retrieval has emerged as an important field that draws on image processing, computer vision, and artificial intelligence. Near the turn of the 21st century, researchers became convinced that the next generation of systems would need to understand the semantics of an image, not simply its low-level computational features, i.e., “bridging the semantic gap”. Image retrieval systems need to become more intelligent: able to recognize generic objects and visual object classes at the least and, in the long run, abstract meanings such as feelings. This can be regarded as the dawn of second-generation research in image retrieval.
For humans, recognizing a multitude of objects such as dogs and cars is an everyday activity that goes unnoticed and is hardly considered an achievement. For computer vision, in contrast, it is the ultimate scientific challenge. After 40 years of research, robustly identifying the familiar objects (chair, person, pet), scene categories (beach, forest, office), and activity patterns (conversation, dance, picnic) depicted in family pictures, news segments, or feature films is still far beyond the capabilities of today’s vision systems [Preface: Towards Object level Categorization, Eds. Ponce J., et al., 2006].
Visual object class recognition has gradually evolved from structure-based approaches to appearance-based techniques, and the processes of human vision are now receiving intense attention. This thesis proposes a new approach to visual object class recognition that aims to better understand and explore the underlying principles of human vision. It investigates the basic level of semantic structure formation in the inferential processes of human vision, where basic structures are combined hierarchically with other semantic structures to form meanings at an abstract level. This is a micro-level approach, in contrast to approaches that treat the whole image structure as a unit and to geometric modeling approaches. Using this approach, two sets of semantic features have been derived for visual object class recognition.
The algorithm rests on a hypothesis in line with the Gestalt law of proximity: in an image, basic semantic structures are formed by line segments that lie close to each other (arcs are also approximated and broken into smaller line segments based on a pixel deviation threshold). Based on this notion of proximity, a transitive relation is defined that combines basic micro-level semantic structures hierarchically up to the point where a semantic meaning can be extracted from the structure. The algorithm extracts the line segments in an image and then forms semantic groups of segments that lie within a minimum distance threshold of each other. The resulting groups can be distinguished from one another by their number of members and their geometrical properties. These geometrical properties are used to generate rotation-, translation-, and scale-invariant histograms, which serve as feature vectors for object class recognition in a K-nearest-neighbor framework.
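The grouping step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the segment representation (a pair of endpoint tuples), and the use of a union-find structure to realize the transitive relation are all assumptions made for illustration. The distance used here is the smallest endpoint-to-segment distance, which ignores the case of crossing segments.

```python
from itertools import combinations
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def segment_distance(s, t):
    """Smallest endpoint-to-segment distance (assumption: segments do not cross)."""
    return min(
        point_segment_distance(s[0], t[0], t[1]),
        point_segment_distance(s[1], t[0], t[1]),
        point_segment_distance(t[0], s[0], s[1]),
        point_segment_distance(t[1], s[0], s[1]),
    )


def group_segments(segments, threshold):
    """Group segment indices by the transitive closure of 'closer than threshold'."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        if segment_distance(segments[i], segments[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical input: three collinear segments with small gaps, one far away.
segments = [
    ((0, 0), (1, 0)),
    ((1.5, 0), (2.5, 0)),
    ((3, 0), (4, 0)),
    ((10, 10), (11, 10)),
]
groups = group_segments(segments, threshold=0.6)
```

In this example the first and third segments are farther apart than the threshold, yet they land in the same group because each is close to the second one; that is the effect of making the proximity relation transitive.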
In the second approach, each semantic group formed by proximity clustering is modeled as a graph vertex. Line segments that belong to more than one semantic group are defined as semantic relations between those groups and are modeled as graph edges. In this way an image object is transformed into a graph built from micro-level structure formations. Each vertex and edge is labeled using translation-, rotation-, and scale-invariant properties of its member segments. From a set of training images, a graph model is constructed for visual object class recognition by iteratively combining the training graphs and labeling the vertices and edges with their frequencies. After the combining phase, all vertices and edges whose repetition frequency is below a threshold are removed, so the final graph model consists of the semantic nodes that are most common in the training images. Recognition is based on graph matching between the query image graph and the model graph. Each model graph generates a vote for the query, and ties are resolved by considering the node frequencies in the query and model graphs.
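The model-building and voting steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis method: each graph is represented as a pair (set of vertex labels, set of edge labels), labels are hypothetical strings standing in for the invariant properties, and the vote is a plain count of shared labels instead of true graph matching. The names `build_model` and `vote` are invented for this example.

```python
from collections import Counter


def build_model(training_graphs, min_frequency):
    """Combine training graphs, count label frequencies, prune rare labels.

    Each graph is (vertex_labels, edge_labels); an edge label is a tuple
    of the two vertex labels it connects (hypothetical representation).
    """
    vertex_counts, edge_counts = Counter(), Counter()
    for vertices, edges in training_graphs:
        vertex_counts.update(set(vertices))
        edge_counts.update(set(edges))

    # Keep only vertices seen in at least min_frequency training graphs.
    model_vertices = {v: c for v, c in vertex_counts.items() if c >= min_frequency}
    # Keep frequent edges whose endpoints both survived the pruning.
    model_edges = {
        e: c
        for e, c in edge_counts.items()
        if c >= min_frequency and e[0] in model_vertices and e[1] in model_vertices
    }
    return model_vertices, model_edges


def vote(model, query):
    """Count query vertex and edge labels that also occur in the model."""
    model_vertices, model_edges = model
    query_vertices, query_edges = query
    shared_vertices = set(query_vertices) & set(model_vertices)
    shared_edges = set(query_edges) & set(model_edges)
    return len(shared_vertices) + len(shared_edges)


# Hypothetical training graphs for one object class.
g1 = ({"A", "B", "C"}, {("A", "B"), ("B", "C")})
g2 = ({"A", "B", "D"}, {("A", "B")})
model = build_model([g1, g2], min_frequency=2)

query = ({"A", "B", "X"}, {("A", "B")})
score = vote(model, query)
```

With one such model per class, a query would be assigned to the class whose model yields the highest score; the stored frequencies could then serve as a tie-breaker, in the spirit of the description above.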
The algorithm has been applied to classify 101 object classes at one time. The results have been compared with existing state-of-the-art approaches and are found to be promising. Results from both approaches show that low-level image structure and other features can be used to construct different types of semantic features, which help a model or classifier make more intelligent decisions and work more effectively than low-level features alone. Our experimental results are comparable to, or better than, those of other state-of-the-art approaches. We also summarize the state of the art at the time this work was finished and conclude with a discussion of possible future extensions.
Alternative Title
Visual Object Class Recognition and Retrieval using Local Patch and Semantic Image Structure Features
Alternative Author(s)
Ahmad Nishat
Affiliation
Chosun University, General Graduate School
Department
General Graduate School, Department of Information and Communication Engineering
Advisor
Park Jong-An
Awarded Date
2009-08
Table Of Contents
I. Introduction
A. Overview
B. First generation of Content Based Image retrieval
C. Image Content Descriptors
1. Color
2. Texture
3. Shape
D. Second generation of Content Based Image retrieval
1. Intelligent image retrieval
2. Semantic image retrieval - current trends
II. Local patch based approach for Image retrieval
A. Literature in perspective
B. Corner Definition
C. Line Detection
D. Corner Detection
E. Feature Vector Construction
1. Patch Extraction
2. Descriptive statistical features computation
3. Classification into class labels
4. Histogram Feature Vector Computation
F. Histogram Similarity Measures for Image Retrieval
1. Euclidean Distance Measure
2. Relative Histogram Deviation Measure
3. Relative Histogram Bin Deviation Measure
4. Quadratic Distance Measure
G. Test Data Set
H. Image Retrieval experiments
I. Performance Evaluation
J. Comparison with known approaches
III. Visual Object Class Recognition using Semantic Image structure
A. Exploring semantic level intelligence in data
B. Approaches to Generic Object Class Recognition
C. Image structure analysis for semantic features
D. Transforming image structure into a line segment model
E. Parameter of proximity
F. Defining a Transitive Relation for Semantic Modeling
1. Feature representation
2. Experiments and results
a. Data set
b. Multiclass categorization task
G. Semantic group formation and Graph modeling
1. Graph model for classification
2. Classification steps
3. Experiments and results
H. Conclusion and future work
References
Degree
Doctor
Publisher
Chosun University Graduate School
Citation
Ahmad Nishat. (2009). Visual Object Recognition and Retrieval Using Patch and Semantic Structure Features.
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/8346
http://chosun.dcollection.net/common/orgView/200000238497
Appears in Collections:
General Graduate School > 4. Theses(Ph.D)
Authorize & License
  • Authorize: Open
  • Embargo: 2009-08-04
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.