A Study on Cluster Label Selection Using Blog Structural Feature Information (블로그 구조적인 특징정보를 활용한 클러스터 레이블 선정에 관한 연구)

Issued Date
Label selection
As the volume of document data grows, document clustering, which groups similar documents by the frequency or characteristics of the keywords they contain, is used in a wide range of fields. Labels on the resulting clusters help users understand what each cluster means and how the clusters relate to one another. A label therefore needs to cover the meaning of its cluster and express its characteristics. Clustering is applied not only to conventional documents but also to blogs. Blogs, however, produce large amounts of information in real time, so their content is broad, and it is difficult to select a representative label that expresses the meaning of each cluster produced by clustering.
To address this difficulty, this paper proposes a method for selecting a representative label that expresses the characteristics of a blog cluster and covers its overall content. First, we collect the title, body, and tags of each blog post, extract only the nouns, and build a candidate keyword set through keyword normalization and weights based on where a keyword appears. We then apply the FP-growth algorithm to the candidate keyword set to select labels that are semantically associated with the cluster. In this way, performance improves over the existing representative label selection method, which does not use location-specific weights, and we show that representative labels can be determined for the blog clusters returned by a specific search and provided to the user.
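The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the field weights in `FIELD_WEIGHTS`, the `top_k` cutoff, and the tie-breaking rule in `select_label` are assumptions made for illustration, and noun extraction and normalization are assumed to have been done already (for Korean text this would need a morphological analyzer). The sketch also picks a label directly from the frequent keyword sets rather than from generated association rules.

```python
from collections import Counter, defaultdict

# Hypothetical location weights (assumption, not values from the thesis).
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0}

def candidate_keywords(post, top_k=5):
    """Score nouns by the field they appear in and return the top_k nouns.
    `post` maps a field name to a list of already extracted, normalized nouns."""
    scores = Counter()
    for field, nouns in post.items():
        for noun in nouns:
            scores[noun] += FIELD_WEIGHTS.get(field, 1.0)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [noun for noun, _ in ranked[:top_k]]

class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    freq = Counter()
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Insert frequent items in descending support order so prefixes are shared.
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq

def fp_growth(transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset."""
    header, freq = build_tree(transactions, min_support)
    for item, support in freq.items():
        itemset = suffix | {item}
        yield itemset, support
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)

def frequent_itemsets(keyword_sets, min_support):
    """Mine frequent keyword sets from one candidate keyword list per post."""
    return dict(fp_growth([(list(s), 1) for s in keyword_sets], min_support))

def select_label(itemsets):
    """Assumed heuristic: prefer multi-keyword sets, then highest support."""
    if not itemsets:
        return []
    multi = {s: c for s, c in itemsets.items() if len(s) > 1} or itemsets
    best = min(multi, key=lambda s: (-multi[s], -len(s), sorted(s)))
    return sorted(best)

if __name__ == "__main__":
    # Toy posts with pre-extracted nouns (illustrative data only).
    posts = [
        {"title": ["jeju", "travel"], "tags": ["jeju", "food"],
         "body": ["jeju", "travel", "hotel", "food"]},
        {"title": ["jeju", "food"], "tags": ["food"], "body": ["restaurant", "jeju"]},
        {"title": ["travel", "hotel"], "tags": ["travel"], "body": ["hotel", "jeju"]},
    ]
    keyword_sets = [candidate_keywords(p, top_k=3) for p in posts]
    print(select_label(frequent_itemsets(keyword_sets, min_support=2)))
```

Weighting by field before mining means a noun that appears in a title or tag is more likely to survive the `top_k` cutoff and enter the transactions that FP-growth sees, which is one plausible reading of how location weights could influence the final label.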
Alternative Title
A study on cluster label selection using blog structural feature information
Alternative Author(s)
Seungmin Han
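Chosun University Graduate School of Industrial Technology Convergence (조선대학교 산업기술융합대학원)
Graduate School of Industrial Technology Convergence, Department of Software Convergence Engineering (산업기술융합대학원 소프트웨어융합공학과)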
Awarded Date
2017. 2
Table Of Contents
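Ⅰ. Introduction 1
A. Research Background and Purpose 1
B. Research Content and Organization 3

Ⅱ. Related Work 4
A. Keyword Extraction from Documents and Co-occurrence Word Analysis 4
B. Frequent Pattern Mining Techniques 9

Ⅲ. Blog Label Selection Using Association Rules 13
A. System Architecture 13
B. Candidate Keyword Extraction 15
1. Keyword Extraction 15
2. Keyword Weighting 19
C. Label Selection Using the FP-Growth Algorithm 22
1. FP-tree Construction 22
2. Association Rule Generation Using the FP-tree 26

Ⅳ. Experiments and Evaluation 28
A. Data Collection 28
B. Dataset 30
1. Candidate Keyword Set 30
2. Representative Label Selection Using Association Rules 31
C. Evaluation Method and Results 32

Ⅴ. Conclusion and Suggestions 36

References 37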
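Chosun University Graduate School of Industrial Technology Convergence (조선대학교 산업기술융합대학원)
Han, Seungmin (한승민). (2016). A Study on Cluster Label Selection Using Blog Structural Feature Information (블로그 구조적인 특징정보를 활용한 클러스터 레이블 선정에 관한 연구)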