A Study on Category Classification of Short Texts Using SNS Feature Information (SNS 특징정보를 활용한 단문텍스트의 카테고리 분류에 관한 연구)

Author(s)
Na, Sung Hee (나성희)
Issued Date
2016
Abstract
Recently, with developments in wireless internet and the spread of mobile devices, social networking services (SNS) have been growing rapidly. SNS lets users bring their offline social connections onto an online platform, which in turn strengthens and widens their personal networks. As the number of SNS users rapidly increases, SNS is becoming a platform where users share their interests with an unspecified general audience, and in the process its purpose is shifting from strengthening personal networks to freely sharing interests and information.
Because SNS data is generated in real time and spreads rapidly, the resulting information overload has made acquiring information through search engines less efficient. To alleviate information overload, research has been conducted in areas such as document classification, search engines, text translation, and spam mail filtering. Data generated on the internet or on SNS does not follow a consistent standard, and because it consists of short sentences, it is hard to classify based on document content alone.
Therefore, this dissertation aims to address the information overload caused by the large volume of data generated in real time, and to classify the texts used on the internet and on SNS for effective data management. Short texts were collected from Twitter, where most content is written in short form. The collected short texts were analyzed to extract features. Words were weighted based on the extracted features, the correlation between the collected dataset and unclassified data was examined, and the short texts were categorized. The experimental results show that text categorization performed better when features were weighted using feature frequencies. Short text classification using feature frequency yielded a score of 90.7 percent.
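The pipeline described above (extract features, weight words by feature frequency, compare unclassified text against the collected data) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the use of hashtags as the SNS feature, the `HASHTAG_BOOST` value, the cosine similarity measure, and all function names are assumptions made only for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical extra weight for hashtag tokens; the abstract gives no actual values.
HASHTAG_BOOST = 2.0


def tokenize(text):
    """Lowercase the text and split it into word and #hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())


def weigh(text):
    """Return frequency-based weights for one short text.

    Plain words count 1.0 per occurrence; hashtags (an assumed SNS feature)
    count HASHTAG_BOOST per occurrence and are stored without the '#'.
    """
    weights = Counter()
    for tok in tokenize(text):
        if tok.startswith("#"):
            weights[tok[1:]] += HASHTAG_BOOST
        else:
            weights[tok] += 1.0
    return weights


def build_profiles(labeled_texts):
    """Sum the weights of all training texts per category."""
    profiles = {}
    for category, text in labeled_texts:
        profiles.setdefault(category, Counter()).update(weigh(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, profiles):
    """Assign the category whose profile is most similar to the text."""
    vec = weigh(text)
    return max(profiles, key=lambda c: cosine(vec, profiles[c]))


# Toy training data, invented for this example.
training = [
    ("sports", "great goal in the #soccer match"),
    ("politics", "the #election vote count starts"),
]
profiles = build_profiles(training)
```

With these toy profiles, `classify("what a goal in that match", profiles)` picks the category that shares the most weighted terms with the input. A real system would need a much larger labeled collection and a tokenizer suited to the language of the tweets.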
Alternative Title
A study on the categorization of the short text using the SNS feature informations
Alternative Author(s)
Na, Sung Hee
Department
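Graduate School of Industrial Technology Convergence, Department of Software Convergence Engineering (산업기술융합대학원 소프트웨어융합공학과)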
Advisor
Kim, Pankoo (김판구)
Awarded Date
2016-02
Table Of Contents
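ABSTRACT

Ⅰ. Introduction
A. Research Background and Purpose
B. Research Content and Organization

Ⅱ. Related Work
A. Document Classification
B. Short-Text Document Classification

Ⅲ. Category Classification Method Using Feature Weights of Short Texts
A. System Architecture
B. Category Selection
C. Building the Training Data
1. Short-Text Collection
2. Feature Definition
3. Feature Extraction
D. Short-Text Category Classification
1. Classification Method Using Correlation Analysis

Ⅳ. Experiments and Results
A. Data Collection
B. Datasets
1. Training Dataset
2. Test Dataset
C. Evaluation Method and Analysis of Results
1. Evaluation Method
2. Analysis of Experimental Results

Ⅴ. Conclusion and Suggestions

References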
Degree
Master
Publisher
Chosun University (조선대학교)
Citation
Na, Sung Hee (나성희). (2016). A Study on Category Classification of Short Texts Using SNS Feature Information (SNS 특징정보를 활용한 단문텍스트의 카테고리 분류에 관한 연구).
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/16486
http://chosun.dcollection.net/common/orgView/200000265217
Appears in Collections:
Engineering > 3. Theses(Master)
Authorize & License
  • Authorize: Open
  • Embargo: 2016-02-25
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.