A Malicious Code Detection Method Using Association Rule Mining and SVM
- As open application programming interfaces (APIs) have become popular, new industries that combine network technologies, such as cloud computing and the Internet of Things (IoT), have advanced. However, malicious code that steals information and attacks other systems has also been increasing steadily. Existing anti-virus programs detect malicious code using signature-based detection. Because new types of malicious code are appearing faster than signatures can be created, detecting variants of known malicious code takes a long time. Furthermore, existing machine learning studies that separate malicious files from normal files classify them based on API call frequencies, and these approaches have a high false positive rate.
Therefore, this study proposes a machine-learning-based method for detecting malicious code that distinguishes malicious files from normal files by extracting the APIs that control functions in application programs and mining association rule patterns among them. The APIs were extracted from portable executable (PE) files through static analysis, and association rule patterns among the APIs were mined using the direct hashing and pruning (DHP) algorithm. A support vector machine (SVM), a machine learning technique, was then trained on the extracted APIs to classify files as malicious or normal. The proposed method improves the detection rate for malicious and normal files by applying the lift of the mined association rules as a weight when files are classified by the SVM.
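The mining and weighting steps can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the authors' implementation: it mines only 2-itemsets (API pairs) with a DHP-style hash filter (the full DHP algorithm also trims transactions and extends to larger itemsets), computes confidence and lift for each rule, and then applies a hypothetical weighting in which each API's presence feature is scaled by the largest lift among the rules involving that API. The function names, the bucket count, the toy API sets, and the weighting scheme are illustrative choices, not details from the study.

```python
from collections import Counter
from itertools import combinations


def dhp_frequent_pairs(transactions, min_support, num_buckets=101):
    """Mine frequent single APIs and API pairs with a DHP-style hash filter.

    transactions: list of sets of API names, one set per PE file.
    min_support: minimum support as an absolute number of files.
    """
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    # Pass 1: count each API and hash every API pair into a bucket.
    for t in transactions:
        items = sorted(t)
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[hash(pair) % num_buckets] += 1
    frequent_items = {api for api, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs of frequent APIs whose bucket total reaches
    # min_support; a bucket total is an upper bound on each pair's count.
    pair_counts = Counter()
    for t in transactions:
        items = sorted(api for api in t if api in frequent_items)
        for pair in combinations(items, 2):
            if bucket_counts[hash(pair) % num_buckets] >= min_support:
                pair_counts[pair] += 1
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return item_counts, frequent_pairs


def rules_with_lift(transactions, min_support, min_confidence=0.0):
    """Return (antecedent, consequent, confidence, lift) for frequent pairs."""
    n = len(transactions)
    item_counts, frequent_pairs = dhp_frequent_pairs(transactions, min_support)
    rules = []
    for (a, b), count_ab in frequent_pairs.items():
        for x, y in ((a, b), (b, a)):
            confidence = count_ab / item_counts[x]  # P(y | x)
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
                rules.append((x, y, confidence, lift))
    return rules


def lift_weighted_features(api_set, vocabulary, rules):
    """Hypothetical weighting: scale each present API by the largest lift
    among the rules that involve it, never below 1.0."""
    weight = {api: 1.0 for api in vocabulary}
    for x, y, _confidence, lift in rules:
        for api in (x, y):
            if api in weight:
                weight[api] = max(weight[api], lift)
    return [weight[api] if api in api_set else 0.0 for api in vocabulary]


if __name__ == "__main__":
    # Toy API sets; real ones would come from static analysis of PE files.
    files = [
        {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"},
        {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread", "RegSetValueExA"},
        {"VirtualAlloc", "WriteProcessMemory", "GetProcAddress"},
        {"CreateFileA", "ReadFile"},
    ]
    rules = rules_with_lift(files, min_support=2)
    for x, y, conf, lift in rules:
        print(f"{x} -> {y}: confidence={conf:.2f}, lift={lift:.2f}")
    vocab = sorted(set().union(*files))
    print(lift_weighted_features(files[2], vocab, rules))
```

The resulting vectors could then be passed, together with malicious/normal labels, to any SVM implementation, for example scikit-learn's `SVC`. Because hash collisions can only inflate bucket totals, the filter never discards a truly frequent pair; it only reduces how many candidate pairs must be counted exactly in the second pass.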
Using the SVM model alone, the sensitivity and precision for detecting malicious code were 71% and 77%, respectively. In contrast, when the association rule patterns were combined with the SVM in the proposed method, the sensitivity and precision rose to 77% and 81%, respectively, indicating improved classification performance. This study also showed that combining the association rule patterns with other classification models, not only the SVM proposed in this paper, yielded better performance than using each classification model alone. Classification performance for normal files was lower than for malicious files because malicious files use many of the same APIs that normal files use for program execution, and more APIs are extracted from normal files than from malicious files. As a result, the detection rate for normal files is somewhat lower. Further research is therefore needed on an extended hybrid model that overcomes this drawback of the classification model in detecting malicious code.
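For reference, the two reported metrics are written below using their standard definitions, assuming malicious files are the positive class; the abstract does not state its exact formulas. TP counts malicious files classified as malicious, FP counts normal files classified as malicious, and FN counts malicious files classified as normal. Under these definitions, a sensitivity of 77% means that 77 of every 100 malicious files are detected, and a precision of 81% means that 81 of every 100 files flagged as malicious are actually malicious.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}
```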
This study proposed a classification method that uses association rule patterns to detect malicious code and showed that it outperforms classifying malicious files with a single classification model. If the proposed method is applied to a large amount of pattern data, it can provide criteria for detecting malicious code, helping to identify the behaviors of rapidly mutating malicious files and to recognize their abnormal behaviors.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.