Paper
28 July 2022
Malicious code classification method based on API sequence and text-CNN
Min Liu, Hailong Li
Proceedings Volume 12303, International Conference on Cloud Computing, Internet of Things, and Computer Applications (CICA 2022); 123030T (2022) https://doi.org/10.1117/12.2642611
Event: International Conference on Cloud Computing, Internet of Things, and Computer Applications, 2022, Luoyang, China
Abstract
To address the problems that the feature information extracted from malicious code cannot fully explain its behavior, and that the neural networks used cannot extract spatial features and time-series features at the same time, a malicious code classification method based on API sequences and a text convolutional neural network (Text-CNN) was proposed. Firstly, the method used the binary analysis tool Angr to perform reverse analysis on the malicious code binary file, obtained its data structure and control flow information, and automatically generated the control flow graph and function call graph. On this basis, an API call sequence extraction algorithm was proposed, which generated API call sequences according to the order in which the malicious code used API functions. Secondly, an API call sequence vectorization model was established by using the word2vec model to vectorize the API call sequence, so that each API function obtained its own vector representation. Then, each malicious code API call sequence was transformed into a malicious code API matrix, which was used as the input of the classification model. Finally, drawing on the idea of text classification, a malicious code classification model, MM-Text-CNN, was proposed. This model combined one-dimensional and two-dimensional convolution operations. It was not only suitable for input data of different sizes, but could also extract spatial and temporal features of the input data simultaneously. The experimental results showed that the proposed classification model could complete the malicious code classification task, with an accuracy of up to 97.83%.
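The step from an API call sequence to an API matrix, and the reason a text-CNN style model can accept inputs of different sizes, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' MM-Text-CNN: the embedding table is filled with random values as a stand-in for a trained word2vec model, the API names are arbitrary examples, the kernels are random rather than learned, and only a single kind of convolution (a kernel spanning the full embedding width, followed by max-over-time pooling) is shown.

```python
import random


def build_embedding_table(vocab, dim, seed=0):
    """Hypothetical stand-in for a trained word2vec model:
    assigns one random vector of length `dim` to each API name."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in vocab}


def sequence_to_matrix(api_sequence, table):
    """Stack the vector of each API call, in call order,
    into a matrix with one row per call and `dim` columns."""
    return [table[name] for name in api_sequence]


def conv_max_over_time(matrix, kernel):
    """Slide a (k x dim) kernel down the rows of the matrix,
    apply ReLU, and keep only the largest response."""
    k = len(kernel)
    if len(matrix) < k:
        raise ValueError("sequence is shorter than the kernel height")
    best = 0.0  # starting at 0 makes the running max equal to max(ReLU(s))
    for start in range(len(matrix) - k + 1):
        s = sum(kernel[i][j] * matrix[start + i][j]
                for i in range(k) for j in range(len(kernel[i])))
        best = max(best, s)
    return best


def extract_features(api_sequence, table, kernels):
    """Produce one pooled feature per kernel, so the output length
    depends on the number of kernels, not on the sequence length."""
    matrix = sequence_to_matrix(api_sequence, table)
    return [conv_max_over_time(matrix, kern) for kern in kernels]


# Illustrative usage with made-up sequences of different lengths.
vocab = ["CreateFileW", "WriteFile", "RegSetValueExW", "CloseHandle"]
table = build_embedding_table(vocab, dim=8)
rng = random.Random(1)
kernels = [[[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(k)]
           for k in (2, 3)]
short = ["CreateFileW", "WriteFile", "CloseHandle"]
longer = short + ["RegSetValueExW", "CloseHandle", "CreateFileW"]
print(len(extract_features(short, table, kernels)),
      len(extract_features(longer, table, kernels)))  # prints "2 2"
```

Because pooling keeps one value per kernel, sequences of three and six calls both yield a feature vector of the same length, which is the property that lets a downstream classifier handle inputs of different sizes.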
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Min Liu and Hailong Li "Malicious code classification method based on API sequence and text-CNN", Proc. SPIE 12303, International Conference on Cloud Computing, Internet of Things, and Computer Applications (CICA 2022), 123030T (28 July 2022); https://doi.org/10.1117/12.2642611
KEYWORDS
Convolution
Binary data
Convolutional neural networks
Data modeling
Statistical modeling
Analytical research
Classification systems
