Malicious code classification method based on API sequence and text-CNN

Min Liu; Hailong Li

doi:10.1117/12.2642611

28 July 2022

Proceedings Volume 12303, International Conference on Cloud Computing, Internet of Things, and Computer Applications (CICA 2022); 123030T (2022) https://doi.org/10.1117/12.2642611
Event: International Conference on Cloud Computing, Internet of Things, and Computer Applications, 2022, Luoyang, China

Abstract

To address the problems that the features extracted from malicious code cannot fully explain its behavior, and that the neural networks in use cannot extract spatial features and time-series features at the same time, a malicious code classification method based on API sequences and the text convolutional neural network (Text-CNN) was proposed. First, the method used the binary analysis tool Angr to reverse-engineer the malicious code binary file, obtained its data structure and control flow information, and automatically generated the control flow graph and function call graph. On this basis, an API call sequence extraction algorithm was proposed, which generated API call sequences according to the order in which the malicious code used API functions. Second, an API call sequence vectorization model was established by using the word2vec model to vectorize the API call sequence, so that each API function obtained its own vector representation. A malicious code API call sequence was then transformed into a malicious code API matrix, which was used as the input of the classification model. Finally, drawing on the idea of text classification, a malicious code classification model, MM-Text-CNN, was proposed. This model combined one-dimensional and two-dimensional convolution operations. It was suitable for input data of different sizes and could also extract spatial and temporal features of the input data simultaneously. The experimental results showed that the proposed classification model could complete the malicious code classification task, with an accuracy rate reaching 97.83%.
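The following is a minimal sketch of how an API call sequence can become a matrix and then a fixed-length feature vector, which is the property that lets a text-CNN-style model accept sequences of different lengths. It is not the authors' MM-Text-CNN. It covers only one-dimensional convolution with max-over-time pooling, and it uses random vectors in place of trained word2vec embeddings and random kernels in place of learned ones. The function names, the example API vocabulary, the embedding dimension, and the kernel widths are all assumptions made for illustration.

```python
import random


def build_embeddings(vocab, dim, seed=0):
    """Give each API name a fixed random vector (stand-in for trained word2vec vectors)."""
    rng = random.Random(seed)
    return {api: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for api in vocab}


def make_kernels(widths, dim, seed=1):
    """Create one random (width x dim) kernel per width (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(w)]
            for w in widths]


def sequence_to_matrix(api_sequence, embeddings):
    """Stack the vector of each API call, in call order, into a (length x dim) matrix."""
    return [embeddings[api] for api in api_sequence]


def conv_over_time(matrix, kernel):
    """Slide a kernel along the call-order axis and apply ReLU to each response."""
    width = len(kernel)
    responses = []
    for start in range(len(matrix) - width + 1):
        total = 0.0
        for offset in range(width):
            row = matrix[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], row))
        responses.append(max(0.0, total))
    return responses


def extract_features(matrix, kernels):
    """Max-over-time pooling: one value per kernel, whatever the sequence length.

    A sequence shorter than a kernel yields no responses, so that feature is 0.0.
    """
    return [max(conv_over_time(matrix, k), default=0.0) for k in kernels]


# Hypothetical vocabulary and sequences, chosen only to illustrate the data flow.
VOCAB = ["CreateFileW", "ReadFile", "WriteFile", "RegSetValueExW", "CloseHandle"]
DIM = 8
embeddings = build_embeddings(VOCAB, DIM)
kernels = make_kernels(widths=(2, 3, 4), dim=DIM)

short_seq = ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"]
long_seq = short_seq * 5 + ["ReadFile"]

features_short = extract_features(sequence_to_matrix(short_seq, embeddings), kernels)
features_long = extract_features(sequence_to_matrix(long_seq, embeddings), kernels)

print(len(features_short), len(features_long))  # prints: 3 3
```

Both sequences yield a three-element feature vector, one value per kernel, even though one sequence has 4 calls and the other 21. In a real model these pooled features would feed a trained classification layer, and the kernels and embeddings would be learned rather than random.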

Citation

Min Liu and Hailong Li "Malicious code classification method based on API sequence and text-CNN", Proc. SPIE 12303, International Conference on Cloud Computing, Internet of Things, and Computer Applications (CICA 2022), 123030T (28 July 2022); https://doi.org/10.1117/12.2642611

PROCEEDINGS
10 PAGES

KEYWORDS

Convolution

Binary data

Convolutional neural networks

Data modeling

Statistical modeling

Analytical research

Classification systems
