Event Details
Vulnerability Detection in Assembly Language Source Code Using Deep Learning
Presenter: Karthiga Thangavelu
Supervisor:
Date: Wed, January 11, 2023
Time: 11:00:00 - 00:00:00
Place: ZOOM - Please see below.
ABSTRACT
Zoom meeting link:
https://uvic.zoom.us/j/7049885119?pwd=aWdTQmdsSUtvWnkweEJaNGJVOGRJUT09
One tap mobile
+17789072071,,7049885119#,,,,0#,,797290# Canada
+16475580588,,7049885119#,,,,0#,,797290# Canada
Dial by your location
+1 778 907 2071 Canada
+1 647 558 0588 Canada
Meeting ID: 704 988 5119
Password: 797290
Find your local number: https://uvic.zoom.us/u/kefS2noc3n
Note: Please log in to Zoom via SSO and your UVic Netlink ID
Summary: Language modelling for source code has advanced significantly in recent years. Its applications include code completion, translating programs from one language to another, generating code from text documents, and finding vulnerabilities in source code. Unlike modelling source code in languages such as C, C++, or Python, modelling assembly language is a tedious process. Most feature engineering for assembly language is done manually. In this project, patterns in assembly language are recognized, and malicious code is distinguished from non-malicious code. Strings of jumps are introduced into the assembly code to make it non-malicious. The pattern recognition and classification process consists of three main tasks. First, the strings of jumps are introduced into the assembly code, and the code is tokenized. Second, the instructions are converted to vector embeddings using an assembly language instruction embedding based on the BERT transformer language model, which minimizes manual pre-processing of the dataset. The final task is a downstream task in which the instruction embeddings are fed into an LSTM network that classifies code as malicious or non-malicious using an assembly language dataset. The performance of the model is evaluated using metrics such as accuracy, the confusion matrix, recall, precision, and F1 score.
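As a rough illustration of the first and last steps described in the summary, the sketch below tokenizes a line of assembly and computes the listed evaluation metrics from predicted labels. It is not the presenter's implementation: the token pattern, the function names `tokenize_instruction` and `binary_metrics`, and the convention that label 1 means "malicious" are all assumptions made for this example. The BERT embedding and LSTM stages are omitted.

```python
import re
from collections import Counter

# Assumed token pattern (hypothetical): hex literals, identifiers
# (mnemonics, registers, labels), decimal numbers, and punctuation.
TOKEN_RE = re.compile(
    r"0x[0-9a-fA-F]+|[A-Za-z_.][A-Za-z0-9_.]*|\d+|[\[\]+\-*,:]"
)


def tokenize_instruction(line):
    """Split one assembly line into tokens, dropping any ';' comment."""
    code = line.split(";", 1)[0]
    return TOKEN_RE.findall(code)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and derived metrics.

    Assumption: `positive` (default 1) marks the malicious class.
    """
    counts = Counter(
        (t == positive, p == positive) for t, p in zip(y_true, y_pred)
    )
    tp = counts[(True, True)]    # malicious, predicted malicious
    fn = counts[(True, False)]   # malicious, predicted non-malicious
    fp = counts[(False, True)]   # non-malicious, predicted malicious
    tn = counts[(False, False)]  # non-malicious, predicted non-malicious
    total = tp + fn + fp + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": {"tp": tp, "fn": fn, "fp": fp, "tn": tn},
    }


# Usage with made-up data (not results from the project).
tokens = tokenize_instruction("mov eax, [ebp+8] ; load arg")
# tokens == ['mov', 'eax', ',', '[', 'ebp', '+', '8', ']']

scores = binary_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 1, 0],
    y_pred=[1, 0, 1, 0, 1, 0, 1, 0],
)
# scores["confusion"] == {'tp': 3, 'fn': 1, 'fp': 1, 'tn': 3}
```

In a full pipeline, the token sequences would be passed to the embedding model rather than used directly, and the metrics would be computed on a held-out test split.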