Date Approved

1-10-2023

Embargo Period

1-11-2023

Document Type

Thesis

Degree Name

M.S. Computer Science

Department

Computer Science

College

College of Science & Mathematics

Advisor

Vahid Heydari, Ph.D.

Committee Member 1

Shen-Shyang Ho, Ph.D.

Committee Member 2

Silvija Kokalj-Filipovic, Ph.D.

Keywords

malware detection, machine learning, model agnostic language

Subject(s)

Malware (computer software)

Disciplines

Computer Sciences

Abstract

The adoption of the internet as a global platform has led to a significant rise in cyber-attacks of many forms, including Trojans, worms, spyware, ransomware, botnet malware, and rootkits. Tackling these forms of malware requires understanding and detecting them. Malware detection methods include signature-based, behavior-based, and machine learning approaches. Of these, machine learning methods have proven to be the most efficient for malware detection. This thesis proposes a hybrid system that combines signature-based and dynamic behavior-based detection techniques with an added machine learning layer that supports model explainability. The hybrid system provides not only predictions but also their interpretation and explanation for a malware detection task. The machine learning layer can be Logistic Regression, Random Forest, Naive Bayes, Decision Tree, or Support Vector Machine. The five algorithms are compared empirically on publicly available datasets and manually acquired samples, both benign and malicious. DALEX (moDel Agnostic Language for Exploration and eXplanation) is integrated into the proposed hybrid approach to support the interpretation and understanding of its predictions, with the aim of improving the trust of cybersecurity stakeholders in complex machine learning predictive models.
