Research Area:  Machine Learning
Dynamic analysis and pattern matching techniques are widely used in industry and provide a straightforward method for identifying malware samples. Yara is a pattern matching tool that can scan sandbox memory dumps to identify malware families. However, pattern matching techniques fail silently when minor code variations are present, leaving malware samples unidentified. This paper presents a two-layered Malware Variant Identification using Incremental Clustering (MVIIC) process and proposes clustering the unidentified malware samples to enable the identification of malware variants and new malware families. A novel incremental clustering algorithm identifies new malware variants among the unidentified samples. This research shows that clustering can provide a higher level of performance than Yara rules and that clustering is resistant to the small changes introduced by malware variants. The paper proposes a hybrid approach in which Yara scanning first eliminates known malware and clustering then identifies new malware variants among the remaining samples. Results are evaluated using the F1 score and V-Measure clustering metrics.
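The two-layer idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MVIIC implementation: plain byte-substring signatures stand in for Yara rules, and a simple leader-style incremental clusterer using Jaccard distance over byte 4-grams stands in for the paper's incremental clustering algorithm. All names, signatures, and the 0.5 threshold are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical byte signatures standing in for Yara rules (illustration only).
SIGNATURES: Dict[str, bytes] = {
    "family_a": b"\xde\xad\xbe\xef",
    "family_b": b"EVIL_CONFIG",
}


def match_signature(dump: bytes) -> Optional[str]:
    """Layer 1: return the first family whose signature occurs in the dump."""
    for family, pattern in SIGNATURES.items():
        if pattern in dump:
            return family
    return None  # unidentified: no signature matched


def ngrams(dump: bytes, n: int = 4) -> Set[bytes]:
    """Represent a dump as the set of its byte n-grams (assumed feature choice)."""
    return {dump[i:i + n] for i in range(len(dump) - n + 1)}


def jaccard_distance(a: Set[bytes], b: Set[bytes]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class IncrementalClusterer:
    """Layer 2: leader clustering, processing one sample at a time."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.leaders: List[Set[bytes]] = []  # one representative per cluster

    def add(self, features: Set[bytes]) -> int:
        """Assign to the nearest leader within the threshold, else open a cluster."""
        best_idx, best_dist = -1, self.threshold
        for idx, leader in enumerate(self.leaders):
            dist = jaccard_distance(features, leader)
            if dist <= best_dist:
                best_idx, best_dist = idx, dist
        if best_idx == -1:
            self.leaders.append(features)
            return len(self.leaders) - 1
        return best_idx


def identify(dumps: List[bytes], threshold: float = 0.5) -> List[str]:
    """Label each dump with a known family, or with a cluster of unknowns."""
    clusterer = IncrementalClusterer(threshold)
    labels: List[str] = []
    for dump in dumps:
        family = match_signature(dump)
        if family is not None:
            labels.append(family)
        else:
            labels.append(f"cluster_{clusterer.add(ngrams(dump))}")
    return labels
```

In this sketch, a sample whose signature bytes have been altered falls through layer 1, but a one-byte variant of an earlier unidentified sample still shares most of its 4-grams with it and so joins the same cluster. This mirrors the abstract's claim that clustering tolerates small changes that defeat exact pattern matching.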
Keywords:  
Dynamic analysis
Malware Variant Identification
Incremental Clustering
Deep Learning
Machine Learning
F1 score
V-Measure clustering metric
Author(s) Name:  Paul Black, Iqbal Gondal, Adil Bagirov and Md Moniruzzaman
Journal name:  Electronics
Conference name:  
Publisher name:  MDPI
DOI:  https://doi.org/10.3390/electronics10141628
Volume Information:  volume 10(14)
Paper Link:   https://www.mdpi.com/2079-9292/10/14/1628