Malware Detection in PDF Files Using Machine Learning

Abstract : In this report, we present how we used machine learning techniques to detect malicious behaviour in PDF files. To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of malware. However, this classifier was easy to fool with malicious PDFs that we forged to look like clean ones. We first proposed a very naive attack, which was easily stopped by introducing a threshold. We also implemented a gradient-descent attack to evade the SVM; this attack was almost 100% successful. To address this problem, we provided countermeasures against the latter attack. A more elaborate feature selection, combined with the use of a threshold, allowed us to stop up to 99.99% of these attacks. Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively feeding the SVM with malicious forged PDFs. We found that after 3 iterations, every PDF forged by gradient descent was detected, completely preventing the attack.
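
As a rough illustration of the gradient-descent evasion and the threshold countermeasure mentioned in the abstract, the sketch below attacks a linear decision function. It is not the authors' implementation: the feature names, the weights W, the bias B, the step size and the threshold value are all hypothetical, and the classifier is assumed to be linear so that its gradient with respect to the input is simply W. The attacker is assumed to only add content, never remove it.

```python
# Illustrative sketch only: every constant below is made up for the example.
FEATURES = ["/JavaScript", "/OpenAction", "/Page", "/Font"]  # hypothetical features
W = [2.0, 1.5, -0.5, -0.25]  # hypothetical linear weights (positive = malicious)
B = -1.0                     # hypothetical bias


def score(x):
    """Linear decision function f(x) = w.x + b; f(x) > 0 means 'malicious'."""
    return sum(w * v for w, v in zip(W, x)) + B


def gradient_descent_evasion(x, step=1.0, max_iter=100):
    """Lower f(x) by stepping along -W, but only ever increase feature counts.

    Counts tied to positive weights are left untouched, so the malicious
    payload stays in the file; only 'benign-looking' counts are inflated.
    """
    x = list(x)
    for _ in range(max_iter):
        if score(x) < 0:
            break
        # For a linear model, the gradient of f with respect to x is W.
        x = [v + max(0.0, -step * w) for v, w in zip(x, W)]
    return x


def exceeds_threshold(x, limit=5.0):
    """Hypothetical countermeasure: flag samples with implausibly large counts."""
    return any(v > limit for v in x)


malicious = [2.0, 1.0, 1.0, 1.0]
forged = gradient_descent_evasion(malicious)
```

In this toy setting the forged vector crosses the decision boundary, but only by inflating the benign-looking counts well beyond those of the original file, which is the kind of anomaly a threshold on feature values can catch.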
Document type : [Research Report] REDOCS. 2018

Cited literature : 9 references
Contributor : Claire Delaplace
Submitted on : Thursday, February 8, 2018 - 5:07:06 PM
Last modification on : Wednesday, February 21, 2018 - 1:31:45 AM



  • HAL Id : hal-01704766, version 1


Bonan Cuan, Aliénor Damien, Claire Delaplace, Mathieu Valois. Malware Detection in PDF Files Using Machine Learning. [Research Report] REDOCS. 2018. 〈hal-01704766〉


