Malware Detection in PDF Files Using Machine Learning

Abstract: In this report, we present how we used machine learning techniques to detect malicious behaviour in PDF files. To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of malware. However, this classifier was easy to fool with malicious PDFs that we forged to look like clean ones. We first proposed a very naive attack, which was easily stopped by introducing a threshold. We also implemented a gradient-descent attack to evade this SVM. This attack was almost 100% successful. To fix this problem, we provided countermeasures against the latter attack. A more elaborate feature selection, together with the use of a threshold, allowed us to stop up to 99.99% of these attacks. Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively feeding the SVM with forged malicious PDFs. We found that after 3 iterations, every PDF forged by gradient descent was detected, completely preventing the attack.
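As a rough illustration of the attack and the threshold countermeasure described in the abstract, the sketch below runs a gradient-descent evasion against a linear SVM and then applies a per-feature cap. It is not the authors' implementation: the weights w, the bias b, the feature vector x, the cap value, and the function names are all hypothetical, and the model is assumed to be linear so that the gradient of the score with respect to the features is simply w.

```python
import numpy as np

def decision(w, b, x):
    """Linear SVM score; a positive value is read as 'malicious'."""
    return float(np.dot(w, x) + b)

def gradient_descent_evasion(w, b, x, step=1.0, max_iter=100):
    """Raise feature counts until the score becomes negative.

    Assumption: the attacker can only add objects to a PDF, so feature
    counts may grow but never shrink. For a linear model the gradient of
    the score with respect to x is w, so only features with a negative
    weight are increased.
    """
    x_adv = x.astype(float).copy()
    direction = np.where(w < 0, -w, 0.0)
    if not direction.any():
        return x_adv  # no feature can lower the score
    for _ in range(max_iter):
        if decision(w, b, x_adv) < 0:
            break
        x_adv += step * direction
    return x_adv

def exceeds_cap(x, cap):
    """Threshold countermeasure: flag any feature count above the cap."""
    return bool(np.any(x > cap))

# Hypothetical model: two 'suspicious' features, two 'benign-looking' ones.
w = np.array([2.0, 1.5, -1.0, -0.5])
b = -1.0
x = np.array([3, 2, 0, 0])  # hypothetical malicious PDF feature counts
cap = 5.0                   # hypothetical per-feature threshold

x_adv = gradient_descent_evasion(w, b, x)
print(decision(w, b, x), decision(w, b, x_adv))
print(exceeds_cap(x, cap), exceeds_cap(x_adv, cap))
```

In this toy setting the forged vector crosses the decision boundary, but only by inflating a benign-looking feature past the cap, which is the kind of anomaly a threshold can catch. Adversarial retraining would go one step further by adding such forged vectors, labelled malicious, to the training set and refitting the classifier.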
Document type:
Contract report
[Research Report] Rapport LAAS n° 18030, REDOCS. 2018, 16p

Cited literature: 8 references
Contributor: Claire Delaplace
Submitted on: Thursday, February 8, 2018 - 17:07:06
Last modified on: Tuesday, February 5, 2019 - 12:12:41
Document(s) archived on: Saturday, May 5, 2018 - 11:23:51




  • HAL Id : hal-01704766, version 1



Bonan Cuan, Aliénor Damien, Claire Delaplace, Mathieu Valois. Malware Detection in PDF Files Using Machine Learning. [Research Report] Rapport LAAS n° 18030, REDOCS. 2018, 16p. 〈hal-01704766v1〉


