Conference papers

Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection

Abstract : Event logging is a key source of information on a system's state. Reading logs provides insight into a system's activity, helps assess whether its state is correct, and makes it possible to diagnose problems. However, reading does not scale: as the number of machines grows and systems become more complex, auditing system health from logfiles is becoming overwhelming for system administrators. This observation has led to many proposals for automating log processing. However, most of these proposals still require some human intervention, for instance tagging logs or parsing the source files that generate the logs. In this work, we target minimal human intervention for logfile processing and propose a new approach that treats logs as regular text (as opposed to related work that seeks to make the most of the little structure imposed by log formatting). This approach makes it possible to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: the words of the logfiles are mapped to a high-dimensional metric space, which we then exploit as a feature space using standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. We validate our approach by seeking stress patterns on an experimental platform. Results show strong predictive performance (≈ 90% accuracy) using three out-of-the-box classifiers.
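
The pipeline described in the abstract (words mapped to vectors, vectors used as features for a classifier) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the word vectors are deterministic pseudo-random stand-ins rather than trained word2vec embeddings, averaging the word vectors of a log is an assumed way to obtain one feature vector per log, the classifier is a simple nearest-centroid rule, and every name (`word_vector`, `log_features`, `train_centroids`, `predict`, `DIM`) is hypothetical.

```python
import hashlib
import math
import random
from collections import defaultdict

DIM = 16  # hypothetical embedding dimension, chosen only for the sketch


def word_vector(word, dim=DIM):
    """Stand-in for a trained word2vec lookup (assumption):
    returns a deterministic pseudo-random vector derived from the word."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def log_features(text, dim=DIM):
    """Treat the log as regular text: split on whitespace and
    average the word vectors into one feature vector."""
    words = text.lower().split()
    if not words:
        return [0.0] * dim
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(word_vector(word, dim)):
            total[i] += value
    return [value / len(words) for value in total]


def train_centroids(samples):
    """samples: iterable of (log_text, label) pairs.
    Returns a dict mapping each label to the mean feature vector of its logs."""
    grouped = defaultdict(list)
    for text, label in samples:
        grouped[label].append(log_features(text))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, text):
    """Nearest-centroid rule: pick the label whose centroid is closest
    (Euclidean distance) to the feature vector of the log."""
    features = log_features(text)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

A usage example with made-up log text and labels: `centroids = train_centroids([("kernel: cpu load high on node1", "stress"), ("sshd: session opened for user root", "normal")])`, then `predict(centroids, some_log_text)` returns one of the two labels. With real data, the stand-in vectors would be replaced by embeddings trained on the logfiles, and the nearest-centroid rule by an off-the-shelf classifier.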

Cited literature [25 references]

Contributor : Carla Sauvanaud
Submitted on : Tuesday, August 22, 2017 - 5:56:11 PM
Last modification on : Tuesday, April 5, 2022 - 3:44:08 AM




  • HAL Id : hal-01576291, version 1


Christophe Bertero, Matthieu Roy, Carla Sauvanaud, Gilles Trédan. Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection. 28th International Symposium on Software Reliability Engineering (ISSRE 2017), Oct 2017, Toulouse, France. 10p. ⟨hal-01576291⟩


