Ibéria Medeiros and Nuno Neves, integrated researchers at LASIGE, have published a paper on a natural language processing (NLP) model for detecting vulnerabilities in web applications in the IEEE Transactions on Reliability, a high-impact journal (h5-index: 102). The paper, titled “Statically Detecting Vulnerabilities by Processing Programming Languages as Natural Languages”, is co-authored by Miguel Correia, a researcher at INESC-ID.
The goal of the work was to present an alternative to traditional static analysis tools, one in which tools learn to detect web vulnerabilities automatically by resorting to NLP. The approach employs a sequence model, more concretely a Hidden Markov Model (HMM), which learns to characterize vulnerabilities from an annotated corpus. The trained model is then used to discover and identify vulnerabilities in source code. The approach was implemented in the DEKANT tool and evaluated experimentally on a large set of PHP applications and WordPress plugins.
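For readers unfamiliar with HMMs, the toy sketch below shows the general idea of decoding a token sequence with the Viterbi algorithm to decide whether data reaching a sensitive operation is likely tainted. It is not DEKANT's implementation. The hidden states ("Taint", "Clean"), the abstract token vocabulary ("input", "var", "sanitize", "sink"), and all probabilities are hypothetical and hand-picked for illustration. A real tool would translate source code into its own token representation and estimate the probabilities from an annotated corpus.

```python
import math

# Hypothetical hidden states: is the data flowing through this code slice tainted?
STATES = ("Taint", "Clean")

# Hypothetical, hand-picked probabilities (a real model would learn these from an annotated corpus).
START = {"Taint": 0.5, "Clean": 0.5}
TRANS = {
    "Taint": {"Taint": 0.8, "Clean": 0.2},
    "Clean": {"Taint": 0.2, "Clean": 0.8},
}
# Hypothetical abstract tokens: user input, variable use, sanitization call, sensitive sink.
EMIT = {
    "Taint": {"input": 0.5, "var": 0.3, "sanitize": 0.05, "sink": 0.15},
    "Clean": {"input": 0.05, "var": 0.3, "sanitize": 0.4, "sink": 0.25},
}


def viterbi(tokens):
    """Return the most likely hidden-state sequence for a token sequence (log-space Viterbi)."""
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {
        s: (math.log(START[s]) + math.log(EMIT[s][tokens[0]]), [s]) for s in STATES
    }
    for tok in tokens[1:]:
        new_best = {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path probability into s.
            prev = max(STATES, key=lambda p: best[p][0] + math.log(TRANS[p][s]))
            score = best[prev][0] + math.log(TRANS[prev][s]) + math.log(EMIT[s][tok])
            new_best[s] = (score, best[prev][1] + [s])
        best = new_best
    # Return the path of the highest-scoring final state.
    return max(best.values(), key=lambda entry: entry[0])[1]


def is_vulnerable(tokens):
    """Flag a slice as vulnerable if it ends in a sink whose decoded state is 'Taint'."""
    path = viterbi(tokens)
    return tokens[-1] == "sink" and path[-1] == "Taint"


if __name__ == "__main__":
    print(is_vulnerable(["input", "var", "sink"]))              # unsanitized flow
    print(is_vulnerable(["input", "sanitize", "var", "sink"]))  # sanitized flow
```

With these made-up numbers, the unsanitized slice decodes to "Taint" at the sink and is flagged, while the slice containing a "sanitize" token decodes to "Clean" and is not. Working in log space avoids numerical underflow on long token sequences.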
The paper is available here.