Microsoft has announced that it has developed a new system that is able to correctly distinguish between security and non-security software bugs 99 percent of the time. The system is also able to accurately identify critical, high-priority security bugs on average 97 percent of the time.
Microsoft used a data set of 13 million work items and bugs from 47,000 of its developers, stored across Azure DevOps and GitHub repositories, to develop a process and machine learning model that correctly distinguishes between security and non-security bugs. In the coming months, the company plans to open source the methodology on GitHub along with example models and other resources so that the system can be used to help support human experts.
While Microsoft was developing the model, its security experts approved the training data and the statistical sampling that was used to give them a manageable amount of data to review. This data was then encoded into representations called feature vectors, and researchers at Microsoft designed the system using a two-step process.
The model first learned to classify security and non-security bugs and then it learned to apply security labels (critical, important or low-impact) to those bugs.
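The two-step flow described above can be sketched roughly as follows. This is only an illustration, not Microsoft's code: the function names are hypothetical, and the keyword rules are toy stand-ins for the trained models.

```python
def triage(title, is_security, severity_of):
    """Step 1: security vs. non-security. Step 2: severity, for security bugs only."""
    if not is_security(title):
        return ("non-security", None)
    return ("security", severity_of(title))


# Toy stand-ins for the two trained models (hypothetical keyword rules).
def toy_is_security(title):
    keywords = ("overflow", "injection", "xss")
    return any(word in title.lower() for word in keywords)


def toy_severity(title):
    # The article names three labels: critical, important and low-impact.
    return "critical" if "remote" in title.lower() else "important"


result_security = triage("Remote buffer overflow in parser", toy_is_security, toy_severity)
result_other = triage("Typo in settings page", toy_is_security, toy_severity)
```

The point of the cascade is that the severity step only ever sees items the first step has already flagged as security bugs.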
Identifying security bugs
In order to make its bug predictions, Microsoft’s model leverages two techniques.
The first is an information retrieval approach called term frequency-inverse document frequency (TF-IDF), which weights a word by how often it appears in a document and then discounts that weight by how common the word is across the whole collection of titles. According to Microsoft, its bug titles are usually quite short and contain around 10 words.
The second technique the software giant uses is a logistic regression model, which uses a logistic function to model the probability that an input belongs to a given class.
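To give a sense of how TF-IDF scores words in short titles, here is a minimal sketch in plain Python. It uses the textbook formulation (term frequency multiplied by the log of inverse document frequency); the exact variant Microsoft uses is not stated, and the sample titles are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(titles):
    """Return one {word: weight} dict per title."""
    docs = [title.lower().split() for title in titles]
    n_docs = len(docs)
    # Document frequency: the number of titles that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Term frequency in this title times inverse document frequency.
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return vectors


# Invented example titles, not taken from Microsoft's data.
titles = [
    "buffer overflow in login parser",
    "typo in login page",
    "crash in settings page",
]
weights = tf_idf(titles)
```

A word such as "in", which appears in every title, gets a weight of zero, while a rarer word such as "overflow" scores higher than "login", which appears in two of the three titles.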
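As a rough sketch of the idea, the snippet below trains a tiny logistic regression with plain gradient descent. The two-feature vectors, learning rate and epoch count are all assumptions chosen for illustration; in a real system the features would come from a step such as TF-IDF, and nothing here reflects Microsoft's actual implementation.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability that the feature vector belongs to the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples, labels, lr=0.5, epochs=500):
    """Fit weights and bias with stochastic gradient descent on log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical feature vectors: label 1 = security bug, 0 = non-security bug.
samples = [[0.9, 0.0], [0.7, 0.1], [0.0, 0.8], [0.1, 0.9]]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)
p_security = predict_proba(weights, bias, [0.8, 0.05])
p_other = predict_proba(weights, bias, [0.05, 0.85])
```

The model outputs a probability rather than a hard label, so a threshold (0.5 is the usual default) decides which class a bug is assigned to.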
In its blog post announcing the new system, Microsoft explained how it used machine learning models and security experts to better identify security bugs, saying:
“Every day, software developers stare down a long list of features and bugs that need to be addressed. Security professionals try to help by using automated tools to prioritize security bugs, but too often, engineers waste time on false positives or miss a critical security vulnerability that has been misclassified. To tackle this problem data science and security teams came together to explore how machine learning could help. We discovered that by pairing machine learning models with security experts, we can significantly improve the identification and classification of security bugs.”
Microsoft’s new bug-detecting system has already been deployed in production internally. It is continually retrained with data approved by the company’s security experts, who monitor how many bugs are generated during software development.