Build failure prediction in continuous integration workflows

Rausch, Thomas

doi:10.34726/hss.2016.37419

Record link:

https://doi.org/10.34726/hss.2016.37419
http://hdl.handle.net/20.500.12708/3329

Citation:

Rausch, T. (2016). Build failure prediction in continuous integration workflows [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2016.37419

CatalogPlus:

AC13351644

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Rausch, Thomas

Advisor:

Schulte, Stefan

Organisational Unit:

E184 - Institut für Informationssysteme

Date (published):

2016

Number of Pages:

128

Keywords:

Empirical Software Engineering; Machine Learning; Predictive Analytics; Continuous Integration; Build Failure Prediction

Abstract:

Continuous integration (CI) is a practice in which developers frequently integrate their work into the mainline of development. A CI server monitors the source code repository of a project and automatically executes the software build process when new changes are checked in. If a build fails, developers have to identify and fix the cause of the broken build, which delays the integration process and stalls further development. Large software projects often have long-running builds that exacerbate this problem. Despite the widespread use of CI, little is known about the variety of errors that cause builds to fail. Yet understanding when and why build errors occur is an important step towards improving developer productivity in the CI workflow. By identifying characteristics of development practices that cause build failures, we can predict preliminary results for an integration. This helps developers react to possible problems even before a build is initiated, thereby saving time and resources. In this thesis, we introduce CInsight, a framework for analyzing CI workflows and build failures. We conduct an empirical study on real-world data from 14 open source software projects. Data from source code repositories and build systems are explored to gather qualitative and quantitative evidence about the variety and frequency of CI build errors. Statistical methods are used to examine the relationship between development practices and build failures. Based on the results, we devise a method for CI build failure prediction. Our results show that failing unit tests and violations of code quality rules are the main causes of build failures. The statistical analyses reveal that the type and number of previous errors are the strongest predictors of future failures. Our best prediction models yield average recall and precision values of 0.82 and 0.80, respectively. Furthermore, our approach allows a prediction to be updated during the execution of a build.
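
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision and recall are computed for a build failure classifier. It is not taken from the thesis or from CInsight: the function name `precision_recall`, the labels, and the sample data are hypothetical, and the record does not state how the thesis averages these values.

```python
def precision_recall(actual, predicted, positive="failed"):
    """Compute precision and recall for the `positive` class."""
    pairs = list(zip(actual, predicted))
    # True positives: builds that failed and were predicted to fail.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False positives: builds that passed but were predicted to fail.
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    # False negatives: builds that failed but were predicted to pass.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical build outcomes and predictions (illustrative only).
actual = ["failed", "passed", "failed", "failed",
          "passed", "passed", "failed", "passed"]
predicted = ["failed", "passed", "passed", "failed",
             "failed", "passed", "failed", "passed"]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is the share of predicted failures that really failed, and recall is the share of real failures that were predicted.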

Additional information:

Includes a summary in German

License:

In Copyright

Appears in Collections:

Thesis