Bibliographic Metadata

Boosting classifications with imbalanced data / by Philipp Rudolf Bauer
Additional Titles
Boosting Classifications with Imbalanced Data
Author: Bauer, Philipp Rudolf
Censor: Filzmoser, Peter
Published: Wien, 2017
Description: 90 pages : illustrations
Institutional Note: Technische Universität Wien, diploma thesis (Diplomarbeit), 2017
Variant title translated by the author
Document type: Thesis (Diplom)
Keywords (EN): Statistics / Classification / Boosting
URN: urn:nbn:at:at-ubtuw:1-104275 (Persistent Identifier)
The work is publicly available
Abstract (English)

Boosting is an ensemble method that combines a “weak” classifier into a “strong” one, based on Robert Schapire's work (Schapire 1990). It appears similar to bagging yet is fundamentally different. This thesis starts with a short introduction, followed by a chapter describing the theory and methodology behind boosting. A further chapter presents a set of boosting algorithms applicable to binary, multi-class, and regression problems. The major focus of this thesis is to examine the performance of boosting algorithms on imbalanced data sets. The issue with such data sets is that the skewed class distribution leads classifiers to emphasize the larger classes at the expense of the minority classes. An established general solution is to apply sampling methods. After introducing these, the simulations chapter demonstrates that boosting algorithms work well with minority sampling in binary classification, whereas majority sampling appears preferable in the multi-class case. It is also shown that in the multi-class setting the built-in re-weighting of hard-to-classify observations in the boosting algorithms AdaBoost.M1 and SAMME is sufficient to handle class imbalance without any sampling.
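The two ideas summarized in the abstract, random minority oversampling and boosting's per-round re-weighting of misclassified samples, can be sketched as follows. This is a minimal hypothetical illustration (binary AdaBoost with decision stumps, NumPy only), not the thesis's actual code; all function names here are invented for the sketch.

```python
import numpy as np

def oversample_minority(X, y, rng):
    """Randomly duplicate minority-class samples until the classes are balanced
    (a simple form of the minority sampling discussed in the abstract)."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    need = counts.max() - counts.min()
    idx = rng.choice(np.where(y == minority)[0], size=need, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

def fit_stump(X, y, w):
    """Exhaustively pick the weighted-error-minimizing threshold stump."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, pol)
    return best

def adaboost(X, y, n_rounds=10):
    """Binary AdaBoost with labels in {-1, +1}: each round upweights the
    samples the current stump misclassifies ('hard-to-classify' points)."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(n_rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight of this stump
        pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)             # upweight misclassified
        w /= w.sum()
        stumps.append((alpha, j, t, pol))
    return stumps

def predict(stumps, X):
    """Weighted majority vote over all fitted stumps."""
    agg = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1)
              for a, j, t, p in stumps)
    return np.where(agg >= 0, 1, -1)
```

A typical workflow under this sketch would oversample the training set first and then boost on the balanced data; the thesis's binary-classification simulations compare exactly this kind of combination against boosting without sampling.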
