Boosting classifications with imbalanced data

Bauer, Philipp Rudolf

doi:10.34726/hss.2017.45341

DC Field

Value

Language

dc.contributor.advisor

Filzmoser, Peter

dc.contributor.author

Bauer, Philipp Rudolf

dc.date.accessioned

2020-06-29T05:20:17Z

dc.date.issued

2017

dc.date.submitted

2017-11

dc.identifier.citation

Bauer, P. R. (2017). Boosting classifications with imbalanced data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.45341

dc.identifier.uri

https://doi.org/10.34726/hss.2017.45341

dc.identifier.uri

http://hdl.handle.net/20.500.12708/5289

dc.description

Title differs from the original; translated by the author

dc.description.abstract

Boosting is an ensemble method that uses a “weak” classifier to create a “strong” one, building on the theory introduced by Robert Schapire (see Schapire 1990). It appears similar to bagging yet is fundamentally different. This thesis starts with a short introduction, followed by a chapter describing the theory and methodology behind boosting and a chapter presenting a set of boosting algorithms applicable to binary, multi-class and regression problems. The major focus of the thesis is the performance of boosting algorithms on imbalanced data sets. The issue with such data sets is that classifiers tend to emphasize the larger classes, which leads to significant class distribution skews. An established general solution is to apply sampling methods. After introducing these, the simulations chapter demonstrates that boosting algorithms work well with minority sampling in binary classification, whereas majority sampling appears to be preferable in the multi-class problem. However, it is also shown that in the multi-class setting the built-in re-weighting of hard-to-classify cases in the boosting algorithms AdaBoost.M1 and SAMME is sufficient to handle imbalances in the data set, without any sampling.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Statistics

dc.subject

Classification

dc.subject

Boosting

dc.title

Boosting classifications with imbalanced data

dc.title.alternative

Boosting Classifications with Imbalanced Data

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2017.45341

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Philipp Rudolf Bauer

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E105 - Institut für Stochastik und Wirtschaftsmathematik

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC14500523

dc.description.numberOfPages

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-104275

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.advisor.orcid

0000-0002-8014-4682

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.languageiso639-1

item.openaccessfulltext

Open Access

item.openairetype

master thesis

item.grantfulltext

open

crisitem.author.dept

E105 - Institut für Stochastik und Wirtschaftsmathematik

crisitem.author.parentorg

E100 - Fakultät für Mathematik und Geoinformation

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(821.34 kB)

In Copyright


