Feature generation and selection in hyperspectral imaging

Ganglberger, Wolfgang

doi:10.34726/hss.2018.54200

DC Field

Value

Language

dc.contributor.advisor

Lohninger, Johann

dc.contributor.author

Ganglberger, Wolfgang

dc.date.accessioned

2020-06-27T22:30:55Z

dc.date.issued

2018

dc.date.submitted

2018-05

dc.identifier.citation

Ganglberger, W. (2018). Feature generation and selection in hyperspectral imaging [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.54200

dc.identifier.uri

https://doi.org/10.34726/hss.2018.54200

dc.identifier.uri

http://hdl.handle.net/20.500.12708/1809

dc.description

Alternative title as translated by the author

dc.description.abstract

This diploma thesis presents AutoFeature, a new algorithm that extracts material-specific spectroscopic characteristics from annotated infrared spectroscopy data fully automatically. These characteristics can then be used to identify the respective materials in hyperspectral images, so the user needs no expertise in the spectroscopic properties of the materials. The AutoFeature algorithm generates thousands of features by template matching and then selects the most promising ones using statistical methods and machine learning. Four types of templates were designed for template matching: triangles, Gaussian bells, general Gaussian bells and straight lines. Template matching is performed at every position of the infrared spectrum and is based on the Pearson correlation coefficient. The relevant features are then selected either by Fast Function Extraction, by embedded random forest modelling, or by one of the three filter methods ReliefF, Fisher score and HSIC Lasso. The study first uses artificial data to investigate how the AutoFeature algorithm behaves with respect to dataset size and noise. Features are then extracted automatically from three real datasets of microplastic and skin tissue samples. These features are used to build random forest models that classify five polymers in the first experiment, melanoma and non-melanoma in the second, and connective tissue and non-connective tissue in the third. For the artificial datasets, the algorithm identified the correct features up to a noise level of 10% with a sample size of 16, and up to a noise level of 25% with a sample size of 100.
For real data, features of all four templates were extracted, and they lie exclusively in characteristic absorption bands. The exact positions and widths of some features are nevertheless unexpected. Validating the random forest models with test data yielded a classification accuracy of at least 99.6% for the polymers and perfect classification for the melanoma and connective tissue data. The different selection methods chose features with varying density properties, but all of them separate the classes convincingly. Overall, the AutoFeature algorithm automatically extracted features from both artificial and real data that are not only chemically meaningful but also suitable for classification. Further investigations with more diverse datasets are needed to determine the potential of the AutoFeature algorithm. The algorithm can be developed further by creating additional templates and adjusting the selection parameters.
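The template-matching step described in the abstract (sliding a template along the spectrum and scoring each position with the Pearson correlation coefficient) can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation: the function names, the window parameterisation on x in [-1, 1], the default widths, and in particular the form assumed for the "general Gaussian" (a generalized Gaussian with shape exponent `p`) are all hypothetical. The straight-line template is modelled as a ramp, because a flat line has zero variance and no defined correlation.

```python
import numpy as np

# Hypothetical template shapes on a window of `width` points, parameterised on
# x in [-1, 1]. The four shape names come from the abstract; the exact
# parameterisations below (especially the "general Gaussian") are assumptions.


def triangle(width: int) -> np.ndarray:
    """Symmetric triangle peaking at the window centre."""
    x = np.linspace(-1.0, 1.0, width)
    return 1.0 - np.abs(x)


def gaussian(width: int, sigma: float = 0.3) -> np.ndarray:
    """Gaussian bell; sigma is relative to the half-width of the window."""
    x = np.linspace(-1.0, 1.0, width)
    return np.exp(-0.5 * (x / sigma) ** 2)


def general_gaussian(width: int, sigma: float = 0.3, p: float = 4.0) -> np.ndarray:
    """Assumed generalized Gaussian exp(-|x/sigma|**p); p > 2 flattens the top."""
    x = np.linspace(-1.0, 1.0, width)
    return np.exp(-np.abs(x / sigma) ** p)


def straight_line(width: int) -> np.ndarray:
    """Linear ramp. A flat line has zero variance, so Pearson r is undefined for it."""
    return np.linspace(0.0, 1.0, width)


def match_template(spectrum: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Pearson r between `template` and every window of `spectrum`.

    Entry i is the correlation with spectrum[i : i + len(template)], so the
    result has len(spectrum) - len(template) + 1 entries. Windows with zero
    variance get r = 0.
    """
    w = len(template)
    windows = np.lib.stride_tricks.sliding_window_view(spectrum, w)
    win = windows - windows.mean(axis=1, keepdims=True)  # centre each window
    t = template - template.mean()                        # centre the template
    num = win @ t
    den = np.sqrt((win ** 2).sum(axis=1) * (t ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = num / den
    return np.where(den > 0, r, 0.0)


# Usage: a synthetic Gaussian band centred at index 120 on a sloped baseline.
x = np.arange(300)
spectrum = np.exp(-0.5 * ((x - 120) / 6.0) ** 2) + 0.001 * x
r = match_template(spectrum, gaussian(41))
centre = int(np.argmax(r)) + 41 // 2  # convert window start to window centre
print(centre, round(float(r.max()), 3))
```

Because Pearson r is invariant to offset and scale within each window, the match responds to band shape rather than absolute intensity. Under this sketch, every combination of template type, width and position yields one candidate feature (its r value for a given spectrum), which is one way a single spectrum can give rise to thousands of features.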

dc.description.abstract

This master’s thesis presents AutoFeature, a novel algorithm for automatically extracting material-specific spectroscopic characteristics from an annotated infrared spectroscopy dataset. These characteristics can then be used to identify the material in hyperspectral images, so the user needs no expertise in the spectroscopic properties of the material. The AutoFeature algorithm generates thousands of features by template matching and then selects the most promising ones using statistical and machine learning methods. Four types of templates are designed: triangles, Gaussian bells, general Gaussian bells and straight lines. Matching is performed at every position of the infrared spectrum using the Pearson correlation coefficient. Feature selection is then carried out with fast function extraction, embedded random forest modelling, or one of three filter methods: ReliefF, Fisher score and HSIC lasso. The study first uses artificial data to investigate how the AutoFeature algorithm behaves with respect to sample size and noise. Next, features are automatically extracted from three real-world datasets containing microplastic and skin tissue specimens. These features are then used to train random forest classification models that predict five polymers in the first experiment, melanoma versus non-melanoma in the second, and connective versus non-connective tissue in the third. For artificial data, the algorithm extracted the correct features up to a noise level of 10% with a sample size of 16, and up to 25% with a sample size of 100. For real-world data, features of all four template types are extracted, and all of them lie at characteristic absorption bands of the substances under investigation. The exact positions and widths of some features are, however, unexpected.
Validating the random forest models on unseen test data yielded classification accuracies of 99.6% or higher for the polymer predictions and perfect classification for the melanoma and connective tissue predictions. Although the different selection methods yield features with different probability density functions, all of them show convincing class discrimination. Overall, the AutoFeature algorithm automatically extracted features from both artificial and real-world data that were chemically meaningful and suitable for prediction tasks. Evaluating the algorithm's further potential requires experiments with more varied datasets. We believe that further algorithmic progress can be made by designing additional templates and adapting the parameters of the selection methods.
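Of the selection methods listed in the abstract, the Fisher score filter is the simplest to illustrate. The sketch below uses the common textbook definition (between-class scatter of the feature means divided by pooled within-class variance) and ranks features by it. It is a hedged illustration only: the record does not say which Fisher score variant the thesis uses or how many features it keeps, so `fisher_score`, `select_top_k` and `k` are hypothetical. In the AutoFeature setting, the columns of `X` would be template-matching correlations and `y` the annotated material classes.

```python
import numpy as np


def fisher_score(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fisher score of every column of X (n_samples x n_features).

    F_j = sum_c n_c * (mean_cj - mean_j)**2 / sum_c n_c * var_cj

    A large score means the class means of feature j lie far apart relative
    to the spread within each class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = between / within
    # Zero within-class spread: perfect separator (inf) or constant feature (0).
    return np.where(within > 0, ratio, np.where(between > 0, np.inf, 0.0))


def select_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]


# Usage with toy data: column 0 separates the two classes, column 1 is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.column_stack([
    y + 0.1 * rng.standard_normal(100),
    rng.standard_normal(100),
])
scores = fisher_score(X, y)
print(select_top_k(scores, 1))
```

A filter score like this evaluates each feature independently of any classifier, which makes it cheap for thousands of candidates but blind to redundancy between features; the embedded random forest and HSIC lasso approaches named in the abstract address that trade-off differently.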

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Hyperspectral Imaging

dc.subject

HSI

dc.subject

Chemometrics

dc.title

Feature generation and selection in hyperspectral imaging

dc.title.alternative

Automatic feature generation and selection in hyperspectral imaging applications

dc.type

Thesis

dc.rights.license

In Copyright

dc.identifier.doi

10.34726/hss.2018.54200

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Wolfgang Ganglberger

dc.publisher.place

Vienna

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E164 - Institut für Chemische Technologien und Analytik

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC15055796

dc.description.numberOfPages

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-109887

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

tuw.advisor.staffStatus

staff

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.languageiso639-1

item.openaccessfulltext

Open Access

item.openairetype

master thesis

item.grantfulltext

open

crisitem.author.dept

E164 - Institut für Chemische Technologien und Analytik

crisitem.author.parentorg

E150 - Fakultät für Technische Chemie

Appears in Collections:

Thesis

Fulltext (Version of Record (published version)): Adobe PDF (18.43 MB), In Copyright
