<div class="csl-bib-body">
<div class="csl-entry">Hofstätter, S. (2018). <i>Adaptierung von Word Embeddings für domänenspezifisches Information Retrieval</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.50325</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2018.50325
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/1842
-
dc.description.abstract
Search engines rank documents based on their relevance to a given query; relying only on exact word matches can miss relevant results. Expanding a document retrieval query with similar words obtained from a word embedding offers great potential for better query results. Expanding the search space makes it possible to retrieve relevant documents even if they do not contain the actual query terms. An additional word improves the query results only if it is relevant to the topic of the search. As observed in previous studies, an essential problem in using an out-of-the-box word embedding for document retrieval is that some of the added similar words have a negative impact on retrieval performance. We create word-embedding-based similarity models, which are used to expand query words in domain-specific Information Retrieval. To this end, we adapt an existing word embedding with additional information gained from different contexts: we incorporate this information into a Skip-gram word embedding with Retrofitting. We experiment with different external resources: Latent Semantic Indexing and semantic lexicons. We also study various techniques to combine two different external resources. We first analyze changes in the local neighborhoods of query terms and global differences between the original and retrofitted vector spaces. We then evaluate the effect of the changed word embeddings on domain-specific retrieval test collections. In conclusion, we show that in two out of three test collections, incorporating external resources significantly improves the results over using an out-of-the-box word embedding.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Informationsrückgewinnung
de
dc.subject
Word Embeddings
de
dc.subject
Word2Vec
de
dc.subject
Globaler Kontext
de
dc.subject
Verwandte Wörter
de
dc.subject
Information Retrieval
en
dc.subject
Word Embeddings
en
dc.subject
Word2Vec
en
dc.subject
Global context
en
dc.subject
Related terms
en
dc.title
Adaptierung von Word Embeddings für domänenspezifisches Information Retrieval
de
dc.title.alternative
Adapting Word embeddings for domain-specific information retrieval
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2018.50325
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Sebastian Hofstätter
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Rekabsaz, Navid
-
tuw.publication.orgunit
E188 - Institut für Softwaretechnik und Interaktive Systeme
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC15057909
-
dc.description.numberOfPages
68
-
dc.identifier.urn
urn:nbn:at:at-ubtuw:1-108442
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.assistant.staffStatus
staff
-
tuw.advisor.orcid
0000-0002-7149-5843
-
item.fulltext
with Fulltext
-
item.cerifentitytype
Publications
-
item.mimetype
application/pdf
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.languageiso639-1
en
-
item.openaccessfulltext
Open Access
-
item.openairetype
master thesis
-
item.grantfulltext
open
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering