<div class="csl-bib-body">
<div class="csl-entry">Rekabsaz, N. (2018). <i>Word representation for text analysis and search : Document retrieval, sentiment analysis, and cross lingual word sense disambiguation</i> [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.62352</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2018.62352
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/1909
-
dc.description
Abstract in German
-
dc.description.abstract
Semantics in language is a fundamental aspect of human cognition and to a great extent defines our understanding and knowledge. Word representation methods suggest a computational model to capture semantics by providing vectors as proxies to the meaning of terms, known as word embedding. Recent advancements of the models using neural network approaches open an exciting perspective and urge further research on understanding and making use of semantic representation models in language and text processing. In this thesis, we introduce novel methodologies to exploit word representation models in various text analysis tasks. We also provide in-depth analyses of the concept of term relatedness in semantic models. The thesis contributes to basic research in the area of Information Retrieval and word representation interpretability, as well as applied research in Cross-Lingual Word Sense Disambiguation (CL-WSD) and sentiment analysis. We cover several tasks in Information Management, such as document retrieval, gender bias detection, CL-WSD for languages with scarce resources, and volatility prediction, studied in the news, health, finance, and social science domains. In the first task, document retrieval, our evaluations on various retrieval test collections show significant improvements in search performance by using the generalized translation models in comparison to strong, state-of-the-art baselines. The next topic approaches the interpretability of word embedding by introducing a novel neural-based representation model. The model transfers dense word embedding to sparse vectors where the semantic concepts of the representations are explicitly specified. As a case study, we use these explicit representations to quantify the degree of gender bias in Wikipedia articles. Our analysis shows a strong bias toward female in a few specific occupations (e.g., nurse). The next task regards CL-WSD for low-resource languages/domains (English to Persian in our work).
We approach this task using the semantic similarity of the translation terms in their contexts, showing the benefits of exploiting word representation for CL-WSD, especially in the absence of reliable resources. Finally, we contribute to the state of the art in sentiment analysis by exploiting the generalized translation models to predict volatility in financial markets. Our approach, when combined with factual market data, outperforms state-of-the-art methods and shows the advantages of using textual data together with semantic methods for volatility forecasting.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
text processing
en
dc.subject
word representation
en
dc.subject
word embedding
en
dc.subject
information retrieval
en
dc.subject
search engines
en
dc.subject
sentiment analysis
en
dc.subject
financial reports
en
dc.subject
cross lingual word sense disambiguation
en
dc.subject
gender bias quantification
en
dc.title
Word representation for text analysis and search : Document retrieval, sentiment analysis, and cross lingual word sense disambiguation
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2018.62352
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Navid Rekabsaz
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering
-
dc.type.qualificationlevel
Doctoral
-
dc.identifier.libraryid
AC15271641
-
dc.description.numberOfPages
128
-
dc.identifier.urn
urn:nbn:at:at-ubtuw:1-120193
-
dc.thesistype
Dissertation
de
dc.thesistype
Dissertation
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.advisor.orcid
0000-0002-7149-5843
-
item.fulltext
with Fulltext
-
item.cerifentitytype
Publications
-
item.mimetype
application/pdf
-
item.openairecristype
http://purl.org/coar/resource_type/c_db06
-
item.languageiso639-1
en
-
item.openaccessfulltext
Open Access
-
item.openairetype
doctoral thesis
-
item.grantfulltext
open
-
crisitem.author.dept
E194-01 - Forschungsbereich Information und Software Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering