<div class="csl-bib-body">
<div class="csl-entry">Bösch, H. (2017). <i>Reproducible ranking lists for retrieval from evolving document collections : how column-store technology enhances the capability of inverted indicees</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.24819</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2017.24819
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/5118
-
dc.description
Title deviates from the original, as translated by the author
-
dc.description.abstract
The core structure of (probabilistic) information retrieval (IR) systems lacks the ability to make retrieval result rankings reproducible. When the underlying data changes, IR indices change over time, and the history of tf-idf values in particular is hard to preserve. Thus, the same query might produce different results if the collection has been updated in the meantime. Little research addresses reproducibility in IR, although it would be desirable in fields such as research or patent applications. The first step in this direction is to make subsets of documents in a dynamically evolving data environment unambiguously identifiable. This can be achieved with structured data and a data schema suitable for scalable data citation (cf. section 3). This approach suggests maintaining a history of evolving data by tagging data records with timestamps and keeping a version history for each update to the collection. Conventional row-stores cannot cope with the large data volumes and statistics aggregations required for IR applications. The column-store architecture, however, is designed for analytical workloads and has already been proposed for IR prototyping (cf. section 4.2), an approach for building retrieval indices on top of an RDBMS. This thesis combines the concepts of IR prototyping and data citation in order to enhance retrieval indices so that they achieve reproducibility. It addresses the questions of how database schemas have to be shaped and whether these models are efficient enough to meet today's requirements for retrieval systems. The results are promising.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Informationssuche
de
dc.subject
Textsuche
de
dc.subject
Reproduzierbare Ergebnisliste von Dokumenten
de
dc.subject
Spaltenbasierte Datenbanken
de
dc.subject
Inverted Index
de
dc.subject
Retrieval Index Performance
de
dc.subject
MonetDB
de
dc.subject
Apache Lucene
de
dc.subject
Zitieren von Daten
de
dc.subject
BM25
de
dc.subject
Wikipedia
de
dc.subject
Dokument-Zerteilung
de
dc.subject
IR Prototyping
de
dc.subject
Information Retrieval
en
dc.subject
Text Retrieval
en
dc.subject
Reproducible Retrieval Ranked Lists
en
dc.subject
Column-Store Database
en
dc.subject
Inverted Index
en
dc.subject
Retrieval Index Performance
en
dc.subject
MonetDB
en
dc.subject
Apache Lucene
en
dc.subject
Data Citation
en
dc.subject
BM25
en
dc.subject
Wikipedia
en
dc.subject
Document Parsing
en
dc.subject
IR Prototyping
en
dc.title
Reproducible ranking lists for retrieval from evolving document collections : how column-store technology enhances the capability of inverted indicees
en
dc.title.alternative
Reproduzierbarkeit von Rankings in einem dynamischen Dokumentkorpus
de
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2017.24819
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Hannes Bösch
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E188 - Institut für Softwaretechnik und Interaktive Systeme