Intelligent video annotation and retrieval techniques

Sorschag, Robert

Record link:

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-50967
http://hdl.handle.net/20.500.12708/12947

Title:

Intelligent video annotation and retrieval techniques

Citation:

Sorschag, R. (2012). Intelligent video annotation and retrieval techniques [Dissertation, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-50967

CatalogPlus:

AC07814297

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Sorschag, Robert

Advisor:

Eidenberger, Horst

Co-advisor:

Scherp, Ansgar

Organisational Unit:

E188 - Institut für Softwaretechnik und Interaktive Systeme

Date (published):

2012

Number of Pages:

146

Keywords:

video analysis; object recognition; visual image features; automatic tagging; video search engines

content-based video analysis; object recognition; visual features; automatic annotation; video search engines

Abstract:

Videos are an essential part of modern information systems and the web. Since the first video portals were introduced in the middle of the last decade, the number of available videos has grown steadily, and with it the need for more efficient video search. Current search systems rely mainly on manually generated metadata, which has the drawback of often describing video content only roughly and imprecisely. Video annotation systems built on content-based analysis are therefore intended to remedy this and bring video search to a level similar to what users expect today from online search for text documents and web pages. This dissertation addresses the use of automatic object recognition for annotating people, objects, and places. Once annotated, video scenes showing these objects can be found with Google-like search queries. A sophisticated presentation of the retrieved video scenes makes the relevance of individual search results immediately visible. The presented annotation techniques are based on a new object recognition framework that can be integrated into a wide variety of video environments and enables object recognition with flexible use of visual features, matching algorithms, and machine learning techniques. New methods can be integrated into this framework with little development effort. This allows new developments to be adopted quickly and can therefore make an important contribution to future research in particular. The framework also offers automatic configuration selection, which makes it possible to use different algorithms for annotating different objects. Support for distributed computer systems and compact storage of the generated data additionally ensure high efficiency.
In the course of the project, a video annotation prototype was developed with the presented techniques in order to carry out a comprehensive case study. This prototype is publicly available, as are the video data used and some of the resulting publications. Further scientific contributions of the dissertation cover motion-based and segmentation-based features, which are suited to specific applications such as automatic action scene detection. In addition, between 2010 and 2012 we took part in TRECVID, the largest international competition for content-based video search, and achieved promising results.

Videos are an integral part of current information technologies and the web. The demand for efficient retrieval rises with the increasing number of videos, and thus better annotation tools are needed, as today's retrieval systems mainly rely on manually generated metadata. The situation is even more critical for user-generated videos, where rough and inaccurate annotations are common practice. Attempts to employ content-based analysis for video annotation and retrieval already exist, but they are still in their infancy compared to the retrieval of web documents. In this work, we address the use of object recognition techniques to annotate what is shown where in videos. These annotations are suitable for retrieving specific video scenes in response to object-related text queries, though generating such metadata manually would be impractical and expensive. In addition, a sophisticated presentation of the retrieval results indicates the relevance of the retrieved scenes at first glance. The presented semi-automatic annotation approach is easy and convenient to use, and it builds on a novel framework with the following key features. First, it can be easily integrated into existing video environments. Second, it is not based on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching techniques, and machine learning techniques. New recognition approaches can be integrated into this infrastructure at low development cost, and the recognition approaches in use can be configured even on a running system. Thus, this framework might also benefit from future advances in computer vision. Third, we present an automatic selection approach to support the use of different recognition strategies for the annotation of different objects.
Moreover, visual analysis can be performed efficiently in distributed, multi-processor environments, and the resulting video annotations and low-level features can be stored in a compact form. We demonstrate the proposed annotation approach in an extensive case study with promising results. A video object annotation prototype and the generated scene classification ground truth are freely available to foster reproducible research. Additional contributions of this work cover the generation of motion-based and segmentation-based features and their use for specific annotation tasks, such as the detection of action scenes in professional and user-generated video. Furthermore, we participated in two tasks of the TRECVID challenge, instance search and semantic indexing, in three consecutive years: 2010, 2011, and 2012.

Additional information:

Includes a summary in German

License:

In Copyright

Appears in Collections:

Thesis