Bibliographic Metadata

Transformation and interpolation of language varieties for speech synthesis / by Dipl.-Ing. Markus Toman, BSc
Additional Titles
Akustische Modellierung, Transformation und Interpolation von Sprachvarietäten für Sprachsynthese (Acoustic modeling, transformation and interpolation of language varieties for speech synthesis)
Author: Toman, Markus
Censor: Rauber, Andreas
Published: Vienna, 11 January 2016
Description: x, 124 pages : illustrations, diagrams
Institutional Note: Technische Universität Wien, Dissertation, 2016
Summary in German
Variant title according to the author's translation
Bibl. Reference: OeBB
Document type: Dissertation (PhD)
Keywords (EN): Speech Processing / Speech Synthesis / Hidden Markov Model / Language Varieties / Dialects / Voice Conversion
URN: urn:nbn:at:at-ubtuw:1-1503 (Persistent Identifier)
The work is publicly available.
Transformation and interpolation of language varieties for speech synthesis [9.38 MB]
Abstract (English)

This thesis aims to advance the field of speech synthesis by investigating and developing new concepts for acoustic modeling, transformation and interpolation of language varieties (i.e., dialects, sociolects and foreign accents). The goal is to enable systems with speech output to adapt to the individual needs and preferences of their users. Transformation of language varieties aims to convert a voice model from one variety into a model of another variety while retaining the voice characteristics. Interpolation between multiple voice models of different varieties makes it possible to generate intermediate varieties. Both approaches are used to widen the range of speaking styles available to speech output systems. Further, this thesis investigates two specific applications: foreign accent reduction and the generation of intelligible fast speech for visually impaired users. All presented methods are evaluated through listening tests and, where appropriate, objective measures. To conduct these experiments, phone sets and recording scripts were created for three Austrian German dialects, and speech corpora from selected native dialect speakers were recorded in studio quality. We present a method for unsupervised dialect interpolation and show that listeners are able to correctly perceive the changes in the degree of dialect for different settings of the interpolation parameter. We show that, with the methods presented here, dialects can be transformed while retaining the original speaker characteristics. We also compare different approaches for generating fast synthetic speech. Our experiments show that linearly compressed natural speech signals are more intelligible than fast speech naturally produced by our professional speakers. Overall, this thesis shows how adaptive modeling can be applied to control and modify the language variety of a speech synthesis system.
