Bibliographic Metadata

Ausfallsicherheitsmechanismen in Datenstromverarbeitungssystemen / by Bernhard Knasmüller
Additional title: On fault tolerance in stream processing systems
Author: Knasmüller, Bernhard
Reviewers: Hochreiner, Christoph; Schulte, Stefan
Published: Wien, 2018
Description: xiii, 120 pages
Institutional note: Technische Universität Wien, diploma thesis, 2018
Summary in German
Text in English
Document type: Thesis (Diplom)
Keywords (DE): Verteilte Systeme / Datenstromverarbeitung / Fehlertoleranz / Circuit Breaker / DSPE / Operator / Funktionale Redundanz
Keywords (EN): Distributed Systems / Stream Processing / Fault Tolerance / Circuit Breaker / DSPE / Operator / Functional Redundancy
URN: urn:nbn:at:at-ubtuw:1-110172 (persistent identifier)
The work is publicly available.
Full text: Ausfallsicherheitsmechanismen in Datenstromverarbeitungssystemen (PDF, 2.73 MB)
Abstract (English)

Stream processing is a practice in which continuous data streams are processed and aggregated in near real-time, ultimately leading to the discovery of new information. Stream processing applications (SPAs) are used to analyse data streams and are often deployed in a distributed manner for performance reasons. When partial failures or network communication outages occur, fault tolerance mechanisms must ensure continuous operation. Due to the near-real-time requirements, these mechanisms have to balance the need for consistency (i.e., producing correct results) and availability (i.e., producing results fast enough) in case of failures, since fulfilling both at the same time is impossible. The key concept of fault tolerance is redundancy. Existing fault tolerance approaches for SPAs implement redundancy by replicating operators, the building blocks of an SPA. We argue that this approach is not sufficient and present a novel fault tolerance model that focuses on functional redundancy at the level of paths (sequences of operators). Based on a concrete motivational scenario, we identify the requirements for Pathfinder, our new fault tolerance framework, and later use the same scenario to evaluate it. Pathfinder addresses the shortcomings of existing approaches by allowing SPA developers to specify functional redundancy. At runtime, Pathfinder reacts to faults by switching to a fault-free path with similar functionality. To restore the main path once the failed operator has recovered, Pathfinder uses the circuit breaker pattern, which has proven itself in the domain of microservices. By comparing our approach to fully redundant replication, we show that 30% of total operational costs can be saved while achieving a similar level of availability. Finally, several experiments show that Pathfinder's failure detection and fault tolerance mechanisms work as expected and add only a minimal performance overhead.
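The runtime behaviour summarised in the abstract (switch to an alternative path on failure, then use a circuit breaker to return to the main path once it has recovered) can be illustrated with a minimal Python sketch. It shows the generic three-state circuit breaker pattern (closed, open, half-open) applied to path selection. The class `PathCircuitBreaker`, the parameters `failure_threshold` and `reset_timeout`, and the injectable `clock` are hypothetical names chosen for illustration; they are not taken from Pathfinder's actual implementation or API. Paths are modelled as plain callables, and any exception raised by the main path is treated as a failure, which is a simplifying assumption.

```python
import time
from enum import Enum
from typing import Any, Callable, Optional


class State(Enum):
    CLOSED = "closed"        # main path healthy: items flow through it
    OPEN = "open"            # main path considered failed: use the alternative path
    HALF_OPEN = "half_open"  # timeout elapsed: try the main path once more


class PathCircuitBreaker:
    """Hypothetical sketch (not Pathfinder's API): route each item to a main
    path and fall back to a functionally similar alternative path while the
    main path is failing."""

    def __init__(self, main: Callable[[Any], Any], alternative: Callable[[Any], Any],
                 failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.main = main
        self.alternative = alternative
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at: Optional[float] = None

    def process(self, item: Any) -> Any:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.reset_timeout:
                # Allow a single trial item on the main path.
                self.state = State.HALF_OPEN
            else:
                # Circuit still open: skip the main path entirely.
                return self.alternative(item)
        try:
            result = self.main(item)
        except Exception:
            # Main path failed for this item: count it and use the fallback.
            self._record_failure()
            return self.alternative(item)
        # A successful call on the main path closes the circuit again.
        self.state = State.CLOSED
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial in HALF_OPEN, or too many failures, (re)opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Passing a fake `clock` makes the reset timeout deterministic, for example in tests. A real SPA would additionally need a dedicated failure detector for its operators, which this sketch does not model.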

The PDF-Document has been downloaded 27 times.