Bibliographic Metadata

Title
Distributed big data frameworks / by Moritz Becker
Additional Titles
Verteilte Big Data Frameworks: Eine Übersicht
Author
Becker, Moritz
Censor
Pichler, Reinhard
Published
Wien, 2017
Description
IX, 143 leaves : illustrations
Institutional Note
Technische Universität Wien, Diploma thesis, 2017
Annotation
Variant title as translated by the author
Language
English
Document type
Thesis (Diplom)
Keywords (DE)
verteilte Verarbeitungsmethoden / Big Data
Keywords (EN)
distributed processing methods / Big Data
URN
urn:nbn:at:at-ubtuw:1-103714
Restriction-Information
The work is publicly available
Files
Distributed big data frameworks [1.49 mb]
Abstract (English)

The ever-increasing amount of data produced by the modern internet society poses challenges to corporations and information systems that need to store and process it. Emerging trends such as the Internet of Things point to an even steeper growth in data volume than in the past, which underlines the relevance of big data. To bridge the gap between storage capacity and data access speed while keeping data processing economically feasible, the industry has created frameworks that scale data processing horizontally across large clusters of commodity hardware. The plethora of technologies developed since then makes it increasingly hard to enter the field of big data processing. This thesis therefore identifies the major types of big data processing along with the programming models designed to cover them. For each processing type, it gives an introductory overview of the most important open source frameworks and technologies, with practical examples of how they can be used. The thesis concludes by pointing out important extension projects for the presented base systems and by proposing a performance-centric comparison of Apache Spark and Apache Hadoop, which could help establish a deeper understanding of the nature of these systems and identify potential new research topics.
