Bibliographic Metadata

Hybrid simulation models for data-intensive systems / by Martin-Stefan Barisits
Author: Barisits, Martin-Stefan
Censor: Kühn, Eva
Published: Wien, 2017
Description: 212 pages : illustrations, diagrams
Institutional Note: Technische Universität Wien, Dissertation, 2017
Summary in German
Bibl. Reference: OeBB
Document type: Dissertation (PhD)
Keywords (EN): Data-intensive systems / Simulation / System modelling / Machine learning
Keywords (GND): CERN / ATLAS <Teilchendetektor> / Datenverarbeitung / Hybrides System / Modellierung / Simulation
URN: urn:nbn:at:at-ubtuw:1-97036
 The work is publicly available
Hybrid simulation models for data-intensive systems [7.67 mb]
Abstract (English)

Data-intensive systems are used to access and store massive amounts of data by combining the storage resources of multiple data centers, usually deployed all over the world, in one system. This enables users to utilize these massive storage capabilities in a simple and efficient way. However, as these systems grow, it becomes hard to estimate the effects of modifications to the system, such as data placement algorithms or hardware upgrades, and to validate these changes for potential side effects. This thesis addresses the modeling of operational data-intensive systems and presents a novel simulation model which estimates the performance of system operations. The running example used throughout this thesis is the data-intensive system Rucio, which is used as the data management system of the ATLAS experiment at CERN's Large Hadron Collider.
Existing system models in literature are not applicable to data-intensive workflows, as they only consider computational workflows or make assumptions which do not hold for operational systems. A hybrid modeling approach is proposed which addresses the limits of these models. It partitions the system into discrete components, creates models for these components, and combines them into one concise system model. However, each component model is only built on observed data metrics, such as system traces. The identification of which system components to model and which ones to omit is based on a quantitative system analysis of the Rucio data-intensive system. The storage, network, data integrity validation, and services components were identified. An existing model from literature was utilized for the network component. For the other components, models based on machine learning techniques are created and evaluated against historic workloads from the running example.
The component models are unified in an event simulator and evaluated against historic workloads from the Rucio data-intensive system. The hybrid system model is shown to have a median relative evaluation error of 22%.
