Hybrid simulation models for data-intensive systems / by Martin-Stefan Barisits
Additional titles
Hybrid simulation models for data-intensive systems
Author: Barisits, Martin-Stefan
Reviewer: Kühn, Eva
Published: Wien, 2017
Extent: 212 pages
Thesis: Technische Universität Wien, Dissertation, 2017
Thesis not yet received by the library - data not verified
Title differs, as translated by the author
Keywords (EN): Data-intensive systems / Simulation / System modelling / Machine learning
URN: urn:nbn:at:at-ubtuw:1-97036 (persistent identifier)
The work is freely available
Hybrid simulation models for data-intensive systems [7.67 mb]
Abstract (English)

Data-intensive systems are used to access and store massive amounts of data by combining the storage resources of multiple data centers, usually deployed all over the world, in one system. This enables users to utilize these massive storage capabilities in a simple and efficient way. However, as these systems grow, it becomes hard to estimate the effects of modifications to the system, such as data placement algorithms or hardware upgrades, and to validate these changes for potential side effects. This thesis addresses the modeling of operational data-intensive systems and presents a novel simulation model which estimates the performance of system operations. The running example used throughout this thesis is the data-intensive system Rucio, which is used as the data management system of the ATLAS experiment at CERN's Large Hadron Collider. Existing system models in the literature are not applicable to data-intensive workflows, as they only consider computational workflows or make assumptions which do not hold for operational systems. A hybrid modeling approach is proposed which addresses the limits of these models. It partitions the system into discrete components, creates models for these components, and combines them into one concise system model. Each component model is built only on observed data metrics, such as system traces. The decision on which system components to model and which to omit is based on a quantitative system analysis of the Rucio data-intensive system. The storage, network, data integrity validation, and services components were identified. An existing model from the literature was utilized for the network component. For the other components, models based on machine learning techniques are created and evaluated against historic workloads from the running example. The component models are unified in an event simulator and evaluated against historic workloads from the Rucio data-intensive system. The hybrid system model achieves a median relative evaluation error of 22%.
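The abstract does not define how the median relative evaluation error is computed. As a minimal sketch, assuming it is the median of |simulated - observed| / |observed| over paired measurements, it could be calculated as follows. The function name `median_relative_error` and the sample values are hypothetical and are not taken from the thesis.

```python
from statistics import median


def median_relative_error(simulated, observed):
    """Median of |simulated - observed| / |observed| over paired samples.

    Assumed definition for illustration; the thesis may define it differently.
    Pairs whose observed value is zero are skipped to avoid division by zero.
    """
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    errors = [abs(s - o) / abs(o) for s, o in zip(simulated, observed) if o != 0]
    if not errors:
        raise ValueError("no pairs with a non-zero observed value")
    return median(errors)


# Hypothetical operation durations in seconds (made-up values, not thesis data).
observed = [100.0, 200.0, 50.0, 80.0]
simulated = [120.0, 150.0, 55.0, 80.0]
example_error = median_relative_error(simulated, observed)
```

Under this assumed definition, a value of 0.22 would mean that half of the simulated operations deviate from their observed counterparts by at most 22%.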