Autonom skalierendes, verteiltes Simulationsframework für TCAD-Anwendungen

Demel, Harald

doi:10.34726/hss.2016.23969

Record link:

https://doi.org/10.34726/hss.2016.23969
http://hdl.handle.net/20.500.12708/5907

Title:

Autonom skalierendes, verteiltes Simulationsframework für TCAD-Anwendungen

Citation:

Demel, H. (2016). Autonom skalierendes, verteiltes Simulationsframework für TCAD-Anwendungen [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2016.23969

reposiTUm DOI:

10.34726/hss.2016.23969

CatalogPlus:

AC13285288

Publication Type:

Thesis - Diploma Thesis

Language:

German

Authors:

Demel, Harald

Advisor:

Göschka, Karl Michael

Co-advisor:

Karner, Markus
Stanojević, Zlatan

Organisational Unit:

E184 - Institut für Informationssysteme

Date (published):

2016

Number of Pages:

Keywords:

Simulation; Scientific Computing; Cloud Computing; GRID Computing

Abstract:

Batch queuing systems are frequently used for computations that consist of a large number of similar, mutually independent tasks. Executing tasks in parallel reduces the total runtime, but only within the limits of the available hardware. Minimizing runtime matters most when working towards deadlines, yet permanently increasing computing capacity to cover load peaks is not economical. In this work, a batch queuing system is extended with autonomous scaling capabilities by developing one interface to a grid computing system and one to an IaaS cloud. The requirements for security and monitoring are analyzed, and the resulting performance overhead is examined and compared with related work. Because of the hourly billing model of most IaaS cloud providers and the boot time of the compute nodes, executing all tasks of a job in parallel is not cost-efficient when task runtimes are significantly below one hour. An algorithm is therefore needed that minimizes the time until job completion while keeping the cost overhead minimal. An ideal solution requires a priori knowledge of the runtime of each task; in practice, however, task runtimes are not known beforehand. Several heuristics that address this problem are developed and compared. The autonomous scaling capabilities can also be used in a hybrid setup, in which the grid computing system scales out to the IaaS cloud when too few resources are available on the grid. Using computations from the nano-TCAD domain, runtime and cost are compared for manually started compute nodes, the grid computing system, the IaaS cloud, and the hybrid system. These benchmarks show a significant reduction of the total runtime with the cloud-based system, even when the cost-minimizing heuristic is used. The hybrid system delivers results faster than a pure cloud or pure grid system and costs less than the pure cloud system. These benefits only materialize with small input files, however; otherwise the file transmission time outweighs them.

This work demonstrates that an autonomously scaling batch queuing system can deliver results faster through short-term use of additional resources while keeping cost minimal. The presented hybrid system can reduce both total runtime and cost compared with a pure cloud system.
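
The trade-off between hourly billing, node boot time, and parallelism described in the abstract can be illustrated with a minimal sketch. It is not the heuristic developed in the thesis: it assumes identical task runtimes known in advance (which the abstract notes is not the case in practice), identical nodes, and billing per started hour. The function names `plan` and `cheapest_fastest` and all parameters are hypothetical.

```python
import math

BILLING_PERIOD_MIN = 60  # assumption: each node is billed per started hour


def plan(num_tasks, task_min, boot_min, nodes):
    """Return (makespan_min, billed_node_hours) when the tasks are spread
    as evenly as possible over `nodes` identical nodes."""
    base, extra = divmod(num_tasks, nodes)
    makespan = 0
    billed = 0
    for i in range(nodes):
        tasks = base + (1 if i < extra else 0)
        if tasks == 0:
            continue  # a node without work is never started
        busy = boot_min + tasks * task_min  # boot once, then run tasks in sequence
        billed += math.ceil(busy / BILLING_PERIOD_MIN)
        makespan = max(makespan, busy)
    return makespan, billed


def cheapest_fastest(num_tasks, task_min, boot_min, max_nodes):
    """Among all node counts up to `max_nodes`, pick the one with the fewest
    billed node-hours; break ties by the shortest makespan.
    Returns (nodes, makespan_min, billed_node_hours)."""
    best_key, best_nodes = None, None
    for k in range(1, max_nodes + 1):
        makespan, billed = plan(num_tasks, task_min, boot_min, k)
        key = (billed, makespan)
        if best_key is None or key < best_key:
            best_key, best_nodes = key, k
    billed, makespan = best_key
    return best_nodes, makespan, billed


if __name__ == "__main__":
    # 110 tasks of 5 minutes each, 5 minutes boot time per node
    print(plan(110, 5, 5, 110))              # one node per task
    print(cheapest_fastest(110, 5, 5, 110))  # cost-minimal node count
```

With these illustrative numbers, starting one node per task finishes in 10 minutes but bills 110 node-hours, whereas 10 nodes finish in 60 minutes for 10 node-hours. This is the effect the abstract attributes to task runtimes well below the billing period.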

Additional information:

Variant title as translated by the author
Abstract in English

License:

In Copyright

Appears in Collections:

Thesis