Behavior based malware analysis and detection

Kolbitsch, Clemens

Record link:

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-40527
http://hdl.handle.net/20.500.12708/9417

Title:

Behavior based malware analysis and detection

Citation:

Kolbitsch, C. (2011). Behavior based malware analysis and detection [Dissertation, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-40527

CatalogPlus:

AC07811160

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Kolbitsch, Clemens

Advisor:

Krügel, Christopher

Co-advisor:

Kirda, Engin

Organisational Unit:

E183 - Institut für Rechnergestützte Automation

Date (published):

2011

Number of Pages:

108

Keywords:

Schadsoftware; Programm Analyse; Botnet; Virenerkennung; Verhaltensmuster; Code Extrahierung

Malware; Program analysis; Botnet; Virus detection; Behavioral detection; Code extraction

Abstract:

Eine der größten Bedrohungen für Internetbenutzer stellt heutzutage unerwünschte Software, auch Schadsoftware oder Malware (aus dem Englischen malicious software, bösartige Programme) genannt, dar. Die zugrundeliegende Ursache einer Großzahl der alltäglichen Probleme, wiezum Beispiel unerwünschte Spam Emails oder Denial-of-Service Angriffe, bei denen die Erreichbarkeit von Internetseiten oder Servern beeinträchtigt wird, ist ebendiese Schadsoftware. Infizierte Computer schließen sich zu so genannten Botnetzen, einer Gruppe ferngesteuerter Systeme, zusammen, mit denen dessen Besitzer, der Botmaster, die Möglichkeit besitzt beliebige Ziele anzugreifen. Da sich diese Attacken über die letzten Jahre mehr und mehr gehäuft haben, beschäftigt sich eine Vielzahl an Forschungsgruppen mit dem Erkennen solcher Programme. Die traditionelle Herangehensweise von Anti-Viren Programmen zeigt hierbei jedoch die fundamentale Schwäche, dass sich deren Erkennungsmuster auf einzelne Instanzen beziehen und somit inhärent anfällig für Verschlüsselung oder Polymorphismus sind. Neuartige Systeme versuchen im Gegensatz dazu Sequenzen von Systemaufrufen zu erkennen. Dabei sind diese jedoch häufig ebenso leicht umgehbar wie ihre Vorgängersysteme, da Autoren von Schadsoftware die Abfolge dieser Aufrufe in ihren Programmen recht einfach abändern können. Um diese Nachteile zu vermeiden versuchen neuartige so genannte dynamische Anti-Viren Programme eine verhaltensorientierte Herangehensweise bei der Erkennung einzusetzen. Die Erkennungsrate solcher Systeme ist vielversprechend, allerdings erweisen sich diese Programme in der Praxis oft als zu langsam und benötigen zusätzlich komplizierte Virtualisierungstechnologien. In dieser Arbeit konzentrieren wir uns anfangs darauf, eine effektive und gleichzeitig effiziente Erkennungstechnologie zu entwickeln. Diese soll ohne spezielle Virtualisierungstechnologie funktionieren und in Zukunft traditionelle Anti-Viren Software vervollständigen. Dabei wird das Verhalten der Schadsoftware zuerst in einer kontrollierten Analyseumgebung präzise beobachtet um Datenflüsse zwischen individuellen Systemaufrufen zu identifizieren. Unser System extrahiert anschließend Slices aus diesen Datenflüssen, die spezifisches Verhalten der Schadsoftware darstellen. Auf Anwendersystemen kann ein speziell dafür entwickelter Scanner das Verhalten eines unbekannten Programmes mit in Slices gespeicherten Verhalten vergleichen und somit eine mögliche Infektion des Systems erkennen. Unsere Experimente zeigen, dass dieser Ansatz in der Praxis gute Ergebnisse liefert ohne dabei unnötig viel Rechenleistung zu beanspruchen. Eine weitere Hauptkomponente im Kampf gegen bösartige Software stellt die präzise Analyse dieser Programme dar. Nur indem Analysten das Verhalten solcher Systeme vollständig verstehen, können wirksame Gegenmaßnahmen getroffen werden. Heutzutage werden die wichtigsten Algorithmen, wie Beispielsweise verschlüsselte Updateverfahren oder das Erzeugen von DNS Domänen zum Auffinden von Command&Control Servern, oft manuell analysiert beziehungsweise rekonstruiert. Der zweite Teil dieser Arbeit beschäftigt sich damit solche Algorithmen automatisch aus Binärprogrammen zu extrahieren. Ein ausgewähltes Verhalten wird dabei in so genannte Gadgets extrahiert, die sämtliche dafür benötigten Programmteile enthalten und unabhängig vom ursprünglichen Programm ausgeführt werden können. Gadgets sind somit nützliche Bestandteile einer jeden Analyse von Schadsoftware in der Praxis, da durch deren Verwendung aufwändige manuelle Arbeitsschritte vermieden werden können. Experimente mit realer Schadsoftware zeigen, dass unser Ansatz vielseitige und praktische Anwendungsmöglichkeiten bietet. Sowohl unsere Technik zum Erkennen von Schadsoftware als auch das Extrahieren von Gadgets beruhen auf dynamischer Analyse. Die Vergangenheit hat jedoch gezeigt, dass auf neue Technologien häufig schnelle Reaktionen von Entwicklern dieser Schadsoftware erfolgen, weshalb bereits jetzt Methoden zur Umgehung der dynamischen Analyse in Malware Verwendung finden. Aus diesem Grund beschäftigt sich der abschließende Teil dieser Arbeit mit solchen Umgehungsmethoden. Ein Beispiel dafür ist stalling code, der typischerweise vor dem eigentlichen bösartigen Verhalten ausgeführt wird und dieses eine gewisse Zeit verzögert. Da die Analyse eines Programmes in der Regel nach kurzer Zeit abgebrochen wird, ermöglicht dieses Aufschieben, dass weder Erkennungsmuster noch Gadgets extrahiert werden können. Um dies zu verhindern stellt diese Arbeit einen neuartigen Ansatz zum Erkennen und Umgehen von stalling code dar indem blockierende Coderegionen ignoriert oder übersprungen werden. Unser Prototyp HASTEN erweitert das Analysesystem ANUBIS und zeigte praxistaugliche Ergebnisse in einem experimentellen Setup mit neuartiger Schadsoftware.

Malware is one of the most serious security threats on the Internet today. In fact, most Internet problems such as spam emails and denial of service attacks have malware as their underlying cause. That is, computers that are infected with malware are often networked together to form botnets, and many attacks are launched using these malicious, attacker-controlled networks. With the increasing significance of malware in Internet attacks, much research has concentrated on developing techniques to mitigate malicious code. Unfortunately, current host-based detection approaches (i.e., anti-virus software) suffer from ineffective detection models. These models concentrate on the features of a specific malware instance, and are often easily evadable by obfuscation or polymorphism. Also, detectors that check for the presence of a sequence of system calls exhibited by a malware instance can be evaded by system call reordering. In order to address the shortcomings of ineffective models, several dynamic detection approaches have been proposed that aim to identify the behavior exhibited by a malware family. Although promising, these approaches are unfortunately too slow to be used as real-time detectors on the end host, and they often require cumbersome virtual machine technology. In a first part of this thesis, we propose a novel malware detection approach that is both effective and efficient, and thus, can be used to replace or complement traditional anti-virus software at the end host. Our approach first analyzes a malware program in a controlled environment to build a model that characterizes its behavior. Such models describe the information flows between the system calls essential to the malware's mission, and therefore, cannot be easily evaded by simple obfuscation or polymorphic techniques. Then, we extract the program slices responsible for such information flows. For detection, we execute these slices to match our models against the runtime behavior of an unknown program. Our experiments show that our approach can effectively detect running malicious code on an end user's host with a small overhead. Another important component in the fight against malicious software is the analysis of malware samples: Only if an analyst understands the behavior of a given sample, she can design appropriate countermeasures. Manual approaches are frequently used to analyze certain key algorithms, such as downloading of encoded updates, or generating new DNS domains for command and control purposes. In a second part, we present a novel approach to automatically extract, from a given binary executable, the algorithm related to a certain activity of the sample. We isolate and extract these instructions and generate a so-called gadget, i.e., a stand-alone component that encapsulates a specific behavior. We make sure that a gadget can autonomously perform a specific task by including all relevant code and data into the gadget so that it can be executed in a self-contained fashion. Gadgets are useful entities in analyzing malicious software: In particular, they are valuable for practitioners, as understanding a certain activity that is embedded in a binary sample (e.g., the update function) is still largely a manual and complex task. Our evaluation with several real-world samples demonstrates that our approach is versatile and useful in practice. Both systems, our malware detection technique and HASTEN alike, heavily rely on dynamic analysis of a sample. However, the past has show that whenever an anti-malware solution becomes popular, malware authors promptly react and modify their programs to evade these defense mechanisms. For example, recently, malware authors have increasingly started to create malicious code that can evade dynamic analysis. Thus, in a last part, we concentrate on evasion techniques that target these analysis systems. One recent form of evasion is stalling code. Stalling code is typically executed before any malicious behavior. The attacker's aim is to delay the execution of such activity long enough so that an automated dynamic analysis system fails to extract the interesting behavior. This work presents the first approach to detect and mitigate malicious stalling code, and to ensure forward progress within the amount of time allocated for the analysis of a sample. We built a prototype implementation, HASTEN, for our dynamic analysis systems ANUBIS. Experimental results show that our system works well in practice, and that it is able to detect additional malicious behavior in real-world malware samples.

Additional information:

Zsfassung in dt. Sprache

License:

In Copyright

Appears in Collections:

Thesis