Bridging the Introspection Semantic Gap

Virtual Machine Introspection (VMI) is a powerful approach to intrusion detection, that is, detecting viruses and other malware attacking a system. Standard Intrusion Detection Systems (IDSs) take two forms: in-host monitoring and downstream traffic interpretation. In-host IDSs have direct access to OS state and can identify intrusions directly, but this leaves the IDS open to direct attack by the intruding malware.

Downstream solutions cannot be attacked directly by malware on the observed system, but they also lack direct access to OS state data, which makes intrusions more difficult to detect.

VMI leverages virtual machine architecture to build an IDS that has access to OS state without being located in-host. A VMI-based IDS can therefore detect intrusions directly without being susceptible to direct attack by the malware targeting the monitored host.

Abstract

In Virtual Machine Intrusion Detection (VMID), Introspection Intrusion Detection Systems (IIDSs) can maintain both high visibility into, and isolation from attacks on, the system being monitored. This is accomplished by using the hypervisor to view low-level virtual machine (VM) state. However, this low-level view does not directly provide high-level semantic knowledge of the guest-OS. The gap between low-level guest-OS machine state and high-level semantic knowledge is known as the Introspection Semantic Gap (ISG).

Since VMI's conception in 2003[1], many methods have been proposed to bridge the ISG both efficiently and automatically. In the first section of this survey I explain the basics of VMI and the specific challenges of the ISG. In section two I give a formal description of the basic ways in which high-level semantic knowledge is obtained. In section three I describe a number of IIDSs. In section four I compare the IIDSs to each other and give some projections on the field's future, and in section five I finish with a brief conclusion on the various methods.

1. Introduction

In computer security, intrusion detection refers to identifying when potentially malicious actions are being performed by any part of a computer system. In traditional intrusion detection systems (IDSs), the IDS is located inside the computer system being monitored, or in some cases data packets are monitored and filtered at a router upstream from the system being monitored.

Both of these approaches, and VMI, share the same motivation: to monitor and analyze changes in the guest-OS during deployment[2]. However, both traditional approaches have significant flaws that VMI addresses. When located within the monitored system (in-OS), the IDS is vulnerable to direct attacks from the malware attacking the system. When located upstream in the network (network monitor), the IDS has poor visibility of the internal state of the monitored system. By locating the IDS outside the system and using the hypervisor to monitor system state, VMI maintains both the isolation of network monitors and the visibility of in-OS IDSs[1].

However, being located outside the monitored OS requires that VMI have some way of interpreting the system's state while it runs. This problem is classically addressed in Forensic Machine Analysis (FMA)[11], which attempts to determine high-level semantic state from machine-level hardware state. To use this approach, an IIDS must be able to access the low-level virtual hardware state of a guest-OS. This is possible through the hypervisor, which can view this state while the guest-OS is running[1].

This brings us to the problem of the ISG. Because the hypervisor only has access to machine-level state information, and because both the emulated hardware and the guest kernel differ between guest-OSs, any IIDS must be able to bridge the gap between machine-level state data and high-level semantic state[3]. There are many potential solutions to this problem, but to understand them we must first understand the semantic knowledge that is required.

2. Obtaining Semantic Knowledge

Bridging the ISG requires obtaining high-level semantic knowledge about a monitored guest-OS. In VMI, semantic knowledge is used to generate ‘views’ of the guest-OS's high-level state. View generation is possible through semantic knowledge of what certain bits at the machine state level mean. There are, however, different types of semantic knowledge, which can be classified into three primary patterns[2]:

  • Out-of-Band Delivery (OBD)
  • Derivation
  • In-Band Delivery (IBD)

The OBD pattern describes view generation that uses semantic knowledge specific to the guest software architecture, determined before VMI begins[2]. One example is compiling the guest-OS with debugging symbols[1]. This pattern can be described as using a priori knowledge of the guest-OS and its software architecture to generate views. However, it requires the IIDS to have a priori knowledge specific to the guest-OS software, which can take significant work to develop for each new OS[3].
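
As an illustration of the OBD pattern, the following Python sketch walks a guest's process list using only raw guest memory plus a profile of symbol addresses and structure offsets, the kind of information a tool extracts from debugging symbols. The profile values, the simplified 32-byte task layout, and the names `GuestMemory`, `list_processes`, and `build_demo_image` are all hypothetical; real kernels differ per version and build, which is exactly why OBD needs per-OS effort.

```python
import struct

# Hypothetical profile, standing in for what a tool would extract from the
# guest kernel's debugging symbols. The values are invented for this example.
PROFILE = {
    "init_task": 0xC0100000,   # virtual address of the first task structure
    "task.pid": 0x00,          # offset of the process ID field
    "task.next": 0x08,         # offset of the embedded list node's next pointer
    "task.comm": 0x10,         # offset of the process name field
    "comm_len": 16,
}
TASK_SIZE = 0x20               # size of the simplified task structure


class GuestMemory:
    """Raw guest memory as the hypervisor sees it (one flat region here)."""

    def __init__(self, base, data):
        self.base = base
        self.data = data

    def read(self, addr, size):
        off = addr - self.base
        return self.data[off:off + size]

    def u32(self, addr):
        return struct.unpack("<I", self.read(addr, 4))[0]


def list_processes(mem, prof):
    """Walk the circular task list and return (pid, name) pairs."""
    procs = []
    task = prof["init_task"]
    while True:
        pid = struct.unpack("<i", mem.read(task + prof["task.pid"], 4))[0]
        raw = mem.read(task + prof["task.comm"], prof["comm_len"])
        procs.append((pid, raw.split(b"\0", 1)[0].decode()))
        # The next pointer refers to the list node inside the next task, so
        # subtract the node's offset to recover the start of that structure.
        task = mem.u32(task + prof["task.next"]) - prof["task.next"]
        if task == prof["init_task"]:
            return procs


def build_demo_image():
    """Build a tiny fake guest image containing three linked tasks."""
    base = PROFILE["init_task"]
    tasks = [(0, "swapper"), (1, "init"), (412, "sshd")]
    data = bytearray(TASK_SIZE * len(tasks))
    for i, (pid, name) in enumerate(tasks):
        off = TASK_SIZE * i
        next_node = base + TASK_SIZE * ((i + 1) % len(tasks)) + PROFILE["task.next"]
        struct.pack_into("<i", data, off + PROFILE["task.pid"], pid)
        struct.pack_into("<I", data, off + PROFILE["task.next"], next_node)
        encoded = name.encode()
        start = off + PROFILE["task.comm"]
        data[start:start + len(encoded)] = encoded
    return GuestMemory(base, bytes(data))
```

Calling `list_processes(build_demo_image(), PROFILE)` returns `[(0, 'swapper'), (1, 'init'), (412, 'sshd')]`. Everything OS-specific lives in `PROFILE`; the walking code itself is generic, so supporting a new kernel means producing a new profile.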

The Derivation pattern describes view generation that uses pre-determined semantic knowledge specific to the guest hardware architecture, such as monitoring virtual CPU control registers. This pattern can be described as deriving views from hardware architecture semantic knowledge only[2].
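
A classic piece of purely hardware-level knowledge is address translation. On 32-bit x86 with paging enabled and PAE disabled, CR3 holds the physical address of the current page directory, and a virtual address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit page offset. The sketch below derives guest-physical addresses from guest-virtual ones using only that architectural knowledge. The `PhysicalMemory` model and the demo mappings are made up, and PAE, long mode, and PSE-36 are deliberately ignored.

```python
import struct

PAGE_PRESENT = 0x001   # P bit in page-directory and page-table entries
PAGE_SIZE_4M = 0x080   # PS bit in a page-directory entry (4 MiB page)


class PhysicalMemory:
    """Guest-physical memory as exposed by the hypervisor (a byte array here)."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read_u32(self, paddr):
        return struct.unpack_from("<I", self.data, paddr)[0]

    def write_u32(self, paddr, value):
        struct.pack_into("<I", self.data, paddr, value)


def translate(mem, cr3, vaddr):
    """Translate a 32-bit virtual address with non-PAE x86 paging.

    Returns the physical address, or None if the mapping is not present.
    Assumes CR4.PSE semantics for 4 MiB pages and ignores PSE-36.
    """
    pde = mem.read_u32((cr3 & 0xFFFFF000) | ((vaddr >> 22) << 2))
    if not pde & PAGE_PRESENT:
        return None
    if pde & PAGE_SIZE_4M:
        return (pde & 0xFFC00000) | (vaddr & 0x003FFFFF)
    pte = mem.read_u32((pde & 0xFFFFF000) | (((vaddr >> 12) & 0x3FF) << 2))
    if not pte & PAGE_PRESENT:
        return None
    return (pte & 0xFFFFF000) | (vaddr & 0xFFF)


def build_demo():
    """Map virtual page 0x08048000 to physical page 0x5000; nothing else."""
    mem = PhysicalMemory(0x10000)
    cr3 = 0x1000                                             # page directory at 0x1000
    mem.write_u32(0x1000 + 0x20 * 4, 0x2000 | PAGE_PRESENT)  # PDE 0x20 -> table at 0x2000
    mem.write_u32(0x2000 + 0x48 * 4, 0x5000 | PAGE_PRESENT)  # PTE 0x48 -> page 0x5000
    return mem, cr3
```

With the demo mapping, `translate(mem, cr3, 0x08048123)` yields `0x5123`, while an unmapped address yields `None`. Observing which CR3 values appear over time similarly tells an IIDS roughly how many address spaces exist, again without any OS-specific knowledge.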

IBD patterns differ from the other two in that they do not actually bridge the semantic gap. Instead, IBD patterns generate views inside the guest-OS and deliver them to the IIDS through the hypervisor, using the guest-OS's own semantic knowledge to avoid the gap. This has a significant disadvantage: the in-guest component is susceptible to attack. However, when combined with the other patterns, IBD can become valuable for bridging the semantic gap automatically.
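
The IBD pattern can be sketched as an in-guest agent that builds a view with the guest's own APIs and copies it into a page shared with the hypervisor. The shared page, the length-prefixed JSON encoding, and the function names below are hypothetical. The point is that the hypervisor-side code never interprets kernel structures itself, yet it must still treat the delivered bytes as untrusted input.

```python
import json
import struct

SHARED_PAGE_SIZE = 4096  # hypothetical size of the guest/hypervisor shared page


def guest_agent_publish(shared: bytearray, processes: list) -> None:
    """In-guest side: the process list would come from the guest OS's own APIs
    (passed in directly here); write it as a length-prefixed JSON blob."""
    payload = json.dumps({"processes": processes}).encode("utf-8")
    if len(payload) + 4 > len(shared):
        raise ValueError("view does not fit in the shared page")
    struct.pack_into("<I", shared, 0, len(payload))
    shared[4:4 + len(payload)] = payload


def iids_read_view(shared: bytearray) -> dict:
    """Hypervisor side: read the delivered view without parsing any guest
    kernel structures. The bytes come from the guest and are untrusted."""
    (length,) = struct.unpack_from("<I", shared, 0)
    if length > len(shared) - 4:
        raise ValueError("length field exceeds the shared page")
    return json.loads(bytes(shared[4:4 + length]).decode("utf-8"))
```

Even with the bounds check, nothing stops a compromised agent from simply leaving a malicious process out of the list, which is the weakness of IBD noted above.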

It is also important to understand that view generation is only the first step of VMI intrusion detection. Once a view is generated, common practice is to apply standard in-OS IDS methods, implemented as policy engines that operate on the generated view. While the specific policy engines used by the implementations I discuss are outside the scope of this survey, this distinction is nevertheless important. Many modern implementations of these policy engines are actually FMA tools[4].
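
To make the split between view generation and policy concrete, here is a minimal rule-based policy engine over a process view. The `Process` record, both rules, and the idea of a second, independently generated reference view are illustrative assumptions, not the policy engine of any surveyed system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Process:
    pid: int
    name: str
    uid: int


# A rule takes a generated view and returns zero or more alert messages.
Rule = Callable[[List[Process]], List[str]]


def unexpected_root_processes(allowed: Iterable[str]) -> Rule:
    """Alert on root-owned processes whose names are not on an allowlist."""
    allowed = frozenset(allowed)

    def rule(view: List[Process]) -> List[str]:
        return [f"unexpected root process {p.name} (pid {p.pid})"
                for p in view if p.uid == 0 and p.name not in allowed]
    return rule


def missing_from_view(reference_pids: Iterable[int]) -> Rule:
    """Cross-view check: PIDs seen by another view source but absent here."""
    reference = frozenset(reference_pids)

    def rule(view: List[Process]) -> List[str]:
        present = {p.pid for p in view}
        return [f"pid {pid} missing from view (possibly hidden)"
                for pid in sorted(reference - present)]
    return rule


def run_policy(view: List[Process], rules: Iterable[Rule]) -> List[str]:
    """Apply every rule to the view and collect all alerts in order."""
    alerts: List[str] = []
    for rule in rules:
        alerts.extend(rule(view))
    return alerts
```

The engine only ever sees `Process` records, so the same rules work whether the view came from an OBD, Derivation, or IBD pipeline.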

3. Implementations

3.1 Livewire

It would be difficult to survey ISG solutions without discussing the original work on VMI. Livewire[1] was the first implemented prototype, and it proved that VMI was possible. Livewire bridged the ISG by using crash, a Linux crash-dump debugging tool, to generate semantic views[1]. Its approach was purely OBD in terms of semantic knowledge. It did require the authors to slightly modify crash, and the specific version of Linux used had to be recompiled with debugging symbols[1].

The strength of this approach was that it worked very effectively and required only a small amount of effort on the authors' part. However, tools like crash typically exist only for open-source OSs, and the OS had to be recompiled with debugging symbols. This translates to significant manual effort per OS: recompiling with debugging symbols for open-source OSs, and reverse engineering and developing crash-like tools for closed-source OSs[4].

3.2 Virtuoso

Virtuoso[5] is designed to automatically generate tools that bridge the ISG with far less manual effort. By analyzing dynamic traces of in-guest programs, it gains semantic knowledge in the form of sequences of x86 instructions. This is a form of automatic gap bridging: small programs of the kind used by in-OS IDSs are run inside the guest-OS in a safe environment, Virtuoso observes their execution at the hardware state level, and it then builds an out-of-guest tool that performs the same function[5]. The generated tool can then be run within FMA tools[4] such as Volatility[5].
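
Conceptually, Virtuoso's trace analysis keeps only the instructions that the desired output actually depends on. A greatly simplified version of that idea is a backward dynamic slice over a single trace that follows data dependencies only. The `TraceEntry` format and the toy trace below are hypothetical; Virtuoso itself works on real x86 traces, merges multiple traces, and handles control flow, none of which is modeled here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class TraceEntry:
    pc: int                  # address of the executed instruction
    text: str                # disassembly, for display only
    writes: FrozenSet[str]   # locations written (registers or memory cells)
    reads: FrozenSet[str]    # locations read


def backward_slice(trace: List[TraceEntry], outputs: Iterable[str]) -> List[TraceEntry]:
    """Keep only the trace entries that the final values of `outputs` depend on."""
    live = set(outputs)
    kept = []
    for entry in reversed(trace):
        if entry.writes & live:
            kept.append(entry)
            live -= entry.writes   # this instruction defines those locations...
            live |= entry.reads    # ...so its inputs are now needed instead
    kept.reverse()
    return kept


def _entry(pc, text, writes, reads):
    return TraceEntry(pc, text, frozenset(writes), frozenset(reads))


# Hypothetical trace of an in-guest "get current pid" program.
TRACE = [
    _entry(0x1000, "mov eax, [current]", {"eax"}, {"mem:current"}),
    _entry(0x1005, "mov ebx, 0x1234", {"ebx"}, set()),
    _entry(0x100A, "mov eax, [eax+0x1c]", {"eax"}, {"eax", "mem:task.pid"}),
    _entry(0x100D, "add ebx, 1", {"ebx"}, {"ebx"}),
    _entry(0x1010, "mov [out], eax", {"mem:out"}, {"eax"}),
]
```

Slicing `TRACE` from `{"mem:out"}` drops the two unrelated `ebx` instructions and keeps the three that load the current task, read its pid, and store the result; that residue is what an out-of-guest tool would need to replay against the guest's memory.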

This approach resembles a combination of OBD and IBD, in that it uses a record of activity inside the guest-OS to build OBD semantic knowledge. Its strength is that it requires fairly minimal repeated effort per OS, and the authors successfully generated tools for both open- and closed-source OSs largely automatically. It did, however, require some manual effort, and its performance was not fast enough for online monitoring[5].

3.3 VM-Space Traveler(VMST)

VMST achieves automatic ISG bridging by running inspection programs in a trusted guest-OS while redirecting their kernel data accesses to the memory of the monitored guest-OS, observing machine state much as Virtuoso does. In effect, an existing in-guest program becomes an IIDS tool simply by changing where its kernel data reads come from. Because the redirection covers the kernel-level system calls such programs make, any high-level semantic tool, either preexisting or written by the user, can be turned into an introspection tool automatically[6].
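
The redirection idea can be sketched as a memory loader with two backing stores: the trusted VM's own memory and a snapshot of the monitored guest's memory. While a system call of interest runs, kernel-space loads are served from the monitored guest, so an unmodified kernel routine returns the monitored guest's data. The 3G/1G kernel split, the addresses, and `sys_getpid_like` are hypothetical, and the naive "redirect every kernel-space load" rule is exactly the kind of over-redirection that VMST's taint tracking has to avoid.

```python
KERNEL_BASE = 0xC0000000  # assumed 3G/1G split on a 32-bit Linux guest


class RedirectingMemory:
    """Loads for an inspection program running in the trusted VM."""

    def __init__(self, trusted, monitored):
        self.trusted = trusted        # address -> value in the trusted VM
        self.monitored = monitored    # address -> value in the monitored guest
        self.in_syscall = False       # True while a redirected system call runs

    def load(self, addr):
        # Naive rule: during the system call, every kernel-space read is redirected.
        if self.in_syscall and addr >= KERNEL_BASE:
            return self.monitored.get(addr, 0)
        return self.trusted.get(addr, 0)


CURRENT_TASK_PTR = 0xC0300000  # hypothetical global holding the current task
PID_OFFSET = 0x1C              # hypothetical offset of the pid field


def sys_getpid_like(mem):
    """Kernel-side logic of a getpid-style call, unchanged from the guest."""
    task = mem.load(CURRENT_TASK_PTR)
    return mem.load(task + PID_OFFSET)
```

The same `sys_getpid_like` code returns the trusted VM's pid when `in_syscall` is false and the monitored guest's pid when it is true; no part of the routine had to be rewritten for introspection.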

This method is primarily OBD and IBD, in the same way as Virtuoso. Its strength is that it automatically produces ISG-bridging VMI tools for any given OS. Its performance even begins to approach the efficiency of in-OS tools, with only a 9.3 times increase in IDS overhead[6]. A drawback is that not all code can be redirected: some kernel-level code must not be redirected, so VMST must identify when redirection should occur, which it does with a dynamic taint tracker. As a result, automatic tool creation incurs significant overhead. The system also requires a trusted copy of the guest-OS in which the inspection programs run, and it was only tested in an offline mode[6].
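
Why redirection needs a taint tracker can be seen with a toy register machine: kernel stack accesses also fall in kernel space, but they belong to the trusted VM's own execution and must not be redirected. The sketch below taints the stack pointer, propagates the taint through address arithmetic, and redirects only kernel-space loads whose address is not stack-derived. The instruction format, register names, demo addresses, and the choice to treat loaded values as untainted are simplifying assumptions, not VMST's actual policy.

```python
KERNEL_BASE = 0xC0000000  # assumed 3G/1G split on a 32-bit guest


def run_with_taint(program, regs, trusted, monitored):
    """Execute a toy instruction list inside the trusted VM.

    program: list of (op, dst, src, imm) tuples, where op is
      "addi" (dst = src + imm) or "load" (dst = [src + imm]).
    Returns a log of (address, redirected) pairs, one per load.
    """
    stack_derived = {"esp"}  # taint source: the kernel stack pointer
    log = []
    for op, dst, src, imm in program:
        if op == "addi":
            regs[dst] = (regs[src] + imm) & 0xFFFFFFFF
            # Taint propagates through address arithmetic.
            if src in stack_derived:
                stack_derived.add(dst)
            else:
                stack_derived.discard(dst)
        elif op == "load":
            addr = (regs[src] + imm) & 0xFFFFFFFF
            # Redirect kernel-space data, but never stack-derived addresses.
            redirect = addr >= KERNEL_BASE and src not in stack_derived
            regs[dst] = (monitored if redirect else trusted).get(addr, 0)
            # Simplification: a value loaded from memory is treated as untainted.
            stack_derived.discard(dst)
            log.append((addr, redirect))
        else:
            raise ValueError(f"unknown op {op!r}")
    return log


# Hypothetical demo: read a local from the kernel stack, then follow the
# current-task pointer to its pid field.
DEMO_PROGRAM = [
    ("addi", "ebx", "esp", 8),              # ebx = address of a stack slot
    ("load", "ecx", "ebx", 0),              # stack read: stays in the trusted VM
    ("load", "eax", "zero", 0xC0300000),    # global current-task pointer
    ("load", "edx", "eax", 0x1C),           # pid field of that task
]
```

Deciding this per instruction at run time is what makes the approach accurate but slow, since every executed instruction must be inspected.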

3.4 Hybrid-Bridge

Hybrid-Bridge[7] builds on the work of VMST[6] and Virtuoso[5], recognizing their shortcomings and addressing them: specifically, Virtuoso's slowdown, estimated at 140 times[7], and the slowdown of hundreds of times caused by VMST's taint tracker being implemented on an emulator. Hybrid-Bridge combines Virtuoso's training-based approach with VMST's online kernel data redirection: the results of emulator-based taint analysis are memoized as metadata, which a fast run-time component then uses as a cache to decide which instructions need redirection.

This is again something of a combination of OBD and IBD semantic knowledge. The real strength of this method is that it is fully automated, can be run for any OS, and begins to address the slowdown caused by emulation and on-demand tool generation, which previous methods mostly glossed over. The drawback of Hybrid-Bridge is that when the metadata is incomplete, it must fall back on the slower, emulator-based analysis at run time. On average, its performance was about a 10 times slowdown compared to a standard in-OS tool when it had the needed metadata[7].
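
The decoupling of a fast path from a slow, emulator-based path can be sketched as a memoizing oracle: decisions about whether an instruction needs redirection are cached per instruction address, and only cache misses pay for the slow analysis. `slow_analysis` is a stand-in for the expensive taint analysis, and the class and its fields are hypothetical names, not Hybrid-Bridge's interfaces.

```python
from typing import Callable, Dict, Iterable


class RedirectionOracle:
    """Memoized per-instruction redirection decisions with a slow fallback."""

    def __init__(self, slow_analysis: Callable[[int], bool]):
        self.slow_analysis = slow_analysis   # expensive, e.g. emulator + taint analysis
        self.metadata: Dict[int, bool] = {}  # instruction address -> redirect?
        self.fast_hits = 0
        self.slow_runs = 0

    def train(self, trace: Iterable[int]) -> None:
        """Offline training: run the slow analysis once per distinct instruction."""
        for pc in trace:
            if pc not in self.metadata:
                self.slow_runs += 1
                self.metadata[pc] = self.slow_analysis(pc)

    def should_redirect(self, pc: int) -> bool:
        """Online query: use memoized metadata, falling back on a miss."""
        if pc in self.metadata:
            self.fast_hits += 1
            return self.metadata[pc]
        self.slow_runs += 1
        decision = self.slow_analysis(pc)
        self.metadata[pc] = decision   # memoize so the next query is fast
        return decision
```

The more complete the training, the more queries take the fast dictionary lookup; an instruction never seen in training still gets a correct answer, but only at the cost of one slow analysis.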

3.5 min-c

Min-c is another IIDS that considered the contributions of FMA[4] towards ISG bridging[8]. In their own research on VMI and FMA techniques, the authors found that existing ISG bridging techniques were too specific and required too much manual effort, so min-c was designed to be general and automated. They accomplished this by making its semantic interpreter OS independent. Min-c maps sections of the guest-OS memory into the host's memory space, uses a C parser to build byte-code, and uses semantic interpreters to determine object type and memory layout information. Because of the C parser, users can write scripts in C to perform introspection tasks[8].
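
Min-c derives object types and layouts by interpreting C declarations[8]. The layout step alone can be illustrated with Python's standard `ctypes` module, which computes field offsets and sizes from a C-like structure declaration and can then decode raw bytes copied out of guest memory. This is not min-c's interpreter or its script language; the `Task` structure is a hypothetical, simplified layout, and the sketch assumes a little-endian host matching an x86 guest.

```python
import ctypes
import struct


class Task(ctypes.Structure):
    """Hypothetical simplified task structure of a 32-bit guest."""
    _fields_ = [
        ("pid", ctypes.c_int32),
        ("next", ctypes.c_uint32),       # guest pointer kept as a plain 32-bit value
        ("comm", ctypes.c_char * 16),    # NUL-padded process name
    ]


def read_task(guest_bytes: bytes, offset: int) -> Task:
    """Decode one Task from a raw snapshot of guest memory."""
    return Task.from_buffer_copy(guest_bytes, offset)


def describe_layout(struct_type) -> dict:
    """Field name -> (offset, size), as computed from the declaration."""
    return {name: (getattr(struct_type, name).offset, getattr(struct_type, name).size)
            for name, _ in struct_type._fields_}


if __name__ == "__main__":
    # Fake 24-byte snapshot: pid, next pointer, 16-byte name.
    image = struct.pack("<iI16s", 412, 0xC0100000, b"sshd")
    task = read_task(image, 0)
    print(task.pid, hex(task.next), task.comm)
    print(describe_layout(Task))
```

The declaration is the only place layout knowledge lives; changing a field type or order changes every computed offset, which mirrors why an interpreter that reads C type definitions can stay OS independent.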

This approach is primarily Derivation based, but uses a small amount of OBD to determine general OS information at runtime, which enables Derivation-based semantic interpretation. The biggest strength of this approach is that it is very general, requiring only the ability to identify whether a running OS is Windows, Linux, etc. It also allows the C language to be used to write generic introspection tools as scripts. Its biggest weakness is that few policy engine tools had been written for it when the article was published, so its real efficiency could not be tested.

3.6 InSight

InSight[9, 10] is something of a special case in VMI semantic gap bridging, in that its goal was to provide a complete view of all kernel objects. Its core concept is to interpret the guest-OS kernel state the same way the guest-OS does. Like Livewire, it makes use of debugging symbols, but it also parses kernel source code for data type inference[9]. It determines ‘used-as’ relationships, for example a generic pointer that the kernel actually uses as a pointer to a specific type, through static code analysis[10]. Objects are then delivered to user-defined code as lists.
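
The goal of a complete view of kernel objects can be sketched as a type-directed graph walk: start from global variables whose types are known and follow every typed pointer field, recording each object reached. The type table, global, and demo pointers below are hypothetical, and keying objects by address alone (so an object embedded at offset 0 of another is missed) is a simplification.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical type table: pointer fields of each type as
# (field name, offset, pointed-to type). InSight obtains this kind of
# information from debugging symbols and kernel source analysis; a 'used-as'
# relation for a generic pointer field would add one more entry here.
TYPES: Dict[str, List[Tuple[str, int, str]]] = {
    "task_struct": [("mm", 0x04, "mm_struct"), ("next", 0x08, "task_struct")],
    "mm_struct": [],
}
GLOBALS = {"init_task": (0xC0100000, "task_struct")}


def enumerate_objects(read_ptr: Callable[[int], int], types, globals_) -> Dict[int, str]:
    """Breadth-first walk from global variables along typed pointers.

    read_ptr(addr) returns the 32-bit pointer value stored at addr.
    Returns address -> type for every reachable object.
    """
    found: Dict[int, str] = {}
    queue = deque(globals_.values())
    while queue:
        addr, typ = queue.popleft()
        if addr == 0 or addr in found:   # NULL pointer or already visited
            continue
        found[addr] = typ
        for _field, offset, target in types[typ]:
            queue.append((read_ptr(addr + offset), target))
    return found
```

Because every object is reached by following typed pointers, each one arrives with its type attached, which is what lets a user-defined policy work on objects rather than raw addresses.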

This approach is a combination of OBD and Derivation, using both OS-specific and virtual-hardware-specific knowledge. The strength of InSight is that it provides knowledge of all kernel objects, not just the addresses being monitored for specific actions, which allows a more general approach to security concerns. It is therefore a potentially powerful FMA tool as well[9]. Its biggest weakness, however, is that it is extremely slow and implements few tools that address specific problems. It also only works for Linux, whose kernel source code is available.

4. Comparison and Projections

Among the various implementations, it is important to consider the strengths and purposes of each. While a small number of implementations are beginning to approach being fast enough for live use, none has actually achieved this goal yet. I would therefore propose two sets of evaluation criteria for two separate purposes: VMI application and FMA application. As previously mentioned, FMA is closely related to VMI, since both must solve the ISG problem, and FMA is itself a form of introspection. It should also be noted that VMI and FMA solutions are frequently used in place of each other[4].

The two purposes, however, have different end requirements. In VMI, semantic gap bridging efficiency is the most important criterion, but the level of manual effort required is also a strong indicator of value. In FMA, efficiency is less important because introspection is usually performed on non-live systems. However, the completeness of the view and the amount of manual effort required are very important.

Under the VMI criteria, Hybrid-Bridge is clearly the fastest modern implementation, and its full automation requires virtually no manual effort on the user's part. It has the benefit of being built on top of both Virtuoso and VMST, and thus inherits both their strengths. However, caching training data as memoized metadata to determine redirection behavior was novel relative to both approaches, and it is the main source of the efficiency gain.

Against the FMA criteria, however, Hybrid-Bridge is less useful than its constituent parts. Virtuoso's ability to map user-written tools makes it possible to observe any object interactions automatically. Its view, given sufficient programming by the user, actually exceeds the scope of InSight, because it can observe user program objects in addition to kernel-level objects. Min-c also falls more fairly into the FMA category, in that it can map any desired address space. However, its scope is limited to code that can be interpreted in terms of C.

Using these criteria, I would say that Hybrid-Bridge is the best surveyed implementation for VMI. Although Virtuoso is intended more for VMI, it best meets the FMA criteria among the surveyed implementations. Both implementations also require only minimal manual effort. This comparison also helps in projecting future trends in ISG bridging techniques.

For the future of VMI bridging techniques, based on the best solutions surveyed, I project that VMI IDSs will use more generalized approaches to gaining semantic knowledge, similar to Virtuoso's custom tool generation combined with Hybrid-Bridge's caching of results for future reuse. Also, given the issues VMST had with emulation, we may see increased use of efficient scripting to run introspection solutions, rather than the emulation present in commonly used FMA tools.

For FMA bridging solutions, I project a greater emphasis on efficiency and on the use of IBD patterns on trusted OS copies to obtain semantic knowledge. As the case of Virtuoso shows, making use of some IBD in a trusted environment makes it possible to quickly develop tools that can introspect nearly any part of memory to obtain high-level semantics. As with VMI, I also expect FMA tools to move away from emulation and towards more efficient scripting tools.

5. Conclusions

In ISG bridging there are many concerns, from efficiency to the completeness of the generated view. Many implementations attempt to address these concerns, and many of the recent ones build on previous ideas. For efficient VMI tool creation and use, Hybrid-Bridge has made significant advancements. For developing custom tools, Virtuoso provides a framework that requires very little manual effort. The field of ISG bridging still has a way to go for both purposes, but we do appear to be approaching a time when VMI will become viable in online applications.


References

[1] Garfinkel, T., & Rosenblum, M. (2003, February 1). A Virtual Machine Introspection Based Architecture for Intrusion Detection. Proceedings of the Network and Distributed System Security Symposium (NDSS). Retrieved April 20, 2015, from http://suif.stanford.edu/papers/vmi-ndss03.pdf
[2] Pfoh, J., Schneider, C., & Eckert, C. (2009, November 9). A formal model for virtual machine introspection. VMSec '09, 1-10.
[3] More, A., & Tapaswi, S. (2014). Virtual machine introspection: Towards bridging the semantic gap. Journal of Cloud Computing, 3(16). Retrieved April 20, 2015, from http://link.springer.com/article/10.1186/s13677-014-0016-2
[4] Dolan-Gavitt, B., Payne, B. D., & Lee, W. (2011). Leveraging Forensic Tools for Virtual Machine Introspection (Technical Report GT-CS-11-05). Georgia Institute of Technology.
[5] Dolan-Gavitt, B., Leek, T., Zhivich, M., Giffin, J., & Lee, W. (2011). Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection. In Proceedings of the 2011 IEEE Symposium on Security and Privacy (SP '11), 297-312. Washington, DC: IEEE Computer Society. doi:10.1109/SP.2011.11
[6] Fu, Y., & Lin, Z. (2012, May 20-23). Space Traveling across VM: Automatically Bridging the Semantic Gap in Virtual Machine Introspection via Online Kernel Data Redirection. 2012 IEEE Symposium on Security and Privacy (SP), 586-600. doi:10.1109/SP.2012.40
[7] Saberi, A., Fu, Y., & Lin, Z. (2014). HYBRID-BRIDGE: Efficiently Bridging the Semantic Gap in Virtual Machine Introspection via Decoupled Execution and Training Memoization. NDSS Symposium 2014. Retrieved April 19, 2015, from http://www.internetsociety.org/doc/hybrid-bridge-efficiently-bridging-semantic-gap-virtual-machine-introspection-decoupled
[8] Inoue, H., Adelstein, F., Donovan, M., & Brueckner, S. (2011, June 7). Automatically Bridging the Semantic Gap using C Interpreter. Symposium on Information Assurance, 51-58.
[9] Schneider, C., Pfoh, J., & Eckert, C. (2011). A Universal Semantic Bridge for Virtual Machine Introspection. Information Systems Security: 7th International Conference. Retrieved April 18, 2015, from http://www.researchgate.net/publication/221160737_A_Universal_Semantic_Bridge_for_Virtual_Machine_Introspection
[10] Schneider, C., Pfoh, J., & Eckert, C. (2012). Bridging the Semantic Gap Through Static Code Analysis. Proceedings of EuroSec'12, 5th European Workshop on System Security. Retrieved April 20, 2015, from https://www.sec.in.tum.de/assets/staff/schneider/eurosec_schneider2012.pdf