Data Interception through Broken Concurrency in Kernel Land Julian L. Rrushi Centre for Cybersecurity British Columbia Institute of Technology 3700 Willingdon Avenue, Burnaby, BC V5G 3H2, Canada [email protected]

Abstract—We present a kernel data interception technique that is undetectable by existing approaches to malware detection, and propose practical methods to detect it. The technique is based on breaking concurrency in a way that enables the attack code to take over the synchronization established by target kernel modules. That level of control allows the attack code to interpose between those modules, and thus intercept sensitive data. We illustrate the overall technique as applied to intercepting keystrokes from a computer keyboard on Windows 7, and demonstrate it in practice through an attack kernel driver that we dubbed kbdinterceptor. The technique has no reliance on function hooking, machine code replacement, direct access to I/O bus, or attachment to any device driver stack whatsoever. In the paper, we capture the salient characteristics of the attack technique to devise a defensive approach that can accurately detect the corresponding attack code through dynamic analysis. Keywords-Keylogger, concurrency, Windows kernel

I. INTRODUCTION

Synchronization objects such as a semaphore, mutex, or condition variable regulate concurrency between multiple threads or processes with respect to shared resources. By definition, a synchronization object requires the concurrent threads or processes to follow a protocol of behavior with regard to taking and releasing ownership of a shared resource. We have discovered that breaking that protocol, namely breaking the concurrency model, can be leveraged by attack code in kernel land to intercept sensitive data. In this paper we present our applied research on attack and defense that involve broken concurrency, and thus reason at the machine code level as represented by assembly language. We convey to the reader the main ideas behind an exploitation of broken concurrency for data interception by means of kbdinterceptor, a proof of concept that we developed to demonstrate and illustrate the attack. kbdinterceptor breaks concurrency by acting on a condition variable of the Windows kernel. kbdinterceptor then builds on that broken concurrency to intercept keystrokes from a keyboard attached to the target system. While the focus is mostly on the attack technique, our goal as researchers is to draw lessons from this work so as to engineer a better defense against future malware that breaks and exploits concurrency.

The testbed that we configured and used for this research consisted of a 32-bit Windows 7 image running on an Intel processor on VMware Workstation. We developed the kbdinterceptor code in Visual Studio 2013, version 12.0.21005.1. For testing and code analysis purposes, including detection studies on kbdinterceptor, we used the IDA Pro tool, version 6.4.130821. We ran IDA Pro on the physical machine, and attached its companion debugger, namely WINDBG, to the kernel of the Windows 7 image through a serial named pipe.

The remainder of this paper is organized as follows. In Section II we describe the general form of the kernel data interception attack that is based on breaking concurrency in kernel space. In Section III we explore the low-level technical characteristics of an applied form of the attack, namely we dive into the inner workings of kbdinterceptor. In Section IV we propose two approaches to detecting a concurrency breaker such as kbdinterceptor. In Section V we discuss related research on data interception attacks and defenses, and compare those works with the research that we discuss in this paper. In Section VI we summarize our contribution, and conclude the paper.

II. CONCURRENCY BREAKING

In its general form, the attack seeks a synchronization object that will put a target thread in a waiting state. The stalling of the target thread provides a window of opportunity for the attack code to win the time race and thus access the target resource while it still has the data. The rules of inter-thread synchronization require that a thread complies with the semantics of a synchronization object that regulates access to a shared resource. The threads that compete for access to a shared resource are expected to be well-behaved. The concurrent threads check on the synchronization object periodically to see whether they are allowed to proceed with accessing the shared resource.
If the synchronization object suggests otherwise, by design the concurrent threads assume that the shared resource is busy and thus simply wait until the next check. The attack code screens a synchronization object of interest to detect resource availability. Once the synchronization object indicates so, the attack code quickly takes ownership of that synchronization object, thereby stalling any threads that were supposed to be taking the data from the corresponding shared resource. After winning the time race against concurrent threads in the kernel, the attack code is able to grab the data from, or through, the shared resource. The interaction with the shared resource, however, in most cases requires following a certain sequence of rules. Based on our experience with the research that we discuss in this paper, a lack of compliance with those rules causes kernel errors of various kinds, which bring the whole system to a halt. The most reliable way of conducting this attack was for the attack code to mimic the target thread in terms of functionality. The attack code need not reuse code from the target thread, as doing so would provide a signature that is usable for the purpose of detection. The attack code may use polymorphism or metamorphism to change its underlying instructions while behaving similarly to the target thread. Alternatively, the attack code may implement the required functionality of the target thread in its own proprietary way. After extracting the data from the shared resource, the attack code releases the synchronization object to the concurrent threads. In some cases, the data will no longer reside on the shared resource, hence the need to break concurrency in order to get to the data first. Nevertheless, most of the concurrent threads that we analyzed in this research were prepared to deal with that situation by having instructions that check for the presence and validity of data on the shared resource. If the data was found absent or invalid, the concurrent thread skipped any processing that was scheduled for those data and thus jumped towards the end of the code of that thread.
Owning the synchronization object indefinitely, and thus forcing the concurrent threads to remain in a waiting state indefinitely, proved to be error prone, as we discuss in detail later on in this paper. The general form of the attack is depicted in Algorithm 1.

Algorithm 1: Leverages a synchronization object for seamless kernel data capturing
Input: Operating system kernel
Output: Data interception enabled by broken concurrency
1   x ← synchronization object
2   y ← target thread
3   z ← target resource
4   while attack = active do
5       if x = released then
6           take ownership of x
7           if y = active and waiting then
8               mimic y
9               access data in or through z
10          release x
11      sleep for 200 milliseconds
12  return intercepted kernel data
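The control flow of Algorithm 1 can be pictured with a small user-space sketch. It is written in Python purely for illustration: the names SyncObject, TargetThread, and run_attack are hypothetical, the target resource is a plain list standing in for kernel data, the nesting of steps 10 and 11 is our reading of the flattened algorithm, and no real kernel object is involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SyncObject:
    """Hypothetical stand-in for the synchronization object x."""
    owner: Optional[str] = None

    def released(self) -> bool:
        return self.owner is None

    def take(self, who: str) -> None:
        self.owner = who

    def release(self) -> None:
        self.owner = None


@dataclass
class TargetThread:
    """Hypothetical stand-in for the target thread y."""
    active: bool = True
    waiting: bool = True


def run_attack(x: SyncObject, y: TargetThread, z: List[int], rounds: int) -> List[int]:
    """Follow steps 4 to 12 of Algorithm 1 for a bounded number of rounds."""
    intercepted: List[int] = []
    for _ in range(rounds):              # step 4: while attack = active
        if x.released():                 # step 5
            x.take("attacker")           # step 6
            if y.active and y.waiting:   # step 7
                # steps 8 and 9: mimic y and read the data in or through z
                if z:
                    intercepted.append(z.pop(0))
            x.release()                  # step 10
        # step 11: a real loop would sleep for 200 ms here; omitted in this sketch
    return intercepted                   # step 12
```

If another party already owns the object, the sketch intercepts nothing, which mirrors the requirement that the attack code win the time race before it can reach the data.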

III. KBDINTERCEPTOR

We now discuss the concrete techniques behind our approach to breaking an instance of concurrency in the Windows kernel such as to enable interception of keystrokes from a keyboard on a test system. We elaborate on the approach in a technical language, while keeping the discussion well connected to actual and pertinent machine code from both kbdinterceptor and the Windows kernel itself. The reader is referred to [14] for detailed technical background on Windows kernel internals.

A. Setting the Lock in the Interrupt Descriptor Table

kbdinterceptor accesses and manipulates a specific data structure member, namely ActualLock, related to the Interrupt Descriptor Table (IDT) [5]. The technique used by kbdinterceptor to locate the IDT, and thus access the member of interest, is not new. It was adopted mainly from [5] and technical articles such as [7]. Nevertheless, we discuss in this subsection the various steps that kbdinterceptor takes to achieve that goal, in order to provide the reader with insight into the inner workings of kbdinterceptor. We also use this discussion to prepare the ground for the main contribution of this paper as implemented in kbdinterceptor. kbdinterceptor reads the interrupt descriptor table register (IDTR), which provides the base address of the IDT along with its size in bytes as illustrated in listing 1. kbdinterceptor executes the sidt instruction such as to place the base address and size of the IDT into a memory address itself stored at %ebp - 0x28. The fword data type in listing 1 allocates 6 bytes of storage, namely 32 bits for the base address of the IDT and 16 bits for its size.

Listing 1. kbdinterceptor code
kbd:8FC01330  mov   eax, [ebp+var_28]
kbd:8FC01333  sidt  fword ptr [eax]

By analyzing the IDT manually through a debugger such as WINDBG, which we attach to the Windows kernel, we notice that the vector number for the IDT entry that pertains to keyboard hardware interrupts is 0x71, or 113 in decimal. Figure 1 depicts several entries of the IDT, with the entry of interest being highlighted. The sidt instruction stores the base address and size of the IDT together, one next to the other. As kbdinterceptor needs to operate on the base address of the IDT individually, it accesses the results of the previous execution of the sidt instruction as illustrated in listing 2. Notice the access to the stack at %ebp - 0x28. That memory location holds the address of the base address and size of the IDT. In the last instruction, kbdinterceptor skips the size of the IDT and thus retrieves the base address of the IDT.

Figure 1. Excerpt from a manual listing of the IDT entries with WINDBG operating through the Windows kernel

Listing 2. kbdinterceptor code
kbd:8FC0133D  mov  eax, [ebp+var_28]
kbd:8FC01340  mov  ecx, [eax+2]
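The layout of the 6 bytes written by sidt explains why the base address is read at offset 2. As a minimal sketch, assuming the 6-byte image has already been copied into a Python bytes object, the two fields can be separated as follows. The function name parse_idtr and the sample value are hypothetical; the 16-bit field, which the text calls the size, is named the limit in the processor manuals.

```python
import struct


def parse_idtr(image: bytes):
    """Split the 6-byte value stored by sidt into (limit, base)."""
    if len(image) != 6:
        raise ValueError("a 32-bit IDTR image is exactly 6 bytes")
    # Little-endian: 16-bit limit at offset 0, 32-bit base address at offset 2.
    limit, base = struct.unpack("<HI", image)
    return limit, base


# Hypothetical sample value, not taken from the paper's test system.
sample = (0x07FF).to_bytes(2, "little") + (0x80B95400).to_bytes(4, "little")
limit, base = parse_idtr(sample)
```

Reading the dword at offset 2, as the second instruction of listing 2 does, corresponds to taking the base element of the tuple returned here.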

At this point, namely in listing 3, kbdinterceptor needs to access the gate descriptor that corresponds to the handling of keyboard hardware interrupts. Each IDT entry, i.e., gate descriptor, is comprised of 8 bytes. Since the vector number of interest is 0x71, the offset of the gate descriptor of interest relative to the IDT base is 0x388, i.e., 113 x 8.

Listing 3. kbdinterceptor code
kbd:8FC01343  add  ecx, 388h
kbd:8FC01349  mov  [ebp+var_2C], ecx

Table I
EXCERPT FROM THE KINTERRUPT DATA STRUCTURE OF IDT

Member name              Offset
Type                     0x00
Size                     0x02
InterruptListEntry       0x04
ServiceRoutine           0x0c
MessageServiceRoutine    0x10
MessageIndex             0x14
ServiceContext           0x18
SpinLock                 0x1c
TickCount                0x20
ActualLock               0x24
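The offset arithmetic behind the constant 388h in listing 3 can be checked with a short sketch. The function names and the sample IDT base are hypothetical; the entry size and the vector number are the ones given in the text.

```python
GATE_DESCRIPTOR_SIZE = 8   # bytes per IDT entry, as stated in the text
KEYBOARD_VECTOR = 0x71     # vector observed on the paper's test system


def gate_descriptor_offset(vector: int) -> int:
    """Offset of a gate descriptor relative to the IDT base."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("an interrupt vector is a value from 0 to 255")
    return vector * GATE_DESCRIPTOR_SIZE


def gate_descriptor_address(idt_base: int, vector: int) -> int:
    """Address of a gate descriptor given the IDT base address."""
    return idt_base + gate_descriptor_offset(vector)
```

On a system where the keyboard interrupt is assigned a different vector, only KEYBOARD_VECTOR would change; the multiplication by 8 stays the same.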

From [5], we know that the higher 16 bits of an address of interest, which we will explain shortly, are at offset 6 relative to the beginning of the gate descriptor. Again with reference to [5], the lower 16 bits of that specific address are at offset 0 relative to the beginning of the gate descriptor. In listing 4, register %ecx points to the beginning of the gate descriptor that pertains to the handling of keyboard hardware interrupts. kbdinterceptor adds 6 to register %ecx such as to reach the address of the higher 16 bits of the address of interest, and thus retrieves those higher 16 bits. Shifting those 16 bits to the left by 16, i.e., 0x10 in hexadecimal, and then adding the result to the lower 16 bits of the address of interest yields exactly the address of interest. We conducted machine code analysis of the Windows kernel, and observed that the address of interest, reconstructed as discussed previously, points to the end of the KINTERRUPT data structure of the Windows kernel. The size of the KINTERRUPT data structure is 88 bytes, which is 0x58 in hexadecimal. By subtracting 0x58 from the address of interest, kbdinterceptor obtains the address of the KINTERRUPT data structure, as shown in listing 4.

Listing 4. kbdinterceptor code
kbd:8FC01382  mov    ecx, [ebp+var_20]
kbd:8FC01385  movzx  edx, word ptr [ecx+6]
kbd:8FC01389  shl    edx, 10h
kbd:8FC0138C  mov    eax, [ebp+var_20]
kbd:8FC0138F  movzx  ecx, word ptr [eax]
kbd:8FC01392  lea    edx, [edx+ecx-58h]
kbd:8FC01396  mov    [ebp+var_30], edx
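The address reconstruction performed by listing 4 can be sketched on an 8-byte gate descriptor held in a Python bytes object. The function names and the sample descriptor are hypothetical; the offsets 0 and 6 and the constant 0x58 are the ones given in the text.

```python
import struct

KINTERRUPT_SIZE = 0x58  # 88 bytes, as stated in the text


def handler_address(gate: bytes) -> int:
    """Rebuild the 32-bit address split across a gate descriptor."""
    if len(gate) != 8:
        raise ValueError("a 32-bit gate descriptor is exactly 8 bytes")
    (low,) = struct.unpack_from("<H", gate, 0)    # lower 16 bits at offset 0
    (high,) = struct.unpack_from("<H", gate, 6)   # higher 16 bits at offset 6
    return (high << 16) + low


def kinterrupt_address(gate: bytes) -> int:
    """Mirror the lea in listing 4: reconstructed address minus 0x58."""
    return handler_address(gate) - KINTERRUPT_SIZE


# Hypothetical descriptor: offset low, selector, reserved, type, offset high.
sample_gate = struct.pack("<HHBBH", 0x1258, 0x0008, 0x00, 0x8E, 0x85A3)
```

For the sample descriptor the reconstructed address is 0x85A31258, so the KINTERRUPT data structure would start at 0x85A31200.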

From Table I we can see that the offset of the ActualLock member of the KINTERRUPT data structure of IDT is 0x24. Consequently, in listing 5, kbdinterceptor adds 0x24 to the address of the KINTERRUPT data structure such as to reach ActualLock. Although ActualLock is in fact a pointer to a flag, for the sake of simplicity we refer to it as a flag with values 1 or 0 in the rest of the paper. kbdinterceptor performs a few propagations of that pointer on the stack, and then finally sets to 1 the value that it points to.

Listing 5. kbdinterceptor code
kbd:8FC01399  mov  eax, [ebp+var_30]
kbd:8FC0139C  add  eax, 24h
kbd:8FC0139F  mov  [ebp+var_34], eax
kbd:8FC013A2  mov  ecx, [ebp+var_34]
kbd:8FC013A5  mov  [ebp+var_38], ecx
kbd:8FC013A8  mov  edx, [ebp+var_38]
kbd:8FC013AB  mov  eax, [edx]
kbd:8FC013AD  mov  [ebp+var_24], eax
kbd:8FC013B0  mov  ecx, [ebp+var_24]
kbd:8FC013B3  mov  dword ptr [ecx], 1
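The double indirection in listing 5 (reach the member at offset 0x24, load the pointer stored there, then write through that pointer) can be sketched against a simulated memory. FakeMemory and set_actual_lock are hypothetical names, and the addresses used in the usage lines are made up for illustration.

```python
ACTUAL_LOCK_OFFSET = 0x24  # offset of ActualLock from Table I


class FakeMemory:
    """Hypothetical dword-addressable memory used only for illustration."""

    def __init__(self):
        self.dwords = {}

    def read32(self, address: int) -> int:
        return self.dwords.get(address, 0)

    def write32(self, address: int, value: int) -> None:
        self.dwords[address] = value & 0xFFFFFFFF


def set_actual_lock(memory: FakeMemory, kinterrupt: int, value: int = 1) -> int:
    """Load the pointer held in ActualLock and store value through it."""
    member_address = kinterrupt + ACTUAL_LOCK_OFFSET   # add eax, 24h
    lock_pointer = memory.read32(member_address)       # mov eax, [edx]
    memory.write32(lock_pointer, value)                # mov dword ptr [ecx], 1
    return lock_pointer


# Usage with made-up addresses.
memory = FakeMemory()
memory.write32(0x85A31200 + ACTUAL_LOCK_OFFSET, 0x85A3121C)
pointer = set_actual_lock(memory, 0x85A31200)
```

The returned pointer plays the role of the value that listing 5 keeps at %ebp - 0x24 for later reuse.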

B. Selective Mimicry Coding

We now discuss some of the main parts of our analysis of the Windows kernel code that is executed when IRQ1 is raised, and point out both code that is mimicked by kbdinterceptor and code that is bypassed. In listing 6, code from the KiInterruptDispatch routine accesses the ActualLock member of the KINTERRUPT data structure of IDT to check whether it is set to 1. The lock prefix is there to allow the bit test and set (bts) instruction to execute atomically. In the bts instruction, the base points to ActualLock, while the offset is 0. Consequently, the current value of ActualLock is placed in the Carry flag of the FLAGS register, which in turn is inspected by the jb instruction. Regardless of the previous value of ActualLock, the bts instruction will now set ActualLock to 1. If the previous value of ActualLock was 0, which means that the Carry flag of the FLAGS register is 0, the current instance of KiInterruptDispatch owns ActualLock until voluntarily releasing it when done. In that case, the condition of the jb instruction is not met, therefore the execution flow proceeds with invoking the interrupt service routine (ISR) of the i8042prt driver. That routine, which is called I8042KeyboardInterruptService, reads the scan code from I/O port 0x60, and places it in the I/O request packet (IRP) directed towards the upper layers of the keyboard driver stack. When kbdinterceptor actively locks ActualLock, the jb instruction finds the Carry flag of the FLAGS register to be 1. The condition of the jb instruction is met, and thus the execution flow enters a loop through memory location loc_82A3D.

Listing 6. Kernel code
nt:82A3D77D  mov       esi, [edi+24h]
nt:82A3D780  lock bts  dword ptr [esi], 0
nt:82A3D785  jb        loc_82A3D
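The test-and-set semantics of listing 6 can be modeled in a few lines. This is a sketch only: the names bts, try_acquire, and dispatch_spin are hypothetical, the lock is a one-element list, and the atomicity that the lock prefix provides on real hardware is not modeled.

```python
def bts(cell: list, bit: int) -> int:
    """Return the old value of the bit (the Carry flag) and set the bit."""
    carry = (cell[0] >> bit) & 1
    cell[0] |= 1 << bit
    return carry


def try_acquire(lock: list) -> bool:
    """The jb branch is not taken only when the old value of bit 0 was 0."""
    return bts(lock, 0) == 0


def dispatch_spin(lock: list, max_spins: int, release_after=None):
    """Count failed acquisition attempts before the dispatcher gets the lock.

    release_after simulates another party clearing the lock after that many
    failed attempts. Returns None if the lock is never obtained.
    """
    spins = 0
    while not try_acquire(lock):
        spins += 1
        if spins >= max_spins:
            return None
        if release_after is not None and spins == release_after:
            lock[0] = 0
    return spins
```

With the lock preset to 1 and never cleared, dispatch_spin gives up after max_spins attempts, which corresponds to the stall that kbdinterceptor induces; once the lock is cleared, the next attempt succeeds and leaves the lock set again.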

The loop iterates through instructions that include those given in listing 6, and thus checks whether ActualLock has been released. In the meantime, kbdinterceptor proceeds with mimicking parts of the KiInterruptDispatch routine along with some of the routines that the latter invokes. In doing so, kbdinterceptor does not include the code in listing 6. kbdinterceptor proactively owns ActualLock, and thus is in the position of progressing with code execution towards blocks of code that read the scan code from I/O port 0x60. More specifically, kbdinterceptor mimics the transition from the KiInterruptDispatch routine into the I8042KeyboardInterruptService routine of the i8042prt driver. That invocation is reproduced in listing 7 for the purpose of illustration.

Listing 7. Kernel code
nt:82A567A5 loc_82A567A5:
nt:82A567A5  mov   eax, [edi+18h]
nt:82A567A8  push  eax
nt:82A567A9  push  edi
nt:82A567AA  call  dword ptr [edi+0Ch]

In listing 7, register %edi points to the KINTERRUPT data structure of the IDT entry with index 0x71. The code adds to register %edi the offset of the ServiceRoutine member, namely 0x0c as indicated in Table I, and thus reaches the address of the ServiceRoutine member. In the Windows kernel, that specific address is exactly the beginning of the I8042KeyboardInterruptService routine of the i8042prt driver. The code then issues the call to I8042KeyboardInterruptService, i.e., the interrupt handler for IRQ1. That is the point where past malware actively hijacked the execution flow of the overall interrupt processing code from the Windows kernel into their own code, and then back to the Windows kernel. Typical techniques involved overwriting the address of the ServiceRoutine member of the KINTERRUPT data structure, or overwriting the interrupt gate descriptor of interest in the IDT as discussed by mammon in [9]. Despite its effectiveness in enabling malware to interpose between routines involved in interrupt processing in the Windows kernel, the hooking of function pointers in kernel data structures is quite detectable by research such as [17]. kbdinterceptor does not use any form of hooking at all. kbdinterceptor stalls the continuity of the execution flow of the overall interrupt processing code in the Windows kernel by means of setting the ActualLock member of the KINTERRUPT data structure of IDT, as discussed earlier in this paper. As that execution flow stalls at the KiInterruptDispatch routine, kbdinterceptor gains a window of action and thus has the freedom of either proceeding with the invocation of the I8042KeyboardInterruptService routine of the i8042prt driver or selectively mimicking it. Although we opted for the latter, kbdinterceptor does not incorporate any direct reads from I/O port 0x60, which is where the hardware places the key scan codes that correspond to the keys pressed and released by the user on the target keyboard device.
A snippet of mimicry coding of the I8042KeyboardInterruptService routine of the i8042prt driver in kbdinterceptor is captured in Figure 2. Code similarity is maintained for the purpose of illustration. That similarity can be easily removed by code metamorphism, therefore it is not usable as a kbdinterceptor signature for the purpose of detection. The main objective behind mimicry coding is to provide interrupt handling functionality as close to the one provided by the operating system as possible, while enabling kbdinterceptor to intercept the key scan codes as they are generated by the underlying hardware. kbdinterceptor does so without having to modify the instructions of the mimicked routines. Mimicry coding creates a controlled replica that runs independently of the mimicked routines. No interference between the two is created in that the mimicked routines are stalled, while the mimicking routines proceed with the execution alone. In order to avoid carrying direct reads from I/O port 0x60 and any other instructions that would raise flags with state-of-the-art antivirus software systems, kbdinterceptor performs controlled jumps into specific blocks of mimicked routines. By a controlled jump we mean a transfer of the execution flow to instructions of the mimicked routines, with the stack configured such as to return the execution flow to kbdinterceptor code once the destination code completes its execution. The code in listing 8 is a concrete example of a block of instructions from the Windows kernel that kbdinterceptor intermingles with. The code belongs to the I8xGetByteAsynchronous routine of the i8042prt driver, which is of interest in that it is invoked by the I8042KeyboardInterruptService routine of that same driver. In the code, %eax+0xa0 is the address that stores the I/O port number used for key scan codes, namely 0x60. The call instruction transfers the execution flow towards the hardware abstraction layer (HAL) kernel module.

Listing 8. Kernel code
i8042prt:8C375601  push  dword ptr [eax+0A0h]
i8042prt:8C375607  call  off_8C37D0CC

The specific HAL routine that kbdinterceptor needs to be able to reach is READ_PORT_UCHAR, which is given in listing 9. The I/O port, namely 0x60, is passed as a parameter to the READ_PORT_UCHAR routine. It is placed in the register %edx, and then used by the I/O port read (in) instruction. The key scan code that is read from I/O port 0x60 is placed in the register %eax, and remains there intact upon return. That makes it easier for kbdinterceptor to intercept the key scan code in that it does not have to infer or otherwise determine the address of any memory locations where the key scan code may be stored afterwards. kbdinterceptor simply reads the key scan code from register %eax before releasing the execution flow to legitimate modules of the Windows kernel, after which kbdinterceptor no longer has any visibility into the key scan code.

Listing 9. Kernel code
hal:82E2D094 hal_READ_PORT_UCHAR:
hal:82E2D094  xor   eax, eax
hal:82E2D096  mov   edx, [esp+4]
hal:82E2D09A  in    al, dx
hal:82E2D09B  retn  4

The code in listing 10 is an instance of kbdinterceptor's intermingling with Windows kernel code. The code passes the I/O port number 0x60 via the stack, and then directs the execution flow towards a code path that leads to the invocation of the READ_PORT_UCHAR routine of the HAL kernel module. The latter reads the key scan code, and stores it in register %eax, as mentioned earlier in this section. Since the activation record created by the code in listing 10 contains a return address that resides within the kbdinterceptor code, kbdinterceptor resumes control after the READ_PORT_UCHAR routine and related code complete. It is the return of control that puts kbdinterceptor in the position of accessing the key scan code from the %eax register, all without kbdinterceptor accessing the I/O bus itself.

Listing 10. kbdinterceptor code
kbd:94476442 loc_94476442:
kbd:94476442  push  60h
kbd:94476444  mov   eax, offset off_8C37D0CC
kbd:94476449  call  dword ptr [eax]

kbdinterceptor emulates the processing of IRQ1 through a combination of mimicked code and its own code periodically every 250 milliseconds. In our future work we may consider ways to optimize such a periodic query. We say that we have an I/O read hit when kbdinterceptor finds a valid key scan code in register %eax once its own involvement with the READ_PORT_UCHAR routine of the HAL kernel module completes. If no valid key scan code is found in register %eax at that specific point, we say that we have an I/O read miss. kbdinterceptor implements the periodic query with the help of a waitable timer object [10]. The object is created and activated via invocations of the CreateWaitableTimer and SetWaitableTimer functions, respectively. In the latter invocation, kbdinterceptor specifies one of its functions as the completion routine, which is executed at each iteration when the timer is signaled. It is that specific function that initiates the periodic query. Each keystroke typically results in the generation of two key scan codes by the keyboard controller, namely one when the key is pressed and another one when the key is released [11]. In the case of an I/O read hit, if the key scan code corresponds to a key being pressed, then kbdinterceptor does not attempt to intercept the key scan code generated by the release of that same key on the keyboard. kbdinterceptor simply clears the ActualLock member of the KINTERRUPT data structure of IDT, and thus allows the Windows kernel code to proceed with the hardware interrupt handling. In most cases, the release key scan code can be inferred from the press key scan code, since the majority of the key scan codes do not overlap with each other. kbdinterceptor then sets the ActualLock member again in order to prepare for capturing the next keystroke.
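The relationship between the press and release scan codes of a key can be sketched as follows. The sketch assumes single-byte codes from scan code set 1, where the release code equals the press code with bit 7 set; the text does not state which scan code set the test system delivers, and the function names are hypothetical. Extended keys that arrive with a 0xE0 prefix are not handled.

```python
BREAK_BIT = 0x80  # bit 7 marks a key release in scan code set 1


def is_release(scan_code: int) -> bool:
    """True when the scan code reports a key being released."""
    return bool(scan_code & BREAK_BIT)


def press_code_for(scan_code: int) -> int:
    """Press code of the key that produced this press or release code."""
    return scan_code & ~BREAK_BIT & 0xFF


def release_code_for(press_code: int) -> int:
    """Release code that the same key generates when it is let go."""
    return (press_code | BREAK_BIT) & 0xFF
```

Under that assumption, either of the two codes generated by a keystroke identifies the key, which is why intercepting only one of them is sufficient in most cases.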

Figure 2. Mimicry coding of the ISR of the i8042prt driver in kbdinterceptor

Clearing and setting ActualLock also serves the purpose of countering the creation of interrupt storms, while retaining control of the path to I/O port 0x60.

C. Avoiding an Interrupt Storm

Given that kbdinterceptor brings the hardware interrupt handling to a stall at the KiInterruptDispatch routine, the interrupt handling code is prevented from instructing the underlying hardware to release the interrupt signal. As the user continues to type on the keyboard, that situation results in an interrupt storm. Interrupt storms are clearly noticeable by the user when they occur. Consequently, an interrupt storm would make kbdinterceptor quite detectable. In order to prevent an interrupt storm from being created, kbdinterceptor briefly clears the ActualLock member of the KINTERRUPT data structure of IDT after intercepting a valid key scan code. That enables the KiInterruptDispatch routine to exit from the loop shown in listing 6, and thus proceed with instructing the underlying hardware to release the interrupt signal. That is also something that kbdinterceptor can do itself by intermingling with kernel code. Nevertheless, allowing the kernel code to perform that specific task is a better solution in that it also avoids the creation of multiple loops that wait on ActualLock to clear. Several instances of those endless iterations would utilize most of the CPU, and thus, once again, make kbdinterceptor detectable. No further pointer calculations are conducted. kbdinterceptor reuses the pointer to the ActualLock member of the KINTERRUPT data structure of IDT when allowing the KiInterruptDispatch routine of the interrupt handling code to exit its stalling loop, and hence proceed towards invoking the I8042KeyboardInterruptService routine of the i8042prt driver. The code in listing 11 illustrates the release of ActualLock by kbdinterceptor. The pointer in question is stored on the stack at an address equal to %ebp - 0x24. That is the specific memory location where the pointer was placed by the kbdinterceptor code that took ownership of ActualLock in the first place. Storing a four-byte zero into the memory location pointed to by that pointer performs the actual release, which is carried out by the last instruction. After the release, kbdinterceptor waits for a few milliseconds through a second waitable timer object, after which it enters a loop similar to that of the KiInterruptDispatch routine.

Listing 11. kbdinterceptor code
kbd:8FC013E5 loc_8FC013E5:          ; CODE XREF: kbdinterceptorr98260Locker+E4
kbd:8FC013E5  mov  eax, [ebp+var_24]
kbd:8FC013E8  mov  dword ptr [eax], 0
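The wait that follows the release can be pictured as watching ActualLock go from set back to clear, which signals that the kernel's dispatch routine has taken the lock and finished with it. The sketch below works on a recorded sequence of observed lock values rather than on live memory; the function name reacquire_after_dispatch and the sampling model are hypothetical.

```python
from typing import Iterable, Optional


def reacquire_after_dispatch(samples: Iterable[int]) -> Optional[int]:
    """Return the index of the first sample at which the lock is seen clear
    after having been seen set, or None if no such transition occurs."""
    seen_set = False
    for index, value in enumerate(samples):
        if value == 1:
            seen_set = True      # the kernel dispatch routine holds the lock
        elif seen_set:
            return index         # lock went from 1 to 0: dispatch completed
    return None
```

At the returned index the attack code would set the lock again and start its next interception round; a None result corresponds to a period in which no hardware interrupt was dispatched while the lock was being watched.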

The waitable timer object is utilized by kbdinterceptor such as to enable the KiInterruptDispatch routine to set the ActualLock member of the KINTERRUPT data structure of IDT, complete, and then at the end clear ActualLock. The loop entered by kbdinterceptor checks whether ActualLock has transitioned from 1 to 0, which is an indicator that the KiInterruptDispatch routine is now complete. If that is the case, kbdinterceptor exits its loop, and thus begins the next iteration of key scan code interception by setting ActualLock and applying the overall attack logic discussed in this paper. The presence of the second waitable timer object in kbdinterceptor has a side effect that negatively impacts kbdinterceptor's ability to regain control of ActualLock. It creates a race between kbdinterceptor and the KiInterruptDispatch routine of the interrupt handling code the next time a hardware interrupt is raised. The two compete for ActualLock, and kbdinterceptor may very well lose the race. In that case, kbdinterceptor misses the key scan code, as that byte is read and hence taken, processed, and propagated by kernel code. Given that kbdinterceptor does not engage in any form of function hooking in order to avoid detection, once the key scan code is in possession of the kernel code, kbdinterceptor as defined in this paper has no way of recovering it. Nevertheless, after losing a race, kbdinterceptor still has a second opportunity, namely intercepting the second key scan code associated with the one missed. For example, if kbdinterceptor misses the key scan code of a key pressed on the keyboard, kbdinterceptor may intercept the key scan code generated when that key is released. Since in most cases only one of the two key scan codes associated with a keystroke is needed to recover the character typed through the keyboard, kbdinterceptor only needs to win the race against the KiInterruptDispatch routine once every two keyboard hardware interrupts.

IV. DETECTION

One of the ways by which a concurrency breaker such as kbdinterceptor can be detected is to analyze its controlled dives into existing code of the Windows kernel through dynamic analysis. While advanced attack code can protect itself from dynamic analysis, we leave the treatment of techniques such as anti-debugging and anti-virtual-machine checks, and procrastination, to [6] and [8], respectively. kbdinterceptor will not access I/O buses itself, nor will it directly invoke kernel routines that do. With reference to keystroke interception, for example, kbdinterceptor will not contain any in instructions in its own code. kbdinterceptor will also avoid directly invoking routines such as the READ_PORT_UCHAR routine of the HAL kernel module. Instead, kbdinterceptor will leverage code paths in Windows that include invocations of routines of interest such as READ_PORT_UCHAR. kbdinterceptor will try to intermingle with such existing legitimate code in a way that allows it to regain control of the execution flow once the legitimate code has acquired the target data. It is quite feasible for dynamic analysis of machine code to identify branches into existing code of the Windows kernel. The defender's task then is to determine whether the segment of legitimate code traversed by the code under analysis includes accesses to data sources such as I/O buses. Both those tasks are fully implementable against kbdinterceptor, and the overall dynamic analysis of kbdinterceptor and Windows kernel code can be performed with a moderate level of effort. The amount of effort, however, might vary for unknown concurrency breakers, depending on their size and the length of the Windows kernel code segments traversed by them. Nevertheless, we believe that the behavioral signature of those concurrency breakers will have to stay near the definition given in this paper if avoiding detection remains a priority. Another way to detect a concurrency breaker consists of dynamic analysis to identify the synchronization objects that are accessed by the code under analysis. The next step then is to determine whether the code under analysis is allowed to access those objects.
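The second approach, attributing each access to a synchronization object to a module and checking it against a list of allowed modules, can be sketched as follows. The trace format, the function names, the module address ranges, and the lock address are all hypothetical; a real implementation would obtain them from the dynamic analysis platform and the loaded module list.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Access(NamedTuple):
    instruction_address: int   # where the accessing instruction resides
    target_address: int        # memory location that it reads or writes


def module_of(address: int, modules: Dict[str, Tuple[int, int]]) -> str:
    """Attribute an instruction address to a module by its boundaries."""
    for name, (start, end) in modules.items():
        if start <= address < end:
            return name
    return "<unknown>"


def flag_disallowed(trace: List[Access],
                    modules: Dict[str, Tuple[int, int]],
                    sync_objects: Dict[int, Set[str]]) -> List[Tuple[str, Access]]:
    """Report accesses to known synchronization objects by modules that are
    not in the allowed set for that object."""
    findings = []
    for access in trace:
        allowed = sync_objects.get(access.target_address)
        if allowed is None:
            continue                      # not a monitored synchronization object
        module = module_of(access.instruction_address, modules)
        if module not in allowed:
            findings.append((module, access))
    return findings


# Made-up module boundaries and lock address, for illustration only.
MODULES = {"nt": (0x82A00000, 0x82E00000), "kbd": (0x8FC00000, 0x8FC10000)}
LOCK = 0x85A3121C
SYNC_OBJECTS = {LOCK: {"nt"}}
TRACE = [Access(0x82A3D780, LOCK), Access(0x8FC013B3, LOCK), Access(0x8FC01343, 0x1234)]
```

Running flag_disallowed on the sample trace reports only the write issued from the kbd range, since the access from the nt range is allowed and the third access does not touch a monitored object.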
In the case of kbdinterceptor, for example, careful dynamic analysis will find that the code under analysis accesses the condition variable ActualLock. Those multiple accesses are inherent to kbdinterceptor. Without those accesses, kbdinterceptor cannot break any concurrencies, and thus cannot function. The question for the defender then becomes whether kbdinterceptor is in the white list of kernel modules that are allowed to access the ActualLock member of the KINTERRUPT data structure of IDT. A quick review of the function of ActualLock reveals that the KiInterruptDispatch routine of the nt module is a legitimate party, while clearly an unknown kernel module such as kbdinterceptor is not. The task of determining what module is accessing what specific synchronization objects requires marking the boundaries of each module involved. In our case, module boundaries hold in that a concurrency breaker such as kbdinterceptor does not overwrite legitimate kernel code with its own code. While that is, again, in order to avoid detection, the self-containment property of concurrency breakers helps the attribution of synchronization

object accesses to kernel modules, which in turn translates to an accurate identification of allowed modules and disallowed modules as well.

V. RELATED WORK

The direct access to I/O buses is a technique utilized by various malware to read sensitive data such as key scan codes. The Trojan Feutel-S is an example of malware that incorporates an in instruction to record the keystroke directly from the keyboard controller by accessing I/O port 0x60 [18]. Such malware is easily detectable by the presence of the in instruction in its code. kbdinterceptor is not susceptible to that limitation in that it does not use the in instruction directly. There is no in instruction in the kbdinterceptor code. Another technique that is utilized by real-world malware to intercept key scan codes consists of modifying the code of Windows drivers that are involved in hardware interrupt handling, such as the Windows keyboard port driver i8042prt. The malware that relies on that technique is referred to as type 1 malware by Rutkowska in [15], given that it changes contents that should remain static, namely kernel code. As pointed out in [15], type 1 malware is easy to detect based on integrity checks. Any changes to the i8042prt driver, for example, can be easily detected through a hash function. Any code modification will change the hash, and thus get detected. kbdinterceptor does not perform the slightest modification to any of the kernel code, and is therefore not susceptible to detection based on integrity checks. The hooking of function pointers was a widespread and effective technique utilized by malware to intercept sensitive kernel data [9]. Function hooking, however, is readily detected thanks to advances in malware detection. The HookSafe approach by Wang et al. is built as a hypervisor that is able to protect thousands of kernel function pointers in a guest operating system [19]. Wang et al.
observed that, after being initialized, a function pointer in the kernel is mostly accessed in reading and rarely in writing. The authors relocate those function pointers to a dedicated page-aligned memory space, and then control the accesses to those pointers with hardware-based page-level protection. Yin et al. developed a tool called HookScout that can detect function pointer modifications [17]. HookScout comprises an analysis subsystem and a detection subsystem. The analysis subsystem generates policies for hook detection based on static and dynamic analyses of image files. The detection subsystem, which is actually deployed on a user's machine, enforces those policies so as to detect hooks at runtime as they occur. Neither HookSafe nor HookScout has the capability to detect a concurrency breaker such as kbdinterceptor, simply because kbdinterceptor does not hook any function pointers. Abadi et al. propose the use of a control-flow graph (CFG) to detect exploits and malware that divert the execution

flow of an executable [1]. The CFG is determined ahead of time. The executable is instrumented so as to check that, whenever a machine instruction transfers control, the destination of that transfer follows a valid path as determined by the CFG. Nevertheless, it is nontrivial to precompute a precise CFG for the kernel code due to its complexity, which involves control structures, various levels of interrupt handling, and concurrency [12]. For that reason, Petroni and Hicks propose a state-based monitoring technique that checks a kernel module periodically rather than closely tracing the execution of that module [12]. The monitor examines the states of a CFG only, and thus does not follow the actual state transitions. kbdinterceptor is not detectable by the enforcement of a CFG, as it does not change the control flow of legitimate kernel code in a thread in its entirety. kbdinterceptor causes an execution stall of kernel code by forcing that code to enter a loop and by controlling the loop exit condition. The loop that the kernel code is forced to enter, however, is entirely part of the normal execution flow of that code. The loop is part of the CFG, like any other legitimate component of the kernel code. Carbone et al. devised a technique that provides for kernel integrity checking by mapping dynamic kernel data. Carbone et al. reason upon a white list of trusted kernel modules and drivers. Firstly, the authors check if any of the modules in the white list was modified. If that is the case, each modified module is marked as untrusted. The authors then leverage their mapping of dynamic kernel data to search for function pointers that do not point to trusted modules or data structures that were identified by their mapping [3]. kbdinterceptor is not detectable by the approach of Carbone et al. as it does not violate kernel integrity by any means. Baliga et al.
propose a technique that can detect modifications of non-control data in addition to those that affect function pointers and control data in general [2]. That work is of interest to this paper in that kbdinterceptor modifies non-control data, namely the ActualLock member of the KINTERRUPT data structure of the IDT. Baliga et al. observe the kernel code execution and, based on what is learned, create hypotheses on invariants on kernel data structures. Any violation of those invariants is then used as an indicator of malware presence. The non-control data on which kbdinterceptor operates is supposed to make 0 to 1 and 1 to 0 transitions over time as part of the hardware interrupt handling in the kernel. kbdinterceptor leverages those legitimate data transitions within a time window in which those transitions should be taking place, and thus does not violate any non-control data invariants. Other defenses focus on ascertaining that only approved code can execute in the kernel. Seshadri et al. propose a hypervisor called SecVisor, which enforces the write and execute property of memory pages to prevent unauthorized code from running in the kernel [16]. Riley et al. devised a

virtual machine monitor, namely NICKLE, which maintains a shadow physical memory. The authors authenticate kernel code in real time and store it in the shadow memory. At runtime, only code from the shadow memory is allowed to execute [13]. Trojanized software is provided to, and installed by, a user as a desired utility; therefore SecVisor and NICKLE will mark that code as trusted. There have been several public cases of Trojan horse code, which have received broad media coverage [4].

VI. CONCLUSIONS

Breaking concurrency in kernel land is an effective way of enabling sensitive data interception. In this paper we discussed the main ideas behind the attack as applied to keylogging, with practical references to a proof of concept kernel driver called kbdinterceptor. A concurrency breaker such as kbdinterceptor manipulates synchronization objects with the objective of stalling the active side of the code execution. The stalling allows the concurrency breaker to take over the code execution, and thus proceed with it on its own until receiving the data harvested by it. The attack code supports itself by intermingling with blocks of instructions from the Windows kernel. Overall, the concurrency breaker does not have a code composition or perform any operations that are known to be utilized by other attack codes, consequently it is undetectable by existing approaches to malware detection. Based on the research we have conducted so far, we propose two ways of detecting a concurrency breaker such as kbdinterceptor. One approach consists of locating controlled dives into existing code of the Windows kernel, and then determining whether the code paths explored that way include functionality that accesses I/O buses or any other sources of sensitive data. The other approach consists of identifying the synchronization objects that are accessed by the code under analysis, and then determining whether the access to those objects is legitimate. The two detection approaches can be applied jointly to increase the likelihood of detecting concurrency breaks and hence unauthorized data captures.

REFERENCES

[1] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow Integrity. In Proceedings of the ACM Conference on Computer and Communications Security, Alexandria, VA, USA, November 2005.

[2] A. Baliga, V. Ganapathy, and L. Iftode. Automatic Inference and Enforcement of Kernel Data Structure Invariants. In Proceedings of the 24th Annual Computer Security Applications Conference, Anaheim, California, USA, December 2008.

[3] M. Carbone, W. Cui, L. Lu, W. Lee, M. Peinado, and X. Jiang. Mapping Kernel Objects to Enable Systematic Integrity Checking. In Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 555–565, Chicago, IL, USA, November 2009.

[4] Computer World. Researchers Spot Mac Trojan in the Wild. [Online; accessed April 6th, 2014]. http://www.computerworld.com/s/article/9101898/Researchers_spot_Mac_Trojan_in_the_wild

[5] Intel. Intel 64 and IA-32 Architectures Software Developer's Manual. Volume 3A: System Programming Guide, Part 1, May 2011. [Online; accessed February 15th, 2014]. http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf

[6] A. Issa. Anti-virtual Machines and Emulations. Journal in Computer Virology and Hacking Techniques, pp. 141–149, vol. 8, issue 4, November 2012.

[7] kad. Handling Interrupt Descriptor Table for Fun and Profit. vol. 11, issue 59. [Online; accessed February 15th, 2014]. http://www.phrack.org/issues.html?issue=59&id=4

[8] C. Kolbitsch, E. Kirda, and C. Kruegel. The Power of Procrastination: Detection and Mitigation of Execution-Stalling Malicious Code. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pp. 285–296, Chicago, Illinois, USA, October 2011.

[9] Mammon. Hooking Interrupt and Exception Handlers in Linux. [Online; accessed February 15th, 2014]. http://mammon.github.io/Text/linux_hooker.txt

[10] Microsoft. Waitable Timer Objects. [Online; accessed February 15th, 2014]. http://msdn.microsoft.com/en-us/library/windows/desktop/ms687012%28v=vs.85%29.aspx

[11] Microsoft. Keyboard Scan Code Specification. Revision 1.3a, March 16, 2000. [Online; accessed February 15th, 2014].

[12] N.L. Petroni and M. Hicks. Automated Detection of Persistent Kernel Control-flow Attacks. In Proceedings of the ACM Conference on Computer and Communications Security, pp. 103–115, Alexandria, VA, USA, October 2007.

[13] R. Riley, X. Jiang, and D. Xu. Guest-Transparent Prevention of Kernel Rootkits with VMM-Based Memory Shadowing. In Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection, pp. 1–20, Boston, MA, USA, September 2008.

[14] M. Russinovich, D. Solomon, and A. Ionescu. Windows Internals. Parts 1 and 2, Microsoft Press, March and September 2012.

[15] J. Rutkowska. Rootkit Hunting vs. Compromise Detection. Black Hat Federal, Washington D.C., January 2006.

[16] A. Seshadri, M. Luk, N. Qu, and A. Perrig. SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes. In Proceedings of 21st ACM SIGOPS Symposium on Operating Systems Principles, pp. 335–350, Stevenson, WA, USA, October 2007.

[17] H. Yin, P. Poosankam, S. Hanna, and D. Song. HookScout: Proactive Binary-Centric Hook Detection. In Proceedings of the 7th Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA), pp. 1–20, Bonn, Germany, July 2010.

[18] A. Vasudevan. Re-inforced Stealth Breakpoints. In Proceedings of the 4th International Conference on Risks and Security of Internet and Systems, pp. 59–66, October 2009.

[19] Z. Wang, X. Jiang, W. Cui, and P. Ning. Countering Kernel Rootkits with Lightweight Hook Protection. In Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 545–554, Chicago, Illinois, USA, November 2009.
