A tool for the

Symbolic Execution of Linux binaries

About Symbolic Execution ● Dynamically explore all program branches. ● Inputs are considered symbolic variables. ● Symbols remain uninstantiated and become constrained at execution time. ● At a conditional branch operating on symbolic terms, the execution is forked. ● Each feasible branch is taken, and the appropriate constraints logged.

Input space >> Number of paths int main( ) { int val; read(STDIN, &val, sizeof(val) ); if ( val > 0 ) if ( val < 100 ) do_something( ); else do_something_else( ); }

This is used for: ● Test generation and bug hunting. ● Reason about reachability. ● Worst-Case Execution Time Analysis. ● Comparing different versions of a func. ● Deobfuscation, malware analisys. ● AEG: Automatic Exploit Generation. Whaat?!

State of the art ● Lots of academic papers: ○ 2008-12-OSDI-KLEE ○ Unleashing MAYHEM on Binary

● Several implementations: ○ SymDroid, Cloud9, Pex, jCUTE, Java PathFinder, KLEE, s2e, fuzzball, mayhem, cbass

● Only a few work on binary : ○ libVEX / IL based ○ quemu based

Our aim ● ● ● ●

Emulate x86-64 machine code symbolically. Load ELF executables. Synthesize any process state as starting point. The final code should be readable and easy to extend. ● Use as few dependencies as possible: ○ pyelftools, distorm3 and z3 ● Analysis state can be saved and restored. ● Workload can be distributed (dispy)

Basic architecture

Instructions Frequency in GNU LIBC ● ● ● ●

336 different opcodes 160218 total instructions 37% of them are MOV or ADD currently 185 instruction implemented

CPU ● Based on distorm3 DecomposeInterface. ● Most instructions are very simple, ex. @instruction def DEC(cpu, dest): res = dest.write( dest.read() - 1 ) #Affected Flags o..szapc cpu.calculateFlags('DEC', dest.size, res)

Memory

class Memory: def mprotect(self, start, size, perms): … def munmap(self, start, size): … def mmap(self, addr, size, perms): … def putchar(self, addr, data): … def getchar(self, addr): …

Operating System Model (Linux) class Linux: def exe(self, filename, argv=[], envp=[]):… def syscall(self, cpu):… def def def def def

sys_open(self, cpu, buf, flags, mode):… sys_read(self, cpu, fd, buf, count):… sys_write(self, cpu, fd, buf, size):… sys_close(self, cpu, fd):… sys_brk(self, cpu, brk):…

Symbols and SMT solver class Solver: def getallvalues(self, x, maxcnt = 30): def minmax(self, x, iters=10000): def check(self): def add(self, constraint): #Symbols factory def mkArray(self, size, name ):… def mkBool(self, name ):… def mkBitVec(self, size, name ):…

Operation over symbols is almost transparent >>> from smtlibv2 import * >>> s = Solver() >>> a = s.mkBitVec(32) >>> b = s.mkBitVec(32) >>> s.add(a + 2*b > 100) >>> s.check() 'sat' >>> s.getvalue(a), s.getvalue(b) (101, 0)

The glue: Basic Initialization 1. Make Solver, Memory, Cpu and Linux objects. 2. Load ELF binary program in Memory, Initialize cpu registers, initialize stack. solver = Solver() mem = SMemory(solver, bits, 12 ) cpu = Cpu(mem, arch ) linux = SLinux(solver, [cpu], mem, ... ) linux.exe(“./my_test”, argv=[], env=[])

The glue: Basic analysis loop states = [‘init.pkl’] while len(states) > 0 : linux = load(state.pop()) while linux.running: linux.execute() if isinstance( linux.cpu.PC, Symbol): vals = solver.getallvalues(linux. cpu.PC) -- generate states for each value -break

Micro demo python system.py -h usage: system.py [-h] [-sym SYM] [-stdin STDIN] [-stdout STDOUT] [-stderr STDERR] [-env ENV] PROGRAM ... python system.py -sym stdin

my_prog

stdin: PDF-1.2++++++++++++++++++++++++++++++

Symbolic inputs. We need to mark whic part of the environment is symbolic: ● STDIN: a file partially symbolic. Symbols marked with “+” ● STDOUT and STDERR are placeholders. ● ARGV and ENVP can be symbolic

A toy example int main(int argc, char* argv[], char* envp[]){ char buffer[0x100] = {0}; read(0, buffer, 0x100); if (strcmp(buffer, "ZARAZA") == 0 ) printf("Message: ZARAZA!\n"); else printf("Message: Not Found!\n"); return 0; }

Conclusions, future work ● Push all known optimizations: solver cache, implied values, Redundant State Elimination, constraint independence, KLEE-like cex cache, symbol simplification. ● Add more cpu instructions (fpu, simd). ● Improve Linux model, add network. ● Implement OSX loader and os model. ● https://github.com/feliam/pysymemu

Gracias.

g

Contacto: [email protected]

Symbolic Execution - GitHub

SymDroid, Cloud9, Pex, jCUTE, Java PathFinder,. KLEE, s2e ... The final code should be readable and easy to ... PDF-1.2++++++++++++++++++++++++++++++ ...

757KB Sizes 4 Downloads 209 Views

Recommend Documents

Efficient State Merging in Symbolic Execution - Dependable Systems ...
analysis, where the resulting merged state over-approximates the individual ...... The bounded model checker in the Varvel/F-Soft verification platform uses ...

A Symbolic Execution Framework for JavaScript
Abstract—As AJAX applications gain popularity, client-side. JavaScript code is becoming increasingly complex. However, few automated vulnerability analysis tools for JavaScript are available. In this paper, we describe the first system for explorin

GNU Bash code execution vulnerability in path completion - GitHub
Jan 17, 2017 - In the commit, a call to directory exists replaces this check (both ... Following that call, we observe that the parameter dirname is dequoted.

Execution of Execution of Asynchronous Substitution ...
2Assistant Professor, Department of ECE,Velalar College of Engineering and Technology, Anna University. Chennai ... substitution box design essentially matches all the important security properties. ... using Mentor Graphics EDA (Electronic Design Au

Symbolic Exchange and Death.pdf
Page 1 of 196. cover. file:///M|/Users/NEWNEW/Desktop/0803983999/files/__joined.html[16-12-2009 11:07:03]. title: Symbolic Exchange and Death Theory, ...

Vector Symbolic Architectures - Washington and Lee University
We provide an overview of Vector Symbolic Architectures (VSA), a class ... Perhaps more so than any other sub-field of computer science, artificial intelligence .... The introduction of noise requires that the unbinding process employ a “cleanup.

Symbolic Logic Requirement(2013).pdf
All candidates for the Ph.D. in Philosophy are required to have a degree of proficiency. in logic equivalent to that attained in a semester course in symbolic logic.

Custom execution environments in the BOINC middleware
such as VMWare[6] and Xen[2], or application level sandboxes such as Native ..... B. Customizable execution environments with virtual desktop grid computing.