A JIT Compiler for Android’s Dalvik VM
Ben Cheng, Bill Buzbee
May 2010

Overview
• View live session notes and ask questions on Google Wave:
  – http://bit.ly/bIzjnF
• Dalvik Environment
• Trace vs. Method Granularity JITs
• Dalvik JIT 1.0
• Future directions for the JIT
• Performance Case Studies
• Profiling JIT’d code
• Built-in Self-Verification Mode

Dalvik Execution Environment
• Virtual Machine for Android Apps
  – See 2008 Google IO talk
    • http://www.youtube.com/watch?v=ptjedOZEXPM
• Very compact representation
• Emphasis on code/data sharing to reduce memory usage
• Process container sandboxes for security

Dalvik Interpreter
• Dalvik programs consist of byte code, processed by a host-specific interpreter
  – Highly-tuned, very fast interpreter (about 2x the speed of similar interpreters)
  – Typically less than 1/3 of time spent in the interpreter
  – OS and performance-critical library code natively compiled
  – Good enough for most applications
• Performance is a problem for compute-intensive applications
  – Partial solution was the release of the Android Native Development Kit, which allows Dalvik applications to call out to statically-compiled methods
• Other part of the solution is a Just-In-Time Compiler
  – Translates byte code to optimized native code at run time

A JIT for Dalvik - but what flavor of JIT?
• Surprisingly wide variety of JIT styles
  – When to compile
    • install time, launch time, method invoke time, instruction fetch time
  – What to compile
    • whole program, shared library, page, method, trace, single instruction
• Each combination has strengths & weaknesses - key for us was to meet the needs of a mobile, battery-powered Android device
  – Minimal additional memory usage
  – Coexist with Dalvik’s container-based security model
  – Quick delivery of performance boost
  – Smooth transition between interpretation & compiled code

Method vs. Trace Granularity
• Method-granularity JIT
  – Most common model for server JITs
  – Interprets with profiling to detect hot methods
  – Compile & optimize method-sized chunks
  – Strengths
    • Larger optimization window
    • Machine state sync with interpreter only at method call boundaries
  – Weaknesses
    • Cold code within hot methods gets compiled
    • Much higher memory usage during compilation & optimization
    • Longer delay between the point at which a method goes hot and the point that a compiled and optimized method delivers benefits

Method vs. Trace Granularity
• Trace-granularity JIT
  – Most common model for low-level code migration systems
  – Interprets with profiling to identify hot execution paths
  – Compiled fragments chained together in translation cache
  – Strengths
    • Only hottest of hot code is compiled, minimizing memory usage
    • Tight integration with interpreter allows focus on common cases
    • Very rapid return of performance boost once hotness detected
  – Weaknesses
    • Smaller optimization window limits peak gain
    • More frequent state synchronization with interpreter
    • Difficult to share translation cache across processes

Hot vs. Cold Code: system_server example
• Full Program: 4,695,780 bytes
• Hot Methods: 396,230 bytes (8% of Program)
  – Method JIT: Best optimization window
• Hot Traces: 103,966 bytes (26% of Hot Methods, 2% of Program)
  – Trace JIT: Best speed/space tradeoff

The Decision: Start with a Trace JIT
• Minimizing memory usage critical for mobile devices
• Important to deliver performance boost quickly
  – User might give up on new app if we wait too long to JIT
• Leave open the possibility of supplementing with method-based JIT
  – The two styles can co-exist
  – A mobile device looks more like a server when it’s plugged in
  – Best of both worlds
    • Trace JIT when running on battery
    • Method JIT in background while charging

Dalvik Trace JIT Flow
[Flowchart. Interpreter side: Start; Interpret until next potential trace head; Update profile count for this location; Threshold? (Yes/No); Xlation exists? (Yes/No); Interpret/build trace request; Submit compilation request. Compiler Thread: Install new translation. Translation Cache: translations, each with Exit 0 and Exit 1.]

Dalvik JIT v1.0 Overview
• Tight integration with interpreter
  – Useful to think of the JIT as an extension of the interpreter
• Interpreter profiles and triggers trace selection mode when a potential trace head goes hot
• Trace request is built during interpretation
  – Allows access to actual run-time values
  – Ensures that trace only includes byte codes that have successfully executed at least once (useful for some optimizations)
• Trace requests handed off to compiler thread, which compiles and optimizes into native code
• Compiled traces chained together in translation cache
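
The profile-then-compile cycle described on this slide can be sketched roughly as follows. This is a toy model, not Dalvik's implementation: the `TraceJit` class, the threshold value, and the synchronous `compile_trace` step (Dalvik hands requests to a separate compiler thread) are assumptions made purely for illustration.

```python
from collections import defaultdict

HOT_THRESHOLD = 3  # hypothetical value; the slides do not state the real threshold


class TraceJit:
    """Toy model: count visits to trace heads and compile once a head goes hot."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(int)  # profile count per potential trace head
        self.cache = {}                 # translation cache: trace head -> translation
        self.log = []                   # how each visit was executed

    def compile_trace(self, head):
        # Stand-in for the compiler thread; a real JIT would emit native code here.
        return f"translation@{head}"

    def visit(self, head):
        """Called each time execution reaches a potential trace head."""
        if head in self.cache:
            self.log.append(("native", head))  # translation exists: run it
            return
        self.counts[head] += 1                 # update profile count
        if self.counts[head] >= self.threshold:
            # Head went hot: the trace is built while interpreting this pass,
            # then the resulting translation is installed in the cache.
            self.cache[head] = self.compile_trace(head)
        self.log.append(("interp", head))


jit = TraceJit()
for _ in range(5):
    jit.visit(0x10)
modes = [mode for mode, _ in jit.log]
print(modes)
```

In this sketch the visit that crosses the threshold is still interpreted, mirroring the point that the trace request is built during interpretation; only later visits use the installed translation.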

Dalvik JIT v1.0 Features
• Per-process translation caches (sharing only within security sandboxes)
• Simple traces - generally 1 to 2 basic blocks long
• Local optimizations
  – Register promotion
  – Load/store elimination
  – Redundant null-check elimination
  – Heuristic scheduling
• Loop optimizations
  – Simple loop detection
  – Invariant code motion
  – Induction variable optimization
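
As one illustration of the local optimizations listed above, redundant null-check elimination over a straight-line trace might look like the sketch below. The tuple-based instruction format and the function name are invented for this example and do not reflect the JIT's actual intermediate representation.

```python
def eliminate_redundant_null_checks(trace):
    """Drop null checks on registers already checked earlier in the trace.

    `trace` is a list of tuples in a made-up IR:
      ("null_check", reg)        checks that reg is non-null
      (opcode, dest, *sources)   any other instruction writing dest
    A check on a register is redundant if the same register was checked
    earlier and has not been overwritten since.
    """
    checked = set()
    out = []
    for insn in trace:
        op, dest = insn[0], insn[1]
        if op == "null_check":
            if dest in checked:
                continue            # already proven non-null on this path
            checked.add(dest)
        elif dest is not None:
            checked.discard(dest)   # register redefined: earlier proof is stale
        out.append(insn)
    return out


trace = [
    ("null_check", "v1"),
    ("iget", "v0", "v1"),
    ("null_check", "v1"),   # redundant: v1 unchanged since the first check
    ("iget", "v2", "v1"),
    ("move", "v1", "v3"),   # v1 overwritten
    ("null_check", "v1"),   # required again
    ("iget", "v4", "v1"),
]
optimized = eliminate_redundant_null_checks(trace)
print(len(trace), "->", len(optimized), "instructions")
```

A single forward pass is enough here because a trace of one or two basic blocks has no joins to reason about.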

CPU-Intensive Benchmark Results
• Linpack, BenchmarkPI, CaffeineMark & Checkers from the Android Market
• Scimark 3 run from command-line shell
• Measurements taken on Nexus One running pre-release Froyo build in airplane mode
[Chart: Speedup relative to Dalvik Interpreter on Nexus One, axis 0 to 6; series: Linpack, BenchmarkPi, CMark, SciMark3, Checkers]
[Chart: JIT Total Memory Usage (in kBytes), axis 0 to 300; series: Linpack, BenchmarkPi, CMark, SciMark3, Checkers]

Future Directions
• Method in-lining
• Trace extension
• Persistent profile information
• Off-line trace coalescing
• Off-line method translation
• Tuning, tuning and more tuning

Solving Performance and Correctness Issues
• How much boost will an app get from the JIT?
  – JIT can only remove cycles from the interpreter
  – OProfile can provide the insight to break down the workload
• How resource-friendly/optimizing is the JIT?
  – Again, OProfile can provide some high-level information
  – Use a special Dalvik build to analyze code quality
• How to debug the JIT?
  – Code generation vs optimization bugs
  – Self-verification against the interpreter

Google Confidential

Case Study: RoboDefense
Lots of actions

Case Study: RoboDefense
Performance gain from Dalvik capped at 4.34%

Samples  %      Module
15965    73.98  libskia.so
2662     12.33  no-vmlinux
1038     4.81   libcutils.so
937      4.34   libdvm.so
308      1.42   libc.so
297      1.37   libGLESv2_adreno200.so
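
To make the cap concrete: since the JIT can only remove time spent in the VM, the libdvm.so share of samples bounds the achievable gain. The sketch below is a back-of-the-envelope estimate, assuming sample counts are proportional to run time and that the JIT could at best eliminate all libdvm.so time; the function name is illustrative.

```python
def max_speedup(vm_fraction):
    """Upper bound on whole-app speedup if all VM time were eliminated."""
    return 1.0 / (1.0 - vm_fraction)


# libdvm.so accounts for 4.34% of samples in the RoboDefense profile.
robo = max_speedup(0.0434)
print(f"RoboDefense: at most {robo:.3f}x overall")
```

Even a perfect JIT would leave this workload essentially unchanged, because nearly three quarters of the samples fall in libskia.so.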

Case Study: Checkers
JIT <3 “Brain and Puzzle”
965022 vs. 5231208: 5.4x Speedup

Case Study: Checkers
Use OProfile to explain the speedup

Samples  %      Module
975      93.57  dalvik-jit-code-cache
30       2.88   libdvm.so
28       2.69   no-vmlinux
4        0.38   libc.so
3        0.09   libGLESv2_adreno200.so

dalvik-jit-code-cache + libdvm.so: 96.45% of samples
[Pie chart labels: 97%, 3%]

Solving Performance and Correctness Issues Part 2/3
• How much boost will an app get from the JIT?
• How resource-friendly/optimizing is the JIT?
• How to debug the JIT?

Peek into the Code Cache Land
kill -12
• Example from system_server (20 minutes after boot)
  – 9898 compilations using 796264 bytes
    • 80 bytes / compilation
  – Code size stats: 103966/396230 (trace/method Dalvik)
    • 796264 / 103966 = 7.7x code bloat from Dalvik to native
  – Total compilation time: 6024 ms
    • Average unit compilation time: 609 µs
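
The derived figures on this slide follow directly from the raw counters. The short calculation below reproduces them; the variable names are chosen for this example and are not taken from the Dalvik stats output.

```python
# Raw counters reported for system_server on the slide.
compilations = 9898
native_bytes = 796264          # size of generated native code
trace_dalvik_bytes = 103966    # Dalvik byte code covered by compiled traces
method_dalvik_bytes = 396230   # Dalvik byte code of the enclosing hot methods
total_compile_ms = 6024

bytes_per_compilation = native_bytes / compilations
code_bloat = native_bytes / trace_dalvik_bytes
avg_compile_us = total_compile_ms * 1000 / compilations
trace_share = trace_dalvik_bytes / method_dalvik_bytes

print(f"{bytes_per_compilation:.0f} bytes / compilation")
print(f"{code_bloat:.1f}x code bloat from Dalvik to native")
print(f"{avg_compile_us:.0f} us average compilation time")
print(f"traces cover {trace_share:.0%} of hot-method byte code")
```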


JIT Profiling
Set “dalvik.vm.jit.profile = true” in /data/local.prop

count  %     offset (# insn), line  method signature
15368  1.15  0x0(+2), 283           Ljava/util/HashMap;size;()I
13259  1.00  0x18(+2), 858          Lcom/android/internal/os/BatteryStatsImpl;readKernelWakelockStats;()Ljava/util/Map;
13259  1.00  0x22(+2), 857          Lcom/android/internal/os/BatteryStatsImpl;readKernelWakelockStats;()Ljava/util/Map;
11842  0.89  0x5(+2), 183           Ljava/util/HashSet;size;()I
11827  0.89  0x0(+2), 183           Ljava/util/HashSet;size;()I
11605  0.87  0x30(+3), 892          Lcom/android/internal/os/BatteryStatsImpl;parseProcWakelocks;([BI)Ljava/util/Map;

Solving Performance and Correctness Issues Part 3/3
• How much boost will an app get from the JIT?
• How resource-friendly/optimizing is the JIT?
• How to debug the JIT?

Guess What’s Wrong Here
A codegen bug is deliberately injected into the JIT

E/AndroidRuntime( 84): *** FATAL EXCEPTION IN SYSTEM PROCESS: android.server.ServerThread
E/AndroidRuntime( 84): java.lang.RuntimeException: Binary XML file line #28: You must supply a layout_width attribute.
E/AndroidRuntime( 84): at android.content.res.TypedArray.getLayoutDimension(TypedArray.java:491)
E/AndroidRuntime( 84): at android.view.ViewGroup$LayoutParams.setBaseAttributes(ViewGroup.java:3592)

E/AndroidRuntime( 187): *** FATAL EXCEPTION IN SYSTEM PROCESS: WindowManager
E/AndroidRuntime( 187): java.lang.ArrayIndexOutOfBoundsException
E/AndroidRuntime( 187): at java.util.GregorianCalendar.computeFields(GregorianCalendar.java:661)
E/AndroidRuntime( 187): at java.util.Calendar.complete(Calendar.java:807)
:

E/AndroidRuntime( 435): *** FATAL EXCEPTION IN SYSTEM PROCESS: android.server.ServerThread
E/AndroidRuntime( 435): java.lang.StackOverflowError
E/AndroidRuntime( 435): at java.util.Hashtable.get(Hashtable.java:267)
E/AndroidRuntime( 435): at java.util.PropertyResourceBundle.handleGetObject(PropertyResourceBundle.java:120)
:

Debugging and Verification Tools
• Bug classes: code generation, optimization
• Tools: byte code binary search, call graph filtering, self-verification w/ the interpreter
Bugs == Incorrect Machine States
Heap, stack, and control-flow
[Diagram: Shadow Heap (Addr, Data) == Heap; Shadow Stack == Stack; PC == PC]

Step-by-Step Debugging under Self-Verification
Divergence detected

~~~ DbgIntp(8): REGISTERS DIVERGENCE!
********** SHADOW STATE DUMP **********
CurrentPC: 0x42062d24, Offset: 0x0012
Class: Ljava/lang/Character;
Method: toUpperCase
Dalvik PC: 0x42062d1c endPC: 0x42062d24
Interp FP: 0x41866a3c endFP: 0x41866a3c
Shadow FP: 0x22c330 endFP: 0x22c330
Frame1 Bytes: 8 Frame2 Local: 0 Bytes: 0
Trace length: 2 State: 0

Step-by-Step Debugging under Self-Verification
Divergence details

********** SHADOW TRACE DUMP **********
0x42062d1c: (0x000e) const/16
0x42062d20: (0x0010) if-ge
*** Interp Registers:
(v0) 0x b5 X
(v1) 0x 55
*** Shadow Registers:
(v0) 0x b6 X
(v1) 0x 55

Step-by-Step Debugging under Self-Verification
Replay the compilation with verbose dump

Compiler: Building trace for toUpperCase, offset 0xe
0x42062d1c: 0x0013 const/16 v0, #181
0x42062d20: 0x0035 if-ge v1, v0, #4
TRACEINFO (141): 0x42062d00 Ljava/lang/Character;toUpperCase
-------- dalvik offset: 0x000e @ const/16 v0, #181
0x2 (0002): ldr r1, [r5, #4]
0x4 (0004): mov r0, #182
-------- dalvik offset: 0x0010 @ if-ge v1, v0, #4
0x6 (0006): cmp r1, r0
0x8 (0008): str r0, [r5, #0]
0xa (000a): bge 0x00000014
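
The idea behind this check can be sketched as follows: run the same trace through the interpreter and through the compiled code against a shadow copy of the registers, then compare the results. The model below is a toy, assuming register state is a plain dictionary and covering only the `const/16` instruction; the function names are hypothetical, and the real mode also compares heap and control-flow state.

```python
def interpret_trace(regs):
    """Reference semantics for the trace head: const/16 v0, #181."""
    regs = dict(regs)
    regs["v0"] = 181
    return regs


def buggy_compiled_trace(regs):
    """Stand-in for the miscompiled native code, which loads 182 instead."""
    regs = dict(regs)   # works on a shadow copy, not the real frame
    regs["v0"] = 182
    return regs


def self_verify(compiled, interp, regs):
    """Run both versions from the same state and report diverging registers."""
    shadow = compiled(regs)
    expected = interp(regs)
    return {r: (expected[r], shadow.get(r))
            for r in expected if expected[r] != shadow.get(r)}


divergence = self_verify(buggy_compiled_trace, interpret_trace,
                         {"v0": 0, "v1": 0x55})
for reg, (want, got) in divergence.items():
    print(f"({reg}) interp 0x{want:x} shadow 0x{got:x}")
```

Only `v0` diverges (0xb5 in the interpreter against 0xb6 in the shadow state), which matches the register dump and points at the `mov r0, #182` emitted for `const/16 v0, #181`.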


Summary
• A resource-friendly JIT for Dalvik
  – Small memory footprint
• Significant speedup delivered
  – 2x to 5x performance gain for computation-intensive workloads
• More optimizations waiting in the pipeline
  – Enable more computation-intensive apps
• Verification bot
  – Dynamic code review by the interpreter

Q&A
• http://bit.ly/bIzjnF
