NVM Heaps for Accelerating Browser-based Applications
Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan (Center for Experimental Research in Computer Systems, Georgia Tech)
Sanjay Kumar (Intel Labs)
Motivation
● Browsers have become an indispensable computing platform
● Used in devices ranging from mobile, to tablets, to PCs
● Rich browser-based client apps with growing processing capabilities
  ● e.g., Google Native Client (NaCl), Intel Parallel JavaScript
● Increasing web application support to access system resources
  ● WebGL support to access GPUs
  ● HTML5 I/O, Native Client Pepper I/O for storage access
● Increasing local data access/storage needs
  ● Simple key-value store
  ● JavaScript (JS) based SQLite interface
  ● Synchronous and asynchronous POSIX I/O for large blobs
Motivation
● Recent studies on poor end-client storage performance blame
  ● Flash storage performance variation across devices
  ● Poor random-write performance of Flash (Kim et al., FAST '12)
  ● So replacing Flash with 100x faster NVM (PCM) should speed up apps?
● Question: Is the problem solved by replacing Flash with NVM?
● Answer: No!
● Reason: Multiple levels of indirection impact storage performance
  ● Specifically: sandboxing overheads for browser I/O
● Contributions:
  ● Use NVM as a persistent heap to reduce sandboxing cost in the browser
  ● Develop appropriate OS and user-level library data structures
Sandboxing
● Isolates applications from the code and data of other applications; needed for untested and untrusted code
● Different methods of sandboxing
  ● Rule-based, virtual machine emulation (Android), static profiling
● Native Client (NaCl)
  ● Sandboxing technology
  ● Allows running native code from a web browser
● Sandboxing methods in Native Client (NaCl)
  ● Inner sandbox - binary validation with static analysis - restricts unsafe instructions
  ● Outer sandbox - system calls intercepted by a trusted region
Sandboxing in NaCl – Multiple Levels of Indirection

[Diagram: I/O path through the NaCl sandbox]
● Untrusted components: HTML & JavaScript → WriteBuff(bytes) → SRPC/IMC → NaCl (.nexe) app → utility libs → fwrite(fd, bytes)
● Stack/context switch into the trusted service runtime: write(secure_fd, bytes)
● User-kernel switch into the OS
● Expensive system call: stack switch + user-kernel switch time
=> Frequent system resource access will affect performance
NaCl Sandboxing I/O Impact

[Chart: Browser I/O vs. Native I/O. Y-axis: time (microsec), up to 14000; X-axis: bytes written; write chunk size 512 bytes.]
Proposed Solution
● Key goal: reduce the multiple levels of indirection
● Expose NVM as a persistent heap rather than block storage
● Applications access the heap through a byte-addressable interface, avoiding frequent user-kernel and stack switches
● Rely on NVM hardware page protection to enforce what untrusted browser applications can access
NVM as a Heap – Reducing Multiple Levels of Indirection

[Diagram: untrusted components (HTML & JavaScript → WriteBuff(bytes) → SRPC/IMC → NaCl (.nexe) app → utility libs) call nvmalloc(bytes, id); the stack/context switch into the trusted service runtime and the user-kernel switch into the OS are off the common data path]
Programming Model

/* NVM persistent allocation */
Image **imgdb = nvmalloc("img_root", size);
for each new image:
    imgdb[cnt] = nvmalloc(NULL, size);
    cnt++;
...
/* persistent read, implicit load of all child ptrs */
img = nvread("img_root", &size);
NVM as a Heap – High-Level Design

[Diagram: Chrome browser (Native Client) with an NVM user lib issuing sys_nvmmap() to a kernel-layer memory manager; DRAM node and NVM node share the LLC and memory bus; the NVM node is divided into persistent and non-persistent regions]
Design - OS Support for NVM Heap
● NVM as a special `node' in a heterogeneous memory system
● Custom Linux-based NVM manager to control page allocation
● Maintains a per-process persistent page tree (metadata)
  ● Page tree loaded during application start/restart
  ● Persistent pages accessed when the application faults on them
● Exports the nvmmap system call to higher layers
● Every nvmmap call results in the creation of a compartment
  ● Compartments are similar to VMA structures and provide isolation among threads
Design - OS Support for NVM Heap
● Every compartment contains an RB page tree
● Application hints whether compartments of threads can be merged
● Provides isolation for browser threads (e.g., main browser and ad threads; inspired by Firefox user allocation)

[Diagram: processes 1-3, each owning one or more compartments; each compartment holds an RB tree of pages. The process id, compartment id, and fault address identify a page; each NVM page carries a 1-bit page flag and a 1-bit flush flag]
Design - User level Support for NVM Heap
● Transitions between NaCl trusted and untrusted components are expensive
● Solution: NVM allocator split across the two components
● Untrusted allocator component:
  ● Provides byte-addressable heap interfaces to applications
    ● nvmalloc(), nvfree(), nvcommit()
  ● Manages the untrusted application's persistent memory state
  ● NVM heap reference obtained from the trusted component
  ● Restricted from making OS system calls directly
    ● e.g., the allocator cannot call nvmmap() directly
Design - User level Support for NVM Heap

[Diagram: the untrusted NaCl app calls nvmalloc(); the user-level NVM allocator crosses an untrusted-trusted context switch into the trusted component, which issues nacl_mmap()/sys_mmap (a user-kernel switch) to the NVM kernel manager and the default DRAM manager. Mapped NVM ranges, separated by guard pages, back thread/app-specific compartments (memory VMAs) and are recorded in a per-app access table:]

Accessible NVM address range | Permission
0x10000 - 0x20000            | Read/Write
0x20000 - 0x25000            | Read
Design - User level Support for NVM Heap
● Trusted allocator component
  ● Provides indirect access to system-level NVM interfaces
  ● Maintains a per-application NVM access region table
    ● Table contains address ranges with different protection levels
  ● Access tables are persistent and identified using unique keys
    ● The same unique keys are supplied by the application across restarts
  ● Handles 'out-of-bound' memory access protection faults
  ● After every map/unmap operation, the address regions in the access region table are updated
Experimental Goals
● Is the storage device primarily responsible for slow browser I/O?
● What is the impact of storage interfaces in a sandboxed environment?
● What are the benefits of treating NVM as a non-volatile heap as opposed to a block storage device?

Methodology:
● Dual-core 1.66 GHz D510 Atom-based development kit
● 2GB DDR2 DRAM, Intel 520 120GB SSD, 1 MB L2 cache
● Pin-based binary instrumentation for NVM load/store analysis
● Hardware counters for NVM access misses (in the paper)
● Currently, we use MACSim-based simulation
Experimental Workloads
1. WebShootbench – open-source NaCl benchmark from Google
  ● Derived from the Computer Language Benchmarks Game
  ● For storage analysis, we use:
    ● Fasta (FS) – generates random DNA sequences
    ● Revcomp (RC) – reverse-complement of a DNA sequence
    ● kNucleotide (KN) – generates a hashtable from a DNA sequence
    ● Spell Check (SC) – WordNet dictionary (16 MB x 4 dictionaries)
2. Snappy Compression
  ● High-performance compression/decompression library
  ● Favors speed over compression ratio
  ● Ported to NaCl in ~2 hours; uses 500 MB of browser cache
Experimental Workloads
3. User Personalization: Email Classifier
  ● Bayesian email classifier with learning data
  ● CMU text learning group dataset for user personalization
    ● Contains 10 newsgroup email categories such as sports, economics, movies, etc.
    ● We randomly choose 100 emails as input
    ● Learning data generated from prior classifications
  ● Extracts feature points from new emails, loads training data, and compares the input feature points against the training data set

Evaluation abbreviations: NV – NVM, RD – RamDisk
Benchmark Analysis – Storage Device Impact

Benchmark    | I/O time (%)
Fasta        | 41.2
Revcomp      | 49.33
kNucleotide  | 12.32
SpellCheck   | 19.89

● Reducing I/O calls in applications can reduce sandboxing overheads substantially
● Benefits due to fast storage alone are relatively small (RD vs. SSD)
Application – Snappy Access Interface Evaluation
● ~2.5x reduction compared to RamDisk
Evaluation – Snappy User-Kernel Transitions
● User-kernel transitions for mmap vs. block I/O
  ● With mmap, every file to be compressed must be mapped, compressed, and unmapped
  ● mmap is a system call; every map/munmap call results in a user-kernel switch
  ● POSIX block I/O calls are library calls; not all of them cause a user-kernel transition
Evaluation – Snappy Stack Switching Overhead
● Why is RD Block slower than RD mmap?
  ● RD Block has fewer user-kernel transitions but higher stack-switching overhead
  ● Stack switching is an expensive operation in sandboxed code
Email Classifier – Impact on Web Page Load Time

[Chart: page load time (ms) vs. number of email categories (2, 4, 8, 12, 16) for RD and NVM]
Summary
● In sandboxed environments like end-client browsers, the impact of software I/O overheads far exceeds the hardware storage cost
● Using NVM as a heap shows:
  ● Up to 2.5x improvement in browser storage performance
  ● Reduced sandboxing impact without compromising security
  ● Gains consistent across most browser workloads
Future Work
● Studying additional applications
  ● e.g., games accessing graphical as well as user data
● Using NVM for browser components like the database and cache
● Addressing sandboxing overheads in Android
Questions / Comments?
Thank You