Experiences Scaling Use of Google's Sawzall Jeffrey D. Oldham surname at company-name.com Google, Inc. 2011-03-13
Programming, not Theory No focus on theory: no theorems, no models, no algorithms. Focus on users' programming of parallel systems. Users, not system developers, write the code. Users write the tests.
Summary Sawzall eases writing map reductions. Structured Sawzall scales. A parallel system API should separate the fundamental model concepts to ease writing test code. Ex: map reduction = map + reduce + record enumeration.
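The decomposition "map reduction = map + reduce + record enumeration" can be sketched as three independent pieces. This is a minimal Python illustration, not Sawzall and not Google's API: the names `enumerate_records`, `map_word`, `reduce_sum`, and `run`, and the word-counting task itself, are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def enumerate_records(source: Iterable[str]) -> Iterator[str]:
    """Record enumeration: here, plain iteration over an in-memory iterable."""
    yield from source


def map_word(record: str, emit: Callable[[str, int], None]) -> None:
    """Map: sees exactly one record and emits (key, value) pairs."""
    for word in record.split():
        emit(word, 1)


def reduce_sum(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce: combines all emitted values that share a key by summing them."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(source: Iterable[str],
        map_fn: Callable[[str, Callable[[str, int], None]], None],
        reduce_fn: Callable[[Iterable[Tuple[str, int]]], Dict[str, int]]) -> Dict[str, int]:
    """Glue code: the only place where the three concepts meet."""
    emitted: List[Tuple[str, int]] = []
    for record in enumerate_records(source):
        map_fn(record, lambda key, value: emitted.append((key, value)))
    return reduce_fn(emitted)
```

Because each piece is an ordinary function, any one of them can be exercised in a test without the other two.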
Outline Map reductions and MapReduce Map reductions and Saw + Sawzall Structured Saw + Sawzall
Map Reduction
MapReduce: C++ Library
Outline Map reductions and MapReduce Map reductions and Saw + Sawzall Structured Saw + Sawzall
Sawzall: Simpler Map Reductions
Sawzall Mental Model: One Record
Sample Program Compute the number of queries per latitude-longitude degree. Sawzall query-location.szl:
    proto "querylog.proto"
    queries_per_degree: table sum[lat: int][lon: int] of int;
    log_record: QueryLogProto = input;
    loc: Location = locationinfo(log_record.ip);
    emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;
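For readers unfamiliar with Sawzall, the one-record mental model behind the sample program can be approximated in Python. This is a rough analogue under stated assumptions, not a translation of Sawzall semantics: the `Location` and `QueryLogRecord` classes, the `_FAKE_GEO` lookup table, and the `process` and `run` helpers are hypothetical stand-ins, and `locationinfo` here is a stub rather than the real intrinsic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, Iterable, Tuple


@dataclass
class Location:
    lat: float
    lon: float


@dataclass
class QueryLogRecord:
    ip: str


# Hypothetical IP-to-location data, used only for this illustration.
_FAKE_GEO: Dict[str, Location] = {
    "10.0.0.1": Location(37.4, -122.1),
    "10.0.0.2": Location(37.9, -122.6),
    "10.0.0.3": Location(48.8, 2.3),
}


def locationinfo(ip: str) -> Location:
    """Stub standing in for the locationinfo call in the sample program."""
    return _FAKE_GEO[ip]


def process(log_record: QueryLogRecord,
            table: DefaultDict[Tuple[int, int], int]) -> None:
    """Handle exactly one record: look up its location and add 1 to its cell."""
    loc = locationinfo(log_record.ip)
    # Python's int() truncates toward zero; this plays the role of int(loc.lat).
    table[(int(loc.lat), int(loc.lon))] += 1


def run(records: Iterable[QueryLogRecord]) -> Dict[Tuple[int, int], int]:
    """Enumerate the records and accumulate into a sum table."""
    table: DefaultDict[Tuple[int, int], int] = defaultdict(int)
    for record in records:
        process(record, table)
    return dict(table)
```

The per-record function never sees more than one record; the enumeration and the summing table live outside it.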
Saw + Sawzall Use Used since 2003 by hundreds of Googlers in thousands of programs to compute large volumes of data that are directly or indirectly externally facing.
Outline Map reductions and MapReduce Map reductions and Saw + Sawzall Structured Saw + Sawzall
Scaling Programs Code ecosystems support sharing tested code. + Sawzall function libraries have tests. – Programs shared by copying. – Typically untested.
Sawzall Testing Model: Map Reduction
Structured Pgms: Separate Concepts
Sample Program Compute the number of queries per latitude-longitude degree. Sawzall query-location.szl:
    proto "querylog.proto"
    queries_per_degree: table sum[lat: int][lon: int] of int;
    log_record: QueryLogProto = input;
    loc: Location = locationinfo(log_record.ip);
    emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;
Structured Sample Program Compute the number of queries per latitude-longitude degree. Sawzall query-location.szl:
    proto "querylog.proto"
    map: function(log: QueryLogProto, reduce: function(int, int)) {
        loc: Location = locationinfo(log_record.ip);
Test Structured Programs Test map functions one record at a time, using a mocked reduce function. Advantages: no distributed I/O; runs on a single processor. Limitation: does not test reduce functions or enumeration order.
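The testing approach on this slide can be sketched in Python with the standard library's `unittest.mock`. This is an analogue under assumptions, not the Sawzall test framework: `map_query` imitates the shape of a map function that receives one record and a reduce callback taking two ints, and `Location`, `QueryLogRecord`, `_FAKE_GEO`, and the `locationinfo` stub are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from unittest.mock import Mock


@dataclass
class Location:
    lat: float
    lon: float


@dataclass
class QueryLogRecord:
    ip: str


# Hypothetical lookup data so the test needs no external service.
_FAKE_GEO: Dict[str, Location] = {"10.0.0.1": Location(37.4, -122.1)}


def locationinfo(ip: str) -> Location:
    """Stub standing in for the real location lookup."""
    return _FAKE_GEO[ip]


def map_query(log: QueryLogRecord, reduce: Callable[[int, int], None]) -> None:
    """Map function under test: one record in, one call to reduce out."""
    loc = locationinfo(log.ip)
    reduce(int(loc.lat), int(loc.lon))


def test_map_query_reports_truncated_degrees() -> None:
    """Feed a single record and check what the mocked reduce received."""
    reduce = Mock()
    map_query(QueryLogRecord(ip="10.0.0.1"), reduce)
    reduce.assert_called_once_with(37, -122)
```

The test runs in a single process with no distributed I/O. It says nothing about the real reduce function or about the order in which records are enumerated.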
Summary Sawzall eases writing map reductions. Structured Sawzall scales. A parallel system API should separate the fundamental model concepts to ease writing test code. Ex: map reduction = map + reduce + record enumeration.
References Sawzall: Pike et al.; open-source implementation; Wikipedia article. MapReduce: Dean and Ghemawat (2004, 2008); Wikipedia article.