Map/Reduce: Overview of Solutions
Alexey Zlobin

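Sample job: driver

    public static void main(String[] a) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");

        // Types of the (key, value) pairs the job writes out.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // M and R are the mapper and reducer classes from the next two slides.
        job.setMapperClass(M.class);
        job.setReducerClass(R.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // a[0] is the input path, a[1] is the output path.
        FileInputFormat.addInputPath(job, new Path(a[0]));
        FileOutputFormat.setOutputPath(job, new Path(a[1]));

        job.waitForCompletion(true);
    }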

Sample job: mapper

    class M extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable k, Text v, Context ctx)
                throws IOException, InterruptedException {
            // Split the input line on whitespace and emit (word, 1) for each token.
            String line = v.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                ctx.write(word, one);
            }
        }
    }

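Sample job: reducer

    class R extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text k, Iterable<IntWritable> v, Context ctx)
                throws IOException, InterruptedException {
            // Sum all counts emitted for this word and write the total.
            int sum = 0;
            for (IntWritable val : v)
                sum += val.get();
            ctx.write(k, new IntWritable(sum));
        }
    }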
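The driver, mapper and reducer slides above together describe a word count job. As a rough illustration of the data flow only, the following single-process Python sketch imitates the map, shuffle and reduce phases. It is not Hadoop code: `map_fn`, `reduce_fn` and `run_job` are hypothetical names introduced here, and a real cluster distributes each phase across machines.

```python
from collections import defaultdict


def map_fn(_offset, line):
    """Mimics the mapper slide: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def reduce_fn(word, counts):
    """Mimics the reducer slide: sum all counts seen for one key."""
    yield word, sum(counts)


def run_job(lines):
    """Hypothetical single-process stand-in for the framework."""
    # Shuffle: collect every emitted value under its key.
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            groups[key].append(value)
    # Reduce: call reduce_fn once per key, in sorted key order.
    result = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    print(run_job(["to be or not", "to be"]))
```

The reducer sees each key exactly once, with all of its values, which is the guarantee the shuffle step provides.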

Pig snippet

    raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, qry);
    clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(qry);
    clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(qry) AS query;

Hive snippet

    CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

    LOAD DATA LOCAL INPATH './examples/files/kv2.txt'
        OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

    SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

    INSERT OVERWRITE DIRECTORY '/tmp/reg_5'
        SELECT a.foo, a.bar FROM invites a;

Spark: example

    val counts = lines.flatMap(line => line.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
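Unlike the Hadoop reducer, which receives all values of a key at once, `reduceByKey` merges the values of a key pairwise with the supplied function. A minimal Python sketch of that semantics follows, assuming an associative and commutative function. `reduce_by_key` and `word_count` are hypothetical helpers written for this illustration, and the sketch ignores partitioning and everything else Spark does to run the computation on a cluster.

```python
from operator import add


def reduce_by_key(pairs, func):
    """Merge the values of each key pairwise with func."""
    merged = {}
    for key, value in pairs:
        if key in merged:
            merged[key] = func(merged[key], value)
        else:
            merged[key] = value
    return merged


def word_count(lines):
    # flatMap: one line becomes many words; map: each word becomes (word, 1).
    pairs = ((word, 1) for line in lines for word in line.split(" "))
    return reduce_by_key(pairs, add)


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Because values are merged two at a time, partial results can be combined in any order, which is why the function has to be associative and commutative.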

Shark example

    CREATE TABLE src(key INT, value STRING);
    LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
    SELECT COUNT(1) FROM src;

    CREATE TABLE src_cached AS SELECT * FROM src;
    SELECT COUNT(1) FROM src_cached;

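Disco example

    def fun_map(line, params):
        # Emit (word, 1) for every whitespace-separated token in the line.
        for word in line.split():
            yield word, 1

    def fun_reduce(iter, params):
        # Imported inside the function so it is available where the task runs.
        from disco.util import kvgroup
        # Sort so equal words are adjacent, then sum the counts per word.
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)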

Disco driver

    from disco.core import Job, result_iterator

    job = Job().run(
        input=["http://discoproject.org/media/text/chekhov.txt"],
        map=fun_map,
        reduce=fun_reduce)
    for word, count in result_iterator(job.wait(show=True)):
        print(word, count)
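The reduce function on the Disco slide sorts its input before grouping. A small Python sketch of why that matters follows, assuming `kvgroup` groups only consecutive pairs with equal keys, as the `sorted()` call suggests. `group_consecutive` and `word_totals` are hypothetical stand-ins built on `itertools.groupby`, not Disco's own code.

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(pairs):
    """Yield (key, [values]) for each run of consecutive pairs sharing a key."""
    for key, run in groupby(pairs, key=itemgetter(0)):
        yield key, [value for _, value in run]


def word_totals(pairs):
    """Sort first so equal keys are adjacent, then sum each group."""
    return [(word, sum(counts))
            for word, counts in group_consecutive(sorted(pairs))]


if __name__ == "__main__":
    pairs = [("b", 1), ("a", 1), ("b", 1)]
    # Without sorting, "b" shows up in two separate groups.
    print(list(group_consecutive(pairs)))
    # With sorting, each word is grouped once.
    print(word_totals(pairs))
```

Skipping the sort would split one word across several groups and produce several partial counts for it.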

References I

● “MapReduce: Simplified Data Processing on Large Clusters”. Jeffrey Dean, Sanjay Ghemawat
● “A Comparison of Join Algorithms for Log Processing in MapReduce”. S. Blanas, J. Patel, V. Ercegovac, J. Rao, E. Shekita, Y. Tian
● “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing”. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica
● “Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters”. Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, Ion Stoica
● “Shark: Fast Data Analysis Using Coarse-grained Distributed Memory”. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Haoyuan Li, Scott Shenker, Ion Stoica

References II

● Disco Technical Overview: http://disco.readthedocs.org/en/latest/overview.html
● Disco Distributed Filesystem: http://disco.readthedocs.org/en/latest/howto/ddfs.html
● An efficient, immutable, persistent mapping object: http://discodb.readthedocs.org/en/latest/

Map/Reduce - Computer Science Center

Apr 27, 2014
