Prashanth Babu V V @P7h

https://twitter.com/P7h https://github.com/P7h http://P7h.org http://about.me/Prashanth

Prerequisites for Workshop
- Laptop with any OS
- JDK v7.x installed
- Maven v3.0.5+ installed
- IDE [either Eclipse with m2eclipse plugin or IntelliJ IDEA]
- Created Twitter app for retrieving tweets
- Cloned or downloaded Storm Projects from my GitHub Account:
  - https://github.com/P7h/StormWordCount
  - https://github.com/P7h/StormTweetsWordCount

Agenda
- Big Data
- Batch vs. Real-time processing
- Intro to Storm
- Companies using Storm
- Storm Dependencies
- Storm Concepts
- Anatomy of Storm Cluster
- Live coding a use case using Storm Topology
- Storm vs. Hadoop

Data overload every second

Batch vs. Real-time Processing

- Batch processing: gathering of data and processing it as a group at one time.
- Real-time processing: processing of data that takes place as the information is being entered.

Event Processing
- Simple Event Processing: acting on a single event, filter in the ESP
- Event Stream Processing: looking across multiple events
- Complex Event Processing: looking across multiple events from multiple event streams

Storm
- Created by Nathan Marz @ BackType
  - Analyze tweets, links, users on Twitter
- Open sourced on 19th September, 2011
  - Eclipse Public License 1.0
  - Storm v0.5.2
  - 16k Java and 7k Clojure LOC
- Latest Updates
  - Current stable release v0.8.2 released on 11th January, 2013
  - Major core improvements planned for v0.9.0
  - Storm will be an Apache Project [soon..]

Storm
- Open source distributed real-time computation system
- Hadoop of real-time
- Fast
- Scalable
- Fault-tolerant
- Guarantees data will be processed
- Programming language agnostic
- Easy to set up and operate
- Excellent documentation

Polyglotism (language agnostic) – Clojure, Java, Python, Ruby, PHP, Perl, … and yes, even JavaScript

https://github.com/nathanmarz/storm-starter/blob/master/multilang/resources/splitsentence.py

https://github.com/nathanmarz/storm-starter/blob/master/multilang/resources/splitsentence.rb

Use cases
- Real-time analytics
- Stream processing
- Online machine learning
- Continuous computation
- Distributed RPC
- Extract, Transform and Load (ETL)

http://tweitgeist.colinsurprenant.com/

Companies using Storm

https://github.com/nathanmarz/storm/wiki/Powered-By

enables the convergence of Big Data and low-latency processing. Empowers stream / micro-batch processing of user events, content feeds and application logs.

https://github.com/P7h/storm-camel-example

Storm under the hood
- Clojure: a dialect of the Lisp programming language that runs on the JVM, CLR, and JavaScript engines
- Apache Thrift: cross-language bridge, RPC; framework to build services
- ØMQ: asynchronous message transport layer
- Jetty: embedded web server
- Apache ZooKeeper: distributed system, used to store metadata
- LMAX Disruptor: high performance queue shared by threads
- Kryo: serialization framework
- Misc.: SLF4J, Python, Java 5+, JZMQ, JODA, Guava

Tuples
- Main data structure in Storm.
- An ordered list of objects.
  - (“user”, “Prashanth”, “Babu”, “Engineer”, “Bangalore”)
- Key-value pairs – keys are strings, values can be of any type.
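A minimal sketch of the tuple idea in plain Java, with no Storm dependency. `SketchTuple` is a hypothetical stand-in, not Storm's own class, and the field names ("type", "firstName", "lastName", "role", "city") are assumptions added for illustration; the slide only gives the values. It shows the two ways of reading a tuple: by position (ordered list) and by field name (string keys, values of any type).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Storm tuple: ordered values plus a field schema.
final class SketchTuple {
    private final List<String> fields;
    private final List<Object> values;

    SketchTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("one value per field is required");
        }
        this.fields = fields;
        this.values = values;
    }

    // Positional access: a tuple is an ordered list of objects.
    Object getValue(int index) {
        return values.get(index);
    }

    // Named access: resolve the field name to a position, then read the value.
    Object getValueByField(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }

    // The example tuple from the slide; the field names are assumed.
    static SketchTuple example() {
        return new SketchTuple(
            Arrays.asList("type", "firstName", "lastName", "role", "city"),
            Arrays.<Object>asList("user", "Prashanth", "Babu", "Engineer", "Bangalore"));
    }
}
```

With this sketch, `SketchTuple.example().getValueByField("city")` and `SketchTuple.example().getValue(4)` read the same value.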

Streams
- Unbounded sequence of tuples.
- Edges in the topology.
- Defined with a schema.

Spouts
- Source of streams.
- Spouts are like sources in a graph.
- Examples are API calls, log files, event data, queues, Kestrel, AMQP, JMS, Kafka, etc.

BaseRichSpout

Bolts
- Process input streams and [might] produce new streams.
- Can do anything, i.e. filtering, streaming joins, aggregations, read from / write to databases, APIs, run arbitrary functions, etc.
- All sinks in the topology are bolts but not all bolts are sinks.

Topology
- Network of spouts and bolts.
- Can be visualized like a graph.
- Container for application logic.
- Analogous to a MapReduce job, but runs forever.

Sample Topology
https://github.com/P7h/StormWordCount
[Diagram: RandomSentenceSpout emits [Sentence] tuples to SplitSentenceBolt, which feeds WordCountBolt; WordCountBolt emits [Word, Count] tuples to DBBolt / JMSBolt. Further RandomSentenceSpout and SplitSentenceBolt instances, and more such bolts, are also shown.]

Stream Groupings
- Each Spout or Bolt might be running n instances in parallel [tasks].
- Groupings decide which task of the subscribing bolt a tuple is sent to.

Grouping | Feature
--- | ---
Shuffle | Random grouping
Fields | Grouped by value such that equal value results in same task
All | Replicates to all tasks
Global | Makes all tuples go to one task
None | Makes Bolt run in the same thread as the Bolt / Spout it subscribes to
Direct | Producer (task that emits) controls which Consumer will receive
Local or Shuffle | If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks
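The routing decisions behind four of these groupings can be sketched in plain Java. This is an illustration under assumptions, not Storm's implementation: `GroupingSketch` and its method names are hypothetical, tasks are numbered 0 to numTasks - 1, fields grouping is modelled as hash modulo task count, and global grouping is modelled as always picking task 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: each method returns the index of the receiving task(s).
final class GroupingSketch {

    // Shuffle: any task, chosen at random.
    static int shuffleGrouping(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // Fields: equal field values always map to the same task (assumed hash-mod rule).
    static int fieldsGrouping(Object fieldValue, int numTasks) {
        int hash = fieldValue.hashCode();
        // Normalise so that negative hash codes still give a valid index.
        return ((hash % numTasks) + numTasks) % numTasks;
    }

    // All: the tuple is replicated to every task.
    static List<Integer> allGrouping(int numTasks) {
        List<Integer> targets = new ArrayList<Integer>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }

    // Global: every tuple goes to one single task (assumed here to be task 0).
    static int globalGrouping() {
        return 0;
    }
}
```

Fields grouping is what makes a word count correct: every occurrence of the same word reaches the same counting task, so each task can keep its own local counter.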

Storm Cluster
[Diagram: Nimbus and the UI; ZooKeeper#1, ZooKeeper#2 … ZooKeeper#n; Supervisor#1, Supervisor#2, Supervisor#3 … Supervisor#n, each running Workers.]

Storm Cluster
- Nimbus daemon is the master of this cluster.
  - Manages topologies.
  - Comparable to Hadoop JobTracker.
- Supervisor daemon spawns workers.
  - Comparable to Hadoop TaskTracker.
- Workers are spawned by supervisors.
  - One per port defined in storm.yaml configuration.
- Tasks run as threads in workers.
- ZooKeeper is a distributed system, used to store metadata.
- UI is a webapp which gives diagnostics on the cluster and topologies.
- Nimbus and Supervisor daemons are fail-fast and stateless.
  - State is stored in ZooKeeper.

Storm – Modes of operation
- Local mode
  - Develop, test and debug topologies on your local machine.
  - Maven is used to include Storm as a dev dependency for the project.
  - mvn clean compile package && java -jar target/storm-wordcount-1.0-SNAPSHOT-jar-with-dependencies.jar
- Remote [or Production] mode
  - Topologies are submitted for execution on a cluster of machines.
  - Cluster information is added in the storm.yaml file.
  - More details on the storm.yaml file can be found here: https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster#fill-in-mandatory-configurations-into-stormyaml
  - storm jar target/storm-wordcount-1.0-SNAPSHOT.jar org.p7h.storm.offline.wordcount.topology.WordCountTopology WordCount

Storm UI – Cluster Summary

Storm UI – Topology Summary

Storm UI – Component Summary

Code Sample – Topology

Code Sample – Spout

Code Sample – Bolt#1

Code Sample – Bolt#2

Problem#1 – WordCount [if there are internet issues]
https://github.com/P7h/StormWordCount
- Create a Spout which feeds random sentences [you can define your own set of random sentences].
- Create a Bolt which receives sentences from the Spout, splits them into words and forwards them to the next bolt.
- Create another Bolt to count the words.
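Before wiring anything into Storm, the logic of the two bolts can be tried out in plain Java. The sketch below is an assumption-laden illustration, not the code in the linked repository: `WordCountSketch` and its method names are hypothetical, splitting is on whitespace only, and words are lower-cased so that "The" and "the" share a counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the logic inside the split and count bolts.
final class WordCountSketch {

    // What the splitting bolt would do for each incoming sentence.
    static List<String> splitSentence(String sentence) {
        List<String> words = new ArrayList<String>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    // What the counting bolt would do: keep one running counter per word.
    static Map<String, Integer> countWords(List<String> sentences) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String sentence : sentences) {
            for (String word : splitSentence(sentence)) {
                Integer current = counts.get(word);
                counts.put(word, current == null ? 1 : current + 1);
            }
        }
        return counts;
    }
}
```

In a real topology the counting bolt never sees the whole list at once; it updates the same kind of map one tuple at a time, which is why the counters live in a field of the bolt.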

Problem#2 – Top5 retweeted tweets [if internet works fine]
https://github.com/P7h/StormTopRetweets
- Create a Spout which gets data from Twitter [please use Twitter4J and OAuth credentials to get tweets using the Streaming API].
  - For simplicity, consider only tweets which are in English.
  - Emit only what we are interested in, i.e. a tweet's getRetweetedStatus().
- Create another Bolt to count the retweets of a particular tweet.
  - Make an in-memory Map with the retweet screen name as the key and the counter of the retweet as the value.
  - Log the counter every few seconds / minutes [should be configurable].
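The counting side of this exercise can be sketched without Twitter4J or Storm. `RetweetCounterSketch` is a hypothetical class, not the code in the linked repository; it assumes the spout has already reduced each retweet to a screen name, and it breaks ties between equal counters alphabetically, which is a choice made here only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory counter, keyed by retweet screen name.
final class RetweetCounterSketch {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Called once per retweet seen on the stream.
    void record(String screenName) {
        Integer current = counts.get(screenName);
        counts.put(screenName, current == null ? 1 : current + 1);
    }

    int countOf(String screenName) {
        Integer current = counts.get(screenName);
        return current == null ? 0 : current;
    }

    // Screen names with the highest counters first; at most n are returned.
    List<String> top(int n) {
        List<String> names = new ArrayList<String>(counts.keySet());
        Collections.sort(names, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                int byCount = counts.get(b).compareTo(counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        return new ArrayList<String>(names.subList(0, Math.min(n, names.size())));
    }
}
```

Calling `top(5)` from a periodic logging step gives the Top5 list the exercise asks for; how often that step runs is left configurable, as the slide suggests.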

Storm vs. Hadoop

Storm | Hadoop
--- | ---
Real-time processing | Batch processing
Topologies run forever | Jobs run to completion
No SPOF | [Pre-YARN] NameNode is SPOF
Stateless nodes | Stateful nodes
Scalable | Scalable
Guarantees no data loss | Guarantees no data loss
Open source | Open source

Hadoop AND Storm: Blended View
[Diagram: timeline t running up to "now". Hadoop works great back here (the past); Storm works here (at "now").]

Hadoop AND Storm at Yahoo

Personalization based on User Interests

Convergence of batch and low-latency processing

Advanced Topics [not covered in this session]
- Distributed RPC
- Transactional topologies
- Trident
- Unit testing
- Patterns

References
- This Slide deck [on slideshare] – http://j.mp/5thEleStorm_SS
- This Slide deck [on speakerdeck] – http://j.mp/5thEleStorm_SD
- My GitHub Account for code repos – https://github.com/P7h
- Bit.ly Bundle for Storm curated by me – http://j.mp/YrDgcs

Prashanth Babu V V Follow me on Twitter: @P7h
