Hadoop Development, Administration & BI - 0 to 100
Master the data analysis tools like Pig and Hive

Data Science
Build a recommendation engine

Hadoop Development - 0 to 100
Basics: Learn the basics of Big Data and Hadoop
Hands On: Play with Hadoop and the Hadoop ecosystem
Development: Become a top-notch Hadoop developer
Hadoop Development, Administration and BI Program - 0 to 100 (60 Hours)

Overview of the course: The Hadoop Development, Administration and BI Program is a one-stop course that introduces you to the domain of Hadoop development and gives you the technical know-how to work in it. By the end of this course, you will be able to earn a Hadoop professional credential and will be capable of handling terabyte-scale data and analyzing it successfully using MapReduce.
Who is this course for, and who is it not for?
For: Professionals with basic knowledge of software development, programming languages, and databases will find this course helpful. That basic knowledge is enough to succeed in the course.
Not for: Absolute beginners in software development as a discipline will find the course difficult to follow.
Phase 1: Hadoop Fundamentals (20 Hours) Getting the Basics Right
Big Data
- What is Big Data
- Dimensions of Big Data
- Big Data in Advertising
- Big Data in Banking
- Big Data in Telecom
- Big Data in eCommerce
- Big Data in Healthcare
- Big Data in Defense
- Processing options of Big Data
- Hadoop as an option

Hadoop
- What is Hadoop
- How Hadoop Works
- HDFS
- MapReduce
- How Hadoop has an edge

Hadoop Ecosystem
- Sqoop
- Oozie
- Pig
- Hive
- Flume
- Running an Oozie workflow
- Analyzing Twitter data using Flume
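The "How Hadoop Works" and MapReduce topics in Phase 1 revolve around three phases: map, shuffle, and reduce. The following is a minimal, single-process Python sketch of that data flow for the classic word-count example. It only simulates the phases in memory and does not call any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, mimicking the framework's shuffle-and-sort step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reducers receive keys in sorted order; sorting here mirrors that.
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts for each word, as a word-count reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Big data needs Hadoop", "Hadoop processes big data"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'processes': 1}
```

On a real cluster, mappers run in parallel over HDFS blocks and the shuffle moves data across the network; the sketch keeps only the logical flow so each phase's input and output are easy to see.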
Hadoop Hands On
- Setting up Hadoop on a single-node cluster
- Running HDFS commands
- Running your MapReduce program
- Running Sqoop Import and Sqoop Export
- Creating Hive tables directly from Sqoop
- Creating Hive tables
- Querying Hive tables

Multinode Setup
- Setting up a multinode cluster on Amazon EC2
- Setting up a multinode cluster on the classroom machines
- Setting up Cloudera Manager on the cloud
- Setting up Cloudera Manager on a local setup
Cluster Capacity Planning
Level 1: Mini Project
Level 1: Evaluation Test (50 marks)
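Cluster capacity planning usually starts with storage arithmetic. The sketch below estimates the raw disk and DataNode count needed for a given volume of data. It assumes HDFS's default replication factor of 3, plus two illustrative figures that are assumptions rather than course recommendations: a 25% allowance for intermediate and temporary data, and 8 TB of usable disk per DataNode. The function names are hypothetical.

```python
import math


def raw_storage_tb(data_tb: float, replication: int = 3, temp_overhead: float = 0.25) -> float:
    """Raw disk needed: each block is stored `replication` times, plus headroom
    for intermediate/temporary data (the 25% default is an assumption)."""
    return data_tb * replication * (1 + temp_overhead)


def datanodes_needed(data_tb: float, disk_per_node_tb: float = 8.0,
                     replication: int = 3, temp_overhead: float = 0.25) -> int:
    """Round up, since a fraction of a node cannot be provisioned.
    The 8 TB usable disk per node default is an assumption."""
    raw = raw_storage_tb(data_tb, replication, temp_overhead)
    return math.ceil(raw / disk_per_node_tb)


if __name__ == "__main__":
    # 100 TB of data -> 100 * 3 * 1.25 = 375 TB raw -> ceil(375 / 8) = 47 nodes
    print(raw_storage_tb(100), datanodes_needed(100))
```

The replication term dominates: tripling every block is what turns 100 TB of data into several hundred terabytes of raw disk, so it is worth confirming the configured replication factor before sizing hardware.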
Phase 2: Hadoop Development (16 Hours) Become a Pro Developer
Advanced MapReduce
- MapReduce Code Walkthrough
- ToolRunner
- MRUnit
- Distributed Cache
- Combiner
- Partitioner
- Setup and Cleanup methods
- Using the Java API to access HDFS
- Map-side joins
- Reduce-side joins
- Input Types in MapReduce
- Output Types in MapReduce
- Custom Input Data Types
- Custom Output Data Types
- Multiple-reducer MR program
- Zero-reducer (map-only) MR program
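The Combiner and Partitioner topics above both act on mapper output before it reaches the reducers. The sketch below, a simplified in-memory model rather than Hadoop code, first combines one mapper's (key, count) pairs locally and then routes each key to a reducer using the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, with Java's `String.hashCode()` emulated in Python. The helper names (`java_string_hash`, `hash_partition`, `combine`, `route`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + c with 32-bit signed overflow
    (exact for characters in the Basic Multilingual Plane)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Default HashPartitioner formula: clear the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def combine(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Combiner: pre-sum counts on the map side so fewer records are shuffled."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def route(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> Dict[int, List[Tuple[str, int]]]:
    """Combine one mapper's output, then send each key to its reducer's bucket."""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in combine(pairs):
        buckets[hash_partition(key, num_reducers)].append((key, value))
    return dict(buckets)


if __name__ == "__main__":
    mapper_output = [("hadoop", 1), ("hive", 1), ("hadoop", 1), ("pig", 1)]
    print(combine(mapper_output))  # [('hadoop', 2), ('hive', 1), ('pig', 1)]
    print(route(mapper_output, num_reducers=2))
```

A combiner is only safe when the reduce logic is associative and commutative, as summation is, because the framework may apply it zero or more times. The partitioner formula is deterministic, so every occurrence of a key, from every mapper, lands on the same reducer.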
Advanced MapReduce Hands On
- MRUnit Hands On
- Distributed Cache Hands On
- Partitioner Hands On
- Combiner Hands On
- Accessing files using the HDFS API Hands On
- Map-side joins Hands On
- Reduce-side joins Hands On

MapReduce Design Patterns
- Searching
- Sorting
- Filtering
- Inverted Index
- TF-IDF
- Word Co-occurrence

MapReduce Design Patterns Hands On
- Searching Hands On
- Sorting Hands On
- Filtering Hands On
- Inverted Index Hands On
- TF-IDF Hands On
- Word Co-occurrence Hands On
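Of the design patterns listed, the inverted index shows the map/shuffle/reduce split most directly: mappers emit (term, document) pairs, and reducers assemble each term's posting list. Below is an in-memory Python sketch of that pattern. The document IDs and helper names are hypothetical, and the grouping dictionary stands in for Hadoop's shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Mapping, Tuple


def index_mapper(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Emit (term, doc_id) once per distinct term in the document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def index_reducer(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Build the posting list: sorted, de-duplicated IDs of documents containing the term."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Mapping[str, str]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in docs.items():          # map phase
        for term, d in index_mapper(doc_id, text):
            grouped[term].append(d)             # shuffle: group by term
    # reduce phase, visiting terms in sorted order as reducers would
    return dict(index_reducer(t, ids) for t, ids in sorted(grouped.items()))


if __name__ == "__main__":
    docs = {"d1": "pig and hive", "d2": "hive on hadoop", "d3": "hadoop and pig"}
    index = build_inverted_index(docs)
    print(index["hive"])    # ['d1', 'd2']
    print(index["hadoop"])  # ['d2', 'd3']
```

Emitting each term once per document in the mapper keeps the shuffled data small; a TF-IDF variant would instead emit per-document term counts so the reducer can weight each posting.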
Evaluation Test (50 marks)
Phase 3: Hadoop BI (16 Hours) Analyze Data Using Pig and Hive
Pig
- Introduction
- Basic Data Analysis
- Complex Data Analysis
- Multi Data Set Analysis
- UDFs in Pig
- Troubleshooting and Optimizing Pig
- Pig Hands On
Hive
- Introduction
- Basic Data Analysis with Hive
- Hive Data Management
- Text Processing with Hive
- Transformations in Hive
- Optimizing Hive
- Hive Hands On
Data Analysis Using Pentaho as an ETL Tool
- Introduction
- Setting up Pentaho
- Loading Data to HDFS
- Loading Data to Hive
- Aggregation through MapReduce
- Transforming Data with Hive
- Transforming Data with Pig
- Loading Data from HDFS to RDBMS
- Loading Data from Hive to RDBMS
- Reporting on HDFS Data
- Reporting on Hive Data
Evaluation Test
Phase 4: Hadoop Administration (8 Hours) Master Hadoop Administration
Note: The 60 hours are split into 40 hours of classroom training and 20 hours of hands-on assignments.

Scheduling in Hadoop
- FIFO Scheduling
- Fair Scheduling
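FIFO and fair scheduling differ in how free task slots are shared among running jobs. The sketch below contrasts the two on a deliberately simplified model, assuming a single pool of identical slots and jobs described only by how many slots they want. Real Hadoop schedulers add queues (pools), weights, minimum shares, and preemption. The fair policy shown is plain max-min fairness, used here as a stand-in for the idea behind the Fair Scheduler rather than its actual algorithm; the function and job names are hypothetical.

```python
from typing import Dict, List, Tuple


def fifo_allocate(jobs: List[Tuple[str, int]], slots: int) -> Dict[str, int]:
    """FIFO: serve jobs strictly in submission order; later jobs get only leftovers."""
    allocation: Dict[str, int] = {}
    for name, demand in jobs:
        granted = min(demand, slots)
        allocation[name] = granted
        slots -= granted
    return allocation


def fair_allocate(jobs: List[Tuple[str, int]], slots: int) -> Dict[str, int]:
    """Max-min fair share: repeatedly split free slots evenly among unsatisfied
    jobs, capping each job at its demand so unused share flows to the others."""
    demand = dict(jobs)
    allocation = {name: 0 for name, _ in jobs}
    active = [name for name, d in jobs if d > 0]
    while slots > 0 and active:
        # Give at least one slot per round so the loop always makes progress.
        share = max(slots // len(active), 1)
        for name in list(active):
            if slots == 0:
                break
            granted = min(share, demand[name] - allocation[name], slots)
            allocation[name] += granted
            slots -= granted
            if allocation[name] == demand[name]:
                active.remove(name)
    return allocation


if __name__ == "__main__":
    jobs = [("etl", 80), ("report", 10), ("adhoc", 30)]  # (job, slots wanted), in submission order
    print(fifo_allocate(jobs, slots=60))  # {'etl': 60, 'report': 0, 'adhoc': 0}
    print(fair_allocate(jobs, slots=60))  # {'etl': 25, 'report': 10, 'adhoc': 25}
```

Under FIFO the large `etl` job takes every slot and the two smaller jobs wait; under fair sharing `report` receives its full 10 slots and the remaining 50 are split evenly between the two jobs that still want more.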
Cluster Monitoring
- Basic Monitoring
- Log Management
- Using Ganglia for Monitoring
Cluster Maintenance
- Cluster Upgrades
- Failover Mechanism
Hands On
Evaluation (60 marks)
Trainer Profile
- Experienced: 8+ years of enterprise software development experience
- Certified: Hadoop, HBase, and MapR certified
- Customers: Has served customers such as Accenture, HP, Genpact, Mastek, and Cisco