Evolution from Apache Hadoop to the Enterprise Data Hub
Dr. Amr Awadallah (Twitter: @awadallah), Co-Founder & CTO of Cloudera
SMDB 2014


©2014 Cloudera, Inc. All rights reserved.

Why is Big Data Happening Now?


It Isn’t Just About Web 2.0 / Social
• AUTOMOTIVE: Auto sensors reporting location, problems
• COMMUNICATIONS: Location-based advertising
• CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
• FINANCIAL SERVICES: Risk & portfolio analysis; new products
• EDUCATION & RESEARCH: Experiment sensor analysis
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg. quality; warranty analysis
• LIFE SCIENCES: Clinical trials; genomics
• MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; website optimization
• HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care
• OIL & GAS: Drilling exploration sensor analysis
• RETAIL: Consumer sentiment; optimized marketing
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment
• UTILITIES: Smart meter analysis for network capacity
• LAW ENFORCEMENT & DEFENSE: Threat analysis (social media monitoring, photo analysis)

10TB to 10PB

IT’S ALL (BIG) DATA


Apache Hadoop: Storage and Compute on One Platform

The Traditional Way: compute (RDBMS, EDW) reaches separate data storage (SAN, NAS) over the network
• Expensive, special-purpose, “reliable” servers
• Expensive licensed software
• Hard to scale
• Network is a bottleneck
• Only handles relational data
• Difficult to add new fields & data types
Expensive & Unattainable: $30,000+ per TB

The Hadoop Way: compute (CPU), memory, and storage (disk) together on each node
• Commodity “unreliable” servers
• Hybrid open source software
• Scales out forever
• No bottlenecks
• Easy to ingest any data
• Agile data access
Affordable & Attainable: $300-$1,000 per TB
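To make the per-TB figures concrete, here is a rough sketch that multiplies them out for a hypothetical 1 PB data set. It assumes 1 PB = 1,000 TB (decimal units) and that the slide's prices scale linearly; it ignores power, staffing, replication overhead, and any other cost the slide does not quantify. The function names are illustrative only.

```python
# Rough storage-cost comparison using the per-TB figures from the slide.
# Assumptions (not from the slide): 1 PB = 1,000 TB, and cost scales linearly.

TRADITIONAL_PER_TB = 30_000      # "$30,000+ per TB" (treated as a lower bound)
HADOOP_PER_TB = (300, 1_000)     # "$300-$1,000 per TB"


def cost_for(terabytes: float, price_per_tb: float) -> float:
    """Total cost in dollars for the given capacity at a flat per-TB price."""
    return terabytes * price_per_tb


def compare(terabytes: float) -> dict:
    """Compare the traditional minimum against both ends of the Hadoop range."""
    low, high = HADOOP_PER_TB
    traditional = cost_for(terabytes, TRADITIONAL_PER_TB)
    return {
        "traditional_min": traditional,
        "hadoop_low": cost_for(terabytes, low),
        "hadoop_high": cost_for(terabytes, high),
        # How many times cheaper each end of the Hadoop range is
        "ratio_vs_low": traditional / cost_for(terabytes, low),
        "ratio_vs_high": traditional / cost_for(terabytes, high),
    }


if __name__ == "__main__":
    for key, value in compare(1_000).items():  # 1 PB, assuming 1,000 TB
        print(f"{key}: {value:,.0f}")
```

Under these assumptions, 1 PB costs at least $30M the traditional way versus $0.3M to $1M on Hadoop, a 30x to 100x difference.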

Expanding Data Requires A New Approach

What we do: Copy Data to Applications
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
• Multiple copies of data

What we should do: Bring Applications to Data
Information-centric businesses use all data:
• Multi-structured, internal & external data of all types
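The difference between copying data to applications and bringing applications to data can be shown with a toy, in-process simulation. Here the data stays split across hypothetical nodes, a small function is shipped to each node, and only the per-node partial results are combined. The node layout, the `run_on_node` helper, and the byte accounting are illustrative assumptions, not Hadoop APIs. In Hadoop itself, MapReduce tries to schedule tasks on the nodes that hold the input HDFS blocks.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cluster: each node already stores its own partition of sales amounts.
NODES: Dict[str, List[int]] = {
    "node-1": [120, 75, 300],
    "node-2": [50, 410],
    "node-3": [95, 60, 220, 15],
}

# Illustrative wire size of one value (a record or a partial result); an assumption.
BYTES_PER_VALUE = 8


def copy_data_to_application() -> Tuple[int, int]:
    """Traditional way: pull every record to one central server, then sum there.

    Returns (total, bytes_moved_over_network).
    """
    collected: List[int] = []
    moved = 0
    for partition in NODES.values():
        collected.extend(partition)
        moved += len(partition) * BYTES_PER_VALUE
    return sum(collected), moved


def run_on_node(partition: List[int], task: Callable[[List[int]], int]) -> int:
    """Hypothetical stand-in for executing a task on the node that owns the partition."""
    return task(partition)


def bring_application_to_data() -> Tuple[int, int]:
    """New way: ship the sum to each node; only one partial result per node moves."""
    partials = [run_on_node(partition, sum) for partition in NODES.values()]
    moved = len(partials) * BYTES_PER_VALUE
    return sum(partials), moved


if __name__ == "__main__":
    print(copy_data_to_application())   # (1345, 72)
    print(bring_application_to_data())  # (1345, 24)
```

Both approaches produce the same answer. What differs is how much data crosses the network: in the first case it grows with data volume, in the second with the number of nodes.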

A Typical Journey of Hadoop Adoption
IT → Business
Operational Efficiency (Faster, Bigger, Cheaper) → Transformative Applications (New Business Value)
Stages: Cheap Storage → ETL Acceleration → EDW Optimization → Agile Exploration → Data Science → Converged Analytics

The Typical Enterprise Data Analytics Stack
• Business Intelligence / Applications
• RDBMS
• ETL Processing
• Staging / Storage
• Collection

Step 1: EDH for Storage/Staging/Active Archive
• Business Intelligence / Applications
• RDBMS
• ETL Processing
• EDH for Storage & Active Archive
• Collection

Step 2: EDH for Data Collection (Flume/Sqoop)
• Business Intelligence / Applications
• RDBMS
• ETL Processing
• EDH for Collection & Storage

Step 3: EDH for ETL Processing Acceleration
• Business Intelligence / Applications
• RDBMS
• ETL / Data Integration Tools
• EDH for Collection, Storage & ETL Processing Acceleration

Step 4: EDH for EDW Optimization (Impala)
• Business Intelligence / Applications
• RDBMS, with rarely used data moved to the EDH
• EDH for Collection, Storage, ETL Acceleration & Historical RDBMS Data/Queries

Step 5: EDH for Agile Exploration
• BI / Applications; Agile Exploration
• RDBMS
• EDH for Collection, Storage, ETL Acceleration, Historical Queries & Agile Exploration

Step 6: EDH for Data Science (Not Only SQL)
• BI / Applications; Agile Exploration; Data Science
• RDBMS
• EDH for Collection, Storage, ETL Acceleration, Historical Queries, Exploration & Data Science

Step 7: Converged Analytics - Apps Come to Data
• BI; Explore; Data Science; SAS, R, Spark; Informatica, SyncSort, Pentaho; Hunk ...
• RDBMS
• EDH for Collection, Storage, ETL Acceleration, Historical Queries, Exploration, Data Science & Multiple Applications/Workloads

The Traditional Way: Bringing Data to Compute
1. Missing Data: leaving data behind; risk and compliance; high cost of storage
2. Time to Data: up-front modeling; transforms slow; transforms lose data
3. Cost of Analytics: existing systems strained; no agility; “BI backlog”
4. Complex Architecture: many special-purpose systems; moving data around; no complete views
Systems: EDWs, marts, servers, documents, storage, search, archive
Data sources: ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources

The New Way: Bringing Compute to Data
1. Active Compliance Archive: full fidelity original data; indefinite time, any source; lowest cost storage
2. Persistent Staging: one source of data for all analytics; persist state of transformed data; significantly faster & cheaper
3. Self-Service Exploratory BI: simple search + BI tools; “schema on read” agility; reduce BI user backlog requests
4. Diverse Analytic Platform: bring applications to data; combine different workloads on common data (e.g. SQL + Search); true analytic agility
Systems: EDWs, marts, servers, documents, storage, search, archive
Data sources: ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources
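“Schema on read” means raw data is stored as-is and a structure is applied only when a query runs, so different queries can interpret the same files differently. The following is a minimal Python sketch, assuming a hypothetical tab-separated web log format and hypothetical field names. It illustrates the idea only and is not how Impala or Hive implement it.

```python
import csv
import io
from typing import Dict, Iterator, List

# Raw data is landed untouched (full fidelity); no schema is enforced at write time.
RAW_LOG = (
    "2014-03-31\t/home\t200\t512\n"
    "2014-03-31\t/search\t200\t2048\n"
    "2014-04-01\t/home\t404\t0\n"
    "malformed line\n"
)

# Two hypothetical schemas applied to the same raw bytes at read time.
ACCESS_SCHEMA = ["date", "path", "status", "bytes"]
TRAFFIC_SCHEMA = ["date", "path"]


def read_with_schema(raw: str, columns: List[str]) -> Iterator[Dict[str, str]]:
    """Apply a schema while reading; rows that do not fit are skipped at query time."""
    for fields in csv.reader(io.StringIO(raw), delimiter="\t"):
        if len(fields) < len(columns):
            continue  # the raw line stays stored; this schema simply ignores it
        yield dict(zip(columns, fields))


def error_count(raw: str) -> int:
    """Query 1: count non-200 responses using the full access schema."""
    return sum(1 for row in read_with_schema(raw, ACCESS_SCHEMA) if row["status"] != "200")


def hits_per_path(raw: str) -> Dict[str, int]:
    """Query 2: a narrower schema over the same raw data."""
    counts: Dict[str, int] = {}
    for row in read_with_schema(raw, TRAFFIC_SCHEMA):
        counts[row["path"]] = counts.get(row["path"], 0) + 1
    return counts


if __name__ == "__main__":
    print(error_count(RAW_LOG))    # 1
    print(hits_per_path(RAW_LOG))  # {'/home': 2, '/search': 1}
```

Because the schema lives in the query rather than in the storage layer, adding a new field later means writing a new reader, not reloading the data. The trade-off is that malformed records surface only at query time.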

Evolution of The Enterprise Data Hub
CLOUDERA’S ENTERPRISE DATA HUB
• Batch Processing: MapReduce
• Analytic SQL: Impala
• Search Engine: Solr
• Machine Learning: Spark
• Stream Processing: Spark Streaming
• 3rd Party Apps
• Workload Management: YARN
• Storage for Any Type of Data (unified, elastic, resilient, secure): Filesystem (HDFS), Online NoSQL (HBase)
• Security: Sentry
• Data Management: Cloudera Navigator
• System Management: Cloudera Manager
Open Source, Scalable, Flexible, Cost-Effective
Managed, Open Architecture, Secure and Governed

The Modern Information Architecture
• Users: Data Architects, System Operators, Engineers, Data Scientists, Analysts, Business Users
• Tools: Meta Data / ETL Tools, Cloudera Manager, Converged Analytics, Data Modeling, BI / Analytics, Enterprise Reporting
• Data platforms: Enterprise Data Warehouse, Enterprise Data Hub
• Sources and serving: Sys Logs, Web Logs, Files, Online Serving System, RDBMS
• Web/Mobile Application: Customers & End Users

The Power of the EDH is?
EDH | RDBMS

Enabling The App Store of Big Data
• BI and Analytics Partners
• SI, Cloud, MSP Partners
• Database Partners
• Resellers
• Data Integration Partners
• Hardware Partners

Thank You! Twitter: @awadallah

