Business Data Lake as a Platform for Big Data to Transform your Business

Wittaya Warunchaichna, Big Data Business Manager, Southeast Asia

COLLECT, STORE, ANALYZE & USE TRADITIONAL AND EMERGING SOURCES

Traditional

Emerging

Archive

Video

Public records

Social Networks, User Generated Content

Enterprise File Data

Machine Data

Internet Of Things

Location Data

© Copyright 2016 Dell Inc.

UNSTRUCTURED DATA GROWTH

Total Capacity Shipped, Worldwide
2013: 37 EB
2015: 71 EB
2017: 133 EB

Chart labels: BIG DATA; 80%; 74%; 67%

TWO STORAGE WORLDS CONVERGE…

Enterprise @Scale
Enterprise IT
Unstructured Data

Diagram labels: Traditional Workloads; Emerging Workloads; DAS; NAS; SAN; TAPE
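The shipped-capacity figures in the UNSTRUCTURED DATA GROWTH chart above (37 EB in 2013, 71 EB in 2015, 133 EB in 2017) imply a compound annual growth rate. The sketch below works it out. The function name `cagr` and the dictionary `shipped_eb` are illustrative names, not anything from the slides.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Worldwide shipped capacity in exabytes, as quoted on the slide.
shipped_eb = {2013: 37, 2015: 71, 2017: 133}

overall = cagr(shipped_eb[2013], shipped_eb[2017], 2017 - 2013)
first_half = cagr(shipped_eb[2013], shipped_eb[2015], 2015 - 2013)
second_half = cagr(shipped_eb[2015], shipped_eb[2017], 2017 - 2015)

print(f"2013-2017: {overall:.1%} per year")
print(f"2013-2015: {first_half:.1%} per year")
print(f"2015-2017: {second_half:.1%} per year")
```

On these numbers, shipped capacity grows by a little under 40% a year across the whole period, so it nearly doubles every two years.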

INTEROPERABILITY FOR SCALE-OUT DATA LAKE

Diagram labels: Scale-Out Data Lake; Emerging Workloads; FILE; OBJECT; CLOUD; DAS; NAS; SAN; TAPE

© Copyright 2014 EMC Corporation. All rights reserved

ADVANTAGES OF A SCALE-OUT DATA LAKE

• Eliminate inefficient islands of storage

• Provide data protection and security

• Simplify management and reduce costs

• Accelerate data analytics

• Enable better information sharing

• Support data-driven decision making

MODERNIZE THE DATA WAREHOUSE

CURRENT CHALLENGES WITH EXISTING EDWH

HOW DO YOU LOWER & CONTROL DWH COSTS

1. Throw Data Away
2. Waste capacity on low value workloads
3. Unable to leverage new data sources

Diagram labels: Source Data; APP; Transactions, OLTP, OLAP; Documents and Emails; STAGING; ELT; Data Warehouse; Analytic Query & Reporting; Business Intelligence

CURRENT CHALLENGES WITH EDWH

1. Hot versus cold data storage
– 70% of data in a typical enterprise environment is unused

2. Processing capacity
– On average 55% of CPU capacity is low value ETL

3. New data sources
– Traditional systems are unable to capture and use new data sources, such as unstructured or semi-structured data

DATA ARCHITECTURE OPTIMIZATION WITH HADOOP

HOW DO YOU LOWER & CONTROL DWH COSTS

1. Don’t throw data away
2. Reclaim Enterprise Data Warehouse for high value BI
3. Leverage new data sources

Diagram labels: Source Data; APP; Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Internet of Things; STAGING; Data Lake; ETL; Data Warehouse; Analytic Query & Reporting; Data Discovery & Business Analytics

EDWH OPTIMIZATION WITH ISILON & HADOOP

5-YEAR TCO: $23K per TB vs. $7K per TB

Diagram labels: ETL; DWH MPP; Active Archive; HADOOP DATA LAKE; ISILON SCALE OUT NAS STORAGE

MODERN DATA LAKE ENVIRONMENT

Hadoop as the Foundation of your Data Management and Analytics Architecture

BI / DWH Environment
• Production
• Predictable Load
• SLA Driven
• Heavily Governed
• Standard Tools

Analytics / Sandbox Environment
• Exploratory, Ad Hoc
• Unpredictable Load
• Experimentation
• Loosely Governed
• Best Tools

Diagram labels: All Data Fed Into The Data Lake; Data Prep & Enrichment; Analytics Sandbox
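The 5-year TCO figures quoted above ($23K per TB versus $7K per TB) can be turned into a rough savings estimate. The sketch below assumes, as the slide layout suggests but does not state outright, that the higher figure applies to the DWH MPP and the lower one to the Hadoop data lake on Isilon. The 100 TB warehouse size is hypothetical, and the 70% offload fraction simply reuses the deck's "70% of data is unused" figure as an illustration.

```python
# 5-year TCO per TB in USD, taken from the slide.
# Assumption: $23K is the DWH MPP side and $7K is the Hadoop data lake side.
DWH_COST_PER_TB = 23_000
DATA_LAKE_COST_PER_TB = 7_000


def offload_savings(total_tb: float, offload_fraction: float) -> float:
    """5-year savings from moving a fraction of warehouse data to the lake."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    moved_tb = total_tb * offload_fraction
    return moved_tb * (DWH_COST_PER_TB - DATA_LAKE_COST_PER_TB)


# Relative cost reduction per TB moved.
savings_ratio = 1 - DATA_LAKE_COST_PER_TB / DWH_COST_PER_TB

# Hypothetical 100 TB warehouse with 70% of its data offloaded.
example = offload_savings(total_tb=100, offload_fraction=0.70)

print(f"Cost reduction per TB moved: {savings_ratio:.0%}")
print(f"5-year savings for the example: ${example:,.0f}")
```

The estimate only covers the per-TB cost difference. It leaves out migration effort and any capacity that has to stay in the warehouse for SLA-driven workloads.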

DELL EMC BIG DATA ECOSYSTEM

OCTOBER 2016

Diagram labels: DATA LAKE INFRASTRUCTURE APPLICATIONS; DATA LAKE; RDBMS; DATA WAREHOUSE; HADOOP SQL; STREAM CEP; STATISTICAL MODELING/NLP; EXPLORATION; TRANSFORM; BI; ENTERPRISE LOG ANALYSIS; CATALOG AND PROVISION; DISCOVER/MAP; SEARCH/INDEX; ORGANIZE/TAG; EMAIL; SOCIAL MEDIA; 3rd PARTY; MACHINE IOT

SCALE-OUT DATA LAKE REQUIREMENTS

• Support multiple workloads and applications
– Traditional and emerging

• Multi-protocol support

• Ability to scale capacity and performance

• Enterprise data protection

• Meet security and compliance requirements

• Efficient and easy to manage to reduce costs

• Powerful data analytics capabilities

Storage for the Scale-Out Data Lake

EMC ISILON SCALE-OUT NAS ENVIRONMENT

Client/Application Layer: Clients and Applications
Protocols (Multi-Protocol): NFS, SMB, HTTP, FTP, HDFS for Hadoop, REST for Object
RESTful API: GET, PUT, POST, DELETE
Ethernet Layer: Gig-E, 10 Gig-E Network
OneFS Operating Environment: ONEFS OPERATING SYSTEM; Single File System, One Namespace

Other diagram labels: High Performance; A Nodes; Linear I/O Performance; Isilon OneFS; Easy Growth

EMC Isilon – Scale-Out Performance

Tunable Performance and Capacity

Linear Scalability

Unmatched Efficiency

Intra-cluster Communication

Simple and Powerful!

Isilon Flexibility at Scale

Simplicity & Ease of Use


RAID Limitations at Scale

Diagram labels: RAID Controller – I/O Threshold; Non-linear I/O Performance

Scalability
• Isilon OneFS: Scale-Out (Performance, Capacity, Both tuneable)
• Standard NAS Technology: Scale-Up (Capacity only, limited performance options)

Performance
• Isilon OneFS: True linear predictability
• Standard NAS Technology: Degradation of performance & capacity at scale

ISILON PLATFORM PORTFOLIO

S-SERIES: High Transactional Platform
• Models: S210, S200
• SSD and SAS (16 TB – 4.15 PB)*

X-SERIES: High Throughput Platform
• Models: X410, X400, X200
• SSD and SATA (24 TB – 20.7 PB)*

NL-SERIES: Nearline Storage Platform
• Models: NL410
• SATA (108 TB – 30 PB)*

HD-SERIES: High Density Platform
• Models: HD400
• SATA (1 PB – 50 PB)*

A-SERIES: Backup and Performance Accelerators
• Models: A100 Performance Accelerator, A100 Backup Accelerator
• Scale performance without capacity

*Scales from 3 node cluster to 144 node cluster

Isilon’s Innovative OneFS Filesystem Architecture Matters

• Single file system across the entire cluster

• Policy driven automated data management

• Optimize storage resources

ISILON CLOUDPOOLS

Diagram labels: CORE; CLOUD PROVIDER; ECS; High Performance HOT DATA; Balanced Performance & Capacity; WARM DATA; Archive; FROZEN DATA; Deep Archive; >30 days; 1-2 Months; 1-2+ years
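The policy-driven tiering idea above (hot, warm, archive, and frozen data moved by age) can be sketched as a simple age-based rule table. This is an illustration only, not the OneFS or CloudPools policy engine or its API. The tier names, the function `choose_tier`, and the exact thresholds are hypothetical, loosely modelled on the slide's age labels (>30 days, 1-2 months, 1-2+ years).

```python
from datetime import datetime, timedelta

# Hypothetical rules: (maximum age since last access, tier name).
# The thresholds are assumptions inspired by the slide labels.
TIER_RULES = [
    (timedelta(days=30), "hot"),       # recently used, high-performance tier
    (timedelta(days=60), "warm"),      # balanced performance and capacity
    (timedelta(days=730), "archive"),  # up to roughly two years old
]
DEFAULT_TIER = "deep-archive"          # anything older counts as frozen data


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the first tier whose age limit covers the file's idle time."""
    age = now - last_access
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return DEFAULT_TIER


now = datetime(2017, 6, 12)
for idle_days in (5, 45, 400, 1000):
    tier = choose_tier(now - timedelta(days=idle_days), now)
    print(f"idle {idle_days:>4} days -> {tier}")
```

Keeping the rules in a data table rather than in code mirrors the "policy driven" framing: changing a threshold changes placement without touching the logic.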

EMC CONFIDENTIAL—INTERNAL USE ONLY


SEAMLESS CLOUD INTEGRATION

Diagram labels: CLOUD PROVIDER; ECS

ISILON, SCALE-OUT NAS FOR BIG DATA

SINGLE FILE SYSTEM, SINGLE VOLUME SIMPLICITY FOR ACTIVE, PERSISTENT, AND ARCHIVE DATA

• Quota management
• Thin provisioning
• High speed replication
• Disaster recovery
• Business continuance
• File immutability
• Protection from deletion/change
• Load balancing
• Seamless failover
• Performance zones
• Automated storage tiering
• Instant recovery
• Data protection

Diagram labels: APPS & USERS; Clients; Virtualized Servers; Client/Application Layer; Network; WAN/LAN; S-series; X-series; NL-series; Backup Accelerator; Primary & Nearline Storage; Local/Remote Archive

UNSTRUCTURED DATA

Unstructured Data/Content

• Archive/Compliance
• File Shares and Home directories
• VMware/NFS
• SQL/DB Dumps
• Social Media Feeds
• BLOBS
• Hadoop - Big Data
• Cloud/Object
• DW – ETL Offload
• VDI
• Splunk/M2M Log Files
• Video/Surveillance
• Broadcast/Content Streaming
• Call Centre CVR
• Backup
• Mobile Apps

Dell EMC Cloudera Hadoop solution with Isilon Shared Storage – Modular Infrastructure

Primary use case: High density Storage-centric consolidated Data Lake

Cloudera™ Enterprise Isilon OneFS
Scales from 100TB to 20 PB

Pod Network
• 2x Dell EMC Networking S4048 10GbE Pod Switches
• 1x Dell EMC Networking S3048 10GbE Pod Switches

Cluster Network
• 2x Dell EMC Networking S6000 40GbE Cluster Switches

Infrastructure Nodes
• 4x PowerEdge FC630 with 3x 1.2TB HDD per Sled

Compute Nodes
• 6x PowerEdge FC630 with 8x 1.2TB HDD per Sled

Shared Storage Nodes
• 4x Isilon X410 with 102TB HDD/ 3.2TB SSD/ 256 GB
• 2x QDR Infiniband Switch 8 ports

Solution benefits
• Scale Storage independently from Compute
• Minimize data movement
• Eliminate Shadow IT projects
• Current Isilon customers: leverage existing File Management processes

Differentiation
• Industry Leading storage density and scaling
• Consolidates data silos with one copy of data
• Enterprise-grade File Management
• File-level regulatory compliance out-of-the-box
• Current Isilon customers: brings analytics to where the data exists in Isilon

Dell - Internal Use - Confidential

Dell EMC Hortonworks Hadoop solution with Isilon Shared Storage – Modular Infrastructure

Primary use case: High density Storage-centric consolidated Data Lake

Hortonworks Data Platform Ent+ Isilon OneFS
Scales from 100TB to 20 PB

Pod Network
• 2x Dell EMC Networking S4048 10GbE Pod Switches
• 1x Dell EMC Networking S3048 10GbE Pod Switches

Cluster Network
• 2x Dell EMC Networking S6000 40GbE Cluster Switches

Infrastructure Nodes
• 4x PowerEdge FC630 with 3x 1.2TB HDD per Sled

Compute Nodes
• 6x PowerEdge FC630 with 8x 1.2TB HDD per Sled

Shared Storage Nodes
• 4x Isilon X410 with 102TB HDD/ 3.2TB SSD/ 256 GB
• 2x QDR Infiniband Switch 8 ports

Solution benefits
• Scale Storage independently from Compute
• Minimize data movement
• Eliminate Shadow IT projects
• Current Isilon customers: leverage existing File Management processes

Differentiation
• Industry Leading storage density and scaling
• Consolidates data silos with one copy of data
• Enterprise-grade File Management
• File-level regulatory compliance out-of-the-box
• Current Isilon customers: brings analytics to where the data exists in Isilon

SO WHERE & HOW TO START YOUR JOURNEY?

Thanks to our Sponsors

Dell EMC Solutions are powered by Intel®
