Grab some coffee and enjoy the pre-show banter before the top of the hour!

Synthesis: A New Webinar Series

THE LINE UP

ANALYST: Eric Kavanagh, CEO, InsideAnalysis

ANALYST: Wayne Eckerson, Founder, Business Analytics, Eckerson Group

ANALYST: Dave Wells, Director, Data Management Practice, Eckerson Group

GUEST: Joy King, VP, Vertica Product Management, Micro Focus

Agenda

• Context – Eric Kavanagh
• Conceptual Perspective – Wayne Eckerson
  – Discussion
• Logical Perspective – Dave Wells
  – Discussion
• Sponsor Perspective – Joy King
  – Discussion

Synthesis Series

© Eckerson Group 2017

Twitter: @weckerson

www.eckerson.com

Synthesis Series A joint production of

What is a Modern Data Warehouse? April 18, 2018


2018 Schedule

Synthesis Series
• Q2: Modern Data Warehousing
  – March 21st – “Is a Traditional DW Dead?”
  – April 18th – “What is a Modern DW?”
  – May 16th – “Is a Single Version of Truth Possible?”
  – June 13th – “Can Data Warehouses and Data Lakes Coexist?”
• Q3: Is AI the New BI?
• Q4: Modern Data Pipeline


Ten Characteristics

1. Customer-Centric
2. Adaptable
3. Automated
4. Elastic
5. Flexible
6. Collaborative
7. Governed
8. Simple
9. Real-time
10. Secure

BONUS: Resilient

From Wayne Eckerson, “Ten Characteristics of a Modern Data Architecture,” Jan 25, 2018


Discussion


Let’s Solve the Right Problem

Shift attention from modern data warehouse to modern data architecture.

Adapt to realities of modern data management:
• big data, unstructured data, and data lakes
• scale out and elastic infrastructure
• agility and self-service
• advanced analytics and machine learning

Retain the benefits of data warehousing:
• integrated and disparity-free data
• subject oriented and semantically aligned
• non-volatile record of business history
• time-variant data for time-series analysis

Apply architecture to build things that:
• are well suited to their purpose
• fit gracefully into the environment
• comply with codes and regulations
• are sustainable through needed lifespan
• are aesthetically pleasing
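Two of the warehousing benefits listed above, a non-volatile record of business history and time-variant data, can be illustrated in a few lines. This is a minimal sketch under assumed names (`HistoryTable`, `record`, and `as_of` are hypothetical, not from any product): a change is appended as a new dated version instead of overwriting the old value, so the state of a business key can be reconstructed for any past date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    key: str          # business key, e.g. a customer ID
    value: str        # tracked attribute, e.g. customer segment
    valid_from: date  # date this version became true for the business


class HistoryTable:
    """Append-only (non-volatile) history: updates add versions, never overwrite."""

    def __init__(self):
        self._rows = []

    def record(self, key, value, valid_from):
        self._rows.append(Version(key, value, valid_from))

    def as_of(self, key, when):
        """Return the value that was current for `key` on date `when`, or None."""
        candidates = [v for v in self._rows if v.key == key and v.valid_from <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda v: v.valid_from).value


history = HistoryTable()
history.record("C042", "smb", date(2017, 1, 1))
history.record("C042", "enterprise", date(2018, 3, 1))

print(history.as_of("C042", date(2017, 6, 30)))  # smb
print(history.as_of("C042", date(2018, 4, 18)))  # enterprise
```

Because nothing is overwritten, the 2017 answer stays reproducible after the 2018 change, which is what makes time-series analysis over business history possible.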


Modern Data Warehousing

• We don’t need a data warehouse, but we do need data warehousing!
• Data lake and data warehousing must work together.
• Publishing matters. Not everyone wants self-service.
• Legacy data warehouses exist and must be accommodated.
• The one-size-fits-all, right-for-everyone architecture doesn’t exist.


Legacy Data Warehouse Challenges

• Growth Management
• Workload Fluctuation
• Data Center Management
• Data Center & Operations Costs
• Processing Bottlenecks & Delays
• Projects Wait for Infrastructure
• Business Critical with Risks
• Security & Governance Challenged
• Complex Database Management

The data warehouse is not dead. It is alive, but not alive and well.


Data Warehousing in the Cloud

• SCALABILITY: growth in data, processing & users
• ELASTICITY: adapt to workload peaks and valleys
• MANAGED INFRASTRUCTURE: reduce or eliminate data center overhead
• COST SAVINGS: cut cost of hardware, staffing, power, etc.
• PROCESSING SPEED: fast data pipelines, no bottlenecks
• DEPLOYMENT SPEED: agility, no waiting for infrastructure
• DISASTER RECOVERY: virtualization benefits (copy, backup, etc.)
• SECURITY AND GOVERNANCE: service provider features + VPC advantages
• RDBMS IN THE CLOUD: gracefully accepts existing warehouse data


Rethinking Data Management Topology

[Diagram. Sources: Applications, Legacy OLTP, Social Media, Machine / IoT, Geospatial, and External (Web, Open, Commercial). Data stores: Data Core (Enterprise Data Hub, Data Lake), NoSQL, Hive, Legacy Data Warehouses, Operational Data Store, Master Data Repository, Analytic Sandboxes. Analytic uses: Reporting, OLAP, Scorecards, Dashboards, Exploration, Discovery, Forecasting, Prediction, Prescription, Automation.]

Data Warehousing and the Data Lake: Data Warehousing Outside the Data Lake

• Sources: Transaction, Web, 3rd Party, Social Media, Machine, Geospatial, Legacy
• Data Ingestion: ETL, ELT, Bulk Load, Stream Processing
• Data Lake: landing area for incoming data; raw data, refined data & sandboxes; security, sensitivity, and semantic tagging; classified by trust level (gold, silver, bronze)
• Data Refinement: ETL/ELT
• Data Warehouse: relational, subject-oriented, historical; bus or hub-and-spoke architecture; integrated, cleansed, aggregated; includes master reference & metrics data
• Data Access & Data Preparation with security & governance controls
• Applications: Reporting, OLAP, Scorecards, Dashboards, Exploration, Analytics
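The data lake on this slide tags data for security, sensitivity, and semantics and classifies it by trust level (gold, silver, bronze). A minimal sketch of how such a classification might be recorded is shown below. The `LakeDataset` fields, the tag strings, and the rules in `trust_level` are all hypothetical, since the slide does not say what qualifies data for each level.

```python
from dataclasses import dataclass, field


@dataclass
class LakeDataset:
    name: str
    zone: str                # "raw" or "refined" (hypothetical labels)
    has_steward: bool        # a data steward has signed off (hypothetical rule input)
    quality_checked: bool    # passed validation rules (hypothetical rule input)
    tags: set = field(default_factory=set)  # security, sensitivity, semantic tags


def trust_level(ds):
    """Assign gold/silver/bronze using hypothetical rules."""
    if ds.zone == "refined" and ds.has_steward and ds.quality_checked:
        return "gold"
    if ds.quality_checked:
        return "silver"
    return "bronze"


datasets = [
    LakeDataset("web_clicks_raw", "raw", False, False, {"source:web"}),
    LakeDataset("orders_clean", "refined", True, True, {"sensitivity:pii", "subject:sales"}),
    LakeDataset("iot_readings", "raw", False, True, {"source:machine"}),
]
for ds in datasets:
    print(ds.name, trust_level(ds), sorted(ds.tags))
```

Keeping the tags and the trust level as metadata next to each dataset lets the access layer filter by level (for example, only gold data in published reports) without moving the data.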

Data Warehousing and the Data Lake: Data Warehousing Inside the Data Lake

• Sources: Transaction, Web, 3rd Party, Social Media, Machine, Geospatial, Legacy
• Data Ingestion: ETL, ELT, Bulk Load, Stream Processing
• Data Lake zones:
  – Raw Data Zone: ingest & lightly tag
  – Refined Data Zone: curate, improve & fully tag
  – Analytic Sandboxes: explore & discover
  – Data Warehouse: integrate & aggregate
• Data Access & Data Preparation with security & governance controls
• Applications: Reporting, OLAP, Scorecards, Dashboards, Exploration, Analytics
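The zone flow on this slide (ingest and lightly tag, then curate, improve and fully tag, then integrate and aggregate) can be sketched as three small functions. This is an illustrative toy under stated assumptions: records are in-memory dictionaries, the field and tag names are made up, the analytic sandboxes zone is omitted, and a real pipeline would run on the lake's own storage and processing engines.

```python
from collections import defaultdict

# Hypothetical raw records as they land from a transaction source.
raw_zone = [
    {"order_id": "1", "region": " east ", "amount": "120.50"},
    {"order_id": "2", "region": "WEST", "amount": "80"},
    {"order_id": "2", "region": "WEST", "amount": "80"},          # duplicate delivery
    {"order_id": "3", "region": "East", "amount": "not-a-number"},
]


def ingest(record):
    """Raw zone: keep the record as delivered and add light tags."""
    return {**record, "_tags": {"source": "transaction"}}


def refine(records):
    """Refined zone: cleanse types, standardize values, drop duplicates and bad rows."""
    seen, out = set(), []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine the row, not drop it silently
        if r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        out.append({"order_id": r["order_id"],
                    "region": r["region"].strip().lower(),
                    "amount": amount,
                    "_tags": {**r["_tags"], "subject": "sales", "quality": "validated"}})
    return out


def warehouse(records):
    """Warehouse: integrate and aggregate into a subject-oriented summary."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(sorted(totals.items()))


landed = [ingest(r) for r in raw_zone]
refined = refine(landed)
print(warehouse(refined))  # {'east': 120.5, 'west': 80.0}
```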

Building for the Future

• It’s not a data warehouse problem. It’s an architecture problem.
• We don’t need a data warehouse, but we do need data warehousing!
• Migrating to the cloud can help, but it doesn’t fix everything.
• Rethinking topology for data lakes and warehousing is essential.
• It’s time for new architecture, and it must be ADAPTABLE architecture.


Discussion


The Traditional Data Warehouse

Tied to Hardware
• Software and hardware sold as an appliance
• Requires capital investment and “forklifting”

Built for On-premises Only
• Limited deployment options
• Bolt-on cloud functionality

Storage / Compute Coupled
• Scaling storage means scaling compute
• Over-provisioning resources for the most demanding scenarios

Standard SQL
• ANSI SQL compliant
• Limited advanced analytics or machine learning

Data Stored in a Silo
• Requires moving data into the data warehouse for analytics
• Limited functionality to query data in place

What is a Modern Data Warehouse?

Analyze Data in the Right Place
• Advanced compression for native data storage
• Run queries against S3 and HDFS data lakes

Freedom from Underlying Infrastructure
• Software-only deployment model
• Available on all major clouds, on-premises, and natively on Hadoop

Separation of Compute and Storage
• Scale storage and compute independently
• Rapid and flexible provisioning of resources for the most demanding scenarios

Advanced Analytics and Machine Learning
• In-database machine learning functions
• No downsampling or data movement
• R, Python, C++ and Java extensibility
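The slide mentions advanced compression for native data storage without naming a scheme. Run-length encoding of a sorted, low-cardinality column is one widely used columnar technique, shown here purely as an illustration and not as a description of the product's actual encodings. The column values are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# A hypothetical low-cardinality column, kept sorted as a columnar store might.
region = sorted(["east"] * 4 + ["west"] * 3 + ["north"] * 2)
encoded = rle_encode(region)
print(encoded)                        # [('east', 4), ('north', 2), ('west', 3)]
print(rle_decode(encoded) == region)  # True
```

Sorting is what makes this effective: nine stored values collapse to three pairs, and the savings grow with row count while the number of distinct values stays small.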

Analyze Data in the Right Place

• Data lakes are a cost-effective way to store large volumes of data, but performing analytics on this data requires moving it in and out of a traditional data warehouse.
• As data lake volumes grow, more and more historical data goes underutilized by the organization, creating a balancing act between cost-effective data storage and performant analytics.

How a Modern Data Warehouse Addresses This:
• Analyze ORC and Parquet data stored on Hadoop or Amazon S3 using External Tables.
• Create JOINs between the data warehouse and data lake to explore all of your data with the full range of SQL queries and advanced analytics.
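Querying lake data in place and joining it with warehouse tables can be illustrated with Python's built-in `sqlite3` module, whose `ATTACH DATABASE` statement lets a single SQL query read tables from a second database file without copying them. This is only an analogy for the External Tables described above: SQLite is not Vertica, and the file, table, and column names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical "data lake" file: raw clickstream kept in its own store.
lake_path = os.path.join(tempfile.mkdtemp(), "lake.db")
lake = sqlite3.connect(lake_path)
lake.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
lake.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [(1, "/pricing"), (1, "/docs"), (2, "/pricing")])
lake.commit()
lake.close()

# Hypothetical "warehouse": a curated customer dimension.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
dw.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "enterprise"), (2, "smb")])
dw.commit()  # SQLite does not allow ATTACH inside an open transaction

# Attach the lake file where it is (no copy) and join across both stores.
dw.execute("ATTACH DATABASE ? AS lake", (lake_path,))
rows = dw.execute("""
    SELECT c.segment, COUNT(*) AS clicks
    FROM customers AS c
    JOIN lake.clicks AS k ON k.customer_id = c.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
""").fetchall()
print(rows)  # [('enterprise', 2), ('smb', 1)]
```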

Freedom from Underlying Infrastructure

• One core analytics engine available on-premises, on Hadoop, and in the clouds
• Utilizes industry-standard hardware
• Available on all major public clouds (AWS, Azure and Google Cloud Platform), with tight integration with other cloud services (BI, ETL, S3, etc.)
• Seamlessly replicate the data warehouse between on-premises and cloud, or across different cloud providers
• Simple and rapid deployment via cloud marketplaces

Separation of Compute and Storage

• As more companies move to the cloud, separating compute and storage becomes critical to capitalizing on cloud economics.
• Tightly coupled compute and storage forces companies to overprovision for worst-case scenarios.

With Separation of Compute and Storage:
• Simply provision cloud infrastructure for standard workloads, and add more compute power when required.
• No more procuring, configuring, and administering costly resources for worst-case scenarios.
• Data platform costs are tied directly to business value and variable workloads.

[Diagram: Underutilized Compute vs. Dynamic Compute]
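The over-provisioning argument is easy to quantify. The sketch below compares compute node-hours for a cluster sized once for its worst-case hour against compute that follows demand hour by hour. The demand profile is hypothetical, storage cost is assumed to be identical in both cases, and the sketch ignores the scaling granularity and start-up delays of real elastic compute.

```python
def node_hours(demand, elastic):
    """Compute node-hours paid for an hourly demand profile.

    Coupled: the cluster is sized once for the worst-case hour and runs all day.
    Elastic: compute is assumed to scale exactly to each hour's demand.
    """
    return sum(demand) if elastic else max(demand) * len(demand)


# Hypothetical day: 4 nodes of work for 20 hours, 16 nodes during a 4-hour peak.
demand = [4] * 20 + [16] * 4

coupled = node_hours(demand, elastic=False)   # 16 * 24 = 384
elastic = node_hours(demand, elastic=True)    # 4 * 20 + 16 * 4 = 144
utilization = elastic / coupled               # share of paid compute actually used

print(coupled, elastic, utilization)  # 384 144 0.375
```

Under these assumed numbers the coupled cluster pays for 384 node-hours to do 144 node-hours of work, so its compute is busy only 37.5% of the time.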

Advanced Analytics and Machine Learning

[Diagram: the Modern Data Warehouse core (ANSI SQL, speed, scalability through Massively Parallel Processing) with built-in analytic functions: Statistical Summary, Outlier Detection, Time Series, Sessionize, Pattern Matching, Date/Time Algebra, Window/Partition, Date Type Handling, Normalization, and more.]
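Sessionize is listed among the built-in functions above. The slide does not define it, so the sketch below assumes the common gap-based rule: a user's new session starts whenever the time since that user's previous event exceeds a timeout (30 minutes here, a hypothetical choice). In a database this is typically expressed with window functions partitioned by user and ordered by time; the Python only shows the logic.

```python
from datetime import datetime, timedelta


def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign a session number to each (user, timestamp) event.

    Returns (user, timestamp, session_id) tuples sorted by user, then time.
    """
    result = []
    last_seen = {}   # user -> timestamp of that user's previous event
    session_no = {}  # user -> current session counter
    for user, ts in sorted(events):
        prev = last_seen.get(user)
        if prev is None or ts - prev > timeout:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        result.append((user, ts, session_no[user]))
    return result


t0 = datetime(2018, 4, 18, 12, 0)
clicks = [("u1", t0), ("u1", t0 + timedelta(minutes=10)),
          ("u1", t0 + timedelta(minutes=55)), ("u2", t0 + timedelta(minutes=5))]
for user, ts, session in sessionize(clicks):
    print(user, ts.strftime("%H:%M"), session)
# u1 12:00 1
# u1 12:10 1
# u1 12:55 2
# u2 12:05 1
```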

[Diagram: in-database machine learning across the analytics lifecycle: Business Understanding, Data Analysis & Understanding, Data Preparation, Modeling, Evaluation, Deployment. Functions shown include Imbalanced Data Processing, Sampling, Missing Value Imputation, Random Forests, Logistic Regression, Linear Regression, Ridge Regression, Naive Bayes, SVM, ROC Tables, Error Rate, Lift Table, Confusion Matrix, Cross Validation, R-Squared, MSE, Model-level Stats, In-Database Scoring, and more. Other labels: SQL, Speed, Scale, Security, Sequences, Deploy Anywhere.]
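Several of the evaluation functions named in this diagram have short standard definitions. The sketch below computes a binary confusion matrix, error rate, MSE, and R-squared in plain Python on hypothetical predictions. It illustrates what the metrics measure and is not the product's in-database implementation.

```python
def confusion_matrix(actual, predicted):
    """Binary confusion matrix counts as (tp, fp, fn, tn); labels are 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def error_rate(actual, predicted):
    """Fraction of predictions that disagree with the actual label."""
    return sum(1 for a, p in zip(actual, predicted) if a != p) / len(actual)


def mse(y, y_hat):
    """Mean squared error of a regression model's predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Hypothetical classifier output.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))  # (3, 1, 1, 3)
print(error_rate(actual, predicted))        # 0.25

# Hypothetical regression output.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 7.5, 9.0]
print(mse(y, y_hat))                        # 0.125
print(f"{r_squared(y, y_hat):.3f}")         # 0.975
```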

Vertica is the Industry’s Only Infrastructure-Agnostic, Unified Advanced Analytics Platform for All Your Data

• Analyze in the Right Place
• Advanced Analytics and In-Database Machine Learning
• Freedom from Underlying Infrastructure
• Strong, Reliable Performance at Exabyte Scale

Learn More: www.vertica.com
Try it Free: www.vertica.com/try
Joy King, VP, Product Management and Product Marketing

Discussion

