Synthesis: A New Webinar Series
THE LINE UP
ANALYST: Eric Kavanagh, CEO, InsideAnalysis
ANALYST: Wayne Eckerson, Founder, Business Analytics, Eckerson Group
ANALYST: Dave Wells, Director, Data Management Practice, Eckerson Group
GUEST: Joy King, VP, Vertica Product Management, Micro Focus
Agenda
• Context – Eric Kavanagh
• Conceptual Perspective – Wayne Eckerson
  – Discussion
• Logical Perspective – Dave Wells
  – Discussion
• Sponsor Perspective – Joy King
  – Discussion
© Eckerson Group 2017
Twitter: @weckerson
www.eckerson.com
Synthesis Series A joint production of
What is a Modern Data Warehouse? April 18, 2018
2018 Schedule
• Q2: Modern Data Warehousing
  – March 21st – “Is a traditional DW Dead?”
  – April 18th – “What is a Modern DW?”
  – May 16th – “Is a Single Version of Truth Possible?”
  – June 13th – “Can Data Warehouses and Data Lakes Coexist?”
• Q3: Is AI the New BI?
• Q4: Modern Data Pipeline
Ten Characteristics
1. Customer-Centric
2. Adaptable
3. Automated
4. Elastic
5. Flexible
6. Collaborative
7. Governed
8. Simple
9. Real-time
10. Secure
BONUS: Resilient
From Wayne Eckerson, “Ten Characteristics of a Modern Data Architecture,” Jan 25, 2018
Discussion
Let’s Solve the Right Problem
Shift attention from modern data warehouse to modern data architecture.
Adapt to realities of modern data management
• big data, unstructured data, and data lakes
• scale out and elastic infrastructure
• agility and self-service
• advanced analytics and machine learning
Retain the benefits of data warehousing
• integrated and disparity-free data
• subject oriented and semantically aligned
• non-volatile record of business history
• time-variant data for time-series analysis
Apply architecture to build things that
• are well suited to their purpose
• fit gracefully into the environment
• comply with codes and regulations
• are sustainable through needed lifespan
• are aesthetically pleasing
Modern Data Warehousing
We don’t need a data warehouse, but we do need data warehousing!
• Data lake and data warehousing must work together.
• Publishing matters. Not everyone wants self-service.
• Legacy data warehouses exist and must be accommodated.
• The one-size-fits-all, right-for-everyone architecture doesn’t exist.
Legacy Data Warehouse Challenges
The data warehouse is not dead. It is alive, but not alive and well.
Business Critical with Risks
• Growth Management
• Workload Fluctuation
• Data Center Management
• Data Center & Operations Costs
• Processing Bottlenecks & Delays
• Projects Wait for Infrastructure
• Security & Governance Challenged
• Complex Database Management
Data Warehousing in the Cloud
• SCALABILITY: growth in data, processing & users
• ELASTICITY: adapt to workload peaks and valleys
• MANAGED INFRASTRUCTURE: reduce or eliminate data center overhead
• COST SAVINGS: cut cost of hardware, staffing, power, etc.
• PROCESSING SPEED: fast data pipelines, no bottlenecks
• DEPLOYMENT SPEED: agility, no waiting for infrastructure
• DISASTER RECOVERY: virtualization benefits: copy, backup, etc.
• SECURITY AND GOVERNANCE: service provider features + VPC advantages
• RDBMS IN THE CLOUD: gracefully accepts existing warehouse data
Rethinking Data Management Topology
[Figure: sources feed a data core that supports a stack of uses. Labels follow.]
Sources: Web, Open, Commercial, Social Media, Machine / IoT, Geospatial, Legacy OLTP, External
Data Core: Enterprise Data Hub, Data Lake, Applications, NoSQL, Legacy Data Warehouses, Hive, Operational Data Store, Master Data Repository, Analytic Sandboxes
Uses: Automation, Prescription, Prediction, Forecasting, Discovery, Exploration, Dashboards, Scorecards, OLAP, Reporting
Data Warehousing and the Data Lake
Data Warehousing Outside the Data Lake
Sources: Transaction, Web, 3rd Party, Social Media, Machine, Geospatial, Legacy
Data Ingestion: ETL, ELT, Bulk Load, Stream Processing
Data Lake
• landing area for incoming data
• raw data, refined data & sandboxes
• security, sensitivity, and semantic tagging
• classified by trust level (gold, silver, bronze)
Data Refinement: ETL/ELT
Data Warehouse
• relational, subject-oriented, historical
• bus or hub-and-spoke architecture
• integrated, cleansed, aggregated
• includes master reference & metrics data
Data Access & Data Preparation with security & governance controls
Applications
Reporting, OLAP, Scorecards, Dashboards, Exploration, Analytics
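The tagging and trust-level ideas on this slide can be sketched as a small catalog structure. This is only an illustration: the class names, the zone and sensitivity labels, and the rule that only refined, gold-level data is loaded into the warehouse are all assumptions, not something the slides specify.

```python
from dataclasses import dataclass, field
from enum import Enum


class TrustLevel(Enum):
    """Trust classes named on the slide; the numeric ordering is an assumption."""
    BRONZE = 1
    SILVER = 2
    GOLD = 3


@dataclass
class LakeDataset:
    """Hypothetical catalog entry for one dataset in the data lake."""
    name: str
    zone: str                      # assumed values: "raw", "refined", "sandbox"
    trust: TrustLevel
    sensitivity: str = "internal"  # assumed values: "public", "internal", "restricted"
    semantic_tags: set = field(default_factory=set)


def eligible_for_warehouse(ds: LakeDataset) -> bool:
    """Assumed refinement rule: only refined, gold-level data moves to the warehouse."""
    return ds.zone == "refined" and ds.trust is TrustLevel.GOLD


catalog = [
    LakeDataset("clickstream_landing", "raw", TrustLevel.BRONZE, semantic_tags={"web"}),
    LakeDataset("orders_curated", "refined", TrustLevel.GOLD, "restricted", {"sales", "customer"}),
    LakeDataset("churn_experiment", "sandbox", TrustLevel.SILVER, semantic_tags={"customer"}),
]

# Names of the datasets the assumed rule would pass on to ETL/ELT.
to_load = [ds.name for ds in catalog if eligible_for_warehouse(ds)]
print(to_load)
```

A real catalog would likely attach these tags at ingestion and enforce them in the access layer; the point here is only that trust level and tags are data about the data, kept alongside it.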
Data Warehousing and the Data Lake
Data Warehousing Inside the Data Lake
Sources: Transaction, Web, 3rd Party, Social Media, Machine, Geospatial, Legacy
Data Ingestion: ETL, ELT, Bulk Load, Stream Processing
Data Lake
• Raw Data Zone: ingest & lightly tag
• Refined Data Zone: curate, improve & fully tag
• Analytic Sandboxes: explore & discover
• Data Warehouse: integrate & aggregate
Data Access & Data Preparation with security & governance controls
Applications
Reporting, OLAP, Scorecards, Dashboards, Exploration, Analytics
Building for the Future
• It’s not a data warehouse problem. It’s an architecture problem.
• We don’t need a data warehouse, but we do need data warehousing!
• Migrating to the cloud can help, but it doesn’t fix everything.
• Rethinking topology for data lakes and warehousing is essential.
• It’s time for new architecture and it must be ADAPTABLE architecture.
Discussion
The Traditional Data Warehouse
Tied to Hardware
• Software and hardware sold together as an appliance
• Requires capital investment and “forklifting”
Built for On-premises Only
• Limited deployment options
• Bolt-on cloud functionality
Storage / Compute Coupled
• Scaling storage means scaling compute
• Over-provisioning resources for the most demanding scenarios
Standard SQL
• ANSI SQL compliant
• Limited advanced analytics or machine learning
Data Stored in a Silo
• Requires moving data into the data warehouse for analytics
• Limited functionality to query data in place
What is a Modern Data Warehouse?
Analyze Data in the Right Place
• Advanced compression for native data storage
• Run queries against S3 and HDFS data lakes
Freedom from Underlying Infrastructure
• Software-only deployment model
• Available on all major clouds, on premises, and natively on Hadoop
Separation of Compute and Storage
• Scale storage and compute independently
• Rapid and flexible provisioning of resources for the most demanding scenarios
Advanced Analytics and Machine Learning
• In-database machine learning functions
• No down sampling or data movement
• R, Python, C++ and Java extensibility
Analyze Data in the Right Place
• Data lakes are a cost-effective way to store large volumes of data, but performing analytics on this data requires moving it in and out of a traditional data warehouse.
• As data lake volumes grow, more and more historical data goes underutilized by the organization, creating a balancing act between cost-effective data storage and performant analytics.
How a Modern Data Warehouse Addresses This:
• Analyze ORC and Parquet data stored on Hadoop or Amazon S3 using External Tables
• Create JOINs between the data warehouse and data lake to explore all of your data with the full range of SQL queries and advanced analytics
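The idea of joining warehouse data with lake data that stays in place can be illustrated with a toy sketch. Everything here is a stand-in chosen for portability: an in-memory SQLite database plays the warehouse, a CSV file plays the ORC/Parquet data on S3 or Hadoop, and the table, file, and function names are hypothetical. In the product described on the slide the join would be written in SQL against an external table; this sketch performs it in application code only to show that the lake file is read where it sits and never loaded into the warehouse.

```python
import csv
import os
import sqlite3
import tempfile

# Stand-in for the data lake: one file that stays where it is.
lake_dir = tempfile.mkdtemp()
lake_file = os.path.join(lake_dir, "clicks.csv")
with open(lake_file, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "page"])
    writer.writerows([[1, "/pricing"], [2, "/docs"], [1, "/signup"]])

# Stand-in for the warehouse: a curated customer dimension.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
warehouse.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])


def scan_lake(path):
    """Read the lake file in place, one row at a time."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield int(row["customer_id"]), row["page"]


def join_lake_to_warehouse(conn, path):
    """Inner-join lake rows to warehouse customers and count clicks per customer name."""
    names = dict(conn.execute("SELECT customer_id, name FROM customers"))
    counts = {}
    for customer_id, _page in scan_lake(path):
        name = names.get(customer_id)
        if name is not None:  # unmatched lake rows are dropped, as in an inner join
            counts[name] = counts.get(name, 0) + 1
    return counts


clicks_per_customer = join_lake_to_warehouse(warehouse, lake_file)
print(clicks_per_customer)
```

After the join, the warehouse still contains only the `customers` table; the click data was scanned from the file and discarded.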
Freedom from Underlying Infrastructure
• One core analytics engine available on-premises, on Hadoop, and in the cloud
• Utilizes industry-standard hardware
• Available on all major public clouds (AWS, Azure, and Google Cloud Platform), with tight integration with other cloud services (BI, ETL, S3, etc.)
• Seamlessly replicate the data warehouse between on-premises and cloud, or across different cloud providers
• Simple and rapid deployment via cloud marketplaces
Separation of Compute and Storage
• As more companies move to the cloud, separating compute and storage becomes critical to capitalizing on cloud economics.
• Tightly coupled compute and storage forces companies to overprovision for worst-case scenarios.
With Separation of Compute and Storage:
• Simply provision cloud infrastructure for standard workloads, and add more compute power when required.
• No more procuring, configuring, and administering costly resources for worst-case scenarios.
• Data platform costs are tied directly to business value and variable workloads.
[Figure: Underutilized Compute (coupled) vs. Dynamic Compute (separated)]
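A rough back-of-the-envelope sketch shows why provisioning for the worst case is costly. The demand profile, the node counts, and both function names are made up for illustration, and the model ignores storage cost, scale-up latency, and pricing differences between deployment options.

```python
def node_hours_coupled(hourly_demand):
    """Coupled model: a cluster sized for the peak hour runs for every hour."""
    return max(hourly_demand) * len(hourly_demand)


def node_hours_elastic(hourly_demand, baseline):
    """Separated model: a baseline stays on, and extra compute is added only when demand exceeds it."""
    return sum(max(baseline, demand) for demand in hourly_demand)


# Hypothetical day: 20 hours needing 4 compute nodes, 4 peak hours needing 16.
demand = [4] * 20 + [16] * 4

coupled = node_hours_coupled(demand)              # 16 nodes * 24 hours
elastic = node_hours_elastic(demand, baseline=4)  # 20 * 4 + 4 * 16
savings = 1 - elastic / coupled

print(coupled, elastic, round(savings, 3))
```

Under these assumed numbers the peak-sized cluster consumes 384 node-hours against 144 for the elastic one, so most of the coupled cluster's capacity sits idle outside the four peak hours.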
Advanced Analytics and Machine Learning
[Figure: analytic functions and a SQL-driven machine learning workflow built on the Modern Data Warehouse. Labels follow.]
Platform: Modern Data Warehouse, ANSI SQL, Massively Parallel Processing, Speed, Scalability
Analytic functions: Statistical Summary, Machine Learning, Outlier Detection, Time Series, Sessionize, Pattern Matching, Date/Time Algebra, Window/Partition, Date Type Handling, Normalization, Sequences, and more
Workflow stages (SQL): Business Understanding, Data Analysis & Understanding, Data Preparation, Modeling, Evaluation, Deployment
Data preparation: Imbalanced Data Processing, Sampling, Missing Value Imputation, and more
Modeling: Random Forests, Logistic Regression, Linear Regression, Ridge Regression, Naive Bayes, SVM, Cross Validation, and more
Evaluation: ROC Tables, Error Rate, Lift Table, Confusion Matrix, R-Squared, MSE, Model-level Stats
Deployment: In-Database Scoring, Speed, Scale, Security, Deploy Anywhere
Vertica is the Industry’s Only Infrastructure Agnostic, Unified Advanced Analytics Platform for All Your Data
• Analyze in the Right Place
• Advanced Analytics and In-Database Machine Learning
• Freedom from Underlying Infrastructure
• Strong Reliable Performance at Exabyte Scale
Learn More: www.vertica.com
Try it Free: www.vertica.com/try
Joy King, VP, Product Management and Product Marketing
[email protected]
Discussion