What’s New in Pentaho 7.0? Pedro Alves
According to the Gartner Market Guide for Self-Service Data Preparation Analytics…
Data and analytics users spend the majority of their time either preparing data for analysis or waiting for data to be prepared for them.
Disparate tools, disparate problems
Today’s data landscape is littered with disparate tools and disjointed processes.
How do you get ahead with that?
[Diagram: a spectrum from Business to IT, covering Business Analytics, Data Prep, and ETL / Data Engineering]
Bridging the gap between data preparation and analytics
7.0 is about bridging the data divide between business and IT, leveraging the Pentaho platform to drive the business with usable, accurate, and accessible analytics across the organization.
[Diagram: a spectrum from Business to IT, covering Business Analytics, Data Prep, and ETL / Data Engineering]
Pentaho’s platform today
A data integration and business analytics platform that can access, prepare, blend and analyze any structured or unstructured data.
Pentaho’s future: a platform that fully enables a healthy data ecosystem
1. Alleviates disparate tools and complexity.
2. Makes analytics accessible at any stage of the data pipeline.
3. Builds on governance, security, big data ecosystem support, and the other foundations required for a blended data world.
Pentaho 7.0 Analyze data anywhere in the data pipeline
Bridging the gap between data preparation and analytics with a Visual Data Experience from anywhere in the data pipeline:
• Bringing Analytics into Data Prep
• Share Analytics during Data Prep
• Reporting Enhancements
On top of governance, security, and Big Data ecosystem support for a blended data world:
• Spark
• Metadata Injection
• Hadoop Security
• Support for Kafka, Avro, Parquet
• Admin Simplification
A Visual Data Experience From Anywhere in The Data Pipeline Pentaho 7.0
Imagine if you were able to access analytics from anywhere within the data pipeline
[Diagram: pipeline stages Engineering, Preparation, and Analytics; managing and automating the pipeline]
Bringing analytics into data prep
• Visualize data in-flight, without switching in and out of tools.
• Access tables, visualizations, charts, graphs, or ad hoc analysis during data prep.
• Identify missing or incorrect data during the data prep process.
• Publish data sources to the business, and get data to the business faster.
Shrinking the gap between data preparation & business analytics
1. ETL developers and data prep staff can easily spot check analytics without switching in and out of tools.
2. Creates a more collaborative process between business and IT, shortening the cycle from data to analytics.
A Visual Data Experience in the words of Sears Holdings Corp.
“The ability to spot check and visualize our data throughout its lifecycle allows for a much more informative and streamlined data-driven decision making process to create more reliability, while reducing costs.”
– Meir Kornfield, Director, Product Management and Business Intelligence, Sears Holdings Corp.
Big Data Ecosystem Support for a Blended Data World Pentaho 7.0
7.0 makes big data operational
Operationalize High Performance Pipelines with Spark Integration
Protect Data Assets with Expanded Hadoop Security
Automate Onboarding with Enhanced Metadata Injection
Spark potential
Potential and Growth
• Faster processing than MapReduce
• Drives real time & intelligent big data applications at scale
Market Challenges
• Skill barriers – Spark requires specialized developer skills
• Somewhat lacking in enterprise maturity – memory management, multi-user access, etc.
• Effective integration with broader data architectures is challenging
Current state: Pentaho and Spark
• Execute Spark applications in PDI jobs
• Supports existing Java and Scala code from core Spark libraries
Intuitive coordination of high performance pipelines
Challenge: Hard to manage multiple Spark applications and multiple programming languages AND operationalize them in data pipelines with full flexibility
7.0 Expands Spark Orchestration
• Coordinate and schedule Spark applications for Hortonworks and Cloudera
• Operationalize streaming, machine learning, and core Spark techniques within jobs
• Choice of programming language, incl. Python
Remove skill barriers to use Spark
Challenge: Spark requires specialized developer skills. Need an easy way to integrate Spark data with other data processes.
7.0 Adds SQL on Spark Connectivity
• PDI access to SQL on Spark for rapid data prep and queries – on Hortonworks and Cloudera
• Improves productivity by using existing IT data skill sets on Spark
• Accelerates time to value in big data pipeline projects
More secure clusters, better big data governance, and reduced risk
Challenge: Protect key enterprise big data assets against intrusion and reduce risk of security breaches
Expanded Hadoop Security
• Secure multi-user access to the cluster via updated Kerberos integration, enabling user level tracking by mapping PDI users to Hadoop users
• Compatibility with Sentry to enforce user authorization rules governing access to specific Hadoop data assets
Accelerate data onboarding with Metadata Injection
What happens when data sources proliferate?
[Diagram: 100 data sources feeding a single Read, Transform, Write flow into one data target]
Example use cases:
• Migrating 100+ tables between databases
• Ingesting 100+ data sources into Hadoop
• Allowing end users to onboard data themselves
Build more transformations?
[Diagram: a separate Read, Transform, Write transformation for each data source, x100]
Pass metadata in at run time to generate jobs on the fly
DRIVE HUNDREDS OF JOBS WITH 1 TEMPLATE
[Diagram: 100 data sources driven through one Read, Transform, Write template into a data target]
Reduced development time, cost, and risk
Rapidly automate and scale big data onboarding
Challenge: IT teams spend too much time coding ingestion and processing jobs for a wide variety of big data sources
Metadata Injection Expansion
• Expands options for auto-generated data flows, by allowing metadata to be passed to a wider array of PDI steps at runtime
• Increases IT productivity when building out many data migration and onboarding processes
• Now works with 30+ additional PDI steps
• Includes compatibility with Hadoop, HBase, NoSQL, JSON, XML, and Analytic DB steps
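Illustration: the template idea behind metadata injection
The sketch below is plain Python, not PDI and not Pentaho's API. It only illustrates the pattern described above: one generic read, transform, write flow whose details (delimiter, field names, field types) are supplied as metadata at run time. All names (`SOURCES`, `RAW`, `run_template`, `onboard_all`) and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical per-source metadata. In practice this might come from a
# catalog, spreadsheet, or database rather than being hard-coded.
SOURCES = [
    {"name": "customers", "delimiter": ",", "fields": {"id": int, "name": str}},
    {"name": "orders", "delimiter": ";", "fields": {"order_id": int, "amount": float}},
]

# Hypothetical raw inputs standing in for files or tables.
RAW = {
    "customers": "id,name\n1,Ada\n2,Grace\n",
    "orders": "order_id;amount\n10;19.99\n11;5.00\n",
}


def run_template(metadata, raw_text):
    """One generic read -> transform -> write flow, configured only by metadata."""
    # Read: the delimiter is injected at run time.
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=metadata["delimiter"])
    # Transform: cast each field to the type named in the metadata.
    rows = [
        {field: cast(row[field]) for field, cast in metadata["fields"].items()}
        for row in reader
    ]
    # Write: return the rows; a real flow would load them into a target.
    return rows


def onboard_all(sources, raw):
    """Drive every source through the same template."""
    return {m["name"]: run_template(m, raw[m["name"]]) for m in sources}


results = onboard_all(SOURCES, RAW)
```

Adding another source here means adding one metadata entry rather than writing another flow, which is the saving the slides attribute to a single template.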
Steps Newly Enabled for Metadata Injection
Simplified configuration, deployment and administration of Pentaho
Pentaho 7.0 reduces time to insights by making it faster and easier to configure, deploy, and manage DI and BI services across the development-to-production environments that support the data lifecycle, with no licensing impact.
• Configure and Deploy Faster
• Simplify Administration
Pentaho 7.0 in Action
Use Case: Retail bank needs to reduce costs and risk related to credit card fraud with a repeatable business process
Coordinate fraud model creation on Spark
Ingest new transactions to Hadoop via Kafka
Access modeled data for analysis via SQL on Spark
Visually inspect data set for quality, completeness
Collaboratively share results with the business
• Orchestrate workflow across components and integrate data in one end-to-end pipeline
• Differentiated solution in the market for visual inspection of data at any step in the prep process
• Fewer tools and fewer new skills needed
• Accelerated data prep cycle time and time to insight
DEMO
Analysts & Press on Pentaho 7.0:
“As Hadoop can be a challenge around security, Pentaho is expanding its Hadoop data security integration to promote better Big Data governance, protecting clusters from intruders.”
– Bev Terrell, SiliconANGLE
“So this looks to be a major enhancement which is really setting out the Pentaho stall as a BI vendor of choice at the Enterprise level with integrated capability which is easier to use and more powerful out of the box than the comparable offerings in the marketplace which are still reliant on skilled technicians to unite and enact the solutions.”
– David Norris, Bloor Research
“No other platform lets IT and the business collaborate in this way, at such an early stage in the process.”
– Adrian Bridgwater, TechTarget
Questions?
Thank You