What’s New in Pentaho 7.0?

Pedro Alves

According to the Gartner Market Guide for Self-Service Data Preparation Analytics…

Data and analytics users spend the majority of their time either preparing data for analysis or waiting for data to be prepared for them.

Disparate tools, disparate problems

Today’s data landscape is littered with disparate tools and disjointed processes. How do you get ahead with that?

[Diagram: a spectrum from Business to IT, spanning Business Analytics, Data Prep, and ETL / Data Engineering]

Bridging the gap between data preparation and analytics

Pentaho 7.0 is about bridging the data divide between business and IT, leveraging the Pentaho platform to drive the business with usable, accurate, and accessible analytics across the organization.

[Diagram: a spectrum from Business to IT, spanning Business Analytics, Data Prep, and ETL / Data Engineering]

Pentaho’s platform today

A data integration and business analytics platform that can access, prepare, blend and analyze any structured or unstructured data.

Pentaho’s future: a platform that fully enables a healthy data ecosystem

1. Alleviates disparate tools and complexity.

2. Makes analytics accessible at any stage of the data pipeline.

3. Builds on governance, security, big data ecosystem support, and the required foundations of a blended data world.

Pentaho 7.0: Analyze data anywhere in the data pipeline

Bridging the gap between data preparation and analytics with a Visual Data Experience from anywhere in the data pipeline:
• Bringing Analytics into Data Prep
• Share Analytics during Data Prep
• Reporting Enhancements

On top of governance, security, and Big Data ecosystem support for a blended data world:
• Spark
• Metadata Injection
• Hadoop Security
• Support for Kafka, Avro, Parquet
• Admin Simplification

A Visual Data Experience From Anywhere in The Data Pipeline Pentaho 7.0

Imagine if you were able to access analytics from anywhere within the data pipeline

[Diagram: pipeline stages Engineering, Preparation, and Analytics; managing and automating the pipeline]

Bringing analytics into data prep

Visualize data in-flight, without switching in and out of tools

Access to tables, visualizations, charts, graphs, or ad hoc analysis during data prep.

Identify missing or incorrect data during the data prep process.

Publish data sources to the business, and get data to the business faster

Shrinking the gap between data preparation & business analytics

1. ETL developers and data prep staff can easily spot check analytics without switching in and out of tools.

2. Creates a more collaborative process between business and IT, shortening the cycle from data to analytics.

A Visual Data Experience in the words of Sears Holdings Corp.

“The ability to spot check and visualize our data throughout its lifecycle allows for a much more informative and streamlined data-driven decision making process to create more reliability, while reducing costs.”

- Meir Kornfield, Director, Product Management and Business Intelligence, Sears Holdings Corp.

Big Data Ecosystem Support for a Blended Data World Pentaho 7.0

7.0 makes big data operational

Operationalize High Performance Pipelines with Spark Integration

Protect Data Assets with Expanded Hadoop Security

Automate Onboarding with Enhanced Metadata Injection

Spark potential

Potential and Growth
• Faster processing than MapReduce
• Drives real time & intelligent big data applications at scale

Market Challenges
• Skill barriers – Spark requires specialized developer skills
• Somewhat lacking in enterprise maturity – memory management, multi user access, etc.
• Effective integration with broader data architectures is challenging

Current state: Pentaho and Spark
• Execute Spark applications in PDI jobs
• Supports existing Java and Scala code from core Spark libraries

Intuitive coordination of high performance pipelines

Challenge: It is hard to manage multiple Spark applications and multiple programming languages, and to operationalize them in data pipelines with full flexibility

7.0 Expands Spark Orchestration
• Coordinate and schedule Spark applications for Hortonworks and Cloudera
• Operationalize streaming, machine learning, and core Spark techniques within jobs
• Choice of programming language, incl. Python

Remove skill barriers to use Spark

Challenge: Spark requires specialized developer skills. Need an easy way to integrate Spark data with other data processes.

7.0 Adds SQL on Spark Connectivity
• PDI access to SQL on Spark for rapid data prep and queries – on Hortonworks and Cloudera
• Improves productivity by using existing IT data skill sets on Spark
• Accelerates time to value in big data pipeline projects

More secure clusters, better big data governance, and reduced risk

Challenge: Protect key enterprise big data assets against intrusion and reduce risk of security breaches

Expanded Hadoop Security
• Secure multi-user access to the cluster via updated Kerberos integration, enabling user level tracking by mapping PDI users to Hadoop users
• Compatibility with Sentry to enforce user authorization rules governing access to specific Hadoop data assets

Accelerate data onboarding with Metadata Injection

What happens when data sources proliferate?

[Diagram: Data Sources (x100) → Read → Transform → Write → Data Target]

Example use cases:
• Migrating 100+ tables between databases
• Ingesting 100+ data sources into Hadoop
• Allowing end users to onboard data themselves

Build more transformations?

[Diagram: a separate Read → Transform → Write transformation for each data source, repeated x100]

Pass metadata in at run time to generate jobs on the fly

Drive hundreds of jobs with 1 template

[Diagram: Data Sources (x100) → one Read → Transform → Write template → Data Target]

Reduced development time, cost, and risk
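
The template idea can be illustrated outside of PDI. The following is a minimal conceptual sketch in plain Python, not Pentaho's API: one generic Read → Transform → Write function is driven by per-source metadata supplied at run time, so adding a source means adding metadata rather than building another transformation. The names `SourceMetadata`, `run_template`, and the sample sources are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, str]


@dataclass(frozen=True)
class SourceMetadata:
    """Hypothetical per-source metadata injected into the template at run time."""
    name: str
    delimiter: str
    fields: List[str]        # column names, in file order
    rename: Dict[str, str]   # source column -> target column


def run_template(meta: SourceMetadata, raw_lines: List[str]) -> List[Row]:
    """One generic Read -> Transform -> Write flow, configured only by metadata."""
    # Read: split each line using the injected delimiter and field list.
    rows = [dict(zip(meta.fields, line.split(meta.delimiter))) for line in raw_lines]
    # Transform: rename columns to the target layout using the injected mapping.
    renamed = [{meta.rename.get(k, k): v.strip() for k, v in row.items()} for row in rows]
    # Write: this sketch just returns the rows; a real flow would load a target.
    return renamed


# Two illustrative sources with different layouts share the same template.
sources = [
    (SourceMetadata("crm", ",", ["id", "full_name"], {"full_name": "name"}),
     ["1,Ada", "2,Grace"]),
    (SourceMetadata("billing", "|", ["cust", "amt"], {"cust": "id", "amt": "amount"}),
     ["1|9.50", "2|12.00"]),
]

target = {meta.name: run_template(meta, lines) for meta, lines in sources}
```

In this sketch, onboarding a third source would only require another `SourceMetadata` entry; the template function itself stays unchanged, which is the property the slide attributes to Metadata Injection.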

Rapidly automate and scale big data onboarding

Challenge: IT teams spend too much time coding ingestion and processing jobs for a wide variety of big data sources

Metadata Injection Expansion
• Expands options for auto-generated data flows, by allowing metadata to be passed to a wider array of PDI steps at runtime
• Increases IT productivity when building out many data migration and onboarding processes
• Now works with 30+ additional PDI steps
  • Includes compatibility with Hadoop, HBase, NoSQL, JSON, XML, and Analytic DB steps

Steps Newly Enabled for Metadata Injection

Simplified configuration, deployment and administration of Pentaho

Pentaho 7.0 reduces time to insight by making it faster and easier to configure, deploy, and manage DI and BI services, from development to production environments used to support the data lifecycle, with no licensing impact.

• Configure and Deploy Faster
• Simplify Administration

Pentaho 7.0 in Action

Use Case: Retail bank needs to reduce costs and risk related to credit card fraud with a repeatable business process

Coordinate fraud model creation on Spark

Ingest new transactions to Hadoop via Kafka

Access modeled data for analysis via SQL on Spark

Visually inspect data set for quality, completeness

Collaboratively share results with the business



• Orchestrate workflow across components and integrate data in one end-to-end pipeline
• Differentiated solution in the market for visual inspection of data at any step in the prep process
• Fewer tools and fewer new skills needed
• Data prep cycle time, time to insight accelerated

DEMO

Analysts & Press on Pentaho 7.0:

“As Hadoop can be a challenge around security, Pentaho is expanding its Hadoop data security integration to promote better Big Data governance, protecting clusters from intruders.”

- Bev Terrell, SiliconANGLE

“So this looks to be a major enhancement which is really setting out the Pentaho stall as a BI vendor of choice at the Enterprise level with integrated capability which is easier to use and more powerful out of the box than the comparable offerings in the marketplace which are still reliant on skilled technicians to unite and enact the solutions.”

- David Norris, Bloor Research

“No other platform lets IT and the business collaborate in this way, at such an early stage in the process.”

- Adrian Bridgwater, TechTarget

Questions?

Thank You
