Data Warehouse with Pentaho Data Integration

Course ID: PHI-BI-002 | Platform: Windows | http://www.phi-Integration.com

Data Warehouse with Kettle (Pentaho Data Integration)

I. Introduction

Data analysis, as part of a business intelligence solution, is an increasingly demanded capability. Unfortunately, in most transactional systems data is rarely organized into effective and efficient aggregates. In those cases, producing an analytical report requires exhaustive and time-consuming effort. Thus, to successfully build a comprehensive BI solution we need to transform our transactional data into an analysis-friendly format, i.e. a multi-dimensional model. The transformed data is usually stored in a data warehouse, and the data warehousing process usually involves an ETL (Extract, Transform and Load) tool. Pentaho Data Integration (Kettle) is a very popular open source ETL tool. This course presents an overview of data warehouse, multi-dimensional modeling, and ETL concepts, along with extensive use of Pentaho Data Integration to build a powerful data warehouse solution. The practical approach of this course draws on many cases we have faced before, so participants will gain highly valuable skills from attending the class.


II. Who Should Attend?

This course is designed for those who are new to Data Warehousing and ETL, or who need to understand the basics of Pentaho Data Integration.

III. Objectives

At the completion of this course, attendees should be able to:
◦ Understand the concepts of Data Warehouse, Dimensional Modeling, OLAP and ETL.
◦ Use Pentaho Data Integration to build simple jobs / transformations.
◦ Consume data from several data sources.
◦ Build and populate fact and dimension tables.
◦ Apply troubleshooting techniques.
◦ Schedule jobs / transformations.

IV. Course Duration

5 days / 30 hours

V. Course Prerequisites

◦ Basic understanding of one of the popular DBMSs (Oracle, SQL Server, MySQL, etc.) and of Structured Query Language (SQL).
◦ No understanding of other Pentaho tools is needed.

VI. Course Requirements

PC or laptop with a minimum 2 GHz CPU, 1 GB of RAM, a DVD drive and 2 GB of available hard disk space.

Software:
◦ Microsoft Windows XP Pro
◦ Java Runtime Environment (JRE)
◦ Apache Tomcat
◦ MySQL 5.0 Database Server
◦ Pentaho Data Integration
◦ Pentaho Data Analysis (Mondrian)


VII. Course Outline

1. Introduction to Data Warehouse
   ◦ Data Warehouse.
   ◦ Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP).
   ◦ Data Warehouse and OLAP.
   ◦ Delivering a solution with an ETL (Extract, Transform, Load) tool.
2. Installation and Configuration
   ◦ Java Runtime Environment / Java Development Kit.
   ◦ Pentaho Data Integration.
   ◦ XAMPP package (Apache HTTP Server and MySQL).
   ◦ SQLyog – a GUI-based MySQL client.
   ◦ Data and script samples.
3. Short Introduction to MySQL
   ◦ MySQL storage engines.
   ◦ Administering MySQL via phpMyAdmin.
   ◦ PHI-Minimart sample database installation.
4. Pentaho Data Integration (Kettle)
   ◦ Introducing Kettle as Pentaho's ETL suite.
   ◦ Architecture.
   ◦ Components:
     ▪ Spoon: graphical UI designer for job / transformation steps.
     ▪ Pan: command line batch script for transformation execution.
     ▪ Kitchen: command line batch script for job execution.
     ▪ Carte: cluster server.
   ◦ Job / Transformation:
     ▪ Step and hop.
     ▪ Row and metadata.
     ▪ Relation between job and transformation.
5. Getting Started with Spoon
   ◦ File system and RDBMS based repositories.
   ◦ Spoon development environment.
   ◦ Database connections.
   ◦ Job and transformation:
     ▪ Creating a job.
     ▪ Creating a transformation.
     ▪ Calling a transformation from a job.
   ◦ Configuring logs.
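As a taste of the job / transformation material in sections 4 and 5: a Kettle transformation is a network of steps connected by hops, with rows streaming from step to step. Kettle itself is a Java tool, so the Python sketch below only illustrates that row-stream idea; the step names and fields are illustrative, not Kettle's API.

```python
# Conceptual sketch of a Kettle transformation: each "step" is a
# generator over rows (dicts of field name -> value), and each "hop"
# is the chaining of one generator into the next.

def input_step(rows):
    """Source step: emits raw rows."""
    for row in rows:
        yield row

def calculator_step(stream):
    """Transforming step: adds a derived field to every row."""
    for row in stream:
        row = dict(row)  # copy, so upstream rows are not mutated
        row["total"] = row["qty"] * row["price"]
        yield row

def filter_step(stream, min_total):
    """Filter step: only passes rows whose total meets a threshold."""
    for row in stream:
        if row["total"] >= min_total:
            yield row

raw = [
    {"qty": 2, "price": 10.0},
    {"qty": 1, "price": 3.0},
    {"qty": 5, "price": 4.0},
]

# Wire the steps together with "hops" and run the transformation.
pipeline = filter_step(calculator_step(input_step(raw)), min_total=15.0)
result = list(pipeline)
print(result)  # only the rows with total >= 15.0 survive
```

Because the steps are generators, rows flow through the pipeline one at a time rather than being materialized between steps, which mirrors how Kettle streams rows between steps instead of buffering whole tables.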

6. Multi Dimensional Modelling
   ◦ Normalized versus multi dimensional model.
   ◦ Fact and dimension tables.
   ◦ Star schema and snowflake schema.
   ◦ Tasks:
     ▪ Create a Kettle transformation to map the PHI-Minimart transactional sample database to a dimensional model database.
     ▪ Create logs for each step.
7. Change Data Capture (CDC)
   ◦ What is CDC?
   ◦ Why is CDC so hard, and why does it depend so heavily on the data source?
   ◦ Demonstration of SQL Server 2008's CDC feature.
   ◦ Tasks:
     ▪ Create a Kettle transformation to map the PHI-Minimart transactional sample database to a dimensional model database.
     ▪ Create logs for each step.
8. Slowly Changing Dimension (SCD)
   ◦ Slowly Changing Dimensions to solve master data historical problems.
   ◦ SCD types.
   ◦ Using Kettle's steps to solve several SCD types with several schemas:
     ▪ Insert / Update.
     ▪ Punch Through.
9. Orphan / Late Arrival
   ◦ What is a Late Arrival Dimension?
   ◦ Typical situations where Late Arrival occurs.
   ◦ Best practices for handling Late Arrivals.
10. OLAP View of Multidimensional Data (Mondrian / JPivot)
   ◦ Mondrian installation.
   ◦ Creating a schema based on our fact and dimension tables.
   ◦ Viewing and navigating our cube using a web browser.
11. Data Staging
   ◦ What is Data Staging?
   ◦ Background: physical I/O versus in-memory processing.
   ◦ Tasks:
     ▪ Create a transformation to join 3 data sources: text file, Excel spreadsheet, and RDBMS.
     ▪ Create a currency staging table to solve a sequential dependence problem.
12. Advanced Controls
   ◦ Environment variables.
   ◦ Shared objects.
   ◦ Error handling.
   ◦ Emailing job results.
   ◦ Tasks:
     ▪ Create a dynamic table dump using variables and looping controls.
     ▪ Refine existing transformations to use email alerts.
13. Automation
   ◦ Using Windows Task Scheduler to schedule ETL jobs and transformations.
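The Insert / Update handling covered in section 8 follows the classic SCD Type 2 pattern: when a tracked attribute of a master record changes, the current dimension row is closed off and a new version is inserted, preserving history. The Python sketch below illustrates that pattern on an in-memory table; it is a conceptual illustration, not Kettle's actual step, and the table and field names are assumptions for the example.

```python
from datetime import date

# SCD Type 2 sketch: a dimension row with end_date of None is the
# "current" version; a change closes it and inserts a new version.

def scd2_upsert(dim_rows, source_row, today):
    """Apply one source record to an in-memory dimension table."""
    current = next(
        (r for r in dim_rows
         if r["customer_id"] == source_row["customer_id"]
         and r["end_date"] is None),
        None,
    )
    if current is not None:
        if current["city"] == source_row["city"]:
            return  # no change: nothing to do
        current["end_date"] = today          # close the old version
        version = current["version"] + 1
    else:
        version = 1
    dim_rows.append({                        # insert the new version
        "customer_id": source_row["customer_id"],
        "city": source_row["city"],
        "version": version,
        "start_date": today,
        "end_date": None,                    # None marks the current row
    })

dim = []
scd2_upsert(dim, {"customer_id": 1, "city": "Jakarta"}, date(2010, 1, 1))
scd2_upsert(dim, {"customer_id": 1, "city": "Bandung"}, date(2010, 6, 1))
print(dim)  # two versions: Jakarta (closed) and Bandung (current)
```

This is why SCD Type 2 dimensions need a surrogate key and version (or date-range) columns in addition to the natural key: a fact row always joins to the version that was current when the fact occurred.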
