My Account Log in

4 options

Pentaho data integration beginner's guide / Maria Carina Roldan.

EBSCOhost Academic eBook Collection (North America) Available online

EBSCOhost Academic eBook Collection (North America)

Ebook Central Academic Complete Available online

Ebook Central Academic Complete

Ebook Central College Complete Available online

Ebook Central College Complete

O'Reilly Online Learning: Academic/Public Library Edition Available online

O'Reilly Online Learning: Academic/Public Library Edition
Format:
Book
Author/Creator:
Roldan, Maria C.
Language:
English
Subjects (All):
Data integration (Computer science).
Database management--Computer programs.
Physical Description:
1 online resource (502 pages) : illustrations
Edition:
Second edition.
Place of Publication:
Birmingham : Packt Publishing, 2013.
Language Note:
English
System Details:
text file
Summary:
Extract, Transform, and Load (ETL) is the essence of data integration and this book shows you how to achieve it quickly and efficiently using Pentaho Data. A hands-on guide that you’ll find an indispensable time-saver. Manipulate your data by exploring, transforming, validating, and integrating it Learn to migrate data between applications Explore several features of Pentaho Data Integration 5.0 Connect to any database engine, explore the databases, and perform all kind of operations on databases In Detail Capturing, manipulating, cleansing, transferring, and loading data effectively are the prime requirements in every IT organization. Achieving these tasks require people devoted to developing extensive software programs, or investing in ETL or data integration tools that can simplify this work. Pentaho Data Integration is a full-featured open source ETL solution that allows you to meet these requirements. Pentaho Data Integration has an intuitive, graphical, drag-and-drop design environment and its ETL capabilities are powerful. However, getting started with Pentaho Data Integration can be difficult or confusing. "Pentaho Data Integration Beginner's Guide, Second Edition" provides the guidance needed to overcome that difficulty, covering all the possible key features of Pentaho Data Integration. "Pentaho Data Integration Beginner's Guide, Second Edition" starts with the installation of Pentaho Data Integration software and then moves on to cover all the key Pentaho Data Integration concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to do all kinds of data manipulation and work with plain files. Then, the book gives you a primer on databases and teaches you how to work with databases inside Pentaho Data Integration. Moreover, you will be introduced to data warehouse concepts and you will learn how to load data in a data warehouse. After that, you will learn to implement simple and complex processes. Finally, you will have the opportunity of applying and reinforcing all the learned concepts through the implementation of a simple datamart. With "Pentaho Data Integration Beginner's Guide, Second Edition", you will learn everything you need to know in order to meet your data manipulation requirements.
Contents:
Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Getting Started with Pentaho Data Integration
Pentaho Data Integration and Pentaho BI Suite
Exploring the Pentaho Demo
Pentaho Data Integration
Using PDI in real world scenarios
Loading data warehouses or datamarts
Integrating data
Data cleansing
Migrating information
Exporting data
Integrating PDI along with other Pentaho tools
Installing PDI
Time for action - installing PDI
Launching the PDI graphical designer - Spoon
Time for action - starting and customizing Spoon
Spoon
Setting preferences in the Options window
Storing transformations and jobs in a repository
Creating your first transformation
Time for action - creating a hello world transformation
Directing Kettle engine with transformations
Exploring the Spoon interface
Designing a transformation
Running and previewing the transformation
Installing MySQL
Time for action - installing MySQL on Windows
Time for action - installing MySQL on Ubuntu
Summary
Chapter 2: Getting Started with Transformations
Designing and previewing transformations
Time for action - creating a simple transformation and getting familiar with the design process
Getting familiar with editing features
Using the mouse-over assistance toolbar
Working with grids
Understanding the Kettle rowset
Looking at the results in the Execution Results pane
The Logging tab
The Step Metrics tab
Running transformations in an interactive fashion
Time for action - generating a range of dates and inspecting the data as it is being created
Adding or modifying fields by using different PDI steps
The Select values step
Getting fields
Date fields
Handling errors.
Time for action - avoiding errors while converting the estimated time from string to integer
The error handling functionality
Time for action - configuring the error handling to see the description of the errors
Personalizing the error handling
Chapter 3: Manipulating Real-world Data
Reading data from files
Time for action - reading results of football matches from files
Input files
Input steps
Reading several files at once
Time for action - reading all your files at a time using a single text file input step
Time for action - reading all your files at a time using a single text file input step and regular expressions
Regular expressions
Troubleshooting reading files
Sending data to files
Time for action - sending the results of matches to a plain file
Output files
Output steps
Getting system information
Time for action - reading and writing matches files with flexibility
The Get System Info step
Running transformations from a terminal window
Time for action - running the matches transformation from a terminal window
XML files
Time for action - getting data from an XML file with information about countries
What is XML?
PDI transformation files
Getting data from XML files
XPath
Configuring the Get data from XML step
Kettle variables
How and when you can use variables
Chapter 4: Filtering, Searching, and Performing Other Useful Operations with Data
Sorting data
Time for action - sorting information about matches with the Sort rows step
Calculations on groups of rows
Time for action - calculating football match statistics by grouping data
Group by Step
Numeric fields
Filtering
Time for action - counting frequent words by filtering
Time for action - refining the counting task by filtering even more.
Filtering rows using the Filter rows step
Looking up data
Time for action - finding out which language people speak
The Stream lookup step
Data cleaning
Time for action - fixing words before counting them
Cleansing data with PDI
Chapter 5: Controlling the Flow of Data
Splitting streams
Time for action - Browsing new features of PDI by copying a dataset
Copying rows
Distributing rows
Time for action - Assigning tasks by distributing
Splitting the stream based on conditions
Time for action - Assigning tasks by filtering priorities with the Filter rows step
PDI steps for splitting the stream based on conditions
Time for action - Assigning tasks by filtering priorities with the Switch/Case step
Merging streams
Time for action - Gathering progress and merging it all together
PDI options for merging streams
Time for action - Giving priority to Bouchard by using the Append Stream
Treating invalid data by splitting and merging streams
Time for action - Treating errors in the estimated time to avoid discarding rows
Treating rows with invalid data
Chapter 6: Transforming Your Data by Coding
Doing simple tasks with the JavaScript step
Time for action - counting frequent words by coding in JavaScript
Using the JavaScript language in PDI
Inserting JavaScript code using the Modified JavaScript Value Step
Adding fields
Modifying fields
Using transformation predefined constants
Testing the script using the Test script button
Reading and parsing unstructured files with JavaScript
Time for action - changing a list of house descriptions with JavaScript
Looping over the dataset rows
Doing simple tasks with the Java Class step
Time for action - counting frequent words by coding in Java
Using the Java language in PDI.
Inserting Java code using the User Defined Java Class step
Sending rows to the next step
Data types equivalence
Testing the Java Class using the Test class button
Transforming the dataset with Java
Time for action - splitting the field to rows using Java
Avoiding coding by using purpose built steps
Chapter 7: Transforming the Rowset
Converting rows to columns
Time for action - enhancing the films file by converting rows to columns
Converting row data to column data by using the Row Denormaliser step
Aggregating data with a Row Denormaliser step
Time for action - aggregating football matches data with the Row Denormaliser step
Using Row Denormaliser for aggregating data
Normalizing data
Time for action - enhancing the matches file by normalizing the dataset
Modifying the dataset with a Row Normaliser step
Summarizing the PDI steps that operate on sets of rows
Generating a custom time dimension dataset by using Kettle variables
Time for action - creating the time dimension dataset
Getting variables
Time for action - parameterizing the start and end date of the time dimension dataset
Using the Get Variables step
Chapter 8: Working with Databases
Introducing the Steel Wheels sample database
Connecting to the Steel Wheels database
Time for action - creating a connection to the Steel Wheels database
Connecting with Relational Database Management Systems
Exploring the Steel Wheels database
Time for action - exploring the sample database
A brief word about SQL
Exploring any configured database with the database explorer
Querying a database
Time for action - getting data about shipped orders
Getting data from the database with the Table input step.
Using the SELECT statement for generating a new dataset
Making flexible queries using parameters
Time for action - getting orders in a range of dates using parameters
Adding parameters to your queries
Making flexible queries by using Kettle variables
Time for action - getting orders in a range of dates by using Kettle variables
Using Kettle variables in your queries
Sending data to a database
Time for action - loading a table with a list of manufacturers
Inserting new data into a database table with the Table output step
Inserting or updating data by using other PDI steps
Time for action - inserting new products or updating existent ones
Time for action - testing the update of existent products
Inserting or updating with the Insert/Update Step
Eliminating data from a database
Time for action - deleting data about discontinued items
Deleting records of a database table with the Delete step
Chapter 9: Performing Advanced Operations with Databases
Preparing the environment
Time for action - populating the Jigsaw database
Exploring the Jigsaw database model
Looking up data in a database
Doing simple lookups
Time for action - using a Database lookup step to create a list of products to buy
Looking up values in a database with the Database lookup step
Performing complex lookups
Time for action - using a Database join step to create a list of suggested products to buy
Joining data from the database to the stream data by using a Database join step
Introducing dimensional modeling
Loading dimensions with data
Time for action - loading a region dimension with a Combination lookup/update step
Time for action - testing the transformation that loads the region dimension
Describing data with dimensions.
Loading Type I SCD with a Combination lookup/update step.
Notes:
Includes index.
Description based on online resource; title from PDF title page (ebrary, viewed November 23, 2013).
ISBN:
9781782165057
1782165053
OCLC:
862050189

The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.

We want your feedback!

Thanks for using the Penn Libraries new search tool. We encourage you to submit feedback as we continue to improve the site.

My Account

Shelf Request an item Bookmarks Fines and fees Settings

Guides

Using the Library Catalog Using Articles+ Library Account