Learning Pentaho Data Integration 8 CE - Third Edition
Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide.

About This Book:
- Manipulate your data by exploring, transforming, validating, and integrating it using Pentaho Data Integration 8 CE
- A comprehensive guide exploring the features of Pentaho Data Integrati...
| Classification: | eBook |
|---|---|
| Main author: | |
| Format: | Electronic eBook |
| Language: | English |
| Published: | Birmingham : Packt Publishing, 2017 |
| Edition: | 3rd ed. |
| Subjects: | |
| Online access: | Full text |
Table of Contents:
- Cover
- Title Page
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Getting Started with Pentaho Data Integration
- Pentaho Data Integration and Pentaho BI Suite
- Introducing Pentaho Data Integration
- Using PDI in real-world scenarios
- Loading data warehouses or data marts
- Integrating data
- Data cleansing
- Migrating information
- Exporting data
- Integrating PDI along with other Pentaho tools
- Installing PDI
- Launching the PDI Graphical Designer
- Spoon
- Starting and customizing Spoon
- Exploring the Spoon interface
- Extending the PDI functionality through the Marketplace
- Introducing transformations
- The basics about transformations
- Creating a Hello World! Transformation
- Designing a Transformation
- Previewing and running a Transformation
- Installing useful related software
- Summary
- Chapter 2: Getting Started with Transformations
- Designing and previewing transformations
- Getting familiar with editing features
- Using the mouseover assistance toolbar
- Adding steps and creating hops
- Working with grids
- Designing transformations
- Putting the editing features in practice
- Previewing and fixing errors as they appear
- Looking at the results in the execution results pane
- The Logging tab
- The Step Metrics tab
- Running transformations in an interactive fashion
- Understanding PDI data and metadata
- Understanding the PDI rowset
- Adding or modifying fields by using different PDI steps
- Explaining the PDI data types
- Handling errors
- Implementing the error handling functionality
- Customizing the error handling
- Summary
- Chapter 3: Creating Basic Task Flows
- Introducing jobs
- Learning the basics about jobs
- Creating a Simple Job
- Designing and running jobs
- Revisiting the Spoon interface and the editing features
- Designing jobs
- Getting familiar with the job design process
- Looking at the results in the Execution results window
- The Logging tab
- The Job metrics tab
- Enriching your work by sending an email
- Running transformations from a Job
- Using the Transformation Job Entry
- Understanding and changing the flow of execution
- Changing the flow of execution based on conditions
- Forcing a status with an abort Job or success entry
- Changing the execution to be synchronous
- Managing files
- Creating a Job that moves some files
- Selecting files and folders
- Working with regular expressions
- Summarizing the Job entries that deal with files
- Customizing the file management
- Knowing the basics about Kettle variables
- Understanding the kettle.properties file
- How and when you can use variables
- Summary
- Chapter 4: Reading and Writing Files
- Reading data from files
- Reading a simple file
- Troubleshooting reading files
- Learning to read all kinds of files
- Specifying the name and location of the file
- Reading several files at the same time
- Reading files that are compressed or located on a remote server
- Reading a file whose name is known at runtime
- Describing the incoming fields
- Reading Date fields
- Reading Numeric fields
- Reading only a subset of the file
- Reading the most common kinds of sources
- Reading text files
- Reading spreadsheets
- Reading XML files
- Reading JSON files
- Outputting data to files
- Creating a simple file
- Learning to create all kinds of files and write data into them
- Providing the name and location of an output file
- Creating a file whose name is known only at runtime
- Creating several files whose names depend on the content of the file
- Describing the content of the output file
- Formatting Date fields
- Formatting Numeric fields
- Creating the most common kinds of files
- Creating text files
- Creating spreadsheets
- Creating XML files
- Creating JSON files
- Working with Big Data and cloud sources
- Reading files from an AWS S3 instance
- Writing files to an AWS S3 instance
- Getting data from HDFS
- Sending data to HDFS
- Summary
- Chapter 5: Manipulating PDI Data and Metadata
- Manipulating simple fields
- Working with strings
- Extracting parts of strings using regular expressions
- Searching and replacing using regular expressions
- Doing some math with Numeric fields
- Operating with dates
- Performing simple operations on dates
- Subtracting dates with the Calculator step
- Getting information relative to the current date
- Using the Get System Info step
- Performing other useful operations on dates
- Getting the month names with a User Defined Java Class step
- Modifying the metadata of streams
- Working with complex structures
- Working with XML
- Introducing XML terminology
- Getting familiar with the XPath notation
- Parsing XML structures with PDI
- Reading an XML file with the Get data from XML step
- Parsing an XML structure stored in a field
- PDI Transformation and Job files
- Parsing JSON structures
- Introducing JSON terminology
- Getting familiar with the JSONPath notation
- Parsing JSON structures with PDI
- Reading a JSON file with the JSON input step
- Parsing a JSON structure stored in a field
- Summary
- Chapter 6: Controlling the Flow of Data
- Filtering data
- Filtering rows upon conditions
- Reading a file and getting the list of words found in it
- Filtering unwanted rows with a Filter rows step
- Filtering rows by using the Java Filter step
- Filtering data based on row numbers
- Splitting streams unconditionally
- Copying rows
- Distributing rows
- Introducing partitioning and clustering
- Splitting the stream based on conditions
- Splitting a stream based on a simple condition
- Exploring PDI steps for splitting a stream based on conditions
- Merging streams in several ways
- Merging two or more streams
- Customizing the way of merging streams
- Looking up data
- Looking up data with a Stream lookup step
- Summary
- Chapter 7: Cleansing, Validating, and Fixing Data
- Cleansing data
- Cleansing data by example
- Standardizing information
- Improving the quality of data
- Introducing PDI steps useful for cleansing data
- Dealing with non-exact matches
- Cleansing by doing a fuzzy search
- Deduplicating non-exact matches
- Validating data
- Validating data with PDI
- Validating and reporting errors to the log
- Introducing common validations and their implementation with PDI
- Treating invalid data by splitting and merging streams
- Fixing data that doesn't match the rules
- Summary
- Chapter 8: Manipulating Data by Coding
- Doing simple tasks with the JavaScript step
- Using the JavaScript language in PDI
- Inserting JavaScript code using the JavaScript step
- Adding fields
- Modifying fields
- Organizing your code
- Controlling the flow using predefined constants
- Testing the script using the Test script button
- Parsing unstructured files with JavaScript
- Doing simple tasks with the Java Class step
- Using the Java language in PDI
- Inserting Java code using the Java Class step
- Learning to insert Java code in a Java Class step
- Data types equivalence
- Adding fields
- Modifying fields
- Controlling the flow with the putRow() function
- Testing the Java Class using the Test class button
- Getting the most out of the Java Class step
- Receiving parameters
- Reading data from additional steps
- Redirecting data to different target steps
- Parsing JSON structures
- Avoiding coding using purpose-built steps
- Summary
- Chapter 9: Transforming the Dataset
- Sorting data
- Sorting a dataset with the sort rows step
- Working on groups of rows
- Aggregating data
- Summarizing the PDI steps that operate on sets of rows
- Converting rows to columns
- Converting row data to column data using the Row denormaliser step
- Aggregating data with a Row Denormaliser step
- Normalizing data
- Modifying the dataset with a Row Normaliser step
- Going forward and backward across rows
- Picking rows backward and forward with the Analytic Query step
- Summary
- Chapter 10: Performing Basic Operations with Databases
- Connecting to a database and exploring its content
- Connecting with Relational Database Management Systems
- Exploring a database with the Database Explorer
- Previewing and getting data from a database
- Getting data from the database with the Table input step
- Using the Table input step to run flexible queries
- Adding parameters to your queries
- Using Kettle variables in your queries
- Inserting, updating, and deleting data
- Inserting new data into a database table
- Inserting or updating data with the Insert / Update step
- Deleting records of a database table with the Delete step
- Performing CRUD operations with more flexibility
- Verifying a connection, running DDL scripts, and doing other useful tasks
- Looking up data in different ways
- Doing simple lookups with the Database Value Lookup step
- Making a performance difference when looking up data in a database
- Performing complex database lookups
- Looking for data using a Database join step
- Looking for data using a Dynamic SQL row step
- Summary
- Chapter 11: Loading Data Marts with PDI
- Preparing the environment