
Learning Pentaho Data Integration 8 CE - Third Edition

Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide. About This Book: Manipulate your data by exploring, transforming, validating, and integrating it using Pentaho Data Integration 8 CE. A comprehensive guide exploring the features of Pentaho Data Integrati...


Bibliographic Details
Classification: eBook
Main author: Roldan, Maria Carina
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Publishing, 2017.
Edition: 3rd ed.
Subjects:
Online access: Full text
Table of Contents:
  • Cover
  • Title Page
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Getting Started with Pentaho Data Integration
  • Pentaho Data Integration and Pentaho BI Suite
  • Introducing Pentaho Data Integration
  • Using PDI in real-world scenarios
  • Loading data warehouses or data marts
  • Integrating data
  • Data cleansing
  • Migrating information
  • Exporting data
  • Integrating PDI along with other Pentaho tools
  • Installing PDI
  • Launching the PDI Graphical Designer
  • Spoon
  • Starting and customizing Spoon
  • Exploring the Spoon interface
  • Extending the PDI functionality through the Marketplace
  • Introducing transformations
  • The basics about transformations
  • Creating a Hello World! Transformation
  • Designing a Transformation
  • Previewing and running a Transformation
  • Installing useful related software
  • Summary
  • Chapter 2: Getting Started with Transformations
  • Designing and previewing transformations
  • Getting familiar with editing features
  • Using the mouseover assistance toolbar
  • Adding steps and creating hops
  • Working with grids
  • Designing transformations
  • Putting the editing features in practice
  • Previewing and fixing errors as they appear
  • Looking at the results in the execution results pane
  • The Logging tab
  • The Step Metrics tab
  • Running transformations in an interactive fashion
  • Understanding PDI data and metadata
  • Understanding the PDI rowset
  • Adding or modifying fields by using different PDI steps
  • Explaining the PDI data types
  • Handling errors
  • Implementing the error handling functionality
  • Customizing the error handling
  • Summary
  • Chapter 3: Creating Basic Task Flows
  • Introducing jobs
  • Learning the basics about jobs
  • Creating a Simple Job
  • Designing and running jobs
  • Revisiting the Spoon interface and the editing features
  • Designing jobs
  • Getting familiar with the job design process
  • Looking at the results in the Execution results window
  • The Logging tab
  • The Job metrics tab
  • Enriching your work by sending an email
  • Running transformations from a Job
  • Using the Transformation Job Entry
  • Understanding and changing the flow of execution
  • Changing the flow of execution based on conditions
  • Forcing a status with an abort Job or success entry
  • Changing the execution to be synchronous
  • Managing files
  • Creating a Job that moves some files
  • Selecting files and folders
  • Working with regular expressions
  • Summarizing the Job entries that deal with files
  • Customizing the file management
  • Knowing the basics about Kettle variables
  • Understanding the kettle.properties file
  • How and when you can use variables
  • Summary
  • Chapter 4: Reading and Writing Files
  • Reading data from files
  • Reading a simple file
  • Troubleshooting reading files
  • Learning to read all kinds of files
  • Specifying the name and location of the file
  • Reading several files at the same time
  • Reading files that are compressed or located on a remote server
  • Reading a file whose name is known at runtime
  • Describing the incoming fields
  • Reading Date fields
  • Reading Numeric fields
  • Reading only a subset of the file
  • Reading the most common kinds of sources
  • Reading text files
  • Reading spreadsheets
  • Reading XML files
  • Reading JSON files
  • Outputting data to files
  • Creating a simple file
  • Learning to create all kinds of files and write data into them
  • Providing the name and location of an output file
  • Creating a file whose name is known only at runtime
  • Creating several files whose names depend on the content of the file
  • Describing the content of the output file
  • Formatting Date fields
  • Formatting Numeric fields
  • Creating the most common kinds of files
  • Creating text files
  • Creating spreadsheets
  • Creating XML files
  • Creating JSON files
  • Working with Big Data and cloud sources
  • Reading files from an AWS S3 instance
  • Writing files to an AWS S3 instance
  • Getting data from HDFS
  • Sending data to HDFS
  • Summary
  • Chapter 5: Manipulating PDI Data and Metadata
  • Manipulating simple fields
  • Working with strings
  • Extracting parts of strings using regular expressions
  • Searching and replacing using regular expressions
  • Doing some math with Numeric fields
  • Operating with dates
  • Performing simple operations on dates
  • Subtracting dates with the Calculator step
  • Getting information relative to the current date
  • Using the Get System Info step
  • Performing other useful operations on dates
  • Getting the month names with a User Defined Java Class step
  • Modifying the metadata of streams
  • Working with complex structures
  • Working with XML
  • Introducing XML terminology
  • Getting familiar with the XPath notation
  • Parsing XML structures with PDI
  • Reading an XML file with the Get data from XML step
  • Parsing an XML structure stored in a field
  • PDI Transformation and Job files
  • Parsing JSON structures
  • Introducing JSON terminology
  • Getting familiar with the JSONPath notation
  • Parsing JSON structures with PDI
  • Reading a JSON file with the JSON input step
  • Parsing a JSON structure stored in a field
  • Summary
  • Chapter 6: Controlling the Flow of Data
  • Filtering data
  • Filtering rows upon conditions
  • Reading a file and getting the list of words found in it
  • Filtering unwanted rows with a Filter rows step
  • Filtering rows by using the Java Filter step
  • Filtering data based on row numbers
  • Splitting streams unconditionally
  • Copying rows
  • Distributing rows
  • Introducing partitioning and clustering
  • Splitting the stream based on conditions
  • Splitting a stream based on a simple condition
  • Exploring PDI steps for splitting a stream based on conditions
  • Merging streams in several ways
  • Merging two or more streams
  • Customizing the way of merging streams
  • Looking up data
  • Looking up data with a Stream lookup step
  • Summary
  • Chapter 7: Cleansing, Validating, and Fixing Data
  • Cleansing data
  • Cleansing data by example
  • Standardizing information
  • Improving the quality of data
  • Introducing PDI steps useful for cleansing data
  • Dealing with non-exact matches
  • Cleansing by doing a fuzzy search
  • Deduplicating non-exact matches
  • Validating data
  • Validating data with PDI
  • Validating and reporting errors to the log
  • Introducing common validations and their implementation with PDI
  • Treating invalid data by splitting and merging streams
  • Fixing data that doesn't match the rules
  • Summary
  • Chapter 8: Manipulating Data by Coding
  • Doing simple tasks with the JavaScript step
  • Using the JavaScript language in PDI
  • Inserting JavaScript code using the JavaScript step
  • Adding fields
  • Modifying fields
  • Organizing your code
  • Controlling the flow using predefined constants
  • Testing the script using the Test script button
  • Parsing unstructured files with JavaScript
  • Doing simple tasks with the Java Class step
  • Using the Java language in PDI
  • Inserting Java code using the Java Class step
  • Learning to insert Java code in a Java Class step
  • Data types equivalence
  • Adding fields
  • Modifying fields
  • Controlling the flow with the putRow() function
  • Testing the Java Class using the Test class button
  • Getting the most out of the Java Class step
  • Receiving parameters
  • Reading data from additional steps
  • Redirecting data to different target steps
  • Parsing JSON structures
  • Avoiding coding using purpose-built steps
  • Summary
  • Chapter 9: Transforming the Dataset
  • Sorting data
  • Sorting a dataset with the Sort rows step
  • Working on groups of rows
  • Aggregating data
  • Summarizing the PDI steps that operate on sets of rows
  • Converting rows to columns
  • Converting row data to column data using the Row Denormaliser step
  • Aggregating data with a Row Denormaliser step
  • Normalizing data
  • Modifying the dataset with a Row Normaliser step
  • Going forward and backward across rows
  • Picking rows backward and forward with the Analytic Query step
  • Summary
  • Chapter 10: Performing Basic Operations with Databases
  • Connecting to a database and exploring its content
  • Connecting with Relational Database Management Systems
  • Exploring a database with the Database Explorer
  • Previewing and getting data from a database
  • Getting data from the database with the Table input step
  • Using the Table input step to run flexible queries
  • Adding parameters to your queries
  • Using Kettle variables in your queries
  • Inserting, updating, and deleting data
  • Inserting new data into a database table
  • Inserting or updating data with the Insert / Update step
  • Deleting records of a database table with the Delete step
  • Performing CRUD operations with more flexibility
  • Verifying a connection, running DDL scripts, and doing other useful tasks
  • Looking up data in different ways
  • Doing simple lookups with the Database Value Lookup step
  • Making a performance difference when looking up data in a database
  • Performing complex database lookups
  • Looking for data using a Database join step
  • Looking for data using a Dynamic SQL row step
  • Summary
  • Chapter 11: Loading Data Marts with PDI
  • Preparing the environment