Learning Pentaho Data Integration 8 CE - Third Edition.

Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide. About This Book: Manipulate your data by exploring, transforming, validating, and integrating it using Pentaho Data Integration 8 CE. A comprehensive guide exploring the features of Pentaho Data Integrati...

Bibliographic Details
Classification:	Electronic Book
Main Author:	Roldan, Maria Carina
Format:	Electronic eBook
Language:	English
Published:	Birmingham : Packt Publishing, 2017.
Edition:	3rd ed.
Subjects:	Data integration (Computer science); Data mining; Decision support systems; Data Mining; Decision Support Systems, Management; Intégration de données (Informatique); Exploration de données (Informatique); Systèmes d'aide à la décision; Data mining; Decision support systems
Online Access:	Full text

Table of Contents:

Cover
Title Page
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Getting Started with Pentaho Data Integration
Pentaho Data Integration and Pentaho BI Suite
Introducing Pentaho Data Integration
Using PDI in real-world scenarios
Loading data warehouses or data marts
Integrating data
Data cleansing
Migrating information
Exporting data
Integrating PDI along with other Pentaho tools
Installing PDI
Launching the PDI Graphical Designer
Spoon
Starting and customizing Spoon
Exploring the Spoon interface
Extending the PDI functionality through the Marketplace
Introducing transformations
The basics about transformations
Creating a Hello World! Transformation
Designing a Transformation
Previewing and running a Transformation
Installing useful related software
Summary
Chapter 2: Getting Started with Transformations
Designing and previewing transformations
Getting familiar with editing features
Using the mouseover assistance toolbar
Adding steps and creating hops
Working with grids
Designing transformations
Putting the editing features in practice
Previewing and fixing errors as they appear
Looking at the results in the execution results pane
The Logging tab
The Step Metrics tab
Running transformations in an interactive fashion
Understanding PDI data and metadata
Understanding the PDI rowset
Adding or modifying fields by using different PDI steps
Explaining the PDI data types
Handling errors
Implementing the error handling functionality
Customizing the error handling
Summary
Chapter 3: Creating Basic Task Flows
Introducing jobs
Learning the basics about jobs
Creating a Simple Job
Designing and running jobs
Revisiting the Spoon interface and the editing features
Designing jobs
Getting familiar with the job design process
Looking at the results in the Execution results window
The Logging tab
The Job metrics tab
Enriching your work by sending an email
Running transformations from a Job
Using the Transformation Job Entry
Understanding and changing the flow of execution
Changing the flow of execution based on conditions
Forcing a status with an abort Job or success entry
Changing the execution to be synchronous
Managing files
Creating a Job that moves some files
Selecting files and folders
Working with regular expressions
Summarizing the Job entries that deal with files
Customizing the file management
Knowing the basics about Kettle variables
Understanding the kettle.properties file
How and when you can use variables
Summary
Chapter 4: Reading and Writing Files
Reading data from files
Reading a simple file
Troubleshooting reading files
Learning to read all kind of files
Specifying the name and location of the file
Reading several files at the same time
Reading files that are compressed or located on a remote server
Reading a file whose name is known at runtime
Describing the incoming fields
Reading Date fields
Reading Numeric fields
Reading only a subset of the file
Reading the most common kinds of sources
Reading text files
Reading spreadsheets
Reading XML files
Reading JSON files
Outputting data to files
Creating a simple file
Learning to create all kind of files and write data into them
Providing the name and location of an output file
Creating a file whose name is known only at runtime
Creating several files whose name depend on the content of the file
Describing the content of the output file
Formatting Date fields
Formatting Numeric fields
Creating the most common kinds of files
Creating text files
Creating spreadsheets
Creating XML files
Creating JSON files
Working with Big Data and cloud sources
Reading files from an AWS S3 instance
Writing files to an AWS S3 instance
Getting data from HDFS
Sending data to HDFS
Summary
Chapter 5: Manipulating PDI Data and Metadata
Manipulating simple fields
Working with strings
Extracting parts of strings using regular expressions
Searching and replacing using regular expressions
Doing some math with Numeric fields
Operating with dates
Performing simple operations on dates
Subtracting dates with the Calculator step
Getting information relative to the current date
Using the Get System Info step
Performing other useful operations on dates
Getting the month names with a User Defined Java Class step
Modifying the metadata of streams
Working with complex structures
Working with XML
Introducing XML terminology
Getting familiar with the XPath notation
Parsing XML structures with PDI
Reading an XML file with the Get data from XML step
Parsing an XML structure stored in a field
PDI Transformation and Job files
Parsing JSON structures
Introducing JSON terminology
Getting familiar with the JSONPath notation
Parsing JSON structures with PDI
Reading a JSON file with the JSON input step
Parsing a JSON structure stored in a field
Summary
Chapter 6: Controlling the Flow of Data
Filtering data
Filtering rows upon conditions
Reading a file and getting the list of words found in it
Filtering unwanted rows with a Filter rows step
Filtering rows by using the Java Filter step
Filtering data based on row numbers
Splitting streams unconditionally
Copying rows
Distributing rows
Introducing partitioning and clustering
Splitting the stream based on conditions
Splitting a stream based on a simple condition
Exploring PDI steps for splitting a stream based on conditions
Merging streams in several ways
Merging two or more streams
Customizing the way of merging streams
Looking up data
Looking up data with a Stream lookup step
Summary
Chapter 7: Cleansing, Validating, and Fixing Data
Cleansing data
Cleansing data by example
Standardizing information
Improving the quality of data
Introducing PDI steps useful for cleansing data
Dealing with non-exact matches
Cleansing by doing a fuzzy search
Deduplicating non-exact matches
Validating data
Validating data with PDI
Validating and reporting errors to the log
Introducing common validations and their implementation with PDI
Treating invalid data by splitting and merging streams
Fixing data that doesn't match the rules
Summary
Chapter 8: Manipulating Data by Coding
Doing simple tasks with the JavaScript step
Using the JavaScript language in PDI
Inserting JavaScript code using the JavaScript step
Adding fields
Modifying fields
Organizing your code
Controlling the flow using predefined constants
Testing the script using the Test script button
Parsing unstructured files with JavaScript
Doing simple tasks with the Java Class step
Using the Java language in PDI
Inserting Java code using the Java Class step
Learning to insert java code in a Java Class step
Data types equivalence
Adding fields
Modifying fields
Controlling the flow with the putRow() function
Testing the Java Class using the Test class button
Getting the most out of the Java Class step
Receiving parameters
Reading data from additional steps
Redirecting data to different target steps
Parsing JSON structures
Avoiding coding using purpose-built steps
Summary
Chapter 9: Transforming the Dataset
Sorting data
Sorting a dataset with the sort rows step
Working on groups of rows
Aggregating data
Summarizing the PDI steps that operate on sets of rows
Converting rows to columns
Converting row data to column data using the Row denormaliser step
Aggregating data with a Row Denormaliser step
Normalizing data
Modifying the dataset with a Row Normaliser step
Going forward and backward across rows
Picking rows backward and forward with the Analytic Query step
Summary
Chapter 10: Performing Basic Operations with Databases
Connecting to a database and exploring its content
Connecting with Relational Database Management Systems
Exploring a database with the Database Explorer
Previewing and getting data from a database
Getting data from the database with the Table input step
Using the Table input step to run flexible queries
Adding parameters to your queries
Using Kettle variables in your queries
Inserting, updating, and deleting data
Inserting new data into a database table
Inserting or updating data with the Insert / Update step
Deleting records of a database table with the Delete step
Performing CRUD operations with more flexibility
Verifying a connection, running DDL scripts, and doing other useful tasks
Looking up data in different ways
Doing simple lookups with the Database Value Lookup step
Making a performance difference when looking up data in a database
Performing complex database lookups
Looking for data using a Database join step
Looking for data using a Dynamic SQL row step
Summary
Chapter 11: Loading Data Marts with PDI
Preparing the environment
