
Exploring data with RapidMiner: explore, understand, and prepare real data using RapidMiner's practical tips and tricks

Written in a step-by-step tutorial style with examples, so that users at different levels can benefit from the facilities offered by RapidMiner. If you are a computer scientist or an engineer who has real data from which you want to extract value, this book is ideal for you. You will need to have at least a b...


Bibliographic Details
Classification: E-book
Main author: Chisholm, Andrew, 1959- (Author)
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing, 2013.
Series: Community experience distilled.
Online access: Full text
Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewer; www.PacktPub.com; Table of Contents; Preface.
  • Chapter 1: Setting the Scene; A process framework; Data volume and velocity; Data variety, formats, and meanings; Missing data; Cleaning data; Visualizing data; Resource constraints; Terminology; Accompanying material; Summary.
  • Chapter 2: Loading Data; Reading files; Alternative delimiters; Reading complete lines; Reading large numbers of attributes; Splitting files into smaller pieces; Databases; The Read Database operator; Large datasets; Using macros; Summary.
  • Chapter 3: Visualizing Data; Getting started; Statistical summaries; Relationships between attributes; Scatter plots; Scatter 3D color; Parallel and deviation; Quartile color; Time series data; Plotting series; Using the survey plotter; Relations between examples; Using histograms; Using block plots; Summary.
  • Chapter 4: Parsing and Converting Attributes; Generating attributes; Date functions; Regular expression functions; Generating extracts; Regular expressions; XPath; Renaming attributes; Searching and replacing attribute values; Using the Map operator; Using the Replace operator; Using Replace (Dictionary); Summary.
  • Chapter 5: Outliers; Manual inspection; Increasing the data volume; Rules for handling outliers; Automated detection of example outliers; Detect Outlier (Distances); Detect Outlier (Densities); Detect Outlier (LOF); Detect Outliers (COF); Summary.
  • Chapter 6: Missing Values; Missing or empty?; Types of missing data; Missing completely at random; Missing at random; Not missing at random; Categorizing missing data; Finding MCAR data; Finding MAR data; Finding NMAR data; A cautionary note; Effect of missing data; Options for handling missing data; Returning to the root cause; Ignore it; Manual editing; Deletion of examples; Deletion of attributes; Imputation with single values; Modeling; Summary.
  • Chapter 7: Transforming Data; Creating new attributes; Aggregation; Using pivoting; Using de-pivoting; Summary.
  • Chapter 8: Reducing Data Size; Removing examples using sampling; Removing attributes; Removing useless attributes; Weighting attributes; Selecting attributes using models; Summary.
  • Chapter 9: Resource Constraints; Measuring and estimating performance; Measuring performance; Adding memory; Parallel processing; Restructuring processes.