Mastering data analysis with R: gain clear insights into your data and solve real-world data science problems with R, from data munging to modeling and visualization
Classification: | eBook |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, UK: Packt Publishing, 2015. |
Series: | Community experience distilled. |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
- Chapter 1: Hello, Data!; Loading text files of a reasonable size; Data files larger than the physical memory; Benchmarking text file parsers; Loading a subset of text files; Filtering flat files before loading to R; Loading data from databases; Setting up the test environment; MySQL and MariaDB; PostgreSQL; Oracle database; ODBC database access; Using a graphical user interface to connect to databases; Other database backends; Importing data from other statistical systems; Loading Excel spreadsheets; Summary
- Chapter 2: Getting Data from the Web; Loading datasets from the Internet; Other popular online data formats; Reading data from HTML tables; Reading tabular data from static Web pages; Scraping data from other online sources; R packages to interact with data source APIs; Socrata Open Data API; Finance APIs; Fetching time series with Quandl; Google documents and analytics; Online search trends; Historical weather data; Other online data sources; Summary
- Chapter 3: Filtering and Summarizing Data; Drop needless data; Drop needless data in an efficient way; Drop needless data in another efficient way; Aggregation; Quicker aggregation with base R commands; Convenient helper functions; High-performance helper functions; Aggregate with data.table; Running benchmarks; Summary functions; Adding up the number of cases in subgroups; Summary
- Chapter 4: Restructuring Data; Transposing matrices; Filtering data by string matching; Rearranging data; dplyr versus data.table; Computing new variables; Memory profiling; Creating multiple variables at a time; Computing new variables with dplyr; Merging datasets; Reshaping data in a flexible way; Converting wide tables to the long table format; Converting long tables to the wide table format; Tweaking performance; The evolution of the reshape packages; Summary
- Chapter 5: Building Models (authored by Renata Nemeth and Gergely Toth); The motivation behind multivariate models; Linear regression with continuous predictors; Model interpretation; Multiple predictors; Model assumptions; How well does the line fit in the data?; Discrete predictors; Summary
- Chapter 6: Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth); The modeling workflow; Logistic regression; Data considerations; Goodness of model fit; Model comparison; Models for count data; Poisson regression; Negative binomial regression; Multivariate non-linear models; Summary
- Chapter 7: Unstructured Data; Importing the corpus; Cleaning the corpus; Visualizing the most frequent words in the corpus; Further cleanup; Stemming words; Lemmatisation; Analyzing the associations among terms; Some other metrics; The segmentation of documents; Summary
- Chapter 8: Polishing Data; The types and origins of missing data; Identifying missing data; By-passing missing values