Learning Apache Mahout: acquire practical skills in Big Data Analytics and explore data science with Apache Mahout

If you are a Java developer and want to use Mahout and machine learning to solve Big Data Analytics use cases, then this book is for you. Familiarity with shell scripts is assumed, but no prior experience is required.

Bibliographic Details
Classification: Electronic Book
Main Author: Tiwary, Chandramani (Author)
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing, 2015.
Series: Community experience distilled.
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Introduction to Mahout; Why Mahout; Simple techniques and more data is better; Sampling is difficult; Community and license; When Mahout; Data too large for single machine; Data already on Hadoop; Algorithms implemented in Mahout; How Mahout; Setting up the development environment; Configuring Maven; Configuring Mahout; Configuring Eclipse with the Maven plugin and Mahout; Mahout command line; Clustering example; A classification example
  • Mahout API
  • Java program example; The dataset; Parallel versus in-memory execution mode; Summary; Chapter 2: Core Concepts in Machine Learning; Supervised learning; Determine the objective; Decide the training data; Create and clean the training set; Feature extraction; Train the models; Bagging; Boosting; Validation; Holdout-set validation; K-fold cross validation; Evaluation; Bias-variance trade-off; Function complexity and amount of training data; Dimensionality of the input space; Noise in data; Unsupervised learning; Cluster analysis; Objective; Feature representation
  • Algorithm for clustering; A stopping criteria; Frequent pattern mining; Measures for identifying interesting rules; Things to consider; Recommender system; Collaborative filtering; Cold start; Scalability; Sparsity; Content-based filtering; Model efficacy; Classification; Confusion matrix; ROC curve and AUC; Regression; Mean absolute error; Root mean squared error; R-square; Adjusted R-square; Recommendation system; Score difference; Precision and recall; Clustering; The internal evaluation; External evaluation; Summary; Chapter 3: Feature Engineering; Feature engineering; Feature construction
  • Categorical features; Continuous features; Feature extraction; Feature selection; Filter-based feature selection; Wrapper-based feature selection; Embedded feature selection; Dimensionality reduction; Summary; Chapter 4: Classification with Mahout; Classification; White box models; Black box models; Logistic regression; Mahout logistic regression command line; Getting the data; Model building via command line; Train the model command line option; Testing the model; Prediction; Adaptive regression model; Code example with logistic regression; Train the model
  • The LogisticRegressionParameter and CsvRecordFactory class; Code example without the parameter class; Testing the online regression model; Getting predictions from OnlineLogisticRegression; CrossFoldLearner example; Random forest; Bagging; Random subsets of features; Out-of-bag error estimate; Random forest using the command line; Predictions from random forest; Naïve Bayes classifier; Numeric features with naïve Bayes; Command line; Summary; Chapter 5: Frequent Pattern Mining and Topic Modeling; Frequent pattern mining; Building FP Tree; Constructing the tree