Scikit-learn Cookbook : over 50 recipes to incorporate scikit-learn into every step of the data science pipeline, from feature extraction to model building and model evaluation /
If you're a data scientist already familiar with Python but not Scikit-Learn, or are familiar with other programming languages like R and want to take the plunge with the gold standard of Python machine learning libraries, then this is the book for you.
Clasificación: | Libro Electrónico |
---|---|
Autor principal: | |
Formato: | Electrónico eBook |
Idioma: | Inglés |
Publicado: |
Birmingham, U.K. :
Packt Publishing,
2014.
|
Temas: | |
Acceso en línea: | Texto completo |
Tabla de Contenidos:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Premodel Workflow; Introduction; Getting sample data from external sources; Creating sample data for toy analysis; Scaling data to the standard normal; Creating binary features through thresholding; Working with categorical variables; Binarizing label features; Imputing missing values through various strategies; Using Pipelines for multiple preprocessing steps; Reducing dimensionality with PCA; Using factor analysis for decomposition
- Kernel PCA for nonlinear dimensionality reductionUsing truncated SVD to reduce dimensionality; Decomposition to classify with DictionaryLearning; Putting it all together with Pipelines; Using Gaussian processes for regression; Defining the Gaussian process object directly; Using stochastic gradient descent for regression; Chapter 2: Working with Linear Models; Introduction; Fitting a line through data; Evaluating the linear regression model; Using ridge regression to overcome linear regression's shortfalls; Optimizing the ridge regression parameter; Using sparsity to regularize models
- Taking a more fundamental approach to regularization with LARSUsing linear methods for classification
- logistic regression; Directly applying Bayesian ridge regression; Using boosting to learn from errors; Chapter 3: Building Models with Distance Metrics; Introduction; Using KMeans to cluster data; Optimizing the number of centroids; Assessing cluster correctness; Using MiniBatch KMeans to handle more data; Quantizing an image with KMeans clustering; Finding the closest objects in the feature space; Probabilistic clustering with Gaussian Mixture Models; Using KMeans for outlier detection
- Using k-NN for regressionChapter 4: Classifying Data with scikit-learn; Introduction; Doing basic classifications with Decision Trees; Tuning a Decision Tree model; Using many Decision Trees
- random forests; Tuning a random forest model; Classifying data with Support Vector Machines; Generalizing with multiclass classification; Using LDA for classification; Working with QDA
- a nonlinear LDA; Using Stochastic Gradient Descent for classification; Classifying documents with Naïve Bayes; Label propagation with semi-supervised learning; Chapter 5: Post-model Workflow; Introduction
- K-fold cross validationAutomatic cross validation; Cross validation with ShuffleSplit; Stratified k-fold; Poor man's grid search; Brute force grid search; Using dummy estimators to compare results; Regression model evaluation; Feature selection; Feature selection on L1 norms; Persisting models with joblib; Index