Python Data Mining Quick Start Guide: A Beginner's Guide to Extracting Valuable Insights from Your Data
This book introduces data mining and demonstrates, in practice, how to work with real-world data sets. With this book, you will be able to extract useful insights using common Python libraries. You will also learn key stages such as data loading, cleaning, analysis, and visualization to build an...
Classification: | eBook |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham : Packt Publishing, Limited, 2019. |
Subjects: | |
Table of Contents:
- Cover; Title Page; Copyright and Credits; Dedication; About Packt; Contributors; Table of Contents; Preface
- Chapter 1: Data Mining and Getting Started with Python Tools; Descriptive, predictive, and prescriptive analytics; What will and will not be covered in this book; Recommended readings for further explanation; Setting up Python environments for data mining; Installing the Anaconda distribution and Conda package manager; Installing on Linux; Installing on Windows; Installing on macOS; Launching the Spyder IDE; Launching a Jupyter Notebook; Installing high-performance Python distribution; Recommended libraries and how to install; Recommended libraries; Summary
- Chapter 2: Basic Terminology and Our End-to-End Example; Basic data terminology; Sample spaces; Variable types; Data types; Basic summary statistics; An end-to-end example of data mining in Python; Loading data into memory: viewing and managing with ease using pandas; Plotting and exploring data: harnessing the power of Seaborn; Transforming data: PCA and LDA with scikit-learn; Quantifying separations: k-means clustering and the silhouette score; Making decisions or predictions; Summary
- Chapter 3: Collecting, Exploring, and Visualizing Data; Types of data sources and loading into pandas; Databases; Basic Structured Query Language (SQL) queries; Disks; Web sources; From URLs; From Scikit-learn and Seaborn-included sets; Access, search, and sanity checks with pandas; Basic plotting in Seaborn; Popular types of plots for visualizing data; Scatter plots; Histograms; Jointplots; Violin plots; Pairplots; Summary
- Chapter 4: Cleaning and Readying Data for Analysis; The scikit-learn transformer API; Cleaning input data; Missing values; Finding and removing missing values; Imputing to replace the missing values; Feature scaling; Normalization; Standardization; Handling categorical data; Ordinal encoding; One-hot encoding; Label encoding; High-dimensional data; Dimension reduction; Feature selection; Feature filtering; The variance threshold; The correlation coefficient; Wrapper methods; Sequential feature selection; Transformation; PCA; LDA; Summary
- Chapter 5: Grouping and Clustering Data; Introducing clustering concepts; Location of the group; Euclidean space (centroids); Non-Euclidean space (medoids); Similarity; Euclidean space; The Euclidean distance; The Manhattan distance; Maximum distance; Non-Euclidean space; The cosine distance; The Jaccard distance; Termination condition; With known number of groupings; Without known number of groupings; Quality score and silhouette score; Clustering methods; Means separation; K-means; Finding k; K-means++; Mini batch K-means; Hierarchical clustering; Reuse the dendrogram to find number of clusters; Plot dendrogram; Density clustering; Spectral clustering; Summary
- Chapter 6: Prediction with Regression and Classification; Scikit-learn Estimator API; Introducing prediction concepts