Pandas Basics
This book is intended for those who plan to become data scientists as well as anyone who needs to perform data cleaning tasks using Pandas and NumPy.
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Bloomfield : Mercury Learning & Information, 2022. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover
- Title Page
- Copyright
- Dedication
- Contents
- Preface
- Chapter 1: Introduction to Python
- Tools for Python
- easy_install and pip
- virtualenv
- IPython
- Python Installation
- Setting the PATH Environment Variable (Windows Only)
- Launching Python on Your Machine
- The Python Interactive Interpreter
- Python Identifiers
- Lines, Indentation, and Multi-lines
- Quotations and Comments
- Saving Your Code in a Module
- Some Standard Modules
- The help() and dir() Functions
- Compile Time and Runtime Code Checking
- Simple Data Types
- Working with Numbers
- Working with Other Bases
- The chr() Function
- The round() Function
- Formatting Numbers
- Working with Fractions
- Unicode and UTF-8
- Working with Unicode
- Working with Strings
- Comparing Strings
- Formatting Strings
- Uninitialized Variables and the Value None
- Slicing and Splicing Strings
- Testing for Digits and Alphabetic Characters
- Search and Replace a String in Other Strings
- Remove Leading and Trailing Characters
- Printing Text without NewLine Characters
- Text Alignment
- Working with Dates
- Converting Strings to Dates
- Exception Handling
- Handling User Input
- Command-line Arguments
- Summary
- Chapter 2: Working with Data
- Dealing with Data: What Can Go Wrong?
- What is Data Drift?
- What are Datasets?
- Data Preprocessing
- Data Types
- Preparing Datasets
- Discrete Data Versus Continuous Data
- Binning Continuous Data
- Scaling Numeric Data via Normalization
- Scaling Numeric Data via Standardization
- Scaling Numeric Data via Robust Standardization
- What to Look for in Categorical Data
- Mapping Categorical Data to Numeric Values
- Working with Dates
- Working with Currency
- Working with Outliers and Anomalies
- Outlier Detection/Removal
- Finding Outliers with NumPy
- Finding Outliers with Pandas
- Calculating Z-scores to Find Outliers
- Finding Outliers with SkLearn (Optional)
- Working with Missing Data
- Imputing Values: When is Zero a Valid Value?
- Dealing with Imbalanced Datasets
- What is SMOTE?
- SMOTE Extensions
- The Bias-Variance Tradeoff
- Types of Bias in Data
- Analyzing Classifiers (Optional)
- What is LIME?
- What is ANOVA?
- Summary
- Chapter 3: Introduction to Probability and Statistics
- What is a Probability?
- Calculating the Expected Value
- Random Variables
- Discrete versus Continuous Random Variables
- Well-known Probability Distributions
- Fundamental Concepts in Statistics
- The Mean
- The Median
- The Mode
- The Variance and Standard Deviation
- Population, Sample, and Population Variance
- Chebyshev's Inequality
- What is a p-value?
- The Moments of a Function (Optional)
- What is Skewness?
- What is Kurtosis?
- Data and Statistics
- The Central Limit Theorem
- Correlation versus Causation
- Statistical Inferences
- Statistical Terms: RSS, TSS, R^2, and F1 Score