Learning pandas
Get to grips with pandas, a versatile and high-performance Python library for data manipulation, analysis, and discovery.
About This Book
* Get comfortable using pandas and Python as an effective data exploration and analysis tool
* Explore pandas through a framework of data analysis, with an explanatio...
Classification: | eBook |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham: Packt Publishing, 2017. |
Edition: | Second edition. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: pandas and Data Analysis
- Introducing pandas
- Data manipulation, analysis, science, and pandas
- Data manipulation
- Data analysis
- Data science
- Where does pandas fit?
- The process of data analysis
- The process
- Ideation
- Retrieval
- Preparation
- Exploration
- Modeling
- Presentation
- Reproduction
- A note on being iterative and agile
- Relating the book to the process
- Concepts of data and analysis in our tour of pandas
- Types of data
- Structured
- Unstructured
- Semi-structured
- Variables
- Categorical
- Continuous
- Discrete
- Time series data
- General concepts of analysis and statistics
- Quantitative versus qualitative data/analysis
- Single and multivariate analysis
- Descriptive statistics
- Inferential statistics
- Stochastic models
- Probability and Bayesian statistics
- Correlation
- Regression
- Other Python libraries of value with pandas
- Numeric and scientific computing
- NumPy and SciPy
- Statistical analysis
- StatsModels
- Machine learning
- scikit-learn
- PyMC: stochastic Bayesian modeling
- Data visualization
- matplotlib and seaborn
- Matplotlib
- Seaborn
- Summary
- Chapter 2: Up and Running with pandas
- Installation of Anaconda
- IPython and Jupyter Notebook
- IPython
- Jupyter Notebook
- Introducing the pandas Series and DataFrame
- Importing pandas
- The pandas Series
- The pandas DataFrame
- Loading data from files into a DataFrame
- Visualization
- Summary
- Chapter 3: Representing Univariate Data with the Series
- Configuring pandas
- Creating a Series
- Creating a Series using Python lists and dictionaries
- Creation using NumPy functions
- Creation using a scalar value
- The .index and .values properties
- The size and shape of a Series
- Specifying an index at creation
- Heads, tails, and takes
- Retrieving values in a Series by label or position
- Lookup by label using the [] operator and the .ix property
- Explicit lookup by position with .iloc
- Explicit lookup by labels with .loc
- Slicing a Series into subsets
- Alignment via index labels
- Performing Boolean selection
- Re-indexing a Series
- Modifying a Series in-place
- Summary
- Chapter 4: Representing Tabular and Multivariate Data with the DataFrame
- Configuring pandas
- Creating DataFrame objects
- Creating a DataFrame using NumPy function results
- Creating a DataFrame using a Python dictionary and pandas Series objects
- Creating a DataFrame from a CSV file
- Accessing data within a DataFrame
- Selecting the columns of a DataFrame
- Selecting rows of a DataFrame
- Scalar lookup by label or location using .at and .iat
- Slicing using the [] operator
- Selecting rows using Boolean selection
- Selecting across both rows and columns
- Summary
- Chapter 5: Manipulating DataFrame Structure
- Configuring pandas
- Renaming columns
- Adding new columns with [] and .insert()
- Adding columns through enlargement
- Adding columns using concatenation
- Reordering columns
- Replacing the contents of a column
- Deleting columns
- Appending new rows
- Concatenating rows
- Adding and replacing rows via enlargement
- Removing rows using .drop()
- Removing rows using Boolean selection
- Removing rows using a slice
- Summary
- Chapter 6: Indexing Data
- Configuring pandas
- The importance of indexes
- The pandas index types
- The fundamental type: Index
- Integer index labels using Int64Index and RangeIndex
- Floating-point labels using Float64Index
- Representing discrete intervals using IntervalIndex
- Categorical values as an index: CategoricalIndex
- Indexing by date and time using DatetimeIndex
- Indexing periods of time using PeriodIndex
- Working with Indexes
- Creating and using an index with a Series or DataFrame
- Selecting values using an index
- Moving data to and from the index
- Reindexing a pandas object
- Hierarchical indexing
- Summary
- Chapter 7: Categorical Data
- Configuring pandas
- Creating Categoricals
- Renaming categories
- Appending new categories
- Removing categories
- Removing unused categories
- Setting categories
- Descriptive information of a Categorical
- Munging school grades
- Summary
- Chapter 8: Numerical and Statistical Methods
- Configuring pandas
- Performing numerical methods on pandas objects
- Performing arithmetic on a DataFrame or Series
- Getting the counts of values
- Determining unique values (and their counts)
- Finding minimum and maximum values
- Locating the n-smallest and n-largest values
- Calculating accumulated values
- Performing statistical processes on pandas objects
- Retrieving summary descriptive statistics
- Measuring central tendency: mean, median, and mode
- Calculating the mean
- Finding the median
- Determining the mode
- Calculating variance and standard deviation
- Measuring variance
- Finding the standard deviation
- Determining covariance and correlation
- Calculating covariance
- Determining correlation
- Performing discretization and quantiling of data
- Calculating the rank of values
- Calculating the percent change at each sample of a series
- Performing moving-window operations
- Executing random sampling of data
- Summary
- Chapter 9: Accessing Data
- Configuring pandas
- Working with CSV and text/tabular format data
- Examining the sample CSV data set
- Reading a CSV file into a DataFrame
- Specifying the index column when reading a CSV file
- Data type inference and specification
- Specifying column names
- Specifying specific columns to load
- Saving DataFrame to a CSV file
- Working with general field-delimited data
- Handling variants of formats in field-delimited data
- Reading and writing data in Excel format
- Reading and writing JSON files
- Reading HTML data from the web
- Reading and writing HDF5 format files
- Accessing CSV data on the web
- Reading and writing from/to SQL databases
- Reading data from remote data services
- Reading stock data from Yahoo! and Google Finance
- Retrieving options data from Google Finance
- Reading economic data from the Federal Reserve Bank of St. Louis
- Accessing Kenneth French's data
- Reading from the World Bank
- Summary
- Chapter 10: Tidying Up Your Data
- Configuring pandas
- What is tidying your data?
- How to work with missing data
- Determining NaN values in pandas objects
- Selecting out or dropping missing data
- Handling of NaN values in mathematical operations
- Filling in missing data
- Forward and backward filling of missing values
- Filling using index labels
- Performing interpolation of missing values
- Handling duplicate data
- Transforming data
- Mapping data into different values
- Replacing values
- Applying functions to transform data
- Summary
- Chapter 11: Combining, Relating, and Reshaping Data
- Configuring pandas
- Concatenating data in multiple objects
- Understanding the default semantics of concatenation
- Switching axes of alignment
- Specifying join type
- Appending versus concatenation
- Ignoring the index labels
- Merging and joining data
- Merging data from multiple pandas objects
- Specifying the join semantics of a merge operation
- Pivoting data to and from value and indexes
- Stacking and unstacking
- Stacking using non-hierarchical indexes
- Unstacking using hierarchical indexes
- Melting data to and from long and wide format
- Performance benefits of stacked data
- Summary
- Chapter 12: Data Aggregation
- Configuring pandas
- The split, apply, and combine (SAC) pattern
- Data for the examples
- Splitting data
- Grouping by a single column's values
- Accessing the results of a grouping
- Grouping using multiple columns
- Grouping using index levels
- Applying aggregate functions, transforms, and filters
- Applying aggregation functions to groups
- Transforming groups of data
- The general process of transformation
- Filling missing values with the mean of the group
- Calculating normalized z-scores with a transformation
- Filtering groups from aggregation
- Summary
- Chapter 13: Time-Series Modelling
- Setting up the IPython notebook
- Representation of dates, time, and intervals
- The datetime, date, and time objects
- Representing a point in time with a Timestamp
- Using a Timedelta to represent a time interval
- Introducing time-series data
- Indexing using DatetimeIndex
- Creating time-series with specific frequencies
- Calculating new dates using offsets
- Representing data intervals with date offsets
- Anchored offsets
- Representing durations of time using Period
- Modelling an interval of time with a Period
- Indexing using the PeriodIndex
- Handling holidays using calendars
- Normalizing timestamps using time zones
- Manipulating time-series data
- Shifting and lagging
- Performing frequency conversion on a time-series
- Up and down resampling of a time-series
- Time-series moving-window operations
- Summary
- Chapter 14: Visualization
- Configuring pandas
- Plotting basics with pandas
- Creating time-series charts
- Adorning and styling your time-series plot