Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data
More specifically, "this book is mostly about how to estimate the empirical pdf [probability density function] f(x) from data (including multidimensional cases), how to statistically describe the resulting estimate and its uncertainty, how to compare it to models specified via h(x) (including e...
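As a minimal sketch of that workflow (an illustration of the summary above, not an example taken from the book), the snippet below estimates an empirical pdf from a sample using a Gaussian kernel density estimate and compares the sample to a single-Gaussian model h(x). The simulated two-component data and the Kolmogorov-Smirnov comparison are illustrative assumptions, built only from standard NumPy/SciPy calls.

```python
import numpy as np
from scipy import stats

# Simulated sample from an unknown "true" distribution
# (here, an illustrative two-component Gaussian mixture).
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-1.0, 0.5, 500),
                       rng.normal(2.0, 1.0, 500)])

# Estimate the empirical pdf f(x) with a Gaussian kernel density estimate.
kde = stats.gaussian_kde(data)
x = np.linspace(-4.0, 6.0, 200)
f_hat = kde(x)  # estimated pdf evaluated on a grid

# A candidate model h(x): a single Gaussian with parameters
# estimated from the sample (an illustrative choice, not the book's).
mu, sigma = data.mean(), data.std()
h = stats.norm.pdf(x, mu, sigma)

# One simple way to compare data and model: a Kolmogorov-Smirnov test
# of the sample against the candidate Gaussian.
ks_stat, p_value = stats.kstest(data, "norm", args=(mu, sigma))
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3g}")
```

A small p-value would indicate that the single-Gaussian model is a poor description of the sample, which is the kind of model comparison the book treats in depth.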
| Main Authors: | Ivezić, Željko; Connolly, Andrew J.; VanderPlas, Jacob T.; Gray, Alexander |
| --- | --- |
| Format: | Electronic eBook |
| Language: | English |
| Published: | Princeton, N.J.: Princeton University Press, 2014 |
| Series: | Book collections on Project MUSE |
| Subjects: | |
| Online Access: | Full text |
Table of Contents:
- I. Introduction
- 1. About the Book and Supporting Material
- 1.1. What do Data Mining, Machine Learning, and Knowledge Discovery mean?
- 1.2. What is this book about?
- 1.3. An incomplete survey of the relevant literature
- 1.4. Introduction to the Python Language and the Git Code Management Tool
- 1.5. Description of surveys and data sets used in examples
- 1.6. Plotting and visualizing the data in this book
- 1.7. How to efficiently use this book
- References
- 2. Fast Computation on Massive Data Sets
- 2.1. Data types and data management systems
- 2.2. Analysis of algorithmic efficiency
- 2.3. Seven types of computational problems
- 2.4. Seven strategies for speeding things up
- 2.5. Case studies: Speedup strategies in practice
- References
- II. Statistical Frameworks and Exploratory Data Analysis
- 3. Probability and Statistical Distributions
- 3.1. Brief overview of probability and random variables
- 3.2. Descriptive statistics
- 3.3. Common univariate distribution functions
- 3.4. The Central Limit Theorem
- 3.5. Bivariate and multivariate distribution functions
- 3.6. Correlation coefficients
- 3.7. Random number generation for arbitrary distributions
- References
- 4. Classical Statistical Inference
- 4.1. Classical vs. Bayesian statistical inference
- 4.2. Maximum likelihood estimation (MLE)
- 4.3. The goodness of fit and model selection
- 4.4. ML applied to Gaussian mixtures: the expectation maximization algorithm
- 4.5. Confidence estimates: the bootstrap and the jackknife
- 4.6. Hypothesis testing
- 4.7. Comparison of distributions
- 4.8. Nonparametric modeling and histograms
- 4.9. Selection effects and luminosity function estimation
- 4.10. Summary
- References
- 5. Bayesian Statistical Inference
- 5.1. Introduction to the Bayesian method
- 5.2. Bayesian priors
- 5.3. Bayesian parameter uncertainty quantification
- 5.4. Bayesian model selection
- 5.5. Nonuniform priors: Eddington, Malmquist, and Lutz-Kelker biases
- 5.6. Simple examples of Bayesian analysis: Parameter estimation
- 5.7. Simple examples of Bayesian analysis: Model selection
- 5.8. Numerical methods for complex problems (MCMC)
- 5.9. Summary of pros and cons for classical and Bayesian methods
- References
- III. Data Mining and Machine Learning
- 6. Searching for Structure in Point Data
- 6.1. Nonparametric density estimation
- 6.2. Nearest-neighbor density estimation
- 6.3. Parametric density estimation
- 6.4. Finding clusters in data
- 6.5. Correlation functions
- 6.6. Which density estimation and clustering algorithms should I use?
- References
- 7. Dimensionality and Its Reduction
- 7.1. The curse of dimensionality
- 7.2. The data sets used in this chapter
- 7.3. Principal component analysis
- 7.4. Nonnegative matrix factorization
- 7.5. Manifold learning
- 7.6. Independent component analysis and projection pursuit
- 7.7. Which dimensionality reduction technique should I use?
- References
- 8. Regression and Model Fitting
- 8.1. Formulation of the regression problem
- 8.2. Regression for linear models
- 8.3. Regularization and penalizing the likelihood
- 8.4. Principal component regression
- 8.5. Kernel regression
- 8.6. Locally linear regression
- 8.7. Nonlinear regression
- 8.8. Uncertainties in the data
- 8.9. Regression that is robust to outliers
- 8.10. Gaussian process regression
- 8.11. Overfitting, underfitting, and cross-validation
- 8.12. Which regression method should I use?
- References
- 9. Classification
- 9.1. Data sets used in this chapter
- 9.2. Assigning categories: Classification
- 9.3. Generative classification
- 9.4. K-nearest-neighbor classifier
- 9.5. Discriminative classification
- 9.6. Support vector machines
- 9.7. Decision trees
- 9.8. Evaluating classifiers: ROC curves
- 9.9. Which classifier should I use?
- References
- 10. Time Series Analysis
- 10.1. Main concepts for time series analysis
- 10.2. Modeling toolkit for time series analysis
- 10.3. Analysis of periodic time series
- 10.4. Temporally localized signals
- 10.5. Analysis of stochastic processes
- 10.6. Which method should I use for time series analysis?
- References
- IV. Appendices
- A An Introduction to Scientific Computing with Python
- A.1. A brief history of Python
- A.2. The SciPy universe
- A.3. Getting started with Python
- A.4. IPython: The basics of interactive computing
- A.5. Introduction to NumPy
- A.6. Visualization with Matplotlib
- A.7. Overview of useful NumPy/SciPy modules
- A.8. Efficient coding with Python and NumPy
- A.9. Wrapping existing code in Python
- A.10. Other resources
- B AstroML: Machine Learning for Astronomy
- B.1. Introduction
- B.2. Dependencies
- B.3. Tools included in AstroML v0.1
- C Astronomical Flux Measurements and Magnitudes
- C.1. The definition of the specific flux
- C.2. Wavelength window function for astronomical measurements
- C.3. The astronomical magnitude systems
- D SQL Query for Downloading SDSS Data
- E Approximating the Fourier Transform with the FFT
- References