
Statistical learning for biomedical data

This highly motivating introduction to statistical learning machines explains underlying principles in nontechnical language, using many examples and figures.

Bibliographic Details
Classification: Electronic Book
Main Author: Malley, James D.
Other Authors: Malley, Karen G.; Pajevic, Sinisa
Format: Electronic eBook
Language: English
Published: Cambridge; New York: Cambridge University Press, 2011.
Series: Practical guides to biostatistics and epidemiology.
Subjects:
Online Access: Full text
Table of Contents:
  • Part I. Introduction
  • 1. Prologue
  • 1.1. Machines that learn: some recent history
  • 1.2. Twenty canonical questions
  • 1.3. Outline of the book
  • 1.4. A comment about example datasets
  • 1.5. Software
  • 2. The landscape of learning machines
  • 2.1. Introduction
  • 2.2. Types of data for learning machines
  • 2.3. Will that be supervised or unsupervised?
  • 2.4. An unsupervised example
  • 2.5. More lack of supervision: where are the parents?
  • 2.6. Engines, complex and primitive
  • 2.7. Model richness means what, exactly?
  • 2.8. Membership or probability of membership?
  • 2.9. A taxonomy of machines?
  • 2.10. A note of caution: one of many
  • 2.11. Highlights from the theory
  • 3. A mangle of machines
  • 3.1. Introduction
  • 3.2. Linear regression
  • 3.3. Logistic regression
  • 3.4. Linear discriminant
  • 3.5. Bayes classifiers: regular and naïve
  • 3.6. Logic regression
  • 3.7. k-Nearest neighbors
  • 3.8. Support vector machines
  • 3.9. Neural networks
  • 3.10. Boosting
  • 3.11. Evolutionary and genetic algorithms
  • 4. Three examples and several machines
  • 4.1. Introduction
  • 4.2. Simulated cholesterol data
  • 4.3. Lupus data
  • 4.4. Stroke data
  • 4.5. Biomedical means unbalanced
  • 4.6. Measures of machine performance
  • 4.7. Linear analysis of cholesterol data
  • 4.8. Nonlinear analysis of cholesterol data
  • 4.9. Analysis of the lupus data
  • 4.10. Analysis of the stroke data
  • 4.11. Further analysis of the lupus and stroke data
  • Part II. A machine toolkit
  • 5. Logistic regression
  • 5.1. Introduction
  • 5.2. Inside and around the model
  • 5.3. Interpreting the coefficients
  • 5.4. Using logistic regression as a decision rule
  • 5.5. Logistic regression applied to the cholesterol data
  • 5.6. A cautionary note
  • 5.7. Another cautionary note
  • 5.8. Probability estimates and decision rules
  • 5.9. Evaluating the goodness-of-fit of a logistic regression model
  • 5.10. Calibrating a logistic regression
  • 5.11. Beyond calibration
  • 5.12. Logistic regression and reference models
  • 6. A single decision tree
  • 6.1. Introduction
  • 6.2. Dropping down trees
  • 6.3. Growing a tree
  • 6.4. Selecting features, making splits
  • 6.5. Good split, bad split
  • 6.6. Finding good features for making splits
  • 6.7. Misreading trees
  • 6.8. Stopping and pruning rules
  • 6.9. Using functions of the features
  • 6.10. Unstable trees?
  • 6.11. Variable importance: growing on trees?
  • 6.12. Permuting for importance
  • 6.13. The continuing mystery of trees
  • 7. Random Forests: trees everywhere
  • 7.1. Random Forests in less than five minutes
  • 7.2. Random treks through the data
  • 7.3. Random treks through the features
  • 7.4. Walking through the forest
  • 7.5. Weighted and unweighted voting
  • 7.6. Finding subsets in the data using proximities
  • 7.7. Applying Random Forests to the Stroke data
  • 7.8. Random Forests in the universe of machines
  • Part III. Analysis fundamentals
  • 8. Merely two variables
  • 8.1. Introduction
  • 8.2. Understanding correlations
  • 8.3. Hazards of correlations
  • 8.4. Correlations big and small
  • 9. More than two variables
  • 9.1. Introduction
  • 9.2. Tiny problems, large consequences
  • 9.3. Mathematics to the rescue?
  • 9.4. Good models need not be unique
  • 9.5. Contexts and coefficients
  • 9.6. Interpreting and testing coefficients in models
  • 9.7. Merging models, pooling lists, ranking features
  • 10. Resampling methods
  • 10.1. Introduction
  • 10.2. The bootstrap
  • 10.3. When the bootstrap works
  • 10.4. When the bootstrap doesn't work
  • 10.5. Resampling from a single group in different ways
  • 10.6. Resampling from groups with unequal sizes
  • 10.7. Resampling from small datasets
  • 10.8. Permutation methods
  • 10.9. Still more on permutation methods
  • 11. Error analysis and model validation
  • 11.1. Introduction
  • 11.2. Errors? What errors?
  • 11.3. Unbalanced data, unbalanced errors
  • 11.4. Error analysis for a single machine
  • 11.5. Cross-validation error estimation
  • 11.6. Cross-validation or cross-training?
  • 11.7. The leave-one-out method
  • 11.8. The out-of-bag method
  • 11.9. Intervals for error estimates for a single machine
  • 11.10. Tossing random coins into the abyss
  • 11.11. Error estimates for unbalanced data
  • 11.12. Confidence intervals for comparing error values
  • 11.13. Other measures of machine accuracy
  • 11.14. Benchmarking and winning the lottery
  • 11.15. Error analysis for predicting continuous outcomes
  • Part IV. Machine strategies
  • 12. Ensemble methods: let's take a vote
  • 12.1. Pools of machines
  • 12.2. Weak correlation with outcome can be good enough
  • 12.3. Model averaging
  • 13. Summary and conclusions
  • 13.1. Where have we been?
  • 13.2. So many machines
  • 13.3. Binary decision or probability estimate?
  • 13.4. Survival machines? Risk machines?
  • 13.5. And where are we going?