Table of Contents:
  • 1. Introduction: Data-Analytic Thinking
  • The Ubiquity of Data Opportunities
  • Example: Hurricane Frances
  • Example: Predicting Customer Churn
  • Data Science, Engineering, and Data-Driven Decision Making
  • Data Processing and "Big Data"
  • From Big Data 1.0 to Big Data 2.0
  • Data and Data Science Capability as a Strategic Asset
  • Data-Analytic Thinking
  • This Book
  • Data Mining and Data Science, Revisited
  • Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
  • Summary
  • 2. Business Problems and Data Science Solutions
  • Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised versus unsupervised data mining
  • From Business Problems to Data Mining Tasks
  • Supervised Versus Unsupervised Methods
  • Data Mining and Its Results
  • The Data Mining Process
  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment
  • Implications for Managing the Data Science Team
  • Other Analytics Techniques and Technologies
  • Statistics
  • Database Querying
  • Data Warehousing
  • Regression Analysis
  • Machine Learning and Data Mining
  • Answering Business Questions with These Techniques
  • Summary
  • 3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
  • Fundamental concepts: Identifying informative attributes; Segmenting data by progressive attribute selection
  • Exemplary techniques: Finding correlations; Attribute/variable selection; Tree induction
  • Models, Induction, and Prediction
  • Supervised Segmentation
  • Selecting Informative Attributes
  • Example: Attribute Selection with Information Gain
  • Supervised Segmentation with Tree-Structured Models
  • Visualizing Segmentations
  • Trees as Sets of Rules
  • Probability Estimation
  • Example: Addressing the Churn Problem with Tree Induction
  • Summary
  • 4. Fitting a Model to Data
  • Fundamental concepts: Finding "optimal" model parameters based on data; Choosing the goal for data mining; Objective functions; Loss functions
  • Exemplary techniques: Linear regression; Logistic regression; Support-vector machines
  • Classification via Mathematical Functions
  • Linear Discriminant Functions
  • Optimizing an Objective Function
  • An Example of Mining a Linear Discriminant from Data
  • Linear Discriminant Functions for Scoring and Ranking Instances
  • Support Vector Machines, Briefly
  • Regression via Mathematical Functions
  • Class Probability Estimation and Logistic "Regression"
  • Logistic Regression: Some Technical Details
  • Example: Logistic Regression versus Tree Induction
  • Nonlinear Functions, Support Vector Machines, and Neural Networks
  • Summary
  • 5. Overfitting and Its Avoidance
  • Fundamental concepts: Generalization; Fitting and overfitting; Complexity control
  • Exemplary techniques: Cross-validation; Attribute selection; Tree pruning; Regularization
  • Generalization
  • Overfitting
  • Overfitting Examined
  • Holdout Data and Fitting Graphs
  • Overfitting in Tree Induction
  • Overfitting in Mathematical Functions
  • Example: Overfitting Linear Functions
  • Example: Why Is Overfitting Bad?
  • From Holdout Evaluation to Cross-Validation
  • The Churn Dataset Revisited
  • Learning Curves
  • Overfitting Avoidance and Complexity Control
  • Avoiding Overfitting with Tree Induction
  • A General Method for Avoiding Overfitting
  • Avoiding Overfitting for Parameter Optimization
  • Summary
  • 6. Similarity, Neighbors, and Clusters
  • Fundamental concepts: Calculating similarity of objects described by data; Using similarity for prediction; Clustering as similarity-based segmentation
  • Exemplary techniques: Searching for similar entities; Nearest neighbor methods; Clustering methods; Distance metrics for calculating similarity
  • Similarity and Distance
  • Nearest-Neighbor Reasoning
  • Example: Whiskey Analytics
  • Nearest Neighbors for Predictive Modeling
  • How Many Neighbors and How Much Influence?
  • Geometric Interpretation, Overfitting, and Complexity Control
  • Issues with Nearest-Neighbor Methods
  • Some Important Technical Details Relating to Similarities and Neighbors
  • Heterogeneous Attributes
  • Other Distance Functions
  • Combining Functions: Calculating Scores from Neighbors
  • Clustering
  • Example: Whiskey Analytics Revisited
  • Hierarchical Clustering
  • Nearest Neighbors Revisited: Clustering Around Centroids
  • Example: Clustering Business News Stories
  • Understanding the Results of Clustering
  • Using Supervised Learning to Generate Cluster Descriptions
  • Stepping Back: Solving a Business Problem Versus Data Exploration
  • Summary
  • 7. Decision Analytic Thinking I: What Is a Good Model?
  • Fundamental concepts: Careful consideration of what is desired from data science results; Expected value as a key evaluation framework; Consideration of appropriate comparative baselines
  • Exemplary techniques: Various evaluation metrics; Estimating costs and benefits; Calculating expected profit; Creating baseline methods for comparison
  • Evaluating Classifiers
  • Plain Accuracy and Its Problems
  • The Confusion Matrix
  • Problems with Unbalanced Classes
  • Problems with Unequal Costs and Benefits
  • Generalizing Beyond Classification
  • A Key Analytical Framework: Expected Value
  • Using Expected Value to Frame Classifier Use
  • Using Expected Value to Frame Classifier Evaluation
  • Evaluation, Baseline Performance, and Implications for Investments in Data
  • Summary
  • 8. Visualizing Model Performance
  • Fundamental concepts: Visualization of model performance under various kinds of uncertainty; Further consideration of what is desired from data mining results
  • Exemplary techniques: Profit curves; Cumulative response curves; Lift curves; ROC curves
  • Ranking Instead of Classifying
  • Profit Curves
  • ROC Graphs and Curves
  • The Area Under the ROC Curve (AUC)
  • Cumulative Response and Lift Curves
  • Example: Performance Analytics for Churn Modeling
  • Summary
  • 9. Evidence and Probabilities
  • Fundamental concepts: Explicit evidence combination with Bayes' Rule; Probabilistic reasoning via assumptions of conditional independence
  • Exemplary techniques: Naive Bayes classification; Evidence lift
  • Example: Targeting Online Consumers With Advertisements
  • Combining Evidence Probabilistically
  • Joint Probability and Independence
  • Bayes' Rule
  • Applying Bayes' Rule to Data Science
  • Conditional Independence and Naive Bayes
  • Advantages and Disadvantages of Naive Bayes
  • A Model of Evidence "Lift"
  • Example: Evidence Lifts from Facebook "Likes"
  • Evidence in Action: Targeting Consumers with Ads
  • Summary
  • 10. Representing and Mining Text
  • Fundamental concepts: The importance of constructing mining-friendly data representations; Representation of text for data mining
  • Exemplary techniques: Bag of words representation; TFIDF calculation; N-grams; Stemming; Named entity extraction; Topic models
  • Why Text Is Important
  • Why Text Is Difficult
  • Representation
  • Bag of Words
  • Term Frequency
  • Measuring Sparseness: Inverse Document Frequency
  • Combining Them: TFIDF
  • Example: Jazz Musicians
  • The Relationship of IDF to Entropy
  • Beyond Bag of Words
  • N-gram Sequences
  • Named Entity Extraction
  • Topic Models
  • Example: Mining News Stories to Predict Stock Price Movement
  • The Task
  • The Data
  • Data Preprocessing
  • Results
  • Summary
  • 11. Decision Analytic Thinking II: Toward Analytical Engineering
  • Fundamental concept: Solving business problems with data science starts with analytical engineering: designing an analytical solution, based on the data, tools, and techniques available
  • Exemplary technique: Expected value as a framework for data science solution design
  • Targeting the Best Prospects for a Charity Mailing
  • The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces
  • A Brief Digression on Selection Bias
  • Our Churn Example Revisited with Even More Sophistication
  • The Expected Value Framework: Structuring a More Complicated Business Problem
  • Assessing the Influence of the Incentive
  • From an Expected Value Decomposition to a Data Science Solution
  • Summary
  • 12. Other Data Science Tasks and Techniques
  • Fundamental concepts: Our fundamental concepts as the basis of many common data science techniques; The importance of familiarity with the building blocks of data science
  • Exemplary techniques: Association and co-occurrences; Behavior profiling; Link prediction; Data reduction; Latent information mining; Movie recommendation; Bias-variance decomposition of error; Ensembles of models; Causal reasoning from data
  • Co-occurrences and Associations: Finding Items That Go Together
  • Measuring Surprise: Lift and Leverage
  • Example: Beer and Lottery Tickets
  • Associations Among Facebook Likes
  • Profiling: Finding Typical Behavior
  • Link Prediction and Social Recommendation
  • Data Reduction, Latent Information, and Movie Recommendation
  • Bias, Variance, and Ensemble Methods
  • Data-Driven Causal Explanation and a Viral Marketing Example
  • Summary
  • 13. Data Science and Business Strategy
  • Fundamental concepts: Our principles as the basis of success for a data-driven business; Acquiring and sustaining competitive advantage via data science; The importance of careful curation of data science capability
  • Thinking Data-Analytically, Redux
  • Achieving Competitive Advantage with Data Science
  • Sustaining Competitive Advantage with Data Science
  • Formidable Historical Advantage
  • Unique Intellectual Property
  • Unique Intangible Collateral Assets
  • Superior Data Scientists
  • Superior Data Science Management
  • Attracting and Nurturing Data Scientists and Their Teams
  • Examine Data Science Case Studies
  • Be Ready to Accept Creative Ideas from Any Source
  • Be Ready to Evaluate Proposals for Data Science Projects
  • Example Data Mining Proposal
  • Flaws in the Big Red Proposal
  • A Firm's Data Science Maturity
  • 14. Conclusion
  • The Fundamental Concepts of Data Science
  • Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
  • Changing the Way We Think about Solutions to Business Problems
  • What Data Can't Do: Humans in the Loop, Revisited
  • Privacy, Ethics, and Mining Data About Individuals
  • Is There More to Data Science?
  • Final Example: From Crowd-Sourcing to Cloud-Sourcing
  • Final Words