Data science for business: what you need to know about data mining and data-analytic thinking
Classification: | Electronic Book |
---|---|
Main Author: | |
Other Authors: | |
Format: | Electronic eBook |
Language: | English |
Published: | Sebastopol, CA : O'Reilly Media, 2013. |
Edition: | 1st ed. |
Subjects: | |
Online Access: | Full text (requires prior registration with institutional email) |
Table of Contents:
- 1. Introduction: Data-Analytic Thinking
- The Ubiquity of Data Opportunities
- Example: Hurricane Frances
- Example: Predicting Customer Churn
- Data Science, Engineering, and Data-Driven Decision Making
- Data Processing and "Big Data"
- From Big Data 1.0 to Big Data 2.0
- Data and Data Science Capability as a Strategic Asset
- Data-Analytic Thinking
- This Book
- Data Mining and Data Science, Revisited
- Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
- Summary
- 2. Business Problems and Data Science Solutions
- Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised versus unsupervised data mining
- From Business Problems to Data Mining Tasks
- Supervised Versus Unsupervised Methods
- Data Mining and Its Results
- The Data Mining Process
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
- Implications for Managing the Data Science Team
- Other Analytics Techniques and Technologies
- Statistics
- Database Querying
- Data Warehousing
- Regression Analysis
- Machine Learning and Data Mining
- Answering Business Questions with These Techniques
- Summary
- 3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
- Fundamental concepts: Identifying informative attributes; Segmenting data by progressive attribute selection
- Exemplary techniques: Finding correlations; Attribute/variable selection; Tree induction
- Models, Induction, and Prediction
- Supervised Segmentation
- Selecting Informative Attributes
- Example: Attribute Selection with Information Gain
- Supervised Segmentation with Tree-Structured Models
- Visualizing Segmentations
- Trees as Sets of Rules
- Probability Estimation
- Example: Addressing the Churn Problem with Tree Induction
- Summary
- 4. Fitting a Model to Data
- Fundamental concepts: Finding "optimal" model parameters based on data; Choosing the goal for data mining; Objective functions; Loss functions
- Exemplary techniques: Linear regression; Logistic regression; Support-vector machines
- Classification via Mathematical Functions
- Linear Discriminant Functions
- Optimizing an Objective Function
- An Example of Mining a Linear Discriminant from Data
- Linear Discriminant Functions for Scoring and Ranking Instances
- Support Vector Machines, Briefly
- Regression via Mathematical Functions
- Class Probability Estimation and Logistic "Regression"
- Logistic Regression: Some Technical Details
- Example: Logistic Regression versus Tree Induction
- Nonlinear Functions, Support Vector Machines, and Neural Networks
- Summary
- 5. Overfitting and Its Avoidance
- Fundamental concepts: Generalization; Fitting and overfitting; Complexity control
- Exemplary techniques: Cross-validation; Attribute selection; Tree pruning; Regularization
- Generalization
- Overfitting
- Overfitting Examined
- Holdout Data and Fitting Graphs
- Overfitting in Tree Induction
- Overfitting in Mathematical Functions
- Example: Overfitting Linear Functions
- Example: Why Is Overfitting Bad?
- From Holdout Evaluation to Cross-Validation
- The Churn Dataset Revisited
- Learning Curves
- Overfitting Avoidance and Complexity Control
- Avoiding Overfitting with Tree Induction
- A General Method for Avoiding Overfitting
- Avoiding Overfitting for Parameter Optimization
- Summary
- 6. Similarity, Neighbors, and Clusters
- Fundamental concepts: Calculating similarity of objects described by data; Using similarity for prediction; Clustering as similarity-based segmentation
- Exemplary techniques: Searching for similar entities; Nearest neighbor methods; Clustering methods; Distance metrics for calculating similarity
- Similarity and Distance
- Nearest-Neighbor Reasoning
- Example: Whiskey Analytics
- Nearest Neighbors for Predictive Modeling
- How Many Neighbors and How Much Influence?
- Geometric Interpretation, Overfitting, and Complexity Control
- Issues with Nearest-Neighbor Methods
- Some Important Technical Details Relating to Similarities and Neighbors
- Heterogeneous Attributes
- Other Distance Functions
- Combining Functions: Calculating Scores from Neighbors
- Clustering
- Example: Whiskey Analytics Revisited
- Hierarchical Clustering
- Nearest Neighbors Revisited: Clustering Around Centroids
- Example: Clustering Business News Stories
- Understanding the Results of Clustering
- Using Supervised Learning to Generate Cluster Descriptions
- Stepping Back: Solving a Business Problem Versus Data Exploration
- Summary
- 7. Decision Analytic Thinking I: What Is a Good Model?
- Fundamental concepts: Careful consideration of what is desired from data science results; Expected value as a key evaluation framework; Consideration of appropriate comparative baselines
- Exemplary techniques: Various evaluation metrics; Estimating costs and benefits; Calculating expected profit; Creating baseline methods for comparison
- Evaluating Classifiers
- Plain Accuracy and Its Problems
- The Confusion Matrix
- Problems with Unbalanced Classes
- Problems with Unequal Costs and Benefits
- Generalizing Beyond Classification
- A Key Analytical Framework: Expected Value
- Using Expected Value to Frame Classifier Use
- Using Expected Value to Frame Classifier Evaluation
- Evaluation, Baseline Performance, and Implications for Investments in Data
- Summary
- 8. Visualizing Model Performance
- Fundamental concepts: Visualization of model performance under various kinds of uncertainty; Further consideration of what is desired from data mining results
- Exemplary techniques: Profit curves; Cumulative response curves; Lift curves; ROC curves
- Ranking Instead of Classifying
- Profit Curves
- ROC Graphs and Curves
- The Area Under the ROC Curve (AUC)
- Cumulative Response and Lift Curves
- Example: Performance Analytics for Churn Modeling
- Summary
- 9. Evidence and Probabilities
- Fundamental concepts: Explicit evidence combination with Bayes' Rule; Probabilistic reasoning via assumptions of conditional independence
- Exemplary techniques: Naive Bayes classification; Evidence lift
- Example: Targeting Online Consumers With Advertisements
- Combining Evidence Probabilistically
- Joint Probability and Independence
- Bayes' Rule
- Applying Bayes' Rule to Data Science
- Conditional Independence and Naive Bayes
- Advantages and Disadvantages of Naive Bayes
- A Model of Evidence "Lift"
- Example: Evidence Lifts from Facebook "Likes"
- Evidence in Action: Targeting Consumers with Ads
- Summary
- 10. Representing and Mining Text
- Fundamental concepts: The importance of constructing mining-friendly data representations; Representation of text for data mining
- Exemplary techniques: Bag of words representation; TFIDF calculation; N-grams; Stemming; Named entity extraction; Topic models
- Why Text Is Important
- Why Text Is Difficult
- Representation
- Bag of Words
- Term Frequency
- Measuring Sparseness: Inverse Document Frequency
- Combining Them: TFIDF
- Example: Jazz Musicians
- The Relationship of IDF to Entropy
- Beyond Bag of Words
- N-gram Sequences
- Named Entity Extraction
- Topic Models
- Example: Mining News Stories to Predict Stock Price Movement
- The Task
- The Data
- Data Preprocessing
- Results
- Summary
- 11. Decision Analytic Thinking II: Toward Analytical Engineering
- Fundamental concept: Solving business problems with data science starts with analytical engineering: designing an analytical solution, based on the data, tools, and techniques available
- Exemplary technique: Expected value as a framework for data science solution design
- Targeting the Best Prospects for a Charity Mailing
- The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces
- A Brief Digression on Selection Bias
- Our Churn Example Revisited with Even More Sophistication
- The Expected Value Framework: Structuring a More Complicated Business Problem
- Assessing the Influence of the Incentive
- From an Expected Value Decomposition to a Data Science Solution
- Summary
- 12. Other Data Science Tasks and Techniques
- Fundamental concepts: Our fundamental concepts as the basis of many common data science techniques; The importance of familiarity with the building blocks of data science
- Exemplary techniques: Association and co-occurrences; Behavior profiling; Link prediction; Data reduction; Latent information mining; Movie recommendation; Bias-variance decomposition of error; Ensembles of models; Causal reasoning from data
- Co-occurrences and Associations: Finding Items That Go Together
- Measuring Surprise: Lift and Leverage
- Example: Beer and Lottery Tickets
- Associations Among Facebook Likes
- Profiling: Finding Typical Behavior
- Link Prediction and Social Recommendation
- Data Reduction, Latent Information, and Movie Recommendation
- Bias, Variance, and Ensemble Methods
- Data-Driven Causal Explanation and a Viral Marketing Example
- Summary
- 13. Data Science and Business Strategy
- Fundamental concepts: Our principles as the basis of success for a data-driven business; Acquiring and sustaining competitive advantage via data science; The importance of careful curation of data science capability
- Thinking Data-Analytically, Redux
- Achieving Competitive Advantage with Data Science
- Sustaining Competitive Advantage with Data Science
- Formidable Historical Advantage
- Unique Intellectual Property
- Unique Intangible Collateral Assets
- Superior Data Scientists
- Superior Data Science Management
- Attracting and Nurturing Data Scientists and Their Teams
- Examine Data Science Case Studies
- Be Ready to Accept Creative Ideas from Any Source
- Be Ready to Evaluate Proposals for Data Science Projects
- Example Data Mining Proposal
- Flaws in the Big Red Proposal
- A Firm's Data Science Maturity
- 14. Conclusion
- The Fundamental Concepts of Data Science
- Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
- Changing the Way We Think about Solutions to Business Problems
- What Data Can't Do: Humans in the Loop, Revisited
- Privacy, Ethics, and Mining Data About Individuals
- Is There More to Data Science?
- Final Example: From Crowd-Sourcing to Cloud-Sourcing
- Final Words