The art and science of analyzing software data

This book provides valuable information on analysis techniques often used to derive insight from software data. It shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science....

Bibliographic Details
Classification:	Electronic Book
Other Authors:	Bird, Christian (Editor), Menzies, Tim (Editor), Zimmermann, Thomas, Ph. D. (Editor)
Format:	Electronic eBook
Language:	English
Published:	Amsterdam ; Boston : Morgan Kaufmann/Elsevier, [2015]
Subjects:	Data mining; Computer programming > Management; Exploration de données (Informatique); COMPUTERS > Database Management > Data Mining
Online Access:	Full text

Table of Contents:

Ch. 1 Past, Present, and Future of Analyzing Software Data
1.1. Definitions
1.2. The Past: Origins
1.2.1. Generation 1: Preliminary Work
1.2.2. Generation 2: Academic Experiments
1.2.3. Generation 3: Industrial Experiments
1.2.4. Generation 4: Data Science Everywhere
1.3. Present Day
1.4. Conclusion
Acknowledgments
References
Ch. 2 Mining Patterns and Violations Using Concept Analysis
2.1. Introduction
2.1.1. Contributions
2.2. Patterns and Blocks
2.3. Computing All Blocks
2.3.1. Algorithm in a Nutshell
2.4. Mining Shopping Carts with Colibri
2.5. Violations
2.6. Finding Violations
2.7. Two Patterns or One Violation?
2.8. Performance
2.9. Encoding Order
2.10. Inlining
2.11. Related Work
2.11.1. Mining Patterns
2.11.2. Mining Violations
2.11.3. PR-Miner
2.12. Conclusions
Acknowledgments
References
Ch. 3 Analyzing Text in Software Projects
3.1. Introduction
3.2. Textual Software Project Data and Retrieval
3.2.1. Textual Data
3.2.2. Text Retrieval
3.3. Manual Coding
3.3.1. Coding Process
3.3.2. Challenges
3.4. Automated Analysis
3.4.1. Topic Modeling
3.4.2. Part-of-Speech Tagging and Relationship Extraction
3.4.3. n-Grams
3.4.4. Clone Detection
3.4.5. Visualization
3.5. Two Industrial Studies
3.5.1. Naming the Pain in Requirements Engineering: A Requirements Engineering Survey
3.5.2. Clone Detection in Requirements Specifications
3.6. Summary
References
Ch. 4 Synthesizing Knowledge from Software Development Artifacts
4.1. Problem Statement
4.2. Artifact Lifecycle Models
4.2.1. Example: Patch Lifecycle
4.2.2. Model Extraction
4.3. Code Review
4.3.1. Mozilla Project
4.3.2. WebKit Project
4.3.3. Blink Project
4.4. Lifecycle Analysis
4.4.1. Mozilla Firefox
4.4.2. WebKit
4.4.3. Blink
4.5. Other Applications
4.6. Conclusion
References
Ch. 5 A Practical Guide to Analyzing IDE Usage Data
5.1. Introduction
5.2. Usage Data Research Concepts
5.2.1. What is Usage Data and Why Should We Analyze it?
5.2.2. Selecting Relevant Data on the Basis of a Goal
5.2.3. Privacy Concerns
5.2.4. Study Scope
5.3. How to Collect Data
5.3.1. Eclipse Usage Data Collector
5.3.2. Mylyn and the Eclipse Mylyn Monitor
5.3.3. CodingSpectator
5.3.4. Build it Yourself for Visual Studio
5.4. How to Analyze Usage Data
5.4.1. Data Anonymity
5.4.2. Usage Data Format
5.4.3. Magnitude Analysis
5.4.4. Categorization Analysis
5.4.5. Sequence Analysis
5.4.6. State Model Analysis
5.4.7. The Critical Incident Technique
5.4.8. Including Data from Other Sources
5.5. Limits of What You Can Learn from Usage Data
5.6. Conclusion
5.7. Code Listings
Acknowledgments
References
Ch. 6 Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
6.1. Introduction
6.2. Applications of LDA in Software Analysis
6.3. How LDA Works
6.4. LDA Tutorial
6.4.1. Materials
6.4.2. Acquiring Software-Engineering Data
6.4.3. Text Analysis and Data Transformation
6.4.4. Applying LDA
6.4.5. LDA Output Summarization
6.5. Pitfalls and Threats to Validity
6.5.1. Criterion Validity
6.5.2. Construct Validity
6.5.3. Internal Validity
6.5.4. External Validity
6.5.5. Reliability
6.6. Conclusions
References
Ch. 7 Tools and Techniques for Analyzing Product and Process Data
7.1. Introduction
7.2. A Rational Analysis Pipeline
7.2.1. Getting the Data
7.2.2. Selecting
7.2.3. Processing
7.2.4. Summarizing
7.2.5. Plumbing
7.3. Source Code Analysis
7.3.1. Heuristics
7.3.2. Lexical Analysis
7.3.3. Parsing and Semantic Analysis
7.3.4. Third-Party Tools
7.4. Compiled Code Analysis
7.4.1. Assembly Language
7.4.2. Machine Code
7.4.3. Dealing with Name Mangling
7.4.4. Byte Code
7.4.5. Dynamic Linking
7.4.6. Libraries
7.5. Analysis of Configuration Management Data
7.5.1. Obtaining Repository Data
7.5.2. Analyzing Metadata
7.5.3. Analyzing Time Series Snapshots
7.5.4. Analyzing a Checked Out Repository
7.5.5. Combining Files with Metadata
7.5.6. Assembling Repositories
7.6. Data Visualization
7.6.1. Graphs
7.6.2. Declarative Diagrams
7.6.3. Charts
7.6.4. Maps
7.7. Concluding Remarks
References
Ch. 8 Analyzing Security Data
8.1. Vulnerability
8.1.1. Exploits
8.2. Security Data "Gotchas"
8.2.1. Gotcha #1. Having Vulnerabilities is Normal
8.2.2. Gotcha #2. "More Vulnerabilities" Does not Always Mean "Less Secure"
8.2.3. Gotcha #3. Design-Level Flaws are not Usually Tracked
8.2.4. Gotcha #4. Security is Negatively Defined
8.3. Measuring Vulnerability Severity
8.3.1. CVSS Overview
8.3.2. Example CVSS Application
8.3.3. Criticisms of the CVSS
8.4. Method of Collecting and Analyzing Vulnerability Data
8.4.1. Step 1. Trace Reported Vulnerabilities Back to Fixes
8.4.2. Step 2. Aggregate Source Control Logs
8.4.3. Step 3a. Determine Vulnerability Coverage
8.4.4. Step 3c. Classify According to Engineering Mistake
8.5. What Security Data has Told Us Thus Far
8.5.1. Vulnerabilities have Socio-Technical Elements
8.5.2. Vulnerabilities have Long, Complex Histories
8.6. Summary
References
Ch. 9 A Mixed Methods Approach to Mining Code Review Data: Examples and a Study of Multicommit Reviews and Pull Requests
9.1. Introduction
9.2. Motivation for a Mixed Methods Approach
9.3. Review Process and Data
9.3.1. Software Inspection
9.3.2. OSS Code Review
9.3.3. Code Review at Microsoft
9.3.4. Google-Based Gerrit Code Review
9.3.5. GitHub Pull Requests
9.3.6. Data Measures and Attributes
9.4. Quantitative Replication Study: Code Review on Branches
9.4.1. Research Question 1: Commits per Review
9.4.2. Research Question 2: Size of Commits
9.4.3. Research Question 3: Review Interval
9.4.4. Research Question 4: Reviewer Participation
9.4.5. Conclusion
9.5. Qualitative Approaches
9.5.1. Sampling Approaches
9.5.2. Data Collection
9.5.3. Qualitative Analysis of Microsoft Data
9.5.4. Applying Grounded Theory to Archival Data to Understand OSS Review
9.6. Triangulation
9.6.1. Using Surveys to Triangulate Qualitative Findings
9.6.2. How Multicommit Branches are Reviewed in Linux
9.6.3. Closed Coding: Branch or Revision on GitHub and Gerrit
9.6.4. Understanding Why Pull Requests are Rejected
9.7. Conclusion
References
Ch. 10 Mining Android Apps for Anomalies
10.1. Introduction
10.2. Clustering Apps by Description
10.2.1. Collecting Applications
10.2.2. Preprocessing Descriptions with NLP
10.2.3. Identifying Topics with LDA
10.2.4. Clustering Apps with K-means
10.2.5. Finding the Best Number of Clusters
10.2.6. Resulting App Clusters
10.3. Identifying Anomalies by APIs
10.3.1. Extracting API Usage
10.3.2. Sensitive and Rare APIs
10.3.3. Distance-Based Outlier Detection
10.3.4. CHABADA as a Malware Detector
10.4. Evaluation
10.4.1. RQ1: Anomaly Detection
10.4.2. RQ2: Feature Selection
10.4.3. RQ3: Malware Detection
10.4.4. Limitations and Threats to Validity
10.5. Related Work
10.5.1. Mining App Descriptions
10.5.2. Behavior/Description Mismatches
10.5.3. Detecting Malicious Apps
10.6. Conclusion and Future Work
Acknowledgments
References
Ch. 11 Change Coupling Between Software Artifacts: Learning from Past Changes
11.1. Introduction
11.2. Change Coupling
11.2.1. Why Do Artifacts Co-Change?
11.2.2. Benefits of Using Change Coupling
11.3. Change Coupling Identification Approaches
11.3.1. Raw Counting
11.3.2. Association Rules
11.3.3. Time-Series Analysis
11.4. Challenges in Change Coupling Identification
11.4.1. Impact of Commit Practices
11.4.2. Practical Advice for Change Coupling Detection
11.4.3. Alternative Approaches
11.5. Change Coupling Applications
11.5.1. Change Prediction and Change Impact Analysis
11.5.2. Discovery of Design Flaws and Opportunities for Refactoring
11.5.3. Architecture Evaluation
11.5.4. Coordination Requirements and Socio-Technical Congruence
11.6. Conclusion
References
Ch. 12 Applying Software Data Analysis in Industry Contexts: When Research Meets Reality
12.1. Introduction
12.2. Background
12.2.1. Fraunhofer's Experience in Software Measurement
12.2.2. Terminology
12.2.3. Empirical Methods
12.2.4. Applying Software Measurement in Practice: The General Approach
12.3. Six Key Issues when Implementing a Measurement Program in Industry
12.3.1. Stakeholders, Requirements, and Planning: The Groundwork for a Successful Measurement Program
12.3.2. Gathering Measurements: How, When, and Who
12.3.3. All Data, No Information: When the Data is not What You Need or Expect
12.3.4. The Pivotal Role of Subject Matter Expertise
12.3.5. Responding to Changing Needs
12.3.6. Effective Ways to Communicate Analysis Results to the Consumers
12.4. Conclusions
References
Ch. 13 Using Data to Make Decisions in Software Engineering: Providing a Method to our Madness
13.1. Introduction
13.2. Short History of Software Engineering Metrics
13.3. Establishing Clear Goals
13.3.1. Benchmarking
13.3.2. Product Goals
13.4. Review of Metrics
13.4.1. Contextual Metrics
13.4.2. Constraint Metrics
13.4.3. Development Metrics
13.5. Challenges with Data Analysis on Software Projects
13.5.1. Data Collection
13.5.2. Data Interpretation
13.6. Example of Changing Product Development Through the Use of Data
13.7. Driving Software Engineering Processes with Data
References
Ch. 14 Community Data for OSS Adoption Risk Management
14.1. Introduction
14.2. Background
14.2.1. Risk and Open Source Software Basic Concepts
14.2.2. Modeling and Analysis Techniques
14.3. An Approach to OSS Risk Adoption Management
14.4. OSS Communities Structure and Behavior Analysis: The XWiki Case
14.4.1. OSS Community Social Network Analysis
14.4.2. Statistical Analytics of Software Quality, OSS Communities' Behavior and OSS Projects
14.4.3. Risk Indicators Assessment via Bayesian Networks
14.4.4. OSS Ecosystems Modeling and Reasoning in i*
14.4.5. Integrating the Analysis for a Comprehensive Risk Assessment
14.5. A Risk Assessment Example: The Moodbile Case
14.6. Related Work
14.6.1. Data Analysis in OSS Communities
14.6.2. Risk Modeling and Analysis via Goal-oriented Techniques
14.7. Conclusions
Acknowledgments
References
Ch. 15 Assessing the State of Software in a Large Enterprise: A 12-Year Retrospective
15.1. Introduction
15.2. Evolution of the Process and the Assessment
15.3. Impact Summary of the State of Avaya Software Report
15.4. Assessment Approach and Mechanisms
15.4.1. Evolution of the Approach Over Time
15.5. Data Sources
15.5.1. Data Accuracy
15.5.2. Types of Data Analyzed
15.6. Examples of Analyses
15.6.1. Demographic Analyses
15.6.2. Analysis of Predictability
15.6.3. Risky File Management
15.7. Software Practices
15.7.1. Original Seven Key Software Areas
15.7.2. Four Practices Tracked as Representative
15.7.3. Example Practice Area: Design Quality In
15.7.4. Example Individual Practice: Static Analysis
15.8. Assessment Follow-up: Recommendations and Impact
15.8.1. Example Recommendations
15.8.2. Deployment of Recommendations
15.9. Impact of the Assessments
15.9.1. Example: Automated Build Management
15.9.2. Example: Deployment of Risky File Management
15.9.3. Improvement in Customer Quality Metric (CQM)
15.10. Conclusions
15.10.1. Impact of the Assessment Process
15.10.2. Factors Contributing to Success
15.10.3. Organizational Attributes
15.10.4. Selling the Assessment Process
15.10.5. Next Steps
15.11. Appendix
15.11.1. Example Questions Used for Input Sessions
Acknowledgments
References
Ch. 16 Lessons Learned from Software Analytics in Practice
16.1. Introduction
16.2. Problem Selection
16.3. Data Collection
16.3.1. Datasets
16.3.2. Data Extraction
16.4. Descriptive Analytics
16.4.1. Data Visualization
16.4.2. Reporting via Statistics
16.5. Predictive Analytics
16.5.1. A Predictive Model for all Conditions
16.5.2. Performance Evaluation
16.5.3. Prescriptive Analytics
16.6. Road Ahead
References
Ch. 17 Code Comment Analysis for Improving Software Quality
17.1. Introduction
17.1.1. Benefits of Studying and Analyzing Code Comments
17.1.2. Challenges of Studying and Analyzing Code Comments
17.1.3. Code Comment Analysis for Specification Mining and Bug Detection
17.2. Text Analytics: Techniques, Tools, and Measures
17.2.1. Natural Language Processing
17.2.2. Machine Learning
17.2.3. Analysis Tools
17.2.4. Evaluation Measures
17.3. Studies of Code Comments
17.3.1. Content of Code Comments
17.3.2. Common Topics of Code Comments
17.4. Automated Code Comment Analysis for Specification Mining and Bug Detection
17.4.1. What Should We Extract?
17.4.2. How Should We Extract Information?
17.4.3. Additional Reading
17.5. Studies and Analysis of API Documentation
17.5.1. Studies of API Documentation
17.5.2. Analysis of API Documentation
17.6. Future Directions and Challenges
References
Ch. 18 Mining Software Logs for Goal-Driven Root Cause Analysis
18.1. Introduction
18.2. Approaches to Root Cause Analysis
18.2.1. Rule-Based Approaches
18.2.2. Probabilistic Approaches
18.2.3. Model-Based Approaches
18.3. Root Cause Analysis Framework Overview
18.4. Modeling Diagnostics for Root Cause Analysis
18.4.1. Goal Models
18.4.2. Antigoal Models
18.4.3. Model Annotations
18.4.4. Loan Application Scenario
18.5. Log Reduction
18.5.1. Latent Semantic Indexing
18.5.2. Probabilistic Latent Semantic Indexing
18.6. Reasoning Techniques
18.6.1. Markov Logic Networks
18.7. Root Cause Analysis for Failures Induced by Internal Faults
18.7.1. Knowledge Representation
18.7.2. Diagnosis
18.8. Root Cause Analysis for Failures due to External Threats
18.8.1. Antigoal Model Rules
18.8.2. Inference
18.9. Experimental Evaluations
18.9.1. Detecting Root Causes due to Internal Faults
18.9.2. Detecting Root Causes due to External Actions
18.9.3. Performance Evaluation
18.10. Conclusions
19.5.1. OTT Case Study: The Context and Content
19.5.2. Formalization of the Problem
19.5.3. The Case Study Process
19.5.4. Release Planning in the Presence of Advanced Feature Dependencies and Synergies
19.5.5. Real-Time What-to-Release Planning
19.5.6. Re-Planning Based on Crowd Clustering
19.5.7. Conclusions and Discussion of Results
19.6. Summary and Future Research
19.7. Appendix: Feature Dependency Constraints
Acknowledgments
References
Ch. 20 Boa: An Enabling Language and Infrastructure for Ultra-Large-Scale MSR Studies
20.1. Objectives
20.2. Getting Started with Boa
20.2.1. Boa's Architecture
20.2.2. Submitting a Task
20.2.3. Obtaining the Results
20.3. Boa's Syntax and Semantics
20.3.1. Basic and Compound Types
20.3.2. Output Aggregation
20.3.3. Expressing Loops with Quantifiers
20.3.4. User-Defined Functions
20.4. Mining Project and Repository Metadata
20.4.1. Types for Mining Software Repositories
20.4.2. Example 1: Mining Top 10 Programming Languages
20.4.3. Intrinsic Functions
20.4.4. Example 2: Mining Revisions that Fix Bugs
20.4.5. Example 3: Computing Project Churn Rates
20.5. Mining Source Code with Visitors
20.5.1. Types for Mining Source Code
20.5.2. Intrinsic Functions
20.5.3. Visitor Syntax
20.5.4. Example 4: Mining AST Count
20.5.5. Custom Traversal Strategies
20.5.6. Example 5: Mining for Added Null Checks
20.5.7. Example 6: Finding Unreachable Code
20.6. Guidelines for Replicable Research
20.7. Conclusions
20.8. Practice Problems
References
Ch. 21 Scalable Parallelization of Specification Mining Using Distributed Computing
21.1. Introduction
21.2. Background
21.2.1. Specification Mining Algorithms
21.2.2. Distributed Computing
21.3. Distributed Specification Mining
21.3.1. Principles
21.3.2. Algorithm-Specific Parallelization
21.4. Implementation and Empirical Evaluation
21.4.1. Dataset and Experimental Settings
21.4.2. Research Questions and Results
21.4.3. Threats to Validity and Current Limitations
21.5. Related Work
21.5.1. Specification Mining and Its Applications
21.5.2. MapReduce in Software Engineering
21.5.3. Parallel Data Mining Algorithms
21.6. Conclusion and Future Work
