Guerrilla analytics: a practical approach to working with data
Doing data science is difficult. Projects are typically very dynamic with requirements that change as data understanding grows. The data itself arrives piecemeal, is added to, replaced, contains undiscovered flaws and comes from a variety of sources. Teams also have mixed skill sets and tooling is o...
Classification: | eBook |
---|---|
Main author: | |
Other authors: | |
Format: | Electronic eBook |
Language: | English |
Published: | Waltham, Massachusetts: Morgan Kaufmann, 2015. |
Series: | Savvy manager's guides. |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Cover; Title Page; Copyright Page; Contents; List of Figures; Table of War Stories; Preface; Why this book?; What this book is and what it is not; Who should read this book?; How this book is organized; Disclaimer
- Part 1. Principles
- Chapter 1. Introducing Guerrilla Analytics
- 1.1 What is data analytics?
- 1.1.1 Data Analytics Definition
- 1.1.2 Examples of Data Analytics
- 1.2 Types of data analytics projects
- 1.3 Introducing Guerrilla Analytics projects
- 1.4 Guerrilla Analytics definition
- 1.4.1 Changing Data
- 1.4.2 Changing Requirements
- 1.4.3 Changing Resource
- 1.4.4 Limited Time
- 1.4.5 Limited Toolsets
- 1.4.6 Analytics Results Must be Reproducible
- 1.4.7 Work Products must be easily explained
- 1.5 Example Guerrilla Analytics projects
- 1.6 Some terminology
- 1.7 Wrap up
- Chapter 2. Guerrilla Analytics: Challenges and Risks
- 2.1 The Guerrilla Analytics workflow
- 2.2 Challenges of managing analytics projects
- 2.2.1 Tracking Multiple Data Inputs
- 2.2.2 Versioning Multiple Data Inputs
- 2.2.3 Tracking Multiple Data Work Products
- 2.2.4 Data Generated by People
- 2.2.5 External Data
- 2.2.6 Version Control of Analytics
- 2.2.7 Creating Analytics that is Reproducible
- 2.2.8 Testing and Reviewing Analytics
- 2.2.9 Foreign Data Environment
- 2.2.10 Upskilling a Team Quickly
- 2.2.11 Reskilling a Team Quickly
- 2.3 Risks
- 2.3.1 Losing the Link Between Data Received and its Storage Location
- 2.3.2 Losing the Link Between Raw Data and Derived Data
- 2.3.3 Inability to Reproduce Work Products Because Source Datasets have Disappeared or been Modified
- 2.3.4 Inability to Easily Navigate the Analytics Environment
- 2.3.5 Conflicting Changes to Datasets
- 2.3.6 Changing of Raw Data
- 2.3.7 Out of Date Documentation Misleads the Team
- 2.3.8 Failure to Communicate Updates to Team Knowledge
- 2.3.9 Multiple Copies of Files and Work Products
- 2.3.10 Fragmented Code that Cannot be Executed Without the Author's Input
- 2.3.11 Inability to Identify the Source of a Dataset
- 2.3.12 Lack of Clarity Around Derivation of an Analysis
- 2.3.13 Multiple Versions of Tools and Libraries
- 2.4 Impact of failure to address analytics risks
- 2.5 Wrap up
- Chapter 3. Guerrilla Analytics Principles
- 3.1 Maintain data provenance despite disruptions
- 3.2 The principles
- 3.2.1 Overview
- 3.2.2 Principle 1: Space is Cheap, Confusion is Expensive
- 3.2.3 Principle 2: Prefer Simple, Visual Project Structures Over Heavily Documented and Project-specific Rules
- 3.2.4 Principle 3: Prefer Automation with Program Code Over Manual Graphical Methods
- 3.2.5 Principle 4: Maintain a Link Between Data on the File System, in the Analytics Environment, and in Work Products
- 3.2.6 Principle 5: Version Control Changes to Data and Program Code
- 3.2.7 Principle 6: Consolidate Team Knowledge in Version-controlled Builds
- 3.2.8 Principle 7: Prefer Analytics Code that Runs from Start to Finish