Practical Weak Supervision
Most data scientists and engineers today rely on quality labeled data to train their machine learning models. But building training sets manually is time-consuming and expensive, leaving many companies with unfinished ML projects. There's a more practical approach. In this book, Amit Bahree, Se...
Main Authors: | |
---|---|
Corporate Author: | |
Format: | Electronic eBook |
Language: | English |
Published: | O'Reilly Media, Inc., 2021. |
Edition: | 1st edition. |
Online Access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Intro
- Copyright
- Table of Contents
- Foreword by Xuedong Huang
- Foreword by Alex Ratner
- Preface
- Who Should Read This Book
- Navigating This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Chapter 1. Introduction to Weak Supervision
- What Is Weak Supervision?
- Real-World Weak Supervision with Snorkel
- Approaches to Weak Supervision
- Incomplete Supervision
- Inexact Supervision
- Inaccurate Supervision
- Data Programming
- Getting Training Data
- How Data Programming Is Helping Accelerate Software 2.0
- Summary
- Chapter 2. Diving into Data Programming with Snorkel
- Snorkel, a Data Programming Framework
- Getting Started with Labeling Functions
- Applying the Labels to the Datasets
- Analyzing the Labeling Performance
- Using a Validation Set
- Reaching Labeling Consensus with LabelModel
- Intuition Behind LabelModel
- LabelModel Parameter Estimation
- Strategies to Improve the Labeling Functions
- Data Augmentation with Snorkel Transformers
- Data Augmentation Through Word Removal
- Snorkel Preprocessors
- Data Augmentation Through GPT-2 Prediction
- Data Augmentation Through Translation
- Applying the Transformation Functions to the Dataset
- Summary
- Chapter 3. Labeling in Action
- Labeling a Text Dataset: Identifying Fake News
- Exploring the Fake News Detection (FakeNewsNet) Dataset
- Importing Snorkel and Setting Up Representative Constants
- Fact-Checking Sites
- Is the Speaker a "Liar"?
- Twitter Profile and Botometer Score
- Generating Agreements Between Weak Classifiers
- Labeling an Images Dataset: Determining Indoor Versus Outdoor Images
- Creating a Dataset of Images from Bing
- Defining and Training Weak Classifiers in TensorFlow
- Training the Various Classifiers
- Weak Classifiers out of Image Tags
- Deploying the Computer Vision Service
- Interacting with the Computer Vision Service
- Preparing the DataFrame
- Learning a LabelModel
- Summary
- Chapter 4. Using the Snorkel-Labeled Dataset for Text Classification
- Getting Started with Natural Language Processing (NLP)
- Transformers
- Hard Versus Probabilistic Labels
- Using ktrain for Performing Text Classification
- Data Preparation
- Dealing with an Imbalanced Dataset
- Training the Model
- Using the Text Classification Model for Prediction
- Finding a Good Learning Rate
- Using Hugging Face and Transformers
- Loading the Relevant Python Packages
- Dataset Preparation
- Checking Whether GPU Hardware Is Available
- Performing Tokenization
- Model Training
- Testing the Fine-Tuned Model
- Summary
- Chapter 5. Using the Snorkel-Labeled Dataset for Image Classification
- Visual Object Recognition Overview
- Representing Image Features
- Transfer Learning for Computer Vision
- Using PyTorch for Image Classification