
Practical Weak Supervision

Most data scientists and engineers today rely on quality labeled data to train their machine learning models. But building training sets manually is time-consuming and expensive, leaving many companies with unfinished ML projects. There's a more practical approach. In this book, Amit Bahree, Se...


Bibliographic Details
Main authors: Tok, Wee (Author), Bahree, Amit (Author), Filipi, Senja (Author)
Corporate author: Safari, an O'Reilly Media Company
Format: Electronic eBook
Language: English
Published: O'Reilly Media, Inc., 2021.
Edition: 1st edition.
Online access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Intro
  • Copyright
  • Table of Contents
  • Foreword by Xuedong Huang
  • Foreword by Alex Ratner
  • Preface
  • Who Should Read This Book
  • Navigating This Book
  • Conventions Used in This Book
  • Using Code Examples
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
  • Chapter 1. Introduction to Weak Supervision
  • What Is Weak Supervision?
  • Real-World Weak Supervision with Snorkel
  • Approaches to Weak Supervision
  • Incomplete Supervision
  • Inexact Supervision
  • Inaccurate Supervision
  • Data Programming
  • Getting Training Data
  • How Data Programming Is Helping Accelerate Software 2.0
  • Summary
  • Chapter 2. Diving into Data Programming with Snorkel
  • Snorkel, a Data Programming Framework
  • Getting Started with Labeling Functions
  • Applying the Labels to the Datasets
  • Analyzing the Labeling Performance
  • Using a Validation Set
  • Reaching Labeling Consensus with LabelModel
  • Intuition Behind LabelModel
  • LabelModel Parameter Estimation
  • Strategies to Improve the Labeling Functions
  • Data Augmentation with Snorkel Transformers
  • Data Augmentation Through Word Removal
  • Snorkel Preprocessors
  • Data Augmentation Through GPT-2 Prediction
  • Data Augmentation Through Translation
  • Applying the Transformation Functions to the Dataset
  • Summary
  • Chapter 3. Labeling in Action
  • Labeling a Text Dataset: Identifying Fake News
  • Exploring the Fake News Detection (FakeNewsNet) Dataset
  • Importing Snorkel and Setting Up Representative Constants
  • Fact-Checking Sites
  • Is the Speaker a "Liar"?
  • Twitter Profile and Botometer Score
  • Generating Agreements Between Weak Classifiers
  • Labeling an Images Dataset: Determining Indoor Versus Outdoor Images
  • Creating a Dataset of Images from Bing
  • Defining and Training Weak Classifiers in TensorFlow
  • Training the Various Classifiers
  • Weak Classifiers out of Image Tags
  • Deploying the Computer Vision Service
  • Interacting with the Computer Vision Service
  • Preparing the DataFrame
  • Learning a LabelModel
  • Summary
  • Chapter 4. Using the Snorkel-Labeled Dataset for Text Classification
  • Getting Started with Natural Language Processing (NLP)
  • Transformers
  • Hard Versus Probabilistic Labels
  • Using ktrain for Performing Text Classification
  • Data Preparation
  • Dealing with an Imbalanced Dataset
  • Training the Model
  • Using the Text Classification Model for Prediction
  • Finding a Good Learning Rate
  • Using Hugging Face and Transformers
  • Loading the Relevant Python Packages
  • Dataset Preparation
  • Checking Whether GPU Hardware Is Available
  • Performing Tokenization
  • Model Training
  • Testing the Fine-Tuned Model
  • Summary
  • Chapter 5. Using the Snorkel-Labeled Dataset for Image Classification
  • Visual Object Recognition Overview
  • Representing Image Features
  • Transfer Learning for Computer Vision
  • Using PyTorch for Image Classification