Natural Language Processing with Spark NLP: Learning to Understand Text at Scale
If you want to build an enterprise-quality application that uses natural language text but aren't sure where to begin or what tools to use, this practical guide will help get you started. Alex Thomas, principal data scientist at Wisecube, shows software engineers and data scientists how to buil...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | [Place of publication not identified] O'Reilly Media, 2020. |
Subjects: | |
Online access: | Full text (requires prior registration with institutional email) |
Table of Contents:
- Intro
- Copyright
- Table of Contents
- Preface
- Why Natural Language Processing Is Important and Difficult
- Background
- Philosophy
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Part I. Basics
- Chapter 1. Getting Started
- Introduction
- Other Tools
- Setting Up Your Environment
- Prerequisites
- Starting Apache Spark
- Checking Out the Code
- Getting Familiar with Apache Spark
- Starting Apache Spark with Spark NLP
- Loading and Viewing Data in Apache Spark
- Hello World with Spark NLP
- Chapter 2. Natural Language Basics
- What Is Natural Language?
- Origins of Language
- Spoken Language Versus Written Language
- Linguistics
- Phonetics and Phonology
- Morphology
- Syntax
- Semantics
- Sociolinguistics: Dialects, Registers, and Other Varieties
- Formality
- Context
- Pragmatics
- Roman Jakobson
- How To Use Pragmatics
- Writing Systems
- Origins
- Alphabets
- Abjads
- Abugidas
- Syllabaries
- Logographs
- Encodings
- ASCII
- Unicode
- UTF-8
- Exercises: Tokenizing
- Tokenize English
- Tokenize Greek
- Tokenize Ge'ez (Amharic)
- Resources
- Chapter 3. NLP on Apache Spark
- Parallelism, Concurrency, Distributing Computation
- Parallelization Before Apache Hadoop
- MapReduce and Apache Hadoop
- Apache Spark
- Architecture of Apache Spark
- Physical Architecture
- Logical Architecture
- Spark SQL and Spark MLlib
- Transformers
- Estimators and Models
- Evaluators
- NLP Libraries
- Functionality Libraries
- Annotation Libraries
- NLP in Other Libraries
- Spark NLP
- Annotation Library
- Stages
- Pretrained Pipelines
- Finisher
- Exercises: Build a Topic Model
- Resources
- Chapter 4. Deep Learning Basics
- Gradient Descent
- Backpropagation
- Convolutional Neural Networks
- Filters
- Pooling
- Recurrent Neural Networks
- Backpropagation Through Time
- Elman Nets
- LSTMs
- Exercise 1
- Exercise 2
- Resources
- Part II. Building Blocks
- Chapter 5. Processing Words
- Tokenization
- Vocabulary Reduction
- Stemming
- Lemmatization
- Stemming Versus Lemmatization
- Spelling Correction
- Normalization
- Bag-of-Words
- CountVectorizer
- N-Gram
- Visualizing: Word and Document Distributions
- Exercises
- Resources
- Chapter 6. Information Retrieval
- Inverted Indices
- Building an Inverted Index
- Vector Space Model
- Stop-Word Removal
- Inverse Document Frequency
- In Spark
- Exercises
- Resources
- Chapter 7. Classification and Regression
- Bag-of-Words Features
- Regular Expression Features
- Feature Selection
- Modeling
- Naïve Bayes
- Linear Models
- Decision/Regression Trees
- Deep Learning Algorithms
- Iteration
- Exercises
- Chapter 8. Sequence Modeling with Keras
- Sentence Segmentation
- (Hidden) Markov Models
- Section Segmentation
- Part-of-Speech Tagging
- Conditional Random Field
- Chunking and Syntactic Parsing