Python 3 text processing with NLTK 3 cookbook : over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0
This book is intended for Python programmers interested in learning how to do natural language processing. Maybe you've learned the limits of regular expressions the hard way, or you've realized that human language cannot be deterministically parsed like a computer language. Perhaps you ha...
Classification: | Electronic book |
---|---|
Main author: | |
Other authors: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, England : Packt Publishing Ltd, 2014. |
Edition: | Second edition. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface.
- Chapter 1: Tokenizing Text and WordNet Basics; Introduction; Tokenizing text into sentences; Tokenizing sentences into words; Tokenizing sentences using regular expressions; Training a sentence tokenizer; Filtering stopwords in a tokenized sentence; Looking up Synsets for a word in WordNet; Looking up lemmas and synonyms in WordNet; Calculating WordNet Synset similarity; Discovering word collocations.
- Chapter 2: Replacing and Correcting Words; Introduction; Stemming words; Lemmatizing words with WordNet; Replacing words matching regular expressions; Removing repeating characters; Spelling correction with Enchant; Replacing synonyms; Replacing negations with antonyms.
- Chapter 3: Creating Custom Corpora; Introduction; Setting up a custom corpus; Creating a wordlist corpus; Creating a part-of-speech tagged word corpus; Creating a chunked phrase corpus; Creating a categorized text corpus; Creating a categorized chunk corpus reader; Lazy corpus loading; Creating a custom corpus view; Creating a MongoDB-backed corpus reader; Corpus editing with file locking.
- Chapter 4: Part-of-speech Tagging; Introduction; Default tagging; Training a unigram part-of-speech tagger; Combining taggers with backoff tagging; Training and combining ngram taggers; Creating a model of likely word tags; Tagging with regular expressions; Affix tagging; Training a Brill tagger; Training the TnT tagger; Using WordNet for tagging; Tagging proper names; Classifier-based tagging; Training a tagger with NLTK-Trainer.
- Chapter 5: Extracting Chunks; Introduction; Chunking and chinking with regular expressions; Merging and splitting chunks with regular expressions; Expanding and removing chunks with regular expressions; Partial parsing with regular expressions; Training a tagger-based chunker; Classification-based chunking; Extracting named entities; Extracting proper noun chunks; Extracting location chunks; Training a named entity chunker; Training a chunker with NLTK-Trainer.
- Chapter 6: Transforming Chunks and Trees; Introduction; Filtering insignificant words from a sentence; Correcting verb forms; Swapping verb phrases; Swapping noun cardinals; Swapping infinitive phrases; Singularizing plural nouns; Chaining chunk transformations; Converting a chunk tree to text; Flattening a deep tree; Creating a shallow tree; Converting tree labels.
- Chapter 7: Text Classification; Introduction; Bag of words feature extraction; Training a Naive Bayes classifier; Training a decision tree classifier; Training a maximum entropy classifier; Training scikit-learn classifiers; Measuring precision and recall of a classifier; Calculating high information words; Combining classifiers with voting; Classifying with multiple binary classifiers; Training a classifier with NLTK-Trainer.
- Chapter 8: Distributed Processing and Handling Large Datasets.