Natural language processing with Java: explore various approaches to organize and extract useful text from unstructured data using Java
Classification: | Electronic book |
---|---|
Main author: | |
Format: | Electronic (eBook) |
Language: | English |
Published: | Birmingham, UK: Packt Publishing, 2015. |
Series: | Community experience distilled. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
- Chapter 1: Introduction to NLP; What is NLP?; Why use NLP?; Why is NLP so hard?; Survey of NLP tools; Apache OpenNLP; Stanford NLP; LingPipe; GATE; UIMA; Overview of text processing tasks; Finding parts of text; Finding sentences; Finding people and things; Detecting parts of speech; Classifying text and documents; Extracting relationships; Using combined approaches; Understanding NLP models; Identifying the task; Selecting a model; Building and training the model; Verifying the model; Using the model; Preparing data; Summary
- Chapter 2: Finding Parts of Text; Understanding the parts of text; What is tokenization?; Uses for tokenizers; Simple Java tokenizers; Using the Scanner class; Specifying the delimiter; Using the split method; Using the BreakIterator class; Using the StreamTokenizer class; Using the StringTokenizer class; Java core tokenization performance considerations; NLP tokenizer APIs; Using the OpenNLPTokenizer; Using the SimpleTokenizer class; Using the WhitespaceTokenizer class; Using the TokenizerME class; Using the Stanford tokenizer; Using the PTBTokenizer class; Using the DocumentPreprocessor class; Using a pipeline; Using LingPipe tokenizers; Training a tokenizer to find parts of text; Comparing tokenizers; Understanding normalization; Converting to lowercase; Removing stopwords; Creating a StopWords class; Using LingPipe to remove stopwords; Using stemming; Using the Porter Stemmer; Stemming with LingPipe; Using lemmatization; Using the StanfordLemmatizer class; Using lemmatization in OpenNLP; Normalizing using a pipeline; Summary
- Chapter 3: Finding Sentences; The SBD process; What makes SBD difficult?; Understanding SBD rules of LingPipe's HeuristicSentenceModel class; Simple Java SBDs; Using regular expressions; Using the BreakIterator class; Using NLP APIs; Using OpenNLP; Using the SentenceDetectorME class; Using the sentPosDetect method; Using the Stanford API; Using the PTBTokenizer class; Using the DocumentPreprocessor class; Using the StanfordCoreNLP class; Using LingPipe; Using the IndoEuropeanSentenceModel class; Using the SentenceChunker class; Using the MedlineSentenceModel class; Training a Sentence Detector model; Using the Trained model; Evaluating the model using the SentenceDetectorEvaluator class; Summary
- Chapter 4: Finding People and Things; Why NER is difficult?; Techniques for name recognition; Lists and regular expressions; Statistical classifiers; Using regular expressions for NER; Using Java's regular expressions to find entities; Using LingPipe's RegExChunker class; Using NLP APIs; Using OpenNLP for NER; Determining the accuracy of the entity; Using other entity types; Processing multiple entity types; Using the Stanford API for NER; Using LingPipe for NER; Using LingPipe's name entity models