Practical Apache Lucene 8: uncover the search capabilities of your application
Gain a thorough knowledge of Lucene's capabilities and use it to develop your own search applications. This book explores the Java-based, high-performance text search engine library used to build search capabilities in your applications. Starting with the basics of Lucene and searching, you wil...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | [Berkeley, CA]: Apress, [2020] |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Intro
- Table of Contents
- About the Author
- About the Technical Reviewer
- Acknowledgments
- Introduction
- Chapter 1: Hola, Lucene!
- Key Features of Lucene
- Information Retrieval Basics
- Linear Scan
- Stop List
- Stemming
- Term
- Term-Document Incidence Matrix
- Serving Queries Using a Term-Document Incidence Matrix
- Basic Terminology
- Heart of Lucene's Data Representation
- Lucene's Inverted Index Structure
- On-Disk Representation of a Lucene Index
- Terms Dictionary
- Frequencies File
- Positions File
- Queries on Lucene
- Structure of a Lucene Query
- Fields
- Types of Queries in Lucene
- Lucene vs. Relational Databases
- Chapter 2: Hello World: The Lucene Way
- Indexing Data in Lucene
- Document
- Analyzers
- StandardAnalyzer
- StopAnalyzer
- SimpleAnalyzer
- IndexWriter
- Directory
- Create Documents
- Create Index and Write Documents
- Adding Data to the Index
- Bringing It All Together
- TestClass
- Document Search
- QueryParser
- TopDocs
- IndexSearcher
- IndexReader
- Searching
- Boolean Model
- What Is Relevance?
- Scoring Algorithms
- TF/IDF
- Vector Space Model
- Scoring Example
- Lucene's Scoring Model
- Fields
- Similarity
- Boosting
- Collectors
- Chapter 3: Core Search Fundamentals
- Codecs
- DocValues
- Phrase Queries
- Term Vectors
- BooleanQuery
- MultiTermQuery
- QueryCache
- Scorer as Part of the Search Process
- Chapter 4: Spatial Indexing
- Spatial Module
- What Are Geohashes?
- Quad Trees
- K-D Trees
- BKD Trees
- Using Spatial Indexing
- Chapter 5: Location-Aware Search Engines
- Why Use a Search Engine for Geographic Searches?
- Range Queries
- Function Queries
- Geospatial Basics
- Representing Spatial Data
- Tiered Design for Storage
- Geohashes
- Spatial Data with Text Search
- Distance Calculations
- Bounding Box Filter
- A Point on Distance Calculation
- Chapter 6: Introducing Machine Learning with Apache Mahout
- Origin of Apache Mahout
- Why Apache Mahout?
- Introduction to Machine Learning
- Learning
- Collaborative Filtering
- Clustering
- Categorization
- Converting from Lucene Components to Mahout Components
- Integrating Lucene with Mahout
- lucene.vector
- Lucene2seq
- Java Version of Lucene2seq
- Putting It All Together
- Chapter 7: Improving Lucene's Performance
- Increase Indexing Speed
- Reuse Field Instances
- The Curious Case of Large Commits
- Reuse Tokens in Analyzers
- Tuning Flush Intervals
- Increase mergeFactor
- Choosing the Correct Analyzers
- Use Multiple Threads with One IndexWriter
- Index into Separate Indexes and Then Merge
- Improve Search Performance
- Use the Latest Version of Lucene
- Use IndexReader with the readOnly Attribute Equal to True
- Use MMapDirectory/NIOFSDirectory
- Decrease mergeFactor
- Ignore First Query's Performance
- Avoid Reopening IndexSearcher Instances
- Share IndexSearcher Instances