
Text mining with R: a tidy approach

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson d...

Bibliographic Details
Classification: Electronic book
Main authors: Silge, Julia (Author); Robinson, David (Author)
Format: Electronic (eBook)
Language: English
Published: Sebastopol, CA: O'Reilly Media, 2017.
Edition: First edition.
Subjects:
Online access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Copyright; Table of Contents; Preface; Outline; Topics This Book Does Not Cover; About This Book; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgements; Chapter 1. The Tidy Text Format; Contrasting Tidy Text with Other Data Structures; The unnest_tokens Function; Tidying the Works of Jane Austen; The gutenbergr Package; Word Frequencies; Summary; Chapter 2. Sentiment Analysis with Tidy Data; The sentiments Dataset; Sentiment Analysis with Inner Join; Comparing the Three Sentiment Dictionaries; Most Common Positive and Negative Words.
  • Wordclouds; Looking at Units Beyond Just Words; Summary; Chapter 3. Analyzing Word and Document Frequency: tf-idf; Term Frequency in Jane Austen's Novels; Zipf's Law; The bind_tf_idf Function; A Corpus of Physics Texts; Summary; Chapter 4. Relationships Between Words: N-grams and Correlations; Tokenizing by N-gram; Counting and Filtering N-grams; Analyzing Bigrams; Using Bigrams to Provide Context in Sentiment Analysis; Visualizing a Network of Bigrams with ggraph; Visualizing Bigrams in Other Texts; Counting and Correlating Pairs of Words with the widyr Package.
  • Counting and Correlating Among Sections; Examining Pairwise Correlation; Summary; Chapter 5. Converting to and from Nontidy Formats; Tidying a Document-Term Matrix; Tidying DocumentTermMatrix Objects; Tidying dfm Objects; Casting Tidy Text Data into a Matrix; Tidying Corpus Objects with Metadata; Example: Mining Financial Articles; Summary; Chapter 6. Topic Modeling; Latent Dirichlet Allocation; Word-Topic Probabilities; Document-Topic Probabilities; Example: The Great Library Heist; LDA on Chapters; Per-Document Classification; By-Word Assignments: augment; Alternative LDA Implementations.
  • Chapter 7. Case Study: Comparing Twitter Archives; Getting the Data and Distribution of Tweets; Word Frequencies; Comparing Word Usage; Changes in Word Use; Favorites and Retweets; Summary; Chapter 8. Case Study: Mining NASA Metadata; How Data Is Organized at NASA; Wrangling and Tidying the Data; Some Initial Simple Exploration; Word Co-occurrences and Correlations; Networks of Description and Title Words; Networks of Keywords; Calculating tf-idf for the Description Fields; What Is tf-idf for the Description Field Words?; Connecting Description Fields to Keywords; Topic Modeling.
  • Casting to a Document-Term Matrix; Ready for Topic Modeling; Interpreting the Topic Model; Connecting Topic Modeling with Keywords; Summary; Chapter 9. Case Study: Analyzing Usenet Text; Preprocessing; Preprocessing Text; Words in Newsgroups; Finding tf-idf Within Newsgroups; Topic Modeling; Sentiment Analysis; Sentiment Analysis by Word; Sentiment Analysis by Message; N-gram Analysis; Summary; Bibliography; Index; About the Authors; Colophon.