Automated data collection with R : a practical guide to Web scraping and text mining /

"This book provides a unified framework of web scraping and information extraction from text data with R for the social sciences"--

Bibliographic Details
Classification: Electronic Book
Main Authors: Munzert, Simon (Author), Rubba, Christian (Author), Meißner, Peter (Author), Nyhuis, Dominic (Author)
Format: Electronic eBook
Language: English
Published: Chichester, England : Wiley, 2015.
Subjects:
Online Access: Full text
Table of Contents:
  • Automated Data Collection with R; Contents; Preface; What you won't learn from reading this book; Why R?; Recommended reading to get started with R; Typographic conventions; The book's website; Disclaimer; Acknowledgments; 1 Introduction; 1.1 Case study: World Heritage Sites in Danger; 1.2 Some remarks on web data quality; 1.3 Technologies for disseminating, extracting, and storing web data; 1.3.1 Technologies for disseminating content on the Web; 1.3.2 Technologies for information extraction from web documents; 1.3.3 Technologies for data storage; 1.4 Structure of the book.
  • Part One A Primer on Web and Data Technologies; 2 HTML; 2.1 Browser presentation and source code; 2.2 Syntax rules; 2.2.1 Tags, elements, and attributes; 2.2.2 Tree structure; 2.2.3 Comments; 2.2.4 Reserved and special characters; 2.2.5 Document type definition; 2.2.6 Spaces and line breaks; 2.3 Tags and attributes; 2.3.1 The anchor tag <a>; 2.3.2 The metadata tag <meta>; 2.3.3 The external reference tag <link>; 2.3.4 Emphasizing tags <b>, <i>, <strong>; 2.3.5 The paragraphs tag <p>; 2.3.6 Heading tags <h1>, <h2>, <h3>, ...; 2.3.7 Listing content with <ul>, <ol>, and <dl>.
  • 2.3.8 The organizational tags <div> and <span>; 2.3.9 The <form> tag and its companions; 2.3.10 The foreign script tag <script>; 2.3.11 Table tags <table>, <tr>, <td>, and <th>; 2.4 Parsing; 2.4.1 What is parsing?; 2.4.2 Discarding nodes; 2.4.3 Extracting information in the building process; Summary; Further reading; Problems; 3 XML and JSON; 3.1 A short example XML document; 3.2 XML syntax rules; 3.2.1 Elements and attributes; 3.2.2 XML structure; 3.2.3 Naming and special characters; 3.2.4 Comments and character data; 3.2.5 XML syntax summary; 3.3 When is an XML document well formed or valid?
  • 3.4 XML extensions and technologies; 3.4.1 Namespaces; 3.4.2 Extensions of XML; 3.4.3 Example: Really Simple Syndication; 3.4.4 Example: scalable vector graphics; 3.5 XML and R in practice; 3.5.1 Parsing XML; 3.5.2 Basic operations on XML documents; 3.5.3 From XML to data frames or lists; 3.5.4 Event-driven parsing; 3.6 A short example JSON document; 3.7 JSON syntax rules; 3.8 JSON and R in practice; Summary; Further reading; Problems; 4 XPath; 4.1 XPath: a query language for web documents; 4.2 Identifying node sets with XPath; 4.2.1 Basic structure of an XPath query; 4.2.2 Node relations.
  • 4.2.3 XPath predicates; 4.3 Extracting node elements; 4.3.1 Extending the fun argument; 4.3.2 XML namespaces; 4.3.3 Little XPath helper tools; Summary; Further reading; Problems; 5 HTTP; 5.1 HTTP fundamentals; 5.1.1 A short conversation with a web server; 5.1.2 URL syntax; 5.1.3 HTTP messages; 5.1.4 Request methods; 5.1.5 Status codes; 5.1.6 Header fields; 5.2 Advanced features of HTTP; 5.2.1 Identification; 5.2.2 Authentication; 5.2.3 Proxies; 5.3 Protocols beyond HTTP; 5.3.1 HTTP Secure; 5.3.2 FTP; 5.4 HTTP in action; 5.4.1 The libcurl library; 5.4.2 Basic request methods.