Hands-on web scraping with Python : perform advanced scraping operations using various Python libraries and tools such as Selenium, Regex, and others
Web scraping is an essential technique used in many organizations to scrape valuable data from web pages. This book will help you master web scraping techniques and methodologies using Python libraries and other popular tools such as Selenium. By the end of this book, you will have learned how to ef...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, UK : Packt, [2019] |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Title Page; Copyright and Credits; Dedication; About Packt; Contributors; Table of Contents; Preface; Section 1: Introduction to Web Scraping; Chapter 1: Web Scraping Fundamentals; Introduction to web scraping; Understanding web development and technologies; HTTP; HTML; HTML elements and attributes; Global attributes; XML; JavaScript; JSON; CSS; AngularJS; Data finding techniques for the web; HTML page source; Case 1; Case 2; Developer tools; Sitemaps; The robots.txt file; Summary; Further reading; Section 2: Beginning Web Scraping
- Chapter 2: Python and the Web - Using urllib and Requests; Technical requirements; Accessing the web with Python; Setting things up; Loading URLs; URL handling and operations with urllib and requests; urllib; requests; Implementing HTTP methods; GET; POST; Summary; Further reading; Chapter 3: Using LXML, XPath, and CSS Selectors; Technical requirements; Introduction to XPath and CSS selector; XPath; CSS selectors; Element selectors; ID and class selectors; Attribute selectors; Pseudo selectors; Using web browser developer tools for accessing web content; HTML elements and DOM navigation
- XPath and CSS selectors using DevTools; Scraping using lxml, a Python library; lxml by examples; Example 1 - reading XML from file and traversing through its elements; Example 2 - reading HTML documents using lxml.html; Example 3 - reading and parsing HTML for retrieving HTML form type element attributes; Web scraping using lxml; Example 1 - extracting selected data from a single page using lxml.html.xpath; Example 2 - looping with XPath and scraping data from multiple pages; Example 3 - using lxml.cssselect to scrape content from a page; Summary; Further reading
- Chapter 4: Scraping Using pyquery - a Python Library; Technical requirements; Introduction to pyquery; Exploring pyquery; Loading documents; Element traversing, attributes, and pseudo-classes; Iterating; Web scraping using pyquery; Example 1 - scraping data science announcements; Example 2 - scraping information from nested links; Example 3 - extracting AHL Playoff results; Example 4 - collecting URLs from sitemap.xml; Case 1 - using the HTML parser; Case 2 - using the XML parser; Summary; Further reading; Chapter 5: Web Scraping Using Scrapy and Beautiful Soup; Technical requirements
- Web scraping using Beautiful Soup; Introduction to Beautiful Soup; Exploring Beautiful Soup; Searching, traversing, and iterating; Using children and parents; Using next and previous; Using CSS Selectors; Example 1 - listing elements with the data-id attribute; Example 2 - traversing through elements; Example 3 - searching elements based on attribute values; Building a web crawler; Web scraping using Scrapy; Introduction to Scrapy; Setting up a project; Generating a Spider; Creating an item; Extracting data; Using XPath; Using CSS Selectors; Data from multiple pages; Running and exporting