
Python Web Scraping Cookbook : Over 90 proven recipes to get you scraping with Python, micro services, Docker and AWS.

Python Web Scraping Cookbook is a solution-focused book that will teach you techniques to develop high-performance scrapers, and deal with cookies, hidden form fields, Ajax-based sites, proxies, and more. By the end of this book, you will be able to scrape websites more efficiently with more accurat...


Bibliographic Details
Classification: Electronic book
Main author: Heydt, Michael
Other authors: Zeng, Jay
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Publishing, 2018.
Subjects:
Online access: Full text
Table of Contents:
  • Cover; Title Page; Copyright and Credits; Contributors; Packt Upsell; Table of Contents; Preface; Chapter 1: Getting Started with Scraping; Introduction; Setting up a Python development environment; Getting ready; How to do it ... ; Scraping Python.org with Requests and Beautiful Soup; Getting ready ... ; How to do it ... ; How it works ... ; Scraping Python.org in urllib3 and Beautiful Soup; Getting ready ... ; How to do it ... ; How it works; There's more ... ; Scraping Python.org with Scrapy; Getting ready ... ; How to do it ... ; How it works; Scraping Python.org with Selenium and PhantomJS.
  • Getting ready; How to do it ... ; How it works; There's more ... ; Chapter 2: Data Acquisition and Extraction; Introduction; How to parse websites and navigate the DOM using BeautifulSoup; Getting ready; How to do it ... ; How it works; There's more ... ; Searching the DOM with Beautiful Soup's find methods; Getting ready; How to do it ... ; Querying the DOM with XPath and lxml; Getting ready; How to do it ... ; How it works; There's more ... ; Querying data with XPath and CSS selectors; Getting ready; How to do it ... ; How it works; There's more ... ; Using Scrapy selectors; Getting ready; How to do it ...
  • How it works; There's more ... ; Loading data in unicode / UTF-8; Getting ready; How to do it ... ; How it works; There's more ... ; Chapter 3: Processing Data; Introduction; Working with CSV and JSON data; Getting ready; How to do it; How it works; There's more ... ; Storing data using AWS S3; Getting ready; How to do it; How it works; There's more ... ; Storing data using MySQL; Getting ready; How to do it; How it works; There's more ... ; Storing data using PostgreSQL; Getting ready; How to do it; How it works; There's more ... ; Storing data in Elasticsearch; Getting ready; How to do it; How it works.
  • There's more ... ; How to build robust ETL pipelines with AWS SQS; Getting ready; How to do it: posting messages to an AWS queue; How it works; How to do it: reading and processing messages; How it works; There's more ... ; Chapter 4: Working with Images, Audio, and other Assets; Introduction; Downloading media content from the web; Getting ready; How to do it; How it works; There's more ... ; Parsing a URL with urllib to get the filename; Getting ready; How to do it; How it works; There's more ... ; Determining the type of content for a URL; Getting ready; How to do it; How it works.
  • There's more ... ; Determining the file extension from a content type; Getting ready; How to do it; How it works; There's more ... ; Downloading and saving images to the local file system; How to do it; How it works; There's more ... ; Downloading and saving images to S3; Getting ready; How to do it; How it works; There's more ... ; Generating thumbnails for images; Getting ready; How to do it; How it works; Taking a screenshot of a website; Getting ready; How to do it; How it works; Taking a screenshot of a website with an external service; Getting ready; How to do it; How it works; There's more ...