Web scraping with Python: collecting data from the modern web
Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you'll learn how to use Python scripts and web APIs to gather and process data from thousands, or even millions, of web pages at once. Ideal for programmers, security...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic (eBook) |
Language: | English |
Published: | Sebastopol, CA: O'Reilly Media, 2015. |
Edition: | First edition. |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Copyright; Table of Contents; Preface; What Is Web Scraping?; Why Web Scraping?; About This Book; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments; Part I. Building Scrapers; Chapter 1. Your First Web Scraper; Connecting; An Introduction to BeautifulSoup; Installing BeautifulSoup; Running BeautifulSoup; Connecting Reliably; Chapter 2. Advanced HTML Parsing; You Don't Always Need a Hammer; Another Serving of BeautifulSoup
- find() and findAll() with BeautifulSoup; Other BeautifulSoup Objects; Navigating Trees; Regular Expressions; Regular Expressions and BeautifulSoup; Accessing Attributes; Lambda Expressions; Beyond BeautifulSoup; Chapter 3. Starting to Crawl; Traversing a Single Domain; Crawling an Entire Site; Collecting Data Across an Entire Site; Crawling Across the Internet; Crawling with Scrapy; Chapter 4. Using APIs; How APIs Work; Common Conventions; Methods; Authentication; Responses; API Calls; Echo Nest
- A Few Examples; Twitter; Getting Started; A Few Examples; Google APIs; Getting Started; A Few Examples; Parsing JSON; Bringing It All Back Home; More About APIs; Chapter 5. Storing Data; Media Files; Storing Data to CSV; MySQL; Installing MySQL; Some Basic Commands; Integrating with Python; Database Techniques and Good Practice; "Six Degrees" in MySQL; Email; Chapter 6. Reading Documents; Document Encoding; Text; Text Encoding and the Global Internet; CSV; Reading CSV Files; PDF
- Microsoft Word and .docx; Part II. Advanced Scraping; Chapter 7. Cleaning Your Dirty Data; Cleaning in Code; Data Normalization; Cleaning After the Fact; OpenRefine; Chapter 8. Reading and Writing Natural Languages; Summarizing Data; Markov Models; Six Degrees of Wikipedia: Conclusion; Natural Language Toolkit; Installation and Setup; Statistical Analysis with NLTK; Lexicographical Analysis with NLTK; Additional Resources; Chapter 9. Crawling Through Forms and Logins; Python Requests Library; Submitting a Basic Form
- Radio Buttons, Checkboxes, and Other Inputs; Submitting Files and Images; Handling Logins and Cookies; HTTP Basic Access Authentication; Other Form Problems; Chapter 10. Scraping JavaScript; A Brief Introduction to JavaScript; Common JavaScript Libraries; Ajax and Dynamic HTML; Executing JavaScript in Python with Selenium; Handling Redirects; Chapter 11. Image Processing and Text Recognition; Overview of Libraries; Pillow; Tesseract; NumPy; Processing Well-Formatted Text; Scraping Text from Images on Websites