Scraping Your First Web Page with Python

Janani Ravi

2:39:10

01 - Course Overview

01 - Course Overview.mp4

01:45

02 - Getting Started with Web Scraping

02 - Module Overview.mp4

01:08

03 - Prerequisites and Course Outline.mp4

01:21

04 - Handling Redirects with the Requests Library.mp4

03:16

05 - Module Summary.mp4

01:17

06 - HTTP Requests and Responses.mp4

05:45

07 - Web Scraping.mp4

02:24

08 - HTTP Client Libraries.mp4

04:21

09 - Making GET Requests Using httplib2.mp4

07:18

10 - Making OPTIONS, POST, PUT Requests with httplib2.mp4

04:08

11 - Handling Redirects with httplib2.mp4

03:33

12 - Making HTTP Requests and Parsing URLs Using urllib.mp4

07:29

13 - GET and POST Requests Using the Requests Library.mp4

04:36

03 - Working with the Parse Tree in BeautifulSoup

14 - Module Overview.mp4

01:15

15 - The HTML Parse Tree.mp4

03:38

16 - Beautiful Soup for HTML Parsing.mp4

02:03

17 - Introducing Beautiful Soup.mp4

05:21

18 - Extracting Specific Page Elements.mp4

06:18

19 - Filtering Elements Using Find and Find All.mp4

07:13

20 - Searching and Filtering Using Custom Functions.mp4

02:49

21 - Extracting Links from a Page.mp4

06:02

22 - Using a Soup Strainer to Parse a Subset of a Document.mp4

03:45

23 - Module Summary.mp4

01:12

04 - Selecting Elements Using the Scrapy Shell

24 - Module Overview.mp4

01:05

25 - Parsing Web Content.mp4

02:19

26 - Introducing Scrapy.mp4

03:58

27 - Getting Started with Scrapy.mp4

04:13

28 - Introducing the Scrapy Shell.mp4

04:28

29 - Selecting Elements Using CSS Selectors.mp4

06:52

30 - Advanced Selections Using CSS Selectors.mp4

05:13

31 - Selecting Elements Using XPath Selectors.mp4

06:41

32 - Module Summary.mp4

01:07

05 - Scraping Web Sites Using Scrapy Spiders

33 - Module Overview.mp4

01:07

34 - How Scrapy Works.mp4

03:17

35 - Creating Your First Custom Spider.mp4

07:02

36 - Writing Scraped Contents to a File.mp4

02:26

37 - Exploring Items Using the Scrapy Shell.mp4

03:55

38 - Using Items to Store Extracted Content.mp4

04:20

39 - Using Item Loaders and Input and Output Processors for Scraped Data.mp4

07:03

40 - Using Pipelines to Transform Scraped Data.mp4

04:43

41 - Module Summary.mp4

01:24

Description

This course covers the important tools for retrieving web content with HTTP client libraries such as Requests, httplib2, and urllib, as well as technologies for parsing that content: Beautiful Soup, a popular parsing library, and Scrapy, a production-grade scraping framework.

What You'll Learn

Web scraping is an important technique that is widely used as the first step in many workflows in data mining, information retrieval, and text-based machine learning. In this course, Scraping Your First Web Page with Python, you will gain the ability to apply different scraping techniques, including Beautiful Soup and Scrapy. First, you will learn and use various HTTP client libraries such as Requests, httplib2, and urllib to download HTML content. Next, you will discover Beautiful Soup, an extremely popular Python library that improves on regular expressions in important ways: it repairs badly formed HTML and builds a parse tree that can be traversed and queried. Finally, you will add Scrapy to your toolkit, a full-fledged web scraping framework that combines the steps of retrieving and parsing web content and does so at production scale. When you're finished with this course, you will have the skills and knowledge to identify the relative strengths and use cases of different web retrieval and scraping technologies such as regular expressions, Beautiful Soup, and Scrapy.
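To give a feel for why a tolerant parser is preferable to regular expressions on messy markup, here is a minimal sketch that is not taken from the course. It uses only the Python standard library (`html.parser` and `urllib.parse`) as a stand-in for Beautiful Soup, so it runs without installing anything. The `LinkCollector` class, the sample HTML, and the `example.com` base URL are all hypothetical and chosen purely for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; attrs is a list of (name, value) pairs.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's base URL.
            self.links.append(urljoin(self.base_url, href))


# Deliberately sloppy markup: unclosed <p> and <li> tags, one unquoted attribute.
SAMPLE_HTML = """
<html><body>
<p>Courses
<ul>
  <li><a href="/python">Python</a>
  <li><a href=scraping.html>Web Scraping</a>
</ul>
</body></html>
"""

collector = LinkCollector("https://example.com/courses/")
collector.feed(SAMPLE_HTML)
links = collector.links
print(links)
```

The parser still finds both links despite the unclosed tags and the unquoted attribute. Beautiful Soup goes further by building a full tree from such input, which can then be navigated and searched rather than handled tag by tag.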

Python

Web Scraping

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn is working on developing an engine (patent filed) to automate animations for presentations and educational content.

Pluralsight

Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.