Web Scraping and API Fundamentals in Python

Web Scraping and API Fundamentals in Python offers an introduction to the techniques of data extraction from the web. In this course, you will learn how to use one of the most powerful tools on the Internet – APIs. We will also discuss in depth how to obtain information directly from websites using the BeautifulSoup Python package. There will be a short HTML crash course for those not familiar with it. Finally, we will introduce the Requests-HTML package in order to extract dynamically generated JavaScript content.

Section 1

Course Introduction

In this first section, we will discuss what the course covers and why you need to learn Web Scraping, and give you some notes on the ethics of scraping.

What does the course cover?
What is Web Scraping?
Ethics of Scraping

Section 2

Setting Up the Environment

In this part of the course, we will explain how to set up Python 3 and launch Jupyter. We’ll also show you what the Anaconda Prompt is and how we use it to download and install new packages.

Setting up the environment - Do not skip, please!
Why Python and why Jupyter?
Installing Anaconda
Jupyter Dashboard - Part 1
Jupyter Dashboard - Part 2
Installing the packages

Section 3

Working with APIs

Here we will introduce what APIs are and how to use them. In order to do that, we will discuss the popular data exchange format JSON, as well as HTTP requests and the Python library to submit them – ‘requests’. At the end of the section, we will show you how to deal with an API that requires registration.

API overview
HTTP requests: GET and POST requests
JSON: preferred data exchange format for APIs
Exchange rates API: GETting a JSON response
Incorporating parameters in a GET request
Additional API functionalities
Creating a simple currency converter
iTunes API
iTunes API: Structuring and exporting the data
GitHub API: Pagination
EDAMAM API: Initial setup and registration
EDAMAM API: Sending a POST request
Downloading files with Requests
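
The GET-with-parameters, JSON, and currency converter lessons can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the endpoint URL, the `base` parameter name, and the response layout (`{"base": ..., "rates": {...}}`) are assumptions standing in for whichever exchange rates API you use, and the numbers in the sample payload are invented.

```python
import json


def convert(amount, target, payload_text):
    """Convert `amount` of the base currency into `target`.

    `payload_text` is the raw JSON body of an exchange rates response.
    The layout {"base": "...", "rates": {"XXX": number}} is assumed here.
    """
    payload = json.loads(payload_text)   # JSON text -> Python dict
    rate = payload["rates"][target]      # KeyError if the currency is missing
    return round(amount * rate, 2)


def fetch_rates(base, url="https://api.example.com/latest"):
    """Send a GET request with one query parameter and return the body text.

    The URL is a placeholder and `base` is a hypothetical parameter name.
    """
    import requests  # third-party; imported here so convert() works without it

    response = requests.get(url, params={"base": base}, timeout=10)
    response.raise_for_status()          # raise on 4xx/5xx status codes
    return response.text


# Offline demonstration with a hand-written payload (illustrative numbers only).
sample = '{"base": "EUR", "rates": {"USD": 1.25, "GBP": 0.8}}'
usd = convert(100, "USD", sample)
print(usd)
```

Keeping the parsing step separate from the network call means the conversion logic can be tried out on a saved response before pointing `fetch_rates` at a real API.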

Section 4

HTML Overview

Web Scraping relies on extracting information from the source code of webpages, so a general understanding of HTML is required. This section is a short crash course for those who are not familiar with HTML. It is meant as an intuitive look into the basics, not a comprehensive guide.

What is HTML?
Structure of HTML
Syntax of HTML. Tags
Tag attributes
Popular tags
CSS and JavaScript
Character encoding
XHTML and code style
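
To make the ideas of tags and tag attributes concrete, here is a small sketch that walks a hand-written page and lists each opening tag with its attributes. It uses `html.parser` from the Python standard library purely for illustration; the page content is made up, and the course itself may present this material differently.

```python
from html.parser import HTMLParser

# A minimal page: a head with a title, and a body with a heading and a paragraph.
PAGE = """<!DOCTYPE html>
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1 id="top">Hello</h1>
    <p class="intro">A <a href="https://example.com">link</a>.</p>
  </body>
</html>"""


class TagLister(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


lister = TagLister()
lister.feed(PAGE)
for tag, attrs in lister.tags:
    print(tag, attrs)
```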

Section 5

Web Scraping with Beautiful Soup

After familiarizing ourselves with HTML, we are ready to delve into Web Scraping itself. We will now introduce the “Beautiful Soup” package and explore its capabilities.

Introduction to the Beautiful Soup package
Workflow of Web Scraping
Setting up your first scraper
Searching and navigating the HTML tree
Searching the HTML tree by attributes
Extracting data from the HTML tree
Extracting text from an HTML tag
Practical example: dealing with links
Extracting data from nested HTML tags
Scraping multiple pages automatically
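
As a rough sketch of the search-and-extract workflow these lessons describe, the snippet below parses a small hand-written page with Beautiful Soup, searches by tag and by attribute, and pulls out text and link targets. The HTML, tag names, and class names are invented for the example; a real page would first be downloaded (for instance with `requests`) and would have its own structure.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page; the tags and classes are made up for this sketch.
HTML = """
<html><body>
  <h1>Reading list</h1>
  <ul id="books">
    <li class="book"><a href="/books/1">First book</a></li>
    <li class="book"><a href="/books/2">Second book</a></li>
    <li class="note">Not a book</li>
  </ul>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(HTML, "html.parser")

heading = soup.find("h1").get_text(strip=True)        # first <h1>, text only
items = soup.find_all("li", class_="book")            # search by attribute
titles = [li.a.get_text(strip=True) for li in items]  # text of the nested <a>
links = [li.a["href"] for li in items]                # value of the href attribute

print(heading, titles, links)
```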

Section 6

Practical Project: Scraping Rotten Tomatoes

Now that we’ve seen what Beautiful Soup can do, we will devote this section to practicing our newly formed skills. We are going to obtain information about movies from a ‘Rotten Tomatoes’ rank list.

Setting up your scraper
Extracting the title and year of each movie
Extracting the rest of the information
Dealing with the cast of the movies
Storing and exporting the data in a structured form
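
The final step, storing and exporting the scraped data in a structured form, might look something like the sketch below. The records are placeholders rather than real scraped results, the field names are assumptions, and the course may well use a different tool (such as pandas) for the export. The example uses the standard library `csv` module and joins the cast list into a single cell.

```python
import csv
import io

# Placeholder records standing in for scraped results (not real data).
movies = [
    {"title": "Example Movie A", "year": 2001, "cast": ["Actor One", "Actor Two"]},
    {"title": "Example Movie B", "year": 2015, "cast": ["Actor Three"]},
]


def to_csv(rows):
    """Return the rows as CSV text with one line per movie."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "year", "cast"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # A list does not fit in one CSV cell, so join the names with "; ".
        writer.writerow({**row, "cast": "; ".join(row["cast"])})
    return buffer.getvalue()


csv_text = to_csv(movies)
print(csv_text)
```

To write a file instead of a string, the same writer can be pointed at `open("movies.csv", "w", newline="", encoding="utf-8")`; the file name here is only an example.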

Section 7

Scraping HTML Tables

In this short section, we will discuss an easy way to scrape HTML tables.

Scraping HTML tables with the help of Pandas
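
A minimal sketch of the pandas approach, assuming the lesson relies on `pandas.read_html`: the function returns one DataFrame per `<table>` element it finds. The table below is hand-written for illustration, and `read_html` needs an HTML parser backend installed (`lxml`, or `beautifulsoup4` together with `html5lib`).

```python
import io

import pandas as pd

# A hand-written table; on a real site you would pass the downloaded HTML instead.
HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.5</td></tr>
  <tr><td>Pear</td><td>2.0</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one for each table in the document.
tables = pd.read_html(io.StringIO(HTML))
df = tables[0]
print(df)
```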

Section 8

Common Roadblocks when Scraping

Although we have done a decent amount of scraping so far in the course, how well it works depends heavily on the website we choose, as different websites present their own specific problems. In this section, we will discuss the most common problems you will have to deal with and give you solutions and workarounds.

Common roadblocks when Web Scraping
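
The specific roadblocks covered in the lesson are not listed here. As one illustrative case, assume a server that intermittently refuses or throttles requests; a common workaround is to retry with a growing pause between attempts. The helper below is a sketch under that assumption. Its name, parameters, and the fake fetcher used to demonstrate it are all hypothetical.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with a growing pause when it raises.

    `fetch` is any callable that returns the page content or raises an
    exception on failure, e.g. a thin wrapper around requests.get.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise                         # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # pause doubles after each failure


# Demonstration with a fake fetcher that fails twice before succeeding.
calls = []
pauses = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


# Passing pauses.append as `sleep` records the delays instead of waiting.
result = fetch_with_retries(flaky_fetch, "https://example.com", sleep=pauses.append)
print(result, pauses)
```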

Section 9

The Requests-HTML Package

Here, we will introduce another Web Scraping package – ‘Requests-HTML’. It has one big advantage over Beautiful Soup: the ability to execute JavaScript. This allows us to extract dynamically generated content, which is exactly what we will do.

Introduction to the requests-html package
Exploring the capabilities of requests-html for Web Scraping
Searching for text
CSS selectors
Scraping JavaScript

MODULE 4

Advanced Specialization

This course is part of Module 4 of the 365 Data Science Program. The complete training consists of four modules, each building upon your knowledge from the previous one. Module 4 is focused on developing a specialized, industry-relevant skill set, and students are encouraged to complete Modules 1, 2, and 3 before they start this part of the training. Here, you will learn how to perform Credit Risk Modeling for banks, Customer Analytics for retail or other commercial companies, and Time Series Analysis for finance and stock data.

Whether you want to scale your career or transition into a new field, data science is the number one skillset employers look for. Grow your analytics expertise and get hired as a data scientist!