Web Scraping and API Fundamentals in Python

An introduction to the fundamentals of extracting data from the web with Python. We will learn about APIs, Beautiful Soup, and Requests-HTML.


Course description

Web Scraping and API Fundamentals in Python offers an introduction to the techniques of extracting data from the web. In this course, you will learn how to use one of the most powerful tools on the Internet: APIs. We will also discuss in depth how to obtain information directly from websites using the Beautiful Soup Python package. There will be a short HTML crash course for those not familiar with it. Finally, we will introduce the Requests-HTML package, which lets us extract content generated dynamically by JavaScript.
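Working with an API usually comes down to two steps: send an HTTP request to an endpoint, with query parameters encoded in the URL, then decode the structured (typically JSON) response. The sketch below shows both steps using only the standard library (`urllib` and `json`) rather than a third-party HTTP client. The endpoint `https://api.example.com/v1/items` and the helper names are hypothetical, used for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/v1/items"


def build_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to an API endpoint."""
    return f"{base_url}?{urlencode(params)}"


def parse_response(body: str) -> dict:
    """Decode a JSON response body into Python dicts and lists."""
    return json.loads(body)


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Send a GET request and decode the JSON payload (needs network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_url(BASE_URL, {"q": "python", "page": 1})
    print(url)  # https://api.example.com/v1/items?q=python&page=1
    # A made-up response body, parsed offline instead of calling fetch_json.
    sample = '{"results": [{"name": "requests"}], "page": 1}'
    print(parse_response(sample)["results"][0]["name"])  # requests
```

Keeping URL building and JSON parsing in separate functions makes each step testable without touching the network.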


Introduction to the course

In this section, we introduce the course and its main topics: working with APIs, extracting information directly from websites with Beautiful Soup, and scraping dynamically generated content with Requests-HTML.


Setting Up the Environment

Here, we set up the environment for the course. You will get familiar with the step-by-step process of installing Anaconda and Jupyter, and take an introductory tour of the Jupyter Dashboard for Python.


HTML overview

In this section, we give a short HTML crash course for those not familiar with it. Understanding how a web page is structured is essential before we start extracting information from it with Beautiful Soup.
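An HTML page is a tree of nested tags, and each tag may carry attributes such as `class` or `href`. As a minimal, dependency-free sketch of that structure, the example uses Python's built-in `html.parser` module (not Beautiful Soup, which the course uses) to record how deeply each tag is nested in a small, made-up page. The `OutlineParser` class is a hypothetical helper written for this illustration.

```python
from html.parser import HTMLParser

# A tiny, made-up page used only for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1 class="title">Hello</h1>
    <p>Visit <a href="https://example.com">this link</a>.</p>
  </body>
</html>
"""

# Void elements never get a closing tag, so they must not increase the depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class OutlineParser(HTMLParser):
    """Records each start tag together with its nesting depth and attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag, attrs) tuples

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag, dict(attrs)))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


parser = OutlineParser()
parser.feed(SAMPLE_HTML)
parser.close()
for depth, tag, attrs in parser.outline:
    # Indent each tag by its depth to show the tree structure.
    print("  " * depth + tag, attrs if attrs else "")
```

The printed outline mirrors the parent/child relationships (for example, `a` sits inside `p`, which sits inside `body`) that a scraper navigates when locating the data it needs.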


Practical project: Scraping Rotten Tomatoes

In this section, we put the techniques covered so far into practice with a project: scraping information from the Rotten Tomatoes website.


Scraping HTML tables

This section centers on scraping HTML tables, a common way for websites to present structured data, and turning them into a form we can analyze in Python.
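An HTML table is made of `<tr>` rows whose cells are `<th>` (header) or `<td>` (data) elements. The sketch below collects the text of each cell row by row and pairs the data rows with the header row. It uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra packages. The `TableParser` and `parse_table` names and the sample table are hypothetical, for illustration only, and the sketch assumes a single simple table whose first row holds the headers, with no nested tables or merged cells.

```python
from html.parser import HTMLParser
from typing import Dict, List

# A made-up table used only for illustration.
SAMPLE_TABLE = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30,000</td></tr>
  <tr><td>Shelbyville</td><td>25,000</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows: List[List[str]] = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> List[Dict[str, str]]:
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    for record in parse_table(SAMPLE_TABLE):
        print(record)
```

Each record is a dict, so the rows can be passed directly to a CSV writer or a DataFrame constructor. For well-formed tables, `pandas.read_html` performs a similar conversion in a single call.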


Common roadblocks when scraping

In this section, we look at common roadblocks you may run into when scraping websites and discuss ways to work around them.


The requests-html package

Here, we introduce another web scraping package, Requests-HTML. It has one big advantage over Beautiful Soup: the ability to execute JavaScript. This allows us to extract dynamically generated content, which is exactly what we will do.