Web scraping is a fascinating field with a lot of applications. Its appeal derives both from our curiosity to learn and analyze and from the need to collect and use data. But, while we usually imagine ourselves as hackers scraping pages, that’s not always necessary. Many pages have, in fact, been created to provide information rather than to rank on Google.
To facilitate data extraction, some website owners and data providers maintain APIs, which allow applications to communicate with each other in the background as we browse the web. In this article, we are going to define what an API is and show how it’s typically used. We’ll also look at some of the most common requests and responses. So, hold on tight!
What Is an API?
API stands for application programming interface. This may sound somewhat unclear, but the concept is rather straightforward. Essentially, an API specifies how software components should interact. In fact, think of it as a contract between a client and a server: if the client makes a request in a specific format, the server responds in a documented format or initiates a defined action. For instance, APIs are at work when you fire up your social media first thing in the morning or check the weather forecast on your app.
Fundamentally, APIs exist in many forms and have many use cases. Some can be web-based, others – between an application and an operating system, or even between some piece of hardware and software. The possibilities are endless!
Examples of Web-Based APIs
Web-based APIs are arguably the most popular kind at the moment. Some examples include, but are not limited to:
- Up-to-date currency exchange rates
- Job boards
- Weather forecast information
We use them to incorporate third-party functionality into a service. For example, have you ever come across Facebook’s ‘Share This’ buttons on unrelated websites like YouTube? Well, they are embedded there using an API.
Another way to apply web APIs is giving the public access to some obscure or constantly changing real-time data. Examples of this are APIs that return up-to-date currency exchange rates, or a social media API that obtains all posts with a certain hashtag.
And, sometimes, we have the data we want but lack the capabilities to process it – because it may require some sort of machine learning algorithm, for instance. Fortunately, there are a lot of APIs out there to help us with that as well.
Side note: If you find a webpage that you’d like to analyze for a personal project, but it doesn’t have an available API and is hidden behind a login, you can still try scraping the locked data (but don’t forget to be responsible and ask for permission, of course!).
Are APIs Free to Use?
Now, APIs could be either free or paid. As you can imagine, it’s no easy job to create and maintain an API – you would expect most of them to be paid, right? Contrary to popular belief, however, the internet is full of generous people, creating and maintaining public APIs for free.
But keep in mind that many APIs are still paid.
How To Use an API?
That’s the million-dollar question! For starters, every API should have some form of documentation – a file or a webpage explaining exactly how to use it, what its responses look like, and so on.
That being said, a shared step across the board is to connect to an API by making an HTTP request to the API server.
What Is an HTTP Request?
HTTP stands for HyperText Transfer Protocol. It specifies how clients and servers should format and transmit requests and responses – this is essentially how most of your web surfing is made possible.
Ultimately, a website is a collection of files stored on a remote machine called a server. These files include:
- The web page’s HTML code
- Supplementary resources such as images, videos, styles, etc.
While surfing, we download these files to our computer, and the browser renders them. In other words, we send a request indicating that we want to obtain a certain file, then the server processes this request and responds accordingly.
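To make the request and response cycle concrete, here is a minimal sketch that runs entirely on your own machine, using only Python's standard library. The `HelloHandler` class, the page it serves, and the `/index.html` path are all made up for illustration; a real web server does far more than this.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET request with one tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello!</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet: no per-request log lines


def fetch_from_local_server():
    # Port 0 asks the operating system for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: send a request for a file, then read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/index.html")
        response = connection.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, html = fetch_from_local_server()
print(status)
print(html)
```

The client asks for a file, the server decides how to answer, and the client receives a status code together with the file's contents. A browser performs the same exchange for every page, image, and stylesheet it loads.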
There are two disproportionately popular HTTP request types – GET and POST.
HTTP GET Request
As the name suggests, GET is primarily used to obtain data from a server. Parameters are sometimes added to receive a more specific response, and they are appended directly to the URL of the request. Because the URL is visible and the request remains in the browser history and server logs, the GET method should not be used for sensitive information.
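As a rough illustration of how GET parameters end up in the URL, the sketch below builds a query string with Python's standard library. The address `https://api.example.com/rates` and the parameter names `base` and `symbols` are hypothetical placeholders, not a real API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/rates"


def build_get_url(base_url, params):
    """Append the parameters to the URL as a percent-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


url = build_get_url(BASE_URL, {"base": "EUR", "symbols": "USD,GBP"})
print(url)

# Anyone who sees the URL can read the parameters straight back out of it.
print(parse_qs(urlsplit(url).query))
```

Since everything after the `?` is part of the address itself, the parameters travel wherever the URL does, which is why they show up in history and logs.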
HTTP POST Request
In contrast, a POST request is designed for altering state on the server or sending confidential information. It carries its data in a separate request body rather than in the URL, which keeps that data out of the browser history, although only an encrypted (HTTPS) connection actually protects it from prying eyes. Adding items to your shopping cart is one example. In addition, login credentials and passwords are usually transmitted through the POST method.
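The following sketch shows, under assumed names, how a POST request keeps its data in the body instead of the URL. It only builds the request object and never sends it; the address `https://example.com/login` and the `username` and `password` field names are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

# Hypothetical login endpoint, used only for illustration.
LOGIN_URL = "https://example.com/login"


def build_login_request(username, password):
    """Build (but do not send) a POST request with the data in its body."""
    body = urlencode({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(LOGIN_URL, data=body, method="POST")


request = build_login_request("alice", "s3cret")
print(request.get_method())  # the HTTP method
print(request.full_url)      # the URL carries no credentials
print(request.data)          # the credentials live in the body
```

Passing the object to `urllib.request.urlopen` would actually send it. Note that the body is still plain text at this point, so the connection itself has to be encrypted to keep it private.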
What Is the Difference Between a Successful and Unsuccessful HTTP Response?
To indicate whether a request was successful, the server’s response contains a status code with a given meaning. The most common ones are:
- 200 – the server has processed the request successfully
- 404 – the server could not find the requested resource
You have probably already encountered a 404 HTTP response on a missing webpage, for example.
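In code, checking a status code is usually a one-line comparison. The helper below is a small sketch built on the `http.HTTPStatus` enum from Python's standard library; the function name `describe_status` and the wording of its output are made up for this example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable summary of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    # Codes in the 200-299 range signal success.
    label = "success" if 200 <= code <= 299 else "not a success"
    return f"{code} {status.phrase} ({label})"


print(describe_status(200))
print(describe_status(404))
```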
If you get a 200 response, however, you will be ready to extract the given webpage’s data – most commonly an HTML file. An important note here is that, for web APIs, by far the most common format to transmit data is JSON.
What Is JSON?
JSON stands for JavaScript Object Notation. When APIs receive GET requests, they often return data as JSON. Moreover, the payload data of a POST request may also be JSON. In other words, this format frequently pops up in web services.
How To Use JSON?
Essentially, the format is built around three design goals:
- It should be easy for humans to read and write.
- It should be easy for programs to process and generate, regardless of the programming language.
- It should be written in plain text.
JSON achieves these goals by building upon two structures familiar to almost all programmers: dictionaries (called objects in JSON) and lists (called arrays). To learn more about them, check out our dedicated article on the JSON format.
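To see the two structures in action, here is a short sketch using Python's built-in `json` module. The response text is invented for the example and does not come from any real API.

```python
import json

# Made-up response body, shaped like something a currency API might return.
response_text = (
    '{"base": "EUR", "rates": ['
    '{"currency": "USD", "rate": 1.1}, '
    '{"currency": "GBP", "rate": 0.9}]}'
)

# Parsing turns JSON objects into dictionaries and JSON arrays into lists.
data = json.loads(response_text)
currencies = [entry["currency"] for entry in data["rates"]]
print(currencies)

# Going the other way, a dictionary becomes plain JSON text,
# ready to be used as the payload of a POST request.
payload = json.dumps({"base": data["base"], "currencies": currencies})
print(payload)
```

Because the result is made of ordinary dictionaries and lists, no special tooling is needed to navigate it: regular indexing and loops do the job.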
API: Next Steps
Application programming interfaces underpin much of our web experience. They’re mostly invisible to the untrained eye, but without them, surfing the internet would not be as simple as it is today. Yet they’re also quite straightforward and easy to use. So, why not journey further into the land of APIs and meet HTTP requests and JSON in their natural environment?