In the past, people used to consult shamans who would peek into what the weather would be like during the upcoming months – whether it’ll be a favorable season for crops or there would be a drought. Back in those days, when a nation’s livelihood heavily relied on the mercy of the elements, people also relied on those spiritual guides to ease their worries.
While fortune-tellers are not as highly regarded in the 21st century, however, we’re still very much seeking accurate predictions to understand patterns in:
- Weather conditions
- Population movement
- Stock prices
- Business strategy improvement
- Among many others.
Because we live in modern times, of course, we look to the future through entirely different means. One such means is time series forecasting. In this tutorial, we will briefly explain the idea of forecasting before using Python to make predictions based on a simple autoregressive model. We’ll also compare the results with the actual values for each period.
Without much ado, let’s cut to the chase.
What is Time Series Forecasting?
A time series is data collected over a period of time. Meanwhile, time series forecasting is an algorithm that analyzes that data, finds patterns, and draws valuable conclusions that will help us with our long-term goals.
In simpler terms, when we’re forecasting, we’re basically trying to “predict” the future. This word should sound familiar since we often hear about it on the news, be it in relation to the weather, politics, or another topic altogether. Colloquially, we use “predict” and “forecast” interchangeably – but there’s a very intricate distinction between the two.
In time series, we expect patterns to persist as we progress through time. Therefore, we follow a simple structure:
- Find the pattern
- Predict the future
Of course, finding the pattern is just a fancy way of saying we need to select the correct model, so we’re already halfway done. All that’s left is to make the predictions.
How to Make Predictions Using Time Series Forecasting in Python?
We follow 3 main steps when making predictions using time series forecasting in Python:
- Fitting the model
- Specifying the time interval
- Analyzing the results
Fitting the Model
Let’s assume we’ve already created a time series object and loaded our dataset into Python. In it, we should have the code for:
- Scraping the data
- Creating returns
- Normalizing said returns
- Separating the training and testing sets
Before we proceed, make sure you run the code, so we can have the data available and ready:
The first bit of coding we need to do ourselves involves fitting the model. As we mentioned earlier, we’re going to start with a simple autoregressive model and see how predictions evolve over time. We can name the variable model_ar:
To ensure we’ll need to make minimal changes in the future, let’s use the ARIMA method instead of the ARMA. We’re also going to use FTSE values. Lastly, we must set the order to 1, 0, 0, since this is the ARIMA equivalent of the AR(1):
Of course, we also need to store the fitted results before moving on:
Specifying the Time Interval
Next up, we must specify the time interval for our time series forecast. It could be a day, a week, or whatever period we feel like making it. Of course, the starting date of the forecasted period is essentially the first one we don’t have values for. In other words, we’re looking for the first day after the end of the training set.
Calling the tail() method, we select the first business day following whatever the last shown date is. Let’s say the last date in our dataset is July 14, 2014 – we’ll select July 15 as our first prediction. For convenience, we will store the date in a variable called start_date:
Similarly, we can store the last date of our interval in a variable called end_date. As explained earlier, we can set it equal to any day from July 14, 2014 onwards. The longer the period is, the harder it is to closely see how the data moves between dates. For this reason, let’s go with some mid-range value. We’ll set it to January 1, 2015:
This date can be altered at any point, so long that the new one falls on a business day – otherwise, Python will throw an error message.
After setting everything up, we can finally make a forecast using the predict() method. We do this by setting the array we just created, df_pred, to be equal to the results variable called results_ar.predict on which we call predict():
Inside the parentheses, we set the “start” and “end” arguments to equal to the start and end dates we defined earlier:
To get an idea of what our predictions look like, let’s graph them using the plot() method:
We can also specify the color of the plotted time series by defining the argument of the same name. Conventionally, we prefer using blue for actual values and red for predicted ones, so let’s indicate that:
By all means, remember to define the appropriate figure size and set a title. A name like “Predictions” seems fitting, so let’s set it like that:
Analyzing the Results
After adding the plt.show() command, we can run the cell to see our results:
We see a constantly decreasing line which isn’t at all realistic. In practice, we don’t expect prices to constantly decline. That would mean that the price today is as high as it is ever going to be, and the price tomorrow will always be lower. If that was the case, sellers would always want to get rid of their investments and buyers would have an incentive to hold off before they purchase. That way, everybody would be trying to sell, but nobody would be willing to buy.
The issue here comes from our model of choice. Because we’re using a simple AR model, the predictions are only based on the constant and the prices from the previous period. Thus, we get into this constant pattern of creating a curve where every new value is just a fraction of the previous one, put on top of the constant term.
We can see the curve better if we “zoom out” a little bit. To achieve that, just expand the prediction interval – to, say, November 23, 2019 – then plot the results one more time:
The new plot shows the curve much better than the previous one, so we can verify this unrealistic decline. Now you understand why the AR model is so bad at estimating non-stationary data.
Even so, let’s have a look at how these time series predictions compare to the actual values over the same time period. Before we begin, make sure to set the “end” date back once again to January 1, 2015.
We only need to add the testing set values to the graph. In fact, we only need to add the FTSE price between the “start” and “end” periods since the rest is not relevant right now:
We can also set the color of the new plot to “blue” to ensure it is distinctly different from the “red” we use for time series forecasting:
After running the cell, we see a very interesting pattern:
Over the course of the interval, actual prices cyclically jump up and down around the value we’re expecting. So, does this mean our choice of model is a good estimator for FTSE prices in the long run?
Not really. If we “zoom out” again, we’ll see how there is a trend where the values start to go up in a somewhat consistent fashion:
However, our prediction curve continues to decrease:
This shows once again that AR models aren’t the best estimators of non-stationary data. For this specific case, we’d need a more complex model of time series forecasting in order to make better price predictions.
Time Series Forecasting in Python: Next Steps
While we no longer use crystal balls to predict the future, knowing what’s ahead of us is as important as ever. Using modern methods like time series forecasting is a great way to stay on top of industry trends and anticipate changes. We can not only predict what the weather would be like for the next harvest season, but also forecast the percentage of business revenue for the next quarter, stock investment trends, and more. This opens up a great expanse of career opportunities for those budding data scientists interested in analytics and future-proofing the world!