Lesson 7 of 14 | By Avijeet Biswal
Last updated on Jan 28, 2021

Forecasting is a technique widely used in machine learning to make business predictions. Companies analyze past time series data to forecast future values and guide their business decisions. In this article, we will learn about time series forecasting in detail.
Time series forecasting is the method of exploring and analyzing time-series data recorded or collected over a set period of time. This technique is used to forecast values and make future predictions. Not all data that have time or date values as features can be treated as time series data; data fit for time series forecasting should consist of observations recorded at regular, continuous intervals.
To use time-series data and develop a model, you need to understand the patterns in the data over time. These patterns are classified into four components, which are:
Trend: It represents the gradual change in the time series data. The trend pattern depicts long-term growth or decline.
Level: It refers to the baseline value of the series if it were a straight line.
Seasonality: It represents short-term patterns that occur within a fixed period, such as a calendar year, and repeat regularly.
Noise: It represents irregular variations and is purely random. These fluctuations are unforeseen, unpredictable, and cannot be explained by the model.
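Setting the level aside, R's decompose() function (used in the demo below) treats the observed value at time t as either an additive or a multiplicative combination of a trend, a seasonal, and a random component:

Additive: Y(t) = Trend(t) + Seasonal(t) + Random(t)
Multiplicative: Y(t) = Trend(t) × Seasonal(t) × Random(t)

The multiplicative form is the one we will use when decomposing the airline data later in this article.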
ARIMA stands for Autoregressive Integrated Moving Average. It is a combination of the Autoregressive (AR) and Moving Average (MA) models. The AR model forecast corresponds to a linear combination of past values of the variable. The MA model forecast corresponds to a linear combination of past forecast errors. The “I” (integrated) part means the data values are differenced, i.e., each value is replaced by the difference between it and the previous value, to make the series stationary.
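In equation form, the two building blocks look like this, where c is a constant, ε(t) is the forecast error at time t, and φ and θ are the model coefficients:

AR(p): y(t) = c + φ1·y(t−1) + … + φp·y(t−p) + ε(t)
MA(q): y(t) = c + ε(t) + θ1·ε(t−1) + … + θq·ε(t−q)

ARIMA combines the two after differencing the series d times.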
SARIMA stands for Seasonal Autoregressive Integrated Moving Average. It extends the ARIMA model by adding a linear combination of seasonal past values and forecast errors.
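As a rough illustration (not part of the demo later in this article), a seasonal ARIMA can be fitted by hand with the forecast package by supplying both a non-seasonal and a seasonal order; the orders below are placeholders, not tuned values:

# Sketch: fitting a seasonal ARIMA (SARIMA) with the forecast package
library(forecast)
sarima_fit <- Arima(AirPassengers,
                    order    = c(2, 1, 1),   # non-seasonal (p, d, q)
                    seasonal = c(0, 1, 0))   # seasonal (P, D, Q) at the series frequency
summary(sarima_fit)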
The Vector Autoregression (VAR) method models the next step in each time series using an AR model. The VAR model is useful when you are interested in predicting multiple time series variables using a single model.
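For instance, a minimal sketch using the third-party vars package (not used in the demo below) to model two related monthly series jointly might look like this:

# Sketch: a VAR model on two built-in monthly series (male and female lung-disease deaths)
# install.packages("vars")
library(vars)
two_series <- cbind(mdeaths, fdeaths)
var_fit <- VAR(two_series, p = 2, type = "const")   # lag order 2 with a constant term
summary(var_fit)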
The Long Short Term Memory network or LSTM is a special kind of recurrent neural network that deals with long-term dependencies. It can remember information from past data and is capable of learning order dependence in sequence prediction problems.
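A bare-bones sketch of an LSTM for one-step-ahead forecasting, assuming the keras R package with a working TensorFlow backend (layer sizes and input shape are illustrative only, not tuned):

# Sketch: an LSTM that reads 12 past time steps of a single series and predicts the next value
library(keras)
model <- keras_model_sequential() %>%
  layer_lstm(units = 50, input_shape = c(12, 1)) %>%
  layer_dense(units = 1)
model %>% compile(loss = "mse", optimizer = "adam")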
ARIMA models are classified by three factors, illustrated in the sketch after this list:
p = Number of autoregressive terms (AR)
d = How many non-seasonal differences are needed to achieve stationarity (I)
q = Number of lagged forecast errors in the prediction equation (MA)
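To make this concrete, here is a small sketch (using the forecast package and the same AirPassengers data as the demo below) of fitting an ARIMA model with the order chosen by hand rather than by auto.arima():

# Sketch: specifying (p, d, q) manually
library(forecast)
fit <- Arima(AirPassengers, order = c(2, 1, 1))   # p = 2, d = 1, q = 1
forecast(fit, h = 12)                             # forecast the next 12 months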
In this demo, we’ll use the AirPassengers dataset, which contains monthly airline passenger totals (ticket sales) from 1949 to 1960. We’ll forecast passenger numbers for the years after 1960 using the ARIMA model in R.
The idea behind this analysis is to identify the time series components, namely the trend, seasonal, and random components.
Then, we’ll forecast the values based on historical data.
install.packages('forecast')
library(forecast)
data("AirPassengers")
class(AirPassengers)
[1] "ts"
Here, ts represents that it’s a time series dataset.
Let’s check the start and end dates of the series:
start(AirPassengers)
[1] 1949 1
end(AirPassengers)
[1] 1960 12
So, our start date is January 1949, while the end date is December 1960.
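You can also confirm that the series is monthly, i.e., it has 12 observations per year, which is the frequency we’ll use when decomposing the data:

frequency(AirPassengers)
[1] 12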
sum(is.na(AirPassengers))   # count missing values in the series
[1] 0
summary(AirPassengers)   # basic descriptive statistics
plot(AirPassengers)      # plot the raw series
tsdata <- ts(AirPassengers, frequency = 12)    # 12 observations per year (monthly data)
ddata <- decompose(tsdata, "multiplicative")   # split into trend, seasonal, and random components
plot(ddata)
plot(ddata$trend)
plot(ddata$seasonal)
plot(ddata$random)
plot(AirPassengers)
abline(reg = lm(AirPassengers ~ time(AirPassengers)))   # overlay a fitted straight trend line
boxplot(AirPassengers ~ cycle(AirPassengers), xlab = "Date", ylab = "Passenger Numbers (1000's)", main = "Monthly air passengers boxplot from 1949-1960")
From the above plot, you can see that ticket sales are higher in June, July, and August than in the other months of the year.
mymodel <- auto.arima(AirPassengers)   # automatically select the best-fitting ARIMA model
mymodel
plot.ts(mymodel$residuals)
myforecast <- forecast(mymodel, level = c(95), h = 10*12)   # forecast the next 10 years (120 months) with a 95% interval
plot(myforecast)
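If you want the forecast numbers themselves rather than the plot, the forecast object stores the point forecasts and the interval bounds (a small sketch):

head(myforecast$mean)   # first few point forecasts
myforecast$lower        # lower bounds of the 95% interval
myforecast$upper        # upper bounds of the 95% interval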
Box.test(mymodel$residuals, lag=5, type="Ljung-Box")    # test residuals for autocorrelation
Box.test(mymodel$residuals, lag=10, type="Ljung-Box")
Box.test(mymodel$residuals, lag=15, type="Ljung-Box")
The Ljung-Box test checks whether the residuals are independently distributed: large p-values mean we cannot reject that hypothesis, i.e., the residuals behave like white noise. Based on these tests, we can conclude that the ARIMA model with parameters (2, 1, 1) chosen by auto.arima() adequately fits the data.
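As a convenience, the forecast package also provides checkresiduals(), which plots the residuals and runs a Ljung-Box test in a single call (a brief sketch using the model fitted above):

checkresiduals(mymodel)   # residual plots plus a Ljung-Box test in one step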
Looking forward to becoming a Data Scientist? Check out the Data Scientist Masters Program and get certified today!
After reading this article, you have learned what time series forecasting is, its various applications, and its different components. Finally, we built a time series forecasting model with ARIMA in R to predict airline ticket sales.
If you have any questions related to this article on ‘Time Series Forecasting’, then please ask us in the comments section of this article. Our team of experts will help you solve your queries at the earliest!
Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.