Forecasting is a widely used machine learning technique for making business predictions. Companies analyze past time series data to forecast future values and guide business decisions. In this article, we will learn about time series forecasting in detail.
What is Time Series Forecasting?
Time series forecasting is the method of exploring and analyzing time-series data recorded or collected over a set period of time, and then using it to predict future values. Not every dataset that has time or date values among its features is a time series. Data fit for time series forecasting should consist of observations recorded at regular, continuous intervals.
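As a small illustration of "regular, continuous intervals", here is a minimal sketch in base R. The sales figures and the name monthly_sales are made up for the example; the point is that ts() stores a start date and a frequency instead of arbitrary timestamps.

```r
# Hypothetical monthly values (illustrative numbers only)
monthly_sales <- ts(c(120, 135, 128, 150, 162, 158),
                    start = c(2020, 1),  # January 2020
                    frequency = 12)      # 12 observations per year

print(monthly_sales)      # values laid out by month
frequency(monthly_sales)  # 12
end(monthly_sales)        # 2020 6, i.e. June 2020
```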
Time Series Forecasting Applications
- Time series forecasting is used in stock price prediction to predict the closing price of the stock on each given day.
- E-Commerce and retail companies use forecasting to predict sales and units sold for different products.
- Weather prediction is another application that can be done using time series forecasting.
- Government departments use it to predict the population of a particular region, a state, or the nation as a whole.
Time Series Components
To use time-series data and develop a model, you need to understand the patterns in the data over time. These patterns are classified into four components, which are:
- Trend: It represents the gradual change in the time series data. The trend pattern depicts long-term growth or decline.
- Level: It refers to the baseline value of the series, as if the series were a straight line.
- Seasonality: It represents the short-term patterns that occur within a single unit of time and repeat indefinitely.
- Noise: It represents irregular variations and is purely random. These fluctuations are unforeseen, unpredictable, and cannot be explained by the model.
Time Series Forecasting Methods
ARIMA stands for Autoregressive Integrated Moving Average. It is a combination of the Autoregressive (AR) and Moving Average (MA) models. The AR model forecast corresponds to a linear combination of past values of the variable. The moving average model forecast corresponds to a linear combination of past forecast errors. The “I” (Integrated) indicates that the data values are replaced by the differences between each value and the previous value, a step known as differencing.
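The differencing step can be sketched in base R. The five values below are the first five observations of the built-in AirPassengers series; the variable names are illustrative.

```r
# First five monthly observations of AirPassengers
x <- c(112, 118, 132, 129, 121)

# First-order differencing: each value minus the previous one
d1 <- diff(x)
print(d1)  # 6 14 -3 -8
```

The differenced series has one fewer observation than the original, because the first value has no predecessor.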
SARIMA stands for Seasonal Autoregressive Integrated Moving Average. It extends the ARIMA model by adding a linear combination of seasonal past values and forecast errors.
The Vector Autoregression (VAR) method models the next step in each time series using an AR model. The VAR model is useful when you are interested in predicting multiple time series variables using a single model.
The Long Short Term Memory network or LSTM is a special kind of recurrent neural network that deals with long-term dependencies. It can remember information from past data and is capable of learning order dependence in sequence prediction problems.
Time Series Forecasting Using the ARIMA Model
ARIMA models are classified by three factors:
p = Number of autoregressive terms (AR)
d = How many non-seasonal differences are needed to achieve stationarity (I)
q = Number of lagged forecast errors in the prediction equation (MA)
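To see how the three factors map onto code, here is a minimal sketch that fits a model with a hand-picked order using arima() from base R's stats package. The order c(2, 1, 1) and the choice of method = "ML" are assumptions made for illustration; the walkthrough below lets auto.arima() choose the order instead.

```r
# order = c(p, d, q): 2 AR terms, 1 non-seasonal difference, 1 MA term
fit <- arima(AirPassengers, order = c(2, 1, 1), method = "ML")

# One coefficient per AR and MA term: ar1, ar2, ma1
print(coef(fit))
```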
In this demo, we’ll use the AirPassengers dataset, which records monthly airline passenger numbers (in thousands) from 1949 to 1960. We’ll treat these figures as air-ticket sales and use the ARIMA model in R to forecast sales beyond 1960.
The idea for this analysis is to identify the time series components, which are:
- Trend
- Seasonality
- Random behavior of data
Then, we’ll forecast the values based on historical data.
Load the Forecast Package into RStudio
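A minimal sketch of this step, assuming the forecast package has already been installed from CRAN:

```r
# Run once if the package is missing:
# install.packages("forecast")

# Attach the package; it provides auto.arima() and forecast()
library(forecast)
```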
Load the Air Passengers’ Dataset and View Its Class
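AirPassengers ships with base R in the datasets package, so this step can be sketched as:

```r
# Load the built-in dataset into the workspace
data("AirPassengers")

# Inspect the object's class
class(AirPassengers)  # "ts"
```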
The class ts indicates that AirPassengers is a time series object.
Display the Dataset
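Printing the object is enough to display it. For a monthly ts object, R lays the values out as a table with one row per year and one column per month:

```r
# Print all 144 monthly observations (12 years x 12 months)
print(AirPassengers)
```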
Let’s check on our date values
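A likely way to check this is with the base R functions start() and end(), which return the year and period of the first and last observations:

```r
# Year and month of the first observation
start(AirPassengers)

# Year and month of the last observation
end(AirPassengers)
```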
[1] 1949    1
[1] 1960   12
So, our start date is January 1949, while the end date is December 1960.
Find out If There Are Any Missing Values
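One simple check, sketched here with base R, is to count the NA entries. The name missing_count is illustrative.

```r
# Count missing observations; 0 means the series is complete
missing_count <- sum(is.na(AirPassengers))
print(missing_count)  # 0
```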
Check the Summary of the Dataset
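A minimal sketch of this step uses summary(), which reports the minimum, quartiles, mean, and maximum of the series. The name passenger_summary is illustrative.

```r
# Five-number summary plus the mean
passenger_summary <- summary(AirPassengers)
print(passenger_summary)
```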
Plot the Dataset
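Because AirPassengers is a ts object, plot() draws it as a line chart with time on the x-axis. The axis labels and title below are assumptions chosen to match the box plot later in the article.

```r
# Line plot of the monthly series
plot(AirPassengers,
     xlab = "Date",
     ylab = "Passenger numbers (1000's)",
     main = "Monthly air passengers, 1949-1960")
```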
Decompose the Data Into Four Components
tsdata <- ts(AirPassengers, frequency = 12)
ddata <- decompose(tsdata, "multiplicative")
Plot the Different Components Individually
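A sketch of this step, repeating the decomposition so that the snippet runs on its own. The object returned by decompose() stores each component as a named element.

```r
# Repeat the decomposition from the previous step
tsdata <- ts(AirPassengers, frequency = 12)
ddata <- decompose(tsdata, "multiplicative")

# All panels at once: observed, trend, seasonal, random
plot(ddata)

# Each component on its own
plot(ddata$trend)
plot(ddata$seasonal)
plot(ddata$random)
```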
Plot a Trendline on the Original Dataset
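One common way to do this, shown here as a sketch, is to fit a straight line to the series against time with lm() and overlay it with abline(). The name trend_fit is illustrative.

```r
# Plot the original series
plot(AirPassengers)

# Fit a linear regression of passengers on time
trend_fit <- lm(AirPassengers ~ time(AirPassengers))

# Overlay the fitted line on the existing plot
abline(reg = trend_fit)
```

A positive slope confirms the upward trend that is visible in the plot.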
Create a Box Plot by Cycle
boxplot(AirPassengers ~ cycle(AirPassengers), xlab = "Date", ylab = "Passenger Numbers (1000's)", main = "Monthly air passengers boxplot from 1949-1960")
From the above plot, you can see that ticket sales are higher in June, July, and August than in the other months of the year.
Build the ARIMA Model Using auto.arima() Function
mymodel <- auto.arima(AirPassengers)
Plot the Residuals
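A minimal sketch of this step, assuming the forecast package is installed. The model is refit here only so that the snippet runs on its own.

```r
library(forecast)

# Refit the model from the previous step
mymodel <- auto.arima(AirPassengers)

# Residuals over time; ideally they scatter around zero with no pattern
plot.ts(mymodel$residuals)
```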
Forecast the Values for the Next 10 Years
myforecast <- forecast(mymodel, level=c(95), h=10*12)
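To inspect the result, the forecast object can be plotted directly. This sketch assumes the forecast package is installed and repeats the earlier steps so that it runs on its own; the axis labels are assumptions.

```r
library(forecast)

# Repeat the model fit and the 10-year (120-month) forecast
mymodel <- auto.arima(AirPassengers)
myforecast <- forecast(mymodel, level = c(95), h = 10 * 12)

# History plus point forecasts, with the 95% interval shaded
plot(myforecast, xlab = "Date", ylab = "Passenger numbers (1000's)")
```

The point forecasts are stored in myforecast$mean, and the interval bounds in myforecast$lower and myforecast$upper.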
Validate the Model by Selecting Lag Values
Box.test(mymodel$residuals, lag=5, type="Ljung-Box")
Box.test(mymodel$residuals, lag=10, type="Ljung-Box")
Box.test(mymodel$residuals, lag=15, type="Ljung-Box")
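The Ljung-Box test checks whether the residuals are autocorrelated; its null hypothesis is that they are independent. If the p-values at these lags are above the usual 0.05 significance level, the residuals behave like white noise, and we can conclude that the ARIMA model with parameters (2, 1, 1) adequately fits the data. Low p-values would instead indicate that the model has left structure in the residuals.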
After reading this article, you have learned what time series forecasting is, its various applications, and its different components. Finally, we built a time series forecasting model using ARIMA in R to predict airline ticket sales.