The Best Guide to Time Series Forecasting in R

Forecasting is a technique widely used in machine learning for making business predictions. Companies use past time series data to forecast future values and make business decisions. In this article, we will learn about Time Series Forecasting in detail.

What is Time Series Forecasting?

Time series forecasting is the method of exploring and analyzing time-series data recorded or collected over a set period of time, and using it to forecast future values. Not every dataset that has time or date values among its features can be treated as time series data. Data fit for time series forecasting should consist of observations taken at regular, continuous intervals.
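
To make the "regular interval" requirement concrete, here is a minimal sketch of how such data can be stored in R. The sales figures and the variable names (sales, sales_ts) are made up for illustration; only the base R function ts() and its helpers are real.

```r
# Hypothetical monthly sales figures for two years (made-up values)
sales <- c(120, 132, 128, 141, 150, 162, 171, 168, 155, 149, 160, 178,
           125, 138, 135, 149, 158, 170, 181, 177, 163, 156, 169, 188)

# ts() attaches a regular time index: start in January 2020, 12 observations per year
sales_ts <- ts(sales, start = c(2020, 1), frequency = 12)

class(sales_ts)      # "ts"
frequency(sales_ts)  # 12
start(sales_ts)      # 2020 1
end(sales_ts)        # 2021 12
```

Because the index is implied by start and frequency, a ts object has no room for irregular gaps, which is why irregularly timestamped data usually needs to be aggregated or resampled first.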

Time Series Forecasting Applications

  • Time series forecasting is used in stock price prediction to predict the closing price of the stock on each given day.
  • E-Commerce and retail companies use forecasting to predict sales and units sold for different products.
  • Weather prediction is another common application of time series forecasting.
  • Government departments use it to predict the population of a particular region, a state, or the nation as a whole.

Time Series Components

To use time-series data and develop a model, you need to understand the patterns in the data over time. These patterns are classified into four components, which are:

  • Trend

It represents the gradual change in the time series data. The trend pattern depicts long-term growth or decline.

  • Level

It refers to the baseline value of the series, that is, the value it would hold if there were no trend, seasonality, or noise.

  • Seasonality

It represents the short-term patterns that repeat at a fixed, known period, such as every week or every year.

  • Noise

It represents irregular variations and is purely random. These fluctuations are unforeseen, unpredictable, and cannot be explained by the model.
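
The four components above can be sketched by building a series from them and then asking R to separate them again. This is only an illustration under assumed values: the level of 100, the slope, the sine-shaped seasonal pattern, and the noise level are all made up.

```r
set.seed(42)                  # make the random noise reproducible
n <- 72                       # six years of monthly observations
month_index <- seq_len(n)

level    <- 100                                   # baseline value
trend    <- 0.5 * month_index                     # steady long-term growth
seasonal <- 10 * sin(2 * pi * month_index / 12)   # pattern repeating every 12 months
noise    <- rnorm(n, mean = 0, sd = 2)            # irregular, unpredictable part

y <- ts(level + trend + seasonal + noise, frequency = 12)

# decompose() estimates the components back from the combined series
parts <- decompose(y, type = "additive")
plot(parts)
```

Note that decompose() reports level and trend together in its trend component, so the plot shows three estimated parts (trend, seasonal, random) alongside the observed series.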

Time Series Forecasting Methods

  • ARIMA Model

ARIMA stands for Autoregressive Integrated Moving Average. It combines the Autoregressive (AR) and Moving Average (MA) models. An AR model forecasts a variable as a linear combination of its own past values. An MA model forecasts it as a linear combination of past forecast errors. The “I” (Integrated) refers to differencing: each value is replaced by the difference between it and the previous value, which helps make the series stationary.

  • SARIMA Model

SARIMA stands for Seasonal Autoregressive Integrated Moving Average. It extends the ARIMA model by adding a linear combination of seasonal past values and forecast errors.

  • VAR

The Vector Autoregression (VAR) method extends the AR model to several time series at once: each variable is modeled as a linear combination of its own past values and the past values of the other variables. The VAR model is useful when you want to predict multiple time series variables using a single model.

  • LSTM

The Long Short Term Memory network or LSTM is a special kind of recurrent neural network that deals with long-term dependencies. It can remember information from past data and is capable of learning order dependence in sequence prediction problems.
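
To see what the seasonal terms in SARIMA add over plain ARIMA, the sketch below fits both to R's built-in AirPassengers data with the base arima() function. The model orders and the log transform are assumptions chosen for illustration, not the settings used later in this article, and acf_lag12 is a hypothetical helper name.

```r
y <- log(AirPassengers)   # log scale keeps the seasonal swings roughly constant in size

# Non-seasonal ARIMA(0,1,1): one regular difference, one MA term
fit_arima <- arima(y, order = c(0, 1, 1))

# Seasonal ARIMA(0,1,1)(0,1,1)[12]: adds a seasonal difference and a seasonal MA term
fit_sarima <- arima(y, order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))

# Residual autocorrelation at lag 12 (one year); element 13 because element 1 is lag 0
acf_lag12 <- function(fit) {
  acf(residuals(fit), lag.max = 12, plot = FALSE)$acf[13]
}

acf_lag12(fit_arima)    # large if yearly seasonality is left in the residuals
acf_lag12(fit_sarima)   # should be much closer to zero
```

A large residual autocorrelation at the seasonal lag is a sign that the non-seasonal model has left a yearly pattern unexplained.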

Time Series Forecasting Using the ARIMA Model

ARIMA models are classified by three factors:

p = Number of autoregressive terms (AR)

d = How many non-seasonal differences are needed to achieve stationarity (I)

q = Number of lagged forecast errors in the prediction equation (MA)
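
The role of d is easiest to see on toy data. In this small sketch (the two series are made up), base R's diff() removes a straight-line trend with one difference and a curved trend with two:

```r
# A series with a straight-line trend: not stationary, because its mean keeps rising
linear <- 5 + 3 * (1:8)
diff(linear)                      # one difference (d = 1) leaves a constant series of 3s

# A series with a curved (quadratic) trend needs two differences (d = 2)
quadratic <- (1:8)^2
diff(quadratic)                   # still rising: 3 5 7 9 11 13 15
diff(quadratic, differences = 2)  # constant: 2 2 2 2 2 2
```

Real series are noisier than this, so in practice d is usually chosen by inspecting the differenced series or by an automated procedure.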

In this demo, we’ll use the AirPassengers dataset, which records monthly airline passenger totals (in thousands) from 1949 to 1960. We’ll forecast passenger numbers beyond 1960 using the ARIMA model in R.

The idea for this analysis is to identify the time series components which are:

  • Trend 
  • Seasonality
  • Random behavior of data

Then, we’ll forecast the values based on historical data.

  • Load the forecast Package into RStudio

install.packages('forecast')  # only needed once per R installation

library(forecast)

  • Load the Air Passengers’ Dataset and View Its Class

data("AirPassengers")

class(AirPassengers)

[1] "ts"

Here, ts indicates that it’s a time series dataset.

  • Display the Dataset

AirPassengers

Let’s check the start and end dates of the series:

start(AirPassengers)

[1] 1949    1

end(AirPassengers)

[1] 1960   12

So, our start date is January 1949, while the end date is December 1960.
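
Beyond start() and end(), a few other base R helpers are handy for inspecting the time index. A short sketch (the variable name last_year is just for illustration):

```r
data("AirPassengers")

frequency(AirPassengers)   # 12 observations per year, i.e. monthly data
length(AirPassengers)      # 144 = 12 years x 12 months

# window() extracts a sub-period while keeping the time index
last_year <- window(AirPassengers, start = c(1960, 1), end = c(1960, 12))
last_year

# cycle() gives each observation's position within the year (1 = Jan, ..., 12 = Dec)
head(cycle(AirPassengers), 12)
```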

  • Find out If There Are Any Missing Values

sum(is.na(AirPassengers))

[1] 0

  • Check the Summary of the Dataset

summary(AirPassengers)

  • Plot the Dataset

plot(AirPassengers)

  • Decompose the Data Into Four Components

tsdata <- ts(AirPassengers, start = c(1949, 1), frequency = 12)  # keep the 1949 start so the plots show calendar years

ddata <- decompose(tsdata, type = "multiplicative")

plot(ddata)
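
As a quick check on what "multiplicative" means here, the sketch below multiplies the estimated components back together and compares the result with the original data. The names mult, rebuilt, and ok are illustrative only.

```r
mult <- decompose(AirPassengers, type = "multiplicative")

# In a multiplicative model: observed = trend * seasonal * random
rebuilt <- mult$trend * mult$seasonal * mult$random

# The centred moving average leaves the first and last six months of the trend as NA,
# so compare only the months where all components are available
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(AirPassengers[ok]))

# The 12 seasonal factors: values above 1 mark months that run above trend
round(mult$figure, 2)
```

A multiplicative model suits this dataset because the seasonal swings grow as the overall passenger level grows.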

  • Plot the Different Components Individually

plot(ddata$trend)

plot(ddata$seasonal)

plot(ddata$random)

  • Plot a Trendline on the Original Dataset

plot(AirPassengers)

abline(reg=lm(AirPassengers~time(AirPassengers)))
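
The fitted line can also be inspected numerically. A small sketch, using the same regression as the trendline (trend_fit is an illustrative name):

```r
trend_fit <- lm(AirPassengers ~ time(AirPassengers))

# The slope is the average change in monthly passengers (in thousands) per calendar year
coef(trend_fit)

# R-squared shows how much of the variation the straight line alone accounts for
summary(trend_fit)$r.squared
```

A straight line captures the overall growth but none of the seasonal pattern, which is one reason a seasonal model is a better fit for this series.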

  • Create a Box Plot by Cycle

boxplot(AirPassengers ~ cycle(AirPassengers), xlab = "Month", ylab = "Passenger Numbers (1000's)", main = "Monthly air passengers boxplot from 1949-1960")

From the above plot, you can see that passenger numbers are higher in June, July, and August than in the other months of the year.
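
The same seasonal pattern can be read off numerically by averaging each calendar month across all years. A minimal sketch (monthly_mean and busiest are illustrative names):

```r
# Average passengers (in thousands) for each calendar month across 1949-1960
monthly_mean <- as.numeric(tapply(AirPassengers, cycle(AirPassengers), mean))
names(monthly_mean) <- month.abb
round(monthly_mean, 1)

# The three busiest months on average: the summer months June, July and August
busiest <- names(sort(monthly_mean, decreasing = TRUE))[1:3]
busiest
```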

  • Build the ARIMA Model Using auto.arima() Function

mymodel <- auto.arima(AirPassengers)

mymodel

  • Plot the Residuals

plot.ts(mymodel$residuals)

  • Forecast the Values for the Next 10 Years

myforecast <- forecast(mymodel, level=c(95), h=10*12)

plot(myforecast)
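
Before relying on a long-range forecast, it can help to test the approach on data the model has not seen. The sketch below holds out 1960, fits on the earlier years, and measures the error. It is only an illustration: it uses base R's arima() with an assumed seasonal ARIMA(0,1,1)(0,1,1)[12] on the log scale as a stand-in, not the model that auto.arima() selects, and the names train, holdout, and mape are illustrative.

```r
# Hold out the final year so the forecast can be compared with known values
train   <- window(AirPassengers, end = c(1959, 12))
holdout <- window(AirPassengers, start = c(1960, 1))

# Assumption: a stand-in seasonal model on the log scale, fitted with base R
fit <- arima(log(train), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the 12 held-out months and convert back from the log scale
pred   <- as.numeric(exp(predict(fit, n.ahead = 12)$pred))
actual <- as.numeric(holdout)

# Mean absolute percentage error over the held-out year
mape <- mean(abs((actual - pred) / actual)) * 100
mape
```

The same split-and-compare idea applies to any forecasting model: a low error on the held-out year gives more confidence in the forecast than the fit on the training data alone.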

  • Validate the Model by Selecting Lag Values

Box.test(mymodel$residuals, lag=5, type="Ljung-Box")

Box.test(mymodel$residuals, lag=10, type="Ljung-Box")

Box.test(mymodel$residuals, lag=15, type="Ljung-Box")

The Ljung-Box test checks whether the residuals are still autocorrelated; its null hypothesis is that they are not. P-values above 0.05 at each lag therefore mean there is no evidence of leftover autocorrelation, in which case we can conclude that the ARIMA model with parameters (2, 1, 1) adequately fits the data.
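
To get a feel for how to read Ljung-Box p-values, it helps to run the test on data whose behavior is known in advance. This sketch uses simulated series (not the article's model): white noise, which is what good residuals should resemble, and an AR(1) series that is autocorrelated by construction.

```r
set.seed(1)   # reproducible simulated data

# White noise: no autocorrelation, which is what good model residuals resemble
white_noise <- rnorm(200)

# An AR(1) series with coefficient 0.9: strongly autocorrelated by construction
ar_series <- arima.sim(model = list(ar = 0.9), n = 200)

# Null hypothesis of the Ljung-Box test: no autocorrelation up to the given lag
p_noise <- Box.test(white_noise, lag = 10, type = "Ljung-Box")$p.value
p_ar    <- Box.test(ar_series,   lag = 10, type = "Ljung-Box")$p.value

p_noise   # usually well above 0.05 for white noise
p_ar      # effectively zero: the test detects the autocorrelation
```

So for model residuals, a small p-value is a warning sign rather than a mark of accuracy.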

Conclusion

After reading this article, you should understand what time series forecasting is, its various applications, and the components of a time series. Finally, we built a time series forecasting model using the ARIMA model in R to forecast airline passenger numbers.

About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.
