The Autoregressive Integrated Moving Average (ARIMA) family of models has been a workhorse of time series forecasting for almost 50 years. The acronym indicates that the model is composed of three terms: an autoregressive term (AR), a moving-average term (MA), and an integration term (I) that accounts for the non-stationarity of the time series. Combined, the three terms constitute one of the most widely used discrete-time dynamic models.

ARIMA model definition

Consider the three linear dynamical system models:

AutoRegressive (AR) model of order \(p\):

  1. \[x_t=\phi_1 x_{t-1}+ \dots + \phi_p x_{t-p}+\omega_t\]

Moving Average (MA) model of order \(q\):

  2. \[x_t=\theta_1 \omega_{t-1}+ \dots + \theta_q \omega_{t-q}+\omega_t\]

AutoRegressive Moving Average (ARMA) model of order \(p\) and \(q\):

  3. \[x_t=\phi_1 x_{t-1}+ \dots + \phi_p x_{t-p} + \theta_1 \omega_{t-1}+ \dots + \theta_q \omega_{t-q}+\omega_t\]

where \(\omega_t\) is Gaussian white noise with zero mean, constant variance \(\sigma^2_\omega\) and zero covariance between values at different times. In the AR model, the current value of the process, \(x_t\), is expressed as a finite linear aggregate of previous values of the process plus the white noise \(\omega_t\). In the MA model, the current value of the process, \(x_t\), is a finite linear aggregate of previous values of the white noise, \(\omega_{t-1}, \omega_{t-2}, \dots, \omega_{t-q}\), plus the current value of the noise \(\omega_t\). In the ARMA model, \(x_t\) is expressed as the sum of a finite linear aggregate of previous values of the process and the past and current white noise inputs. The ARMA process is the most flexible of the three, so it can often describe a series with fewer parameters than a pure AR or pure MA model would need when the model is used to forecast time series data.
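The three recursions above can be sketched as a short simulation. This is a minimal illustration using only the Python standard library; the function name `simulate_arma`, the coefficient values, the seed and the burn-in length are all assumptions chosen for the example, not part of the model definition.

```python
import random

def simulate_arma(phi, theta, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate x_t = sum_i phi_i x_{t-i} + sum_j theta_j w_{t-j} + w_t.

    phi and theta are lists of AR and MA coefficients; an empty list
    switches the corresponding term off (pure MA or pure AR).
    """
    rng = random.Random(seed)
    total = n + burn_in
    # Gaussian white noise: zero mean, constant variance sigma^2
    w = [rng.gauss(0.0, sigma) for _ in range(total)]
    x = [0.0] * total
    for t in range(total):
        # Terms that would reach before the start of the sample are skipped
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * w[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        x[t] = ar + ma + w[t]
    return x[burn_in:]  # drop the start-up transient

ar1 = simulate_arma(phi=[0.7], theta=[], n=500)        # AR(1)
ma1 = simulate_arma(phi=[], theta=[0.4], n=500)        # MA(1)
arma11 = simulate_arma(phi=[0.7], theta=[0.4], n=500)  # ARMA(1,1)
```

Passing empty lists for both `phi` and `theta` returns the white noise itself, which is a convenient sanity check on the recursion.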

As detailed in Holden (1995), the result that any stationary process can be uniquely represented by a possibly infinite MA process is due to Wold (1938, Theorem 7). That is, ignoring any deterministic component that can be removed by subtraction, and writing \(\epsilon_t\) for the white noise \(\omega_t\), (1) can be written as

\[x_t=(1+\theta_1L+\theta_2L^2 + \dots)\epsilon_t=\theta(L)\epsilon_t,\]

where \(L\) is the lag operator, often called the backshift operator \(B\), such that \(L^k x_t\equiv x_{t-k}\) and hence \((1-L^k)x_t\equiv x_t-x_{t-k}\). Similarly, the result that any stationary process can be uniquely represented by a possibly infinite AR process is due to Whittle (1963, p. 21). Again, ignoring any deterministic component, (2) can be written as

\[\epsilon_t=(1-\phi_1L-\phi_2L^2 - \phi_3L^3 - \dots)x_t=\phi(L)x_t.\]

Equation (3), which can be written as

\[\phi(L)x_t=\theta(L)\epsilon_t \quad\Longrightarrow\quad \frac{\phi(L)}{\theta(L)}x_t=\epsilon_t,\]

provides the groundwork for the ARMA models of Box & Jenkins (1976) (first published in 1970, after a series of working papers).
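To make the lag-polynomial notation concrete, the sketch below computes the first few coefficients of the infinite MA form \(x_t=\psi(L)\epsilon_t\) with \(\psi(L)=\theta(L)/\phi(L)\), for a given ARMA model. It relies on the standard recursion \(\psi_0=1\), \(\psi_j=\theta_j+\sum_{i=1}^{\min(j,p)}\phi_i\psi_{j-i}\) (with \(\theta_j=0\) for \(j>q\)), obtained by matching powers of \(L\) in \(\phi(L)\psi(L)=\theta(L)\). The symbol \(\psi\) and the function name `ma_infinity_weights` are assumptions introduced for this example.

```python
def ma_infinity_weights(phi, theta, n_weights):
    """First n_weights coefficients of psi(L) = theta(L) / phi(L).

    Conventions follow the text:
    phi(L)   = 1 - phi_1 L - ... - phi_p L^p
    theta(L) = 1 + theta_1 L + ... + theta_q L^q
    """
    psi = [1.0]  # psi_0 = 1
    for j in range(1, n_weights):
        # theta_j is zero beyond the MA order q
        value = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi

# AR(1) with phi_1 = 0.5: the weights decay geometrically, psi_j = 0.5**j
print(ma_infinity_weights([0.5], [], 4))   # [1.0, 0.5, 0.25, 0.125]

# MA(1) is already in MA form, so the weights stop after theta_1
print(ma_infinity_weights([], [0.4], 4))   # [1.0, 0.4, 0.0, 0.0]
```

For a stationary AR part the weights die out, which is why a finite ARMA model can stand in for a long or infinite pure MA representation.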

For time series that are not stationary, Box & Jenkins (1976) popularised the idea that differencing a non-stationary process \(d\) times can reduce it to a (nearly) stationary process, so that the ARMA modelling framework above applies. That is, if \(y_t\) is non-stationary, stationarity can often be achieved by taking differences \(x_t=(1-L)^dy_t\). Note that in practice \(d\) is rarely greater than 2.
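The differencing step \(x_t=(1-L)^dy_t\) amounts to subtracting each observation from the next one, \(d\) times over. The sketch below is a minimal standard-library illustration; the helper name `difference`, the seed and the simulated random walk are assumptions made for the example.

```python
import random

def difference(y, d=1):
    """Apply (1 - L)^d to the series y; each pass drops one observation."""
    x = list(y)
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

# Build a random walk (a non-stationary series) by cumulating white noise
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
walk = []
level = 0.0
for w in noise:
    level += w
    walk.append(level)

# One difference recovers the white noise, up to floating-point rounding
recovered = difference(walk, d=1)

# A quadratic trend needs two differences to become constant
print(difference([1, 4, 9, 16, 25], d=2))  # [2, 2, 2]
```

Each pass shortens the series by one observation, so a model fitted after \(d\) differences uses \(d\) fewer data points than the original series.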

The general (non-seasonal) ARIMA process is commonly referred to as ARIMA(\(p,d,q\)), where \(d\) is the order of integration of the process, \(p\) is the number of autoregressive parameters and \(q\) is the number of moving average parameters. Special cases follow naturally: ARIMA(\(p,0,q\)) \(\equiv\) ARMA(\(p,q\)), ARIMA(\(p,0,0\)) \(\equiv\) AR(\(p\)), ARIMA(\(0,0,q\)) \(\equiv\) MA(\(q\)), ARIMA(\(0,1,0\)) is a random walk process and ARIMA(\(0,0,0\)) is white noise. Further details are available in almost any book on time series forecasting and in countless papers.
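As a worked illustration of the notation (the choice of orders here is just an example), consider ARIMA(\(1,1,1\)) for a series \(y_t\), with \(\epsilon_t\) the white-noise input. Combining the differencing operator with the AR and MA lag polynomials gives

\[(1-\phi_1L)(1-L)y_t=(1+\theta_1L)\epsilon_t.\]

Expanding the left-hand side, \((1-\phi_1L)(1-L)=1-(1+\phi_1)L+\phi_1L^2\), and solving for \(y_t\) yields

\[y_t=(1+\phi_1)y_{t-1}-\phi_1y_{t-2}+\epsilon_t+\theta_1\epsilon_{t-1}.\]

Setting \(\phi_1=\theta_1=0\) reduces this to \(y_t=y_{t-1}+\epsilon_t\), the random walk ARIMA(\(0,1,0\)).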


Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.

Holden, K. (1995). Vector autoregression modeling and forecasting. Journal of Forecasting, 14(3), 159–166.

Whittle, P. (1963). Prediction and regulation. London: English Universities Press.

Wold, H. (1938). A study in the analysis of stationary time series. Uppsala: Almqvist & Wiksell.
