Honglei Xie

Linear regression with ARMA errors

February 25, 2017 | 4 Minute Read

Recently I’ve been getting more and more interested in time series prediction, a topic that seems somewhat neglected by the machine learning community. It deserves far more attention: who wouldn’t wish to know (or at least get a sense of) tomorrow’s stock price, for example? Obviously, this type of question is difficult; it is not even clear whether it can be solved at all. The author of “A Random Walk Down Wall Street” famously claimed that “a blindfolded monkey throwing darts at a newspaper’s financial pages could select a portfolio that would do just as well as one carefully selected by experts,” and the prevailing theory says that stock prices are essentially random and unpredictable. So this post simply aims to demonstrate one of many ways to model time series data. Some such models have proved useful and meaningful to economists, for example in investigating the relationships between macroeconomic indicators.

Let’s get back to the topic of this post: multivariate time series prediction. A typical example would be predicting the next five years’ bank loan default rates given the next five years’ macroeconomic variables such as the unemployment rate, prime rate, GDP, etc. Here I’m going to show you how to build a linear regression model with (non-seasonal) ARMA errors to make time series predictions.

Workflow for time series data

  • detect outliers
  • detect trend
  • detect seasonality
  • detect long-run cycle (assuming constant variance)

If you are using R, I recommend the function stats::decompose to decompose a time series; it supports two modes: additive decomposition and multiplicative decomposition.
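For instance, a minimal sketch on the built-in AirPassengers data (which ships with base R, so this runs as-is) might look like:

```r
# Classical decomposition of a monthly series into trend, seasonal
# and remainder components; AirPassengers ships with base R.
fit <- decompose(AirPassengers, type = "multiplicative")
plot(fit)

# The additive variant is the default:
fit_add <- decompose(AirPassengers, type = "additive")
```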

Model

You may be tempted to fit an ordinary linear regression to the time series data. However, not surprisingly, the resulting model’s residuals retain a time series structure, so the \(i.i.d.\) assumption does not hold and the estimated coefficients and confidence intervals are not reliable. To address this issue, we should build a linear regression model with ARIMA errors.

A non-seasonal linear regression model with ARIMA\((p, d, q)\) errors can be written as:

\[y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + e_t\]

where \((1-\phi_1 B - \cdots - \phi_p B^p)(1-B)^d e_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)w_t\)

where \(B\) is the backshift operator; \[ c = \mu(1-\phi_1-\phi_2-\cdots-\phi_p) \] and \[ w_t \overset{iid}{\sim} \text{N}(0, \sigma^2) \]
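To make this concrete, here is a minimal sketch of fitting such a model in R with the forecast package’s auto.arima(), which selects the ARIMA order of the error process when an xreg covariate is supplied. The data below are simulated purely for illustration:

```r
library(forecast)

# Simulate one covariate plus ARMA(1,1) errors, then recover the model.
set.seed(42)
n <- 200
x <- rnorm(n)                                            # regressor x_{1t}
e <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = n)  # ARMA errors e_t
y <- 2 + 1.5 * x + e                                     # beta_0 + beta_1 * x + e_t

# auto.arima() picks p, d, q for the errors; the regression
# coefficient on x is reported alongside the ARMA coefficients.
fit <- auto.arima(y, xreg = x)
summary(fit)

# Forecast 5 steps ahead, given (here, assumed known) future covariates.
future_x <- rnorm(5)
forecast(fit, xreg = future_x)
```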

\(H\)-step-ahead time series forecasting strategies

When you try to produce long-horizon forecasts from a limited sample size, you have to be careful about your choice of strategy. I found that this paper gives a comprehensive review of \(H\)-step-ahead time series forecasting strategies. In the example I’m going to show you, I adopted the so-called “recursive strategy”, in which a model is optimized on a one-step-ahead criterion. When computing an \(H\)-step-ahead forecast, we iteratively feed the model’s forecasts back in as inputs for the next prediction. Of course, there are many other strategies out there, and the review concludes that “Multiple-Output strategies are the best performing approaches”.

First, a single model \(f\) is trained to perform a one-step-ahead forecast: \[ y_{t+1} = f(y_t, \ldots, y_{t-d+1}) + \epsilon_{t+1} \]

One advantage of this method is that it returns an unbiased estimator of

\[\mathbb{E}\left[ \textbf{y}_{(t+1):(t+H)} \,\vert\, \textbf{y}_{(t-d+1):t} \right]\]

where

\[ \textbf{y}_{(t+1):(t+H)} = [y_{t + 1}, \ldots, y_{t + H}] \in \mathbb{R}^H \] and \[ \textbf{y}_{(t-d+1):t} = [y_{t - d + 1}, \ldots, y_t] \in \mathbb{R}^d \]

Another advantage is that we need to train only one model.
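To make the recursion explicit, here is a minimal hand-rolled sketch in R (recursive_forecast() is my own illustrative helper; for simplicity the one-step model is fit by OLS on \(d\) lags rather than by the ARIMA machinery above):

```r
# Recursive H-step forecasting: fit a one-step model once, then feed
# each forecast back in as the newest "observation".
recursive_forecast <- function(y, d, H) {
  # One-step model y_{t+1} = f(y_t, ..., y_{t-d+1}), fit by OLS on d lags.
  X <- embed(y, d + 1)             # col 1 = y_t; cols 2..d+1 = y_{t-1}..y_{t-d}
  beta <- coef(lm(X[, 1] ~ X[, -1]))
  history <- y
  preds <- numeric(H)
  for (h in seq_len(H)) {
    lags <- rev(tail(history, d))           # most recent value first
    preds[h] <- beta[1] + sum(beta[-1] * lags)
    history <- c(history, preds[h])         # plug the forecast back in
  }
  preds
}

# Usage: 5-step-ahead forecast from a simulated AR(2) series.
set.seed(1)
y <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.2)), n = 100))
recursive_forecast(y, d = 2, H = 5)
```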

Implementation

Unfortunately, I was unable to find any online code applying the recursive strategy, so I wrote my own. Here is an end-to-end example walk-through.

Two forecast accuracy metrics, \(\text{MAPE}\) and \(\text{sMAPE}\), are returned in the final data frame.

\[ \text{MAPE} = 100 \, \text{mean}(|y_t - \hat{y}_t| / |y_t|) \]

\(\text{MAPE}\) puts a heavier penalty on negative errors (when \(y_t < \hat{y}_t\)) than on positive errors. To avoid this asymmetry, we should also consider the symmetric MAPE (sMAPE):

\[ \text{sMAPE} = 100 \, \text{mean}(2|y_t - \hat{y}_t| / |y_t + \hat{y}_t|) \]
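In R, both measures are one-liners (mape() and smape() here are my own helper names, not from any package):

```r
# Percentage-error accuracy measures; y = actuals, yhat = forecasts.
mape  <- function(y, yhat) 100 * mean(abs(y - yhat) / abs(y))
smape <- function(y, yhat) 100 * mean(2 * abs(y - yhat) / abs(y + yhat))
```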

Also try

Bayesian

Let’s forget ARIMA and go Bayesian! I have found this amazing post that might be of interest to you.

Recurrent neural network (RNN)

Using LSTMs for time series prediction seems to be very common nowadays.