We are considering a problem of forecasting of a random variable $Y$ based on information contained in some vector $X$. The vector $X$ is treated as a sample of some random variable that we also denote as $X$. If the information $X$ is sufficiently large then it makes sense to look for the forecast among linear functions: $\hat{Y} = AX$. Repeating the above argument, we have
$$Y = \hat{Y} + \varepsilon$$
with linear $\hat{Y} = AX$ and $E\!\left[\varepsilon X^{T}\right] = 0$.
We extend the above requirement to the case of a vector valued variable $Y$. We require that the function $\hat{Y} = \hat{Y}(X)$, $\hat{Y} = AX$, for some matrix $A$, would be such that for any linear $L$, the random quantities $Y - AX$ and $LX$ are not correlated:
$$E\!\left[(Y - AX)(LX)^{T}\right] = 0.$$
We see that the forecasting operation $Y \mapsto AX$ is similar to a linear projection of $Y$ on $X$. This motivates introduction of the operation
$$\langle U, V \rangle := E\!\left[U V^{T}\right]$$
that takes two vector valued random variables and produces a deterministic matrix. We would like to treat this operation as a scalar product even though it does not have a full set of properties. We will construct a projection operation $\mathcal{P}_{X} Y$ with the defining properties
$$\mathcal{P}_{X} Y = A X \ \text{for some matrix } A, \qquad \left\langle Y - \mathcal{P}_{X} Y,\, L X \right\rangle = 0 \ \text{for any linear } L.$$
We proceed to verify that the projection may be defined by the straightforward adaptation of the formula from elementary geometry on a subclass of random variables with zero mean:
$$\mathcal{P}_{X} Y = \langle Y, X \rangle \langle X, X \rangle^{-1} X.$$
Indeed, $\mathcal{P}_{X} Y$ is linear in $X$, and
$$\left\langle Y - \mathcal{P}_{X} Y,\, X \right\rangle = \langle Y, X \rangle - \langle Y, X \rangle \langle X, X \rangle^{-1} \langle X, X \rangle = 0;$$
since $\langle U, L X \rangle = \langle U, X \rangle L^{T}$ for any linear $L$, orthogonality to $X$ implies orthogonality to every $L X$. From the two properties, verified here, all the other well known properties of orthogonal projection follow. This allows the use of geometric intuition in the computation that follows.
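The zero-mean projection formula can be checked numerically on a joint sample. The following sketch uses `numpy`, approximates $\langle U, V \rangle = E[U V^{T}]$ by sample averages, and uses hypothetical example data (the dimensions, coefficient matrix `B`, and noise level are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint sample of a zero-mean pair: X is a 3-vector, Y is a 2-vector
# depending linearly on X plus independent noise (hypothetical data).
n = 100_000
X = rng.standard_normal((n, 3))                      # rows: samples of X
B = np.array([[1.0, -2.0, 0.5],
              [0.0,  1.0, 1.0]])
Y = X @ B.T + 0.3 * rng.standard_normal((n, 2))      # rows: samples of Y

def inner(U, V):
    """Sample estimate of <U, V> = E[U V^T] for zero-mean variables."""
    return U.T @ V / len(U)

A = inner(Y, X) @ np.linalg.inv(inner(X, X))         # <Y,X><X,X>^{-1}
P = X @ A.T                                          # P_X Y = A X, per sample

# Defining property: the residual Y - P_X Y is uncorrelated with X.
# This holds to floating-point precision here, because A is built from
# the same sample moments used in the check.
print(np.max(np.abs(inner(Y - P, X))))
```

Because the noise is independent of $X$, the recovered matrix `A` also lands close to the generating matrix `B`.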
The operation $\mathcal{P}_{X}$ is well defined if the matrix $\langle X, X \rangle$ is not degenerate. This is certainly so if the random variable $X$ has zero mean and no linearly dependent components, because then $\langle X, X \rangle$ is the covariance matrix of $X$. However, if $X$ is deterministic then $\langle X, X \rangle = X X^{T}$ is degenerate. Therefore, we represent a random variable $X$ as a sum
$$X = \bar{X} + \tilde{X},$$
where $\bar{X} = E[X]$ and $\tilde{X} = X - E[X]$.
Observe that for two vector valued random variables $U$ and $V$
$$\left\langle \tilde{U}, \bar{V} \right\rangle = E\!\left[\tilde{U}\right] \bar{V}^{T} = 0.$$
Therefore, $\langle \tilde{X}, \bar{Y} \rangle = 0$ for any $Y$. Hence, we extend the definition of $\mathcal{P}$ to random variables with non-zero mean by orthogonality:
$$\mathcal{P}_{X} Y = \mathcal{P}_{\bar{X}} Y + \mathcal{P}_{\tilde{X}} Y = \bar{Y} + \left\langle \tilde{Y}, \tilde{X} \right\rangle \left\langle \tilde{X}, \tilde{X} \right\rangle^{-1} \tilde{X}.$$
Here we used the fact that the component $\mathcal{P}_{\bar{X}} Y$ must satisfy
$$\left\langle Y - \mathcal{P}_{\bar{X}} Y,\, \bar{X} \right\rangle = 0$$
with $\mathcal{P}_{\bar{X}} Y = a \bar{X}$ for some matrix $a$. The only way this can happen is if
$$\mathcal{P}_{\bar{X}} Y = \bar{Y}.$$
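The extended forecast with non-zero means can likewise be checked on a sample. This sketch (again with `numpy` and hypothetical example data: the means, coefficients, and noise level are illustrative assumptions) centers the variables, applies the formula, and confirms the residual has zero mean and is uncorrelated with the information vector:

```python
import numpy as np

rng = np.random.default_rng(1)

# X has a non-zero mean; Y is an affine function of X plus noise
# (hypothetical data, not from the text).
n = 100_000
X = rng.standard_normal((n, 3)) + np.array([5.0, -1.0, 2.0])
Y = X @ np.array([2.0, 1.0, -0.5]) + 4.0 + 0.2 * rng.standard_normal(n)

x_bar, y_bar = X.mean(axis=0), Y.mean()
Xt, Yt = X - x_bar, Y - y_bar                  # tilde X, tilde Y

cov_yx = Yt @ Xt / n                           # <tilde Y, tilde X>
cov_xx = Xt.T @ Xt / n                         # <tilde X, tilde X>

# P_X Y = bar Y + <tilde Y, tilde X> <tilde X, tilde X>^{-1} tilde X
forecast = y_bar + Xt @ np.linalg.solve(cov_xx, cov_yx)

resid = Y - forecast
print(abs(resid.mean()))                       # residual has zero mean
print(np.max(np.abs(resid @ Xt / n)))          # and is uncorrelated with X
```

Both printed quantities vanish to floating-point precision, since the coefficients are fitted from the same sample moments used in the check.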
One can verify that this is a maximum likelihood forecast if $X$ and $Y$ are jointly normal. To see this it is enough to use a jointly normal distribution function to compute the conditional distribution of $Y$ given $X$.
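For reference, the conditional distribution of a jointly normal pair is a standard fact of multivariate normal theory (the block notation $\mu_{X}, \mu_{Y}, \Sigma_{XX}, \Sigma_{YX}, \Sigma_{YY}$ is introduced here for illustration, not taken from the text):
$$Y \mid X \sim \mathcal{N}\!\left( \mu_{Y} + \Sigma_{YX} \Sigma_{XX}^{-1} \left( X - \mu_{X} \right),\; \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY} \right).$$
Its mean coincides with the linear forecast constructed above, which is why the projection is also the maximum likelihood forecast in the jointly normal case.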
In the section (Kalman filter II) we will take such an approach.