The first step in any time series investigation is careful scrutiny of the recorded data plotted over time.
This picture shows quarterly earnings per share for the U.S. company Johnson & Johnson.
In this case we want to focus our attention on the increasing underlying trend, the growing variability, and the somewhat regular oscillation superimposed on the trend that seems to repeat over quarters.
library(astsa)
tsplot(jj, type="o", ylab="Quarterly Earnings per Share")
tsplot(log(jj))  # not shown; the log transform stabilizes the growing variability
This picture shows the prices and daily returns of the Standard and Poor's 100 Index (S&P 100) from 1984 to 2017.
It is easy to spot the financial crisis of 2008 in the figure.
The mean of the series appears to be stable, with an average return of approximately zero; however, the volatility (or variability) of the data exhibits clustering, that is, highly volatile periods tend to be clustered together.
library(xts)
djiar = diff(log(djia$Close))[-1]  # approximate returns; djia is supplied by astsa
tsplot(djiar, main="DJIA Returns", xlab='', margins=.5)
This image shows the weekly USD/GBP foreign exchange rate (U.S. Dollars to One British Pound).
library("fImport")
usbp=fredSeries("DEXUSEU", from="2001-01-01")
tsplot(usbp)
This image shows the price of the cryptocurrency Bitcoin from April 28, 2013 to November 25, 2017.
The primary objective of time series analysis is to develop mathematical models that provide plausible descriptions for sample data.
A simple kind of generated series might be a collection of uncorrelated random variables, w_t, with mean 0 and finite variance σ^2. We denote this process as w_t ~ wn(0, σ^2).
The time series generated from uncorrelated variables is used as a model for noise in engineering applications, where it is called white noise.
We often require stronger conditions and need the noise to be Gaussian white noise, wherein the w_t are independent and identically distributed (iid) normal random variables with mean 0 and variance σ^2; we write w_t ~ iid N(0, σ^2).
Although both cases require mean 0 and constant variance, the difference is that, generically, the term white noise means only that the series is uncorrelated; Gaussian white noise adds normality, and for jointly normal variables, being uncorrelated implies being independent.
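As a quick illustration (our own, not from the original text), both series below qualify as white noise, but only the first is Gaussian; the t-distributed variates are iid with heavier tails:
set.seed(1)
w_gauss = rnorm(500)     # Gaussian white noise: iid N(0,1)
w_heavy = rt(500, df=5)  # white noise that is not Gaussian: iid t with 5 df
par(mfrow=c(2,1))
tsplot(w_gauss, main="Gaussian white noise")
tsplot(w_heavy, main="t-distributed white noise")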
If the stochastic behaviour of all time series could be explained in terms of the white noise model, classical statistical methods would suffice.
We might replace the white noise series w_t by a moving average that smooths the series; for example, consider the three-point average v_t = (w_{t-1} + w_t + w_{t+1})/3:
w = rnorm(500,0,1)                         # 500 N(0,1) variates
v = filter(w, sides=2, filter=rep(1/3,3))  # centered 3-point moving average
par(mfrow=c(2,1))
tsplot(w, main="white noise")
tsplot(v, ylim=c(-3,3), main="moving average")
This series is much smoother than the white noise series, and it is apparent that averaging removes some of the high-frequency behaviour of the noise.
A linear combination of values in a time series, such as the moving average above, is referred to generically as a filtered series; hence the R command filter.
Suppose we consider the white noise series w_t as input and calculate the output using the second-order equation x_t = x_{t-1} - .9 x_{t-2} + w_t.
Successively, for t = 1, 2, ..., this equation represents a regression or prediction of the current value x_t of a time series as a function of the past two values of the series; hence, the term autoregression is suggested for this model.
A problem with startup values exists here because the equation also depends on the initial conditions x_0 and x_{-1}, but for now assume that they are 0. We can then generate data recursively by substituting into the previous formula.
w = rnorm(550,0,1)  # 50 extra to avoid startup problems
x = filter(w, filter=c(1,-.9), method="recursive")[-(1:50)]  # drop the first 50
tsplot(x, main="autoregression")
Consider the random walk with drift model x_t = δ + x_{t-1} + w_t for t = 1, 2, ..., with initial condition x_0 = 0, and where w_t is white noise. The constant δ is called the drift, and when δ = 0 the model is called simply a random walk, because the value of the series at time t is the value of the series at time t-1 plus a completely random movement determined by w_t. Note that we may rewrite the model as a cumulative sum of white noise variates, x_t = δt + Σ_{j=1}^{t} w_j, for t = 1, 2, ....
set.seed(154) # so you can reproduce the results
w = rnorm(200); x = cumsum(w) # two commands in one line
wd = w + .2; xd = cumsum(wd)  # random walk with drift = .2
tsplot(xd, ylim=c(-5,55), main="random walk", ylab='')
abline(a=0, b=.2, lty=2)   # drift line
lines(x, col=4)            # random walk without drift
abline(h=0, col=4, lty=2)  # its zero mean
We now discuss various measures that describe the general behavior of a process as it evolves over time. A rather simple descriptive measure is the mean function, μ_t = E(x_t), such as the average monthly high temperature for your city. In this case, the mean is a function of time.
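As a small sketch (ours; the AirPassengers data and month-by-month averaging are our own choices, not from the text), the mean function of a seasonal series can be estimated by averaging the observations month by month:
mu_hat = tapply(AirPassengers, cycle(AirPassengers), mean)  # average by month (1-12)
round(mu_hat, 1)  # the estimate clearly changes with the month, i.e., with time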
The autocovariance function, γ(s,t) = E[(x_s - μ_s)(x_t - μ_t)], measures the linear dependence between two points on the same series observed at different times s and t.
If γ(s,t) = 0, then x_s and x_t are not linearly related, but there may still be some dependence structure between them.
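As a quick check (our own example, assuming x is the autoregression generated earlier), the sample autocovariance can be computed with the base R acf function:
acf(x, lag.max=20, type="covariance", main="sample autocovariance of x")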
The autocorrelation function (ACF), ρ(s,t) = γ(s,t) / √(γ(s,s) γ(t,t)), measures the linear predictability of the series at time t, say x_t, using only the value x_s.
The ACF takes values in [-1, 1], which is easily shown using the Cauchy-Schwarz inequality.
If we can predict x_t perfectly from x_s through a linear relationship, x_t = β_0 + β_1 x_s, then the correlation will be +1 when β_1 > 0 and -1 when β_1 < 0.
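To see the extreme cases concretely (a small example of our own):
xs = rnorm(100)
cor(xs, 2 + 3*xs)  # exact positive linear relation: returns 1
cor(xs, 2 - 3*xs)  # exact negative linear relation: returns -1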