PhD Days: September 2014

I used http://www.ral.ucar.edu/~ericg/extRemes/extRemes2.pdf to try out "extRemes" R package.
Below is my notes and thoughts.
On my desk was a copy of
Coles, S. (2001). An introduction to statistical modeling of extreme values / Stuart Coles, London ; New York : Springer, [2001].

Fitting the Generalized Extreme Value distribution function to the Port Jervis, New York annual maximum winter temperature data using "extRemes" R package

library(extremes)

data("PORTw")

plot(PORTw$TMX1, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "darkblue")

Default Run

fit1 <- fevd(TMX1, PORTw, units = "deg C")

fit1

Output

fevd(x = TMX1, data = PORTw, units = "deg C")
[1] "Estimation Method used: MLE"

Negative Log-Likelihood Value: 172.7426

Estimated parameters: location scale shape 15.1406132 2.9724952 -0.2171486

Standard Error Estimates: location scale shape 0.39745119 0.27521741 0.07438302

Estimated parameter covariance matrix. location scale shapelocation 0.15796745 0.01028664 -0.010869596scale 0.01028664 0.07574462 -0.010234809shape -0.01086960 -0.01023481 0.005532834

AIC = 351.4853

BIC = 358.1438

GEV model has 3 parameters

locatation
scale
shape

The simplest distribution function is Gumbel, for which is only 2 parameters.
When the Gumbel df. is a good fit, the estimate of the shape parameter should be approx. 0

Here shape is not close to 0. Gumbel probably not good.

3 measures of the goodness of fit provided.

Negative Log-Likelihood Value
AIC
BIC

Gumbel Run

fit0 <- fevd(TMX1, PORTw, type = "Gumbel", units = "deg C")

fit0

fevd(x = TMX1, data = PORTw, type = "Gumbel", units = "deg C")
[1] "Estimation Method used: MLE"

Negative Log-Likelihood Value: 175.7782

Estimated parameters:
location scale
14.799982 2.886128

Standard Error Estimates:
location scale
0.3709054 0.2585040

Estimated parameter covariance matrix.
location scale
location 0.13757080 0.03173887
scale 0.03173887 0.06682429

AIC = 355.5563

BIC = 359.9954

All the goodness of fit measure are all larger as expected.

lr.test Likelihood-Ratio Test

Use the likelihood ratio test to confirm this

lr.test(fit1,fit0)
Likelihood-ratio Test
data: TMX1TMX1
Likelihood-ratio = 6.0711, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.01374 alternative hypothesis: greater

I look at the p-value, because it makes sense to me.

Bayesian estimation

> fB <- fevd(TMX1, PORTw, method = "Bayesian")
> fB

fevd(x = TMX1, data = PORTw, method = "Bayesian")
[1] "Estimation Method used: Bayesian"

Acceptance Rates:
log.scale shape
0.5379076 0.4971994

fevd(x = TMX1, data = PORTw, method = "Bayesian")

[1] "Quantiles of MCMC Sample from Posterior Distribution"

2.5% Posterior Mean 97.5%
location 14.2452034 15.1496558 16.0945492194
scale 2.5276179 3.0935399 3.8803429287
shape -0.3654228 -0.1975376 -0.0004970114

Estimated parameter covariance matrix.
location log.scale shape
location 0.240496184 0.009813165 -0.015884315
log.scale 0.009813165 0.124166098 -0.013129196
shape -0.015884315 -0.013129196 0.008814744

DIC = 1045.476

L-moments

fitLM <- fevd(TMX1, PORTw, method = "Lmoments")
fitLM

fevd(x = TMX1, data = PORTw, method = "Lmoments")
[1] "GEV Fitted to TMX1 of PORTw data frame, using L-moments estimation."
location scale shape
15.1775146 3.0286294 -0.2480594

Significance of Co-variance Information

plot(PORTw$TMX1, PORTw$AOindex)

> fit2 <- fevd(TMX1, PORTw, location.fun = ~AOindex, units = "deg C")
> fit2

fevd(x = TMX1, data = PORTw, location.fun = ~AOindex, units = "deg C")
[1] "Estimation Method used: MLE"

Negative Log-Likelihood Value: 166.7992

Estimated parameters:
mu0 mu1 scale shape
15.2538412 1.1518782 2.6809613 -0.1812824

Standard Error Estimates:
mu0 mu1 scale shape
0.35592663 0.31800904 0.24186870 0.06725912

Estimated parameter covariance matrix.
mu0 mu1 scale shape
mu0 0.126683767 0.002230374 0.010009100 -0.008065698
mu1 0.002230374 0.101129752 -0.002538585 0.002075487
scale 0.010009100 -0.002538585 0.058500466 -0.007020374
shape -0.008065698 0.002075487 -0.007020374 0.004523789

AIC = 341.5984

BIC = 350.4764

What is going on here?
location parameterisation now has 2 parts mu0 and mu1

mu(x) = mu0 + mu1 * x, where x = AO Index

lr-test
All the goodness of fit measure are lower when AO Index is consider
Expect utilises the AO index information improves the fit of the model

lr.test(fit1,fit2)
Likelihood-ratio Test
data: TMX1TMX1
Likelihood-ratio = 11.8869, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.0005653

p-value very small, result as expected

Is the covariance information significant to the scale parameter?

fit3 <- fevd(TMX1, PORTw, location.fun= ~AOindex, scale.fun = ~AOindex, units = "deg C")
> fit3
fevd(x = TMX1, data = PORTw, location.fun = ~AOindex, scale.fun = ~AOindex,
units = "deg C")
[1] "Estimation Method used: MLE"

Negative Log-Likelihood Value: 166.6593

Estimated parameters:
mu0 mu1 sigma0 sigma1 shape
15.2616373 1.1802783 2.6782395 -0.1464682 -0.1860603

Standard Error Estimates:
mu0 mu1 sigma0 sigma1 shape
0.35610192 0.31724472 0.24242220 0.27629143 0.06862845

Estimated parameter covariance matrix.
mu0 mu1 sigma0 sigma1 shape
mu0 0.126808580 -0.0085301401 0.0098703920 -0.001720545 -0.0084519671
mu1 -0.008530140 0.1006442141 -0.0001225007 -0.014111326 0.0004891586
sigma0 0.009870392 -0.0001225007 0.0587685234 -0.005405442 -0.0073208829
sigma1 -0.001720545 -0.0141113262 -0.0054054420 0.076336953 0.0015286996
shape -0.008451967 0.0004891586 -0.0073208829 0.001528700 0.0047098637

AIC = 343.3185

BIC = 354.4161

lr-test
All the goodness of fit measure are HIGHER when AO Index is consider for location and scale
than only location

lr.test(fit2,fit3)
Likelihood-ratio Test
data: TMX1TMX1

Likelihood-ratio = 0.2798, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.5968
alternative hypothesis: greater

p-value is large,

Currently play with the Port Jervis, New York annual maximum and minimum winter temperature data, provided with the extRemes R package

Extreme value deal with rare events which are unlikely to follow patterns and assumptions common with complete event view, but standard initial analysis tools can still provide insight.

library(extremes)
data("PORTw")

plot(PORTw$TMX1, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "red")
plot(PORTw$TMX0, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "darkblue")

Visual inspection

Trend - none
Cycle - none
Clustering- none
Pair wise correlation - maybe

Both are non-cyclic, and can probably be described using an additive model, since the random fluctuations in the data are roughly constant in size over time:

Exponential Model

Lets try fitting a Simple Exponential Smoothing
convert to a time series object

> maxts <- ts(PORTw$TMX1, start=c(1927))

Fit model with no trend or cycle

> fit1 <- HoltWinters(maxts, beta=FALSE, gamma=FALSE)
> fit1
Holt-Winters exponential smoothing without trend and without seasonal component.
Call:
HoltWinters(x = maxts, beta = FALSE, gamma = FALSE)
Smoothing parameters:
alpha: 0.1368945
beta : FALSE
gamma: FALSE
Coefficients:
[,1]
a 16.49764

plot(fit1)

Simple measure of fit. sum-of-squared-errors

fit1$SSE
[1] 763.3207

ARIMA Model

Difference the times series

maxtsdiff <- diff(maxts, differences = 1)

acf(maxtsdiff, lag.max = 20)

Basic visual inspection, may something at 1 year lag.

maxtsarima <- arima(maxts, order=c(0,1,1))
> maxtsarima
Series: maxts
ARIMA(0,1,1)
Coefficients:
ma1
-1.0000

s.e. 0.0532

sigma^2 estimated as 9.636:

log likelihood=-173.07

AIC=350.15

AICc=350.33

BIC=354.56

And let R work out if and what order ARIMA would be appropriate

> mtsARIMA <- auto.arima(maxts)
> mtsARIMA
Series: maxts
ARIMA(0,0,0) with non-zero mean
Coefficients:
intercept
16.3154
s.e. 0.3737
sigma^2 estimated as 9.494:

log likelihood=-173.01

AIC=350.02

AICc=350.21

BIC=354.46

No moving average happening

Pair Wise Correlation

Start with scatter plots

Check correlation coefficients

> cor(PORTw$TMX1, PORTw$TMN0)
[1] 0.1802413> cor(PORTw$TMX1, PORTw$AOindex)[1] 0.3944286> cor(PORTw$TMN0, PORTw$AOindex)[1] 0.01206764> cor(PORTw$TMX1, PORTw$AOindex, method="kendall")[1] 0.3019692

Possibly something to work with Arctic Oscillation Index and Maximium winter temperature

PhD Days

Monday, September 15, 2014

Trying out ExtRemes R package : Part 1 - Block Maxima Approach