Monday, September 15, 2014

Trying out ExtRemes R package : Part 1 - Block Maxima Approach


I used http://www.ral.ucar.edu/~ericg/extRemes/extRemes2.pdf to try out "extRemes" R package.
Below is my notes and thoughts.
On my desk was  a copy of
Coles, S. (2001). An introduction to statistical modeling of extreme values / Stuart Coles, London ; New York : Springer, [2001].


Fitting the Generalized Extreme Value distribution function  to the Port Jervis, New York annual maximum winter temperature data using "extRemes" R package

library(extremes) 
data("PORTw") 
plot(PORTw$TMX1, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "darkblue")

Default Run

fit1 <- fevd(TMX1, PORTw, units = "deg C") 
fit1

Output

fevd(x = TMX1, data = PORTw, units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  172.7426 

 Estimated parameters:  location      scale      shape 15.1406132  2.9724952 -0.2171486  

 Standard Error Estimates:  location      scale      shape 0.39745119 0.27521741 0.07438302  

 Estimated parameter covariance matrix.            location       scale        shapelocation  0.15796745  0.01028664 -0.010869596scale     0.01028664  0.07574462 -0.010234809shape    -0.01086960 -0.01023481  0.005532834 

 AIC = 351.4853  

 BIC = 358.1438 

GEV model has 3 parameters

  • locatation
  • scale
  • shape
The simplest distribution function is  Gumbel, for which is only 2 parameters.
When the Gumbel df. is a good fit, the estimate of the shape parameter should be approx. 0

Here shape is not close to 0.  Gumbel probably not good.

3 measures of the goodness of fit provided.

  • Negative Log-Likelihood Value 
  • AIC
  • BIC

Gumbel Run

fit0 <- fevd(TMX1, PORTw, type = "Gumbel", units = "deg C")
fit0
fevd(x = TMX1, data = PORTw, type = "Gumbel", units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  175.7782 

 Estimated parameters:
 location     scale
14.799982  2.886128 
 
 Standard Error Estimates:
 location     scale
0.3709054 0.2585040 
 Estimated parameter covariance matrix.
           location      scale
location 0.13757080 0.03173887
scale    0.03173887 0.06682429
 AIC = 355.5563  
 BIC = 359.9954 
All the goodness of fit measure are all larger as expected.

lr.test Likelihood-Ratio Test

Use the likelihood ratio test to confirm this

lr.test(fit1,fit0)
Likelihood-ratio Test
data:  TMX1TMX1
Likelihood-ratio = 6.0711, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.01374
          alternative hypothesis: greater 
I look at the p-value, because it makes sense to me.


Bayesian estimation 

> fB <- fevd(TMX1, PORTw, method = "Bayesian")
> fB
 
fevd(x = TMX1, data = PORTw, method = "Bayesian")
[1] "Estimation Method used: Bayesian"

 Acceptance Rates:
log.scale     shape
0.5379076 0.4971994 
fevd(x = TMX1, data = PORTw, method = "Bayesian")
[1] "Quantiles of MCMC Sample from Posterior Distribution"
               2.5% Posterior Mean         97.5%
location 14.2452034     15.1496558 16.0945492194
scale     2.5276179      3.0935399  3.8803429287
shape    -0.3654228     -0.1975376 -0.0004970114

 Estimated parameter covariance matrix.
              location    log.scale        shape
location   0.240496184  0.009813165 -0.015884315
log.scale  0.009813165  0.124166098 -0.013129196
shape     -0.015884315 -0.013129196  0.008814744
DIC =  1045.476 

L-moments 

fitLM <- fevd(TMX1, PORTw, method = "Lmoments")
fitLM

fevd(x = TMX1, data = PORTw, method = "Lmoments")
[1] "GEV  Fitted to  TMX1  of  PORTw data frame, using L-moments estimation."
  location      scale      shape
15.1775146  3.0286294 -0.2480594 

Significance of Co-variance Information

plot(PORTw$TMX1, PORTw$AOindex)


> fit2 <- fevd(TMX1, PORTw, location.fun = ~AOindex, units = "deg C")
> fit2

fevd(x = TMX1, data = PORTw, location.fun = ~AOindex, units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  166.7992 

 Estimated parameters:
       mu0        mu1      scale      shape
15.2538412  1.1518782  2.6809613 -0.1812824 
 Standard Error Estimates:
       mu0        mu1      scale      shape
0.35592663 0.31800904 0.24186870 0.06725912 

 Estimated parameter covariance matrix.
               mu0          mu1        scale        shape
mu0    0.126683767  0.002230374  0.010009100 -0.008065698
mu1    0.002230374  0.101129752 -0.002538585  0.002075487
scale  0.010009100 -0.002538585  0.058500466 -0.007020374
shape -0.008065698  0.002075487 -0.007020374  0.004523789
 AIC = 341.5984  
 BIC = 350.4764 
What is going on here?
location parameterisation now has 2 parts mu0 and mu1


mu(x) = mu0 + mu1 * x,   where x = AO Index
lr-test
All the goodness of fit measure are lower when AO Index is consider
Expect utilises the AO index information improves the fit of the model
lr.test(fit1,fit2)
Likelihood-ratio Test
data:  TMX1TMX1
Likelihood-ratio = 11.8869, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.0005653

p-value very small, result as expected

Is the covariance information significant to the scale parameter?

fit3 <- fevd(TMX1, PORTw, location.fun= ~AOindex, scale.fun = ~AOindex, units = "deg C")
> fit3
fevd(x = TMX1, data = PORTw, location.fun = ~AOindex, scale.fun = ~AOindex,
    units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  166.6593 

 Estimated parameters:
       mu0        mu1     sigma0     sigma1      shape
15.2616373  1.1802783  2.6782395 -0.1464682 -0.1860603 
 Standard Error Estimates:
       mu0        mu1     sigma0     sigma1      shape
0.35610192 0.31724472 0.24242220 0.27629143 0.06862845 

 Estimated parameter covariance matrix.
                mu0           mu1        sigma0       sigma1         shape
mu0     0.126808580 -0.0085301401  0.0098703920 -0.001720545 -0.0084519671
mu1    -0.008530140  0.1006442141 -0.0001225007 -0.014111326  0.0004891586
sigma0  0.009870392 -0.0001225007  0.0587685234 -0.005405442 -0.0073208829
sigma1 -0.001720545 -0.0141113262 -0.0054054420  0.076336953  0.0015286996
shape  -0.008451967  0.0004891586 -0.0073208829  0.001528700  0.0047098637
 AIC = 343.3185  
 BIC = 354.4161 

lr-test
All the goodness of fit measure are HIGHER when AO Index is consider for location and scale
than only location
lr.test(fit2,fit3)
Likelihood-ratio Test
data:  TMX1TMX1
 
Likelihood-ratio = 0.2798, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.5968
alternative hypothesis: greater
p-value is large,

Thursday, September 11, 2014

Basic Time Series with R



Currently play with the Port Jervis, New York annual maximum and minimum winter temperature data, provided with the extRemes R package

Extreme value deal with rare events which are unlikely to follow patterns and assumptions common with complete event view, but standard initial analysis tools can still provide insight.

library(extremes)
data("PORTw")

plot(PORTw$TMX1, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "red")
plot(PORTw$TMX0, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "darkblue")


Visual inspection

  • Trend - none
  • Cycle - none
  • Clustering- none
  • Pair wise correlation - maybe


Both are non-cyclic, and can probably be described using an additive model, since the random fluctuations in the data are roughly constant in size over time:

Exponential Model

Lets try fitting a Simple Exponential Smoothing
convert to a time series object
> maxts <- ts(PORTw$TMX1, start=c(1927))
Fit  model  with no trend or cycle
> fit1 <- HoltWinters(maxts, beta=FALSE, gamma=FALSE)
> fit1
Holt-Winters exponential smoothing without trend and without seasonal component.
Call:
HoltWinters(x = maxts, beta = FALSE, gamma = FALSE)
Smoothing parameters:
 alpha: 0.1368945
 beta : FALSE
 gamma: FALSE
Coefficients:
      [,1]
a 16.49764
plot(fit1)
Simple measure of fit. sum-of-squared-errors 
fit1$SSE
[1] 763.3207 

ARIMA Model

Difference the times series
maxtsdiff <- diff(maxts, differences = 1)

acf(maxtsdiff, lag.max = 20)


Basic visual inspection, may something at 1 year lag.

 maxtsarima <- arima(maxts, order=c(0,1,1))
> maxtsarima
Series: maxts
ARIMA(0,1,1)                  
Coefficients:
          ma1
      -1.0000

s.e.   0.0532
sigma^2 estimated as 9.636:   
log likelihood=-173.07 
AIC=350.15   
AICc=350.33  

BIC=354.56 
And let R work out if and what order ARIMA would be appropriate

> mtsARIMA <- auto.arima(maxts)
> mtsARIMA
Series: maxts
ARIMA(0,0,0) with non-zero mean
Coefficients:
      intercept
        16.3154
s.e.     0.3737
sigma^2 estimated as 9.494:
 
log likelihood=-173.01
AIC=350.02  
AICc=350.21  
          BIC=354.46

No moving average happening

Pair Wise Correlation

Start with scatter plots




Check correlation coefficients

    > cor(PORTw$TMX1, PORTw$TMN0)
[1] 0.1802413> cor(PORTw$TMX1, PORTw$AOindex)[1] 0.3944286> cor(PORTw$TMN0, PORTw$AOindex)[1] 0.01206764> cor(PORTw$TMX1, PORTw$AOindex, method="kendall")[1] 0.3019692

Possibly something to work with Arctic Oscillation Index and Maximium winter temperature