PhD Days: September 2015

I am using http://www.ral.ucar.edu/~ericg/extRemes/extRemes2.pdf to try out "extRemes" R package.
Below is my notes and thoughts.
On my desk is a copy of
Coles, S. (2001). An introduction to statistical modeling of extreme values / Stuart Coles, London ; New York : Springer, [2001].

Hurricane Data from "extRemes" package
Estimated economic damage caused by U.S. hurricanes from 1926 to 1995

Note the cost are normalised.

data("damage")
par(mfrow=c(2,2))

plot(damage$Year, damage$Dam, xlab = "Year", ylab = "U.S. Hurricane Damage (billion USD)",cex = 0.75, col = "darkblue", bg = "lightblue", pch = 21)

plot(damage$Year, log(damage$Dam), xlab = "Year", ylab = "U.S. Hurricane Damage (billion USD)",cex = 0.75, col = "darkblue", bg = "lightblue", pch = 21)

Need to consider a threshold, balance between bias and variance

threshrange.plot(damage$Dam, r=c(1, 15), nint=20)

Having never seen a plot like before I look at

the size of uncertainty bound stabilising

for the reparameterized scale

10 billion too large

for shap

Just below 6 billion the uncertainty bounds appear to stabilise.

mrlplot(damage$Dam, xlim = c(0, 12))

I could form a straight line (linear) from the local minima within the CI bounds to 12 billion.

My own limited understanding of these plots agrees 6 billion may be a reasonable threshold.

Average number of hurricanes is 2.06 per year

fitD <- fevd(Dam, damage, threshold = 6, type = "GP", time.units = "2.06/year")

plot(fitD)

What can I see

the outlier Miami 1926.

This is right on the lower bound of the time series. Has the bound been choose on purpose?, Have the measurement tools changed?

studying extremes so elimination not options
population increase and spread to previously low population areas occurring
oil rigs

pextRemes(fitD, c(20, 40, 60, 100), lower.tail = FALSE)[1] 0.159285373 0.046886265 0.022234267 0.008513151

Katrina cost more than 40 billion (guesstimate)

Sandy cost more than 20 billion (guesstimate)

Lets extend the time series with more available data including the Galverston storms

Pielke et al. 1900-2005 Data

library(xlsx)
dat <- read.xlsx("public_data_may_2007-2.xls", sheetIndex=1 ,header = TRUE)

hur <- dat[,c(1,6)]
hur$damage <- hur$Normalized.PL05 / 100000000
hur <- hur[complete.cases(hur),]

plot(hur$Year, hur$damage, xlab = "Year", ylab = "U.S. Hurricane Damage (billion USD)",cex = 0.75, col = "darkblue", bg = "lightblue", pch = 21)

Galverston 1900, Miami 1926, and Katerina 2005 are the top three.

plot(hur$Year, log(hur$damage*1000), xlab = "Year", ylab = "U.S. Hurricane Damage (million USD)",cex = 0.75, col = "darkblue", bg = "lightblue", pch = 21)

The plot seems more dense in the recent past. Record keeping, population and what is considered a major event may have changed.

Threshold

There are 207 events, 12 above 20 Billion. A suitable threshold should be less than 20 billion.

mrlplot(hur$damage, xlim = c(0, 20))

Choosing a threshold of 8 Billion

Below 10 billion few differences in uncertainty of shape.
At 8 billion the reparameterised scale uncertainty is stabilising
Mean Excess plot is remarkable boring below 15 Billion.

Average number of hurricanes 207 / 105 = 1.97

fitD <- fevd(damage, hur, threshold = 8, type = "GP", time.units = "1.97/year"

plot(fitD)

pextRemes(fitD, c(20, 40, 60, 100), lower.tail = FALSE)

[1] 0.4800389 0.1962596 0.1009108 0.0382478

What can I see?

Miami 1926 is not as lonely.
The limits on all the axii are much greater.
All the points in the return period/return level plot are within the confidence intervals

What was changed in the data?

time period
number of large events
normalisation of damages

Want more insight into how these changes are reflected in results

Trimming to 1901-2004 would remove two major events
Trimming to 1926-1995 to remove normalisation of damages

Trim 1901-2004

hur2004 <- hur[which(hur$Year > 1900),]

hur2004 <- hur2004[which(hur$Year <2005),]

average hurricane per year 1.97

keep same threshold 8 billion

hur1901 <- hur[which(hur$Year > 1900),]
hur1901 <- hur1901[which(hur1901$Year < 2005),]
fit1901 <- fevd(damage, hur1901, threshold = 8, type = "GP", time.units = "1.97/year")

hur1926 <- hur[which(hur$Year > 1925),]
hur1926 <- hur1926[which(hur1926$Year < 1996),]
fit1926 <- fevd(damage, hur1926, threshold = 8, type = "GP", time.units = "1.97/year")

> pextRemes(fit1901, c(20, 40, 60, 100), lower.tail = FALSE)
[1] 0.44262508 0.16345777 0.07780206 0.02630498
> pextRemes(fit1926, c(20, 40, 60, 100), lower.tail = FALSE)
[1] 0.46102420 0.17505044 0.08403011 0.02836343
> pextRemes(fitD, c(20, 40, 60, 100), lower.tail = FALSE)
[1] 0.4800389 0.1962596 0.1009108 0.0382478

fit	return level	lower ci	approx	upper ci
1900-2005	20	25.01	42.15	59.28
1901-2004	20	20.95	35.05	49.15
1926-1995	20	20.84	38.644	56.45
1900-2005	100	26.77	102.70	178.63
1901-2004	100	25.96	82.19	138.41
1926-1995	100	20.41	88.80	157.18

Normalisation effected the axii.

Otherwise still feel insight low.

PhD Days

Tuesday, September 15, 2015

Trying out ExtRemes R package : Part 2 - Threshold Approach

Initial look at EVT Return Levels