Monday, September 15, 2014

Trying out the extRemes R package: Part 1 - Block Maxima Approach


I used http://www.ral.ucar.edu/~ericg/extRemes/extRemes2.pdf to try out the "extRemes" R package.
Below are my notes and thoughts.
On my desk was a copy of
Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. London; New York: Springer.


Fitting the Generalized Extreme Value distribution function to the Port Jervis, New York annual maximum winter temperature data using the extRemes R package

library(extRemes)
data("PORTw") 
plot(PORTw$TMX1, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "darkblue")

Default Run

fit1 <- fevd(TMX1, PORTw, units = "deg C") 
fit1

Output

fevd(x = TMX1, data = PORTw, units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  172.7426 

 Estimated parameters:
  location      scale      shape 
15.1406132  2.9724952 -0.2171486 

 Standard Error Estimates:
  location      scale      shape 
0.39745119 0.27521741 0.07438302 

 Estimated parameter covariance matrix.
            location       scale        shape
location  0.15796745  0.01028664 -0.010869596
scale     0.01028664  0.07574462 -0.010234809
shape    -0.01086960 -0.01023481  0.005532834

 AIC = 351.4853  

 BIC = 358.1438 

The GEV model has 3 parameters:

  • location
  • scale
  • shape
The simplest distribution function is the Gumbel, which has only 2 parameters (the shape is fixed at 0).
When the Gumbel df is a good fit, the estimate of the shape parameter should be approximately 0.

Here the shape estimate (-0.217) is not close to 0, so the Gumbel is probably not a good fit.

Three measures of goodness of fit are provided:

  • Negative Log-Likelihood Value 
  • AIC
  • BIC

Gumbel Run

fit0 <- fevd(TMX1, PORTw, type = "Gumbel", units = "deg C")
fit0
fevd(x = TMX1, data = PORTw, type = "Gumbel", units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  175.7782 

 Estimated parameters:
 location     scale
14.799982  2.886128 
 
 Standard Error Estimates:
 location     scale
0.3709054 0.2585040 
 Estimated parameter covariance matrix.
           location      scale
location 0.13757080 0.03173887
scale    0.03173887 0.06682429
 AIC = 355.5563  
 BIC = 359.9954 
All the goodness-of-fit measures are larger, as expected.

lr.test Likelihood-Ratio Test

Use the likelihood ratio test to confirm this

lr.test(fit1,fit0)
Likelihood-ratio Test
data:  TMX1TMX1
Likelihood-ratio = 6.0711, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.01374
          alternative hypothesis: greater 
I look at the p-value because it makes sense to me: at 0.014 it is below 0.05, so the extra GEV shape parameter is worth keeping.
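As a quick sanity check, the likelihood-ratio statistic can be reproduced from the two negative log-likelihood values printed above (a minimal sketch; the numbers are simply copied from the fevd output):

# LR statistic = 2 * (negative log-likelihood of Gumbel - negative log-likelihood of GEV)
nllh_gumbel <- 175.7782                      # fit0 (Gumbel), from the output above
nllh_gev    <- 172.7426                      # fit1 (GEV), from the output above
lr_stat <- 2 * (nllh_gumbel - nllh_gev)      # ~6.071, matching lr.test
pchisq(lr_stat, df = 1, lower.tail = FALSE)  # ~0.0137 with 1 degree of freedom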


Bayesian estimation 

> fB <- fevd(TMX1, PORTw, method = "Bayesian")
> fB
 
fevd(x = TMX1, data = PORTw, method = "Bayesian")
[1] "Estimation Method used: Bayesian"

 Acceptance Rates:
log.scale     shape
0.5379076 0.4971994 
fevd(x = TMX1, data = PORTw, method = "Bayesian")
[1] "Quantiles of MCMC Sample from Posterior Distribution"
               2.5% Posterior Mean         97.5%
location 14.2452034     15.1496558 16.0945492194
scale     2.5276179      3.0935399  3.8803429287
shape    -0.3654228     -0.1975376 -0.0004970114

 Estimated parameter covariance matrix.
              location    log.scale        shape
location   0.240496184  0.009813165 -0.015884315
log.scale  0.009813165  0.124166098 -0.013129196
shape     -0.015884315 -0.013129196  0.008814744
DIC =  1045.476 
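To summarise the posterior a little further, extRemes' ci() method can be applied to the Bayesian fit (a sketch I have not run against this exact fit; it should give intervals comparable to the quantile table above):

ci(fB, type = "parameter")   # 95% credible intervals for location, scale and shape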

L-moments 

fitLM <- fevd(TMX1, PORTw, method = "Lmoments")
fitLM

fevd(x = TMX1, data = PORTw, method = "Lmoments")
[1] "GEV  Fitted to  TMX1  of  PORTw data frame, using L-moments estimation."
  location      scale      shape
15.1775146  3.0286294 -0.2480594 

Significance of Covariate Information

plot(PORTw$TMX1, PORTw$AOindex)


> fit2 <- fevd(TMX1, PORTw, location.fun = ~AOindex, units = "deg C")
> fit2

fevd(x = TMX1, data = PORTw, location.fun = ~AOindex, units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  166.7992 

 Estimated parameters:
       mu0        mu1      scale      shape
15.2538412  1.1518782  2.6809613 -0.1812824 
 Standard Error Estimates:
       mu0        mu1      scale      shape
0.35592663 0.31800904 0.24186870 0.06725912 

 Estimated parameter covariance matrix.
               mu0          mu1        scale        shape
mu0    0.126683767  0.002230374  0.010009100 -0.008065698
mu1    0.002230374  0.101129752 -0.002538585  0.002075487
scale  0.010009100 -0.002538585  0.058500466 -0.007020374
shape -0.008065698  0.002075487 -0.007020374  0.004523789
 AIC = 341.5984  
 BIC = 350.4764 
What is going on here?
The location parameterisation now has two parts, mu0 and mu1:

mu(x) = mu0 + mu1 * x,   where x = AO index
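To see what this parameterisation means in practice, the fitted location can be evaluated at a few AO index values using the MLE estimates printed above (a minimal sketch; the numbers are copied from the fit2 output):

mu0 <- 15.2538412                 # location intercept (deg C)
mu1 <-  1.1518782                 # change in location per unit of AO index
aoi <- c(-2, 0, 2)                # example AO index values
mu0 + mu1 * aoi                   # fitted GEV location: roughly 12.95, 15.25, 17.56 deg C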
lr-test
All the goodness-of-fit measures are lower when the AO index is considered. As expected, using the AO index information improves the fit of the model.
lr.test(fit1,fit2)
Likelihood-ratio Test
data:  TMX1TMX1
Likelihood-ratio = 11.8869, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.0005653

The p-value is very small, so the result is as expected: the AO index covariate is significant for the location parameter.

Is the covariate information also significant for the scale parameter?

fit3 <- fevd(TMX1, PORTw, location.fun= ~AOindex, scale.fun = ~AOindex, units = "deg C")
> fit3
fevd(x = TMX1, data = PORTw, location.fun = ~AOindex, scale.fun = ~AOindex,
    units = "deg C")
[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value:  166.6593 

 Estimated parameters:
       mu0        mu1     sigma0     sigma1      shape
15.2616373  1.1802783  2.6782395 -0.1464682 -0.1860603 
 Standard Error Estimates:
       mu0        mu1     sigma0     sigma1      shape
0.35610192 0.31724472 0.24242220 0.27629143 0.06862845 

 Estimated parameter covariance matrix.
                mu0           mu1        sigma0       sigma1         shape
mu0     0.126808580 -0.0085301401  0.0098703920 -0.001720545 -0.0084519671
mu1    -0.008530140  0.1006442141 -0.0001225007 -0.014111326  0.0004891586
sigma0  0.009870392 -0.0001225007  0.0587685234 -0.005405442 -0.0073208829
sigma1 -0.001720545 -0.0141113262 -0.0054054420  0.076336953  0.0015286996
shape  -0.008451967  0.0004891586 -0.0073208829  0.001528700  0.0047098637
 AIC = 343.3185  
 BIC = 354.4161 

lr-test
All the goodness-of-fit measures are HIGHER when the AO index is considered for both location and scale than for location only.
lr.test(fit2,fit3)
Likelihood-ratio Test
data:  TMX1TMX1
 
Likelihood-ratio = 0.2798, chi-square critical value = 3.841, alpha = 0.050,
Degrees of Freedom = 1.000, p-value = 0.5968
alternative hypothesis: greater
The p-value is large, so adding the AO index to the scale parameter does not significantly improve the fit.

Thursday, September 11, 2014

Basic Time Series with R



Currently playing with the Port Jervis, New York annual maximum and minimum winter temperature data, provided with the extRemes R package.

Extreme value analysis deals with rare events, which are unlikely to follow the patterns and assumptions that hold for the complete set of events, but standard initial analysis tools can still provide insight.

library(extRemes)
data("PORTw")

plot(PORTw$TMX1, type = "l", xlab = "Year", ylab = "Maximum winter temperature", col = "red")
plot(PORTw$TMN0, type = "l", xlab = "Year", ylab = "Minimum winter temperature", col = "darkblue")


Visual inspection

  • Trend - none
  • Cycle - none
  • Clustering- none
  • Pairwise correlation - maybe


Both are non-cyclic, and can probably be described using an additive model, since the random fluctuations in the data are roughly constant in size over time:

Exponential Model

Let's try fitting simple exponential smoothing.
Convert to a time series object:
> maxts <- ts(PORTw$TMX1, start=c(1927))
Fit the model with no trend or seasonal component:
> fit1 <- HoltWinters(maxts, beta=FALSE, gamma=FALSE)
> fit1
Holt-Winters exponential smoothing without trend and without seasonal component.
Call:
HoltWinters(x = maxts, beta = FALSE, gamma = FALSE)
Smoothing parameters:
 alpha: 0.1368945
 beta : FALSE
 gamma: FALSE
Coefficients:
      [,1]
a 16.49764
plot(fit1)
A simple measure of fit: the sum of squared errors.
fit1$SSE
[1] 763.3207 
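With no trend or seasonal component, the forecast from this model is simply flat at the last smoothed level. A quick check (a sketch using the base predict method for HoltWinters objects):

predict(fit1, n.ahead = 5)                               # flat forecasts at the final level (~16.5)
predict(fit1, n.ahead = 5, prediction.interval = TRUE)   # the same, with 95% prediction intervals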

ARIMA Model

Difference the time series:
maxtsdiff <- diff(maxts, differences = 1)

acf(maxtsdiff, lag.max = 20)


Basic visual inspection: maybe something at a 1-year lag.

 maxtsarima <- arima(maxts, order=c(0,1,1))
> maxtsarima
Series: maxts
ARIMA(0,1,1)                  
Coefficients:
          ma1
      -1.0000
s.e.   0.0532

sigma^2 estimated as 9.636:  log likelihood=-173.07
AIC=350.15   AICc=350.33   BIC=354.56
The ma1 coefficient at -1 hints that the differencing was unnecessary. Let R work out whether and what order of ARIMA would be appropriate (auto.arima is from the forecast package):

> mtsARIMA <- auto.arima(maxts)
> mtsARIMA
Series: maxts
ARIMA(0,0,0) with non-zero mean
Coefficients:
      intercept
        16.3154
s.e.     0.3737
sigma^2 estimated as 9.494:  log likelihood=-173.01
AIC=350.02   AICc=350.21   BIC=354.46

No moving average happening; the preferred model is just a constant mean.
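Since auto.arima settled on ARIMA(0,0,0) with a non-zero mean, the forecast is just that mean. A quick look (a sketch; forecast() and auto.arima() both come from the forecast package):

library(forecast)            # provides auto.arima() and forecast()
forecast(mtsARIMA, h = 5)    # flat point forecasts at the estimated mean (~16.3), with intervals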

Pair Wise Correlation

Start with scatter plots
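A quick way to get all the pairwise scatter plots in one go (a sketch, using the same columns as the correlations below):

# Pairwise scatter plots of max winter temperature, min winter temperature and the AO index
pairs(PORTw[, c("TMX1", "TMN0", "AOindex")])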




Check correlation coefficients

> cor(PORTw$TMX1, PORTw$TMN0)
[1] 0.1802413
> cor(PORTw$TMX1, PORTw$AOindex)
[1] 0.3944286
> cor(PORTw$TMN0, PORTw$AOindex)
[1] 0.01206764
> cor(PORTw$TMX1, PORTw$AOindex, method="kendall")
[1] 0.3019692

Possibly something to work with between the Arctic Oscillation index and maximum winter temperature.

Sunday, August 24, 2014

Update QGIS from 2.2.0-8 to 2.4.0-1


Updating QGIS on Mac OSX 10.9.4

To update my QGIS I needed to download 2 packages:

http://www.kyngchaos.com/software/qgis

http://www.kyngchaos.com/software/frameworks#gdal_complete

I deleted the old QGIS from the Applications directory.

I investigated updating GDAL to GDAL 1.11 using anaconda, but

 the following combinations of packages create a conflict with the
remaining packages:
  - python 2.7*
  - gdal 1.11*

So I installed GDAL 1.11 using
http://www.kyngchaos.com/software/frameworks#gdal_complete

Then installed QGIS 2.4.0

Friday, August 15, 2014

Days since 0001-01-01

Viewing output from a CSIRO ACCESS 1.3 CMIP5 run, I saw the time units were 'days since 0001-01-01', and the coder in me had heart palpitations.

The joys of Julian calendars and leap years: does anyone know how many days lie between now and then?

Looking at an ACCESS post-processing script
https://trac.nci.org.au/trac/access_tools/browser/app/trunk/app.py?rev=85

The reference dates are 1 (0001-01-01) and 719163 (1970-01-01).

As I am currently just mucking around, I am just glad that someone knows how many days have passed since January 1, Year 1.

Now I can transform to my frame of reference more readily.
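A quick cross-check in R (a sketch; R's Date class uses the proleptic Gregorian calendar, whereas model output often uses other calendars such as the 365-day calendar, so this only confirms the Gregorian count):

# Days between 0001-01-01 and 1970-01-01 on the proleptic Gregorian calendar
as.numeric(as.Date("1970-01-01") - as.Date("0001-01-01"))
# should give 719162, i.e. 1970-01-01 is day 719163 when 0001-01-01 is counted as day 1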

Sunday, June 15, 2014

Climate Change in 4 Dimensions


From April 8 2014 till June 17 I participated in an online course:

www.coursera.org

Climate Change in 4 Dimensions
University of California, San Diego.

Overall I learnt a lot about climate change beyond the science aspects.

During the course I kept notes to provide some feedback and ideas for improvement.
I captured this in this post.

Lecture 5

Early part is very US centric 

Global average warming chart after ocean acidification and coral: the text is too small to read.

Lecture 6

Very informative and enjoyable; Professor David G. Victor engaged me well. He introduced the stock-problem economic model to me, which is already introducing new ideas to my thinking.


Lecture 9


Weekly activity
The survey is USA-oriented. Of course the US should act independently of other countries; the question for me is whether Australia should act independently of other countries.

This week's graded quiz was not up to scratch. Usually only one question, or none, is badly worded, easily misunderstood, or difficult to attribute to any of the readings or lectures. This week there were multiple, including:
Questions 8 and 9: difficult to understand/read.
"Heat waves and heavy precipitation" is not a trend.

Lecture 10

Feedback: 
The slide from Rogelj & Meinshausen (2012) is out of date, and its name is inconsistent with the other slides.
Chaparral is not a common international word, and is not what would grow in Australia.
A question in the ungraded quiz is unclear:
"Based on global climate models, what regions will see the biggest precipitation increase with global warming?"
The question is actually about North America, which is not clear.

Concepts

Uses indices to show warming, particularly in the USA; definitely regional/spatial.
Indices

  • Mean annual temperature
  • Number of cold days
  • Number of hot days
  • Agriculture-based - frost, super-hot days, frost-free (growing season) forecast.
    • The Keeling curve shows the growing season has increased by about 10 days, in agreement with agricultural models
Heat Wave

  • Multiple factors, some of which are more common due to climate change
  • Europe 2003, 2006, Australia 2009 (Karoly), Russia 2010, USA 2009


Models don't seem to capture very cold days. 

Midterm

Question 7 unclear
If someone makes a prediction, and it comes true, does that mean that the prediction is correct?
Does it mean the hypothesis or the prediction is true?

Question 9 
Debatable if one or more answers are correct

Week 8 

Lecture 14 
Ice, snow and water
Does NOT mention the West Antarctic ice sheet information?
Projections continue to change with increasing understanding?
Parts of the lecture are dating quickly.

Lecture 15
Very directed to a Californian audience. The assumed level of knowledge about the geography of California was higher than my own, even though I have travelled there. I think this would exclude some of the Coursera audience.

Quiz  Question 9
"Please fill in the blank: By 2050, the number of ‘extremely hot’ days could increase ________."
This statement was not in the lecture notes, so how does one know which location or study to cite? Does it relate to global, USA, California, San Diego, or Australian 'extremely hot' days?

Lecture 19

Spelling mistake in the "Check your knowledge quiz" question 4
Arpanet

Sunday, May 25, 2014

Thoughts on Pattern Scaling

During the last week I have been considering ideas put forward at the

Pattern Scaling, Climate Model Emulators and their Application to the new Scenario Process
NCAR, Boulder Colorado, April 23-25 2014

and

Lopez et al. (2013), Robustness of pattern scaled climate change scenarios for adaptation decision support.

Major Aims



  • "Fit empirical / statistical relation b/w impact relevant climate variables and large scale quantities obtainable through simple models"
  • "Run simple models under arbitrary scenarios and recover impact relevant outcomes by applying those relations"



  • User Needs
    • Impact research
    • policy makers
    • Social Economic
    • higher resolution

  • Uncertainty
    • Handling
    • Quantifying
    • sufficiently low uncertainty for outcome information produced to be useful.


Standard Pattern Scaling

  • Developed, tested and applied for 20 years
  • provide a simplified representation of climate system responses.

    • local (or regional) changes in these variables tend to increase linearly with global warming over the coming century
    • local change can be seen as a 'response' to the global warming (GW)

    The critical assumption is that there is a linear relationship between a scalar and a geographical response pattern (see the sketch below).
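In its simplest form that assumption says: local change ≈ fixed spatial pattern × global mean warming. A toy sketch of the idea (all numbers hypothetical, not from the workshop material):

# Pattern scaling in one line: scale a fixed local response pattern by global warming
pattern      <- c(1.4, 0.8, 1.1)              # hypothetical local warming per deg C of global warming
deltaTglobal <- c(0.5, 1.0, 2.0)              # hypothetical global mean warming levels (deg C)
deltaV       <- outer(pattern, deltaTglobal)  # local change: one row per location, one column per level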

Flaws / Concerns

  • main climate mechanisms are not linear
  • feedback
  • timescale in response change
  • patterns evolve

Uncertainty 

Uncertainty hard to capture
  • model uncertainty
    • Multimodel ensemble can reduce uncertainty
  • scenario uncertainty
  • depend on statistical assumption
  • analysis of variance 
    • map std dev.

Wednesday, May 14, 2014

NCL Regional Temperature Anomalies



Data
HadCRUT 4
Surface Temperature Anomalies (C with respect to 1961-1990)

Choose a region

Getting my regions into a reusable, correctly projected, and regridded form was more difficult than I thought it would be.
Publicly available global region-based netCDF files are not common.
In the end, I used a shapefile to create a netCDF with a single time dimension and an integer layer of region ids at each 1-degree cell.

Australia

Just going with continents at this stage, and Australia is the country I was born in.


Start with a single time step:
; read only desired  time 1970
 x  = fin->temperature_anomaly(1440,:,:)

 xr = where(region.eq.2,x,x@_FillValue)

 print(avg(xr))

Then a time series for a single year, Oct 2012 to Sept 2013:

; Time Series average for Australia 2013
 x1=fin->temperature_anomaly(1953:1964,:,:)
 rconform = conform(x1, region, (/1,2/))
 xr1 =mask(x1, rconform, 2)

 xa1 = dim_avg_n(xr1,(/1 , 2/))
 t1 = ispan(0,12,1)

 wks   = gsn_open_wks ("x11","xy")                ; open workstation
 res                  = True                     ; plot mods desired
 res@tiMainString     = "Average Australian Temperature Anomaly 2013" ;


The result is similar to the BOM's.


Then an annual time series for 1990-2013:

 yStart = 1990
 yEnd = 2013
 tStart  = (yStart - T_OFFSET ) * 12;
 tEnd = (yEnd - T_OFFSET) * 12 + 11
 x2=fin->temperature_anomaly(tStart:tEnd,:,:)
    
 rconform2 = conform(x2, region, (/1,2/))
 xr2 =mask(x2, rconform2, 2)
 copy_VarCoords(x2, xr2) ; need dim metadata retained for clim functions
 xa2 = dim_avg_n(xr2,(/1 , 2/))
  xannual = month_to_annual(xa2, 1)  ; Annual average temperature

  printVarSummary(xannual)  

 wks   = gsn_open_wks ("ps","xy")                ; open workstation
 res                  = True                     ; plot mods desired
 res@tiMainString     = "Annual Mean  Australian Temperature  1990-2013" ;
 res@tiYAxisString = "Anomalies" ; y-axis label      

 res@gsnYRefLine           = 0.              ; reference line   
 res@gsnXYBarChart         = True            ; create bar chart 
 res@gsnAboveYRefLineColor = "red"           ; above ref line fill red
 res@gsnBelowYRefLineColor = "blue"          ; below ref line fill blue

res@tiXAxisString = "Year"

 plot  = gsn_csm_xy (wks,ispan(yStart,yEnd,1),xannual,res) ; create plot  


Then a summer time series for 2000-2013:

 ; Summer Time Average Temperature anomaly for 2000's
 yStart = 2000
 yEnd = 2013
 tStart  = (yStart - T_OFFSET ) * 12 ;
 tEnd = (yEnd - T_OFFSET) * 12 + 11
 x3=fin->temperature_anomaly(tStart:tEnd,:,:)
    
 rconform3 = conform(x3, region, (/1,2/)) ;Create Mask grid with time dim of data
 xr3 =mask(x3, rconform3, 2) ; mask out data not in region 2 , Australia
 copy_VarCoords(x3, xr3) ; need dim metadata retained for clim functions

 xseasonal = month_to_seasonN(xr3, (/ "DJF"/))  ; Seasonal (DJF) average temperature
 printVarSummary(xseasonal)  

 xa3 = dim_avg_n(xseasonal,(/2 , 3/)) ; Average across  long and lat dimension
                                ; average across region as all other values masked 
 copy_VarCoords_2(xseasonal,xa3)
 printVarSummary(xa3)

 print(xa3)

 wks   = gsn_open_wks ("x11","xy")                ; open workstation
 res                  = True                     ; plot mods desired
 res@tiMainString     = "Summer Mean Australian Temperature 2000-2013" ;
 res@tiYAxisString = "Anomalies" ; y-axis label      

 res@gsnYRefLine           = 0.              ; reference line   
 res@gsnXYBarChart         = True            ; create bar chart 
 res@gsnAboveYRefLineColor = "red"           ; above ref line fill red
 res@gsnBelowYRefLineColor = "blue"          ; below ref line fill blue

res@tiXAxisString = "Year"

 plot  = gsn_csm_xy (wks,ispan(yStart,yEnd,1),xa3(0,:),res) ; create plot 


Happy with this; time to move on.

Thursday, May 8, 2014

Install CDO


Mac OSX Mavericks

Download https://code.zmaw.de/projects/cdo/files

In terminal

gunzip cdo-current.tar.gz

tar -xf cdo-current.tar

cd cdo-1.6.4r5

./configure

make

sudo make install

Tuesday, May 6, 2014

Projections and Gridding

Today I have been thinking about projections

CMIP5 data is netCDF; the projection is provided in the metadata, resolution 0.5° x 0.5°.

I want to perform polygon-region analysis.

I need the 'masking' of the CMIP5 to be efficient. Visualisation is not a concern.

What file format should I use for storing polygons?

  • projection 
  • resolution
I have been looking at different programming languages, particularly R and NCL.

NCL is built to handle  netCDF data efficiently.
But I am having trouble grasping how I can use my polygons efficiently. The polygons are initially shapefiles. The example polygon code focuses on using polygons in visualisation. The example shapefile masking code seems very inefficient. 

Focusing on NCL

If I know the CMIP5 resolution and projection, I should be able to create a netCDF file with a layer / a variable / a slot in an array per region, with binary values.

The max and min latitude of the region could be used to reduce the data extracted from the netCDF:

; read only desired area & times
x     = in->SST(tStrt:tLast,{latS:latN},{lonL:lonR})

Merge / AND / if / where
_FillValue is kind of like null for this variable, ignored by many functions.
  if(.not.isatt(data,"_FillValue")) then
    data@_FillValue = default_fillvalue(typeof(data))          ;-- make sure "data" has a missing value
  end if

x is CMIP5 netCDF data
regions is all my polygons

 x and regions should have the same dimensions
 xr = where(regions.eq.17, x, x@_FillValue)
Or once I have a single binary object per region
 x = in->SST(tStrt:tLast,{r17.latS:r17.latN},{r17.lonL:r17.lonR})
 xr = where(r17, x, x@_FillValue)
There is a memory concern here.
I am going with the concept of working region by region, that is:
For each region
  • For each netCDF file
    • Open netCDF file
    • Populate variable (X) with netCDF data by region and time period
    • Close netCDF file
    • Perform any calculation which will optimise memory footprint
      • x <- X
    • Delete X, keep derived data.
I still have the region shapefile. Regrid to a standard CMIP5 grid.

Ahhh, there is no standard gridding for CMIP5.
Initially going for bilinear interpolation to a 1x1 rectilinear grid.
Why Bilinear

Mora, C., et al. (2013). "Biotic and human vulnerability to projected changes in ocean biogeochemistry over the 21st century." PLoS biology 11(10): e1001682.

Why Rectilinear grid
  • Simple
  • Too many people think in rectangles
  • I don't like how the areas are so different
  • Consider variation at later stage
Why 1 x 1
  • Because resolution should be reasonable with all the CMIP5 0.5 degree data

Thursday, May 1, 2014

NCL Try outs


Installing NCL

Put this in my .bash_profile file:
# NCL additions
export NCARG_ROOT=/Volumes/Data/Users/fmacgill/Projects/ncl62/
export PATH=$NCARG_ROOT/bin:$PATH
Library Issue
Had issues with a library; solved by installing gcc47 through MacPorts.

http://www.ncl.ucar.edu/Download/macosx.shtml#libgomp

Put this in my .bash_profile file:
   export DYLD_FALLBACK_LIBRARY_PATH=/opt/local/lib/gcc47/

clmMonLLT

Decided to try out NCL with this example code.
Downloaded clim0_4.ncl
Downloaded xieArkin_T42.nc
Wanted to see the output immediately, so changed line 31 to open an X11 window instead of a PS file:
wks   = gsn_open_wks("x11" ,"climo")        ; open x11 window instead of ps file
So what's going on?

Wednesday, April 30, 2014

Initial Region Masking map

There are many ways to break up the globe into regions.
I have already spent hours trying to find classification schemes for creating regions.
I have already spent hours trying to locate shapefiles, GeoTIFFs, or any GIS-compatible sources.

A waste of time: the data exists, others have it. I need to work out who to ask and gain the confidence to ask.

Need to focus on the process

Josh O'Brien's answer at http://stackoverflow.com/questions/20146809/how-can-i-plot-a-continents-map-with-r was the starting point.
SRES regions are IPCC emission scenario regions.


The same code, tweaked:

library(rworldmap)
library(rgeos)

sPDF <- getMap()
sres <-
    sapply(levels(sPDF$SRES),
           FUN = function(i) {
               ## Merge polygons within a continent
               poly <- gUnionCascaded(subset(sPDF, SRES==i))
               ## Give each polygon a unique ID
               poly <- spChFIDs(poly, i)
               ## Make SPDF from SpatialPolygons object
               SpatialPolygonsDataFrame(poly,
                                        data.frame(SRES=i, row.names=i))
           },
           USE.NAMES=TRUE)

## Bind the 11 SRES-level SPDFs into a single SPDF
sres <- Reduce(spRbind, sres)

## Plot to check that it worked
plot(sres, col=heat.colors(nrow(sres)))

## Check that it worked by looking at the SPDF's data.frame
## (to which you can add attributes you really want to plot on)
data.frame(sres)

Save for future use; use .RData for faster loading in the future.
> class(sres)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
> proj4string(sres)
[1] "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"


> setwd("~/Projects/R/test")
> getwd()
[1] "/Volumes/Data/Users/fmacgill/Projects/R/test"

> plot(sres, col=heat.colors(nrow(sres)))
> data_name <- sres
> save(data_name, file="SRESGlobal.RData" )

Clear Variables and check retrieval

> sres <- 1
> data_name <- 1
> load("SRESGlobal.RData")
> sres <- data_name
> plot(sres, col=heat.colors(nrow(sres)))
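Looking ahead, the same SpatialPolygonsDataFrame could be rasterised onto a 1-degree global grid of integer region ids and written out as netCDF for later masking work (a minimal sketch, assuming the raster and ncdf4 packages are available; the regionID field and file name are made up here for illustration):

library(raster)

r <- raster(nrows = 180, ncols = 360, xmn = -180, xmx = 180, ymn = -90, ymx = 90)
sres$regionID <- seq_len(nrow(sres))                 # integer id per SRES region (illustrative)
regionGrid <- rasterize(sres, r, field = "regionID") # burn region ids into the 1-degree grid

# Write a single-layer netCDF of region ids (format "CDF" needs the ncdf4 package)
writeRaster(regionGrid, "SRESregions1deg.nc", format = "CDF",
            varname = "region", overwrite = TRUE)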

Monday, March 10, 2014

Download Data

Downloaded some simple time series data.

How do I keep the data, the source, and crap together?

Metadata


What Metadata do I need?

I don't know.

Starting with

  • source / url
  • units
  • location
  • type
  • startDate
  • endDate
  • increment
Going to save a matlab struct per time series. 
This is not going to be an optimal solution, but it is better than nothing, and better than doing nothing.

Python


I need to work with netCDF files

Currently I know Java and am familiar with MATLAB.

Java is clunky with numbers.
I have found it difficult to produce reusable code with MATLAB.
R sounds appealing, but Python has more support in the college for netCDF and GIS applications.

--> Introducing myself to Python:

  1. Download and install iPython
  2. Watch intro to iPython 
  3. Download zip from Unidata Training Workshop from GitHub
    https://github.com/Unidata/tds-python-workshop
  4. Unzip in directory
  5. Start ipython in this directory
    ipython notebook --pylab inline
This went nowhere.

Talked with my supervisors; going elsewhere at the moment.

Tuesday, February 25, 2014

Submitted Abstract to Conference



Today I submitted an abstract to a conference.

The research was from my masters. It felt good to have something out there even if it is not related to my current research.

Thursday, February 20, 2014

Wiki

Did a mini wiki demo last week for my supervisor.
Good support.

Already in soft release. Multiple people using the 'Getting Started' Printer setup page.

Continuing to add pages and information for release to the group next week.

New Machine

Software

Matlab 2013b
Google Chrome

Google Drive
Evernote
Evernote Web Clipper for Chrome
Endnote V7
Spotify

Google Chrome

  • Linked mainly to my PhD Google+ account. 
  • Will know that I also have a Google Account
  • Will never be used for Banking, PayPal, government interactions.

Storage

Online Backup

One of the extra tasks I have taken on for the group is to look at storage and online backup. 


Storage

Document Storage


Options
1. GoogleDrive / GoogleDocs through GoogleApps
2. University-hosted file server costing 250 per 1 TB

Recommendation

1. GoogleDrive / GoogleDocs through GoogleApps
  • Currently provided by GoogleDocs and Google Drive.
  • Everyone seems comfortable with it.
  • The main issue is that there is no encryption security provided.

Large data set storage


Currently the students using the largest datasets have access to an Earth Science hosted database for their work. Leave this for a later date.

Version Control Repository

Why – good practice to store versions of software.

Options
1. Git:
   GitHub
   • The uni has an evangelist who would help with an introduction
   • I have used it
   • Private repositories hosted in the cloud by the reputable GitHub are available to education groups for free, but further investigation shows there are quite a few hoops
   Bitbucket
   • The uni has an evangelist who would help with an introduction
   • Bitbucket allows unlimited private repositories with up to 5 users
   • Other departments use it
2. Subversion:
   • PIK uses it
   • I can administer it
   • We would need to have a box:
     • Upfront costs
     • Location issues

Recommendation
1. Bitbucket, unless we are likely to have more than 5 users.


Wiki

Why – a great way to communicate within the PhD group, particularly as a single point for newcomers to get established and to record any issues or workarounds.
           
Recommendation

Google Sites included as part of GoogleApps Education


Laptop Backup


Important Considerations

  • Off-site
  • Cost
  • Size
  • Encryption security
  • Longevity
  • Ease of use
  • OS support (Windows, Mac, Linux)
  • Organisation solution

The uni does not support or have a recommended solution.

Dual solutions provide the best disaster plan:
  • Regular manual hard disk / USB backup
  • Online automated backup, with reasonable bandwidth use and the ability to turn it off when in a low-bandwidth situation


Non-Free Solutions
  • SpiderOak: ~$5 per month per user, 200 GB (e)
  • iDrive: from $25 per year for 150 GB (e) for individuals; also org. pricing, needs further investigation; starts with 5 GB free
  • CrashPlan: ~$5 per month per user; need to investigate educational discounts; starts with 2 GB free

           
Free

                   Mega                       GoogleDrive   iDrive
Size               50 GB                      30 GB         5 GB
Encryption         No                         No            Yes
Longevity          Unknown / could be issue   Good          Good
Windows            Yes                        Yes           Yes
OS X               No                         Yes           Yes
Linux              No                         Yes           Yes
Org solution       No                         Somewhat      Upgrade
Location choice    Yes                        No            Yes

Distributed Computing (CPU) power

Not Considered