---
title: "Climate Metrics from daily weather data"
---
[ The R Script associated with this page is available here](scripts/08_WeatherData.R). Download this file and open it (or copy-paste into a new script) with RStudio so you can follow along.
# Summary
* Access and work with station weather data from Global Historical Climate Network (GHCN)
* Explore options for plotting timeseries
* Trend analysis
* Compute Climate Extremes
# Climate Metrics
## Climate Metrics: ClimdEX
Indices representing extreme aspects of climate derived from daily data:
Climate Change Research Centre (CCRC) at University of New South Wales (UNSW) ([climdex.org](http://www.climdex.org)).
### 27 Core indices
For example:
* **FD** Number of frost days: Annual count of days when TN (daily minimum temperature) < 0C.
* **SU** Number of summer days: Annual count of days when TX (daily maximum temperature) > 25C.
* **ID** Number of icing days: Annual count of days when TX (daily maximum temperature) < 0C.
* **TR** Number of tropical nights: Annual count of days when TN (daily minimum temperature) > 20C.
* **GSL** Growing season length: Annual (1st Jan to 31st Dec in Northern Hemisphere (NH), 1st July to 30th June in Southern Hemisphere (SH)) count between first span of at least 6 days with daily mean temperature TG>5C and first span after July 1st (Jan 1st in SH) of 6 days with TG<5C.
* **TXx** Monthly maximum value of daily maximum temperature
* **TN10p** Percentage of days when TN < 10th percentile
* **Rx5day** Monthly maximum consecutive 5-day precipitation
* **SDII** Simple pricipitation intensity index
# Weather Data
### Climate Data Online

### GHCN

## Options for downloading data
### `FedData` package
* National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS)
* National Hydrography Dataset (USGS)
* Soil Survey Geographic (SSURGO) database
* International Tree Ring Data Bank.
* *Global Historical Climatology Network* (GHCN)
### NOAA API

[National Climatic Data Center application programming interface (API)]( http://www.ncdc.noaa.gov/cdo-web/webservices/v2).
### `rNOAA` package
Handles downloading data directly from NOAA APIv2.
* `buoy_*` NOAA Buoy data from the National Buoy Data Center
* `ghcnd_*` GHCND daily data from NOAA
* `isd_*` ISD/ISH data from NOAA
* `homr_*` Historical Observing Metadata Repository
* `ncdc_*` NOAA National Climatic Data Center (NCDC)
* `seaice` Sea ice
* `storm_` Storms (IBTrACS)
* `swdi` Severe Weather Data Inventory (SWDI)
* `tornadoes` From the NOAA Storm Prediction Center
---
### Libraries
```r
library(raster)
library(sp)
library(rgdal)
library(ggplot2)
library(ggmap)
library(dplyr)
library(tidyr)
library(maps)
library(scales)
# New Packages
library(rnoaa)
library(climdex.pcic)
library(zoo)
library(reshape2)
library(broom)
```
### Station locations
Download the GHCN station inventory with `ghcnd_stations()`.
```r
datadir="data"
st = ghcnd_stations()
## Optionally, save it to disk
# write.csv(st,file.path(datadir,"st.csv"))
## If internet fails, load the file from disk using:
# st=read.csv(file.path(datadir,"st.csv"))
```
### GHCND Variables
5 core values:
* **PRCP** Precipitation (tenths of mm)
* **SNOW** Snowfall (mm)
* **SNWD** Snow depth (mm)
* **TMAX** Maximum temperature
* **TMIN** Minimum temperature
And ~50 others! For example:
* **ACMC** Average cloudiness midnight to midnight from 30-second ceilometer
* **AWND** Average daily wind speed
* **FMTM** Time of fastest mile or fastest 1-minute wind
* **MDSF** Multiday snowfall total
### `filter()` to temperature and precipitation
```r
st=dplyr::filter(st,element%in%c("TMAX","TMIN","PRCP"))
```
### Map GHCND stations
First, get a global country polygon
```r
worldmap=map_data("world")
```
Plot all stations:
```r
ggplot(data=st,aes(y=latitude,x=longitude)) +
facet_grid(element~.)+
annotation_map(map=worldmap,size=.1,fill="grey",colour="black")+
geom_point(size=.1,col="red")+
coord_equal()
```

It's hard to see all the points, let's bin them...
```r
ggplot(st,aes(y=latitude,x=longitude)) +
annotation_map(map=worldmap,size=.1,fill="grey",colour="black")+
facet_grid(element~.)+
stat_bin2d(bins=100)+
scale_fill_distiller(palette="YlOrRd",trans="log",direction=-1,
breaks = c(1,10,100,1000))+
coord_equal()
```

## Your turn
Produce a binned map (like above) with the following modifications:
* include only stations with a data record that starts before 1950 and ends after 2000 (keeping only complete records during that time).
* include only `tmax`
## Download daily data from GHCN
`ghcnd()` will download a `.dly` file for a particular station. But how to choose?
### `geocode` in ggmap package useful for geocoding place names
Geocodes a location (find latitude and longitude) using either (1) the Data Science Toolkit (http://www.datasciencetoolkit.org/about) or (2) Google Maps.
```r
geocode("University at Buffalo, NY")
```
```
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=University%20at%20Buffalo%2C%20NY
```
```
## lon lat
## 1 -78.78897 43.00081
```
However, you have to be careful:
```r
geocode("My Grandma's house")
```
```
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=My%20Grandma%27s%20house
```
```
## lon lat
## 1 -97.18744 32.82241
```
But this is pretty safe for well known places.
```r
coords=as.matrix(geocode("Buffalo, NY"))
```
```
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Buffalo%2C%20NY
```
```r
coords
```
```
## lon lat
## [1,] -78.87837 42.88645
```
Now use that location to spatially filter stations with a rectangular box.
```r
dplyr::filter(st,
grepl("BUFFALO",name)&
between(latitude,coords[2]-1,coords[2]+1) &
between(longitude,coords[1]-1,coords[1]+1)&
element=="TMAX")
```
```
## # A tibble: 3 x 11
## id latitude longitude elevation state name gsn_flag wmo_id element
##
## 1 USC0… 42.9 -78.9 -1000. NY BUFF… "" "" TMAX
## 2 USC0… 42.9 -78.9 177. NY BUFF… "" "" TMAX
## 3 USW0… 42.9 -78.7 211. NY BUFF… "" HCN 7… TMAX
## # ... with 2 more variables: first_year , last_year
```
You could also spatially filter using `over()` in sp package...
With the station ID, we can now download daily data from NOAA.
```r
d=meteo_tidy_ghcnd("USW00014733",
var = c("TMAX","TMIN","PRCP"),
keep_flags=T)
head(d)
```
```
## # A tibble: 6 x 14
## id date mflag_prcp mflag_tmax mflag_tmin prcp qflag_prcp
##
## 1 USW0… 1938-05-01 T " " " " 0 " "
## 2 USW0… 1938-05-02 T " " " " 0 " "
## 3 USW0… 1938-05-03 " " " " " " 25 " "
## 4 USW0… 1938-05-04 " " " " " " 112 " "
## 5 USW0… 1938-05-05 T " " " " 0 " "
## 6 USW0… 1938-05-06 " " " " " " 64 " "
## # ... with 7 more variables: qflag_tmax , qflag_tmin ,
## # sflag_prcp , sflag_tmax , sflag_tmin , tmax ,
## # tmin
```
See [CDO Daily Description](http://www1.ncdc.noaa.gov/pub/data/cdo/documentation/GHCND_documentation.pdf) and raw [GHCND metadata](http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt) for more details. If you want to download multiple stations at once, check out `meteo_pull_monitors()`
### Quality Control: MFLAG
Measurement Flag/Attribute
* **Blank** no measurement information applicable
* **B** precipitation total formed from two twelve-hour totals
* **H** represents highest or lowest hourly temperature (TMAX or TMIN) or average of hourly values (TAVG)
* **K** converted from knots
* ...
See [CDO Description](http://www1.ncdc.noaa.gov/pub/data/cdo/documentation/GHCND_documentation.pdf)
### Quality Control: QFLAG
* **Blank** did not fail any quality assurance check
* **D** failed duplicate check
* **G** failed gap check
* **K** failed streak/frequent-value check
* **N** failed naught check
* **O** failed climatological outlier check
* **S** failed spatial consistency check
* **T** failed temporal consistency check
* **W** temperature too warm for snow
* ...
See [CDO Description](http://www1.ncdc.noaa.gov/pub/data/cdo/documentation/GHCND_documentation.pdf)
### Quality Control: SFLAG
Indicates the source of the data...
## Summarize QC flags
Summarize the QC flags. How many of which type are there? Should we be more conservative?
```r
table(d$qflag_tmax)
```
```
##
## G I
## 29027 2 10
```
```r
table(d$qflag_tmin)
```
```
##
## G I S
## 29026 2 7 4
```
```r
table(d$qflag_prcp)
```
```
##
## G
## 29038 1
```
* **T** failed temporal consistency check
#### Filter with QC data and change units
```r
d_filtered=d%>%
mutate(tmax=ifelse(qflag_tmax!=" "|tmax==-9999,NA,tmax/10))%>% # convert to degrees C
mutate(tmin=ifelse(qflag_tmin!=" "|tmin==-9999,NA,tmin/10))%>% # convert to degrees C
mutate(prcp=ifelse(qflag_tmin!=" "|prcp==-9999,NA,prcp))%>% # convert to degrees C
arrange(date)
```
Plot temperatures
```r
ggplot(d_filtered,
aes(y=tmax,x=date))+
geom_line(col="red")
```
```
## Warning: Removed 11 rows containing missing values (geom_path).
```

Limit to a few years and plot the daily range and average temperatures.
```r
d_filtered_recent=filter(d_filtered,date>as.Date("2013-01-01"))
ggplot(d_filtered_recent,
aes(ymax=tmax,ymin=tmin,x=date))+
geom_ribbon(col="grey",fill="grey")+
geom_line(aes(y=(tmax+tmin)/2),col="red")
```
```
## Warning: Removed 11 rows containing missing values (geom_path).
```

### Zoo package for rolling functions
Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations)
* `rollmean()`: Rolling mean
* `rollsum()`: Rolling sum
* `rollapply()`: Custom functions
Use rollmean to calculate a rolling 60-day average.
* `align` whether the index of the result should be left- or right-aligned or centered
```r
d_rollmean = d_filtered_recent %>%
arrange(date) %>%
mutate(tmax.60 = rollmean(x = tmax, 60, align = "center", fill = NA),
tmax.b60 = rollmean(x = tmax, 60, align = "right", fill = NA))
```
```r
d_rollmean%>%
ggplot(aes(ymax=tmax,ymin=tmin,x=date))+
geom_ribbon(fill="grey")+
geom_line(aes(y=(tmin+tmax)/2),col=grey(0.4),size=.5)+
geom_line(aes(y=tmax.60),col="red")+
geom_line(aes(y=tmax.b60),col="darkred")
```
```
## Warning: Removed 11 rows containing missing values (geom_path).
```
```
## Warning: Removed 70 rows containing missing values (geom_path).
## Warning: Removed 70 rows containing missing values (geom_path).
```

## Your Turn
Plot a 30-day rolling "right" aligned sum of precipitation.