Chapter 6 Working Climate data
The first step is to set your working directory, where your files are stored. You can do this from the toolbar tab session, and choose ‘Set Working Directory’ from the drop down menu, and navigate to your folder.

setwd
Or use the command setwd(enter your path here)
6.1 Historical climate data
6.1.1 Install Packages and load libraries
#install.packages("R.utils","rnaturalearth","reshape","raster",
#"magrittr","dplyr","lubridate")
library(R.utils)
library(rnaturalearth)
library(reshape)
library(raster)
library(magrittr)
library(dplyr)
library(lubridate)
6.1.2 Load data from cru website
- On your browser, navigate to (https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/). This is where all data variables are stored.
- Select the variable of your choice. In the example below, we will use precipitation and mean temperature data, variable names ‘pre’ & ‘tmp’ respectively.
- On the data folder, find data for the years 1901-2020 (cru_ts4.05.1901.2020.pre.dat.nc.gz/ru_ts4.05.1901.2020.tmp.dat.nc.gz).
- Right click on it and copy link address. Open a code chunk and paste your link in an object( as in ‘myurl’ in below code).
- Call the function
download.file
and enter details as below. - The files are stored as a compressed .gz file, to extract the files, call the function
gunzip
and enter file name with the .gz extension - Run code
# precipitation
#preurl<- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/
#cruts.2103051243.v4.05/pre/cru_ts4.05.1901.2020.pre.dat.nc.gz"
#download.file(preurl, destfile="cru_ts4.05.1901.2020.pre.dat.nc.gz")
#gunzip("cru_ts4.05.1901.2020.pre.dat.nc.gz")
# temperature
#tmpurl<- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/
#cruts.2103051243.v4.05/tmp/cru_ts4.05.1901.2020.tmp.dat.nc.gz"
#download.file(tmpurl, destfile="cru_ts4.05.1901.2020.tmp.dat.nc.gz")
#gunzip("cru_ts4.05.1901.2020.tmp.dat.nc.gz")
6.1.3 Load/define geometry
This is where data to define your area of interest goes.
You may upload your own geometry,such as shapefile or use any of the map services available in r. In this case we will use country boundaries from the package rnaturalearth
<-rnaturalearth::ne_countries(country ='malawi') malawi
6.1.4 Load/Import your data into r
Create new object name and call the raster function stack
Enter the path to your data directory/ where you saved the downloaded file.
To extract data exactly to your desired area on interest, use the raster functions crop
and mask
and the bounding geometry (in our case the Malawi national boundary)
<-stack("C:\\Users\\Makabe\\NAPdown\\cru_ts4.05.1901.2020.pre.dat.nc")
prstack<-stack("C:\\Users\\Makabe\\NAPdown\\cru_ts4.05.1901.2020.tmp.dat.nc")
temp
<-raster::crop(prstack, malawi) # cut out data for malawi
pr_crop<-raster::mask(pr_crop,malawi) # mask data to malawi boundary
pr_mask
<-raster::crop(temp, malawi)
tcrop<-raster::mask(tcrop, malawi) tmask
At this point you can already plot individual precipitation and temperature layers using the basic plot function. Below we plot precipitation and temperature for October 2011
par(mfrow=c(1,2))
plot(prstack$X2011.10.16, main='Precipitation')
plot(temp$X2011.10.16, main='Temperature')
6.1.5 Convert the raster stack created above into a dataframe and format date values
This step extracts raster values at each station (depicted by its latitude and longitude) and stores them as a table.
The melt
function converts the data from a wide to long table format.
The date values in our data are stored as and within text and the step below is to extract the dates from this text and store them in a new column.
We also, create two more columns to separate Year and Month
Caution!: The process in this code chunk involves modifying the data structure and thus extra attention is required. If you need to run the code or part of it more than once, it is strongly recommended to run the whole code chunk rather than in parts.This is to avoid duplication and conflict in executing commands.
<-as.data.frame(pr_mask, xy=TRUE, na.rm=TRUE)%>%
prdfmelt(id.vars=c('x','y')) # create dataframe
<-as.data.frame(tmask, xy=TRUE, na.rm=TRUE)%>%
tmpdfmelt(id.vars=c('x', 'y'))
<-substr(prdf$variable, 2,11) # extract date values from the dataframe
Date$Date<-Date # add date column to dataframe
prdf<-substr(Date,1,4)
Year<-substr(Date,6,7)
Month<-cbind(prdf, Year, Month)
prdf
colnames(prdf)[colnames(prdf)=="value"]<-"pr" # change column label
$Date<-substr(tmpdf$variable, 2,11)
tmpdf$Year<-substr(tmpdf$Date,1,4)
tmpdf$Month<-substr(tmpdf$Date,6,7)
tmpdf
colnames(tmpdf)[colnames(tmpdf)=="value"]<-"tmean"
6.1.6 Plot Monthly data
Our data is now ready for some statistical analysis. we may plot monthly data as below;
# summarise data by month and calculate monthly mean
<-prdf%>%dplyr::filter(Year>=1981)%>% group_by(Month)%>%
pr_monthlysummarise(across(contains("pr"), ~mean(pr)))
# rearrange drawing order to plot from July to June
$Month<-factor(pr_monthly$Month,
pr_monthlylevels =c('07','08','09','10','11','12',
'01','02','03','04','05','06'))
<-tmpdf%>%dplyr::filter(Year>=1981)%>%group_by(Month)%>%
temp_monthlysummarise('tmean'=mean(tmean))
$Month<-factor(temp_monthly$Month,
temp_monthlylevels = c('07','08','09','10','11','12',
'01','02','03','04','05','06'))
# combine the pr and temp data frames
<-cbind(pr_monthly,temp_monthly)
pr_tmp<-pr_tmp[,-3] # remove duplicate column
pr_tmp
$Month<-month.abb
pr_tmp$Month<-factor(pr_tmp$Month,
pr_tmplevels = c("Jul","Aug","Sep","Oct","Nov",
"Dec","Jan","Feb","Mar","Apr","May" ,"Jun"))
# define properties for secondary axis (only when plotting 2 variables
#in the same plot area)
<-list(overlaying = "y",
tyside = "right",
title = "Temperature (°C)",
autotick = FALSE,
dtick = 8,
range=c(15,30)
)
# plot
plot_ly(type= 'bar', data= pr_tmp, x= ~Month, y= ~pr, name = 'Precipitation')%>%
add_lines(x= ~Month, y= ~tmean,name= 'Temperature', yaxis='y2')%>%
add_markers(x= ~Month,y= ~tmean,color='#D21919',yaxis='y2', name='Temperature',
showlegend=FALSE)%>%
layout(legend=list(orientation='h', y=-0.13,x=0.65),
yaxis=list(title='Precipitation (mm)',showticklables=F,
tickfont=list(size=16)),width=700, height=450,
title='Malawi \n (Mean 1981-2020)', yaxis2=ty)%>%
subplot(titleX = TRUE, titleY = TRUE)
6.1.7 Plot Annual data
or the annual means as below;
- Subset the data to get only values from 1981 on wards (or whichever year is preferred)
- Then, group the data by years
- Calculate mean values for each year
- Combine the data frames
- Plot time series object
<-prdf%>%dplyr::filter(Year>=1981)%>% group_by(Year)%>%
pr_annsummarise(across(contains("pr"), ~mean(pr)))
<-tmpdf%>%dplyr::filter(Year>=1981)%>%group_by(Year)%>%
temp_annsummarise('tmean'=mean(tmean))
# combine the pr and temp data frames
<-cbind(pr_ann,temp_ann)
pr.tmp<-pr.tmp[,-3] # remove duplicate column
pr.tmp
# define properties for secondary axis
<-list(overlaying = "y",
tyside = "right",
title = "Temperature (°C)",
autotick = FALSE,
dtick = 1,
showgrid=F,
range=c(20,25)
)
# plot
plot_ly(type= 'bar', data= pr.tmp, x= ~Year, y= ~pr, name = 'Precipitation')%>%
add_lines(x= ~Year, y= ~tmean, mode = 'lines+markers',name= 'Temperature',
yaxis='y2')%>%
layout(legend=list(orientation='h', y=-0.13,x=0.7),
yaxis=list(title='Precipitation (mm)',
showticklables=F, tickfont=list(size=14)),width=700,
height=450, title='Malawi \n (Mean 1981-2020)',
yaxis2=ty, xaxis=list(tickangle=270,tickfont=list(size=14)))
The two preceding plots are twin-plots, meaning two graphics have been combined in a single plot area. To plot just one you may be choose one of the data frames, say temperature, and plot it on its own. See below
%>%filter(Year>2000)%>% plotly::plot_ly()%>%
temp_annadd_lines(x= ~Year, y= ~tmean, color='red', showlegend=FALSE)%>%
add_markers(x= ~Year, y= ~tmean, color='red', showlegend=FALSE,
name='Temperature')%>%
layout(yaxis=list(title='Mean Temerature (°C)',tickfont=list(size=16)),
xaxis=list(tickangle=270,tickfont=list(size=16)))