Introduction to raster data

Goals:

Understand the difference between raster and vector data
Introduce raster parameters - dim(), crs(), res(), extent(), names()
Use basic raster operations - subset(), stack(), calc(), extract(), crop(), mask(), cellStats()
Visualization of rasters - plot(), col =, xlim =, ylim =
Access and download data - getData()

Getting started

Let’s install and load the libraries we’ll need:

if(!require(raster)){install.packages("raster")}

## Loading required package: raster

## Loading required package: sp

if(!require(rgdal)){install.packages("rgdal")}

## Loading required package: rgdal

## rgdal: version: 1.4-8, (SVN revision 845)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 5.2.0, September 15th, 2018, [PJ_VERSION: 520]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/proj
##  Linking to sp version: 1.3-2

if(!require(here)){install.packages("here")}

## Loading required package: here

## here() starts at /Users/christianjohn/Documents/Tutorials/rasterIntro

library(raster) # For working with raster datasets
library(rgdal)  # For working with vector datsets
library(here)   # For dealing with filepath annoyances

Let’s start off with two sets of data. First, we’ll need to download a vector object with spatial information about California’s shape, and second, some historical temperature data in raster format. To download the vector data, we can use download.file() to directly access a file from the internet, and then we can unzip it using the unzip() function. We will load the Spatial* data into R using readOGR() from the rgdal library.

# First download the shapefile from California's geospatial data repository
download.file("https://data.ca.gov/dataset/e212e397-1277-4df3-8c22-40721b095f33/resource/3db1e426-fb51-44f5-82d5-a54d7c6e188b/download/ca-state-boundary.zip",
              destfile = paste0(here(), "/CA_outline.zip"))

# Then unzip it (it was compressed for the download)
unzip(paste0(here(), "/CA_outline.zip"))

# Open the shape using rgdal::readOGR()
CA = readOGR(dsn = here(),
             layer = "CA_State_TIGER2016")

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/christianjohn/Documents/Tutorials/rasterIntro", layer: "CA_State_TIGER2016"
## with 1 features
## It has 14 fields
## Integer64 fields read as strings:  ALAND AWATER

# Always look at your data!
plot(CA)

We’ll come back to the vector data later. In the meantime, let’s access some raster data and start playing around with it. First, download historical (also called “current” = 1960-1990) maximum monthly temperatures from the worldclim data repository, using the getData() function from the raster package. Each layer of the raster stack is the maximum temperature for a month (nLayers = 12). The resolution argument (res) tells getData() that we want coarse data (resolution = 10 minutes means that a pixel in Sacramento is ~11.5 miles wide).

# Download max temp data from worldclim with resolution = 10 degree-minutes (1/6 of a degree)
maxTempHistorical = getData(name = 'worldclim', 
                         var='tmax', 
                         res=10)
# Look at the data
maxTempHistorical

## class      : RasterStack 
## dimensions : 900, 2160, 1944000, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667  (x, y)
## extent     : -180, 180, -60, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## names      : tmax1, tmax2, tmax3, tmax4, tmax5, tmax6, tmax7, tmax8, tmax9, tmax10, tmax11, tmax12 
## min values :  -478,  -421,  -422,  -335,  -190,   -94,   -59,   -76,  -153,   -265,   -373,   -452 
## max values :   418,   414,   420,   429,   441,   467,   489,   474,   441,    401,    401,    413

Raster properties

Raster* objects include rasters, rasterStacks, and rasterBricks. A raster has one layer, while rasterStacks and rasterBricks can have more than one layer. We can use several simple functions to explicitly ask R about different properties of the raster dataset:

# How many rows and columns (and optionally layers) are in the object?
dim(maxTempHistorical)

## [1]  900 2160   12

# How big are the pixels?
res(maxTempHistorical)

## [1] 0.1666667 0.1666667

# Where are the geographic limits of this dataset?
extent(maxTempHistorical)

## class      : Extent 
## xmin       : -180 
## xmax       : 180 
## ymin       : -60 
## ymax       : 90

# How are these data being projected from a 3D surface onto a 2D map?
crs(maxTempHistorical)

## CRS arguments:
##  +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

# What are the names of the layers in the object?
names(maxTempHistorical)

##  [1] "tmax1"  "tmax2"  "tmax3"  "tmax4"  "tmax5"  "tmax6"  "tmax7"  "tmax8" 
##  [9] "tmax9"  "tmax10" "tmax11" "tmax12"

An important note: You may have noticed that the res() output is 1.6666… But 1.67 whats? 1.67 meters? 1.67 elephant toes? The answer lies in the projection of the dataset. Because we downloaded worldclim data at a 10-minute scale, and there are 60 minutes in a degree; 10/60 = 1/6 = 0.167. So, in this case, the resolution of the data is 0.167 degrees.

If you have an object with crs() = “…longlat…” you’re measuring your data in degrees. Another common crs is UTM (California is in UTM zones 10 and 11), which is measured in meters! Comparing the resolution of data in longlat and UTM units is like comparing a space shuttle’s speed in miles per hour and kilometers per hour.

Now that we know what the deal is with the properties of our raster dataset, let’s subset the data so we’re dealing with just one month’s worth of data. Subsetting can be done in a few ways, two of which I will show here. Suppose we are interested in the maximum historical January temperature. We can subset the 12-layer rasterStack using the name of the layer, or the index of the layer:

Subsetting raster data

# We can subset a layer of a raster stack in two ways:
# 1, using the `subset()` function
# 2, using indexing

# Method 1: Can use layer name or numerical index.
names(maxTempHistorical)

##  [1] "tmax1"  "tmax2"  "tmax3"  "tmax4"  "tmax5"  "tmax6"  "tmax7"  "tmax8" 
##  [9] "tmax9"  "tmax10" "tmax11" "tmax12"

maxJanHisto = subset(maxTempHistorical, 
                     subset = "tmax1")

# Method 2: Also works for layer name or numerical index
sample2 = maxTempHistorical[[1]]

# Check that both methods do the same thing:
all.equal(maxJanHisto, sample2) # TRUE

## [1] TRUE

Plotting raster data

Now that we’re down to just one Raster layer, let’s take a look at it. We can use standard base plotting functions from R to visualize raster* datasets. Note that if we call plot() on a raster dataset with many layers, each layer is plotted. And remember, the temperature is stored in units of 0.1 °C.

# Using typical graphics functions in R we can visualize raster datasets.
plot(maxTempHistorical)

plot(maxJanHisto, 
     main = "Historical max Jan temps")

We can also adjust the color ramp and plotting window, as one would with other base plotting endeavors.

# We can specify a color ramp using the `col` argument:
plot(maxJanHisto, 
     col = topo.colors(100), 
     main = "Historical max Jan temps")

# We can zoom in on a region using the xlim and ylim arguments:
plot(maxJanHisto,
     xlim = c(-180,-50),
     ylim = c(10,90), 
     main = "Historical max Jan temps, North America")

Cropping and masking raster data

Often, we only want to look at a specific region within a raster dataset. This can be accomplished by cropping and/or masking, and is akin to spatially subsetting our data. The raster functions for this are crop() and mask(). To crop or mask a raster, we will need a raster and a secondary object, typically a Spatial* object. The raster and secondary objects must be projected using the same crs arguments. Now we’ll return to the California shapefile we downloaded a while ago. Because the Spatial* object is not projected in the same way as the raster object, we’ll need to transform it accordingly.

# CA is a SpatialPolygonsDataFrame
proj4string(CA)

## [1] "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs"

proj4string(maxJanHisto)

## [1] "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"

# Since the projections aren't the same, we need to transform a dataset. Reprojecting rasters can degrade the data quality (and is computationally expensive), so let's transform the California shapefile
CA_proj = spTransform(CA, CRSobj = crs(maxJanHisto))

# Check the transformation by plotting the raster and vector data together
plot(maxJanHisto)
lines(CA_proj)

# Now, we can finally do some subsetting! Let's compare cropping and masking.
# First, crop:
maxJanCaliCrop = crop(maxJanHisto,
                      CA_proj)
# Next, mask:
maxJanCaliMask = mask(maxJanHisto,
                      CA_proj)

# Visualize the output of these functions
plot(maxJanCaliCrop, main = "cropped dataset")

plot(maxJanCaliMask, main = "masked dataset")

So, what’s the difference?

# The answer lies in the extent
extent(maxJanCaliCrop)

## class      : Extent 
## xmin       : -124.5 
## xmax       : -114.1667 
## ymin       : 32.5 
## ymax       : 42

extent(maxJanCaliMask)

## class      : Extent 
## xmin       : -180 
## xmax       : 180 
## ymin       : -60 
## ymax       : 90

Clearly, crop() and mask() do different things. If we want to crop the raster data down to the extent of the shapefile, and retain only data within the shape, we’ll need to use both crop() and mask() (unless we’re working with a rectangle).

# If we want to crop to the extent, *and* mask to the bounds of the region, we need to use both `crop()` and `mask()`:
maxJanCaliClip = crop(maxJanHisto,
                      CA_proj)
maxJanCaliClip = mask(maxJanCaliClip,
                      CA_proj)
plot(maxJanCaliClip)
lines(CA_proj)

Raster calculations

Now, let’s use raster calculations to compute values within and across rasters. Suppose we want to visualize Jan temps with meaningful values. Remember, temperature is reported in 0.1-degree Celcius units. So, let’s divide by 10 and visualize:

plot(maxJanHisto,
     main = "Raw data (units = 0.1 deg C)")

janRealUnits = calc(maxJanHisto,
                    fun = function(x){x/10})
plot(janRealUnits,
     main = "Scaled data (units = 1 deg C)")

In some cases, such as the example above, we can simplify the approach by vectorizing it. In that case, we don’t even need to use the calc() function!

# Vectorized approach:
plot(maxJanHisto/10,
     main = "Vectorized method (units = 1 deg C)")

Now suppose we want to estimate the mean January temperature across space. To do that, we’ll also need a tmin raster layer. The var argument can be set to tmin for minimum monthly temperatures. While it isn’t perfect, a rough estimate for mean temperature can be computed by taking the mean of the tmax and tmin values.

# Download the min historical temp data
minTempHisto = getData(name = 'worldclim', 
                            var='tmin', 
                            res=10)
# Select the layer with January
minJanHisto = minTempHisto[[1]]
# Visualize - do these data make sense?
plot(minJanHisto)

To find the mean, we’ll need to use calc() again, this time specifying fun = mean in the arguments:

# Next, stack the raster datsets we want to use into one object using the `stack()` function:
janHistoStack = stack(maxJanHisto,
                      minJanHisto)

# Finally, we can calculate a mean value using the `calc()` function, and specifying that we want to calculate a mean:
meanJanHisto = calc(janHistoStack,
                    fun = mean)

# Look at the data
plot(meanJanHisto)

A practical application

At this point, we’ve learned how to access raster datasets using the getData() function in the raster package, and perform a few simple operations with rasters. Let’s make things a bit more complex. We can now answer the following question: how much warming can we expect by the year 2070 under the current Carbon emissions regime?

To answer: Download projected maximum and minimum monthly temperatures under a climate warming scenario. We’ll use the RCP 8.5 (“business as usual”) in the year 2070. Predicted climate data came from the Coupled Model Intercomparisin Project (CMIP5) - here we’re just using one model from that project (AC). We’ll use the same resolution as the other data we downloaded, 10-minutes.

# Download maximum temp data
tmax85 = getData(name = 'CMIP5', 
                 var='tmax', 
                 res=10, 
                 rcp=85, 
                 model='AC', 
                 year=70)

### And minimum temp data
tmin85 = getData(name = 'CMIP5', 
                 var='tmin', 
                 res=10, 
                 rcp=85, 
                 model='AC', 
                 year=70)

Let’s see how much warmer we can expect mean January temperatures to be in 2070, under the RCP 8.5 scenario.

# Stack the layers of interest into a single object
jan85 = stack(tmax85[[1]],
              tmin85[[1]])

# Use `calc()` to find the projected mean January temperature under RCP 8.5
meanJan85 = calc(jan85,
                 fun = mean)

# Find the difference between predicted and historical January temperatures using an arithmetical approach
diffJan85 = meanJan85 - meanJanHisto

# Look at the difference
plot(diffJan85, main = "Change in Jan temp, historical to 2070")

Where do we see the greatest warming? How much warmer do we expect those regions to be compared to the 1960-1990 baseline?

Suppose we want to know what the warming situation is for a particular location - say, Sacramento. To do that, we’ll use the extract() function. We need to create a matrix that indicates Sacramento’s location (in the same units as the raster), and then extract.

# First, identify where Sacramento is 
# NOTE: be sure to use appropriate units given the CRS arguments!!
sacLon = -121.4944  # Longitude = east/west
sacLat = 38.5816    # Latitude = north/south

# We need to convert the coordinates for Sacramento into matrix form to use the `extract()` function
sacMatrix =  matrix(c(sacLon, sacLat),
                    ncol = 2)

# Use the `extract()` function to identify the value at a specific location
sacWarming = extract(diffJan85, y = sacMatrix)
sacWarming

## [1] 31.5

How much is the mean January temperature in Sacramento expected to increase by 2070?

Raster statistics

We can get broad-scale aggregated summary statistics from rasters using the cellStats() function and specifying a summarizing function. Available functions include mean, standard deviation, min, etc. See documentation details for more.

# Use cellStats to summarize entire raster
meanTempJan = cellStats(meanJanHisto,
                        stat = "mean")

# Print result
meanTempJan

## [1] -18.68286

We can also use cellStats for raster bricks, returning a value for each layer. Let’s use California temperature as an example. First, crop and mask the full historical max temp raster stack to CA.

# Crop and mask the data
maxHistoCaliClip = crop(maxTempHistorical,
                        CA_proj)
maxHistoCaliClip = mask(maxHistoCaliClip,
                        CA_proj)

# Find the mean value of each layer
avgMaxtempCA = cellStats(maxHistoCaliClip,
                         stat = "mean")

# Visualize
plot(avgMaxtempCA/10, 
     type = "b",
     main = "Average maximum temperature by month\nin California, 1960-1990",
     ylab = "Temperature (deg C)",
     xlab = "Month")

Challenge

Compare predicted temperature in a month of your choice between the two different RCP scenarios, RCP 4.5 and RCP 8.5. How does Carbon mitigation affect expected warming? Does mitigation affect warming uniformly over space?

Hint: First, develop a workflow with comments. What data do you need to answer the question? What operations need to be performed on the data? Are there any unit corrections to consider?