Presentation-quality maps with Google Earth Engine and R

Mariposa County, CA - home of Yosemite Valley and the impressive Half Dome geologic feature. The transition in vegetation communities between the Central Valley (southwest) and the high Sierra (northeast) is significant enough to be clearly observed with Landsat imagery.

I’ve noticed that many of the broad-scale analyses I run using Google Earth Engine frequently lead to error messages like “Too many pixels in the region” and “Computation timed out.” Often, the workaround for quick visualization is setting a scaling factor that doesn’t make computing requirements on Google’s end go nuts. That means, when we look at graphs through ui.Chart.image everything is analyzed across a scaled version of the original raster product.

So, how do we get Google to process and export large RGB datasets and not run into scaling issues along the way? In this case, less is more, every time. Using ui.Chart and Map.addLayer() is where we run into problems. Those are great functions if you want to check your progress along the way, but once you know you have what you want, get rid of all the steps that don’t directly work into the data pipeline of finding imagery, processing imagery, and exporting imagery. Even if mapping steps don’t lead to errors, they do take a long time for large datasets.

Perhaps the single largest consideration you’ll make any time you work with raster datasets is the resolution of your data. In this case, we’re making plots for presentations. A 4k monitor or projector is 3180 pixels wide by 2160 pixels tall. So, there’s really no reason to use data at a finer scale than this if we’re only making a map to give our audience some context. That means, throughout the data pipeline below, I limit the maxpixels to 10 million (3180x2160 is just over 8 million pixels). Don’t bother downloading higher resolution data than that for your ROI if it’s going to end up in a slideshow - but note that for poster printing you may want finer data.

Let’s take Yolo County as an example. If I want to use a high-definition image of Yolo County in a presentation (and I do), the workflow should include:

Identify region of interest
Create masks to remove clouds and water
Select data within a timeframe of interest
Apply masks
Calculate target pixel values
Export image

Disclaimer: You need to be a registered Google Earth Engine user to do this. They have a request system that in my experience is very quick to respond, but be aware that it may take a few days to register.

Note: All code for this workflow can be found pre-compiled (and data pre-downloaded) at this link. Here’s a link to the full script in the Earth Engine code editor.

Part 1: Processing and downloading data with Google Earth Engine

Region of Interest

So, to take things step-by-step: First, we’ll be using javaScript - it was new to me when I started using Earth Engine. I found Google’s tutorials super helpful for navigating a new language. Begin by declaring the region of interest (in this case, Yolo County).

ROI definition

// This ROI is for Yolo county.
var ROI = ee.FeatureCollection("TIGER/2018/Counties")
  .filter(ee.Filter.eq("NAME", "Yolo"));
// Choose the mapping space
Map.centerObject(ROI, 9);
Map.addLayer(ROI, {}, 'Region of interest');

Define functions

Then, define functions to identify clouds and calculate the normalized difference water index , which will be used as a mask.

Function definitions

// This function masks cloudy pixels.
// Function to mask clouds using the quality band of Landsat 8.
var maskClouds = function(image) {
  var qa = image.select('pixel_qa');
  /// Check that the cloud bit is off.
  // See https://www.usgs.gov/land-resources/nli/landsat/landsat-collection-1-level-1-quality-assessment-band
  var mask1 = qa.bitwiseAnd(1 << 3).eq(0);
  var mask2 = qa.bitwiseAnd(1 << 5).eq(0);
  return image.updateMask(mask1).updateMask(mask2);
};


// This function adds NDWI (water index) band to a Landsat 8 image.
var addNDWI = function(image) {
  return image
    .addBands(image
      .normalizedDifference(['B3', 'B5'])
      .rename('NDWI'));
};

Find raw data

Now, we can move on to accessing the data we want. Here, I map the functions from up above to the image collection, removing data outside the region of interest, masking out cloudy images, and creating an NDWI band for each image.

Data acquisition

// Load a raw Landsat scene and display it.
var raw = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
  .filterDate("2020-01-01", "2020-12-31")
  .filterBounds(ROI)
  .map(maskClouds)
  .map(addNDWI);
// Print info about the collection to the console
print("Raw image collection", raw);
// Add raw ImageCollection to mapping panel
// Note that plotting an ImageCollection only shows 
// first layer from Collection in space
Map.addLayer(raw, 
             {bands: ['B4', 'B3', 'B2'],
              min: 0,
              max: 2000}, 
             'Raw image collection', 
             false);

Data wrangling

Calculate the target pixel value and mask out regions classified as water. Here, I want to use the median value from each pixel to produce a median California. Earth Engine has a good tutorial on creating a greenest pixel composite, which also requires NDVI band - but I don’t want a greenest pixel composite so won’t be using it here. For more on compositing check out this link.

Data wrangling

//// Image reduction (temporal)
//// Calculate median value across time series within ROI
var medImg = raw.median();
// Print to console
print("Median image", medImg);
// Visualize in mapping panel
Map.addLayer(medImg, {bands: ['B4', 'B3', 'B2'], min: 0, max: 2000}, 'Median raw image', false);


//// Image masking (spatial)
var medImgClip = medImg.clip(ROI);
//// Select and visualize water mask
var ndwi = medImgClip.select("NDWI");
// Define viz parameters
var ndwiViz = {min: -1, max: 1, palette: ['00FFFF', '0000FF']};
var rgbViz = {bands: ['B4', 'B3', 'B2'], min: 0, max: 2000};
// Visualize NDWI
Map.addLayer(ndwi, ndwiViz, 'NDWI', false);
// Generate binary water mask image
var waterMask = ndwi.lte(0);
print(waterMask);

// Remove areas from RGB image that are "watery"
var landOnly = medImgClip.updateMask(waterMask);
Map.addLayer(landOnly, rgbViz, 'RGB, Land only', false);
// Remove areas from NDWI image that are land
var waterOnly = ndwi.updateMask(ndwi.gte(0));
Map.addLayer(waterOnly, ndwiViz, 'Water mask', false);

Finally, create an RGB composite of the image and export it to Google Drive. It’s going to have a large file size and consequently it’ll take a long time to download, so grab a good book.

Visualization and export

// Apply visualization parameters to imagery
var RGB_masked = landOnly.visualize(rgbViz);
var RGB_roi = medImgClip.visualize(rgbViz);
Map.addLayer(RGB_roi, {}, 'RGB including water', false);


// Export the image, specifying scale and region. Use 
// as fine as product resolution for scale in SMALL ROI's
// BUT DO NOT USE THAT SCALE FOR CA STATE 
// IT WILL BE TOO MUCH, MAN!

// AS LONG AS THIS IS FOR DATA VIZ AND NOT EXTRACTION:
// Note that because a 4k display is 3840x2160 pixels
// (which equals 8,294,400 pixels total) there is
// no reason to set maxPixels on the export > 1e8
Export.image.toDrive({
  image: RGB_roi,
  description: 'Yolo_LC08_RGB_2020_SR',
  folder: 'mapTime',
  scale: 90,
  region: ROI,
  maxPixels: 1e8});

// Export the polygon
Export.table.toDrive({
  collection: ROI,
  description: 'Yolo_ROI',
  folder: 'mapTime',
  fileFormat: "SHP"});

Confirm the task and create a directory to export the raster image. Depending on the size of your image, Earth Engine may break it up into tiles, but don’t worry: they’re georeferenced, so you can mosaic everything together on your own computer quite easily.

The whole of California. The level-3 Sierra Nevada ecoregion. Notice that none of the maps here include a scale bar: If you decide to include some more material for your audience to orient themselves, add some frame of reference using <code>raster::scalebar()</code>, <code>SDMTools::Scalebar()</code>, or the graticule package.

Part 2: Raster plotting with R

Eventually, the task will be marked completed. If you’ve made it this far, there should be a file (or series of files) in the new subdirectory you just created on Google Drive. Download the entire subdirectory, and proceed with final raster processing. To do this in R, load the relevant libraries, load relevant shapefiles, choose an appropriate coordinate window, and plot the data. For US state borders, I used the USAboundaries dataset. Raster functions in R aren’t super efficient, so one of these days I may try to put a similar approach together for Python, but for the time being the R approach works well enough for my needs.

Plot and save with R and ggplot

## Curious about TOA vs SR? Check out this post for more:
# https://gis.stackexchange.com/questions/304180/what-are-the-min-and-max-values-of-map-addlayer-on-google-earth-engine

#### Import packages ----
library(tidyverse)
library(raster)
library(sf)
library(RStoolbox)
library(cowplot)


#### Import data ----
## Raster
eeImg = raster::stack("data/Sierra_LC08_RGB_2020_SR.tif")
eeImg

## Vector
## Yolo County
ROI = sf::read_sf("data/Sierra_ROI.shp") %>%
  st_union()
ROI

## CA State
CA = USAboundaries::us_states() %>%
  st_as_sf() %>%
  filter(name == "California")
CA


#### Data wrangling ----
## Raster
# Convert to data.frame
# Recall that we set maxPixels in the Earth Engine export to 1e8,
# that's the maximum value you should consider using here as well. 
# If your image is going to be a half the screen, you can use half
# the maxpixels. 
imgFortFull = fortify(eeImg, maxpixels = 1e8)
imgFort = imgFortFull %>%
  rename("R" = vis.red,
         "G" = vis.green,
         "B" = vis.blue) %>%
  filter(R + G + B != 0) %>%
  mutate(Rsc = scales::rescale(R, to = c(0,1), from = c(0,255)),
         Gsc = scales::rescale(G, to = c(0,1), from = c(0,255)),
         Bsc = scales::rescale(B, to = c(0,1), from = c(0,255)))


#### Plots ----
## Vector context plot
vectorPlot = ggplot() +
  geom_sf(data = CA,
          fill = "grey90") +
  geom_sf(data = ROI,
          fill = "grey60") +
  theme_void() +
  theme(panel.background = element_rect(fill = "transparent", 
                                        colour = "transparent"),  
        plot.background = element_rect(fill = "transparent", 
                                       colour = "transparent"))
# vectorPlot


## RGB plot
rgbPlot = ggplot() +
  geom_tile(data = imgFort, 
              aes(x = x, y = y, 
                  fill = rgb(Rsc,Gsc,Bsc)))  +
  scale_fill_identity() +
  geom_sf(data = ROI,
          fill = "transparent",
          lwd = 0.75,
          col = "grey40") +
  theme_void() +
  theme(panel.background = element_rect(fill = "transparent", 
                                        colour = "transparent"),  
        plot.background = element_rect(fill = "transparent", 
                                       colour = "transparent"))
# rgbPlot



#### Combine contextual vector and RGB imagery to a single plot with inset
## Generate a "complete" plot that includes contextual vector data
completePlot = ggdraw() +
  draw_plot(rgbPlot) +
  draw_plot(vectorPlot, 
            x = 0.075, y = 0.075, 
            width = 0.4, height = 0.5)
# completePlot



#### Save plots ----
## Give some thought to the output metadata:
# A 4k display is 3840 x 2160 pixels. Therefore there's really no need to have presentation images be >3840 horizontal pixels or >2160 vertical pixels
# Let's do some math. If a figure is to take up half of a ppt slide, then we can happily take it to 1920 horizontal pixels. If that's the case, we can make final image dimensions 7x7 inches, with a resolution of 300 pixels per inch ("dpi") and lose no more information than we would from 4k viewing anyway (final dimensions 2100 x 2100 pixels).

ggsave(filename = "plots/sierraRGB_SR.jpg",
       rgbPlot, bg = "transparent",
       height = 7, width = 7, units = "in", dpi = 300)
ggsave(filename = "plots/sierraContext_SR.jpg",
       completePlot, bg = "transparent",
       height = 7, width = 7, units = "in", dpi = 300)

And finally you should end up with an image that looks something like the one below. Recall that a 4k display has 3840x2160 pixels; there’s no reason to shoot for more detail than that if this plot is for a presentation. A 7“x7” image with 300 pixels per inch will end up being 2100x2100 pixels, a nice size for a half-slide image. Saved as a jpg, the final file size comes out to about 816kB.

Presentation-quality maps with Google Earth Engine and R

Christian John

1/12/2021

Part 1: Processing and downloading data with Google Earth Engine

Region of Interest

Define functions

Find raw data

Data wrangling

Part 2: Raster plotting with R