Working with HDF-EOS files in R: convert, project and mosaic

Hierarchical Data Format – Earth Observing System, otherwise known as HDF-EOS or simply HDF, files are the prescribed format for data products from the NASA Earth Observing System satellites: Terra, Aqua and Aura. After NASA chose the HDF format back in 1993, the National Center for Supercomputing Applications, and later the HDF Group, worked with the space agency to prepare for the upcoming enormous data management challenges. Those challenges arrived in April 2000 when the then recently launch of Terra satellite began delivering data. Each of the EOS satellites deliver a terabyte or more of data per day from many different instruments.

Despite the data storage and organization efficiency of HDF files, they can be rather difficult to work with in R. I know I’ve struggled with them early on and spent countless hours on forums trying to figure out seemingly simple issues. This post is for those readers who work with HDF-EOS files and are interested in converting them into ArcGIS-ready geotiff files.

First off, acknowledgements: This function would not have been possible with out the hard work and dedication by Jonathan Greenberg at the University of Illinois, and Matteo Mattiuzzi at the University of Natural Resources and Life Sciences in Vienna.

The code is below. It is commented and is, as always, also available as a Gist. The entire script should be self explanatory if you’re a regular R user.

projHDF2GTiff = function(loc, hdfs, gtiffs, lyr, fromSRS, toSRS){ 
  if("gdalUtils" %in% rownames(installed.packages()) == FALSE){
    install.packages("gdalUtils", repos="http://r-forge.r-project.org")
    require(gdalUtils)
  } # install and load the gdalUtils package. 
  setwd(loc) # set the working directory to where the data is
suppressWarnings(dir.create(paste(loc,"Projected",sep="/"))) # create a directory to store projected files
    for (i in 1:length(hdfs)){ 
        gdal_translate(hdfs[i],gtiffs[i],sd_index=lyr) # extract the specified HDF layer and save it as a Geotiff
            gdalwarp(gtiffs[i],paste(loc,"Projected",gtiffs[i],sep="/"),s_srs=fromSRS,t_srs=toSRS,srcnodata=-3000,dstnodata=-3000,overwrite = T) # project geotiffs
            unlink(gtiffs[i]) # delete unprojected geotiffs to save space
    }
      }

Now that you’re done studying the function, let’s have a bit of fun with it:

# load variables
myloc = "c:/Users/Hakim/Downloads/test_data/test/" # working directory
hdfs1 = list.files(getwd(), pattern="hdf$") # HDF file list
gtiffs1 = gsub("hdf","tif",hdfs1) # out out GeoTIFF file list
frm.srs = "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs" # original HDF SRS
to.srs = "+proj=longlat +datum=WGS84 +no_defs" # desired GeoTIFF SRS
s.nodata = -3000 # HDF nodata value
d.nodata = -3000 # GeoTIFF nodata value
# lyr is the HDF layer you want to extract. In this example it is "1" to 
# signify the first layer in the HDF file i.e. NDVI
# execute the function
projHDF2GTiff(loc = myloc, hdfs = hdfs1, gtiffs = gtiffs1, lyr = 1, fromSRS = frm.srs, toSRS = to.srs,
srcnodata = s.nodata, dstnodata = d.nodata)
rm(myloc,hdfs1,gtiffs1,frm.srs,to.srs,s.nodata,d.nodata) # remove variables to save memory

You’ve got your desired HDF layer extracted and projected. But your study area is huge and you have a lot of GeoTIFFs that make it up. You have some analyses to run on these files and at this point you have two options: (1) run the analyses on each file separately and mosaic them ex post facto because you have limited computing resources, or (2) mosaic them first and then run the analysis because you have enough computing resources. Either way, you need to mosaic them. The function is below and is also available as a Gist.

mosaicGTiffs = function(proj.loc, gtiffs, mosaicName, overwrite){ 
  if("gdalUtils" %in% rownames(installed.packages()) == FALSE){ # checks if gdalutils is installed 
    install.packages("gdalUtils", repos="http://r-forge.r-project.org")
    require(gdalUtils)
  }
  suppressWarnings(dir.create(paste(proj.loc,"Mosaicked",sep="/"))) # creates a directory to store mosaicked file
  gdalwarp(gtiffs, paste(proj.loc,"/","Mosaicked","/",mosaicName,".tif",sep=""),overwrite = overwrite)
}

Let’s try it out on the files that we extracted and projected using the previous function.

proj.loc = "c:/Users/Hakim/Downloads/test_data/test/Projected" # set the location of your GeoTIFFS
gtiffs2 = list.files("c:/Users/Hakim/Downloads/test_data/test/Projected/",pattern = ".tif", full.names = T) # list the files
myMosaic = "MOSAIC1" # the name of the final mosaicked GeoTIFF
# execute
mosaicGTiffs(proj.loc = proj.loc, gtiffs = gtiffs2, mosaicName = myMosaic, overwrite = T)
rm(proj.loc,gtiffs2,myMosaic) # remove variables to save memory

And viola! Below is our final mosaicked image. This particular one is composed of four GeoTIFFs covering the upper left, upper right, lower left and lower right sections of the image.

The mosaicked GeoTIFF.