Working with HDF-EOS files in R: convert, project and mosaic
Hierarchical Data Format – Earth Observing System, otherwise known as HDF-EOS or simply HDF, files are the prescribed format for data products from the NASA Earth Observing System satellites: Terra, Aqua and Aura. After NASA chose the HDF format back in 1993, the National Center for Supercomputing Applications, and later the HDF Group, worked with the space agency to prepare for the upcoming enormous data management challenges. Those challenges arrived in April 2000 when the then recently launch of Terra satellite began delivering data. Each of the EOS satellites deliver a terabyte or more of data per day from many different instruments.
Despite the data storage and organization efficiency of HDF files, they can be rather difficult to work with in R. I know I’ve struggled with them early on and spent countless hours on forums trying to figure out seemingly simple issues. This post is for those readers who work with HDF-EOS files and are interested in converting them into ArcGIS-ready geotiff files.
First off, acknowledgements: This function would not have been possible with out the hard work and dedication by Jonathan Greenberg at the University of Illinois, and Matteo Mattiuzzi at the University of Natural Resources and Life Sciences in Vienna.
The code is below. It is commented and is, as always, also available as a Gist. The entire script should be self explanatory if you’re a regular R user.
projHDF2GTiff = function(loc, hdfs, gtiffs, lyr, fromSRS, toSRS){ if("gdalUtils" %in% rownames(installed.packages()) == FALSE){ install.packages("gdalUtils", repos="http://r-forge.r-project.org") require(gdalUtils) } # install and load the gdalUtils package. setwd(loc) # set the working directory to where the data is suppressWarnings(dir.create(paste(loc,"Projected",sep="/"))) # create a directory to store projected files for (i in 1:length(hdfs)){ gdal_translate(hdfs[i],gtiffs[i],sd_index=lyr) # extract the specified HDF layer and save it as a Geotiff gdalwarp(gtiffs[i],paste(loc,"Projected",gtiffs[i],sep="/"),s_srs=fromSRS,t_srs=toSRS,srcnodata=-3000,dstnodata=-3000,overwrite = T) # project geotiffs unlink(gtiffs[i]) # delete unprojected geotiffs to save space } }
Now that you’re done studying the function, let’s have a bit of fun with it:
# load variables myloc = "c:/Users/Hakim/Downloads/test_data/test/" # working directory hdfs1 = list.files(getwd(), pattern="hdf$") # HDF file list gtiffs1 = gsub("hdf","tif",hdfs1) # out out GeoTIFF file list frm.srs = "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs" # original HDF SRS to.srs = "+proj=longlat +datum=WGS84 +no_defs" # desired GeoTIFF SRS s.nodata = -3000 # HDF nodata value d.nodata = -3000 # GeoTIFF nodata value # lyr is the HDF layer you want to extract. In this example it is "1" to # signify the first layer in the HDF file i.e. NDVI # execute the function projHDF2GTiff(loc = myloc, hdfs = hdfs1, gtiffs = gtiffs1, lyr = 1, fromSRS = frm.srs, toSRS = to.srs, srcnodata = s.nodata, dstnodata = d.nodata) rm(myloc,hdfs1,gtiffs1,frm.srs,to.srs,s.nodata,d.nodata) # remove variables to save memory
You’ve got your desired HDF layer extracted and projected. But your study area is huge and you have a lot of GeoTIFFs that make it up. You have some analyses to run on these files and at this point you have two options: (1) run the analyses on each file separately and mosaic them ex post facto because you have limited computing resources, or (2) mosaic them first and then run the analysis because you have enough computing resources. Either way, you need to mosaic them. The function is below and is also available as a Gist.
mosaicGTiffs = function(proj.loc, gtiffs, mosaicName, overwrite){ if("gdalUtils" %in% rownames(installed.packages()) == FALSE){ # checks if gdalutils is installed install.packages("gdalUtils", repos="http://r-forge.r-project.org") require(gdalUtils) } suppressWarnings(dir.create(paste(proj.loc,"Mosaicked",sep="/"))) # creates a directory to store mosaicked file gdalwarp(gtiffs, paste(proj.loc,"/","Mosaicked","/",mosaicName,".tif",sep=""),overwrite = overwrite) }
Let’s try it out on the files that we extracted and projected using the previous function.
proj.loc = "c:/Users/Hakim/Downloads/test_data/test/Projected" # set the location of your GeoTIFFS gtiffs2 = list.files("c:/Users/Hakim/Downloads/test_data/test/Projected/",pattern = ".tif", full.names = T) # list the files myMosaic = "MOSAIC1" # the name of the final mosaicked GeoTIFF # execute mosaicGTiffs(proj.loc = proj.loc, gtiffs = gtiffs2, mosaicName = myMosaic, overwrite = T) rm(proj.loc,gtiffs2,myMosaic) # remove variables to save memory
And viola! Below is our final mosaicked image. This particular one is composed of four GeoTIFFs covering the upper left, upper right, lower left and lower right sections of the image.