Annual Land Use and Urban Land Cover: Ethiopia, Nigeria, and South Africa, 2016-2020

Documentation Revision Date: 2024-09-06

Dataset Version: 1

Summary

This dataset provides a two-tier annual Land Use (LU) and Urban Land Cover (LC) product suite for three African countries (Ethiopia, Nigeria, and South Africa) over the 5-year period 2016-2020. Remote sensing data sources were used to create 30-m resolution LU maps (Tier-1), which were then used to delineate urban boundaries for 10-m resolution LC classes (Tier-2). Random Forest machine learning classifier models were trained on reference data for each tier and country, with a single model per tier and country trained across all years; models were validated using a separate reference data set for each tier and country. Tier-1 LU maps were based on the 30-m Landsat time series, and Tier-2 urban LC maps were based on the 10-m Sentinel-2 time series. Additional data sources included climate, topography, night-time light, and soils. The overall map accuracy was 65-80% for Tier-1 maps and 60-80% for Tier-2 maps, depending on the year and country. The data are provided in cloud optimized GeoTIFF (COG) format.

Figure 1. Predicted land use for Nigeria with a detailed land cover for Benin City (inset).

Citation

Vogeler, J., and S. Shah Heydari. 2024. Annual Land Use and Urban Land Cover: Ethiopia, Nigeria, and South Africa, 2016-2020. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/2367

Dataset Overview
Data Characteristics
Application and Derivation
Quality Assessment
Data Acquisition, Materials, and Methods
Data Access
References

Dataset Overview

Vegetation Collection

The ORNL DAAC compiles, archives, and distributes data on vegetation from local to global scales. Specific topic areas include: belowground vegetation characteristics and roots, vegetation biomass, fire and other disturbance, vegetation dynamics, land cover and land use change, vegetation characteristics, and NPP (Net Primary Production) data.

Related Publications:

Cardenas-Ritzert, O.S.E., J.C. Vogeler, S. Shah Heydari, P.A. Fekety, M. Laituri, and M. McHale. 2024. Automated Geospatial Approach for Assessing SDG Indicator 11.3.1: A multi-level evaluation of urban land use expansion across Africa. ISPRS International Journal of Geo-Information 13:226. https://doi.org/10.3390/ijgi13070226

Shah Heydari, S., J.C. Vogeler, O. Cardenas-Ritzert, S.K. Filippelli, M. Laituri, and M. McHale. 2024. Multi-tier land use and land cover mapping framework and its application in urbanization analysis in three African countries. Remote Sensing, In Press.

Acknowledgements:

This work was funded by the NASA Land Cover and Land Use Change Program (grant 80NSSC21K0313). O. Cardenas-Ritzert, S. Filippelli, M. Laituri, and M. McHale contributed to the development of project objectives and analyses.

Data Characteristics

Spatial Coverage: Ethiopia, Nigeria, and South Africa

Spatial Resolution: 30 m for Tier 1 land use; 10 m for Tier 2 land cover

Temporal Coverage: 2016-01-01 to 2020-12-31

Temporal Resolution: Annual estimates

Study Areas: Latitude and longitude are given in decimal degrees.

Site	Westernmost Longitude	Easternmost Longitude	Northernmost Latitude	Southernmost Latitude
Ethiopia	32.02876	49.68746	16.20533	2.98263
Nigeria	2.57257	14.90288	14.44025	4.16829
South Africa	15.98231	35.03725	-20.62296	-35.33684

Data File Information

This dataset includes 30 files in cloud optimized GeoTIFF (COG) format. The files hold integers that indicate land use/land cover classes. There are 10 files for each of three countries and two files for each country-year combination.

The file naming convention is <country>_<year>_<tier>.tif, where

<country> = "Ethiopia", "Nigeria", or "SouthAfrica"
<year> = year of estimate: 2016, 2017, 2018, 2019, or 2020
<tier> = "T1" for Tier 1 land use or "T2" for Tier 2 land cover.

Tier 1 (T1) files hold land use across the entire country while Tier 2 (T2) files provide land cover classification for delineated urban areas.
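
The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper names `expected_filenames` and `parse_filename` are hypothetical and are not part of the dataset or of any DAAC tooling.

```python
from itertools import product

# Values taken from the file naming convention described above.
COUNTRIES = ("Ethiopia", "Nigeria", "SouthAfrica")
YEARS = range(2016, 2021)
TIERS = ("T1", "T2")


def expected_filenames():
    """Return every file name implied by <country>_<year>_<tier>.tif."""
    return [f"{c}_{y}_{t}.tif" for c, y, t in product(COUNTRIES, YEARS, TIERS)]


def parse_filename(name):
    """Split a file name into (country, year, tier); raise ValueError if malformed."""
    stem, ext = name.rsplit(".", 1)
    country, year, tier = stem.split("_")
    if (ext != "tif" or country not in COUNTRIES
            or int(year) not in YEARS or tier not in TIERS):
        raise ValueError(f"unexpected file name: {name}")
    return country, int(year), tier


files = expected_filenames()
print(len(files))                          # 3 countries x 5 years x 2 tiers
print(parse_filename("Nigeria_2018_T2.tif"))
```

The count of generated names matches the 30 files stated for the dataset.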

GeoTIFF characteristics:

Pixel Values: Integers indicating land use/land cover classes (Table 1).
Spatial resolution: 30 m for T1; 10 m for T2
Coordinate system: WGS 1984 Albers for Africa
- Proj.4 string: +proj=aea +lat_0=0 +lon_0=0 +lat_1=20 +lat_2=-23 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
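
To make the projection parameters concrete, the following sketch converts longitude/latitude to the Albers coordinates using the standard ellipsoidal Albers equal-area conic formulas with the values from the Proj.4 string. The function name `lonlat_to_albers` is hypothetical, and the WGS84 ellipsoid constants are assumed from `+datum=WGS84`; results have not been cross-checked against the data files.

```python
import math

# WGS84 ellipsoid constants (assumed from +datum=WGS84).
A = 6378137.0
F = 1 / 298.257223563
E2 = 2 * F - F * F
E = math.sqrt(E2)

# Parameters copied from the Proj.4 string (degrees); x_0 = y_0 = 0.
LAT_0, LON_0, LAT_1, LAT_2 = 0.0, 0.0, 20.0, -23.0


def _q(phi):
    """Authalic-latitude helper q(phi) for the ellipsoid."""
    s = math.sin(phi)
    return (1 - E2) * (s / (1 - E2 * s * s)
                       - math.log((1 - E * s) / (1 + E * s)) / (2 * E))


def _m2(phi):
    """Squared parallel-radius factor m(phi)^2."""
    s = math.sin(phi)
    return math.cos(phi) ** 2 / (1 - E2 * s * s)


_PHI1, _PHI2, _PHI0 = (math.radians(v) for v in (LAT_1, LAT_2, LAT_0))
N = (_m2(_PHI1) - _m2(_PHI2)) / (_q(_PHI2) - _q(_PHI1))   # cone constant
C = _m2(_PHI1) + N * _q(_PHI1)
RHO_0 = A * math.sqrt(C - N * _q(_PHI0)) / N


def lonlat_to_albers(lon, lat):
    """Project decimal-degree lon/lat to (x, y) in meters."""
    rho = A * math.sqrt(C - N * _q(math.radians(lat))) / N
    theta = N * math.radians(lon - LON_0)
    return rho * math.sin(theta), RHO_0 - rho * math.cos(theta)


x, y = lonlat_to_albers(8.0, 9.0)   # an arbitrary point inside Nigeria's extent
```

In practice a projection library would normally handle this conversion; the hand-written version only serves to show how each parameter in the string enters the calculation.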

Table 1. Pixel values and land use/land cover classes for GeoTIFFs.

Pixel value	Tier 1 (land use)	Tier 2 (urban land cover)
0	No data	No data
1	Agriculture	Barren
2	Bare	Building
3	Developed	Pavement
4	Forest	Short Vegetation
5	Rangeland	Tall Vegetation
6	Water	Water
7	Wetland	Wetland
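
Because the coordinate system is an equal-area projection, pixel counts can be converted to area by multiplying by the nominal pixel footprint. The sketch below encodes Table 1 as lookup dictionaries and sums area per class; the function name `class_areas_km2` and the sample pixel list are hypothetical, and the pixel values would normally be read from a GeoTIFF with a raster library.

```python
from collections import Counter

# Class codes from Table 1.
TIER1_CLASSES = {0: "No data", 1: "Agriculture", 2: "Bare", 3: "Developed",
                 4: "Forest", 5: "Rangeland", 6: "Water", 7: "Wetland"}
TIER2_CLASSES = {0: "No data", 1: "Barren", 2: "Building", 3: "Pavement",
                 4: "Short Vegetation", 5: "Tall Vegetation", 6: "Water",
                 7: "Wetland"}

# Nominal pixel edge length in meters for each tier.
PIXEL_SIZE_M = {"T1": 30, "T2": 10}


def class_areas_km2(pixel_values, tier):
    """Return {class name: area in km^2}, skipping no-data pixels (value 0)."""
    names = TIER1_CLASSES if tier == "T1" else TIER2_CLASSES
    pixel_km2 = PIXEL_SIZE_M[tier] ** 2 / 1e6
    counts = Counter(v for v in pixel_values if v != 0)
    return {names[v]: n * pixel_km2 for v, n in counts.items()}


# Hypothetical handful of Tier 1 pixel values.
sample = [1, 1, 3, 0, 4, 3, 3]
areas = class_areas_km2(sample, "T1")
```

Note that the same integer code means different things in the two tiers (for example, 3 is Developed in Tier 1 but Pavement in Tier 2), so the tier must always accompany the pixel values.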

Application and Derivation

The aim of this work was to develop an integrated mapping framework to support urbanization-related assessments of land use and land cover (LULC) change. The methodology allows for the identification of dynamic urban boundaries and quantification of annual urbanization-driven change for 2016-2020. The country-wide LU maps were the base data for capturing dynamic urban boundaries described in Cardenas-Ritzert et al. (2024). The Tier-2 maps were used to calculate the United Nations-defined Sustainable Development Goal Indicator 11.3.1, which relates to urbanization rates, and to identify hotspots of rapid urban expansion.

These LULC products for developing countries can serve as a basis for monitoring historic and current patterns of land use as well as its socioeconomic impacts. For instance, incorporating multiple resolutions may elicit more nuanced depictions of how urbanization manifests on the landscape. Basic relationships between land uses and social-environmental services can lead to decisions that amplify inequities and to biased urban planning policies (McHale et al., 2018).

Quality Assessment

Model accuracy was assessed using an independent set of validation points for each country and product tier. One of the generated maps was used to make a stratified sample within map classes and the strata information was used to assess area-adjusted map accuracy for each year (e.g., Stehman, 2014). The overall map accuracy was 65-80% for Tier-1 maps and 60-80% for Tier-2 maps, depending on the evaluation year and country.
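
As a rough illustration of area-adjusted accuracy, the sketch below computes a stratified estimate of overall accuracy and the half-width of its 95% confidence interval. It assumes the simple case in which each stratum is weighted by its mapped area proportion; Stehman (2014) covers the more general case where strata differ from the map classes, which this sketch does not implement. The function name and all input numbers are hypothetical.

```python
import math


def stratified_overall_accuracy(strata):
    """Estimate overall accuracy from a stratified sample.

    strata: list of (area_weight, n_sampled, n_correct) tuples, one per stratum.
    Returns (overall_accuracy, half_width_of_95_percent_ci) as proportions.
    """
    total_weight = sum(w for w, _, _ in strata)
    accuracy = 0.0
    variance = 0.0
    for weight, n_sampled, n_correct in strata:
        w_h = weight / total_weight          # normalized stratum weight
        p_h = n_correct / n_sampled          # proportion correct in stratum
        accuracy += w_h * p_h
        variance += w_h ** 2 * p_h * (1 - p_h) / (n_sampled - 1)
    return accuracy, 1.96 * math.sqrt(variance)


# Hypothetical strata: (area weight, validation points, points mapped correctly).
example = [(0.6, 100, 80), (0.3, 100, 70), (0.1, 50, 45)]
oa, half_width = stratified_overall_accuracy(example)
print(f"OA = {100 * oa:.1f}% +/- {100 * half_width:.1f}%")
```

Weighting by area keeps a rare but heavily sampled class from dominating the overall accuracy figure.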

Table 2. Accuracy assessment for Tier 1 land uses for 2020.

	Ethiopia			Nigeria			South Africa
LU Class	UA	PA	F1	UA	PA	F1	UA	PA	F1
Agriculture	61.2	77	68.2	72.4	77.4	74.8	64.7	89.9	75.3
Bare	45.5	95.8	61.7	81	27.9	41.5	53.7	6.9	12.2
Developed	33.1	23.4	27.4	81.5	47.8	60.3	92	24.1	38.2
Forest	76.4	80.2	78.3	62.7	59	60.8	53.7	82.9	65.1
Rangeland	86.4	72	78.5	57.9	57.4	57.6	77.4	88.5	82.6
Water	96.7	100	98.3	75.3	92.9	83.2	86.3	98.8	92.1
Wetland	57.7	64.7	61	78.3	49.7	60.8	67.6	8.3	14.8
	Map OA	95% CI		Map OA	95% CI		Map OA	95% CI
	74.6	±7.3		65.9	±5.4		73.6	±6.8

OA = overall accuracy, PA = producer's accuracy, UA = user's accuracy, F1 = the harmonic mean of PA and UA, and 95% CI = half-width of the 95% confidence interval for Map OA (the interval is Map OA ± the listed value).
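
The F1 and confidence-interval definitions can be checked against the table with a short calculation. The function names are hypothetical; the inputs are the Ethiopia Agriculture and Nigeria Bare rows and the Ethiopia Map OA from Table 2.

```python
def f1_score(ua, pa):
    """Harmonic mean of user's accuracy (UA) and producer's accuracy (PA)."""
    if ua + pa == 0:
        return 0.0
    return 2 * ua * pa / (ua + pa)


def confidence_interval(oa, half_width):
    """Return the (lower, upper) bounds implied by OA +/- half_width."""
    return oa - half_width, oa + half_width


# Ethiopia, Agriculture: UA = 61.2, PA = 77 (Table 2 lists F1 = 68.2).
print(round(f1_score(61.2, 77.0), 1))

# Nigeria, Bare: UA = 81, PA = 27.9 (Table 2 lists F1 = 41.5).
print(round(f1_score(81.0, 27.9), 1))

# Ethiopia Map OA = 74.6 with a 95% CI value of 7.3.
low, high = confidence_interval(74.6, 7.3)
```

Because the harmonic mean is pulled toward the smaller of the two values, a class with high UA but low PA (such as Bare in Nigeria) still receives a low F1.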

Table 3. Accuracy assessment for Tier 2 land cover for 2020.

	Ethiopia			Nigeria			South Africa
LC Class	UA	PA	F1	UA	PA	F1	UA	PA	F1
Barren	40.1	60.4	48.2	66.9	31.8	43.1	68.8	36.4	47.6
Building	68.1	55.6	61.2	58.4	85.2	69.3	48	85.8	61.5
Pavement	29.3	63.1	40	66.2	26.9	38.3	46.1	41.7	43.8
Short vegetation	89.8	85.7	87.7	75.2	74.3	74.7	85.4	67.5	75.4
Tall vegetation	62.7	63.2	63	65.7	70.1	67.8	46.5	55.9	50.8
Water	96.7	95.8	96.3	90	96.4	93.1	77.5	82.3	79.8
Wetland	52.3	58.8	55.4	86.8	58.9	70.2	16.9	90.2	28.4
	Map OA	95% CI		Map OA	95% CI		Map OA	95% CI
	78.3	±5.1		66.3	±5.3		62.8	±3.7

Consult Shah Heydari et al. (2024, In Press) for full assessment results.

Data Acquisition, Materials, and Methods

All remote sensing data used in this project (Table 4) were obtained through the Google Earth Engine (GEE) data repository, except WorldClim data (www.worldclim.org, version 2.1). All data were resampled to the target resolution (30 m for Tier-1, 10 m for Tier-2). All optical data (Landsat/Sentinel-2) were filtered for cloud/cloud shadow pixels using the sensor quality bits. For Sentinel-2, an additional cloud-masking step using GEE’s s2cloudless utility (Google, 2023) was applied to improve cloud detection. Speckle filtering and additional enhancements were applied to Sentinel-1 SAR data (Mullissa et al., 2021).

Table 4. Input data sources used for map synthesis in Tier-1 and Tier-2 products. Derived spatial indices (TCA/TCB/TCG/TCW, UCI, BAI/BAEI/NBAI, WI, and MNDWI) and calculated GLCM metrics are described in Shah Heydari et al. (2024, In Press).

	Imagery Details
Sensor type / Dataset	Tier 1 Land Use products	Tier 2 Land Cover products	Derived features
Optical	Landsat Collection-2 Surface Reflectance @ 30 m: six bands of Blue, Green, Red, NIR, SWIR1, and SWIR2; Zonal statistics calculated for green and NIR bands plus TCB, TCG, and TCW indices; BAI/BAEI/NBAI not used in LU product	Sentinel-2 Top of Atmosphere @ 10 m: six bands of B2 (blue), B3 (green), B4 (red), B8 (NIR), B11 (SWIR1), B12 (SWIR2); Zonal statistics calculated for green and NIR bands plus TCB, TCG, and TCW indices; GLCM metrics calculated using NIR band over radii of 5 and 9	NDVI; Tasseled Cap indices (TCA/TCB/TCG/TCW); UCI, BAI, BAEI, NBAI; MNDWI, WI; %Water; Zonal (optical/radar source): Min/Mean/Max/StDev of the pixel values within a 3x3 and 5x5 neighborhood; Context (optical/radar source): GLCM metrics of ASM, Contrast, Correlation, Variance, Sum average, Entropy, Information Measures of correlation (1 and 2), Dissimilarity, Cluster shade, and Cluster prominence
Synthetic Aperture Radar	Sentinel-1 SAR Ground Range Detected VV polarization @ 30 m; GLCM metrics calculated over radii of 5, 9, 13, and 17 pixels	Sentinel-1 SAR Ground Range Detected VV polarization @ 10 m; GLCM metrics calculated over radii of 5 and 9 pixels
Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) @ 460 m	Included	Included	Yearly median of average monthly radiance values (months with at least two observations are counted)
TerraClimate @1/24 degree (~4.5 km)	Not included	Included	Total year precipitation and yearly minimum/maximum temperature
WorldClim V2.1 @ 30 arcsec (~1 km)	Included	Included	19 bioclimatic variables (bio_01 to bio_19) featuring 30-year normal temperature and precipitation statistics, as defined in Appendix 1
SRTM digital elevation data @ 30 m	Included	Included	Terrain parameter (elevation, slope, aspect) – static parameter
Continuous Heat-Insolation Load Index (CHILI_Index) @ 90 m	Included	Included	CHILI index (a number from 0 to 255)
iSDA soil texture class @ 30 m	Included	Included	USDA Texture Class at 0-20cm depth (a number from 0 to 12)
World Ecoregions (RESOLVE), vector dataset	Included	Included	Ecoregion identifier (a 3-digit number)
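
The VIIRS derived feature in Table 4 (yearly median of average monthly radiance, counting only months with at least two observations) can be sketched as follows. The function name, the input layout, and the sample values are hypothetical; the actual processing was done in Google Earth Engine.

```python
from statistics import median


def yearly_median_radiance(monthly, min_observations=2):
    """Median of monthly mean radiance over months with enough observations.

    monthly: list of (mean_radiance, n_observations) tuples, one per month.
    Returns None when no month meets the observation threshold.
    """
    valid = [radiance for radiance, n_obs in monthly
             if n_obs >= min_observations]
    return median(valid) if valid else None


# Hypothetical pixel: the second month has a single observation and is skipped.
months = [(1.0, 3), (5.0, 1), (2.0, 2), (4.0, 4)]
print(yearly_median_radiance(months))
```

Using the median rather than the mean limits the influence of months with unusually bright or dark composites.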

The training data were created using a mixture of random and manual points within each country’s extent (Table 5). The land use and land cover were interpreted manually for each year in 2016-2020 by trained interpreters using Google Earth high-resolution imagery, a Landsat time-series viewer application named TimeSync Plus (Oregon State University, 2023; Cohen et al., 2010), and a similar tool developed for viewing Sentinel-2 data trajectories. One of the generated maps was then used to draw a stratified sample within map classes, and the sampled information was used to assess area-adjusted map accuracy for each year according to Stehman (2014) (see Tables 2-3).

Table 5. Number of interpreted reference pixels for training and validation for Tier 1 and Tier 2 across each study country.

Country	Tier	Training pixels	Validation pixels
Ethiopia	Tier 1, whole country	740	550
Ethiopia	Tier 2, final urban agglomerations	1172	700
Nigeria	Tier 1, whole country	687	700
Nigeria	Tier 2, final urban agglomerations	1200	525
South Africa	Tier 1, whole country	957	1000
South Africa	Tier 2, final urban agglomerations	2897	1050

Using training data for all years, a supervised Random Forests model (Scikit-learn, 2023; Pedregosa et al., 2011) was trained and optimized using the best features found after an iterative feature selection. A variety of parameter settings for the Random Forest models were tested including the number of trees, maximum depth, and minimum leaf size. The best balance between complexity and model improvements was provided by 100 trees grown to unlimited depth with a minimum leaf size of four.
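
The reported settings map naturally onto scikit-learn's `RandomForestClassifier` arguments. The sketch below is an assumption about that mapping (in particular, that "minimum leaf size" corresponds to `min_samples_leaf`), not the authors' code; `RF_PARAMS` and `build_model` are hypothetical names, and any other arguments the authors may have set are unknown.

```python
# Settings reported in the text, expressed as scikit-learn keyword arguments.
RF_PARAMS = {
    "n_estimators": 100,      # 100 trees
    "max_depth": None,        # None means trees grow to unlimited depth
    "min_samples_leaf": 4,    # assumed equivalent of "minimum leaf size of four"
}


def build_model(random_state=0):
    """Create an untrained classifier; requires scikit-learn to be installed."""
    # Imported lazily so the settings above can be inspected without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(random_state=random_state, **RF_PARAMS)
```

A model built this way would be trained with `build_model().fit(X, y)`, where `X` holds the selected features for the reference pixels and `y` holds their interpreted class codes.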

The total number of features created from data sources (Table 4) was 154 for Tier 1 maps and 170 for Tier 2 maps. These numbers were reduced to a limited set of less-correlated features (28-53 features) for final models. For each country and product tier, the correlation matrix for the training data was calculated, and a dendrogram (hierarchical clustering model) of feature correlations was created. Through an iterative process, different thresholds were used to cut the dendrogram to generate a set of randomly selected features from each cluster. These features were used to train a Random Forests model, and model performance was calculated. A final set of 10-20 top-performing models was selected for further visual refining. This extra refining step was needed because models with comparable model assessment results can still produce significantly different map products when applied spatially. For final model selection, the model outputs were visually compared within several heterogeneous test areas for each product tier and study country. The trained model was then applied to the full extent of Tier 1 (country-level) or Tier 2 (urban centers within each country) for each year within that period.
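
The correlation-based feature reduction can be illustrated with a small pure-Python sketch. It assumes single-linkage clustering on the distance 1 - |r|, for which cutting the dendrogram at a threshold is equivalent to grouping features connected by pairs within that distance; the source does not state which linkage or distance was used. All function names and the sample feature values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_clusters(features, threshold):
    """Group feature names whose distance 1 - |r| is <= threshold (single linkage)."""
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if 1 - abs(pearson(features[a], features[b])) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def pick_one_per_cluster(clusters, seed=0):
    """Randomly select one representative feature from each cluster."""
    rng = random.Random(seed)
    return [rng.choice(cluster) for cluster in clusters]


# Hypothetical training columns: ndvi and tcg are nearly collinear.
features = {
    "ndvi": [0.1, 0.2, 0.3, 0.4, 0.5],
    "tcg": [1.0, 2.0, 3.0, 4.0, 5.1],
    "elevation": [5.0, 1.0, 4.0, 2.0, 3.0],
}
clusters = correlation_clusters(features, threshold=0.2)
selected = pick_one_per_cluster(clusters)
```

Repeating the selection with different thresholds and seeds yields the candidate feature sets that would then be compared by model performance.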

To make land use transitions more realistic for Tier 1 products, a 3-year window (current year, one year before, one year after) was applied to extract the features and train models. For post-processing, small patches of fewer than 5 pixels with the same land use type were removed, and each pixel was replaced with the majority pixel value from the immediate eight neighbors. For Tier 2 processing, the features were extracted only for the current year because land cover can change abruptly from year to year. A gap-filling procedure replaced any null input feature with the value from the previous and/or next year. No spatial post-processing was applied to the Tier 2 products.
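
One possible reading of the Tier 1 post-processing step is sketched below: patches smaller than 5 pixels are located, and each of their pixels takes the majority value of its eight neighbors. The use of 8-connectivity to define a patch, the tie-breaking behavior, and the function name are assumptions; the source does not specify them.

```python
from collections import Counter, deque

# Offsets of the eight immediate neighbors.
NEIGHBORS8 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]


def sieve_small_patches(grid, min_pixels=5):
    """Replace pixels in same-class patches smaller than min_pixels.

    grid: list of equal-length rows of integer class codes.
    Returns a new grid; neighbor votes are read from the original grid.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Flood-fill the patch of same-valued pixels containing (r, c).
            value, patch, queue = grid[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                pr, pc = queue.popleft()
                patch.append((pr, pc))
                for dr, dc in NEIGHBORS8:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == value):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(patch) >= min_pixels:
                continue
            # Small patch: each pixel takes the majority of its neighbors.
            for pr, pc in patch:
                votes = Counter(grid[pr + dr][pc + dc] for dr, dc in NEIGHBORS8
                                if 0 <= pr + dr < rows and 0 <= pc + dc < cols)
                if votes:
                    out[pr][pc] = votes.most_common(1)[0][0]
    return out


# Hypothetical 5x5 Forest (4) block with one isolated Developed (3) pixel.
grid = [[4] * 5 for _ in range(5)]
grid[2][2] = 3
cleaned = sieve_small_patches(grid)
```

In this example the isolated Developed pixel is absorbed into the surrounding Forest, while patches of five or more pixels are left unchanged.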

Data Access

These data are available through the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC).

Annual Land Use and Urban Land Cover: Ethiopia, Nigeria, and South Africa, 2016-2020

Contact for Data Center Access Information:

E-mail: uso@daac.ornl.gov
Telephone: +1 (865) 241-3952

References

Cohen, W. B., Z. Yang, and R. Kennedy. 2010. Detecting trends in forest disturbance and recovery using yearly Landsat time series: 2. TimeSync — Tools for calibration and validation. Remote Sensing of Environment 114:2911–2924. https://doi.org/10.1016/j.rse.2010.07.010

Google. 2023. Sentinel-2 Cloud Masking with s2cloudless. https://developers.google.com/earth-engine/tutorials/community/sentinel-2-s2cloudless

McHale, M.R., S.M. Beck, S.T.A. Pickett, D.L. Childers, M.L. Cadenasso, L. Rivers, L. Swemmer, L. Ebersohn, W. Twine, and D.N. Bunn. 2018. Democratization of ecosystem services—a radical approach for assessing nature’s benefits in the face of urbanization. Ecosystem Health and Sustainability 4:115–131. https://doi.org/10.1080/20964129.2018.1480905

Mullissa, A., A. Vollrath, C. Odongo-Braun, B. Slagter, J. Balling, Y. Gou, N. Gorelick, and J. Reiche. 2021. Sentinel-1 SAR Backscatter Analysis Ready Data Preparation in Google Earth Engine. Remote Sensing 13:1954. https://doi.org/10.3390/rs13101954

Oregon State University. 2023. eMapR/TimeSync-Plus: An application for gathering point and polygon spectral temporal information from Landsat time series data into a database. Retrieved September 19, 2023, from https://github.com/eMapR/TimeSync-Plus.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825-2830. http://jmlr.org/papers/v12/pedregosa11a.html

Scikit-learn. 2023. Scikit-learn: Machine learning in Python—Scikit-learn 1.3.0 documentation. Retrieved September 19, 2023, from https://scikit-learn.org/stable

Stehman, S.V. 2014. Estimating area and map accuracy for stratified random sampling when the strata are different from the map classes. International Journal of Remote Sensing 35:4923-4939. https://doi.org/10.1080/01431161.2014.930207

WorldClim. 2023. WorldClim Bioclimatic variables. https://www.worldclim.org/data/bioclim.html
