Map accuracy in R
This post will show you how to validate a classification map using the Olofsson et al., 2014 best practices protocol and the mapaccuracy
package.
library(mapaccuracy)
library(tidyverse)
Data
This data simulates the map accuracy results obtained from a stratified random sampling. This validation procedure is a modification of the Olofsson et al., 2014 recommendations, in which a buffer stratum is used to try to contain omission errors in the rarest classes (i.e., deforestation), following recommendations by Olofsson et al., 2020 and Arévalo et al., 2021.
The two datasets you will need to obtain the validation main results are: area estimates (obtained from cell counting in the classification) and results obtained from the stratified random sampling indicating the map (i.e., classified) and reference (i.e., visual interpretation of field data) classes.
areas2 <- tibble(Clase = c("Forest loss", "Perm Forest", "Perm Non-forest", "Buff Perm Forest", "Buff Perm Non-forest"),
ha = c(5, 1950, 8000, 50, 25))
df <- tibble(Map = c(rep("Forest loss", 50),
rep("Perm Forest", 360),
rep("Perm Non-forest", 90),
rep("Buff Perm Forest", 50),
rep("Buff Perm Non-forest", 25)),
Reference = c(rep("Forest loss", 43),
rep("Perm Non-forest", 2),
rep("Perm Forest", 5),
rep("Perm Non-forest", 10),
rep("Perm Forest", 350),
rep("Perm Non-forest", 81),
rep("Perm Forest", 9),
rep("Buff Perm Forest", 48),
rep("Forest loss", 2),
rep("Buff Perm Non-forest", 25)))
Process
Convert area estimates to a vector with names and calculate the total area.
areas <- areas2$ha
names(areas) <- areas2$Clase
totalarea <- sum(areas2$ha)
Then, let’s calculate the map accuracy estimates using Olofsson et al., 2014 equations.
resul <- olofsson(df$Reference, df$Map, Nh = areas)
Here, the results object contains estimates such as: Overall accuracy, User’s accuracy, Producer’s accuracy, unbiased area estimates (as proportion), Standard error of the accuracies (overall, user’s and producer’s) and area estimates, and the matrix expressed in area weights.
resul
# $OA
# [1] 0.9145696
#
# $UA
# Forest loss Perm Forest Perm Non-forest Buff Perm Forest
# 0.8600000 0.9722222 0.9000000 0.9600000
# Buff Perm Non-forest
# 1.0000000
#
# $PA
# Forest loss Perm Forest Perm Non-forest Buff Perm Forest
# 0.6825397 0.7031153 0.9925057 1.0000000
# Buff Perm Non-forest
# 1.0000000
#
# $area
# Forest loss Perm Forest Perm Non-forest Buff Perm Forest
# 0.0006281157 0.2688268528 0.7232668661 0.0047856431
# Buff Perm Non-forest
# 0.0024925224
#
# $SEoa
# [1] 0.02542024
#
# $SEua
# Forest loss Perm Forest Perm Non-forest Buff Perm Forest
# 0.049569576 0.008673299 0.031799936 0.027994168
# Buff Perm Non-forest
# 0.000000000
#
# $SEpa
# Forest loss Perm Forest Perm Non-forest Buff Perm Forest
# 0.152157323 0.066365204 0.002328898 0.000000000
# Buff Perm Non-forest
# 0.000000000
#
# $SEa
# Forest loss Perm Forest Perm Non-forest Buff Perm Forest
# 0.0001417231 0.0254198567 0.0254198515 0.0001395522
# Buff Perm Non-forest
# 0.0000000000
#
# $matrix
# Forest loss Perm Forest Perm Non-forest Buff Perm Forest
# Forest loss 0.0004287139 4.985045e-05 1.994018e-05 NA
# Perm Forest NA 1.890163e-01 5.400465e-03 NA
# Perm Non-forest NA 7.976072e-02 7.178465e-01 NA
# Buff Perm Forest 0.0001994018 NA NA 0.004785643
# Buff Perm Non-forest NA NA NA NA
# sum 0.0006281157 2.688269e-01 7.232669e-01 0.004785643
# Buff Perm Non-forest sum
# Forest loss NA 0.0004985045
# Perm Forest NA 0.1944167498
# Perm Non-forest NA 0.7976071785
# Buff Perm Forest NA 0.0049850449
# Buff Perm Non-forest 0.002492522 0.0024925224
# sum 0.002492522 1.0000000000
Afterward, you need to sum some area estimates and errors to merge the buffer classes with the total classes (e.g., Buff Perm Forest with Perm Forest). And calculate the lower and upeer limits of the unbiased area estimates, assuming a normal distribution. The classes you need to sum will vary depending on the sampling design used to validate the map.
exp_df <- tibble(clase = names(resul$area),
area = resul$area * totalarea,
SEa = resul$SEa * totalarea)
# Sum errors
exp_df$areaSum <- 0
exp_df$SEaSum <- 0
# Perm Forest
exp_df$areaSum[2] <- exp_df$area[2] + exp_df$area[5]
exp_df$SEaSum[2] <- exp_df$SEa[2] + exp_df$SEa[5]
# Perm Non-forest
exp_df$areaSum[3] <- exp_df$area[3] + exp_df$area[4]
exp_df$SEaSum[3] <- exp_df$SEa[3] + exp_df$SEa[4]
# Forest loss
exp_df$areaSum[1] <- exp_df$area[1]
exp_df$SEaSum[1] <- exp_df$SEa[1]
exp_df |>
slice_head(n = 3) |>
mutate(LIC = areaSum - 1.96 * SEaSum,
UIC = areaSum + 1.96 * SEaSum)
And you get your unbiased area estimates with a confidence interval.
# A tibble: 3 × 7
# clase area SEa areaSum SEaSum LIC UIC
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Forest loss 6.3 1.42 6.3 1.42 3.51 9.09
# 2 Perm Forest 2696. 255. 2721. 255. 2222. 3221.
# 3 Perm Non-forest 7254. 255. 7302. 256. 6800. 7805.
References
Arévalo, P., Olofsson, P., & Woodcock, C. E. (2020). Continuous monitoring of land change activities and post-disturbance dynamics from Landsat time series: A test methodology for REDD+ reporting. Remote Sensing of Environment, 238, 111051. https://doi.org/10.1016/j.rse.2019.01.013
Olofsson, P., Foody, G. M., Herold, M., Stehman, S. V., Woodcock, C. E., & Wulder, M. A. (2014). Good practices for estimating area and assessing accuracy of land change. Remote Sensing of Environment, 148, 42–57. https://doi.org/10.1016/j.rse.2014.02.015
Olofsson, P., Arévalo, P., Espejo, A. B., Green, C., Lindquist, E., McRoberts, R. E., & Sanz, M. J. (2020). Mitigating the effects of omission errors on area and area change estimates. Remote Sensing of Environment, 236. https://doi.org/10.1016/j.rse.2019.111492