Title: | Raster-Based Spatial Stratification Algorithms |
---|---|
Description: | Algorithms for the spatial stratification of landscapes, sampling and modeling of spatially-varying phenomena. These algorithms offer a simple framework for the stratification of geographic space based on raster layers representing landscape factors and/or factor scales. The stratification process follows a hierarchical approach, which is based on first level units (i.e., classification units) and second-level units (i.e., stratification units). Nonparametric techniques allow to measure the correspondence between the geographic space and the landscape configuration represented by the units. These correspondence metrics are useful to define sampling schemes and to model the spatial variability of environmental phenomena. The theoretical background of the algorithms and code examples are presented in Fuentes, Dorantes, and Tipton (2021). <doi:10.31223/X50S57>. |
Authors: | Bryan A. Fuentes [aut, cre] , Minerva J. Dorantes [aut] , John R. Tipton [aut] , Robert J. Hijmans [ctb] , Andrew G. Brown [ctb] |
Maintainer: | Bryan A. Fuentes <[email protected]> |
License: | AGPL (>= 3) |
Version: | 1.0.5 |
Built: | 2024-11-24 03:26:32 UTC |
Source: | https://github.com/bafuentes/rassta |
Given a SpatRaster whose cell values represent categories (e.g., soil types, land use/cover classes), a dummy layer indicating the presence/absence of each category is created, and optionally written on disk. Each category in the raster layer must be represented by a unique integer cell value. Output values indicating the presence or absence of categories in the dummy layers may be set using preval and absval arguments, respectively.
dummies( ca.rast, vals = NULL, preval = 100, absval = 0, to.disk = FALSE, outdir = ".", extension = ".tif", ... )
dummies( ca.rast, vals = NULL, preval = 100, absval = 0, to.disk = FALSE, outdir = ".", extension = ".tif", ... )
ca.rast |
SpatRaster, as in |
vals |
Integer. Vector or sequence of values representing the categories for which dummy layers will be created. If NULL, all categories will be processed. Default: NULL |
preval |
Integer. Value indicating presence of category. Default: 100 |
absval |
Integer. Value indicating absence of category. Default: 0 |
to.disk |
Boolean. Write output dummy layers to disk? Default: FALSE |
outdir |
Character. If to.disk = TRUE, string specifying the path for the output dummy raster layers. Default: "." |
extension |
Character. If to.disk = TRUE, String specifying the extension for the output raster layers (see Details). Default: ".tif" |
... |
If to.disk = TRUE, additional arguments as for
|
This function mainly differs from segregate
in
that presence and absence values can be set arbitrarily, thus allowing for
values other than those traditionally used in one-hot/dummy encoding (i.e.,
1 and 0).
Please note that the argument extension does not correspond to the
argument filetype in writeRaster
. However,
writeRaster
should recognize the appropriate extension
for the output raster layers from the extension argument. For
instance, by setting extension = ".tif",
writeRaster
will recognize the extension as
GeoTiff, which is the GDAL driver name.
Single-layer or multi-layer SpatRaster with dummy layer(s).
Other Miscellaneous Functions:
figure()
,
plot3D()
require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of geologic units f <- list.files(path = p, pattern = "geology2.tif", full.names = TRUE) geol <- terra::rast(f) # Dummy layer from geologic unit 1 dums <- dummies(ca.rast = geol, vals = 1, preval = 100, absval = 0)
require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of geologic units f <- list.files(path = p, pattern = "geology2.tif", full.names = TRUE) geol <- terra::rast(f) # Dummy layer from geologic unit 1 dums <- dummies(ca.rast = geol, vals = 1, preval = 100, absval = 0)
Modeling of spatially varying phenomena based on landscape similarity to stratification units. If each stratification unit across geographic space represents a distinct landscape configuration (in terms of multiple landscape factors and/or factor scales), and if each landscape configuration influences a phenomenon in a distinct way, then the spatial variability of that phenomenon can be assessed across a landscape by relating each geographic location to each distinct landscape configuration. Therefore, the more similar a geographic location is to the landscape configuration represented by a given stratification unit, then also the more similar the response of a phenomenon will be at that location to the typical response for conditions within the given stratification unit. Both continuous and categorical response variables are supported. For categorical responses, each category must be identified by an integer value.
engine( res.type = "cont", ls.rast, n.win = 3, su.repobs, tiles, parallel = FALSE, outdir = ".", tile.rm = TRUE, extension = ".tif", verbose = FALSE, ... )
engine( res.type = "cont", ls.rast, n.win = 3, su.repobs, tiles, parallel = FALSE, outdir = ".", tile.rm = TRUE, extension = ".tif", verbose = FALSE, ... )
res.type |
Character. Type of response to model. Options are "cont" for continuous, and "cat" for categorical response. Default: "cont" |
ls.rast |
SpatRaster, as in |
n.win |
Integer. Positive number indicating how many winning stratification units should be considered. See Details. Default: 3 |
su.repobs |
Data frame. The first column of this data frame must contain only the numeric code for the stratification units (without prefix). Each additional column must contain the value of the representative response observation for each stratification unit. Multiple response variables are allowed (one per column). Note that all response variables in the data frame must share the same type (res.type). See example on issues related to non-explicit column names. |
tiles |
SpatVector, as in |
parallel |
Boolean. Perform parallel processing? A parallel backend
needs to be registered beforehand with
|
outdir |
Character. String specifying the path for the output raster tiles/layer(s) of modeled response(s). Default: "." |
tile.rm |
Boolean. Should the tiles of modeled response(s) be removed from disk after the tile merging process? Default: TRUE |
extension |
Character. String specifying the extension for the output raster layer(s) of modeled response(s). Default: ".tif" |
verbose |
Boolean. Show warning messages in the console? Default: FALSE |
... |
Additional arguments as for |
The predictive modeling process is cell-wise, which means that it operates on a cell-by-cell basis. For a given cell occurring in the geographic space supported by a raster layer, the predictive modeling engine first identifies the n stratification units to which the given cell is most similar (i.e., 'winning stratification units'). The engine is able to identify the winning stratification units thanks to the user-provided set of landscape similarity layers ls.rast. Subsequently, the response value from the representative observation for each winning stratification unit is identified. In the case of a continuous response, a weighted average of representative response values is performed. For each representative response value, the weight is proportional to the corresponding stratification unit's landscape similarity value in the given cell. The result of the weighted average is assigned as the response value in the given cell. In the case of a categorical response, the modal value from the representative response values of the n winning stratification units is assigned to the given cell.
Note that the name for each raster layer in ls.rast should match the
numeric code of the corresponding stratification unit, which is obtained from
the column of numeric codes in su.repobs. Nevertheless, raster layer
names in ls.rast with a character prefix in the numeric code and/or
file extension should work fine (e.g., "su_1101.tif" instead of "1101"). If
the landscape similarity layers in ls.rast were created with
similarity
, then raster layer names will not have any prefix
nor extension as part of the numeric code.
When dealing with large geographic spaces, high raster resolutions (i.e., small cell sizes), or both, a considerable amount of memory is required to perform the modeling process. To reduce memory usage, the predictive modeling engine performs tile-based processing of landscape similarity layers and writes results directly on disk. Tile-based processing increases the computational time, thus parallelization is allowed by setting up a parallel backend. If parallelization is enabled, then care must be taken with the size of the tiles since larger sizes will have a greater impact on memory usage. Consequently, the parallel, tile-based processing will be less useful.
Multi-layer or single-layer SpatRaster with modeled response(s).
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of landscape similarities fls <- list.files(path = p, pattern = "su_", full.names = TRUE) ls <- terra::rast(fls) # Numeric code and representative response value for stratification units fro <-list.files(path = p, pattern = "repobs.csv", full.names = TRUE) ro <- read.csv(fro) # Extract only those stratification units with representative value ls <- ls[[as.character(ro$SU)]] # SpatVector with processing tiles fti <- list.files(path = p, pattern = "tiles.shp", full.names = TRUE) ti <- terra::vect(fti) # Directory for temporary files o <- tempdir() # Perform predictive modeling of continuous response r <- engine(res.type = "cont", ls.rast = ls, n.win = 2, su.repobs = ro, tiles = ti, outdir = o, overwrite = TRUE ) # Plot modeled response if(interactive()){plot(r)} # Clean temporary files file.remove(list.files(path = o, pattern = "soc.tif", full.names = TRUE)) # #------- # A note on non-explicit response's names (obtained from su.repobs): ## This will result in incorrectly modeled response values x <- c("SOM", "SOM_30cm", "SOM_45cm") # SOM = soil organic matter grep(x[1], x) # Non explicit grep(x[2], x) # Explicit grep(x[3], x) # Explicit ## This will result in correct values x <- c("SOM_15cm", "SOM_30cm", "SOM_45cm") grep(x[1], x) # Explicit grep(x[2], x) # Explicit grep(x[3], x) # Explicit
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of landscape similarities fls <- list.files(path = p, pattern = "su_", full.names = TRUE) ls <- terra::rast(fls) # Numeric code and representative response value for stratification units fro <-list.files(path = p, pattern = "repobs.csv", full.names = TRUE) ro <- read.csv(fro) # Extract only those stratification units with representative value ls <- ls[[as.character(ro$SU)]] # SpatVector with processing tiles fti <- list.files(path = p, pattern = "tiles.shp", full.names = TRUE) ti <- terra::vect(fti) # Directory for temporary files o <- tempdir() # Perform predictive modeling of continuous response r <- engine(res.type = "cont", ls.rast = ls, n.win = 2, su.repobs = ro, tiles = ti, outdir = o, overwrite = TRUE ) # Plot modeled response if(interactive()){plot(r)} # Clean temporary files file.remove(list.files(path = o, pattern = "soc.tif", full.names = TRUE)) # #------- # A note on non-explicit response's names (obtained from su.repobs): ## This will result in incorrectly modeled response values x <- c("SOM", "SOM_30cm", "SOM_45cm") # SOM = soil organic matter grep(x[1], x) # Non explicit grep(x[2], x) # Explicit grep(x[3], x) # Explicit ## This will result in correct values x <- c("SOM_15cm", "SOM_30cm", "SOM_45cm") grep(x[1], x) # Explicit grep(x[2], x) # Explicit grep(x[3], x) # Explicit
This function is intended to reproduce the figures presented in rassta: Raster-based Spatial Stratification Algorithms (Fuentes et al., 2021). Note that this function assumes that all the necessary inputs for each figure are loaded in the working environment. For the creation of each input, please refer to the data and examples presented in the aforementioned work. Also, please note that the use of this function is not intended for RStudio.
figure(x, d, scaling = 100, to.disk = FALSE, verbose = FALSE)
figure(x, d, scaling = 100, to.disk = FALSE, verbose = FALSE)
x |
Integer. Number identifying the figure to reproduce. |
d |
List. List with the data required for the figure to reproduce. |
scaling |
Integer. This number scales (i.e., resizes) the R's plotting device, such that width = x/scaling & height = x/scaling, with x = pixels. The default pixel size (not adjustable) and scaling value should work fine. Default = 100 |
to.disk |
Boolean. Save figure to disk? Default = FALSE |
verbose |
Boolean. Show warning messages in the console? Default: FALSE |
None
B.A. Fuentes, M.J. Dorantes, and J.R. Tipton. rassta: Raster-based Spatial Stratification Algorithms. EarthArXiv, 2021. doi:10.31223/X50S57
Other Miscellaneous Functions:
dummies()
,
plot3D()
if(interactive()){ require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of geologic units f <- list.files(path = p, pattern = "geology.tif", full.names = TRUE) geol <- terra::rast(f) # Dummy layers from geologic units mat.sig <- dummies(ca.rast = geol, preval = 100, absval = 0) figure(17, d = mat.sig) }
if(interactive()){ require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of geologic units f <- list.files(path = p, pattern = "geology.tif", full.names = TRUE) geol <- terra::rast(f) # Dummy layers from geologic units mat.sig <- dummies(ca.rast = geol, preval = 100, absval = 0) figure(17, d = mat.sig) }
Selection of the representative sampling locations based on landscape similarity values. For a give stratification unit, the representative sampling location is the XY position where the highest landscape similarity value occurs. This location is assumed to best reflect the influence that the landscape configuration of a given stratification unit exerts on response phenomena. Currently, two selection methods are supported: (1) maximum similarity within buffer zones ("buffer"), and (2) absolute maximum similarity ("absolute"). For the buffer method, the n largest zones enclosing landscape similarity values above a certain threshold are first identified. Then, for each zone, one sample is placed at the XY location where the landscape similarity value is maximized. For the absolute method, a sample is placed at the XY locations with the n maximum landscape similarity values. In both methods, it is possible to constrain the sampling process to the boundaries of the stratification unit. Constraining the process ensures that the sampling locations determined for a given unit are placed within the boundaries of that unit. See Details for some guidance in the use of this function for classification units.
locations( ls.rast, su.rast, method = "buffer", constrained = TRUE, buf.quant = 0.9, buf.n = 1, abs.n = 1, tol = 1, parallel = FALSE, to.disk = FALSE, outdir = ".", verbose = FALSE, ... )
locations( ls.rast, su.rast, method = "buffer", constrained = TRUE, buf.quant = 0.9, buf.n = 1, abs.n = 1, tol = 1, parallel = FALSE, to.disk = FALSE, outdir = ".", verbose = FALSE, ... )
ls.rast |
SpatRaster, as in |
su.rast |
SpatRaster. Single-layer SpatRaster representing the stratification units occurring across geographic space. Integer values are expected as cell values (i.e., numeric codes) of stratification units. |
method |
Character. String denoting the sampling method. Current options are "buffer" for the maximum similarity within buffer zones method, and "absolute" for the absolute maximum similarity method. Default: "buffer" |
constrained |
Boolean. Should the sampling process be constrained to the boundaries of each stratification unit? See Details. Default: TRUE |
buf.quant |
Numeric. Number expressed in quantile notation (0-1) indicating the similarity threshold for the creation of buffer zones. Only zones enclosing raster cells with landscape similarity >= buf.quant will be created and thus, considered for sampling. See Details. Default: 0.9 |
buf.n |
Integer. Positive integer indicating the n largest buffer zones for which sampling locations will be selected (n buffer zones per stratification unit, one sampling location per buffer zone). Default: 1 |
abs.n |
Integer. When method = "absolute", Positive integer indicating the number of sampling locations to select for each stratification unit. See Details. Default: 1 |
tol |
Numeric. When method = "absolute", this number will be subtracted from the sampled maximum value of a landscape similarity layer to ensure that the requested number of sampling locations will be found (see Details). The default assumes that landscape similarity values are on a scale of 1 to 100. If these values are on a different scale (e.g., decimal), then, tol needs to be adjusted accordingly. Default: 1 |
parallel |
Boolean. Perform parallel processing? A parallel backend must
be registered beforehand with |
to.disk |
Boolean. Should output SpatVector(s) (as in
|
outdir |
Character. If to.disk = TRUE, string specifying the path for the output SpatVector(s). Default: "." |
verbose |
Boolean. Show warning messages in the console? Default: FALSE |
... |
Additional arguments, as for |
Except when buf.n = 1 or abs.n = 1, the number of returned sampling locations per stratification unit may be smaller than requested, especially when constrained = TRUE. For the constrained buffer method, reducing the landscape similarity threshold value buf.quant will not always result in more buffer zones; i.e., more sampling locations. The reason for this is that reducing the threshold value for the creation of buffer zones may actually promote the spatial contiguity of zones. For instance, two buffer zones created at buf.quant = 0.9, may be merged into a single buffer zone when buf.quant = 0.80. This will occur if the raster cells between the two buffer zones satisfy: landscape similarity >= quantile(landscape similarity, 0.8). For the absolute method, increasing the value of the tol argument will ensure a safer search for n sampling locations and thus, greater chances of getting the total number of requested sampling locations per stratification unit.
Note that this sampling scheme can be applied for classification units. In order to do this, one should replace the multi-layer SpatRaster of landscape similarities with a multi-layer SpatRaster of spatial signatures. One should also replace the raster layer of stratification units with that of classification units.
If method = "buffer" and constrained = TRUE, a list with the following components:
locations: SpatVector of point geometry. Each point in this vector represents the sampling location placed at the maximum landscape similarity value within a stratification unit's buffer zone. Tabular attributes in this SpatVector are (1) SU = stratification unit's numeric code, (2) land_sim = landscape similarity value, (3) x = X coordinate, and (4) y = Y coordinate.
buffer: SpatVector of polygon geometry. Each polygon in this vector represents the buffer zone of an stratification unit.
If method = "buffer" and constrained = FALSE:
locations: Same as locations from method = "buffer" and constrained = TRUE.
If method = "absolute":
locations: SpatVector of point geometry. Each point in this vector represents the sampling location placed at the maximum landscape similarity value for an stratification unit. Tabular attributes in this SpatVector are (1) SU = stratification unit's numeric code, (2) land_sim = landscape similarity value, (3) x = X coordinate, and (4) y = Y coordinate.
Other Functions for Stratified Sampling:
observation()
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of landscape similarities fls <- list.files(path = p, pattern = "su_", full.names = TRUE) ls <- terra::rast(fls) # Single-layer SpatRaster of stratification units fsu <- list.files(path = p, pattern = "strata.tif", full.names = TRUE) su <- terra::rast(fsu) # Get 1 representative sampling location per stratification unit rl <- locations(ls.rast = ls, su.rast = su) # Plot representative locations (including buffer areas) if(interactive()){ plot(su, type = "classes", fun = function() c(points(rl$locations), polys(rl$buffers)) )}
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of landscape similarities fls <- list.files(path = p, pattern = "su_", full.names = TRUE) ls <- terra::rast(fls) # Single-layer SpatRaster of stratification units fsu <- list.files(path = p, pattern = "strata.tif", full.names = TRUE) su <- terra::rast(fsu) # Get 1 representative sampling location per stratification unit rl <- locations(ls.rast = ls, su.rast = su) # Plot representative locations (including buffer areas) if(interactive()){ plot(su, type = "classes", fun = function() c(points(rl$locations), polys(rl$buffers)) )}
Selection of the representative response observation for each stratification unit occurring across geographic space. One observation from a set of n observations of a response variable sampled/measured within the spatial boundaries of a given stratification unit is selected according to the following criteria: (1) maximum landscape similarity, (2) median response value, and (3) random selection. The maximum landscape similarity (mls) selects the observation at the XY spatial location where the landscape similarity value is maximized for a given stratification unit. The median response value (mrv) selects the observation whose response value is (closest to) the median of all response values, as measured from the observations spatially enclosed by a given stratification unit. The random selection, as implied by the name, randomly selects one observation from the set of observations spatially enclosed by a given stratification unit. See Details for some guidance in the use of this function for classification units.
observation( su.rast, obs, col.id, col.resp, method = "mls", ls.rast, verbose = FALSE )
observation( su.rast, obs, col.id, col.resp, method = "mls", ls.rast, verbose = FALSE )
su.rast |
SpatRaster, as in |
obs |
SpatVector, as in |
col.id |
Integer. Index of the ID column in the tabular attributes of obs. |
col.resp |
Integer. Index of the response' values column in the tabular attributes of obs. |
method |
Character. String specifying the selection method for the response representative observation. Options are "mls" for the maximum landscape similarity value, "mrv" for the median response value, and "random" for random selection. Default: "mls" |
ls.rast |
SpatRaster, as in |
verbose |
Boolean. Show warning messages in the console? Default: FALSE |
This selection scheme can be applied to classification units. For classification units, one should replace the multi-layer SpatRaster of landscape similarities with a multi-layer SpatRaster of spatial signatures. One should also replace the raster layer of stratification units with that of classification units.
A list with the following components:
su.repobs: Data table with the following attributes: (1) Original IDs of the selected observation, (2) representative response value, (3) stratification unit's numeric code, and (4) landscape similarity value at the XY location of the selected observation (only if method = "mls").
su.norepobs: List of the numeric codes of stratification units without observations.
su.repobs.sp: SpatVector of point geometry with the representative response observation for each stratification unit represented in su.rast.
Other Functions for Stratified Sampling:
locations()
require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of stratification units fsu <- list.files(path = p, pattern = "strata.tif", full.names = TRUE) su <- terra::rast(fsu) # Observations with response values. ## For this example, soil organic carbon (SOC) collected at 15 cm soil depth fob <- list.files(path = p, pattern = "soc.shp", full.names = TRUE) ob <- terra::vect(fob) # Column indices for ID and measured response value id <- 1 re <- 2 # Multi-layer SpatRaster of landscape similarities fls <- list.files(path = p, pattern = "su_", full.names = TRUE) ls <- terra::rast(fls) # Selection of representative response observations for stratification units ro <- observation(su.rast = su, obs = ob, col.id = id, col.resp = re, ls.rast = ls ) # Plot representative observations if(interactive()){plot(su, type = "classes", fun = function() points(ro$su_repobs.sp) )}
require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of stratification units fsu <- list.files(path = p, pattern = "strata.tif", full.names = TRUE) su <- terra::rast(fsu) # Observations with response values. ## For this example, soil organic carbon (SOC) collected at 15 cm soil depth fob <- list.files(path = p, pattern = "soc.shp", full.names = TRUE) ob <- terra::vect(fob) # Column indices for ID and measured response value id <- 1 re <- 2 # Multi-layer SpatRaster of landscape similarities fls <- list.files(path = p, pattern = "su_", full.names = TRUE) ls <- terra::rast(fls) # Selection of representative response observations for stratification units ro <- observation(su.rast = su, obs = ob, col.id = id, col.resp = re, ls.rast = ls ) # Plot representative observations if(interactive()){plot(su, type = "classes", fun = function() points(ro$su_repobs.sp) )}
Interactive maps showing the 3-dimensional (XYZ) variability in raster
layers representing continuous variables. The XYZ reference positions
will be obtained from an elevation layer and the values of the continuous
variables will be used as a surface color gradient. For this function to
work, there must be a raster layer of elevation (e.g., digital terrain model)
and at least one continuous variable among the raster layers to map. The maps
produced are interactive, meaning that manual axis rotation and zoom are
possible. Special consideration must be taken with large raster layers (large
spatial coverage and/or high spatial resolution). This function can
aggregates the spatial resolution (i.e., cell size) in order to handle large
raster layers. This is achieved by internally calling
aggregate
. An aggregation factor will determine the
final cell size, where final cell size = cell size*aggregation factor.
In addition, a spatial extent can be provided to reduce the total mapping
area and thus, to further reduce processing time. This function uses the
plotly library. See Details for current limitations.
plot3D( var.rast, z, ex = 0.1, agg = FALSE, fact = NULL, spext = NULL, pals = NA, rev = NA )
plot3D( var.rast, z, ex = 0.1, agg = FALSE, fact = NULL, spext = NULL, pals = NA, rev = NA )
var.rast |
SpatRaster, as in |
z |
Integer. Position (index) of the raster layer of elevation in var.rast. |
ex |
Numeric. Value indicating the exaggeration factor for the Z axis. This can be useful to enhance the visualization of subtle topographic variability. Default: 0.1 |
agg |
Boolean. Should the spatial resolution be aggregated to reduce processing time? Default: FALSE |
fact |
Numeric. If agg = TRUE, value indicating the aggregation factor. Default: NULL |
spext |
Numeric. List with the coordinates of the bounding box for spatial subset (xmin, xmax, ymin, ymax). SpatRaster or SpatVector from which a spatial extent can be calculated are also an acceptable input. Default: NULL |
pals |
Character. List of strings with the names of the n color
ramps (one per continuous variable). See
|
rev |
Character. List of n Booleans indicating whether or not to reverse the color ramp for each continuous variable. Default: NA |
Currently, this function does not allow to adjust the labels for XY axes so that actual coordinates are shown. Instead, the relative position values are shown on these axes.
List with plotly-htmlwidget objects. Each object calls the 3D map for a continuous variable in var.rast.
Other Miscellaneous Functions:
dummies()
,
figure()
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of topographic variables ft <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) tvars <- terra::rast(ft) # Single-layer SpatRaster of terrain elevation fe <- list.files(path = p, pattern = "^elevation", full.names = TRUE) e <- terra::rast(fe) # Add elevation to the SpatRaster of topographic variables etvars <- c(e, tvars) # Interactive 3D maps maps <- plot3D(var.rast = etvars, z = 1, ex = 0.2, pals = c("Zissou", "Plasma", "Spectral") ) if(interactive()){maps}
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of topographic variables ft <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) tvars <- terra::rast(ft) # Single-layer SpatRaster of terrain elevation fe <- list.files(path = p, pattern = "^elevation", full.names = TRUE) e <- terra::rast(fe) # Add elevation to the SpatRaster of topographic variables etvars <- c(e, tvars) # Interactive 3D maps maps <- plot3D(var.rast = etvars, z = 1, ex = 0.2, pals = c("Zissou", "Plasma", "Spectral") ) if(interactive()){maps}
Predicts constrained, univariate distribution functions across the geographic space supported by raster layers. For a given continuous variable used to create a classification unit, this function first calculates a user-defined distribution function for that variable using only observations selected from within the classification unit. In this way, the distribution function is univariate and constrained. Subsequently, a locally-estimated scatterplot smoothing (LOESS) or a generalized additive model (GAM) is fitted. This model is fitted using the variable’s observations as explanatory values and the values from the distribution function as the response values. Finally, the fitted model is predicted on the complete geographic space supported by the raster layer of the given variable. This process is iterated for all of the continuous variables and classification units. Each resulting layer can be thought of as a landscape correspondence measurement between an XY location in geographic space and the landscape configuration represented by a given classification unit in terms of a specific variable. The following distribution functions are currently supported: the probability density function (PDF), the empirical cumulative density function (ECDF), and the inverse of the empirical cumulative density function (iECDF). Please refer to Details for more information about how each distribution function is calculated. Also, see details on parallel processing.
predict_functions( cuvar.rast, cu.ind, cu, vars, dif, hist.type = "regular", hist.pen = "default", grid.mult = 1, kern = "normal", quant.sep = 0.01, method = "loess", span = 0.6, k = 20, discrete = TRUE, to.disk = FALSE, outdir = ".", prefix = "", extension = ".tif", overwrite = FALSE, verbose = FALSE, ... )
predict_functions( cuvar.rast, cu.ind, cu, vars, dif, hist.type = "regular", hist.pen = "default", grid.mult = 1, kern = "normal", quant.sep = 0.01, method = "loess", span = 0.6, k = 20, discrete = TRUE, to.disk = FALSE, outdir = ".", prefix = "", extension = ".tif", overwrite = FALSE, verbose = FALSE, ... )
cuvar.rast |
SpatRaster, as in |
cu.ind |
Integer. Position (index) of the raster layer of classification units in cuvar.rast. |
cu |
Integer. Vector of integer values that correspond to the numeric identifiers of the units in the raster layer of classification units. |
vars |
Character. Vector of strings containing the names of the n continuous variables in cuvar.rast. These names have to be sequentially repeated according to the number of classification units (See Examples). |
dif |
Character. Vector of strings containing the distribution function to calculate for each continuous variable within each classification unit. The function will match the position of the name of the distribution function with that of the name of the continuous variable in vars. |
hist.type |
Character. Type of histogram to calculate. Options are
"regular", "irregular" (unequally-sized bins, very slow), and "combined"
(the one with greater penalized likelihood is returned). See
|
hist.pen |
Character. Penalty to apply when calculating the histogram
(see |
grid.mult |
Numeric. Multiplying factor to increase/decrease the size of the "optimal" grid size for the Kernel Density Estimate (KDE). Default: 1 |
kern |
Character. Type of kernel to use for the KDE. Default: "normal" |
quant.sep |
Numeric. Spacing between quantiles for the calculation of the ECDF and iECDF. Quantiles are in the range of 0-1 thus spacing must be a decimal. Default: 0.01 |
method |
Character. Model to fit. Current options are "loess" for
locally-estimated scatterplot smoothing (see |
span |
Numeric. If method = "loess", degree of smoothing for LOESS fitting. Default: 0.6 |
k |
Numeric. If method = "gam", Number of knots for the cubic regression splines. Default: 20 |
discrete |
Boolean. If method = "gam", discretize variables for storage and efficiency reasons? Can reduce processing time significantly. Default: TRUE |
to.disk |
Boolean. Write the output raster layers of predicted distribution function to disk? See details an example about parallel processing. Default: FALSE |
outdir |
Character. If to.disk = TRUE, string specifying the path for the output raster layers of predicted distribution function. Default: "." |
prefix |
Character. If to.disk = TRUE, string specifying a prefix for the file names of the output raster layers of predicted distribution function. Default: "" |
extension |
Character. If to.disk = TRUE, string specifying the extension for the output raster layers of predicted distribution function. Default: ".tif" |
overwrite |
Boolean. If to.disk = TRUE, should raster layers in disk and with same name as the output raster layer(s) of predicted distribution function be overwritten? Default: FALSE |
verbose |
Boolean. Show warning messages in the console? Default: FALSE |
... |
If to.disk = TRUE, additional arguments as for
|
To calculate the PDF, this function uses the binned KDE for observations
drawn from the breaks of a regular/irregular histogram. The "optimal" number
of bins for the histogram is defined by calling the function
histogram
(Mildenberger et al., 2019) with the
user-defined penalty hist.pen. Subsequently, the optimal number of
bins is treated as equivalent to the "optimal" grid size for the binned KDE.
The grid size can be adjusted by specifying the multiplying factor
grid.mult. Lastly, the "optimal" bandwidth for the binned KDE is
calculated by applying the direct plugin method of Sheather and Jones
(1991). For the calculation of optimal bandwidth and for the binned KDE, the
package KernSmooth is called. To calculate both the ECDF and the
iECDF, this function calls the ecdf
function on
equally-spaced quantiles. The spacing between quantiles can be manually
adjusted via quant.sep. In the case of iECDF, the ECDF is inverted by
applying the formula: iECDF = ((x - max(ECDF)) * -1) + min(ECDF);
where x corresponds to each value of the ECDF.
The "cu", "vars", and "dif" parameters of this function are configured such
that the output table from select_functions
can be used
directly as input. (see Examples).
When writing output raster layer to disk, multiple distribution functions can
be predicted in parallel if a parallel backend is registered beforehand with
registerDoParallel
. Keep in mind that the function
may require a large amount of memory when using a parallel backend with large
raster layers (i.e., high resolution and/or large spatial coverage).
Some issues have been reported when manually creating cluster objects using
the parallel package. To overcome this issue, a cluster object can be
registered directly through registerDoParallel
without passing it first through makeCluster
. See
examples.
Single-layer or multi-layer SpatRaster with the predicted distribution function for each variable and for each classification unit.
T. Mildenberger, Y. Rozenholc, and D. Zasada. histogram: Construction of Regular and Irregular Histograms with Different Options for Automatic Choice of Bins, 2019. https://CRAN.R-project.org/package=histogram
S. Sheather and M. Jones. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B. Methodological, 53:683–690, 1991.
Other Landscape Correspondence Metrics:
select_functions()
,
signature
,
similarity()
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of topographic variables ## 3 continuous variables ftva <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) tva <- terra::rast(ftva) # Single-layer SpatRaster of topographic classification units ## Five classification units ftcu <- list.files(path = p, pattern = "topography.tif", full.names = TRUE) tcu <- terra::rast(ftcu) # Add the classification units to the SpatRaster of topographic variables tcuvars <- c(tcu, tva) # Data frame with source for "cu", "vars", and "dif" ftdif <- list.files(path = p, pattern = "topodif.csv", full.names = TRUE) tdif <- read.csv(ftdif) # Check structure of source data frame head(tdif) # Predict distribution functions ## 1 distribution function per variable and classification unit = 1 tpdif <- predict_functions(cuvar.rast = tcuvars, cu.ind = 1, cu = tdif$Class.Unit[1:3], vars = tdif$Variable[1:3], dif = tdif$Dist.Func[1:3], grid.mult = 3, span = 0.9 ) # Plot predicted distribution functions if(interactive()){plot(tpdif, col = hcl.colors(100, "Oslo", rev = TRUE))} #-------- # Writing results to disk and parallel processing if(interactive()){ # Directory for temporary files o <- tempdir() # Register parallel backend require(doParallel) registerDoParallel(4) # Predict distribution functions tpdif <- predict_functions(cuvar.rast = tcuvars, cu.ind = 1, cu = tdif$Class.Unit[1:3], vars = tdif$Variable[1:3], dif = tdif$Dist.Func[1:3], grid.mult = 3, span = 0.9, to.disk = TRUE, outdir = o ) # Stop cluster stopImplicitCluster() # Clean temporary files file.remove(sources(tpdif)) }
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of topographic variables ## 3 continuous variables ftva <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) tva <- terra::rast(ftva) # Single-layer SpatRaster of topographic classification units ## Five classification units ftcu <- list.files(path = p, pattern = "topography.tif", full.names = TRUE) tcu <- terra::rast(ftcu) # Add the classification units to the SpatRaster of topographic variables tcuvars <- c(tcu, tva) # Data frame with source for "cu", "vars", and "dif" ftdif <- list.files(path = p, pattern = "topodif.csv", full.names = TRUE) tdif <- read.csv(ftdif) # Check structure of source data frame head(tdif) # Predict distribution functions ## 1 distribution function per variable and classification unit = 1 tpdif <- predict_functions(cuvar.rast = tcuvars, cu.ind = 1, cu = tdif$Class.Unit[1:3], vars = tdif$Variable[1:3], dif = tdif$Dist.Func[1:3], grid.mult = 3, span = 0.9 ) # Plot predicted distribution functions if(interactive()){plot(tpdif, col = hcl.colors(100, "Oslo", rev = TRUE))} #-------- # Writing results to disk and parallel processing if(interactive()){ # Directory for temporary files o <- tempdir() # Register parallel backend require(doParallel) registerDoParallel(4) # Predict distribution functions tpdif <- predict_functions(cuvar.rast = tcuvars, cu.ind = 1, cu = tdif$Class.Unit[1:3], vars = tdif$Variable[1:3], dif = tdif$Dist.Func[1:3], grid.mult = 3, span = 0.9, to.disk = TRUE, outdir = o ) # Stop cluster stopImplicitCluster() # Clean temporary files file.remove(sources(tpdif)) }
Selection of distribution functions for continuous raster layers that were
used to create a raster layer of classification units. The distribution
functions currently supported are the probability density function (PDF), the
empirical cumulative density function (ECDF), and the inverse of the
empirical cumulative density function (iECDF). Please note that
select_functions
DOES NOT calculate the aforementioned
distribution functions. The sole purpose of select_functions
is
to assist in the knowledge-driven selection of the most appropriate
distribution function for each continuous variable used to create a given
classification unit (see Details).
select_functions( cu.rast, var.rast, fun = mean, varscale = "uniminmax", mode = "auto", verbose = TRUE, ... )
select_functions( cu.rast, var.rast, fun = mean, varscale = "uniminmax", mode = "auto", verbose = TRUE, ... )
cu.rast |
SpatRaster, as in |
var.rast |
SpatRaster. Multi-layer SpatRaster containing the n continuous raster layers of the variables used to create the classification units. |
fun |
Character. Descriptive statistical measurement (e.g., mean, max).
See |
varscale |
Character. Variable scaling method. See scale argument
in |
mode |
Character. String specifying the selection mode for univariate distribution functions. Possible values are "inter" for interactive selection, and "auto" for automatic selection (see Details). Default: "auto" |
verbose |
Boolean. Show warning messages in the console? Default: FALSE |
... |
Additional arguments as for |
The selection of distribution functions is univariate, that is, for each variable, and it is constrained, meaning that the selection has to be made for each classification unit. Overall, the distribution functions are used to characterize typical values of a given continuous variable within a given classification unit. When the PDF is selected, values closer to, or at the peak of the PDF will be considered as the most typical. Contrarily, values at the tails of the PDF will be considered as the less typical. When the ECDF or the iECDF are selected, values toward (+)infinity and (-)infinity will be considered as the most typical values, respectively.
In order to assist the selection process, when mode = "inter", this
function displays an interactive parallel coordinates plot (see
ggplotly
) and a writable table (built in Shiny). For
each variable, the parallel coordinates plot shows a trend of a descriptive
statistical measurement (argument fun) across all of the
classification units. Using this trend, one can then select the most
appropriate distribution function for each variable based on the
occurrence/absence of "peaks" and "pits" in the observed
trend. For instance, a peak (highest point in the trend) would indicate that
the given classification unit contains on average, the highest values of that
variable. Conversely, a pit (lowest point in the trend) would indicate that
the given classification unit contains on average, the lowest values of that
variable. Thus, an ECDF and an iECDF can be selected for the peak and the
pit, respectively. The PDF can be selected for classification units whose
trend does not show either a peak or a pit. Please consider that peaks and
pits are only reference points and thus, one should validate the selection of
distribution functions based on domain knowledge.
When mode = "auto", the criteria for the selection of distribution functions will be based on peaks and pits in the parallel coordinates plot.
The output table (distfun) is intended to be used as input in the
predict_functions
function.
The selection of distribution functions is similar to the selection of membership functions in fuzzy logic. For example, if one wants to describe a phenomenon through distribution functions of continuous variables, then the functions can be considered to be membership curves. Accordingly, the PDF, ECDF, and iECDF will be equivalent to the Gaussian, S, and Z membership functions, respectively.
If mode = "inter":
distfun: A DT table (DataTables library) with the following attributes: (1) Class.Unit = numeric ID for classification units, (2) Variable = each of the n continuous raster layers of a classification unit, and (3) Dist.Func = Empty column whose cells can be filled with the following strings: "PDF, "ECDF", and "iECDF" (unquoted). This table can be saved on disk through the Shiny interface.
parcoord: A plotly-based parallel coordinate plot which can be saved on disk using the R package htmlwidgets.
If mode = "auto":
distfun: Same as distfun when mode = "inter", except for column "Dist.Func" whose cells were automatically filled.
parcoord: Same as parcoord when mode = "inter".
Other Landscape Correspondence Metrics:
predict_functions()
,
signature
,
similarity()
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of topographic variables ## 3 topographic variables tf <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) tvars <- terra::rast(tf) # Single-layer SpatRaster of topographic classification units ## 5 classification units tcf <- list.files(path = p, pattern = "topography.tif", full.names = TRUE) tcu <- terra::rast(tcf) # Automatic selection of distribution functions tdif <- select_functions(cu.rast = tcu, var.rast = tvars, fun = mean) # Parallel coordinates plot if(interactive()){tdif$parcoord}
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRaster of topographic variables ## 3 topographic variables tf <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) tvars <- terra::rast(tf) # Single-layer SpatRaster of topographic classification units ## 5 classification units tcf <- list.files(path = p, pattern = "topography.tif", full.names = TRUE) tcu <- terra::rast(tcf) # Automatic selection of distribution functions tdif <- select_functions(cu.rast = tcu, var.rast = tvars, fun = mean) # Parallel coordinates plot if(interactive()){tdif$parcoord}
Using a mathematical function, a raster layer is created from the cell-wise
aggregation of a set of predicted distribution functions for a classification
unit (see predict_functions
). Each element in this set can be
thought of as a correspondence measurement between an XY location in
the geographic space and the landscape configuration represented by a given
classification unit in terms of a specific variable. Therefore, aggregating
the set of predicted distribution functions into a single layer produces an
overall (multivariate) measurement of correspondence. This multivariate
landscape correspondence metric is considered to be the spatial
signature of the classification unit.
signature( pdif.rast, inprex, outname, fun = mean, to.disk = FALSE, outdir = ".", extension = ".tif", overwrite = FALSE, ... )
signature( pdif.rast, inprex, outname, fun = mean, to.disk = FALSE, outdir = ".", extension = ".tif", overwrite = FALSE, ... )
pdif.rast |
SpatRaster, as in |
inprex |
Character. Prefix in the name of raster layers representing predicted distribution functions belonging to the same set (i.e.,same classification unit). If spatial signatures for multiple sets are to be calculated, a vector of strings must be provided. See Details and Examples. |
outname |
Character. Output layer/file name for the raster layer(s) of spatial signature. If the spatial signatures for multiple classification units are to be calculated, then a vector of strings must be provided. |
fun |
Function. The mathematical function must take a vector of values
and return a single value (e.g., mean, max, sum, etc.). See
|
to.disk |
Boolean. Write the output raster layer(s) of spatial signature to disk? See details about parallel processing. Default: FALSE |
outdir |
Character. If to.disk = TRUE, string specifying the path for the output raster layer(s) of spatial signature. Default: "." |
extension |
Character. If to.disk = TRUE, String specifying the extension for the output raster layer(s) of spatial signature. Default: ".tif" |
overwrite |
Boolean. If to.disk = TRUE, should raster layers in disk and with same name as the output raster layer(s) of spatial signature be overwritten? Default: FALSE |
... |
If to.disk = TRUE, additional arguments as for
|
Raster layers of predicted distribution functions belonging to the same classification unit must be identified by a unique prefix in their layer names (argument inprex). This prefix is used as a string pattern to find all the predicted distribution functions belonging to one classification unit. Consequently, a unique prefix must be defined for each additional classification unit to distinguish between predicted distribution functions for different classification units. Similarly, an additional string (or vector of strings) of file/layer name(s) must be provided to distinguish between the resulting spatial signatures for different classification units (argument outname). The length of outname must match that from inprex.
When writing the output raster layers of spatial signature to disk, a
parallel backend can be registered before running this function with
registerDoParallel
to speed-up computation. Note
that this is only helpful when calculating spatial signatures for many
classification units.
From a spatial analysis standpoint, the aggregation of predicted distribution functions into spatial signature is similar to the application of fuzzy aggregation operators commonly used in GIS-based multi-criteria decision analysis. Moreover, The use of descriptive statistics to calculate signatures for landscape-related classification units can be traced back to the works of Pike and Rozema (1975), and Pike (1988).
Single-layer or multi-layer SpatRaster with the spatial signature(s) calculated from the set(s) of predicted distribution functions.
R. Pike. The geometric signature: quantifying landslide-terrain types from digital elevation models. Mathematical geology, 20(5):491–511, 1988. doi:10.1007/BF00890333
R. Pike and W. Rozema. Spectral analysis of landforms. Annals of the Association of American Geographers,65(4):499–516, 1975. doi:10.1111/j.1467-8306.1975.tb01058.x
Other Landscape Correspondence Metrics:
predict_functions()
,
select_functions()
,
similarity()
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRast with predicted distribution functions ## 3 continuous variables and 5 classification units, = 15 functions ft <- list.files(path = p, pattern = "topo_", full.names = TRUE) t <- terra::rast(ft) # Vector with the prefix for each set of predicted distribution functions ## 5 classification units = 5 sets it <- paste("topo_", seq(1, 5), "_", sep = "") # Vector of names for output raster layers of spatial signature ## 5 spatial signatures, one per classification unit ot <- paste("topography_", seq(1, 5), sep = "") # Calculate spatial signatures tsig <- signature(pdif.rast = t, inprex = it, outname = ot) # Plot spatial signatures if(interactive()){plot(tsig, col = hcl.colors(100, "Oslo", rev = TRUE))}
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRast with predicted distribution functions ## 3 continuous variables and 5 classification units, = 15 functions ft <- list.files(path = p, pattern = "topo_", full.names = TRUE) t <- terra::rast(ft) # Vector with the prefix for each set of predicted distribution functions ## 5 classification units = 5 sets it <- paste("topo_", seq(1, 5), "_", sep = "") # Vector of names for output raster layers of spatial signature ## 5 spatial signatures, one per classification unit ot <- paste("topography_", seq(1, 5), sep = "") # Calculate spatial signatures tsig <- signature(pdif.rast = t, inprex = it, outname = ot) # Plot spatial signatures if(interactive()){plot(tsig, col = hcl.colors(100, "Oslo", rev = TRUE))}
For each stratification unit present in a single-layer SpatRaster, a raster
layer of landscape similarity is created by aggregating the stratification
unit's corresponding set of spatial signatures (see signature
).
For a stratification unit x, the corresponding set of spatial
signatures consists of one spatial signature for each of the n
classification units that are present in the numeric code of x (one
classification unit per landscape factor/factor scale). The aggregation
process is performed cell-wise, and by using a mathematical function which
takes multiple values but return a single value (e.g., mean, sum, min, max).
The resulting raster layer represents the correspondence between an XY
location in geographic space and the landscape configuration represented by a
given stratification unit.
similarity( su.rast, su.code, sig.rast, fun = mean, to.disk = FALSE, outdir = ".", prefix = "su_", extension = ".tif", overwrite = FALSE, ... )
similarity( su.rast, su.code, sig.rast, fun = mean, to.disk = FALSE, outdir = ".", prefix = "su_", extension = ".tif", overwrite = FALSE, ... )
su.rast |
SpatRaster, as in |
su.code |
List. The structure of the stratification units' numeric code. This (nested) list should indicate the names of the landscape factors/factor scales used to create the stratification units, and the position (start, end) of their corresponding classification units' ID in the numeric code. See Examples. |
sig.rast |
SpatRaster. Multi-layer SpatRaster with the spatial signatures of all the classification units that were used to create the stratification units. The spatial signatures should follow this name convention: x_n; where x is the landscape factor/factor scale, and n is the numeric ID of the classification unit to which the spatial signature belongs. |
fun |
Function. The mathematical function must accept a vector of values
and return a single value (e.g., mean, max, sum, etc.). See
|
to.disk |
Boolean. Write the output raster layers of landscape similarity to disk? See note about parallel processing. Default: FALSE |
outdir |
Character. If to.disk = TRUE, string specifying the path for the output raster layers of landscape similarity. Default: "." |
prefix |
Character. If to.disk = TRUE, prefix for the file name of the output raster layers of landscape similarity. Default: "su_" |
extension |
Character. If to.disk = TRUE, string specifying the extension for the output raster layers of landscape signature. Default: ".tif" |
overwrite |
Boolean. When to.disk = TRUE, should raster layers in disk and with same name as the output landscape similarities be overwritten? Default: FALSE |
... |
Additional arguments as for |
The landscape similarity is a landscape correspondence metric. The aggregation of multiple spatial signatures into a single landscape similarity layer is somewhat similar to the application of fuzzy logic and aggregation operators in GIS-based multi-criteria decision analysis. Furthermore, the aggregation of raster layers indicating relative optimality for spatially-varying phenomena, like spatial signatures, can be guided by physical/ecological principles like Sprengel-Liebig's law of the minimum. In such case, one could select the min function when aggregating the spatial signatures into landscape similarities.
When writing the output raster layers of landscape similarity to disk, a
parallel backend can be registered before running this function with
registerDoParallel
to speed-up computation.
A list with the following components:
landsim: Multi-layer SpatRaster with the landscape similarity to each stratification unit present in su.rast.
codes: A data frame with the numeric code for each stratification unit and the corresponding classification units' numeric ID for each landscape factor/factor scale.
Other Landscape Correspondence Metrics:
predict_functions()
,
select_functions()
,
signature
require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of stratification units fsu <- list.files(path = p, pattern = "strata2.tif", full.names = TRUE) su <- terra::rast(fsu) # Define the structure of the stratification units' numeric code code <- list(geology = c(1,1), climate = c(2,2), topography = c(3,3)) # Multi-layer SpatRaster of spatial signatures of classification units fsig <- list.files(path = p, pattern = "geology_|climate_|topography_", full.names = TRUE ) sig <- terra::rast(fsig) # Calculate landscape similarity to stratification units landsim <- similarity(su.rast = su, su.code = code, sig.rast = sig) # Plot some landscape similarities # if(interactive()){plot(landsim$landsim[[c(1,10,12,14)]], # col = hcl.colors(100, "Oslo", rev = TRUE) # )} #------- # A note on the numeric code of stratification units # For a given stratification unit, the structure of its corresponding numeric # code indicates: (1) the landscape factors and/or factor scales that were # accounted for when creating the stratification unit, and (2) the numeric id # of the classification unit from each landscape factor/factor scale. # Consider the following numeric code structure: su.code <- list(geology = c(1,1), climate = c(2,2), topography = c(3,4)) # The stratification units are composed of classification units from... # ...three landscape factors: geology, climate, and topography names(su.code) # For geology, the classification units are represented by the first... # ...digit in the numeric code su.code$geology # For climate, the classification units are represented by the second... # ...digit in the numeric code su.code$climate # For topography, the classification units are represented by the third... # ...and fourth digit in the numeric code su.code$topography # Thus, the numeric code of the stratification units 1101 and 2410 means: su <- c(1101, 2410) su[1] # 'geology' = 1, 'climate' = 1, and 'topography' = 1 su[2] # 'geology' = 2, 'climate' = 4, and 'topography' = 10
require(terra) p <- system.file("exdat", package = "rassta") # Single-layer SpatRaster of stratification units fsu <- list.files(path = p, pattern = "strata2.tif", full.names = TRUE) su <- terra::rast(fsu) # Define the structure of the stratification units' numeric code code <- list(geology = c(1,1), climate = c(2,2), topography = c(3,3)) # Multi-layer SpatRaster of spatial signatures of classification units fsig <- list.files(path = p, pattern = "geology_|climate_|topography_", full.names = TRUE ) sig <- terra::rast(fsig) # Calculate landscape similarity to stratification units landsim <- similarity(su.rast = su, su.code = code, sig.rast = sig) # Plot some landscape similarities # if(interactive()){plot(landsim$landsim[[c(1,10,12,14)]], # col = hcl.colors(100, "Oslo", rev = TRUE) # )} #------- # A note on the numeric code of stratification units # For a given stratification unit, the structure of its corresponding numeric # code indicates: (1) the landscape factors and/or factor scales that were # accounted for when creating the stratification unit, and (2) the numeric id # of the classification unit from each landscape factor/factor scale. # Consider the following numeric code structure: su.code <- list(geology = c(1,1), climate = c(2,2), topography = c(3,4)) # The stratification units are composed of classification units from... # ...three landscape factors: geology, climate, and topography names(su.code) # For geology, the classification units are represented by the first... # ...digit in the numeric code su.code$geology # For climate, the classification units are represented by the second... # ...digit in the numeric code su.code$climate # For topography, the classification units are represented by the third... # ...and fourth digit in the numeric code su.code$topography # Thus, the numeric code of the stratification units 1101 and 2410 means: su <- c(1101, 2410) su[1] # 'geology' = 1, 'climate' = 1, and 'topography' = 1 su[2] # 'geology' = 2, 'climate' = 4, and 'topography' = 10
Produces a low-dimensional representation of the input feature space for
subsequent estimation of the "optimal" number of clusters (k) in a
multivariate dataset. The dimension reduction is based on the
self-organizing map technique (SOM) of Kohonen (1982; 1990), and implemented
in R by the function supersom
of Wehrens and
Kruisselbrink (2018). To estimate the optimal k, the partitioning
around medoids (PAM) of Kaufman and Rousseeuw (1990), coupled with the gap
statistic of Tibshirani et al. (2001), is performed on the SOM's codebook
vectors. This is achieved by internally calling pam
and clusGap
(Maechler et al., 2021). See
Details for a brief theoretical background.
som_gap( var.rast, xdim = 12, ydim = 12, topo = "hexagonal", neighbourhood.fct = "gaussian", rlen = 600, dist.fcts = c("sumofsquares", "manhattan"), mode = "pbatch", K.max, stand = FALSE, B = 500, d.power = 2, spaceH0 = "original", method = "globalSEmax", SE.factor = 1, ... )
som_gap( var.rast, xdim = 12, ydim = 12, topo = "hexagonal", neighbourhood.fct = "gaussian", rlen = 600, dist.fcts = c("sumofsquares", "manhattan"), mode = "pbatch", K.max, stand = FALSE, B = 500, d.power = 2, spaceH0 = "original", method = "globalSEmax", SE.factor = 1, ... )
var.rast |
SpatRaster, as in |
xdim |
Integer. Horizontal dimension of the SOM's grid. Default: 12 |
ydim |
Integer. Vertical dimension of the SOM's grid. Default: 12 |
topo |
Character. Topology of the SOM's grid. Options = "rectangular", "hexagonal". Default: "hexagonal" |
neighbourhood.fct |
Character. Neighborhood of the SOM's grid. Options = "bubble", "gaussian". Default: "gaussian" |
rlen |
Integer. Number of times the complete dataset will be presented to the SOM's network. Default: 600 |
dist.fcts |
Character. Vector of length 2 containing the distance functions to use for SOM (First element, options = "sumofsquares", "euclidean", "manhattan") and for PAM (second element, options = "euclidean", "manhattan"). Default: c("sumofsquares", "manhattan") |
mode |
Character. Type of learning algorithm. Options are “online", "batch", and "pbatch". Default: "pbatch" |
K.max |
Integer. Maximum number of clusters to consider, must be at least two (2). |
stand |
Boolean. For PAM function, does SOM's codebook vectors need to be standardized? Default: FALSE |
B |
Integer. Number of bootstrap samples for the gap statistic. Default: 500 |
d.power |
Integer. Positive Power applied to euclidean distances for the gap statistic. Default: 2 |
spaceH0 |
Character. Space of the reference distribution for the gap statistic. Options = "scaledPCA", "original" (See Details). Default: "original" |
method |
Character. Optimal k selection criterion for the gap statistic.
Options = "globalmax", "firstmax", "Tibs2001SEmax", "firstSEmax",
"globalSEmax". See |
SE.factor |
Numeric. Factor to feed into the standard error rule for the
gap statistic. Only applicable for methods based on standard error (SE).
See |
... |
Additional arguments as for |
The clustering of SOM's codebook vectors has been proposed in several works, notably in that from Vesanto and Alhoniemi (2000). These authors proposed a two-stage clustering routine as an efficient method to reduce computational load, while obtaining satisfactory correspondence between the clustered codebook vectors and the clustered original feature space.
The main purpose of this function is to allow the use of clustering and k-selection algorithms that may result prohibitive for large datasets, such as matrices derived from raster layers commonly used during geocomputational routines. Thus, the SOM's codebook vectors can be subsequently used for the calculation of distance matrices, which given the large size of their input feature space, may otherwise be impossible to create due to insufficient memory allocation capacity. Similarly, robust clustering algorithms that require full pairwise distance matrices (e.g., hierarchical clustering, PAM) and/or eigenvalues (e.g., spectral clustering) may also be performed on SOM's codebook vectors.
Note that supersom
will internally equalize the
importance (i.e., weights) of variables such that differences in scale will
not affect distance calculations. This behavior can be prevented by setting
normalizeDataLayers = FALSE in additional arguments passed to
supersom
. Moreover, custom weights can also be passed
through the additional argument user.weights. In such case, user
weights are applied on top of the internal weights.
When working with large matrices, the additional SOM argument
keep.data may be set to FALSE. However, note that by doing so, the
suggested follow-up function for raster products som_pam
will
not work since it requires both original data and winning units.
For the gap statistic, method = "scaledPCA" has resulted in errors for R sessions with BLAS/LAPACK supported by the Intel Math Kernel Library (MKL).
SOM: An object of class kohonen (see
supersom
). The components of class kohonen returned by
this function are: (1) data = original input matrix, (2)
unit.classif = winning units for all observations, (3)
distances = distance between each observation and its corresponding
winning unit, (4) grid = object of class somgrid (see
somgrid
), (5) codes = matrix of codebook
vectors, (6) changes = matrix of mean average deviations from codebook
vectors, (7) dist.fcts = selected distance function, and other
arguments passed to supersom
(e.g., radius,
distance.weights, etc.). Note that components 1, 2, and 3 will only be
returned if keep.data = TRUE, which is the default.
SOMdist: Object of class dist. Matrix of pairwise distances calculated from the SOM's codebook vectors.
SOMgap: Object of class clusGap. The main component of
class clusGap returned by this function is Tab, which is a matrix of
the gap statistic results (see clusGap
). Additional
components are the arguments passed to the function (i.e., spaceH0,
B), the PAM function, n (number of observations) and
call (the clusGap call-type object).
Kopt: Optimal k, as selected by arguments method and (possibly) SE.factor.
L. Kaufman and P. Rousseeuw. Finding groups in data: an introduction to cluster analysis. John Wiley & Sons, 1990. doi:10.1002/9780470316801
T. Kohonen. Self-organized formation of topologically correct feature maps. Biological cybernetics, 43 (1):59–69, 1982. doi:10.1007/bf00337288
T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990. doi:10.1016/s0925-2312(98)00030-7
M. Maechler, P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik. cluster: Cluster Analysis Basics and Extensions, 2021. https://CRAN.R-project.org/package=cluster
R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411–423, 2001. doi:10.1111/1467-9868.00293
J. Vesanto and E. Alhoniemi. Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3):586–600, 2000. doi:10.1109/72.846731
R. Wehrens and J. Kruisselbrink. Flexible self-organizing maps in kohonen 3.0. Journal of Statistical Software, 87(1):1–18, 2018. doi:10.18637/jss.v087.i07
Other Functions for Landscape Stratification:
som_pam()
,
strata()
require(terra) # Multi-layer SpatRaster with topographic variables p <- system.file("exdat", package = "rassta") tf <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) t <- rast(tf) # Scale topographic variables (mean = 0, StDev = 1) ts <- scale(t) # Self-organizing map and gap statistic for optimum k set.seed(963) tsom <- som_gap(var.rast = ts, xdim = 8, ydim = 8, rlen = 150, mode = "online", K.max = 6, B = 300, spaceH0 = "original", method = "globalSEmax" ) # Optimum k tsom$Kopt
require(terra) # Multi-layer SpatRaster with topographic variables p <- system.file("exdat", package = "rassta") tf <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) t <- rast(tf) # Scale topographic variables (mean = 0, StDev = 1) ts <- scale(t) # Self-organizing map and gap statistic for optimum k set.seed(963) tsom <- som_gap(var.rast = ts, xdim = 8, ydim = 8, rlen = 150, mode = "online", K.max = 6, B = 300, spaceH0 = "original", method = "globalSEmax" ) # Optimum k tsom$Kopt
Creation of a rasterized representation of the outputs from the
self-organizing map (SOM) and partitioning around medoids (PAM). Given a
reference raster layer, each winning unit of the SOM and clustering value
from the PAM will be mapped on the corresponding cell in the reference layer
and across the geographic space supported by such layer. Note that this
function is a follow-up of the som_gap
function.
som_pam(ref.rast, kohsom, k, metric = "manhattan", stand = FALSE, ...)
som_pam(ref.rast, kohsom, k, metric = "manhattan", stand = FALSE, ...)
ref.rast |
SpatRaster, as in |
kohsom |
SOM Object of class kohonen, see
|
k |
Integer (positive value). Number of clusters to form from the codebook vectors of the SOM, where k < SOM's codebook vectors. |
metric |
Character. Distance function for PAM. Options are "euclidean", and "manhattan". Default: "manhattan" |
stand |
Boolean. For the PAM function, do SOM's codebook vectors need to be standardized? Default: FALSE |
... |
Additional arguments as for |
As in som_gap
, this function calls pam
to perform the clustering of SOM's codebook vectors. The SOM object must
belong to the class kohonen, as in supersom
.
Note that in order for som_pam
to perform efficiently, the
reference SpatRaster ref.rast must be a single-layer SpatRaster with
the same cell size, number of rows, number of columns, and index of valid
cells as those in the multi-layer SpatRaster object used in
som_gap
. If a multi-layer SpatRaster (with each layer possibly
having a different index of valid cells) is used as the ref.rast, the
efficiency of som_pam
(i.e., running time and/or memory
allocation) may be degraded when handling large SpatRaster objects.
For this function to work as intended, the additional argument
cluster.only in pam
must remain as FALSE, which
is the default.
sompam: Object of class pam. See ?pam.object for details.
sompam.rast: Multi-layer SpatRaster, as in
rast
. The first raster layer corresponds to the SOM's
winning units. The second raster layer corresponds to the clustered SOM's
codebook vectors by PAM.
Other Functions for Landscape Stratification:
som_gap()
,
strata()
require(terra) # Multi-layer SpatRaster with topographic variables p <- system.file("exdat", package = "rassta") ft <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) t <- rast(ft) # Scale topographic variables (mean = 0, StDev = 1) ts <- scale(t) # Self-organizing map and gap statistic for optimum k set.seed(963) tsom <- som_gap(var.rast = ts, xdim = 8, ydim = 8, rlen = 150, mode = "online", K.max = 6, B = 300, spaceH0 = "original", method = "globalSEmax" ) # Optimum k tsom$Kopt # PAM clustering of topographic SOM's codebook vectors tpam <- som_pam(ref.rast = t[[1]], kohsom = tsom$SOM, k = tsom$Kopt) # Plot topographic variables, SOM grid and PAM clustering if(interactive()){plot(c(t, tpam$sompam.rast))}
require(terra) # Multi-layer SpatRaster with topographic variables p <- system.file("exdat", package = "rassta") ft <- list.files(path = p, pattern = "^height|^slope|^wetness", full.names = TRUE ) t <- rast(ft) # Scale topographic variables (mean = 0, StDev = 1) ts <- scale(t) # Self-organizing map and gap statistic for optimum k set.seed(963) tsom <- som_gap(var.rast = ts, xdim = 8, ydim = 8, rlen = 150, mode = "online", K.max = 6, B = 300, spaceH0 = "original", method = "globalSEmax" ) # Optimum k tsom$Kopt # PAM clustering of topographic SOM's codebook vectors tpam <- som_pam(ref.rast = t[[1]], kohsom = tsom$SOM, k = tsom$Kopt) # Plot topographic variables, SOM grid and PAM clustering if(interactive()){plot(c(t, tpam$sompam.rast))}
Stratification units are created from the spatial intersection of raster layers representing different sets of classification units. Each set of classification units is related to a particular landscape factor (e.g., topography, climate) or to a particular spatial scale for a single landscape factor (e.g., micro-climate, macro-topography). Each resulting stratification unit is considered to represent a distinct landscape configuration in terms of multiple landscape factors/factor scales (represented by the classification units). This function automatically assigns a unique numeric code to each stratification unit. For x stratification unit, the numeric code represents the unique combination of classification units whose spatial intersection resulted in x. See Examples to get a better idea of the logic behind the code assignment process.
strata(cu.rast, to.disk = FALSE, outdir = ".", su.name, ...)
strata(cu.rast, to.disk = FALSE, outdir = ".", su.name, ...)
cu.rast |
SpatRaster, as in |
to.disk |
Boolean. Write the resulting raster layer of stratification units to disk? Default: FALSE |
outdir |
Character. If to.disk = TRUE, string specifying the path for the output raster layer of stratification units. Default: "." |
su.name |
Character. If to.disk = TRUE, file name (including extension) for the output raster layer of stratification units. |
... |
Additional arguments as for |
When printing su.rast$code.mult, the output shows the multiplier used for each landscape factor/factor scale. From this output, one can manually replicate the creation of stratification units through simple raster algebra. To do so, a weighted sum of the SpatRasters containing the classification units for each landscape factor/factor scale should be performed using the multipliers as weights. Note that the weights do not imply relative importance. The weights are required only to preserve a logical structure of the landscape factors/factor scales in the resulting numeric code.
su.rast: Single-layer SpatRaster representing the stratification units occurring across geographic space. The cell values in this raster layer represents the numeric codes of stratification units.
code.mult: Multipliers used for the creation of the numeric codes. See Details.
Other Functions for Landscape Stratification:
som_gap()
,
som_pam()
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRast with classification units (Cus) ## Three sets (i.e., landscape factors): geology, climate and topography fcu <- list.files(path = p, pattern = "geology.tif|climate.tif|topography.tif", full.names = TRUE ) cu <- terra::rast(fcu) # Stratification units (SUs) su <- strata(cu.rast = cu) # Plot the stratification units if(interactive()){plot(su$su.rast, type = "classes")} # # Note code structure from SUs and corresponding values from CUs z <- c(su$su.rast, cu)[46,61] # Example of one cell (row = 45, column = 45) su$code.mult # Multipliers z[c("SU", names(su$code.mult))] # Code structure # Note what happens when some landscape factors have cell values greater... #... than 1 digit (i.e., more than 9 distinct classification units) cu <- c(cu[[1]], cu[[2]]^4, cu[[3]]^2) su <- strata(cu.rast = cu) su$code.mult c(su$su.rast, cu[[names(su$code.mult)]])[46,61]
require(terra) p <- system.file("exdat", package = "rassta") # Multi-layer SpatRast with classification units (Cus) ## Three sets (i.e., landscape factors): geology, climate and topography fcu <- list.files(path = p, pattern = "geology.tif|climate.tif|topography.tif", full.names = TRUE ) cu <- terra::rast(fcu) # Stratification units (SUs) su <- strata(cu.rast = cu) # Plot the stratification units if(interactive()){plot(su$su.rast, type = "classes")} # # Note code structure from SUs and corresponding values from CUs z <- c(su$su.rast, cu)[46,61] # Example of one cell (row = 45, column = 45) su$code.mult # Multipliers z[c("SU", names(su$code.mult))] # Code structure # Note what happens when some landscape factors have cell values greater... #... than 1 digit (i.e., more than 9 distinct classification units) cu <- c(cu[[1]], cu[[2]]^4, cu[[3]]^2) su <- strata(cu.rast = cu) su$code.mult c(su$su.rast, cu[[names(su$code.mult)]])[46,61]