Local Indicators of Dispersion

Calculate dispersion indexes according to a given set of standards and expectations (Mejia Ramon and Munson 2023), obtaining group, non-group, and total values for local observations and the global dataset.

LID(
  x,
  w,
  index = "gini",
  expect = "self",
  standard = "global",
  n = rep(1, length(x)),
  mle = "mean",
  fun.name = paste0(index, "q"),
  type = "spatial",
  max.cross = .Machine$integer.max,
  canonical = FALSE,
  pb = FALSE,
  clear.mem = TRUE
)

Arguments

x: A vector of values.
w: A weights matrix of dimensions length(x) x length(x), representing that a given observation j (along the columns) is part of the group of observation i (along the rows). This can be the output of the makeWeights function.
index: A character string, either 'gini' (the default) or 'inoua', indicating whether distances are calculated in L1 or L2 space, respectively. Alternatively, a numeric giving the power to which distances and means are raised when the index is calculated: index = 1 for Gini and index = 2 for Inoua.
expect: Either a character string or a matrix with dimensions length(x) x length(x), representing the expectation value from which errors are calculated for each observation pair between i and j. If expect = 'self', the expectation is calculated as i; if expect = 'local', the expectation is the neighborhood weighted mean; if expect = 'global', the expectation is the global mean. Expectations that depend on other metrics (including hypothesis-driven ones that do not depend on the observed dataset) can be provided by using an appropriate matrix.
standard: Either a character string or a matrix with dimensions length(x) x length(x), representing the standard by which errors are judged for each observation pair between i and j. If standard = 'self', the standard is calculated as i; if standard = 'other', the standard is calculated as j; if standard = 'local', the standard is the neighborhood weighted mean; if standard = 'global', the standard is the global mean. Standards that depend on other metrics (including hypothesis-driven ones that do not depend on the observed dataset) can be provided by using an appropriate matrix.
n: A vector of population weights, representing how much impact a given observation has on any other observation regardless of its influence as given in w. Default is 1 for all observations.
mle: A character string identifying the maximum likelihood estimator to be used. Default is mle = 'mean' for the traditional Gini, although since the Gini uses mean absolute error and therefore implies a Laplace distribution, mle = 'median' is recommended. Alternatively, use index = 'inoua' with mle = 'mean' for Gaussian processes.
fun.name: If index is not one of 'gini', 'inoua', 1, or 2, how should the function be named? Default is fun.name = paste0(index, 'q').
type: A character string, either the name or the corresponding code of a particular standard-expectation pair, as defined in Mejia Ramon and Munson (2023). Default is 'spatial'.
max.cross: When processing, what is the maximum number of rows that an internal data.table can have? This is generally not a concern unless the number of observations approaches sqrt(.Machine$integer.max), which is about 46,340 on most systems, where .Machine$integer.max is 2^31 - 1. Lower values result in a greater number of chunks, thus allowing larger datasets to be processed.
canonical: Logical. Should the canonical Gini or Inoua value also be calculated? Default is FALSE. This argument is ignored if index > 2.
pb: Logical. Should a progress bar be displayed? Default is FALSE, although this can be useful when processing a large dataset that requires adjusting max.cross.
clear.mem: Logical. Should gc be run in the middle of the calculation? The default shown in the usage above is clear.mem = TRUE; keep it as TRUE if memory limits are a concern.
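
The size constraints described for w, expect, standard, and max.cross can be checked with a few lines of base R before calling LID. The following is a minimal sketch rather than the package's internal logic: the variable names (k, n_pairs, max_k, max_cross, expect_mat) and the hypothesized value of 20 are illustrative assumptions, and the chunk count assumes that every (i, j) pair occupies one row of the internal table.

```r
# Illustrative input; these names are not part of the package
x <- c(12, 7, 30, 18, 25)
k <- length(x)

# Number of (i, j) pairs, computed as a double to avoid integer overflow
n_pairs <- as.numeric(k)^2

# Largest number of observations whose full cross still fits within
# the 32-bit integer limit
max_k <- floor(sqrt(.Machine$integer.max))

# Assumption: one row per pair, so the number of chunks is the pair count
# divided by the row limit, rounded up
max_cross <- 10
n_chunks <- ceiling(n_pairs / max_cross)

# A hypothesis-driven expectation must be a length(x) x length(x) matrix;
# here every pair is assigned the same hypothesized value (assumed: 20)
expect_mat <- matrix(20, nrow = k, ncol = k)

print(c(pairs = n_pairs, max_obs = max_k, chunks = n_chunks))
print(dim(expect_mat))
```

A matrix built this way could be supplied as expect or standard in place of one of the character options, provided its dimensions match length(x) x length(x).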

Value

A list with the following entries:

(1) $index A named character string giving the code of the index, with the name of the index as the element name.

(2) $local A data.table, with three columns: G_Gi, the local group dispersion index; G_NGi, the local non-group dispersion index; and G_i, the local total dispersion index. Rows are in the same order as the input vector. This data.table also contains the chosen expectations and standards as hidden attributes to be used by inferLID.

(3) $global A list with three entries: $G_G, the global group dispersion index; $G_NG, the global non-group dispersion index; and $G, the global total dispersion index.

(4) $canonical The canonical Gini or Inoua index, if canonical = TRUE and index < 3.
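
For orientation, the textbook Gini coefficient can be computed in base R from the mean absolute difference between all pairs of observations. This sketch assumes that "canonical" refers to that standard definition; it does not account for the population weights n or the mle argument, so it may not match the $canonical entry in every configuration. The helper name gini_textbook is hypothetical.

```r
# Textbook Gini coefficient: sum of absolute pairwise differences
# divided by 2 * n^2 * mean(x). Hypothetical helper, not a package function.
gini_textbook <- function(x) {
  n <- length(x)
  abs_diffs <- abs(outer(x, x, "-"))  # n x n matrix of |x_i - x_j|
  sum(abs_diffs) / (2 * n^2 * mean(x))
}

g <- gini_textbook(c(1, 2, 3, 4))
print(g)  # 0.25
```

For the values 1 to 4, the absolute differences sum to 20 and the denominator is 2 * 16 * 2.5 = 80, giving 0.25.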

Details

The output list can be passed to inferLID to determine whether the values are locally and globally higher or lower than would be expected if other values were randomly distributed among remaining observations.

Examples


# Generate dummy observations
x <- runif(10, 1, 100)

# Get distance matrix
dists <- dist(x)

# Get fuzzy weights considering 5 nearest neighbors based on 
# inverse square distance
weights <- makeWeights(dists, bw = 5, 
                       mode = 'adaptive', weighting = 'distance',
                       FUN = function(x) 1/x^2, minval = 0.1,
                       row.stand = 'fuzzy')
                       
# Obtain the 'local gini' value
lid <- LID(x, w = weights, index = 'gini', type = 'local')
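
The entries listed under Value are read with ordinary list and column access. So that the sketch below runs without the package installed, it uses a hand-built stand-in (lid_stub, a hypothetical name) with the documented shape: a data.frame replaces the data.table, and all values are NA placeholders rather than real results. With an actual result, the same expressions would apply to lid.

```r
# Stand-in with the documented structure; NA placeholders, not real output
lid_stub <- list(
  index  = c(gini = NA_character_),
  local  = data.frame(G_Gi  = rep(NA_real_, 3),
                      G_NGi = NA_real_,
                      G_i   = NA_real_),
  global = list(G_G = NA_real_, G_NG = NA_real_, G = NA_real_)
)

# Local total dispersion index, one value per observation (input order)
local_total <- lid_stub$local$G_i

# Global total dispersion index
global_total <- lid_stub$global$G

print(names(lid_stub$local))
print(names(lid_stub$global))
```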