LID.Rd
Calculate dispersion indexes according to a given set of standards and expectations (Mejia Ramon and Munson 2023), obtaining group, non-group, and total values for local observations and the global dataset.
A vector of values.
A weights matrix of dimensions length(x) x length(x), where entry [i, j] represents the degree to which observation j (along the columns) is part of the group of observation i (along the rows). This can be the output of the makeWeights function.
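The following base-R sketch shows what such a matrix can look like. It builds a binary k-nearest-neighbour weights matrix and row-standardizes it. The helper name knn_weights and the choice of k are hypothetical, and this is not how makeWeights is implemented.

```r
# Hypothetical helper (not makeWeights): binary k-nearest-neighbour weights,
# row-standardized. Entry [i, j] is the share of i's group attributed to j.
knn_weights <- function(x, k) {
  d <- as.matrix(dist(x))          # n x n pairwise distances
  n <- length(x)
  w <- matrix(0, n, n)
  for (i in seq_len(n)) {
    d_i <- d[i, ]
    d_i[i] <- Inf                  # exclude i from its own neighbourhood
    nn <- order(d_i)[seq_len(k)]   # indices of the k nearest neighbours
    w[i, nn] <- 1
  }
  w / rowSums(w)                   # divide each row by its sum so rows sum to 1
}

w <- knn_weights(c(1, 2, 4, 8, 16), k = 2)
rowSums(w)
```

Rows index the focal observation i and columns index the candidate group member j, matching the orientation described above.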
A character string, either 'gini' (the default) or 'inoua', indicating whether distances are calculated in L1 or L2 space, respectively. Alternatively, a number giving the power to which distances and means are raised when the index is calculated: index = 1 corresponds to Gini and index = 2 to Inoua.
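To make the role of the power concrete, the sketch below computes a generic power-q pairwise dispersion. With q = 1 it is the canonical Gini coefficient (mean absolute difference divided by twice the mean). With q = 2 the same normalization reduces to the squared coefficient of variation (population variance over squared mean). Whether that matches the package's Inoua normalization is an assumption, and pairwise_dispersion is a hypothetical name.

```r
# Sketch of a power-q pairwise dispersion index (assumed normalization).
# q = 1: canonical Gini. q = 2: population variance / mean^2 under this form.
pairwise_dispersion <- function(x, q = 1) {
  n <- length(x)
  diffs <- abs(outer(x, x, "-"))^q      # n x n matrix of |x_i - x_j|^q
  sum(diffs) / (2 * n^2 * mean(x)^q)
}

pairwise_dispersion(c(1, 2, 3, 4), q = 1)  # 0.25
pairwise_dispersion(c(1, 2, 3, 4), q = 2)  # 0.2
```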
Either a character string or a matrix with dimensions length(x) x length(x), representing the expectation value from which errors are calculated for each observation pair i and j. If expect = 'self', the expectation is the value of observation i; if expect = 'local', the expectation is the neighborhood weighted mean; if expect = 'global', the expectation is the global mean. Expectations that depend on other metrics (including hypothesis-driven expectations that do not depend on the observed dataset) can be provided as an appropriate matrix.
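One way to picture the character options is to expand each of them into the equivalent length(x) x length(x) matrix. The helper expectation_matrix below is hypothetical. It assumes the neighborhood weighted mean is the w-weighted mean of the row, which may differ from the package's internal definition.

```r
# Hypothetical helper: expand expect = 'self' / 'local' / 'global' into an
# n x n matrix E, where E[i, j] is the expectation used for the pair (i, j).
expectation_matrix <- function(x, w, expect = c("self", "local", "global")) {
  expect <- match.arg(expect)
  n <- length(x)
  value <- switch(expect,
    self   = x,                                # E[i, j] = x_i
    local  = as.vector(w %*% x) / rowSums(w),  # assumed: w-weighted mean of i's group
    global = rep(mean(x), n)                   # global mean
  )
  matrix(value, nrow = n, ncol = n)            # constant along each row i
}

x <- c(1, 2, 4, 8, 16)
w <- matrix(1, 5, 5); diag(w) <- 0             # every other observation is a neighbour
E_local <- expectation_matrix(x, w, "local")
E_local[1, ]                                   # mean of the other four values
```

A user-supplied matrix simply takes the place of the matrix this helper returns.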
Either a character string or a matrix with dimensions length(x) x length(x), representing the standard by which errors are judged for each observation pair i and j. If standard = 'self', the standard is the value of observation i; if standard = 'other', the standard is the value of observation j; if standard = 'local', the standard is the neighborhood weighted mean; if standard = 'global', the standard is the global mean. Standards that depend on other metrics (including hypothesis-driven standards that do not depend on the observed dataset) can be provided as an appropriate matrix.
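The sketch below expands the standard options into matrices in the same way ('local' is omitted for brevity). It then assumes that the error for pair (i, j) is |S[i, j] - E[i, j]|^q. Under that assumption, pairing expect = 'self' with standard = 'other' reproduces the canonical Gini when q = 1. The helper name and the error form are hypothetical, and this page does not state which pairing the package uses for its canonical index.

```r
# Hypothetical helper: standard matrix S for the pair (i, j).
standard_matrix <- function(x, standard = c("self", "other", "global")) {
  standard <- match.arg(standard)
  n <- length(x)
  switch(standard,
    self   = matrix(x, n, n),                # S[i, j] = x_i
    other  = matrix(x, n, n, byrow = TRUE),  # S[i, j] = x_j
    global = matrix(mean(x), n, n)           # S[i, j] = global mean
  )
}

x <- c(1, 2, 3, 4)
n <- length(x)
E <- matrix(x, n, n)                 # expect = 'self': E[i, j] = x_i
S <- standard_matrix(x, "other")     # standard = 'other': S[i, j] = x_j
# Assumed error form |S - E|^q with q = 1, normalized as in the canonical Gini.
sum(abs(S - E)) / (2 * n^2 * mean(x))  # 0.25
```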
A vector of population weights: how much impact a given observation has on any other observation, independent of its influence as specified in w. Default is 1 for all observations.
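For intuition about population weights, the sketch below is a standard frequency-weighted Gini. Pair (i, j) counts p_i * p_j times, so integer weights behave like replicated observations. It ignores w entirely. That LID applies population weights in exactly this way is an assumption, and weighted_gini is a hypothetical name.

```r
# Population-weighted Gini (frequency-weight form): pair (i, j) counts p_i * p_j times.
weighted_gini <- function(x, p = rep(1, length(x))) {
  mu <- sum(p * x) / sum(p)            # population-weighted mean
  pair_w <- outer(p, p)                # p_i * p_j for every pair
  sum(pair_w * abs(outer(x, x, "-"))) / (2 * sum(p)^2 * mu)
}

weighted_gini(c(1, 2), p = c(2, 1))    # equals the unweighted Gini of c(1, 1, 2)
weighted_gini(c(1, 1, 2))
```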
Character string identifying the maximum likelihood estimator to be used. Default is mle = 'mean' for the traditional Gini. However, because the Gini uses mean absolute error, which implies a Laplace distribution, mle = 'median' is recommended. Alternatively, use index = 'inoua' with mle = 'mean' for Gaussian processes.
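The pairing of loss and estimator follows from a standard fact. The location that minimizes the sum of absolute errors (the Laplace maximum likelihood estimate) is the median, and the location that minimizes the sum of squared errors (the Gaussian maximum likelihood estimate) is the mean. The sketch below checks this numerically with base R's optimize. It is not part of LID.

```r
x <- c(1, 2, 10)

# L1 loss (Laplace likelihood): minimized at the median.
l1 <- optimize(function(m) sum(abs(x - m)), interval = range(x))$minimum
# L2 loss (Gaussian likelihood): minimized at the mean.
l2 <- optimize(function(m) sum((x - m)^2), interval = range(x))$minimum

c(l1 = l1, median = median(x), l2 = l2, mean = mean(x))
```

With a skewed value such as 10, the two estimates differ noticeably (2 versus about 4.33). This is why the choice of mle matters for L1-based indices.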
If index is not one of 'gini', 'inoua', 1, or 2, how should the function be named? Default is fun.name = paste0(index, 'q').
A character string, either the name or the corresponding code of a particular standard-expectation pair, as defined in Mejia Ramon and Munson (2023).
When processing, what is the maximum number of rows that an internal data.table can have? This is generally not a concern unless the number of observations approaches sqrt(.Machine$integer.max), about 46,341 on most systems, at which point the number of observation pairs approaches .Machine$integer.max (2^31 - 1). Lower values result in a greater number of chunks, thus allowing larger datasets to be calculated.
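The chunking idea can be sketched in base R. Instead of materializing all n^2 pairs at once, blocks of rows are processed and their contributions accumulated. The function chunked_abs_diff_sum and its max_cross argument are hypothetical stand-ins for whatever LID does internally. They only illustrate that chunking changes memory use, not the result.

```r
# Hypothetical sketch: accumulate sum_{i,j} |x_i - x_j| over blocks of rows,
# each block holding about max_cross pairs (never less than one full row of n pairs).
chunked_abs_diff_sum <- function(x, max_cross = 1e6) {
  n <- length(x)
  rows_per_chunk <- max(1, floor(max_cross / n))
  total <- 0
  for (s in seq(1, n, by = rows_per_chunk)) {
    rows <- s:min(n, s + rows_per_chunk - 1)
    total <- total + sum(abs(outer(x[rows], x, "-")))  # length(rows) x n block
  }
  total
}

set.seed(1)
x <- runif(500)
all.equal(chunked_abs_diff_sum(x, max_cross = 10000), sum(abs(outer(x, x, "-"))))
```

A smaller max_cross means more, smaller blocks. That trades extra loop iterations for a lower peak memory footprint.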
Should the canonical Gini or Inoua value also be calculated? Default is FALSE; ignored if index > 2.
Logical. Should a progress bar be displayed? Default is FALSE, although it can be useful when processing a large dataset that requires adjusting max.cross.
Logical. Should gc be run in the middle of the calculation? Default is clear.mem = FALSE; set it to TRUE if memory limits are a concern.
A list with the following entries:
(1) $index: A named character string giving the code of the index, with the index name as its name.
(2) $local: A data.table with three columns: G_Gi, the local group dispersion index; G_NGi, the local non-group dispersion index; and G_i, the local total dispersion index. Rows are in the same order as the input vector. This data.table also stores the chosen expectations and standards as hidden attributes to be used by inferLID.
(3) $global: A list with three entries: $G_G, the global group dispersion index; $G_NG, the global non-group dispersion index; and $G, the global total dispersion index.
(4) $canonical: The canonical Gini or Inoua index, if canonical = TRUE and index < 3.
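This page does not state how the group, non-group and total values relate to one another. Purely as an illustration, the sketch below assumes one possible decomposition. Each Gini-normalized pair error is split between observation i's group term (share w[i, j]) and its non-group term (share 1 - w[i, j]), so the two parts sum to the total and the totals sum to the canonical Gini. The function name local_decomposition and this decomposition are hypothetical. A data.frame stands in for the data.table the package returns.

```r
# Assumed decomposition (illustration only): pair (i, j) contributes to i's
# group term with share w[i, j] and to its non-group term with share 1 - w[i, j].
local_decomposition <- function(x, w) {
  n <- length(x)
  err <- abs(outer(x, x, "-")) / (2 * n^2 * mean(x))  # Gini-normalized pair errors
  data.frame(
    G_Gi  = rowSums(w * err),        # group part for each observation
    G_NGi = rowSums((1 - w) * err),  # non-group part
    G_i   = rowSums(err)             # total
  )
}

x <- c(1, 2, 4, 8, 16)
w <- matrix(0, 5, 5)
w[cbind(1:4, 2:5)] <- 1; w[cbind(2:5, 1:4)] <- 1  # adjacent observations form groups
loc <- local_decomposition(x, w)
sum(loc$G_i)  # equals the canonical Gini of x under this assumption
```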
The output list can be passed to inferLID to determine whether the values are locally and globally higher or lower than would be expected if the other values were randomly distributed among the remaining observations.
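The exact procedure used by inferLID is not described on this page. The sketch below shows a generic conditional randomization of the kind the sentence describes: x_i is held fixed, the remaining values are shuffled among the other positions, and a one-sided pseudo p-value is computed. The local statistic used here is a hypothetical stand-in (a w-weighted sum of absolute differences), not the LID statistic, and local_perm_test is a hypothetical name.

```r
# Generic conditional permutation test for a local statistic (illustration only).
local_perm_test <- function(x, w, nsim = 999) {
  n <- length(x)
  # Hypothetical local statistic: w-weighted sum of |x_i - x_j| over i's group.
  stat_i <- function(i, v) sum(w[i, ] * abs(v[i] - v))
  p <- numeric(n)
  for (i in seq_len(n)) {
    obs <- stat_i(i, x)
    others <- x[-i]
    sims <- replicate(nsim, {
      v <- x
      v[-i] <- others[sample.int(n - 1)]  # shuffle all values except x_i
      stat_i(i, v)
    })
    # One-sided pseudo p-value for "higher than expected".
    p[i] <- (sum(sims >= obs) + 1) / (nsim + 1)
  }
  p
}

set.seed(42)
x <- c(1, 2, 4, 8, 16, 32)
w <- matrix(0, 6, 6)
w[cbind(1:5, 2:6)] <- 1; w[cbind(2:6, 1:5)] <- 1
local_perm_test(x, w, nsim = 199)
```

Indexing with sample.int avoids the base-R pitfall where sample() applied to a single number samples from 1:that number.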
# Generate dummy observations (seeded for reproducibility)
set.seed(1)
x <- runif(10, 1, 100)
# Get pairwise distances (a 'dist' object)
dists <- dist(x)
# Get fuzzy weights considering the 5 nearest neighbors, based on
# inverse squared distance
weights <- makeWeights(dists, bw = 5,
                       mode = 'adaptive', weighting = 'distance',
                       FUN = function(x) 1/x^2, minval = 0.1,
                       row.stand = 'fuzzy')
# Obtain the 'local gini' value
lid <- LID(x, w = weights, index = 'gini', type = 'local')
# Inspect the global total index and the per-observation values
lid$global$G
head(lid$local)