Title: | Global Envelopes |
---|---|
Description: | Implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional analysis of variance, functional general linear model, n-sample test of correspondence of distribution functions), for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot) and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction). See Myllymäki and Mrkvička (2023) <doi:10.48550/arXiv.1911.06583>, Myllymäki et al. (2017) <doi:10.1111/rssb.12172>, Mrkvička and Myllymäki (2023) <doi:10.1007/s11222-023-10275-7>, Mrkvička et al. (2016) <doi:10.1016/j.spasta.2016.04.005>, Mrkvička et al. (2017) <doi:10.1007/s11222-016-9683-9>, Mrkvička et al. (2020) <doi:10.14736/kyb-2020-3-0432>, Mrkvička et al. (2021) <doi:10.1007/s11009-019-09756-y>, Myllymäki et al. (2021) <doi:10.1016/j.spasta.2020.100436>, Mrkvička et al. (2022) <doi:10.1002/sim.9236>, Dai et al. (2022) <doi:10.5772/intechopen.100124>, Dvořák and Mrkvička (2022) <doi:10.1007/s00180-021-01134-y>, Mrkvička et al. (2023) <doi:10.48550/arXiv.2309.04746>, and Konstantinou et al. (2024) <doi: 10.48550/arXiv.2403.01838>. |
Authors: | Mari Myllymäki [aut, cre], Tomáš Mrkvička [aut], Mikko Kuronen [ctb], Jiří Dvořák [ctb], Pavel Grabarnik [ctb], Ute Hahn [ctb], Michael Rost [ctb], Henri Seijo [ctb] |
Maintainer: | Mari Myllymäki <[email protected]> |
License: | GPL-3 |
Version: | 1.0-3 |
Built: | 2024-11-17 05:26:34 UTC |
Source: | https://github.com/myllym/get |
The GET package provides implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha) the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot), for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional ANOVA, functional GLM, n-sample test of correspondence of distribution functions), and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction).
The GET package provides central regions (i.e. global envelopes) and global envelope tests with intrinsic graphical interpretation. The central regions can be constructed from (functional) data. The tests are Monte Carlo or permutation tests, which demand simulations from the tested null model. The methods are applicable for any multivariate vector data and functional data (after discretization).
To get an overview of the package, start R and type library("GET")
and vignette("GET")
.
To get examples of point pattern analysis, start R and type library("GET")
and vignette("pointpatterns")
.
To get examples of Mrkvička and Myllymäki (2022), start R and type library("GET")
and vignette("FDRenvelopes")
.
Central regions or global envelopes or confidence bands:
central_region
.
E.g. 50% central region of growth curves of girls growth
.
First create a curve_set of the growth curves, e.g.
cset <- curve_set(r = as.numeric(row.names(growth$hgtf)),
obs = growth$hgtf)
Then calculate 50% central region (see central_region
for further arguments)
cr <- central_region(cset, coverage = 0.5)
Plot the result (see plot.global_envelope
for plotting options)
plot(cr)
It is also possible to do combined central regions for several sets of curves provided in a list
for the function, see examples in central_region
.
Global envelope tests: global_envelope_test
is the main function.
E.g. A test of complete spatial randomness (CSR) for a point pattern X
:
X <- spruces # an example pattern from spatstat
Use the function envelope
of spatstat to create nsim simulations
under CSR and to calculate the functions you want (below K-functions by Kest).
Important: use the option 'savefuns=TRUE' and
specify the number of simulations nsim
.
env <- envelope(X, nsim=999, savefuns = TRUE, fun = Kest, simulate = expression(runifpoint(ex = X)))
Perform the test (see global_envelope_test
for further arguments)
res <- global_envelope_test(env)
Plot the result (see plot.global_envelope
for plotting options)
plot(res)
It is also possible to do combined global envelope tests for several sets of curves provided in a list
for the function, see examples in global_envelope_test
.
To obtain false discovery rate envelopes of Mrkvička and Myllymäki (2023) use the argument typeone = "fdr"
.
Functional ordering: central_region
and global_envelope_test
are based on different measures for ordering the functions (or vectors) from
the most extreme to the least extreme ones. The core functionality of calculating the measures
is in the function forder
, which can be used to obtain different measures for sets of
curves. Usually there is no need to call forder
directly.
Functional boxplots: fBoxplot
Adjusted global envelope tests for composite null hypotheses
GET.composite
, see a detailed example in saplings
One-way functional ANOVA:
Graphical functional ANOVA tests: graph.fanova
Global rank envelope based on F-values: frank.fanova
Functional general linear model (GLM):
Graphical functional GLM: graph.flm
Global rank envelope based on F-values: frank.flm
For large data (not fitting comfortably in memory): partial_forder
Functional clustering: fclustering
Global quantile regression: global_rq
Functions for performing global envelopes for other specific purposes:
Graphical n sample test of correspondence of distribution functions: GET.distrequal
Permutation-based tests of independence to samples from any bivariate distribution: GET.distrindep
Testing global and local dependence of point patterns on covariates: GET.spatialF
Testing local correlations: GET.localcor
Variogram and residual variogram with global envelopes: GET.variogram
Deviation tests (for simple hypothesis): deviation_test
(no graphical
interpretation)
Most functions accept the curves provided in a curve_set
object.
Use curve_set
to create a curve_set
object from the
functions. Other formats to provide the curves to the above functions are
also accepted, see the information on the help pages.
See the help files of the functions for examples.
To perform a test you always first need to obtain the test function
for your data (
) and for each simulation
(
) in one way or another.
Given the set of the functions
,
you can perform a test by
global_envelope_test
.
1) The workflow when using your own programs for simulations:
(Fit the model and) Create simulations from the (fitted) null model.
Calculate the functions .
Use curve_set
to create a curve_set
object
from the functions .
Perform the test
res <- global_envelope_test(curve_set)
where curve_set
is the 'curve_set'-object you created, and plot the result
plot(res)
2) The workflow utilizing spatstat: start R, type library("GET")
and vignette("pointpatterns")
,
which explains the workflow and gives many examples of point pattern analysis
It is possible to modify the curve set for the test.
You can choose the interval of distances by
crop_curves
.
For better visualisation, you can take by
residual
.
Here is the expectation of
under the null hypothesis.
abide_9002_23
: see help page
adult_trees
: a point pattern of adult rees
cgec
: centred government expenditure centralization (GEC) ratios (see graph.fanova
)
fallen_trees
: a point pattern of fallen trees
GDPtax
: GDP per capita with country groups and other covariates
imageset3
: a simulated set of images
rimov
: water temperature curves in 365 days of the 36 years
saplings
: a point pattern of saplings (see GET.composite
)
The data sets are used to show examples of the functions of the library.
If the number of functions is low, the choice of the measure (or type or depth) playes a role,
as explained in vignette("GET")
(Section 2.4).
Note that the recommended minimum number of simulations for the rank envelope test (Myllymäki et al., 2017) based on a single function in spatial statistics is nsim=2499. When the number of argument values is large, also larger number simulations is needed in order to have a narrow p-interval. The "erl", "cont", "area", "qdir" and "st" global envelope tests and deviation tests can be used with a lower number of simulations, although the Monte Carlo error is obviously larger with a lower number of simulations. For increasing the number of simulations, all the global rank envelopes approach the same curves.
Mrkvička et al. (2017) discussed the number of simulations for tests based on many functions.
Myllymäki and Mrkvička (2023) provides description of the package.
The material can also be found in the corresponding vignette, which is available by
starting R and typing library("GET")
and vignette("GET")
.
In the special case of spatial processes (spatial point processes, random sets),
the functions are typically estimators of summary functions. The package supports
the use of the R package spatstat for generating simulations and calculating
estimators of the chosen summary function, but alternatively these can be done by
any other way, thus allowing for any user-specified models/functions.
To see examples of global envelopes for analysing point pattern data,
start R, type library("GET")
and vignette("pointpatterns")
.
Mrkvička and Myllymäki (2023) developed false discovery rate (FDR) envelopes.
Examples can be found by in associated vignette: start R, and type
library("GET")
and vignette("pointpatterns")
.
Mrkvička et al. (2023a) proposed global quantile regression. An example of
global quantile regression is given in the vignette vignette("QuantileRegression")
.
The vignette vignette("HotSpots")
illustrates the methodology proposed by
Mrkvička et al. (2023b) for detecting hotspots on a linear network.
Type citation("GET") to get a full list of references.
Mikko Kuronen has made substantial contributions of code. Additional contributions and suggestions from Jiří Dvořák, Pavel Grabarnik, Ute Hahn, Michael Rost and Henri Seijo.
Mari Myllymäki ([email protected], [email protected]) and Tomáš Mrkvička ([email protected])
Dai, W., Athanasiadis, S., Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open, London, UK. DOI: 10.5772/intechopen.100124
Dvořák, J. and Mrkvička, T. (2022). Graphical tests of independence for general distributions. Computational Statistics 37, 671–699.
Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2023a) Global quantile regression. arXiv:2309.04746 [stat.ME]. https://doi.org/10.48550/arXiv.2309.04746
Mrkvička, T., Kraft, S., Blažek, V. and Myllymäki, M. (2023b) Hotspots detection on a linear network with presence of covariates: a case study on road crash data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4598454
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5), 1239-1255. doi: 10.1007/s11222-016-9683-9
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Mrkvička, T., Myllymäki, M. (2023) False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
Mrkvička, T., Roskovec, T. and Rost, M. (2021) A nonparametric graphical tests of significance in functional GLM. Methodology and Computing in Applied Probability 23, 593-612. doi: 10.1007/s11009-019-09756-y
Mrkvička, T., Soubeyrand, S., Myllymäki, M., Grabarnik, P., and Hahn, U. (2016) Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Statistics 18, Part A, 40-53. doi: http://dx.doi.org/10.1016/j.spasta.2016.04.005
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015) Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017) Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2023). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Myllymäki, M., Kuronen, M. and Mrkvička, T. (2020). Testing global and local dependence of point patterns on covariates in parametric models. Spatial Statistics 42, 100436. doi: 10.1016/j.spasta.2020.100436
Imaging measurements for local brain activity at resting state
data("abide_9002_23")
data("abide_9002_23")
A list of the curve_set
containing the data,
coordinates (x,y) where the data have been observed (third dimension is 23),
the discrete factor Group
(1=Autism; 2=Control),
the discrete factor Sex
(1=Male; 2=Female),
and the continuous factor Age
.
The data are a small part of ABIDE fALFF data available at ABIDE: http://fcon_1000.projects.nitrc.org/indi/abide/ fALFF: http://fcp-indi.github.io/docs/user/alff.html and distributed under the CC BY-NC-SA 3.0 license, https://creativecommons.org/licenses/by-nc-sa/3.0/.
The data are fractional Amplitude of Low Frequency Fluctuations (fALFF) (Zou et al. 2008) for Autism Brain Imaging Data Exchange collected resting state functional magnetic resonance imaging (R-fMRI) datasets (Di Martino et al. 2013). This data set in GET contains only a tiny part of the whole brain, namely the region 9002 (the right Cerebelum Crus 1) at slice 23 (see Figure 2 in Mrkvicka et al., 2019) for 514 individuals with the autism spectrum disorder (ASD) and 557 typical controls (TC) as specified in the given Group variable. Further the sex and age of each subject is given.
Di Martino, A., Yan, C., Li, Q., Denio, E., Castellanos, F., Alaerts, K., Anderson, J., Assaf, M., Bookheimer, S., Dapretto, M., et al. (2013) The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular psychiatry.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., and Joliot, M. (2002), Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15, 273-289.
Zou, Q.-H., Zhu, C.-Z., Yang, Y., Zuo, X.-N., Long, X.-Y., Cao, Q.-J., Wang, Y.-F., and Zang, Y.-F. (2008), An improved approach to detection of amplitude of low-frequency fluctuation (ALFF) for resting-state fMRI: fractional ALFF. Journal of neuroscience methods, 172, 137-141.
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Adult trees data set
data("adult_trees")
data("adult_trees")
A data.frame
containing the locations (x- and y-coordinates) of 67 trees
in an area of 75 m x 75 m.
A pattern of large trees (height > 25 m) originating from an uneven aged multi-species broadleaf nonmanaged forest in Kaluzhskie Zaseki, Russia.
The pattern is a sample part of data collected over 10 ha plot as a part of a research program headed by project leader Prof. O.V. Smirnova.
Grabarnik, P. and Chiu, S. N. (2002) Goodness-of-fit test for complete spatial randomness against mixtures of regular and clustered spatial point processes. Biometrika, 89, 411–421.
van Lieshout, M.-C. (2010) Spatial point process theory. In Handbook of Spatial Statistics (eds. A. E. Gelfand, P. J. Diggle, M. Fuentes and P. Guttorp), Handbooks of Modern Statistical Methods. Boca Raton: CRC Press.
if(require("spatstat.geom", quietly=TRUE)) { data("adult_trees") adult_trees <- as.ppp(adult_trees, W = square(75)) plot(adult_trees) }
if(require("spatstat.geom", quietly=TRUE)) { data("adult_trees") adult_trees <- as.ppp(adult_trees, W = square(75)) plot(adult_trees) }
If given an envelope object of spatstat or a fdata
object of
fda.usc, convert it into a curve_set object. If given a curve_set
object, check its correctness and give it back.
as.curve_set(curve_set, ...)
as.curve_set(curve_set, ...)
curve_set |
An object to be converted to a |
... |
Allows passing arguments to |
If an envelope
object of spatstat or an
fdata
object is given, return a corresponding
curve_set object. If a curve_set object was given, check it and return it
unharmed.
Provides central regions or global envelopes or confidence bands
central_region( curve_sets, type = "erl", coverage = 0.5, alternative = c("two.sided", "less", "greater"), probs = c(0.25, 0.75), quantile.type = 7, central = "median", nstep = 2, ... )
central_region( curve_sets, type = "erl", coverage = 0.5, alternative = c("two.sided", "less", "greater"), probs = c(0.25, 0.75), quantile.type = 7, central = "median", nstep = 2, ... )
curve_sets |
A |
type |
The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details. |
coverage |
A number between 0 and 1. The 100*coverage% central region will be calculated. A vector of values can also be provided, leading to the corresponding number of central regions. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
probs |
A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). |
quantile.type |
As type argument of |
central |
Either "mean" or "median". If the curve sets do not contain the component
|
nstep |
1 or 2 for how to contruct a combined global envelope if list of curve sets is provided. 2 (default) for a two-step combining procedure, 1 for one-step. |
... |
Ignored. |
Given a curve_set
object,
or an envelope
object of spatstat or fdata
object of fda.usc,
the function central_region
constructs a central region, i.e. a global envelope,
from the given set of functions (or vectors).
Generally an envelope is a band bounded by the vectors (or functions)
and
.
A
% or 100*coverage% global envelope is a set
of envelope vectors
such that the probability that
falls outside this envelope
in any of the d points of the vector
is less or equal to
.
The global envelopes can be constructed based on different measures
that order the functions from the most extreme one to the least extreme one.
We use such orderings of the functions for which we are able to construct global envelopes
with intrinsic graphical interpretation.
The type of the global envelope can be chosen with the argument type
and
the options are given in the following.
Further information about the measures, on which the global envelopes are based,
can be found in Myllymäki and Mrkvička (2020, Section 2.).
'rank'
: The global rank envelope
proposed by Myllymäki et al. (2017) based on the extreme rank defined as the minimum of pointwise
ranks.
'erl'
: The global rank envelope based on the extreme rank
length (Myllymäki et al.,2017, Mrkvička et al., 2018).
This envelope is constructed as the convex hull of the functions which have extreme rank
length measure that is larger or equal to the critical level of the
extreme rank length measure.
'cont'
: The global rank envelope based on the continuous rank
(Hahn, 2015; Mrkvička et al., 2019) based on minimum of continuous pointwise ranks.
It is contructed as the convex hull in a similar way as the 'erl'
envelope.
'area'
: The global rank envelope based on the area rank (Mrkvička et al., 2019)
which is based on area between continuous pointwise ranks and minimum pointwise ranks
for those argument (r) values for which pointwise ranks achieve the minimum
(it is a combination of erl and cont).
It is contructed as the convex hull in a similar way as the 'erl'
and 'area'
envelopes.
'qdir'
: The directional quantile envelope based on
the directional quantile maximum absolute deviation (MAD) test (Myllymäki et al., 2017, 2015),
which takes into account the unequal variances of the test function T(r) for
different distances r and is also protected against asymmetry of distribution of T(r).
'st'
: The studentised envelope based on the studentised MAD
measure (Myllymäki et al., 2017, 2015),
which takes into account the unequal variances of the test function T(r) for different distances r.
'unscaled'
: The unscaled envelope (Ripley, 1981),
which leads to envelopes with constant width. It corresponds to the classical
maximum deviation test without scaling. This test suffers from unequal variance
of T(r) over the distances r and from the asymmetry of distribution of T(r).
We recommend to use the other alternatives instead. This unscaled global envelope is
provided for reference.
The values of the chosen measure M are determined for each curve in the curve_set
, and
based on the chosen measure, the central region, i.e. the global envelope, is constructed
for the given curves.
If a list of (suitable) objects are provided in the argument curve_sets
,
then by default (nstep = 2
) the two-step combining procedure is used to
construct the combined global envelope as described in Myllymäki and Mrkvička (2020, Section 2.2.).
If nstep = 1
and the lengths of the multivariate vectors in each component
of the list are equal, then the one-step combining procedure is used where the
functions are concatenated together into a one long vector (see again Myllymäki and Mrkvička, 2020, Section 2.2.).
Either an object of class global_envelope
and or an combined_global_envelope
object.
The former class is obtained when a set of curves is provided, while the latter in the case
that curve_sets
is a list of objects. The print and plot function are defined for the
returned objects (see examples).
The global_envelope
object is essentially a data frame containing columns
r = the vector of values of the argument r at which the test was made
lo = the lower envelope based on the simulated functions; in case of a vector of coverage values, several 'lo' exist with names paste0("lo.", 100*coverage)
hi = the upper envelope based on the simulated functions; in case of a vector of coverage values, several 'lo' exist with names paste0("hi.", 100*coverage)
central = If the curve_set
(or envelope
object) contains a theoretical curve,
then this function is used as the central curve and returned in this component.
Otherwise, the central curve is the mean or median (according to the argument central
)
of the test functions T_i(r), i=2, ..., s+1. Used for visualization only.
and potentially additionally
obs = the data function, if there is only one data function in the given curve_sets
.
Otherwise not existing.
(Most often central_region
is directly applied to functional data where all curves are observed.)
Additionally, the returned object has some attributes, where
M = A vector of the values of the chosen measure for all the function. If there is only one observed function, then M[1] gives the value of the measure for this.
M_alpha = The critical value of M corresponding to the 100(1-alpha)% global envelope (see Myllymäki and Mrkvička, 2020, Definition 1.1. IGI).
Further the object has some attributes for printing and plotting purposes, where
alternative
, type
, ties
, alpha
correspond to those in the function call
and method
gives a name for the method.
Attributes of an object res
can be obtained using the function
attr
, e.g. attr(res, "M")
for the values of the ordering measure.
If the given set of curves had the class envelope
of spatstat, then the returned
global_envelope
object has also the class fv
of spatstat, whereby one can utilize
also the plotting functions of spatstat, see example in plot.global_envelope
.
However, the envelope
objects are most often used with global_envelope_test
and not with central_region
.
For an fv
object, also some further attributes exists as required by fv
of spatstat.
The combined_global_envelope
is a list of global_envelope
objects, where
the components correspond to the components of curve_sets
.
The combined_global_envelope
object constructed with nstep = 2
contains,
in addition to some conventional ones (method
, alternative
, type
, alpha
,
M
, M_alpha
, see above), the second level envelope information as the attributes
level2_ge = The second level envelope on which the envelope construction is based
level2_curve_set = The second level curve_set
from which level2_ge
is constructed
In the case that the given curve sets are two-dimensional, i.e., their arguments values are two-dimensional,
then the returned objects have in addition to the class global_envelope
or combined_global_envelope
,
the class global_envelope2d
or combined_global_envelope2d
, respectively. This class is assigned
for plotting purposes: For the 2d envelopes, also the default plots are 2d.
Otherwise the 1d and 2d objects are similar.
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.
## A central region of a set of functions #---------------------------------------- if(requireNamespace("fda", quietly=TRUE)) { cset <- curve_set(r=as.numeric(row.names(fda::growth$hgtf)), obs=fda::growth$hgtf) plot(cset) + ggplot2::ylab("height") cr <- central_region(cset, coverage=0.50, type="erl") plot(cr) } ## Confidence bands for linear or polynomial regression #------------------------------------------------------ # Simulate regression data according to the cubic model # f(x) = 0.8x - 1.8x^2 + 1.05x^3 for x in [0,1] par <- c(0,0.8,-1.8,1.05) # Parameters of the true polynomial model res <- 100 # Resolution x <- seq(0, 1, by=1/res); x2=x^2; x3=x^3; f <- par[1] + par[2]*x + par[3]*x^2 + par[4]*x^3 # The true function d <- f + rnorm(length(x), 0, 0.04) # Data # Plot the true function and data plot(f, type="l", ylim=range(d)) points(d) # Estimate polynomial regression model reg <- lm(d ~ x + x2 + x3) ftheta <- reg$fitted.values resid0 <- reg$residuals s0 <- sd(resid0) # Bootstrap regression B <- 2000 # Number of bootstrap samples ftheta1 <- array(0, c(B,length(x))) s1 <- array(0,B) for(i in 1:B) { u <- sample(resid0, size=length(resid0), replace=TRUE) reg1 <- lm((ftheta+u) ~ x + x2 + x3) ftheta1[i,] <- reg1$fitted.values s1[i] <- sd(reg1$residuals) } # Centering and scaling meanftheta <- apply(ftheta1, 2, mean) m <- array(0, c(B,length(x))) for(i in 1:B) { m[i,] <- (ftheta1[i,]-meanftheta)/s1[i] } # Central region computation boot.cset <- curve_set(r=1:length(x), obs=ftheta+s0*t(m)) cr <- central_region(boot.cset, coverage=c(0.50, 0.80, 0.95), type="erl") # Plotting the result plot(cr) + ggplot2::labs(x=expression(italic(x)), y=expression(italic(f(x)))) + ggplot2::geom_point(data=data.frame(id=1:length(d), points=d), ggplot2::aes(x=id, y=points)) + # data points ggplot2::geom_line(data=data.frame(id=1:length(d), points=f), ggplot2::aes(x=id, y=points)) # true function
## A central region of a set of functions #---------------------------------------- if(requireNamespace("fda", quietly=TRUE)) { cset <- curve_set(r=as.numeric(row.names(fda::growth$hgtf)), obs=fda::growth$hgtf) plot(cset) + ggplot2::ylab("height") cr <- central_region(cset, coverage=0.50, type="erl") plot(cr) } ## Confidence bands for linear or polynomial regression #------------------------------------------------------ # Simulate regression data according to the cubic model # f(x) = 0.8x - 1.8x^2 + 1.05x^3 for x in [0,1] par <- c(0,0.8,-1.8,1.05) # Parameters of the true polynomial model res <- 100 # Resolution x <- seq(0, 1, by=1/res); x2=x^2; x3=x^3; f <- par[1] + par[2]*x + par[3]*x^2 + par[4]*x^3 # The true function d <- f + rnorm(length(x), 0, 0.04) # Data # Plot the true function and data plot(f, type="l", ylim=range(d)) points(d) # Estimate polynomial regression model reg <- lm(d ~ x + x2 + x3) ftheta <- reg$fitted.values resid0 <- reg$residuals s0 <- sd(resid0) # Bootstrap regression B <- 2000 # Number of bootstrap samples ftheta1 <- array(0, c(B,length(x))) s1 <- array(0,B) for(i in 1:B) { u <- sample(resid0, size=length(resid0), replace=TRUE) reg1 <- lm((ftheta+u) ~ x + x2 + x3) ftheta1[i,] <- reg1$fitted.values s1[i] <- sd(reg1$residuals) } # Centering and scaling meanftheta <- apply(ftheta1, 2, mean) m <- array(0, c(B,length(x))) for(i in 1:B) { m[i,] <- (ftheta1[i,]-meanftheta)/s1[i] } # Central region computation boot.cset <- curve_set(r=1:length(x), obs=ftheta+s0*t(m)) cr <- central_region(boot.cset, coverage=c(0.50, 0.80, 0.95), type="erl") # Plotting the result plot(cr) + ggplot2::labs(x=expression(italic(x)), y=expression(italic(f(x)))) + ggplot2::geom_point(data=data.frame(id=1:length(d), points=d), ggplot2::aes(x=id, y=points)) + # data points ggplot2::geom_line(data=data.frame(id=1:length(d), points=f), ggplot2::aes(x=id, y=points)) # true function
Centred government expenditure centralization (GEC) ratios
data("cgec")
data("cgec")
A list of two components. The first one is the curve_set
object containing the observed
values of centred GEC observed in year 1995-2016 for the above countries.
The second component group
gives the grouping.
The data includes the government expenditure centralization (GEC) ratio in percent that has been centred with respect to country average in order to remove the differences in absolute values of GEC. The GEC ratio is the ratio of central government expenditure to the total general government expenditure. Data were collected from the Eurostat (2018) database. Only those European countries were included, where the data were available from 1995 to 2016 without interruption. Finally, 29 countries were classified into three groups in the following way:
Group 1: Countries joining EC between 1958 and 1986 (Belgium, Denmark, France, Germany (until 1990 former territory of the FRG), Greece, Ireland, Italy, Luxembourg, Netherlands, Portugal, Spain, United Kingdom. These countries have long history of European integration, representing the core of integration process.
Group 2: Countries joining the EU in 1995 (Austria, Sweden, Finland) and 2004 (Malta, Cyprus), except CEEC (separate group), plus highly economically integrated non-EU countries, EFTA members (Norway, Switzerland). Countries in this group have been, or in some case even still are standing apart from the integration mainstream. Their level of economic integration is however very high.
Group 3: Central and Eastern European Countries (CEEC), having similar features in political end economic history. The process of economic and political integration have been initiated by political changes in 1990s. CEEC joined the EU in 2004 and 2007 (Bulgaria, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia, Slovenia, data for Croatia joining in 2013 are incomplete, therefore not included).
This grouping is used in examples.
Eurostat (2018). "Government revenue, expenditure and main aggregates (gov10amain)”. Retrieved from https://ec.europa.eu/eurostat/data/database(26/10/2018).
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56 (3), 432-458. doi: 10.14736/kyb-2020-3-0432
data("cgec") # Plot data in groups for(i in 1:3) assign(paste0("p", i), plot(subset(cgec$cgec, cgec$group == i)) + ggplot2::labs(title=paste("Group ", i, sep=""), y="Centred GEC")) p3 if(require("patchwork", quietly=TRUE)) p1 + p2 + p3
data("cgec") # Plot data in groups for(i in 1:3) assign(paste0("p", i), plot(subset(cgec$cgec, cgec$group == i)) + ggplot2::labs(title=paste("Group ", i, sep=""), y="Centred GEC")) p3 if(require("patchwork", quietly=TRUE)) p1 + p2 + p3
Given a list of curve_set
objects, a combined global scaled (directional quantile
or studentized) MAD envelope test is performed with the test functions saved in the curve set objects.
Details of this combined test can be found in Mrkvicka et al. (2017).
The implementation of this test is provided here for historical reasons:
we recommend now instead the use of global_envelope_test
also for combined tests;
these combined tests are there implemented as described in Myllymäki and Mrkvička (2020).
combined_scaled_MAD_envelope_test( curve_sets, type = c("qdir", "st"), alpha = 0.05, probs = c(0.025, 0.975), central = "mean", ... )
combined_scaled_MAD_envelope_test( curve_sets, type = c("qdir", "st"), alpha = 0.05, probs = c(0.025, 0.975), central = "mean", ... )
curve_sets |
A |
type |
Either "qdir" for the direction quantile envelope test or "st" for the studentized envelope test. |
alpha |
The significance level. The 100(1-alpha)% global envelope will be calculated under the 'fwer' or 'fdr' control. If a vector of values is provided, the global envelopes are calculated for each value. |
probs |
A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). |
central |
Either "mean" or "median". If the curve sets do not contain the component
|
... |
Additional parameters to be passed to |
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5): 1239–1255. DOI: 10.1007/s11222-016-9683-9
Myllymäki, M. and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
if(require("spatstat.explore", quietly=TRUE)) { # As an example test CSR of the saplings point pattern from spatstat by means of # L, F, G and J functions. data("saplings") X <- as.ppp(saplings, W=square(75)) nsim <- 499 # Number of simulations for the tests # Specify distances for different test functions n <- 500 # the number of r-values rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n r <- seq(0, rmax, by=rstep) # r-distances for Lest rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest # Perform simulations of CSR and calculate the L-functions env_L <- envelope(X, nsim=nsim, simulate=expression(runifpoint(ex=X)), fun="Lest", correction="translate", transform=expression(.-r), # Take the L(r)-r function instead of L(r) r=r, # Specify the distance vector savefuns=TRUE, # Save the estimated functions savepatterns=TRUE) # Save the simulated patterns # Take the simulations from the returned object simulations <- attr(env_L, "simpatterns") # Then calculate the other test functions F, G, J for each simulated pattern env_F <- envelope(X, nsim=nsim, simulate=simulations, fun="Fest", correction="Kaplan", r=rJ, savefuns=TRUE) env_G <- envelope(X, nsim=nsim, simulate=simulations, fun="Gest", correction="km", r=rJ, savefuns=TRUE) env_J <- envelope(X, nsim=nsim, simulate=simulations, fun="Jest", correction="none", r=rJ, savefuns=TRUE) # Crop the curves to the desired r-interval I curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax) curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ) curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ) curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ) # The combined directional quantile envelope test res <- combined_scaled_MAD_envelope_test( curve_sets=list(L=curve_set_L, F=curve_set_F, G=curve_set_G, J=curve_set_J), type="qdir") plot(res) }
if(require("spatstat.explore", quietly=TRUE)) { # As an example test CSR of the saplings point pattern from spatstat by means of # L, F, G and J functions. data("saplings") X <- as.ppp(saplings, W=square(75)) nsim <- 499 # Number of simulations for the tests # Specify distances for different test functions n <- 500 # the number of r-values rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n r <- seq(0, rmax, by=rstep) # r-distances for Lest rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest # Perform simulations of CSR and calculate the L-functions env_L <- envelope(X, nsim=nsim, simulate=expression(runifpoint(ex=X)), fun="Lest", correction="translate", transform=expression(.-r), # Take the L(r)-r function instead of L(r) r=r, # Specify the distance vector savefuns=TRUE, # Save the estimated functions savepatterns=TRUE) # Save the simulated patterns # Take the simulations from the returned object simulations <- attr(env_L, "simpatterns") # Then calculate the other test functions F, G, J for each simulated pattern env_F <- envelope(X, nsim=nsim, simulate=simulations, fun="Fest", correction="Kaplan", r=rJ, savefuns=TRUE) env_G <- envelope(X, nsim=nsim, simulate=simulations, fun="Gest", correction="km", r=rJ, savefuns=TRUE) env_J <- envelope(X, nsim=nsim, simulate=simulations, fun="Jest", correction="none", r=rJ, savefuns=TRUE) # Crop the curves to the desired r-interval I curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax) curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ) curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ) curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ) # The combined directional quantile envelope test res <- combined_scaled_MAD_envelope_test( curve_sets=list(L=curve_set_L, F=curve_set_F, G=curve_set_G, J=curve_set_J), type="qdir") plot(res) }
Create a curve set consisting of a set of images, given a list containing
the values of the 2d functions in the right form.
Only 2d functions in a rectangular windows are supported; the values are provided
in matrices (arrays). For more general 2d functions see create_curve_set
.
create_image_set(image_set, ...)
create_image_set(image_set, ...)
image_set |
A list containing elements |
... |
Do not use. (For internal use only.) |
The given list as a curve_set
.
a <- create_image_set(list(obs=array(runif(4*5*6), c(4,5,6)))) plot(a) plot(a, idx=1:6) a <- create_image_set(list(r=list(x=c(10,20,30,40), y=1:5*0.1), obs=array(runif(4*5*6), c(4,5,6)))) plot(a) a <- create_image_set(list(r=list(xmin=c(1, 2, 4, 7), xmax=c(2, 4, 7, 11), ymin=c(1,1.1,2,2.1,3), ymax=c(1.1,2,2.1,3,3.1)), obs=array(runif(4*5*6), c(4,5,6)))) plot(a) plot(a, idx=1:5)
a <- create_image_set(list(obs=array(runif(4*5*6), c(4,5,6)))) plot(a) plot(a, idx=1:6) a <- create_image_set(list(r=list(x=c(10,20,30,40), y=1:5*0.1), obs=array(runif(4*5*6), c(4,5,6)))) plot(a) a <- create_image_set(list(r=list(xmin=c(1, 2, 4, 7), xmax=c(2, 4, 7, 11), ymin=c(1,1.1,2,2.1,3), ymax=c(1.1,2,2.1,3,3.1)), obs=array(runif(4*5*6), c(4,5,6)))) plot(a) plot(a, idx=1:5)
Crop the curves to a certain interval, or crop missing and infinite argument values from the curves
crop_curves(curve_set, allfinite = TRUE, r_min = NULL, r_max = NULL)
crop_curves(curve_set, allfinite = TRUE, r_min = NULL, r_max = NULL)
curve_set |
A |
allfinite |
Logical. TRUE means that the argument values where any of the
curves have missing or infinite values are removed. FALSE means that only
|
r_min |
The minimum radius to include. |
r_max |
The maximum radius to include. |
The curves can be cropped to a certain interval defined by the arguments r_min and r_max.
Also the argument values of the sets of curves which have missing or infinite
values for any of the curves can be removed from the set (allfinite = TRUE
).
The interval should generally be chosen carefully for classical deviation tests.
A curve_set object containing the cropped summary functions and the cropped radius vector.
Create a curve_set object out of data provided in the right form.
curve_set( obs, sim = NULL, r = NULL, theo = NULL, allfinite = FALSE, verbose = TRUE ) create_curve_set(curve_set, allfinite = FALSE, verbose = TRUE)
curve_set( obs, sim = NULL, r = NULL, theo = NULL, allfinite = FALSE, verbose = TRUE ) create_curve_set(curve_set, allfinite = FALSE, verbose = TRUE)
obs |
The observed data. See details. |
sim |
The simulated data. See details. |
r |
The argument values where the functions/vectors have been observed (or simulated). See details. |
theo |
The theoretical function. See details. |
allfinite |
Logical. TRUE requires that all values of the curves must be
finite (not infinite and not missing, see |
verbose |
Logical flag indicating whether to warn about the content. |
curve_set |
A list containing the element obs, and optionally the elements r, sim_m (same as sim, but in matrix format only) and theo. See details. |
The functions are used to clump together the functional data in the form
that can be handled by the other GET functions (forder
,
central_region
, global_envelope_test
etc.).
The functions take care of checking the content of the data,
and save relevant information of the curves for global envelope
methods to be used in particular for plotting the results with graphical
interpretation.
obs
must be either
a vector containing the data function/vector, or
a matrix containing the s data functions/vectors, in which case it is assumed that each column corresponds to a data function/vector, or
a list containing the s data functions/vectors.
If obs
is a vector, sim
must be either
a matrix containing the simulated functions/vectors, each column
corresponding to a function/vector and the number of rows matching the length
of obs
, or
a list containing the simulated functions/vectors.
If obs
is a matrix or a list, sim
is ignored.
If given, r
describes the 1- or 2-dimensional argument values where the functions/vectors
have been observed (or simulated). It must be either
a vector,
a data.frame with columns "x", "y", "width" and "height", where the width and height give the width and height of the pixels placed at x and y, or
a data.frame with columns "xmin", "xmax", "ymin" and "ymax" giving the corner coordinates of the pixels where the data have been observed.
If obs
is a vector, theo
can be given and it should then correspond
to a theoretical function (e.g., under the null hypothesis). If present, its
length must match the length of obs
.
An object of class curve_set
containing the data.
If the argument values are two-dimensional, then the curve_set
is additionally
a curve_set2d
object.
plot.curve_set
, plot.curve_set2d
# 1d cset <- curve_set(r = 1:10, obs = matrix(runif(10*5), ncol=5)) plot(cset) # 2d cset <- curve_set(r = data.frame(x=c(rep(1:3, 3), 4), y=c(rep(1:3, each=3), 1), width=1, height=1), obs = matrix(runif(10*5), ncol=5)) plot(cset)
# 1d cset <- curve_set(r = 1:10, obs = matrix(runif(10*5), ncol=5)) plot(cset) # 2d cset <- curve_set(r = data.frame(x=c(rep(1:3, 3), 4), y=c(rep(1:3, each=3), 1), width=1, height=1), obs = matrix(runif(10*5), ncol=5)) plot(cset)
Crop the curve set to the interval of distances [r_min, r_max],
calculate residuals, scale the residuals and perform a deviation test
with a chosen deviation measure.
The deviation tests are well known in spatial statistics; in GET they are
provided for comparative purposes. Some (maximum type) of the deviation test
have their corresponding envelope tests available, see Myllymäki et al., 2017
(and 'unscaled', 'st' and 'qdir' in global_envelope_test
).
deviation_test( curve_set, r_min = NULL, r_max = NULL, use_theo = TRUE, scaling = "qdir", measure = "max", savedevs = FALSE )
deviation_test( curve_set, r_min = NULL, r_max = NULL, use_theo = TRUE, scaling = "qdir", measure = "max", savedevs = FALSE )
curve_set |
A residual curve_set object. Can be obtained by using residual(). |
r_min |
The minimum radius to include. |
r_max |
The maximum radius to include. |
use_theo |
Whether to use the theoretical summary function or the mean of the functions in the curve_set. |
scaling |
The name of the scaling to use. Options include 'none', 'q', 'qdir' and 'st'. 'qdir' is default. |
measure |
The deviation measure to use. Default is 'max'. Must be one of the following: 'max', 'int' or 'int2'. |
savedevs |
Logical. Should the global rank values k_i, i=1,...,nsim+1 be returned? Default: FALSE. |
The deviation test is based on a test function and it works as follows:
1) The test function estimated for the data, , and for nsim simulations
from the null model,
, must be saved in 'curve_set'
and given to the deviation_test function.
2) The deviation_test function then
Crops the functions to the chosen range of distances .
If the curve_set does not consist of residuals (see residual
),
then the residuals are calculated, where
is the
expectation of
under the null hypothesis.
If use_theo = TRUE, the theoretical value given in the curve_set$theo is used for
as
, if it is given. Otherwise,
is estimated by the mean of
,
.
Scales the residuals. Options are
'none' No scaling. Nothing done.
'q' Quantile scaling.
'qdir' Directional quantile scaling.
'st' Studentised scaling.
See for details Myllymäki et al. (2013).
Calculates the global deviation measure ,
, see options
for 'measure'.
'max' is the maximum deviation measure
'int2' is the integral deviation measure
'int' is the 'absolute' integral deviation measure
Calculates the p-value.
Currently, there is no special way to take care of the same values of
occuring possibly for small distances. Thus, it is preferable to exclude from
the test the very small distances r for which ties occur.
If 'savedevs=FALSE' (default), the p-value is returned.
If 'savedevs=TRUE', then a list containing the p-value and calculated deviation measures
,
(where
corresponds to the data pattern) is returned.
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11: 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381–404. doi: 10.1111/rssb.12172
## Testing complete spatial randomness (CSR) #------------------------------------------- if(require("spatstat.explore", quietly=TRUE)) { pp <- unmark(spruces) nsim <- 999 # Generate nsim simulations under CSR, calculate L-function for the data and simulations env <- envelope(pp, fun="Lest", nsim=nsim, savefuns=TRUE, correction="translate") # The deviation test using the integral deviation measure res <- deviation_test(env, measure='int') res # or res <- deviation_test(env, r_min=0, r_max=7, measure='int2') }
## Testing complete spatial randomness (CSR) #------------------------------------------- if(require("spatstat.explore", quietly=TRUE)) { pp <- unmark(spruces) nsim <- 999 # Generate nsim simulations under CSR, calculate L-function for the data and simulations env <- envelope(pp, fun="Lest", nsim=nsim, savefuns=TRUE, correction="translate") # The deviation test using the integral deviation measure res <- deviation_test(env, measure='int') res # or res <- deviation_test(env, r_min=0, r_max=7, measure='int2') }
Fallen trees
data("fallen_trees")
data("fallen_trees")
A list
of two data frames, where trees
contains the locations (x and y coordinates)
and heights (=marks) of 232 trees in a window with polygonal boundary, and window
species the polygonal
window (see examples).
The dataset comprised the locations and heights of 232 trees, which fell during two large wind gusts (1967 and 1990) in the west of France (Pontailler et al., 1997). The study area was a biological reserve, which had been preserved for at least four centuries, with little human influence for a long period (Guinier, 1950). Thus, the forest stand followed almost natural dynamics. It was an uneven-aged beech stand with a few old oaks.
The data was analysed in Myllymäki et al. (2017, Supplementary material).
Guinier, P. (1950) Foresterie et protection de la nature. l’exemple de fontainebleau. Rev Forestière Fr., II, 703-717.
Pontailler, J.-Y., Faille, A. and Lemée, G. (1997) Storms drive successional dynamics in natural forests: a case study in fontainebleau forest (france). Forest Ecol. Manag., 98, 1-15.
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381–404. doi: 10.1111/rssb.12172
data("fallen_trees") if(require("spatstat.geom", quietly=TRUE)) { fallen_trees <- as.ppp(fallen_trees$trees, W = owin(poly=fallen_trees$window)) plot(fallen_trees) }
data("fallen_trees") if(require("spatstat.geom", quietly=TRUE)) { fallen_trees <- as.ppp(fallen_trees$trees, W = owin(poly=fallen_trees$window)) plot(fallen_trees) }
Functional boxplot based on central region computed by a specified measure.
The options of the measures can be found in central_region
.
fBoxplot(curve_sets, factor = 1.5, coverage = 0.5, ...)
fBoxplot(curve_sets, factor = 1.5, coverage = 0.5, ...)
curve_sets |
A |
factor |
The constant factor to inflate the central region to produce a functional boxplot and determine fences for outliers. Default is 1.5 as in a classical boxplot. |
coverage |
A number between 0 and 1. The 100*coverage% central region will be calculated. A vector of values can also be provided, leading to the corresponding number of central regions. |
... |
Additional parameters to be passed to |
if(requireNamespace("fda", quietly=TRUE)) { years <- paste(1:18) curves <- fda::growth[['hgtf']][years,] # Heights cset1 <- curve_set(r = as.numeric(years), obs = curves) bp <- fBoxplot(cset1, coverage=0.50, type="area", factor=1) plot(bp) # Considering simultaneously heights and height differences cset2 <- curve_set(r = as.numeric(years[-1]), obs = curves[-1,] - curves[-nrow(curves),]) csets <- list(Height=cset1, Change=cset2) res <- fBoxplot(csets, type='area', factor=1.5) plot(res) + ggplot2::labs(x="Age (years)", y="") }
if(requireNamespace("fda", quietly=TRUE)) { years <- paste(1:18) curves <- fda::growth[['hgtf']][years,] # Heights cset1 <- curve_set(r = as.numeric(years), obs = curves) bp <- fBoxplot(cset1, coverage=0.50, type="area", factor=1) plot(bp) # Considering simultaneously heights and height differences cset2 <- curve_set(r = as.numeric(years[-1]), obs = curves[-1,] - curves[-nrow(curves),]) csets <- list(Height=cset1, Change=cset2) res <- fBoxplot(csets, type='area', factor=1.5) plot(res) + ggplot2::labs(x="Age (years)", y="") }
Functional clustering based on a specified measure.
The options of the measures can be found in central_region
.
fclustering( curve_sets, k, type = c("area", "st", "erl", "cont"), triangineq = FALSE, ... )
fclustering( curve_sets, k, type = c("area", "st", "erl", "cont"), triangineq = FALSE, ... )
curve_sets |
A |
k |
The number of clusters. |
type |
The measure which is used to compute the dissimilarity matrix. The preferred options
are |
triangineq |
Logical. Whether or not to compute the proportion of combinations of functions which satisfies the triangular inequality, see 'Value'. |
... |
Additional parameters to be passed to |
Functional clustering joins the list of curve_set
objects in one curve_set
with long functions and
applies on the differences of all functions the specified measure. This provides a dissimilarity matrix
which is used in partitioning around medoids procedure. The resulting clusters can then be shown by plotting
the function respectively for each curve_set
. Thus for each curve_set
, the panel with all the medoids
is shown followed by all clusters represented by central region, medoid and all curves belonging to it, when
the result object is plotted.
If there are less than three curves in some of the groups, then the central region is not plotted. This leads to a warning message from ggplot2.
An object having the class fclust
, containing
curve_sets = The set(s) of functions determined for clustering
k = Number of clusters
type = Type of clustering method
triangineq = The proportion of combinations of functions which satisfies the triangular inequality. The triangular inequality must hold to ensure the chosen measure forms a metric. In some weird cases it does not hold for ‘area’ measure, therefore this check is provided to ensure the data forms metric with the ‘area’ measure. The triangineq must be 1 to ensure the inequality holds for all functions.
dis = The joined dissimilarity matrix
pam = Results of the partitioning around medoids (pam) method applied on the joined functions
with the dissimilarity matrix (dis). See pam
.
Dai, W., Athanasiadis, S., Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open, London, UK. DOI: 10.5772/intechopen.100124
# Read raw data from population growth rdata # with countries over million inhabitants data("popgrowthmillion") # Create centred data m <- apply(popgrowthmillion, 2, mean) # Country-wise means cpopgrowthmillion <- popgrowthmillion for(i in 1:dim(popgrowthmillion)[1]) { cpopgrowthmillion[i,] <- popgrowthmillion[i,] - m } # Create scaled data t2 <- function(v) { sqrt(sum(v^2)) } s <- apply(cpopgrowthmillion, 2, t2) spopgrowthmillion <- popgrowthmillion for(i in 1:dim(popgrowthmillion)[1]) { spopgrowthmillion[i,] <- cpopgrowthmillion[i,]/s } # Create curve sets r <- 1951:2015 cset1 <- curve_set(r = r, obs = popgrowthmillion) cset2 <- curve_set(r = r, obs = spopgrowthmillion) csets <- list(Raw = cset1, Shape = cset2) # Functional clustering with respect to joined "st" difference measure # and "joined" central regions of each group res <- fclustering(csets, k=3, type="area") p <- plot(res, plotstyle = "marginal", coverage = 0.5) p[[1]] # Central functions p[[2]] # Groups: central functions and regions # To collect the two figures into one use, e.g., patchwork: if(require("patchwork", quietly=TRUE)) { p[[1]] + p[[2]] + plot_layout(widths = c(1, res$k)) } # Silhouette plot of pam plot(res$pam)
# Read raw data from population growth rdata # with countries over million inhabitants data("popgrowthmillion") # Create centred data m <- apply(popgrowthmillion, 2, mean) # Country-wise means cpopgrowthmillion <- popgrowthmillion for(i in 1:dim(popgrowthmillion)[1]) { cpopgrowthmillion[i,] <- popgrowthmillion[i,] - m } # Create scaled data t2 <- function(v) { sqrt(sum(v^2)) } s <- apply(cpopgrowthmillion, 2, t2) spopgrowthmillion <- popgrowthmillion for(i in 1:dim(popgrowthmillion)[1]) { spopgrowthmillion[i,] <- cpopgrowthmillion[i,]/s } # Create curve sets r <- 1951:2015 cset1 <- curve_set(r = r, obs = popgrowthmillion) cset2 <- curve_set(r = r, obs = spopgrowthmillion) csets <- list(Raw = cset1, Shape = cset2) # Functional clustering with respect to joined "st" difference measure # and "joined" central regions of each group res <- fclustering(csets, k=3, type="area") p <- plot(res, plotstyle = "marginal", coverage = 0.5) p[[1]] # Central functions p[[2]] # Groups: central functions and regions # To collect the two figures into one use, e.g., patchwork: if(require("patchwork", quietly=TRUE)) { p[[1]] + p[[2]] + plot_layout(widths = c(1, res$k)) } # Silhouette plot of pam plot(res$pam)
Calculate the FDR envelope based on the ATSE or IATSE algorithm of Mrkvička and Myllymäki (2023).
fdr_envelope( curve_sets, alpha = 0.05, alternative = c("two.sided", "less", "greater"), algorithm = c("IATSE", "ATSE"), lower = NULL, upper = NULL )
fdr_envelope( curve_sets, alpha = 0.05, alternative = c("two.sided", "less", "greater"), algorithm = c("IATSE", "ATSE"), lower = NULL, upper = NULL )
curve_sets |
A |
alpha |
The significance level. The 100(1-alpha)% global envelope will be calculated under the 'fwer' or 'fdr' control. If a vector of values is provided, the global envelopes are calculated for each value. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
algorithm |
The algorithm for the computation of the FDR envelope. Either "IATSE" or "ATSE" standing for the iteratively adaptive two-stage envelope and the adaptive two-stage envelope, respectively, see Mrkvička and Myllymäki (2023). |
lower |
A single number (or a vector of suitable length) giving a lower bound for the functions. Used only for the extension of the FDR envelope. |
upper |
A single number (or a vector of suitable length) giving an upper bound for the functions. Used only for the extension of the FDR envelope. |
Typical use of this function is through other functions.
fdr_envelope(cset)
is the same as global_envelope_test(cset, typeone = "fdr")
.
Functions such as graph.fanova
, graph.flm
, frank.flm
allow to use the FDR control by specifying typeone = "fdr"
appropriately
(passing this to global_envelope_test
).
Mrkvička and Myllymäki (2023). False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
# A GLM example data(rimov) nsim <- 1000 # Number of simulations res <- graph.flm(nsim=nsim, formula.full = Y~Year, formula.reduced = Y~1, curve_sets = list(Y=rimov), factors = data.frame(Year = 1979:2014), GET.args = list(typeone = "fdr")) plot(res)
# A GLM example data(rimov) nsim <- 1000 # Number of simulations res <- graph.flm(nsim=nsim, formula.full = Y~Year, formula.reduced = Y~1, curve_sets = list(Y=rimov), factors = data.frame(Year = 1979:2014), GET.args = list(typeone = "fdr")) plot(res)
Calculates different measures for ordering the functions (or vectors) from the most extreme to least extreme one
forder( curve_sets, measure = "erl", scaling = "qdir", alternative = c("two.sided", "less", "greater"), use_theo = TRUE, probs = c(0.025, 0.975), quantile.type = 7 )
forder( curve_sets, measure = "erl", scaling = "qdir", alternative = c("two.sided", "less", "greater"), use_theo = TRUE, probs = c(0.025, 0.975), quantile.type = 7 )
curve_sets |
A |
measure |
The measure to use to order the functions from the most extreme to the least extreme one. Must be one of the following: 'rank', 'erl', 'cont', 'area', 'max', 'int', 'int2'. Default is 'erl'. |
scaling |
The name of the scaling to use if measure is 'max', 'int' or 'int2'. Options include 'none', 'q', 'qdir' and 'st', where 'qdir' is the default. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
use_theo |
Logical. When calculating the measures 'max', 'int', 'int2',
should the theoretical function from |
probs |
A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). |
quantile.type |
As type argument of |
Given a curve_set
object or an envelope
object of spatstat,
which contains curves ,
the functions are ordered from the most extreme one to the least extreme one
by one of the following measures (specified by the argument
measure
).
Note that 'erl'
, 'cont'
and 'area'
were proposed as a refinement to
the extreme ranks 'rank'
, because the extreme ranks can contain many ties.
All of these completely non-parametric measures are smallest for the most extreme functions
and largest for the least extreme ones,
whereas the deviation measures ('max'
, 'int'
and 'int2'
) obtain largest values
for the most extreme functions.
'rank'
: extreme rank (Myllymäki et al., 2017).
The extreme rank is defined as the minimum of pointwise ranks of the curve
, where the pointwise rank is the rank of the value of the curve for a
specific r-value among the corresponding values of the s other curves such that the lowest
ranks correspond to the most extreme values of the curves. How the pointwise ranks are determined
exactly depends on the whether a one-sided (
alternative
is "less" or "greater") or the
two-sided test (alternative="two.sided"
) is chosen.
'erl'
: extreme rank length (Myllymäki et al., 2017).
Considering the vector of pointwise ordered ranks of the ith curve,
the extreme rank length measure
is equal to
where if and only if
there exists
such that for the first k,
, pointwise ordered
ranks of
and
are equal and the n'th rank of
is smaller than that of
.
The scaling by
is applied to normalize the ranks following Mrkvička et al. (2019) and Narisetty and Nair (2016).
'cont'
: continuous rank (Hahn, 2015; Mrkvička et al., 2019)
based on minimum of continuous pointwise ranks
'area'
: area rank (Mrkvička et al., 2019) based on area between continuous
pointwise ranks and minimum pointwise ranks for those argument (r) values for which pointwise
ranks achieve the minimum (it is a combination of erl and cont)
'max'
and 'int'
and 'int2'
:
Further options for the measure
argument that can be used together with scaling
.
See the help in deviation_test
for these options of measure
and scaling
.
These measures are largest for the most extreme functions and smallest for the least extreme ones.
The arguments use_theo
and probs
are relevant for these measures only (otherwise ignored).
For details see Myllymäki and Mrkvička et al. (2020, Section 2)
A vector containing one of the above mentioned measures k for each of the functions
in the curve set. If the component obs
in the curve set is a vector, then its measure
will be the first component (named 'obs') in the returned vector.
Hahn U (2015). “A note on simultaneous Monte Carlo tests.” Technical report, Centre for Stochastic Geometry and advanced Bioimaging, Aarhus University.
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172
Narisetty, N. N. and Nair, V. J. (2016) Extremal depth for functional data and applications. Journal of the American Statistical Association 111, 1705-1714.
if(requireNamespace("fda", quietly = TRUE)) { # Consider ordering of the girls in the Berkeley Growth Study data # available from the R package fda, see ?growth, according to their # annual heights or/and changes within years. # First create sets of curves (vectors), for raw heights and # for the differences within the years years <- paste(1:18) curves <- fda::growth[['hgtf']][years,] cset1 <- curve_set(r = as.numeric(years), obs = curves) cset2 <- curve_set(r = as.numeric(years[-1]), obs = curves[-1,] - curves[-nrow(curves),]) # Order the girls from most extreme one to the least extreme one, below using the 'area' measure # a) according to their heights forder(cset1, measure = 'area') # Print the 10 most extreme girl indices order(forder(cset1, measure = 'area'))[1:10] # b) according to the changes (print indices) order(forder(cset2, measure = 'area'))[1:10] # c) simultaneously with respect to heights and changes (print indices) csets <- list(Height = cset1, Change = cset2) order(forder(csets, measure = 'area'))[1:10] }
if(requireNamespace("fda", quietly = TRUE)) { # Consider ordering of the girls in the Berkeley Growth Study data # available from the R package fda, see ?growth, according to their # annual heights or/and changes within years. # First create sets of curves (vectors), for raw heights and # for the differences within the years years <- paste(1:18) curves <- fda::growth[['hgtf']][years,] cset1 <- curve_set(r = as.numeric(years), obs = curves) cset2 <- curve_set(r = as.numeric(years[-1]), obs = curves[-1,] - curves[-nrow(curves),]) # Order the girls from most extreme one to the least extreme one, below using the 'area' measure # a) according to their heights forder(cset1, measure = 'area') # Print the 10 most extreme girl indices order(forder(cset1, measure = 'area'))[1:10] # b) according to the changes (print indices) order(forder(cset2, measure = 'area'))[1:10] # c) simultaneously with respect to heights and changes (print indices) csets <- list(Height = cset1, Change = cset2) order(forder(csets, measure = 'area'))[1:10] }
A one-way functional ANOVA based on the rank envelope applied to F values
frank.fanova( nsim, curve_set, groups, variances = "equal", test.equality = c("mean", "var", "cov"), cov.lag = 1, savefuns = FALSE, ... )
frank.fanova( nsim, curve_set, groups, variances = "equal", test.equality = c("mean", "var", "cov"), cov.lag = 1, savefuns = FALSE, ... )
nsim |
The number of random permutations. |
curve_set |
The original data (an array of functions) provided as a
|
groups |
The original groups (a factor vector representing the assignment to groups). |
variances |
Either "equal" or "unequal". If "equal", then the traditional F-values are used.
If "unequal", then the corrected F-values are used. The current implementation uses
|
test.equality |
A character with possible values
If
where
See Mrkvicka et al. (2020) for more details. |
cov.lag |
The lag of the covariance for testing the equality of covariances,
see |
savefuns |
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns. |
... |
Additional parameters to be passed to |
The test assumes that there are groups which contain
functions
.
The functions should be given in the argument x, and the groups in the argument groups.
The test assumes that there exists non random functions
and
such that
where are independent and normally distributed.
The test vector is
where stands for the F-statistic. The simulations are performed by
permuting the test functions. Further details can be found in Mrkvička et al. (2020).
The argument variances="equal"
means that equal variances across groups are assumed.
The correction for unequal variances can be done by using the corrected F-statistic
(option variances="unequal"
).
Unfortunately this test is not able to detect which groups are different from each other.
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56 (3), 432-458. doi: 10.14736/kyb-2020-3-0432
graph.fanova
data("rimov") groups <- factor(c(rep(1, times=12), rep(2, times=12), rep(3, times=12))) res <- frank.fanova(nsim = 2499, curve_set = rimov, groups = groups) plot(res) data("imageset3") res2 <- frank.fanova(nsim = 19, # Increase nsim for serious analysis! curve_set = imageset3$image_set, groups = imageset3$Group) plot(res2) plot(res2, fixedscales=FALSE)
data("rimov") groups <- factor(c(rep(1, times=12), rep(2, times=12), rep(3, times=12))) res <- frank.fanova(nsim = 2499, curve_set = rimov, groups = groups) plot(res) data("imageset3") res2 <- frank.fanova(nsim = 19, # Increase nsim for serious analysis! curve_set = imageset3$image_set, groups = imageset3$Group) plot(res2) plot(res2, fixedscales=FALSE)
Multiple testing in permutation inference for the general linear model (GLM)
frank.flm( nsim, formula.full, formula.reduced, curve_sets, factors = NULL, savefuns = TRUE, lm.args = NULL, GET.args = NULL, mc.cores = 1, mc.args = NULL, cl = NULL, method = c("best", "mlm", "lm", "ne") )
frank.flm( nsim, formula.full, formula.reduced, curve_sets, factors = NULL, savefuns = TRUE, lm.args = NULL, GET.args = NULL, mc.cores = 1, mc.args = NULL, cl = NULL, method = c("best", "mlm", "lm", "ne") )
nsim |
The number of random permutations. |
formula.full |
The formula specifying the general linear model,
see |
formula.reduced |
The formula of the reduced model with nuisance factors only. This model should be nested within the full model. |
curve_sets |
A named list of sets of curves giving the dependent variable (Y), and
possibly additionally factors whose values vary across the argument values of the functions.
The dimensions of the elements should match with each other.
Note that factors that are fixed across the functions can be given in the argument |
factors |
A data frame of factors. An alternative way to specify factors when they are constant for all argument values of the functions. The number of rows of the data frame should be equal to the number of curves. Each column should specify the values of a factor. |
savefuns |
Logical or "return". If TRUE, then the functions from permutations are saved to the attribute simfuns.
If "return", then the function returns the permutations in a curve_set, instead of the result of the envelope test on those;
this can be used by |
lm.args |
A named list of additional arguments to be passed to |
GET.args |
A named list of additional arguments to be passed to |
mc.cores |
The number of cores to use, i.e. at most how many child processes will be run simultaneously.
Must be at least one, and parallelization requires at least two cores. On a Windows computer mc.cores must be 1
(no parallelization). For details, see |
mc.args |
A named list of additional arguments to be passed to |
cl |
Allows parallelization through the use of |
method |
For advanced use. |
The function frank.flm
performs
a nonparametric test of significance of a covariate in the functional GLM.
Similarly as in the graphical functional GLM (graph.flm
),
the Freedman-Lane algorithm (Freedman and Lane, 1983) is applied to permute the functions
(to obtain the simulations under the null hypothesis of "no effects");
consequently, the test achieves the desired significance level only approximately.
If the reduced model contains only a constant, then the algorithm corresponds to
simple permutation of raw data.
In contrast to the graphical functional GLM, the F rank functional GLM is based on the
F-statistics that are calculated at each argument value of the functions.
The global envelope test is applied to the observed and simulated F-statistics.
The test is able to find if the factor of interest is significant and also which
argument values of the functional domain are responsible for the potential rejection.
The specification of the full and reduced formulas is important. The reduced model should be nested within the full model. The full model should include in addition to the reduced model the interesting factors whose effects are under investigation.
There are different versions of the implementation depending on the application.
If there are no extra arguments given by the user in lm.args
, then a
fast implementation by solving the normal equations is used to directly compute the F-statistics.
If all the covariates are constant across the functions, but there are some extra arguments,
then a linear model is fitted separately by least-squares estimation to
the data at each argument value of the functions fitting a multiple linear model by lm
.
The possible extra arguments passed in lm.args
to lm
must be of the form that
lm
accepts for fitting a multiple linear model. In the basic case, no extra arguments are
needed.
If some of the covariates vary across the space and there are user specified extra arguments given in
lm.args
, then the implementation fits a linear model at each argument value of the functions using
lm
, which can be rather slow. The arguments lm.args
are passed to lm
for fitting each linear model.
By default the fastest applicable method is used. This can be changed by setting method
argument.
The cases above correspond to "ne", "mlm" and "lm". Changing the default can be useful for
checking the validity of the implementation.
A global_envelope
object, which can be printed and plotted directly.
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Freedman, D., & Lane, D. (1983) A nonstochastic interpretation of reported significance levels. Journal of Business & Economic Statistics 1(4), 292-298. doi:10.2307/1391660
data("GDPtax") factors.df <- data.frame(Group = GDPtax$Group, Tax = GDPtax$Profittax) nsim <- 999 res.tax_within_group <- frank.flm(nsim = nsim, formula.full = Y~Group+Tax+Group:Tax, formula.reduced = Y~Group+Tax, curve_sets = list(Y=GDPtax$GDP), factors = factors.df) plot(res.tax_within_group) # Image set examples data("abide_9002_23") iset <- abide_9002_23$curve_set res.F <- frank.flm(nsim = 19, # Increase nsim for serious analysis! formula.full = Y ~ Group + Age + Sex, formula.reduced = Y ~ Age + Sex, curve_sets = list(Y = iset), factors = abide_9002_23[['factors']], GET.args = list(type = "area")) plot(res.F)
data("GDPtax") factors.df <- data.frame(Group = GDPtax$Group, Tax = GDPtax$Profittax) nsim <- 999 res.tax_within_group <- frank.flm(nsim = nsim, formula.full = Y~Group+Tax+Group:Tax, formula.reduced = Y~Group+Tax, curve_sets = list(Y=GDPtax$GDP), factors = factors.df) plot(res.tax_within_group) # Image set examples data("abide_9002_23") iset <- abide_9002_23$curve_set res.F <- frank.flm(nsim = 19, # Increase nsim for serious analysis! formula.full = Y ~ Group + Age + Sex, formula.reduced = Y ~ Age + Sex, curve_sets = list(Y = iset), factors = abide_9002_23[['factors']], GET.args = list(type = "area")) plot(res.F)
Gross domestic product (GDP)
data("GDP")
data("GDP")
A curve_set
object containing the GDP values for different countries.
The data here was constructed based on the following data:
The GDP data are publicly available at
https://data.worldbank.org/indicator/NY.GDP.PCAP.CD.
The excel file that we downloaded was called
API_NY.GDP.PCAP.CD_DS2_en_excel_v2_3358980.xls
.
The inflation rates are publicly available at
https://data.worldbank.org/indicator/NY.GDP.DEFL.KD.ZG.
The excel file that we downloaded was called
API_NY.GDP.DEFL.KD.ZG_DS2_en_excel_v2_3469555.xls
,
from there we took only the inflation rates for United States.
Both are distributed under the CC-BY 4.0 license (see
https://datacatalog.worldbank.org/public-licenses#cc-by).
Then we discounted the GDP of every country in the study to the 1960 USD,
and we extrapolated the missing values of the GDP of a country using the
closest known ratio of the GDP of the country and the median GDP in that year.
Further, the missing values of GDP were interpolated using linear
interpolation of the two closest ratios. Appendix of the
vignette(FDRenvelopes)
includes the code to prepare the
curve_set
object.
GDP per capita with country groups and profit tax
data("GDPtax")
data("GDPtax")
A list of a three components. The first one (GDP
) is a curve_set
object with components r
and obs
containing the years of observations and the GDP curves, i.e. the observed values of GDP in those years.
Each column of obs
contains the GDP for the years for a particular country (seen as column names).
The country grouping is given in the list component Group
and the profit tax in Profittax
.
The data includes the GDP per capita (current US$) for years 1980-2017 (World Bank national accounts data, and OECD National Accounts data files). The data have been downloaded from the webpage https://datamarket.com/data/set/15c9/gdp-per-capita-current-us#!ds=15c9!hd1&display=line, distributed under the CC-BY 4.0 (https://datacatalog.worldbank.org/public-licenses#cc-by). From the same webpage the profit tax in 2010 (World Bank, Doing Business Project (http://www.doingbusiness.org/ExploreTopics/PayingTaxes/) and Total tax rate ( were downloaded. Furthermore, different country groups were formed from countries for which the GDP was available for 1980-2017 and profit tax for 2010:
Group 1 (Major Advanced Economies (G7)): "Canada", "France", "Germany", "Italy", Japan"
Group 2 (Euro Area excluding G7): "Austria", "Belgium", "Cyprus", "Finland", "Greece", "Ireland", "Luxembourg", "Netherlands", "Portugal", "Spain"
Group 3 (Other Advanced Economies (Advanced Economies excluding G7 and Euro Area)): "Australia", "Denmark", "Iceland", "Norway", "Sweden", "Switzerland"
Group 4 (Emerging and Developing Asia): "Bangladesh", "Bhutan", "China", "Fiji", "India", "Indonesia", "Malaysia", "Nepal", "Philippines", "Thailand", "Vanuatu"
World Bank national accounts data, and OECD National Accounts data files. URL: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD World Bank, Doing Business Project (http://www.doingbusiness.org/ExploreTopics/PayingTaxes/). URL: https://data.worldbank.org/indicator/IC.TAX.PRFT.CP.ZS
data("GDPtax") # Plot data in groups for(i in 1:4) assign(paste0("p", i), plot(subset(GDPtax$GDP, GDPtax$Group == i)) + ggplot2::labs(title=paste("Group ", i, sep=""), y="GDP")) p4 if(require("patchwork", quietly=TRUE)) p1 + p2 + p3 + p4
data("GDPtax") # Plot data in groups for(i in 1:4) assign(paste0("p", i), plot(subset(GDPtax$GDP, GDPtax$Group == i)) + ggplot2::labs(title=paste("Group ", i, sep=""), y="GDP")) p4 if(require("patchwork", quietly=TRUE)) p1 + p2 + p3 + p4
geom_central_region
and stat_central_region
can be used to compute
and plot central_region
from data arranged in a data.frame
.
geom_central_region( mapping = NULL, data = NULL, stat = "CentralRegion", position = "identity", ..., coverage = 0.5, type = "erl", filled = TRUE, drawcenterline = TRUE, colours = grey.colors(length(coverage), start = 0.9, end = 0.5), na.rm = FALSE, show.legend = NA, inherit.aes = TRUE ) stat_central_region( mapping = NULL, data = NULL, position = "identity", ..., coverage = 0.5, type = "erl", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE )
geom_central_region( mapping = NULL, data = NULL, stat = "CentralRegion", position = "identity", ..., coverage = 0.5, type = "erl", filled = TRUE, drawcenterline = TRUE, colours = grey.colors(length(coverage), start = 0.9, end = 0.5), na.rm = FALSE, show.legend = NA, inherit.aes = TRUE ) stat_central_region( mapping = NULL, data = NULL, position = "identity", ..., coverage = 0.5, type = "erl", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
... |
Other arguments passed on to
|
coverage |
A number between 0 and 1. The 100*coverage% central region will be calculated. A vector of values can also be provided, leading to the corresponding number of central regions. |
type |
The options and details for |
filled |
Boolean. Should the ribbon be filled? |
drawcenterline |
Boolean. Should the center line be drawn? |
colours |
Colours for different coverage levels |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
Plots of central regions (global envelopes) with the specified coverage
and type
(see central_region
).
When splitting the set of functions to groups by aesthetics or facets, see
examples, the central regions are constructed separately for each group,
each having the specified coverage
.
If Nfunc*(1-coverage) < 1, where Nfunc is the number of functions/curves, the curves are plotted instead of any region.
geom_central_region
requires x
, y
and curveid
.
Additionally geom_central_region
uses the same aesthetics as
geom_ribbon
if filled==TRUE
and
geom_line
otherwise.
For multiple coverage values additional aesthetics are not currently supported.
stat_central_region
computes
after_stat(ymax)
and after_stat(ymin)
for the high and low value of the central region.
For multiple coverages the variables use the same names as central_region
,
i.e. hi.95
and lo.95
for the region with 95% coverage.
central_region
for the basic computation and,
geom_ribbon
for the default base geom.
require("ggplot2") ## Generate some data #------------------------------------------------------ # Simulate regression data according to the cubic model # f(x) = 0.8x - 1.8x^2 + 1.05x^3 for x in [0,1] par <- c(0,0.8,-1.8,1.05) # Parameters of the true polynomial model res <- 100 # Resolution x <- seq(0, 1, by=1/res); x2=x^2; x3=x^3; f <- par[1] + par[2]*x + par[3]*x^2 + par[4]*x^3 # The true function d <- f + rnorm(length(x), 0, 0.04) # Data # Estimate polynomial regression model reg <- lm(d ~ x + x2 + x3) ftheta <- reg$fitted.values resid0 <- reg$residuals # Bootstrap regression B <- 200 # Number of bootstrap samples df <- NULL for(i in 1:B) { u <- sample(resid0, size=length(resid0), replace=TRUE) reg1 <- lm((ftheta+u) ~ x + x2 + x3) df <- rbind(df, data.frame(y=reg1$fitted.values, x=x, i=i, g=ifelse(i<14, "A", "B"), g2=ifelse(i<100, "A", "B"))) } ggplot(df) + geom_line(aes(x, y, group=i)) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=0.50) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=0.50, filled=FALSE) # Central regions for two groups as specified by 'g2' ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i, col=g2), coverage=0.90, filled=FALSE) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=0.90) + facet_wrap(vars(g2)) # Central regions with multiple coverage levels ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=c(0.2,0.4,0.6)) + theme_minimal() ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=seq(0.1, 0.9, length=20), colours=rainbow(20)) # Colors for multiregions are not supported ggplot(df) + geom_central_region(aes(x=x, y=y+0.1*(g2=="B"), curveid=i, col=as.factor(g2)), coverage=c(0.05, 0.2,0.4,0.6)) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=c(0.05, 0.2,0.4,0.6)) + facet_wrap(vars(g2)) # Using stat_central_region with geom_linerange and geom_rect ggplot(df) + geom_linerange(aes(curveid=i, x=x, y=y, ymax=after_stat(ymax), ymin=after_stat(ymin), group=g2, col=factor(g2)), stat="central_region", coverage = 0.90, position=position_dodge(0.01)) ggplot(within(df, {x = x+0.004*(g2=="B")})) + geom_rect(aes(curveid=i, x=x, y=y, xmax=after_stat(x), xmin=after_stat(x+0.004), ymax=after_stat(ymax), ymin=after_stat(ymin), group=g2, fill=factor(g2)), stat="central_region", coverage = 0.90) # Non-finite values are not supported ggplot(within(df, {y = ifelse(runif(length(y)) < 0.001, Inf, y)})) + geom_central_region(aes(x=x, y=y, curveid=i))
require("ggplot2") ## Generate some data #------------------------------------------------------ # Simulate regression data according to the cubic model # f(x) = 0.8x - 1.8x^2 + 1.05x^3 for x in [0,1] par <- c(0,0.8,-1.8,1.05) # Parameters of the true polynomial model res <- 100 # Resolution x <- seq(0, 1, by=1/res); x2=x^2; x3=x^3; f <- par[1] + par[2]*x + par[3]*x^2 + par[4]*x^3 # The true function d <- f + rnorm(length(x), 0, 0.04) # Data # Estimate polynomial regression model reg <- lm(d ~ x + x2 + x3) ftheta <- reg$fitted.values resid0 <- reg$residuals # Bootstrap regression B <- 200 # Number of bootstrap samples df <- NULL for(i in 1:B) { u <- sample(resid0, size=length(resid0), replace=TRUE) reg1 <- lm((ftheta+u) ~ x + x2 + x3) df <- rbind(df, data.frame(y=reg1$fitted.values, x=x, i=i, g=ifelse(i<14, "A", "B"), g2=ifelse(i<100, "A", "B"))) } ggplot(df) + geom_line(aes(x, y, group=i)) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=0.50) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=0.50, filled=FALSE) # Central regions for two groups as specified by 'g2' ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i, col=g2), coverage=0.90, filled=FALSE) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=0.90) + facet_wrap(vars(g2)) # Central regions with multiple coverage levels ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=c(0.2,0.4,0.6)) + theme_minimal() ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=seq(0.1, 0.9, length=20), colours=rainbow(20)) # Colors for multiregions are not supported ggplot(df) + geom_central_region(aes(x=x, y=y+0.1*(g2=="B"), curveid=i, col=as.factor(g2)), coverage=c(0.05, 0.2,0.4,0.6)) ggplot(df) + geom_central_region(aes(x=x, y=y, curveid=i), coverage=c(0.05, 0.2,0.4,0.6)) + facet_wrap(vars(g2)) # Using stat_central_region with geom_linerange and geom_rect ggplot(df) + geom_linerange(aes(curveid=i, x=x, y=y, ymax=after_stat(ymax), ymin=after_stat(ymin), group=g2, col=factor(g2)), stat="central_region", coverage = 0.90, position=position_dodge(0.01)) ggplot(within(df, {x = x+0.004*(g2=="B")})) + geom_rect(aes(curveid=i, x=x, y=y, xmax=after_stat(x), xmin=after_stat(x+0.004), ymax=after_stat(ymax), ymin=after_stat(ymin), group=g2, fill=factor(g2)), stat="central_region", coverage = 0.90) # Non-finite values are not supported ggplot(within(df, {y = ifelse(runif(length(y)) < 0.001, Inf, y)})) + geom_central_region(aes(x=x, y=y, curveid=i))
Adjusted global envelope tests for composite null hypothesis.
GET.composite( X, X.ls = NULL, nsim = 499, nsimsub = nsim, simfun = NULL, fitfun = NULL, calcfun = function(X) { X }, testfuns = NULL, ..., type = "erl", alpha = 0.05, alternative = c("two.sided", "less", "greater"), probs = c(0.025, 0.975), r_min = NULL, r_max = NULL, take_residual = FALSE, save.cons.envelope = savefuns, savefuns = FALSE, verbose = TRUE, MrkvickaEtal2017 = FALSE, mc.cores = 1L )
GET.composite( X, X.ls = NULL, nsim = 499, nsimsub = nsim, simfun = NULL, fitfun = NULL, calcfun = function(X) { X }, testfuns = NULL, ..., type = "erl", alpha = 0.05, alternative = c("two.sided", "less", "greater"), probs = c(0.025, 0.975), r_min = NULL, r_max = NULL, take_residual = FALSE, save.cons.envelope = savefuns, savefuns = FALSE, verbose = TRUE, MrkvickaEtal2017 = FALSE, mc.cores = 1L )
X |
An object containing the data in some form.
A |
X.ls |
A list of objects as |
nsim |
The number of simulations to be generated in the primary test.
Ignored if |
nsimsub |
Number of simulations in each basic test. There will be |
simfun |
A function for generating simulations from the null model. If given, this function
is called by |
fitfun |
A function for estimating the parameters of the null model.
The function should return the fitted model in the form that it can be directly
passed to |
calcfun |
A function for calculating a summary function from a simulation of the model.
The default is the identity function, i.e. the simulations from the model are functions themselves.
The use of |
testfuns |
A list of lists of parameters to be passed to the |
... |
Additional parameters to the |
type |
The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details. |
alpha |
The significance level. The 100(1-alpha)% global envelope will be calculated under the 'fwer' or 'fdr' control. If a vector of values is provided, the global envelopes are calculated for each value. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
probs |
A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). |
r_min |
The minimum argument value to include in the test. |
r_max |
The maximum argument value to include in the test.
in calculating functions by the |
take_residual |
Logical. If TRUE (needed for visual reasons only) the mean of the simulated functions is reduced from the functions in each first and second stage test. |
save.cons.envelope |
Logical flag indicating whether to save the unadjusted envelope test results. |
savefuns |
Logical flag indicating whether to save all the simulated function values.
Similar to the same argument of the |
verbose |
Logical flag indicating whether to print progress reports during the simulations.
Similar to the same argument of |
MrkvickaEtal2017 |
Logical. If TRUE, type is "st" or "qdir" and several test functions are used,
then the combined scaled MAD envelope presented in Mrkvička et al. (2017) is calculated. Otherwise,
the two-step procedure described in |
mc.cores |
The number of cores to use, i.e. at most how many child processes will be run simultaneously.
Must be at least one, and parallelization requires at least two cores. On a Windows computer mc.cores must be 1
(no parallelization). For details, see |
The specification of X, X.ls, fitfun, simfun is important:
If X.ls
is provided, then the global envelope test is calculated based on
functions in these objects. X
should be a curve_set
object,
or an envelope
object of spatstat, including the observed function and simulations
from the tested model. X.ls
should be a list of curve_set
or
envelope (of R package spatstat) objects, where each component contains an "observed"
function f that has been simulated from the model fitted to the data and the simulations
that have been obtained from the same model that has been fitted to the "observed" f.
The user has the responsibility that the functions have been generated correctly,
the test is done based on these provided simulations. See the examples.
Otherwise, if simfun
and fitfun
are provided, X
can be general.
The function fitfun
is used for fitting the desired model M and the function simfun
for simulating from a fitted model M. These functions should be coupled with each other such
that the object returned by fitfun
is directly accepted as the (single) argument in
simfun
.
In the case, that the global envelope should not be calculated directly for X
(X
is
not a function), calcfun
can be used to specify how to calculate the function from
X
and from simulations generated by simfun
.
Special attention is needed in the correct specification of the functions, see examples.
Otherwise, X
should be either a fitted (point process) model object or a ppp
object of the R package spatstat.
If X
is a fitted (point process) model object of the R package spatstat,
then the simulations from this model are generated and summary functions for testing calculated
by the envelope
function of spatstat. Which summary function to use and how to calculate it,
can be passed to envelope
either in ...
or testfuns
.
Unless otherwise specified the default function of envelope
,
i.g. the K-function, is used. The argument testfuns
should be used to specify the
test functions in the case where one wants to base the test on several test functions.
If X
is a ppp
object of spatstat, then the envelope
function
is used for simulations and model fitting and the complete spatial randomness (CSR) is tested
(with intensity parameter).
For the rank envelope test, the global envelope test is the test described in Myllymäki et al. (2017) with the adjustment of Baddeley et al. (2017). For other test types, the test (also) uses the two-stage procedure of Dao and Genton (2014) with the adjustment of Baddeley et al. (2017) as descripbed in Myllymäki and Mrkvička (2020).
See examples also in saplings
.
An object of class global_envelope
or combined_global_envelope
, which can be
printed and plotted directly. See global_envelope_test
.
Baddeley, A., Hardegen, A., Lawrence, T., Milne, R. K., Nair, G. and Rakshit, S. (2017). On two-stage Monte Carlo tests of composite hypotheses. Computational Statistics and Data Analysis 114: 75-87. doi: http://dx.doi.org/10.1016/j.csda.2017.04.003
Dao, N.A. and Genton, M. (2014). A Monte Carlo adjusted goodness-of-fit test for parametric models describing spatial point patterns. Journal of Graphical and Computational Statistics 23, 497-517.
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5): 1239-1255. doi: 10.1007/s11222-016-9683-9
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381-404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
global_envelope_test
, plot.global_envelope
, saplings
# Graphical normality test (Myllymaki and Mrkvicka, 2020, Section 3.3.) #========================= if(require("fda.usc", quietly=TRUE)) { data("poblenou") dat <- poblenou[['nox']][['data']][,'H10'] n <- length(dat) # The number of simulations nsim <- nsimsub <- 199 set.seed(200127) # General setup #============== # 1. Fit the model mu <- mean(dat) sigma <- sd(dat) # 2. Simulate a sample from the fitted null model and # compute the test vectors for data (obs) and each simulation (sim) r <- seq(min(dat), max(dat), length=100) obs <- stats::ecdf(dat)(r) sim <- sapply(1:nsimsub, function(i) { x <- rnorm(n, mean = mu, sd = sigma) stats::ecdf(x)(r) }) cset <- create_curve_set(list(r = r, obs = obs, sim_m = sim)) # 3. Simulate another sample from the fitted null model. # 4. Fit the null model to each of the patterns, # simulate a sample from the null model, # and compute the test vectors for all. cset.ls <- vector("list", nsim) for(rep in 1:nsim) { x <- rnorm(n, mean = mu, sd = sigma) mu2 <- mean(x) sigma2 <- sd(x) obs2 <- stats::ecdf(x)(r) sim2 <- sapply(1:nsimsub, function(i) { x2 <- rnorm(n, mean = mu2, sd = sigma2) stats::ecdf(x2)(r) }) cset.ls[[rep]] <- create_curve_set(list(r = r, obs = obs2, sim_m = sim2)) } # Perform the adjusted test res <- GET.composite(X = cset, X.ls = cset.ls, type = 'erl') plot(res) + ggplot2::labs(x = "NOx", y = "Ecdf") } # Example of a point pattern data #================================ # Test the fit of a Matern cluster process. if(require("spatstat.model", quietly=TRUE)) { data("saplings") saplings <- as.ppp(saplings, W = square(75)) # First choose the r-distances rmin <- 0.3; rmax <- 10; rstep <- (rmax-rmin)/500 r <- seq(0, rmax, by = rstep) # Number of simulations nsim <- 19 # Increase nsim for serious analysis! # Option 1: Give the fitted model object to GET.composite #-------------------------------------------------------- # This can be done and is preferable when the model is # a point process model of spatstat. # 1. Fit the Matern cluster process to the pattern # (using minimum contrast estimation with the K-function) M1 <- kppm(saplings~1, clusters = "MatClust", statistic = "K") summary(M1) # 2. Make the adjusted global area rank envelope test using the L(r)-r function adjenvL <- GET.composite(X = M1, nsim = nsim, testfuns = list(L =list(fun="Lest", correction="translate", transform=expression(.-r), r=r)), # passed to envelope type = "area", r_min = rmin, r_max = rmax) # Plot the test result plot(adjenvL) # Option 2: Generate the simulations "by yourself" #------------------------------------------------- # and provide them as curve_set or envelope objects # Preferable when you want to have a control # on the simulations yourself. # 1. Fit the model M1 <- kppm(saplings~1, clusters = "MatClust", statistic = "K") # 2. Generate nsim simulations by the given function using the fitted model X <- envelope(M1, nsim = nsim, savefuns = TRUE, fun = "Lest", correction = "translate", transform = expression(.-r), r = r) plot(X) # 3. Create another set of simulations to be used to estimate # the second-state p-value (as proposed by Baddeley et al., 2017). simpatterns2 <- simulate(M1, nsim = nsim) # 4. Calculate the functions for each pattern simf <- function(rep) { # Fit the model to the simulated pattern Xsims[[rep]] sim_fit <- kppm(simpatterns2[[rep]], clusters = "MatClust", statistic = "K") # Generate nsim simulations from the fitted model envelope(sim_fit, nsim = nsim, savefuns = TRUE, fun = "Lest", correction = "translate", transform = expression(.-r), r = r) } X.ls <- parallel::mclapply(X = 1:nsim, FUN = simf, mc.cores = 1) # list of envelope objects # 5. Perform the adjusted test res <- GET.composite(X = X, X.ls = X.ls, type = "area", r_min = rmin, r_max = rmax) plot(res) }
# Graphical normality test (Myllymaki and Mrkvicka, 2020, Section 3.3.) #========================= if(require("fda.usc", quietly=TRUE)) { data("poblenou") dat <- poblenou[['nox']][['data']][,'H10'] n <- length(dat) # The number of simulations nsim <- nsimsub <- 199 set.seed(200127) # General setup #============== # 1. Fit the model mu <- mean(dat) sigma <- sd(dat) # 2. Simulate a sample from the fitted null model and # compute the test vectors for data (obs) and each simulation (sim) r <- seq(min(dat), max(dat), length=100) obs <- stats::ecdf(dat)(r) sim <- sapply(1:nsimsub, function(i) { x <- rnorm(n, mean = mu, sd = sigma) stats::ecdf(x)(r) }) cset <- create_curve_set(list(r = r, obs = obs, sim_m = sim)) # 3. Simulate another sample from the fitted null model. # 4. Fit the null model to each of the patterns, # simulate a sample from the null model, # and compute the test vectors for all. cset.ls <- vector("list", nsim) for(rep in 1:nsim) { x <- rnorm(n, mean = mu, sd = sigma) mu2 <- mean(x) sigma2 <- sd(x) obs2 <- stats::ecdf(x)(r) sim2 <- sapply(1:nsimsub, function(i) { x2 <- rnorm(n, mean = mu2, sd = sigma2) stats::ecdf(x2)(r) }) cset.ls[[rep]] <- create_curve_set(list(r = r, obs = obs2, sim_m = sim2)) } # Perform the adjusted test res <- GET.composite(X = cset, X.ls = cset.ls, type = 'erl') plot(res) + ggplot2::labs(x = "NOx", y = "Ecdf") } # Example of a point pattern data #================================ # Test the fit of a Matern cluster process. if(require("spatstat.model", quietly=TRUE)) { data("saplings") saplings <- as.ppp(saplings, W = square(75)) # First choose the r-distances rmin <- 0.3; rmax <- 10; rstep <- (rmax-rmin)/500 r <- seq(0, rmax, by = rstep) # Number of simulations nsim <- 19 # Increase nsim for serious analysis! # Option 1: Give the fitted model object to GET.composite #-------------------------------------------------------- # This can be done and is preferable when the model is # a point process model of spatstat. # 1. Fit the Matern cluster process to the pattern # (using minimum contrast estimation with the K-function) M1 <- kppm(saplings~1, clusters = "MatClust", statistic = "K") summary(M1) # 2. Make the adjusted global area rank envelope test using the L(r)-r function adjenvL <- GET.composite(X = M1, nsim = nsim, testfuns = list(L =list(fun="Lest", correction="translate", transform=expression(.-r), r=r)), # passed to envelope type = "area", r_min = rmin, r_max = rmax) # Plot the test result plot(adjenvL) # Option 2: Generate the simulations "by yourself" #------------------------------------------------- # and provide them as curve_set or envelope objects # Preferable when you want to have a control # on the simulations yourself. # 1. Fit the model M1 <- kppm(saplings~1, clusters = "MatClust", statistic = "K") # 2. Generate nsim simulations by the given function using the fitted model X <- envelope(M1, nsim = nsim, savefuns = TRUE, fun = "Lest", correction = "translate", transform = expression(.-r), r = r) plot(X) # 3. Create another set of simulations to be used to estimate # the second-state p-value (as proposed by Baddeley et al., 2017). simpatterns2 <- simulate(M1, nsim = nsim) # 4. Calculate the functions for each pattern simf <- function(rep) { # Fit the model to the simulated pattern Xsims[[rep]] sim_fit <- kppm(simpatterns2[[rep]], clusters = "MatClust", statistic = "K") # Generate nsim simulations from the fitted model envelope(sim_fit, nsim = nsim, savefuns = TRUE, fun = "Lest", correction = "translate", transform = expression(.-r), r = r) } X.ls <- parallel::mclapply(X = 1:nsim, FUN = simf, mc.cores = 1) # list of envelope objects # 5. Perform the adjusted test res <- GET.composite(X = X, X.ls = X.ls, type = "area", r_min = rmin, r_max = rmax) plot(res) }
Compare the distributions of two (or more) samples.
GET.distrequal( x, stat = "ECDF", nsim, r = seq(min(unlist((lapply(x, min)))), max(unlist((lapply(x, max)))), length = 100), tau = seq(0.1, 0.9, length = 100), contrasts = FALSE, GET.args = NULL, density.args = NULL, approxfun.args = NULL, rq.args = NULL, savefuns = FALSE, ... )
GET.distrequal( x, stat = "ECDF", nsim, r = seq(min(unlist((lapply(x, min)))), max(unlist((lapply(x, max)))), length = 100), tau = seq(0.1, 0.9, length = 100), contrasts = FALSE, GET.args = NULL, density.args = NULL, approxfun.args = NULL, rq.args = NULL, savefuns = FALSE, ... )
x |
A list of numeric vectors, one for each sample. |
stat |
Character string indicating which test statistic to be used. See details. |
nsim |
The number of random permutations. |
r |
The sequence of argument values at which the test functions are to be compared. The default is 100 equally spaced values between the minimum and maximum over all groups. |
tau |
The sequence of argument values for the QR test statistic. The default values are 100 equally spaced values between 0.1 and 0.9. |
contrasts |
Logical. FALSE and TRUE specify the two test functions as described in description part of this help file. |
GET.args |
A named list of additional arguments to be passed to |
density.args |
A named list of additional arguments to be passed for the estimation of the test statistic |
approxfun.args |
A named list of additional arguments to be passed for the estimation of the the test statistic |
rq.args |
A named list of additional arguments to be passed for the estimation of the test statistic |
savefuns |
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns. |
... |
Additional parameters to be passed to |
A global envelope test can be performed to investigate whether the n distribution functions
differ from each other and how do they differ. This test is a generalization of
the two-sample Kolmogorov-Smirnov test with a graphical interpretation.
We assume that the observations in the sample are an i.i.d. sample from the distribution
and we want to test the hypothesis
If contrasts = FALSE
(default), then the default test statistic ("ECDF") is taken to be
where
is the ecdf of the
th sample evaluated at argument values
.
Another possibility is given by stat = "DIFF"
, and then the test statistic is
still based on the ECDFs and constructed from all pairwise differences,
The choices contrasts = TRUE
and stat = "ECDF"
lead to the same test statistic.
For other (or multiple) values of stat
, the argument contrasts
is ignored.
All the options as the test statistics are the following:
"ECDF"
: The ECDFs of the n-samples, as specified above
"DIFF"
: The pairwise differences between the ECDFs, as specified above
"DEN"
: The kernel estimated density functions of the n-samples as the test statistic
"QQ"
: The pairwise comparisons of empirical quantiles
"SHIFT"
The de-trended QQ-plot (shift plot)
"QR"
: The quantile regression process, i.e. the -coefficients of the quantile regression.
By default, the reference category of this test statistic is the first sample.
The test statistics are described in detail in Konstantinou et al. (2024).
The simulations under the null hypothesis that the distributions are the same are obtained
by permuting the individuals of the groups. The default number of permutation, if nsim
is not specified,
is for the case
contrasts = FALSE
and
for the case
contrasts = TRUE
,
where is the length of
.
Konstantinou K., Mrkvička T. and Myllymäki M. (2024) Graphical n-sample tests of correspondence of distributions. arXiv:2403.01838 [stat.ME] https://doi.org/10.48550/arXiv.2403.01838
if(require("fda", quietly=TRUE)) { # Heights of boys and girls at age 10 f.a <- growth$hgtf["10",] # girls at age 10 m.a <- growth$hgtm["10",] # boys at age 10 # Empirical cumulative distribution functions plot(ecdf(f.a)) plot(ecdf(m.a), col='grey70', add=TRUE) # Create a list of the data fm.list <- list(Girls=f.a, Boys=m.a) res <- GET.distrequal(fm.list) plot(res) # If you want to change the labels: plot(res, scales = "free", labels = c("Girls", "Boys")) # If you want to change the x-label (y-label similarly): require("ggplot2") myxlab <- substitute(paste(italic(i), " (", j, ")", sep = ""), list(i = "x", j = "Height in cm")) plot(res, scales = "free") + xlab(myxlab) # Use instead the test statistics QQ and DEN res <- GET.distrequal(fm.list, stat = c("QQ", "DEN")) plot(res, scales = "free") # Heights of boys and girls at age 14 f.a <- growth$hgtf["14",] # girls at age 14 m.a <- growth$hgtm["14",] # boys at age 14 # Empirical cumulative distribution functions plot(ecdf(f.a)) plot(ecdf(m.a), col='grey70', add=TRUE) # Create a list of the data fm.list <- list(Girls=f.a, Boys=m.a) res <- GET.distrequal(fm.list) plot(res) + xlab(myxlab) res <- GET.distrequal(fm.list, stat = c("QQ", "DEN")) plot(res, scales = "free") + xlab(myxlab) } if(require("datasets", quietly=TRUE)) { data("iris") virginica <- subset(iris, Species == "virginica") setosa <- subset(iris, Species == "setosa") versicolor <- subset(iris, Species == "versicolor") res <- GET.distrequal(x = list(virginica = virginica$Sepal.Length, setosa = setosa$Sepal.Length, versicolor = versicolor$Sepal.Length), stat = c("QQ", "DEN")) plot(res, scales = "free", ncol = 3) }
if(require("fda", quietly=TRUE)) { # Heights of boys and girls at age 10 f.a <- growth$hgtf["10",] # girls at age 10 m.a <- growth$hgtm["10",] # boys at age 10 # Empirical cumulative distribution functions plot(ecdf(f.a)) plot(ecdf(m.a), col='grey70', add=TRUE) # Create a list of the data fm.list <- list(Girls=f.a, Boys=m.a) res <- GET.distrequal(fm.list) plot(res) # If you want to change the labels: plot(res, scales = "free", labels = c("Girls", "Boys")) # If you want to change the x-label (y-label similarly): require("ggplot2") myxlab <- substitute(paste(italic(i), " (", j, ")", sep = ""), list(i = "x", j = "Height in cm")) plot(res, scales = "free") + xlab(myxlab) # Use instead the test statistics QQ and DEN res <- GET.distrequal(fm.list, stat = c("QQ", "DEN")) plot(res, scales = "free") # Heights of boys and girls at age 14 f.a <- growth$hgtf["14",] # girls at age 14 m.a <- growth$hgtm["14",] # boys at age 14 # Empirical cumulative distribution functions plot(ecdf(f.a)) plot(ecdf(m.a), col='grey70', add=TRUE) # Create a list of the data fm.list <- list(Girls=f.a, Boys=m.a) res <- GET.distrequal(fm.list) plot(res) + xlab(myxlab) res <- GET.distrequal(fm.list, stat = c("QQ", "DEN")) plot(res, scales = "free") + xlab(myxlab) } if(require("datasets", quietly=TRUE)) { data("iris") virginica <- subset(iris, Species == "virginica") setosa <- subset(iris, Species == "setosa") versicolor <- subset(iris, Species == "versicolor") res <- GET.distrequal(x = list(virginica = virginica$Sepal.Length, setosa = setosa$Sepal.Length, versicolor = versicolor$Sepal.Length), stat = c("QQ", "DEN")) plot(res, scales = "free", ncol = 3) }
Permutation-based test of independence in a bivariate vector using as the test statistic either 1) the empirical joint cumulative distribution function, 2) the matrix of observed counts of a 2D contingency table, or 3) the smoothed Q-Q plot.
GET.distrindep( X, nsim = 999, statistic = c("cdf", "contingency", "qq"), ngrid, seq.x, seq.y, sigma, atoms.x, atoms.y, ... )
GET.distrindep( X, nsim = 999, statistic = c("cdf", "contingency", "qq"), ngrid, seq.x, seq.y, sigma, atoms.x, atoms.y, ... )
X |
A matrix with n rows and 2 columns. Each row contains one bivariate observation. |
nsim |
The number of random permutations used. |
statistic |
Either "cdf", "contingency" or "qq" corresponding to the three test functions. |
ngrid |
Vector with two elements, giving the number of grid points to be used in the test statistic for each of the two marginals. The default is 20 in each marginal for "cdf" and 64 for "qq". (This is not relevant for "contingency".) |
seq.x |
For the first marginal, the values at which the empirical cumulative distribution function will be evaluated. If NULL (the default), sequence of quantiles will be used, equidistant in terms of probability. seq.x and seq.y only relevant for "cdf". |
seq.y |
For the second marginal, the values at which the empirical cumulative distribution function will be evaluated. If NULL (the default), sequence of quantiles will be used, equidistant in terms of probability. seq.x and seq.y only relevant for "cdf". |
sigma |
Standard deviation of the smoothing kernel to be used for smoothing the Q-Q plot when computing the test statistic. If NULL, sensible default value is used based on the number of observations. |
atoms.x |
Vector specifying atomic values in the first marginal. Only relevant for "qq". See Examples. |
atoms.y |
Vector specifying atomic values in the second marginal. Only relevant for "qq". See Examples. |
... |
Additional parameters to be passed to |
The function performs permutation-based test of independence in a bivariate sample
based on three different test statistics chosen by the argument statistic
.
If the observed data are the pairs ,
the permutations are obtained by randomly permuting the values
in the second marginal, i.e.
.
The first alternative statistic = "cdf"
is the empirical joint cumulative distribution function
computed on a grid of ngrid[1]
times ngrid[2]
arguments.
The grid points are chosen according to the quantiles of the
marginal distributions.
The second alternative statistic = "contingency"
is to test of independence in a 2D contingency table,
using the matrix of observed counts as the test statistic.
The third alternative statistic = "qq"
is based on Q-Q representation and estimate of the intensity function
computed on a regular grid of ngrid[1]
times ngrid[2]
points.
The test itself is in each case performed using the global envelope test of the chosen version,
see the argument type
of global_envelope_test
.
In the case of a 2D contingency table, instead of plotting, text output can be printed in the console by typing the object name. The cells in which the observed value exceeds the upper envelope printed in red, and cells in which the observed value is lower than the lower envelope printed in cyan. Standard output of the global envelope test is also returned and can be plotted or analyzed accordingly.
Dvořák, J. and Mrkvička, T. (2022). Graphical tests of independence for general distributions. Computational Statistics 37, 671–699.
#- Example of cdf #---------------- # Generate sample data data <- matrix(rnorm(n=200), ncol=2) %*% matrix(c(1,1,0,1), ncol=2) plot(data) # Compute the CDF test and plot the significant regions res <- GET.distrindep(data, statistic="cdf", ngrid=c(20,15), nsim=1999) plot(res) + ggplot2::scale_radius(range = 2 * c(1, 6)) # Extract the p-value attr(res,"p") #- Example of a 2D contingency table #----------------------------------- # Generate sample data: data <- matrix(c(sample(4, size=100, replace=TRUE), sample(2, size=100, replace=TRUE)), ncol=2) data[,2] <- data[,2] + data[,1] # Observed contingency table (with row names and column names) table(data[,1], data[,2]) # Permutation-based envelope test res <- GET.distrindep(data, statistic="contingency", nsim=999) res plot(res) + ggplot2::scale_radius(range = 5 * c(1, 6)) # Extract the p-value attr(res,"p") # Example of QQ #-------------- # Generate sample data data <- matrix(rnorm(n=200), ncol=2) %*% matrix(c(1,1,0,1), ncol=2) plot(data) # Compute the QQ test and plot the significant regions res <- GET.distrindep(data, statistic="qq", ngrid=c(30,20), nsim=999) plot(res) # Extract the p-value attr(res,"p") # With atoms, independent data <- cbind(rnorm(n=100), sample(4, size=100, replace=TRUE)) plot(data) res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.y=c(1,2,3,4)) plot(res) # With atoms, dependent data <- cbind(sort(rnorm(n=100)), sort(sample(4, size=100, replace=TRUE))) plot(data) res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.y=c(1,2,3,4)) plot(res, sign.type="col", what=c("obs", "lo", "hi", "lo.sign", "hi.sign")) # Atoms in both variables data <- cbind(rnorm(n=100), rnorm(n=100)) %*% matrix(c(1,1,0,1), ncol=2) data[,1][data[,1]<=-1] <- -1 data[,2][data[,2]<=-0.5] <- -0.5 plot(data) # Perform the test res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.x=c(-1), atoms.y=c(-0.5)) plot(res, sign.type="col", what=c("obs", "lo", "hi", "lo.sign", "hi.sign"))
#- Example of cdf #---------------- # Generate sample data data <- matrix(rnorm(n=200), ncol=2) %*% matrix(c(1,1,0,1), ncol=2) plot(data) # Compute the CDF test and plot the significant regions res <- GET.distrindep(data, statistic="cdf", ngrid=c(20,15), nsim=1999) plot(res) + ggplot2::scale_radius(range = 2 * c(1, 6)) # Extract the p-value attr(res,"p") #- Example of a 2D contingency table #----------------------------------- # Generate sample data: data <- matrix(c(sample(4, size=100, replace=TRUE), sample(2, size=100, replace=TRUE)), ncol=2) data[,2] <- data[,2] + data[,1] # Observed contingency table (with row names and column names) table(data[,1], data[,2]) # Permutation-based envelope test res <- GET.distrindep(data, statistic="contingency", nsim=999) res plot(res) + ggplot2::scale_radius(range = 5 * c(1, 6)) # Extract the p-value attr(res,"p") # Example of QQ #-------------- # Generate sample data data <- matrix(rnorm(n=200), ncol=2) %*% matrix(c(1,1,0,1), ncol=2) plot(data) # Compute the QQ test and plot the significant regions res <- GET.distrindep(data, statistic="qq", ngrid=c(30,20), nsim=999) plot(res) # Extract the p-value attr(res,"p") # With atoms, independent data <- cbind(rnorm(n=100), sample(4, size=100, replace=TRUE)) plot(data) res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.y=c(1,2,3,4)) plot(res) # With atoms, dependent data <- cbind(sort(rnorm(n=100)), sort(sample(4, size=100, replace=TRUE))) plot(data) res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.y=c(1,2,3,4)) plot(res, sign.type="col", what=c("obs", "lo", "hi", "lo.sign", "hi.sign")) # Atoms in both variables data <- cbind(rnorm(n=100), rnorm(n=100)) %*% matrix(c(1,1,0,1), ncol=2) data[,1][data[,1]<=-1] <- -1 data[,2][data[,2]<=-0.5] <- -0.5 plot(data) # Perform the test res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.x=c(-1), atoms.y=c(-0.5)) plot(res, sign.type="col", what=c("obs", "lo", "hi", "lo.sign", "hi.sign"))
The test of local correlations using Vilodomat et al. (2014) procedure for resamples and the FDR envelope of Mrkvička and Myllymäki (2023).
GET.localcor( data, Delta, nsim = 1000, ..., varying.bandwidth = FALSE, bandwidth.nn = 0.1, bandwidth.h = 5.281, maxk = 300, savefuns = FALSE, N_s = 1000, mc.cores = 1L, mc.args = NULL, cl = NULL, notest = FALSE )
GET.localcor( data, Delta, nsim = 1000, ..., varying.bandwidth = FALSE, bandwidth.nn = 0.1, bandwidth.h = 5.281, maxk = 300, savefuns = FALSE, N_s = 1000, mc.cores = 1L, mc.args = NULL, cl = NULL, notest = FALSE )
data |
A data.frame where the first two columns correspond to the values of the two random fields, whose correlations are to be studied, and the third and fourth columns correspond to the x- and y-coordinates where these random fields have been observed. In addition, the width and height of the pixels at each (x,y) can be given in the fifth and sixth column. Warning: no checks for the data input. |
Delta |
A smoothing parameter of the local correlation. According to Vilodomat et al. (2014): Delta is a set of values for the proportion of neighbors to consider for the smoothing step. No default. The user may have to experiment with different values to find one suitable for their data. |
nsim |
The number of resamples. |
... |
Additional parameters to be passed to |
varying.bandwidth |
Logical, whether to use a varying bandwidth to calculate the local correlations or not. See Vilodomat et al. (2014). |
bandwidth.nn |
Nearest neighbor component of the smoothing parameter for varying bandwidth to
be passed to the argument |
bandwidth.h |
Non-varying bandwidth, to be passed to the argument |
maxk |
See |
savefuns |
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns. |
N_s |
If the number of observations is bigger than |
mc.cores |
The number of cores to use, i.e. at most how many child processes will be run simultaneously.
Must be at least one, and parallelization requires at least two cores. On a Windows computer mc.cores must be 1
(no parallelization). For details, see |
mc.args |
A named list of additional arguments to be passed to |
cl |
Allows parallelization through the use of |
notest |
Logical. FALSE means that the test is done. TRUE allows to calculate only local correlation for the data, which can be beneficial for choosing the bandwidths before running the test. If TRUE, then only the observed local correlations will be returned. |
The code is a modification of the supporting information code of Vilodomat et al. (2014)
available at https://doi.org/10.1111/biom.12139. The modification includes the FDR or FWER envelopes
(as specified by the argument typeone
in ...
, passed to global_envelope_test
)
for the test of local correlations, i.e. multiple testing correction and
graphical illustration of the test results.
Variograms are calculated using the package geoR and the local correlations using the R
package locfit. These packages should be installed to use GET.localcor
.
Currently the data is provided in the format of Vilodomat et al. (2014, Supporting information).
Additionally width and height of area represented by a data point can be provided, see the argument data
.
This information is used for plotting purposes when plotting the output by plot()
.
Examples are provided in the vignette 'FDRenvelopes', see e.g. https://cran.r-project.org/package=GET.
A global envelope object (with possible additional classes), see description of main components
in global_envelope
(Value).
Additional attributes: p_global
contains the Monte Carlo p-value for the global test of correlation.
cor_global
and cor_global_sim
contain the value of the correlation for data and permuted data,
respectively. If savefuns = TRUE
, then permutations
contain the permuted values of the first
random field according to Viladomat et al. (2014) procedure, and cset
contains all the local correlations
for the data and permuted data in a curve_set
object (see create_curve_set
).
Viladomat, J., Mazumder, R., McInturff, A., McCauley, D.J. and Hastie, T. (2014). Assessing the significance of global and local correlations under spatial autocorrelation: A nonparametric approach. Biometrics 70, 409-418. doi: 10.1111/biom.12139
Mrkvička, T., Myllymäki, M. (2023) False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
Compute the spatial F- and S-statistics and perform the one-stage global envelope tests proposed by Myllymäki et al. (2020).
GET.spatialF( X, formula.full, formula.reduced, fitfun, covariates, nsim, bw = spatstat.explore::bw.scott(X), bw.S = bw, dimyx = NULL, ... )
GET.spatialF( X, formula.full, formula.reduced, fitfun, covariates, nsim, bw = spatstat.explore::bw.scott(X), bw.S = bw, dimyx = NULL, ... )
X |
A |
formula.full |
A formula for the trend of the full model. |
formula.reduced |
A formula for the trend of the reduced model that is a submodel of the full model. |
fitfun |
A function of a point pattern, model formula and covariates,
giving a fitted model object that can be used with |
covariates |
A list of covariates. |
nsim |
The number of simulations. |
bw |
The bandwidth for smoothed residuals. |
bw.S |
The radius for the local S(u)-statistic. |
dimyx |
Pixel array dimensions for smoothed residuals. See |
... |
Additional arguments to be passed to |
list with three components
F = the global envelope test based on the F(u) statistic
S = the global envelope test based on the S(u) statistic
coef = the coefficients of the full model given by fitfun
Myllymäki, M., Kuronen, M. and Mrkvička, T. (2020). Testing global and local dependence of point patterns on covariates in parametric models. Spatial Statistics 42, 100436. doi: 10.1016/j.spasta.2020.100436
if(require("spatstat.model", quietly=TRUE)) { # Example of tropical rain forest trees data("bei") fullmodel <- ~ grad reducedmodel <- ~ 1 fitppm <- function(X, model, covariates) { ppm(X, model, covariates=covariates) } nsim <- 19 # Increase nsim for serious analysis! res <- GET.spatialF(bei, fullmodel, reducedmodel, fitppm, bei.extra, nsim) plot(res$F) plot(res$S) # Example of forest fires data("clmfires") # Choose the locations of the lightnings in years 2004-2007: pp.lightning <- unmark(subset(clmfires, cause == "lightning" & date >= "2004-01-01" & date < "2008-01-01")) covariates <- clmfires.extra$clmcov100 covariates$forest <- covariates$landuse == "conifer" | covariates$landuse == "denseforest" | covariates$landuse == "mixedforest" fullmodel <- ~ elevation + landuse reducedmodel <- ~ landuse nsim <- 19 # Increase nsim for serious analysis! res <- GET.spatialF(pp.lightning, fullmodel, reducedmodel, fitppm, covariates, nsim) plot(res$F) plot(res$S) # Examples of the fitfun functions for clustered and regular processes # fitfun for the log Gaussian Cox Process with exponential covariance function fitLGCPexp <- function(X, model, covariates) { kppm(X, model, clusters="LGCP", model="exponential", covariates=covariates) } # fitfun for the hardcore process with hardcore radius 0.01 fitHardcore <- function(X, model, covariates) { ppm(X, model, interaction=Hardcore(0.01), covariates=covariates) } }
if(require("spatstat.model", quietly=TRUE)) { # Example of tropical rain forest trees data("bei") fullmodel <- ~ grad reducedmodel <- ~ 1 fitppm <- function(X, model, covariates) { ppm(X, model, covariates=covariates) } nsim <- 19 # Increase nsim for serious analysis! res <- GET.spatialF(bei, fullmodel, reducedmodel, fitppm, bei.extra, nsim) plot(res$F) plot(res$S) # Example of forest fires data("clmfires") # Choose the locations of the lightnings in years 2004-2007: pp.lightning <- unmark(subset(clmfires, cause == "lightning" & date >= "2004-01-01" & date < "2008-01-01")) covariates <- clmfires.extra$clmcov100 covariates$forest <- covariates$landuse == "conifer" | covariates$landuse == "denseforest" | covariates$landuse == "mixedforest" fullmodel <- ~ elevation + landuse reducedmodel <- ~ landuse nsim <- 19 # Increase nsim for serious analysis! res <- GET.spatialF(pp.lightning, fullmodel, reducedmodel, fitppm, covariates, nsim) plot(res$F) plot(res$S) # Examples of the fitfun functions for clustered and regular processes # fitfun for the log Gaussian Cox Process with exponential covariance function fitLGCPexp <- function(X, model, covariates) { kppm(X, model, clusters="LGCP", model="exponential", covariates=covariates) } # fitfun for the hardcore process with hardcore radius 0.01 fitHardcore <- function(X, model, covariates) { ppm(X, model, interaction=Hardcore(0.01), covariates=covariates) } }
The function accompanies the function variogram
with global envelopes
that are based on permutations of the variable(s) or residuals for which the variogram is calculated.
Therefore, one can inspect the hypothesis of "no spatial autocorrelation" of the variable or the residuals
of the fitted model.
GET.variogram( object, nsim = 999, data = NULL, ..., GET.args = NULL, savefuns = TRUE )
GET.variogram( object, nsim = 999, data = NULL, ..., GET.args = NULL, savefuns = TRUE )
object |
An object of class |
nsim |
The number of permutations. |
data |
A data frame where the names in formula are to be found. If NULL,
the data are assumed to be found in the |
... |
Additional parameters to be passed to |
GET.args |
A named list of additional arguments to be passed to |
savefuns |
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns. |
if(require("sp", quietly=TRUE) & require("gstat", quietly=TRUE)) { # Examples from gstat complemented with global envelopes #------------------------------------------------------- data("meuse") coordinates(meuse) <- ~x+y # topsoil zinc concentration, mg kg-1 soil ("ppm") bubble(meuse, "zinc", col=c("#00ff0088", "#00ff0088"), main="zinc concentrations (ppm)") # Variogram can be calculated as follows by the function variogram of the gstat package. # The function variogram takes a formula as its first argument: # log(zinc)~1 means that we assume a constant trend for the variable log(zinc). lzn.vgm <- variogram(object=log(zinc)~1, data=meuse) plot(lzn.vgm) # Variogram with global envelopes is as easy: lzn.vgm.GET <- GET.variogram(object=log(zinc)~1, data=meuse) plot(lzn.vgm.GET) # Instead of the constant mean, denoted by ~1, a mean function can # be specified, e.g. using ~sqrt(dist) as a predictor variable: lznr.vgm <- variogram(log(zinc)~sqrt(dist), meuse) # In this case, the variogram of residuals with respect # to a fitted mean function are shown. plot(lznr.vgm) # The variogram with global envelopes (obtained by permuting the residuals): lznr.vgm.GET <- GET.variogram(object=log(zinc)~sqrt(dist), data=meuse) plot(lznr.vgm.GET) # Directional variograms lzn.dir <- variogram(object=log(zinc)~1, data=meuse, alpha=c(0, 45, 90, 135)) plot(lzn.dir) # with global envelopes lzn.dir.GET <- GET.variogram(object=log(zinc)~1, data=meuse, alpha=c(0, 45, 90, 135)) plot(lzn.dir.GET) # Use instead gstat objects g <- gstat(id="ln.zinc", formula=log(zinc)~1, data=meuse) # or: g <- gstat(id="ln.zinc", formula=log(zinc)~sqrt(dist), data=meuse) # The variogram plot(variogram(g)) # The variogram with global envelopes: g.GET <- GET.variogram(object=g) plot(g.GET) }
if(require("sp", quietly=TRUE) & require("gstat", quietly=TRUE)) { # Examples from gstat complemented with global envelopes #------------------------------------------------------- data("meuse") coordinates(meuse) <- ~x+y # topsoil zinc concentration, mg kg-1 soil ("ppm") bubble(meuse, "zinc", col=c("#00ff0088", "#00ff0088"), main="zinc concentrations (ppm)") # Variogram can be calculated as follows by the function variogram of the gstat package. # The function variogram takes a formula as its first argument: # log(zinc)~1 means that we assume a constant trend for the variable log(zinc). lzn.vgm <- variogram(object=log(zinc)~1, data=meuse) plot(lzn.vgm) # Variogram with global envelopes is as easy: lzn.vgm.GET <- GET.variogram(object=log(zinc)~1, data=meuse) plot(lzn.vgm.GET) # Instead of the constant mean, denoted by ~1, a mean function can # be specified, e.g. using ~sqrt(dist) as a predictor variable: lznr.vgm <- variogram(log(zinc)~sqrt(dist), meuse) # In this case, the variogram of residuals with respect # to a fitted mean function are shown. plot(lznr.vgm) # The variogram with global envelopes (obtained by permuting the residuals): lznr.vgm.GET <- GET.variogram(object=log(zinc)~sqrt(dist), data=meuse) plot(lznr.vgm.GET) # Directional variograms lzn.dir <- variogram(object=log(zinc)~1, data=meuse, alpha=c(0, 45, 90, 135)) plot(lzn.dir) # with global envelopes lzn.dir.GET <- GET.variogram(object=log(zinc)~1, data=meuse, alpha=c(0, 45, 90, 135)) plot(lzn.dir.GET) # Use instead gstat objects g <- gstat(id="ln.zinc", formula=log(zinc)~1, data=meuse) # or: g <- gstat(id="ln.zinc", formula=log(zinc)~sqrt(dist), data=meuse) # The variogram plot(variogram(g)) # The variogram with global envelopes: g.GET <- GET.variogram(object=g) plot(g.GET) }
Global envelope test, global envelopes and p-values
global_envelope_test( curve_sets, typeone = c("fwer", "fdr"), alpha = 0.05, alternative = c("two.sided", "less", "greater"), type = "erl", algorithm = c("IATSE", "ATSE"), ties = "erl", probs = c(0.025, 0.975), quantile.type = 7, central = "mean", nstep = 2, ..., lower = NULL, upper = NULL )
global_envelope_test( curve_sets, typeone = c("fwer", "fdr"), alpha = 0.05, alternative = c("two.sided", "less", "greater"), type = "erl", algorithm = c("IATSE", "ATSE"), ties = "erl", probs = c(0.025, 0.975), quantile.type = 7, central = "mean", nstep = 2, ..., lower = NULL, upper = NULL )
curve_sets |
A |
typeone |
Character string indicating which type I error rate to control, either the family-wise error rate ('fwer') or false discovery rate ('fdr'). |
alpha |
The significance level. The 100(1-alpha)% global envelope will be calculated under the 'fwer' or 'fdr' control. If a vector of values is provided, the global envelopes are calculated for each value. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
type |
The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details. |
algorithm |
The algorithm for the computation of the FDR envelope. Either "IATSE" or "ATSE" standing for the iteratively adaptive two-stage envelope and the adaptive two-stage envelope, respectively, see Mrkvička and Myllymäki (2023). |
ties |
The method to obtain a unique p-value when |
probs |
A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). |
quantile.type |
As type argument of |
central |
Either "mean" or "median". If the curve sets do not contain the component
|
nstep |
1 or 2 for how to contruct a combined global envelope if list of curve sets is provided. 2 (default) for a two-step combining procedure, 1 for one-step. |
... |
Additional parameters to be passed to |
lower |
A single number (or a vector of suitable length) giving a lower bound for the functions. Used only for the extension of the FDR envelope. |
upper |
A single number (or a vector of suitable length) giving an upper bound for the functions. Used only for the extension of the FDR envelope. |
Given a curve_set
object,
or an envelope
object of spatstat,
which contains both the data curve (or function or vector)
(in the component
obs
) and
the simulated curves
(in the component
sim_m
),
the function global_envelope_test
performs a global envelope test,
either under the control of family-wise error rate (FWER) or false discovery rate (FDR),
as specified by the argument typeone
.
The function global_envelope_test
is the main function for global envelope tests
(for simple hypotheses).
The case typeone = "fdr"
corresponds to the FDR envelopes proposed by
Mrkvička and Myllymäki (2023). See details in fdr_envelope
and
in the vignette vignette("FDRenvelopes")
. Note there also the arguments that are
the relevant ones for the FDR envelope specification.
The descriptions below concern the FWER envelopes.
If typeone = "fwer"
, the functionality of the function is rather similar to the
function central_region
, but in addition to ordering the functions from
the most extreme one to the least extreme one using different measures
and providing the global envelopes with intrinsic
graphical interpretation, p-values are calculated for the test.
Thus, while central_region
can be used to construct global
envelopes in a general setting, the function global_envelope_test
is devoted to testing as its name suggests.
Different type
of global envelope tests under the control of FWER can be computed.
We use such ordering of the functions for which we are able to construct global
envelopes with intrinsic graphical interpretation (IGI, see Myllymäki and Mrkvička, 2023).
'rank'
: the completely non-parametric rank envelope test (Myllymäki et al., 2017)
based on minimum of pointwise ranks
'erl'
: the completely non-parametric rank envelope test based on extreme rank lengths
(Myllymäki et al., 2017; Mrkvička et al., 2018) based on number of minimal pointwise ranks
'cont'
: the completely non-parametric rank envelope test based on continuous rank
(Hahn, 2015; Mrkvička et al., 2022) based on minimum of continuous pointwise ranks
'area'
: the completely non-parametric rank envelope test based on area rank
(Mrkvička et al., 2022) based on area between continuous pointwise ranks and minimum
pointwise ranks for those argument (r) values for which pointwise ranks achieve the minimum
(it is a combination of erl and cont)
"qdir", the directional quantile envelope test, protected against unequal variance and asymmetry of T(r) for different distances r (Myllymäki et al., 2015, 2017)
"st", the studentised envelope test, protected against unequal variance of T(r) for different distances r (Myllymäki et al., 2015, 2017)
"unscaled", the unscaled envelope (providing a baseline) that has a contant width and that corresponds to the classical maximum deviation test (Ripley, 1981).
The first four types are global rank envelopes.
The 'rank'
envelope test is a completely non-parametric test,
which provides the 100(1-alpha)% global envelope for the chosen test function
T(r) on the chosen interval of distances and associated p-values.
The other three are modifications of 'rank'
to treat the ties in
the extreme rank ordering on which the 'rank'
test is based on.
The last three envelopes are global scaled maximum absolute difference (MAD)
envelope tests. The unscaled envelope test leads to envelopes with constant width over the
distances r. Thus, it suffers from unequal variance of T(r) over the distances r and
from the asymmetry of distribution of T(r). We recommend to use the other global
envelope tests available. The unscaled envelope is provided as a reference.
See Myllymäki and Mrkvička (2023, Appendix.), i.e. vignette("GET")
, for more detailed
description of the measures and the corresponding envelopes.
See vignette("pointpatterns")
for examples of point pattern analyses.
See vignette("FDRenvelopes")
for examples of FDR envelopes obtained by
typeone = "fdr"
.
Either an object of class "global_envelope" or "combined_global_envelope",
similarly as the objects returned by central_region
.
Further, if typeone = "fdr"
, the objects have the further class
"fdr_envelope" or "combined_fdr_envelope".
The global_envelope
is essentially a data frame containing columns
the values of the argument r at which the test was made, copied from the argument curve_sets
with the corresponding names
obs = values of the data function, copied from the argument curve_sets
(unlike for central regions, obs
always exists for a global envelope test)
lo = the lower envelope; in case of a vector of alpha values, several 'lo' exist with names paste0("lo.", 100*(1-alpha))
hi = the upper envelope; in case of a vector of alpha values, several 'lo' exist with names paste0("hi.", 100*(1-alpha))
central = a central curve as specified in the argument central
.
Moreover, the returned object has the same attributes as the global_envelope
object returned by
central_region
and in addition
p = the p-value of the test
and in the case that type = 'rank'
also
p_interval = The p-value interval .
ties = As the argument ties
.
The combined_global_envelope
is a list of global_envelope
objects
containing the above mentioned columns and which all together form the global envelope.
It has the same attributes as described in central_region
, and in addition also
the p-value p
.
The 2d classes are attached as described in central_region
.
1) First the curves are ranked from the most extreme one to the least extreme one
by a measure that is specified by the argument type
. The options are
'rank': extreme ranks (Myllymäki et al., 2017)
'erl': extreme rank lengths (Myllymäki et al., 2017; Mrkvička et al., 2018)
'cont': continuous ranks (Hahn, 2015; Mrkvička et al., 2019)
'area': area ranks (Mrkvička et al., 2019)
'qdir': the directional quantile maximum absolute deviation (MAD) measure (Myllymäki et al., 2015, 2017)
'st': the studentized MAD measure (Myllymäki et al., 2015, 2017)
'unscaled': the unscaled MAD measure (Ripley, 1981)
2) Based on the measures used to rank the functions, the 100(1-alpha)% global envelope is provided. It corresponds to the 100*coverage% central region.
3) P-values:
In the case type="rank"
, based on the extreme ranks ,
the p-interval is calculated. Because the extreme ranks contain ties, there is not just
one p-value. The p-interval is given by the most liberal and the most conservative p-value
estimate. Also a single p-value is calculated.
By default this single p-value is the extreme rank length p-value ("erl") as specified by the argument
ties
.
If the case of other measures, a (single) p-value based on the given ordering
of the functions is calculated and returned in the attribute p
.
For the global "rank"
envelope test, Myllymäki et al. (2017) recommended to use
at least 2500 simulations for testing at the significance level alpha = 0.05 for single
function tests, based on experiments with summary functions for point processes evaluated
approximately at 500 argument values.
In this case, the width of the p-interval associated with the extreme rank measure tended
to be smaller than 0.01.
The tests 'erl'
, 'cont'
and 'area'
, similarly as
the MAD deviation/envelope tests 'qdir'
, 'st'
and 'unscaled'
,
allow in principle a lower number of simulations to be used than the test based on
extreme ranks ('rank'
), because no ties occur for these measures.
If affordable, we recommend in any case some thousands of simulations for all the measures
to achieve a good power and repeatability of the test.
If the dimension of the test functions is higher, also the number of simulations should
preferably be higher.
If a list of (suitable) objects are provided in the argument curve_sets
,
then by default (nstep = 2
) the two-step combining procedure is used to
perform the combined global test as described in Myllymäki and Mrkvička (2020).
If nstep = 1
and the lengths of the multivariate vectors in each component
of the list are equal, then the one-step combining procedure is used where the
functions are concatenated together into a one long vector.
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5), 1239-1255. doi: 10.1007/s11222-016-9683-9
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Mrkvička and Myllymäki (2023). False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381–404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2023). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.
plot.global_envelope
, central_region
,
GET.composite
# Goodness-of-fit testing for simple hypothesis if(require("spatstat.explore", quietly=TRUE)) { # Testing complete spatial randomness (CSR) #========================================== X <- unmark(spruces) nsim <- 1999 # Number of simulations # Illustration of general workflow for simple hypotheses #======================================================= # First illustrate the general workflow for the test by this example # of CSR test for a point pattern X using the empirical L-function. # Define the argument values at which the functions are evaluated obs.L <- Lest(X, correction="translate") r <- obs.L[['r']] # The test function for the data obs <- obs.L[['trans']] - r # Prepare simulations and calculate test functions for them at same r as 'obs' sim <- matrix(nrow=length(r), ncol=nsim) for(i in 1:nsim) { sim.X <- runifpoint(ex=X) # simulation under CSR sim[, i] <- Lest(sim.X, correction="translate", r=r)[['trans']] - r } # Create a curve_set containing argument values, observed and simulated functions cset <- curve_set(r=r, obs=obs, sim=sim) # Perform the test res <- global_envelope_test(cset, type="erl") plot(res) + ggplot2::ylab(expression(italic(hat(L)(r)-r))) # Simple hypothesis for a point pattern utilizing the spatstat package #===================================================================== # Generate nsim simulations under CSR, calculate L-function for the data and simulations env <- envelope(X, fun="Lest", nsim=nsim, savefuns=TRUE, # save the functions correction="translate", # edge correction for L transform=expression(.-r), # centering simulate=expression(runifpoint(ex=X))) # Simulate CSR # The rank envelope test (ERL) res <- global_envelope_test(env, type="erl") # Plot the result plot(res) ## Advanced use: # Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env') cset <- crop_curves(env, r_min=1, r_max=7) # Do the rank envelope test (erl) res <- global_envelope_test(cset, type="erl") plot(res) + ggplot2::ylab(expression(italic(L(r)-r))) # A combined global envelope test #================================ # As an example test CSR of the saplings point pattern by means of # L, F, G and J functions. data(saplings) X <- as.ppp(saplings, W=square(75)) nsim <- 499 # Number of simulations # Specify distances for different test functions n <- 500 # the number of r-values rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n r <- seq(0, rmax, by=rstep) # r-distances for Lest rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest # Perform simulations of CSR and calculate the L-functions env_L <- envelope(X, nsim=nsim, simulate=expression(runifpoint(ex=X)), fun="Lest", correction="translate", transform=expression(.-r), # Take the L(r)-r function instead of L(r) r=r, # Specify the distance vector savefuns=TRUE, # Save the estimated functions savepatterns=TRUE) # Save the simulated patterns # Take the simulations from the returned object simulations <- attr(env_L, "simpatterns") # Then calculate the other test functions F, G, J for each simulated pattern env_F <- envelope(X, nsim=nsim, simulate=simulations, fun="Fest", correction="Kaplan", r=rJ, savefuns=TRUE) env_G <- envelope(X, nsim=nsim, simulate=simulations, fun="Gest", correction="km", r=rJ, savefuns=TRUE) env_J <- envelope(X, nsim=nsim, simulate=simulations, fun="Jest", correction="none", r=rJ, savefuns=TRUE) # Crop the curves to the desired r-interval I curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax) curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ) curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ) curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ) res <- global_envelope_test(curve_sets=list(L=curve_set_L, F=curve_set_F, G=curve_set_G, J=curve_set_J)) plot(res) plot(res, labels=c("L(r)-r", "F(r)", "G(r)", "J(r)")) } # A test based on a low dimensional random vector #================================================ # Let us generate some example data. X <- matrix(c(-1.6,1.6),1,2) # data pattern X=(X_1,X_2) if(requireNamespace("mvtnorm", quietly=TRUE)) { Y <- mvtnorm::rmvnorm(200,c(0,0),matrix(c(1,0.5,0.5,1),2,2)) # simulations plot(Y, xlim=c(min(X[,1],Y[,1]), max(X[,1],Y[,1])), ylim=c(min(X[,2],Y[,2]), max(X[,2],Y[,2]))) points(X, col=2) # Test the null hypothesis is that X is from the distribution of Y's (or if it is an outlier). # Case 1. The test vector is (X_1, X_2) cset1 <- curve_set(r=1:2, obs=as.vector(X), sim=t(Y)) res1 <- global_envelope_test(cset1) plot(res1) # Case 2. The test vector is (X_1, X_2, (X_1-mean(Y_1))*(X_2-mean(Y_2))). t3 <- function(x, y) { (x[,1]-mean(y[,1]))*(x[,2]-mean(y[,2])) } cset2 <- curve_set(r=1:3, obs=c(X[,1],X[,2],t3(X,Y)), sim=rbind(t(Y), t3(Y,Y))) res2 <- global_envelope_test(cset2) plot(res2) }
# Goodness-of-fit testing for simple hypothesis if(require("spatstat.explore", quietly=TRUE)) { # Testing complete spatial randomness (CSR) #========================================== X <- unmark(spruces) nsim <- 1999 # Number of simulations # Illustration of general workflow for simple hypotheses #======================================================= # First illustrate the general workflow for the test by this example # of CSR test for a point pattern X using the empirical L-function. # Define the argument values at which the functions are evaluated obs.L <- Lest(X, correction="translate") r <- obs.L[['r']] # The test function for the data obs <- obs.L[['trans']] - r # Prepare simulations and calculate test functions for them at same r as 'obs' sim <- matrix(nrow=length(r), ncol=nsim) for(i in 1:nsim) { sim.X <- runifpoint(ex=X) # simulation under CSR sim[, i] <- Lest(sim.X, correction="translate", r=r)[['trans']] - r } # Create a curve_set containing argument values, observed and simulated functions cset <- curve_set(r=r, obs=obs, sim=sim) # Perform the test res <- global_envelope_test(cset, type="erl") plot(res) + ggplot2::ylab(expression(italic(hat(L)(r)-r))) # Simple hypothesis for a point pattern utilizing the spatstat package #===================================================================== # Generate nsim simulations under CSR, calculate L-function for the data and simulations env <- envelope(X, fun="Lest", nsim=nsim, savefuns=TRUE, # save the functions correction="translate", # edge correction for L transform=expression(.-r), # centering simulate=expression(runifpoint(ex=X))) # Simulate CSR # The rank envelope test (ERL) res <- global_envelope_test(env, type="erl") # Plot the result plot(res) ## Advanced use: # Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env') cset <- crop_curves(env, r_min=1, r_max=7) # Do the rank envelope test (erl) res <- global_envelope_test(cset, type="erl") plot(res) + ggplot2::ylab(expression(italic(L(r)-r))) # A combined global envelope test #================================ # As an example test CSR of the saplings point pattern by means of # L, F, G and J functions. data(saplings) X <- as.ppp(saplings, W=square(75)) nsim <- 499 # Number of simulations # Specify distances for different test functions n <- 500 # the number of r-values rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n r <- seq(0, rmax, by=rstep) # r-distances for Lest rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest # Perform simulations of CSR and calculate the L-functions env_L <- envelope(X, nsim=nsim, simulate=expression(runifpoint(ex=X)), fun="Lest", correction="translate", transform=expression(.-r), # Take the L(r)-r function instead of L(r) r=r, # Specify the distance vector savefuns=TRUE, # Save the estimated functions savepatterns=TRUE) # Save the simulated patterns # Take the simulations from the returned object simulations <- attr(env_L, "simpatterns") # Then calculate the other test functions F, G, J for each simulated pattern env_F <- envelope(X, nsim=nsim, simulate=simulations, fun="Fest", correction="Kaplan", r=rJ, savefuns=TRUE) env_G <- envelope(X, nsim=nsim, simulate=simulations, fun="Gest", correction="km", r=rJ, savefuns=TRUE) env_J <- envelope(X, nsim=nsim, simulate=simulations, fun="Jest", correction="none", r=rJ, savefuns=TRUE) # Crop the curves to the desired r-interval I curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax) curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ) curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ) curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ) res <- global_envelope_test(curve_sets=list(L=curve_set_L, F=curve_set_F, G=curve_set_G, J=curve_set_J)) plot(res) plot(res, labels=c("L(r)-r", "F(r)", "G(r)", "J(r)")) } # A test based on a low dimensional random vector #================================================ # Let us generate some example data. X <- matrix(c(-1.6,1.6),1,2) # data pattern X=(X_1,X_2) if(requireNamespace("mvtnorm", quietly=TRUE)) { Y <- mvtnorm::rmvnorm(200,c(0,0),matrix(c(1,0.5,0.5,1),2,2)) # simulations plot(Y, xlim=c(min(X[,1],Y[,1]), max(X[,1],Y[,1])), ylim=c(min(X[,2],Y[,2]), max(X[,2],Y[,2]))) points(X, col=2) # Test the null hypothesis is that X is from the distribution of Y's (or if it is an outlier). # Case 1. The test vector is (X_1, X_2) cset1 <- curve_set(r=1:2, obs=as.vector(X), sim=t(Y)) res1 <- global_envelope_test(cset1) plot(res1) # Case 2. The test vector is (X_1, X_2, (X_1-mean(Y_1))*(X_2-mean(Y_2))). t3 <- function(x, y) { (x[,1]-mean(y[,1]))*(x[,2]-mean(y[,2])) } cset2 <- curve_set(r=1:3, obs=c(X[,1],X[,2],t3(X,Y)), sim=rbind(t(Y), t3(Y,Y))) res2 <- global_envelope_test(cset2) plot(res2) }
Global tests of significance for the effect of covariates in quantile regression
global_rq( nsim, formula.full, formula.reduced, taus, data, contrasts = NULL, permutationstrategy = c("Freedman-Lane", "Freedman-Lane+remove zeros", "within nuisance", "remove location", "remove location scale", "remove quantile"), savefuns = FALSE, rq.args = NULL, lm.args = NULL, GET.args = NULL, mc.cores = 1L, mc.args = NULL, cl = NULL )
global_rq( nsim, formula.full, formula.reduced, taus, data, contrasts = NULL, permutationstrategy = c("Freedman-Lane", "Freedman-Lane+remove zeros", "within nuisance", "remove location", "remove location scale", "remove quantile"), savefuns = FALSE, rq.args = NULL, lm.args = NULL, GET.args = NULL, mc.cores = 1L, mc.args = NULL, cl = NULL )
nsim |
The number of random permutations. |
formula.full |
The formula specifying the general linear model,
see |
formula.reduced |
The formula of the reduced model with nuisance factors only. This model should be nested within the full model. |
taus |
The quantiles to be used. |
data |
data.frame where to look for variables. |
contrasts |
Passed directly to |
permutationstrategy |
The permutation strategy to be used. See details. |
savefuns |
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns. |
rq.args |
Additional arguments passed to |
lm.args |
A named list of additional arguments to be passed to |
GET.args |
A named list of additional arguments to be passed to |
mc.cores |
The number of cores to use, i.e. at most how many child processes will be run simultaneously.
Must be at least one, and parallelization requires at least two cores. On a Windows computer mc.cores must be 1
(no parallelization). For details, see |
mc.args |
A named list of additional arguments to be passed to |
cl |
Allows parallelization through the use of |
The possible permutation strategies include "Freedman-Lane" (FL), "Freedman-Lane+remove zeros" (FL+), "within nuisance" (WN), "remove location" (RL), "remove location scale" (RLS), "remove quantile" (RQ), which correspond to those in Mrkvička et al. (Section 4.1-4.6 and Table 1).
A global_envelope
or combined_global_envelope
object,
which can be printed and plotted directly.
Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2023) Global quantile regression. arXiv:2309.04746 [stat.ME]. https://doi.org/10.48550/arXiv.2309.04746
Myllymäki, M and Mrkvička, T. (2023). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Freedman, D., & Lane, D. (1983) A nonstochastic interpretation of reported significance levels. Journal of Business & Economic Statistics, 1(4), 292-298. doi:10.2307/1391660
if(require("quantreg", quietly=TRUE)) { data("stackloss") res <- global_rq(nsim = 19, # Increase nsim for serious analysis! formula.full = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., formula.reduced = stack.loss ~ Water.Temp, taus = seq(0.1, 0.9, length=10), permutationstrategy = "remove quantile", data = stackloss, GET.args = list(typeone = "fwer")) plot(res) }
if(require("quantreg", quietly=TRUE)) { data("stackloss") res <- global_rq(nsim = 19, # Increase nsim for serious analysis! formula.full = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., formula.reduced = stack.loss ~ Water.Temp, taus = seq(0.1, 0.9, length=10), permutationstrategy = "remove quantile", data = stackloss, GET.args = list(typeone = "fwer")) plot(res) }
One-way ANOVA tests for functional data with graphical interpretation
graph.fanova( nsim, curve_set, groups, variances = "equal", contrasts = FALSE, n.aver = 1L, mirror = FALSE, savefuns = FALSE, test.equality = c("mean", "var", "cov"), cov.lag = 1, ... )
graph.fanova( nsim, curve_set, groups, variances = "equal", contrasts = FALSE, n.aver = 1L, mirror = FALSE, savefuns = FALSE, test.equality = c("mean", "var", "cov"), cov.lag = 1, ... )
nsim |
The number of random permutations. |
curve_set |
The original data (an array of functions) provided as a
|
groups |
The original groups (a factor vector representing the assignment to groups). |
variances |
Either "equal" or "unequal". If "unequal", then correction for unequal variances
as explained in details will be done. Only relevant for the case |
contrasts |
Logical. FALSE and TRUE specify the two test functions as described in description part of this help file. |
n.aver |
If variances = "unequal", there is a possibility to use variances smoothed by appying moving average to the estimated sample variances. n.aver determines how many values on each side do contribute (incl. value itself). |
mirror |
The complement of the argument circular of |
savefuns |
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns. |
test.equality |
A character with possible values
If
where
See Mrkvicka et al. (2020) for more details. |
cov.lag |
The lag of the covariance for testing the equality of covariances,
see |
... |
Additional parameters to be passed to |
This function can be used to perform one-way graphical functional ANOVA tests described in Mrkvička et al. (2020). Both 1d and 2d functions are allowed in curve sets.
The tests assume that there are groups which contain
functions
.
The functions should be given in the argument
curve_set
,
and the groups in the argument groups
.
The tests assume that is an iid sample from
a stochastic process with mean function
and
covariance function
for s,t in R and j = 1,..., J.
To test the hypothesis
you can use the test function
where is a vector of mean values of functions in the group j.
This test function is used when
contrasts = FALSE
(default).
The hypothesis can equivalently be written as
and, alternatively, one can use the test function (vector) taken to consist of the differences of the group averages,
The choice is available with the option contrasts = TRUE
.
This test corresponds to the post-hoc test done usually after an ANOVA test is significant, but
it can be directed tested by means of the combined rank test (Mrkvička et al., 2017) with this test vector.
The test as such assumes that the variances are equal across the groups of functions. To deal with unequal variances, the differences are rescaled as the first step as follows
where is the overall sample mean and
is the overall sample standard deviation.
This scaling of the test functions can be obtained by giving the argument
variances = "unequal"
.
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56 (3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics and Computing 27 (5): 1239-1255. doi:10.1007/s11222-016-9683-9
Myllymäki, M and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
#-- NOx levels example (see for details Myllymaki and Mrkvicka, 2020) if(require("fda.usc", quietly=TRUE)) { # Prepare data data("poblenou") fest <- poblenou$df$day.festive; week <- as.integer(poblenou$df$day.week) Type <- vector(length=length(fest)) Type[fest == 1 | week >= 6] <- "Free" Type[fest == 0 & week %in% 1:4] <- "MonThu" Type[fest == 0 & week == 5] <- "Fri" Type <- factor(Type, levels = c("MonThu", "Fri", "Free")) # (log) Data as a curve_set cset <- curve_set(r = 0:23, obs = t(log(poblenou[['nox']][['data']]))) # Graphical functional ANOVA nsim <- 2999 res.c <- graph.fanova(nsim = nsim, curve_set = cset, groups = Type, variances = "unequal", contrasts = TRUE) plot(res.c) + ggplot2::labs(x = "Hour", y = "Diff.") } #-- Centred government expenditure centralization ratios example # This is an example analysis of the centred GEC in Mrkvicka et al. data("cgec") # Number of simulations nsim <- 2499 # increase to reduce Monte Carlo error # Test for unequal lag 1 covariances res.cov1 <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, test.equality = "cov", cov.lag = 1) plot(res.cov1) # Add labels plot(res.cov1, labels = paste("Group ", 1:3, sep="")) + ggplot2::xlab(substitute(paste(italic(i), " (", j, ")", sep=""), list(i="r", j="Year"))) # Test for equality of variances among groups res.var <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, test.equality = "var") plot(res.var) # Test for equality of means assuming equality of variances # a) using 'means' res <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, variances = "equal", contrasts = FALSE) plot(res) # b) using 'contrasts' res2 <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, variances = "equal", contrasts = TRUE) plot(res2) # Image set examples data("imageset3") res <- graph.fanova(nsim = 19, # Increase nsim for serious analysis! curve_set = imageset3$image_set, groups = imageset3$Group) plot(res, what = c("obs", "lo.sign", "hi.sign"), sign.type = "col")
#-- NOx levels example (see for details Myllymaki and Mrkvicka, 2020) if(require("fda.usc", quietly=TRUE)) { # Prepare data data("poblenou") fest <- poblenou$df$day.festive; week <- as.integer(poblenou$df$day.week) Type <- vector(length=length(fest)) Type[fest == 1 | week >= 6] <- "Free" Type[fest == 0 & week %in% 1:4] <- "MonThu" Type[fest == 0 & week == 5] <- "Fri" Type <- factor(Type, levels = c("MonThu", "Fri", "Free")) # (log) Data as a curve_set cset <- curve_set(r = 0:23, obs = t(log(poblenou[['nox']][['data']]))) # Graphical functional ANOVA nsim <- 2999 res.c <- graph.fanova(nsim = nsim, curve_set = cset, groups = Type, variances = "unequal", contrasts = TRUE) plot(res.c) + ggplot2::labs(x = "Hour", y = "Diff.") } #-- Centred government expenditure centralization ratios example # This is an example analysis of the centred GEC in Mrkvicka et al. data("cgec") # Number of simulations nsim <- 2499 # increase to reduce Monte Carlo error # Test for unequal lag 1 covariances res.cov1 <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, test.equality = "cov", cov.lag = 1) plot(res.cov1) # Add labels plot(res.cov1, labels = paste("Group ", 1:3, sep="")) + ggplot2::xlab(substitute(paste(italic(i), " (", j, ")", sep=""), list(i="r", j="Year"))) # Test for equality of variances among groups res.var <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, test.equality = "var") plot(res.var) # Test for equality of means assuming equality of variances # a) using 'means' res <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, variances = "equal", contrasts = FALSE) plot(res) # b) using 'contrasts' res2 <- graph.fanova(nsim = nsim, curve_set = cgec$cgec, groups = cgec$group, variances = "equal", contrasts = TRUE) plot(res2) # Image set examples data("imageset3") res <- graph.fanova(nsim = 19, # Increase nsim for serious analysis! curve_set = imageset3$image_set, groups = imageset3$Group) plot(res, what = c("obs", "lo.sign", "hi.sign"), sign.type = "col")
Non-parametric graphical tests of significance in functional general linear model (GLM)
graph.flm( nsim, formula.full, formula.reduced, curve_sets, factors = NULL, contrasts = FALSE, lm.args = NULL, GET.args = NULL, mc.cores = 1L, mc.args = NULL, cl = NULL, savefuns = FALSE, fast = TRUE )
graph.flm( nsim, formula.full, formula.reduced, curve_sets, factors = NULL, contrasts = FALSE, lm.args = NULL, GET.args = NULL, mc.cores = 1L, mc.args = NULL, cl = NULL, savefuns = FALSE, fast = TRUE )
nsim |
The number of random permutations. |
formula.full |
The formula specifying the general linear model,
see |
formula.reduced |
The formula of the reduced model with nuisance factors only. This model should be nested within the full model. |
curve_sets |
A named list of sets of curves giving the dependent variable (Y), and
possibly additionally factors whose values vary across the argument values of the functions.
The dimensions of the elements should match with each other.
Note that factors that are fixed across the functions can be given in the argument |
factors |
A data frame of factors. An alternative way to specify factors when they are constant for all argument values of the functions. The number of rows of the data frame should be equal to the number of curves. Each column should specify the values of a factor. |
contrasts |
Logical or NULL. FALSE, TRUE and NULL specify the three test functions as described in description part of this help file. |
lm.args |
A named list of additional arguments to be passed to |
GET.args |
A named list of additional arguments to be passed to |
mc.cores |
The number of cores to use, i.e. at most how many child processes will be run simultaneously.
Must be at least one, and parallelization requires at least two cores. On a Windows computer mc.cores must be 1
(no parallelization). For details, see |
mc.args |
A named list of additional arguments to be passed to |
cl |
Allows parallelization through the use of |
savefuns |
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns. |
fast |
Logical. See details. |
The function graph.flm
performs the graphical functional GLM of Mrkvička et al. (2021),
described also in Section 3.6 of Myllymäki and Mrkvička (2020) (type vignette("GET")
in R).
This is a nonparametric graphical test of significance of a covariate in functional GLM.
The test is able to find not only if the factor of interest is significant, but also which
functional domain is responsible for the potential rejection.
In the case of functional multi-way main effect ANOVA or functional main effect ANCOVA models,
the test is able to find which groups differ (and where they differ).
In the case of functional factorial ANOVA or functional factorial ANCOVA models,
the test is able to find which combination of levels (which interactions) differ (and where they differ).
The described tests are global envelope tests applied in the context of GLMs.
The Freedman-Lane algorithm (Freedman and Lane, 1983) is applied to permute the functions
(to obtain the simulations under the null hypothesis of "no effects");
consequently, the test approximately achieves the desired significance level.
The specification of the full and reduced formulas is important. The reduced model should be
nested within the full model. The full model should include in addition to the reduced
model the interesting factors whose effects are under investigation.
The implementation to find the coefficients of the interesting factors is based on
dummy.coef
and the restrictions there apply.
The regression coefficients serve as test functions in the graphical functional GLM.
For a continuous interesting factor, the test function is its regression coefficient across the
functional domain. For a discrete factor, there are three possibilities that are controlled by
the arguments contrasts
. If contrasts = FALSE
, then the test statistic is
the function/long vector where the coefficients related to all levels of the factor are joined
together. If contrasts = TRUE
, then the differences between the same coefficients are
considered instead. Given the coefficients in a specific order that is obtained through the use
of lm
and dummy.coef
, the differences are taken for couples i and j
where i < j and reducing j from i (e.g. for three groups 1,2,3, the constrasts are 1-2, 1-3, 2-3).
If contrasts = NULL
the coefficients given by lm
are used directly.
There are different versions of the implementation depending on the application.
Given that the argument fast
is TRUE, then
If all the covariates are continuous or contrasts = NULL
and lm.args = NULL
the regression coefficients are computed using the normal equation approach instead of using lm
.
Otherwise, if all the covariates are constant across the functions, i.e. they can be provided in the
argument factors
, then a linear model is fitted separately by least-squares estimation to
the data at each argument value of the functions fitting a multiple linear model by lm
.
The possible extra arguments passed in lm.args
to lm
must be of the form that
lm
accepts for fitting a multiple linear model. In the basic case, no extra arguments are
needed.
Otherwise, if some of the covariates vary across the space and there are user specified extra arguments given in
lm.args
, then the implementation fits a linear model at each argument value of the functions using
lm
, which can be rather slow. The arguments lm.args
are passed to lm
for fitting each linear model.
By setting fast = FALSE
, it is possible to use the slow version (third option)
for any case. Usually this is not desired.
A global_envelope
or combined_global_envelope
object,
which can be printed and plotted directly.
Mrkvička, T., Roskovec, T. and Rost, M. (2021) A nonparametric graphical tests of significance in functional GLM. Methodology and Computing in Applied Probability 23, 593-612. doi: 10.1007/s11009-019-09756-y
Myllymäki, M and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Freedman, D., & Lane, D. (1983) A nonstochastic interpretation of reported significance levels. Journal of Business & Economic Statistics, 1(4), 292-298. doi:10.2307/1391660
data("rimov") res <- graph.flm(nsim=19, # Increase the number of simulations for serious analysis! formula.full = Y~Year, formula.reduced = Y~1, curve_sets = list(Y=rimov), factors = data.frame(Year = 1979:2014)) plot(res) # Test if there is a change in the slope in 1994, # i.e. the full model is T = a + b*year + c*year:Interv, # where Interv is a dummy variable indicating the pre-intervention # period (coded 0) or the post-intervention period (coded 1) Year <- 1979:2014 res <- graph.flm(nsim = 19, # Increase the number of simulations for serious analysis! formula.full = Y ~ Year + Year:Interv, formula.reduced = Y ~ Year, curve_sets = list(Y=rimov), factors = data.frame(Year = Year, Interv = factor(c(rep(0,times=1994-1979+1), rep(1,times=2014-1994)), levels=0:1)), contrasts = NULL) plot(res) # An example of testing the joint effect of a discrete and a continuous variable nsim <- 999 data("GDPtax") factors.df <- data.frame(Group = GDPtax$Group, Tax = GDPtax$Profittax) res.tax_within_group <- graph.flm(nsim = nsim, formula.full = Y~Group+Tax+Group:Tax, formula.reduced = Y~Group+Tax, curve_sets = list(Y=GDPtax$GDP), factors = factors.df) plot(res.tax_within_group) # Image data examples data("abide_9002_23") iset <- abide_9002_23$curve_set # Testing the discrete factor 'group' with contrasts # (Use contrasts = FALSE for 'means'; and for continuous factors) res <- graph.flm(nsim = 19, # Increase nsim for serious analysis! formula.full = Y ~ Group + Sex + Age, formula.reduced = Y ~ Sex + Age, curve_sets = list(Y = iset), factors = abide_9002_23[['factors']], contrasts = TRUE, GET.args = list(type = "area")) plot(res) # Examples of modifying 2d plots plot(res, sign.col="white") + ggplot2::scale_fill_viridis_c(option="magma") plot(res, sign.col="white") + ggplot2::scale_fill_viridis_c(option="magma") + ggplot2::scale_radius(range = 2*c(1, 6)) plot(res, what=c("obs", "lo", "hi", "lo.sign", "hi.sign")) plot(res, what=c("obs", "lo", "hi", "lo.sign", "hi.sign"), sign.type="col")
data("rimov") res <- graph.flm(nsim=19, # Increase the number of simulations for serious analysis! formula.full = Y~Year, formula.reduced = Y~1, curve_sets = list(Y=rimov), factors = data.frame(Year = 1979:2014)) plot(res) # Test if there is a change in the slope in 1994, # i.e. the full model is T = a + b*year + c*year:Interv, # where Interv is a dummy variable indicating the pre-intervention # period (coded 0) or the post-intervention period (coded 1) Year <- 1979:2014 res <- graph.flm(nsim = 19, # Increase the number of simulations for serious analysis! formula.full = Y ~ Year + Year:Interv, formula.reduced = Y ~ Year, curve_sets = list(Y=rimov), factors = data.frame(Year = Year, Interv = factor(c(rep(0,times=1994-1979+1), rep(1,times=2014-1994)), levels=0:1)), contrasts = NULL) plot(res) # An example of testing the joint effect of a discrete and a continuous variable nsim <- 999 data("GDPtax") factors.df <- data.frame(Group = GDPtax$Group, Tax = GDPtax$Profittax) res.tax_within_group <- graph.flm(nsim = nsim, formula.full = Y~Group+Tax+Group:Tax, formula.reduced = Y~Group+Tax, curve_sets = list(Y=GDPtax$GDP), factors = factors.df) plot(res.tax_within_group) # Image data examples data("abide_9002_23") iset <- abide_9002_23$curve_set # Testing the discrete factor 'group' with contrasts # (Use contrasts = FALSE for 'means'; and for continuous factors) res <- graph.flm(nsim = 19, # Increase nsim for serious analysis! formula.full = Y ~ Group + Sex + Age, formula.reduced = Y ~ Sex + Age, curve_sets = list(Y = iset), factors = abide_9002_23[['factors']], contrasts = TRUE, GET.args = list(type = "area")) plot(res) # Examples of modifying 2d plots plot(res, sign.col="white") + ggplot2::scale_fill_viridis_c(option="magma") plot(res, sign.col="white") + ggplot2::scale_fill_viridis_c(option="magma") + ggplot2::scale_radius(range = 2*c(1, 6)) plot(res, what=c("obs", "lo", "hi", "lo.sign", "hi.sign")) plot(res, what=c("obs", "lo", "hi", "lo.sign", "hi.sign"), sign.type="col")
A simulated set of images with a categorical factor (with three levels)
data("imageset3")
data("imageset3")
A list of the image_set
containing the simulated images, and
the discrete group factor in the list component Group
.
We considered a categorical factor Group
obtaining the values 0, 1 or 2
according to the group to which the image belongs to (10 images in each of the three
groups). The images were simulated in the square window [-1,1]^2 from the
general linear model (GLM)
where ||r|| denotes the Euclidean distance of the pixel to the origin, g is the group and the error stems from an inhomogeneous distribution over $I$ with the normal and bimodal errors in the middle and periphery of the image:
where G(r) is a Gaussian random field with the exponential correlation structure with scale parameter 0.15 and standard deviation 0.2. Consequently, the first two groups (0,1) have the same mean, while a bigger bump appears in the third group (2) in the middle of the image.
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
data("imageset3") plot(imageset3$image_set, idx=c(1:5, 11:15, 21:25), ncol=5) # Colors can be changed as follows: plot(imageset3$image_set, idx=c(1:5, 11:15, 21:25), ncol=5) + ggplot2::scale_fill_gradient(low="black", high="white")
data("imageset3") plot(imageset3$image_set, idx=c(1:5, 11:15, 21:25), ncol=5) # Colors can be changed as follows: plot(imageset3$image_set, idx=c(1:5, 11:15, 21:25), ncol=5) + ggplot2::scale_fill_gradient(low="black", high="white")
Check class.
is.curve_set(x)
is.curve_set(x)
x |
An object to be checked. |
Simulated data set mimicing stand age distributions in natural, near-natural non-natural forests in the setup of Mrkvička et al. (2024).
data("naturalness")
data("naturalness")
A data.frame
containing the dominant species (three categories:
Broadleaf, Conifer, Mixed), naturalness (Natural, Near-natural and Non-natural)
and simulated stand age.
The numbers of observations in the naturalness groups and in the dominant species groups correspond to the numbers of plots in on rich mineral soils in Finnish Lapland in the data set analysed in Myllymäki et al. (2023). The stand age values are simulated values from the quantile regression model of Mrkvička et al. (2024).
Myllymäki, M., Tuominen, S., Kuronen, M., Packalen, P. and Kangas, A. (2023) The relationship between forest structure and naturalness in the Finnish national forest inventory. Forestry: An International Journal of Forest Research, cpad053. DOI: https://doi.org/10.1093/forestry/cpad053
Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2024) Global quantile regression. arXiv:2309.04746 [stat.ME] DOI: https://doi.org/10.48550/arXiv.2309.04746
If the functional data doesn't comfortably fit in memory it is possible to
compute functional ordering by splitting the domain of the data (voxels in
a brain image), using partial_forder
on each part and finally
combining the results with combine_forder
.
partial_forder( curve_set, measure = c("erl", "rank", "cont", "area"), alternative = c("two.sided", "less", "greater") ) combine_forder(ls)
partial_forder( curve_set, measure = c("erl", "rank", "cont", "area"), alternative = c("two.sided", "less", "greater") ) combine_forder(ls)
curve_set |
A |
measure |
The measure to use to order the functions from the most extreme to the least extreme one. Must be one of the following: 'rank', 'erl', 'cont', 'area', 'max', 'int', 'int2'. Default is 'erl'. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
ls |
List of objects returned by partial_forder |
See forder
data("abide_9002_23") res <- lapply(list(1:100, 101:200, 201:261), function(part) { set.seed(123) # When using partial_forder, all parts must use the same seed. fset <- frank.flm(nsim=99, formula.full = Y ~ Group + Sex + Age, formula.reduced = Y ~ Group + Sex, curve_sets = list(Y = abide_9002_23$curve_set[part,]), factors = abide_9002_23$factors, savefuns = "return") partial_forder(fset, measure="erl") }) combine_forder(res)
data("abide_9002_23") res <- lapply(list(1:100, 101:200, 201:261), function(part) { set.seed(123) # When using partial_forder, all parts must use the same seed. fset <- frank.flm(nsim=99, formula.full = Y ~ Group + Sex + Age, formula.reduced = Y ~ Group + Sex, curve_sets = list(Y = abide_9002_23$curve_set[part,]), factors = abide_9002_23$factors, savefuns = "return") partial_forder(fset, measure="erl") }) combine_forder(res)
Plot method for the class 'combined_fboxplot'
## S3 method for class 'combined_fboxplot' plot( x, labels, scales = "free", ncol = 2 + 1 * (length(x) == 3), digits = 3, outliers = TRUE, ... )
## S3 method for class 'combined_fboxplot' plot( x, labels, scales = "free", ncol = 2 + 1 * (length(x) == 3), digits = 3, outliers = TRUE, ... )
x |
an 'combined_fboxplot' object |
labels |
A character vector of suitable length.
If |
scales |
See |
ncol |
The maximum number of columns for the figures. Default 2 or 3, if the length of x equals 3. (Relates to the number of curve_sets that have been combined.) |
digits |
The number of digits used for printing the p-value or p-interval in the default main. |
outliers |
Logical. If TRUE, then the functions outside the functional boxplot are drawn. |
... |
Ignored. |
This function provides plots for combined global envelopes.
## S3 method for class 'combined_global_envelope' plot( x, labels, scales, sign.col = "red", dotplot = length(x[[1]]$obs) < 5, ncol = 2 + 1 * (length(x) == 3), digits = 3, level = 1, ... )
## S3 method for class 'combined_global_envelope' plot( x, labels, scales, sign.col = "red", dotplot = length(x[[1]]$obs) < 5, ncol = 2 + 1 * (length(x) == 3), digits = 3, level = 1, ... )
x |
An 'combined_global_envelope' object |
labels |
A character vector of suitable length.
If |
scales |
See |
sign.col |
The color for the observed curve when outside the global envelope
(significant regions). Default to "red". Setting the color to |
dotplot |
Logical. If TRUE, then instead of envelopes a dot plot is done. Suitable for low dimensional test vectors. |
ncol |
The maximum number of columns for the figures. Default 2 or 3, if the length of x equals 3. (Relates to the number of curve_sets that have been combined.) |
digits |
The number of digits used for printing the p-value or p-interval in the default main. |
level |
1 or 2. In the case of two-step combined tests (with several test functions), two different plots are available: 1 for plotting the combined global envelopes (default and most often wanted) or 2 for plotting the second level test result. |
... |
Ignored in most cases. If |
Plotting method for the class 'combined_global_envelope', i.e. combined envelopes for 1d functions.
If fixedscales is FALSE (or 0) all images will have separate scale. If fixedscales is TRUE (or 1) each x[[i]] will have a common scale. If fixedscales is 2 all images will have common scale.
If more than one envelope has been calculated (corresponding to several coverage/alpha), only the largest one is plotted.
## S3 method for class 'combined_global_envelope2d' plot( x, fixedscales = 2, labels, what = c("obs.sign", "obs", "lo", "hi", "lo.sign", "hi.sign"), sign.type = c("circles", "contour", "col"), sign.col = c("blue", "red"), transparency = 155/255, digits = 3, ... )
## S3 method for class 'combined_global_envelope2d' plot( x, fixedscales = 2, labels, what = c("obs.sign", "obs", "lo", "hi", "lo.sign", "hi.sign"), sign.type = c("circles", "contour", "col"), sign.col = c("blue", "red"), transparency = 155/255, digits = 3, ... )
x |
A 'global_envelope' object for two-dimensional functions |
fixedscales |
0, 1 or 2. See details. |
labels |
A character vector of suitable length giving the labels for the separate plots. Default exists. This parameter allows replacing the default. |
what |
Character vector specifying what information should be plotted for 2d functions.
A combination of:
Observed ( |
sign.type |
Either |
sign.col |
A vector of length two giving the colors for significant parts below the envelope (first value) and above the envelope (second value). |
transparency |
A number between 0 and 1 (default 155/255, 60
Similar to alpha of |
digits |
The number of digits used for printing the p-value or p-interval in the default main. |
... |
Ignored. |
data("abide_9002_23") iset <- subset(abide_9002_23[['curve_set']], 1:50) factors <- abide_9002_23[['factors']][1:50,] res <- graph.flm(nsim = 19, # Increase nsim for serious analysis! formula.full = Y ~ Group + Sex + Age, formula.reduced = Y ~ Sex + Age, curve_sets = list(Y=iset), factors = factors, contrasts = FALSE, GET.args = list(type="area")) plot(res) plot(res, what=c("obs", "hi")) plot(res, what=c("hi", "lo"), fixedscales=1) plot(res, what=c("obs", "lo", "hi"), fixedscales=FALSE) if(requireNamespace("gridExtra", quietly=TRUE)) { # Edit style of "fixedscales = 2" plots plot(res, what=c("obs", "hi")) + ggplot2::theme_minimal() plot(res, what=c("obs", "hi")) + ggplot2::theme_bw() # Edit style (e.g. theme) of "fixedscales = 1 or 0" plots gs <- lapply(res, function(x, what) { plot(x, what=what) + ggplot2::ggtitle("") }, what=c("obs", "hi")) gridExtra::grid.arrange(grobs=gs, ncol=1, top="My main") gs <- outer(res, c("obs", "hi"), FUN=Vectorize(function(res, what) list(plot(res, what=what) + ggplot2::ggtitle("") + ggplot2::theme(axis.ticks=ggplot2::element_blank(), axis.text=ggplot2::element_blank(), axis.title=ggplot2::element_blank())))) gridExtra::grid.arrange(grobs=t(gs)) }
data("abide_9002_23") iset <- subset(abide_9002_23[['curve_set']], 1:50) factors <- abide_9002_23[['factors']][1:50,] res <- graph.flm(nsim = 19, # Increase nsim for serious analysis! formula.full = Y ~ Group + Sex + Age, formula.reduced = Y ~ Sex + Age, curve_sets = list(Y=iset), factors = factors, contrasts = FALSE, GET.args = list(type="area")) plot(res) plot(res, what=c("obs", "hi")) plot(res, what=c("hi", "lo"), fixedscales=1) plot(res, what=c("obs", "lo", "hi"), fixedscales=FALSE) if(requireNamespace("gridExtra", quietly=TRUE)) { # Edit style of "fixedscales = 2" plots plot(res, what=c("obs", "hi")) + ggplot2::theme_minimal() plot(res, what=c("obs", "hi")) + ggplot2::theme_bw() # Edit style (e.g. theme) of "fixedscales = 1 or 0" plots gs <- lapply(res, function(x, what) { plot(x, what=what) + ggplot2::ggtitle("") }, what=c("obs", "hi")) gridExtra::grid.arrange(grobs=gs, ncol=1, top="My main") gs <- outer(res, c("obs", "hi"), FUN=Vectorize(function(res, what) list(plot(res, what=what) + ggplot2::ggtitle("") + ggplot2::theme(axis.ticks=ggplot2::element_blank(), axis.text=ggplot2::element_blank(), axis.title=ggplot2::element_blank())))) gridExtra::grid.arrange(grobs=t(gs)) }
Plot method for the class 'curve_set'
## S3 method for class 'curve_set' plot(x, idx, col_idx, idx_name = "", col = "grey70", ...)
## S3 method for class 'curve_set' plot(x, idx, col_idx, idx_name = "", col = "grey70", ...)
x |
An |
idx |
Indices of functions to highlight with color |
col_idx |
A color for the curves to highlight, or a vector of the same length as |
idx_name |
A variable name to be printed with the highlighted functions' idx. Default to empty. |
col |
The basic color for the curves (which are not highlighted). |
... |
Ignored. |
cset <- curve_set(r = 1:10, obs = matrix(runif(10*5), ncol=5)) plot(cset) # Highlight some functions plot(cset, idx=c(1,3)) plot(cset, idx=c(1,3), col_idx=c("black", "red")) # Change legend plot(cset, idx=c(1,3), col_idx=c("black", "red"), idx_name="Special functions") plot(cset, idx=c(1,3)) + ggplot2::theme(legend.position="bottom") # Add labels plot(cset, idx=c(1,3)) + ggplot2::labs(x="x", y="Value") # and title plot(cset) + ggplot2::labs(title="Example curves", x="x", y="Value") # A curve_set with one observed function (other simulated) if(requireNamespace("mvtnorm", quietly=TRUE)) { cset <- curve_set(obs = c(-1.6, 1.6), sim = t(mvtnorm::rmvnorm(200, c(0,0), matrix(c(1,0.5,0.5,1), 2, 2)))) plot(cset) # Remove legend plot(cset) + ggplot2::theme(legend.position="none") }
cset <- curve_set(r = 1:10, obs = matrix(runif(10*5), ncol=5)) plot(cset) # Highlight some functions plot(cset, idx=c(1,3)) plot(cset, idx=c(1,3), col_idx=c("black", "red")) # Change legend plot(cset, idx=c(1,3), col_idx=c("black", "red"), idx_name="Special functions") plot(cset, idx=c(1,3)) + ggplot2::theme(legend.position="bottom") # Add labels plot(cset, idx=c(1,3)) + ggplot2::labs(x="x", y="Value") # and title plot(cset) + ggplot2::labs(title="Example curves", x="x", y="Value") # A curve_set with one observed function (other simulated) if(requireNamespace("mvtnorm", quietly=TRUE)) { cset <- curve_set(obs = c(-1.6, 1.6), sim = t(mvtnorm::rmvnorm(200, c(0,0), matrix(c(1,0.5,0.5,1), 2, 2)))) plot(cset) # Remove legend plot(cset) + ggplot2::theme(legend.position="none") }
Plot method for the class 'curve_set2d', i.e. two-dimensional functions
## S3 method for class 'curve_set2d' plot(x, idx = 1, ncol = 2 + 1 * (length(idx) == 3), ...)
## S3 method for class 'curve_set2d' plot(x, idx = 1, ncol = 2 + 1 * (length(idx) == 3), ...)
x |
An |
idx |
Indices of 2d functions to plot. |
ncol |
The maximum number of columns for the figures. Default 2 or 3, if the length of x equals 3. (Relates to the number of curve_sets that have been combined.) |
... |
Ignored. |
data("abide_9002_23") plot(abide_9002_23$curve_set, idx=c(1, 27))
data("abide_9002_23") plot(abide_9002_23$curve_set, idx=c(1, 27))
Plot method for the class 'fboxplot'
## S3 method for class 'fboxplot' plot(x, digits = 3, outliers = TRUE, ...)
## S3 method for class 'fboxplot' plot(x, digits = 3, outliers = TRUE, ...)
x |
an 'fboxplot' object |
digits |
The number of digits used for printing the p-value or p-interval in the default main. |
outliers |
Logical. If TRUE, then the functions outside the functional boxplot are drawn. |
... |
Ignored. |
if(requireNamespace("fda", quietly=TRUE)) { years <- paste(1:18) curves <- fda::growth[['hgtf']][years,] # Heights cset1 <- curve_set(r = as.numeric(years), obs = curves) bp <- fBoxplot(cset1, coverage=0.50, type="area", factor=1) plot(bp) plot(bp) + ggplot2::theme(legend.position="bottom") plot(bp) + ggplot2::theme(legend.position="none") plot(bp, plot_outliers=FALSE) }
if(requireNamespace("fda", quietly=TRUE)) { years <- paste(1:18) curves <- fda::growth[['hgtf']][years,] # Heights cset1 <- curve_set(r = as.numeric(years), obs = curves) bp <- fBoxplot(cset1, coverage=0.50, type="area", factor=1) plot(bp) plot(bp) + ggplot2::theme(legend.position="bottom") plot(bp) + ggplot2::theme(legend.position="none") plot(bp, plot_outliers=FALSE) }
Plot method for the 'fclust' objects returned by fclustering
.
## S3 method for class 'fclust' plot(x, plotstyle = c("marginal", "joined"), coverage = 0.5, nstep, ncol, ...)
## S3 method for class 'fclust' plot(x, plotstyle = c("marginal", "joined"), coverage = 0.5, nstep, ncol, ...)
x |
An 'fclust' object. |
plotstyle |
The resulting central regions of clusters can be plotted by sorting the
appropriate |
coverage |
The coverage of central regions to be used to show the clusters. |
nstep |
1 or 2 for how to contruct a combined (joined) global envelope if there are more than one sets of curves. Default to 1, if the numbers of points where the curves are observed (r) are the same in each set, and 2 otherwise. |
ncol |
The number of columns in the graphical output, when there is just one set of curves
that has been ordered. If not given, |
... |
Ignored. |
The clusters are shown respectively for each curve_set
. Thus for each curve_set
the panel with all the medoids is shown followed by all clusters represented by central region,
medoid and all curves belonging to it.
For all sources, the function plots the deepest curves for all clusters and the deepest curve of each cluster together with the desired central region and all the curves of the group.
Dai, W., Athanasiadis, S., Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open, London, UK. DOI: 10.5772/intechopen.100124
Plot method for the class 'global_envelope'
## S3 method for class 'global_envelope' plot( x, dotplot = length(x$r) < 10, sign.col = "red", labels = NULL, digits = 3, ... )
## S3 method for class 'global_envelope' plot( x, dotplot = length(x$r) < 10, sign.col = "red", labels = NULL, digits = 3, ... )
x |
An 'global_envelope' object |
dotplot |
Logical. If TRUE, then instead of envelopes a dot plot is done. Suitable for low dimensional test vectors. Default: TRUE if the dimension is less than 10, FALSE otherwise. |
sign.col |
The color for the observed curve when outside the global envelope
(significant regions). Default to "red". Setting the color to |
labels |
A character vector of suitable length.
If |
digits |
The number of digits used for printing the p-value or p-interval in the default main. |
... |
Ignored. |
If several envelopes have been computed, their are plotted in different grey scales so that the smallest envelope has the darkest color and the widest envelope consist of all grey scales with the lightest color in the outskirts.
central_region
, global_envelope_test
if(require("spatstat.explore", quietly=TRUE)) { X <- unmark(spruces) nsim <- 1999 # Number of simulations env <- envelope(X, fun="Kest", nsim=nsim, savefuns=TRUE, # save the functions correction="translate", # edge correction for K simulate=expression(runifpoint(ex=X))) # Simulate CSR res <- global_envelope_test(env, type="erl") # Default plot plot(res) # Plots can be edited, e.g. # Remove legend plot(res) + ggplot2::theme(legend.position="none") # Change its position plot(res) + ggplot2::theme(legend.position="right") # Change the outside color plot(res, sign.col="#5DC863FF") plot(res, sign.col=NULL) # Change default title and x- and y-labels plot(res) + ggplot2::labs(title="95% global envelope", x="x", y="f(x)") # Prior to the plot, you can set your preferred ggplot theme by theme_set old <- ggplot2::theme_set(ggplot2::theme_bw()) plot(res) # Do other edits, e.g. turn off expansion with the default limits plot(res) + ggplot2::coord_cartesian(expand=FALSE) # Go back to the old theme ggplot2::theme_set(old) # If you are working with the R package spatstat and its envelope-function, # you can obtain global envelope plots in the style of spatstat using plot.fv: plot.fv(res) }
if(require("spatstat.explore", quietly=TRUE)) { X <- unmark(spruces) nsim <- 1999 # Number of simulations env <- envelope(X, fun="Kest", nsim=nsim, savefuns=TRUE, # save the functions correction="translate", # edge correction for K simulate=expression(runifpoint(ex=X))) # Simulate CSR res <- global_envelope_test(env, type="erl") # Default plot plot(res) # Plots can be edited, e.g. # Remove legend plot(res) + ggplot2::theme(legend.position="none") # Change its position plot(res) + ggplot2::theme(legend.position="right") # Change the outside color plot(res, sign.col="#5DC863FF") plot(res, sign.col=NULL) # Change default title and x- and y-labels plot(res) + ggplot2::labs(title="95% global envelope", x="x", y="f(x)") # Prior to the plot, you can set your preferred ggplot theme by theme_set old <- ggplot2::theme_set(ggplot2::theme_bw()) plot(res) # Do other edits, e.g. turn off expansion with the default limits plot(res) + ggplot2::coord_cartesian(expand=FALSE) # Go back to the old theme ggplot2::theme_set(old) # If you are working with the R package spatstat and its envelope-function, # you can obtain global envelope plots in the style of spatstat using plot.fv: plot.fv(res) }
If more than one envelope has been calculated (corresponding to several coverage/alpha), only the largest one is plotted.
## S3 method for class 'global_envelope2d' plot( x, fixedscales = TRUE, what = c("obs.sign", "obs", "lo", "hi", "lo.sign", "hi.sign"), sign.type = c("circles", "contour", "col"), sign.col = c("blue", "red"), transparency = 155/255, digits = 3, ... )
## S3 method for class 'global_envelope2d' plot( x, fixedscales = TRUE, what = c("obs.sign", "obs", "lo", "hi", "lo.sign", "hi.sign"), sign.type = c("circles", "contour", "col"), sign.col = c("blue", "red"), transparency = 155/255, digits = 3, ... )
x |
A 'global_envelope' object for two-dimensional functions |
fixedscales |
Logical. TRUE for the same scales for all images. |
what |
Character vector specifying what information should be plotted for 2d functions.
A combination of:
Observed ( |
sign.type |
Either |
sign.col |
A vector of length two giving the colors for significant parts below the envelope (first value) and above the envelope (second value). |
transparency |
A number between 0 and 1 (default 155/255, 60
Similar to alpha of |
digits |
The number of digits used for printing the p-value or p-interval in the default main. |
... |
Ignored. |
Population growth
data("popgrowthmillion")
data("popgrowthmillion")
A matrix, where each row corresponds to a year and each column to a country. Column names correspond to the countries, and row names to the years.
This dataset includes population growth, i.e. population at the end of the year divided by population at the beginning of the year, in 134 countries in years from 1950 to 2015. The dataset includes only countries over million inhabitants in 1950. The data were extracted from the supplement of Nagy et al. (2017) distributed under the GPL-2 license.
Nagy, S., I. Gijbels, and D. Hlubinka (2017). Depth-based recognition of shape outlying functions. Journal of Computational and Graphical Statistics 26 (4), 883-893.
Print method for the class 'combined_fboxplot'
## S3 method for class 'combined_fboxplot' print(x, ...)
## S3 method for class 'combined_fboxplot' print(x, ...)
x |
an 'combined_fboxplot' object |
... |
Ignored. |
Print method for the class 'combined_global_envelope'
## S3 method for class 'combined_global_envelope' print(x, ...)
## S3 method for class 'combined_global_envelope' print(x, ...)
x |
A 'combined_global_envelope' object |
... |
Ignored. |
Print method for the class 'curve_set'
## S3 method for class 'curve_set' print(x, ...)
## S3 method for class 'curve_set' print(x, ...)
x |
an 'curve_set' object |
... |
Passed to |
Print method for the class 'deviation_test'
## S3 method for class 'deviation_test' print(x, ...)
## S3 method for class 'deviation_test' print(x, ...)
x |
an 'deviation_test' object |
... |
Ignored. |
Print method for the class 'fboxplot'
## S3 method for class 'fboxplot' print(x, ...)
## S3 method for class 'fboxplot' print(x, ...)
x |
an 'fboxplot' object |
... |
Ignored. |
Print method for the 'fclust' objects returned by fclustering
.
## S3 method for class 'fclust' print(x, ...)
## S3 method for class 'fclust' print(x, ...)
x |
A object of class 'fclust'. |
... |
Ignored. |
Print method for the class 'fdr_envelope'
## S3 method for class 'fdr_envelope' print(x, ...)
## S3 method for class 'fdr_envelope' print(x, ...)
x |
An 'fdr_envelope' object |
... |
Ignored. |
Print method for the class 'GET_contingency'
## S3 method for class 'GET_contingency' print(x, ...)
## S3 method for class 'GET_contingency' print(x, ...)
x |
A 'GET_contingency' object |
... |
Ignored. |
Print method for the class 'global_envelope'
## S3 method for class 'global_envelope' print(x, ...)
## S3 method for class 'global_envelope' print(x, ...)
x |
A 'global_envelope' object. |
... |
Ignored. |
Performs the global scaled MAD envelope tests, either directional quantile or studentised,
or the unscaled MAD envelope test. These tests correspond to calling the
function global_envelope_test
with type="qdir"
, type = "st"
and
type="unscaled"
, respectively. The functions qdir_envelope
, st_envelope
and
unscaled_envelope
have been kept for historical reasons;
preferably use global_envelope_test
with the suitable type
argument.
qdir_envelope(curve_set, ...) st_envelope(curve_set, ...) unscaled_envelope(curve_set, ...)
qdir_envelope(curve_set, ...) st_envelope(curve_set, ...) unscaled_envelope(curve_set, ...)
curve_set |
A |
... |
Additional parameters to be passed to |
The directional quantile envelope test (Myllymäki et al., 2015, 2017) takes into account the unequal variances of the test function T(r) for different distances r and is also protected against asymmetry of T(r).
The studentised envelope test (Myllymäki et al., 2015, 2017) takes into account the unequal variances of the test function T(r) for different distances r.
The unscaled envelope test (Ripley, 1981) corresponds to the classical maximum
deviation test without scaling, and leads to envelopes with constant width over the distances r.
Thus, it suffers from unequal variance of T(r) over the distances r and from the asymmetry of
distribution of T(r). We recommend to use the other global envelope tests available,
see global_envelope_test
for full list of alternatives.
An object of class global_envelope
of combined_global_envelope
which can be printed and plotted directly. See global_envelope_test
for more details.
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11: 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381–404. doi: 10.1111/rssb.12172
Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.
# See more examples in ?global_envelope_test ## Testing complete spatial randomness (CSR) #------------------------------------------- if(require("spatstat.explore", quietly=TRUE)) { X <- spruces nsim <- 999 # Number of simulations ## Test for complete spatial randomness (CSR) # Generate nsim simulations under CSR, calculate centred L-function for the data and simulations env <- envelope(X, fun="Lest", nsim=nsim, savefuns=TRUE, correction="translate", transform=expression(.-r), simulate=expression(runifpoint(ex=X))) res_qdir <- qdir_envelope(env) # The directional quantile envelope test plot(res_qdir) ## Advanced use: # Create a curve set, choosing the interval of distances [r_min, r_max] curve_set <- crop_curves(env, r_min=1, r_max=8) # The directional quantile envelope test res_qdir <- qdir_envelope(curve_set); plot(res_qdir) # The studentised envelope test res_st <- st_envelope(curve_set); plot(res_st) # The unscaled envelope test res_unscaled <- unscaled_envelope(curve_set); plot(res_unscaled) }
# See more examples in ?global_envelope_test ## Testing complete spatial randomness (CSR) #------------------------------------------- if(require("spatstat.explore", quietly=TRUE)) { X <- spruces nsim <- 999 # Number of simulations ## Test for complete spatial randomness (CSR) # Generate nsim simulations under CSR, calculate centred L-function for the data and simulations env <- envelope(X, fun="Lest", nsim=nsim, savefuns=TRUE, correction="translate", transform=expression(.-r), simulate=expression(runifpoint(ex=X))) res_qdir <- qdir_envelope(env) # The directional quantile envelope test plot(res_qdir) ## Advanced use: # Create a curve set, choosing the interval of distances [r_min, r_max] curve_set <- crop_curves(env, r_min=1, r_max=8) # The directional quantile envelope test res_qdir <- qdir_envelope(curve_set); plot(res_qdir) # The studentised envelope test res_st <- st_envelope(curve_set); plot(res_st) # The unscaled envelope test res_unscaled <- unscaled_envelope(curve_set); plot(res_unscaled) }
The rank envelope test, p-values and global envelopes.
The test corresponds to the global envelope test that can be carriet out by
global_envelope_test
by specifying the type
for which the options
"rank"
, "erl"
, "cont"
and "area"
are available. The last
three are modifications of the first one to treat the ties in the extreme rank ordering
used in "rank"
. This function is kept for historical reasons.
rank_envelope(curve_set, type = "rank", ...)
rank_envelope(curve_set, type = "rank", ...)
curve_set |
A |
type |
The type of the global envelope with current options for "rank", "erl", "cont" and "area". If "rank", the global rank envelope accompanied by the p-interval is given (Myllymäki et al., 2017). If "erl", the global rank envelope based on extreme rank lengths accompanied by the extreme rank length p-value is given (Myllymäki et al., 2017, Mrkvička et al., 2018). See details and additional sections thereafter. |
... |
Additional parameters to be passed to |
The "rank"
envelope test is a completely non-parametric test, which provides
the 100(1-alpha)% global envelope for the chosen test function T(r) on
the chosen interval of distances and associated p-values.
The other three types are solutions to break the ties in the extreme ranks
on which the "rank"
envelope test is based on.
Note: The method to break ties for the global type = "rank"
envelope
(Myllymäki et al., 2017) can be done by the argument ties
with default
to ties = "erl"
corresponding to the extreme rank length breaking of ties.
In this case the global envelope corresponds to the extreme rank measure.
If instead choosing type
to be "erl"
, "cont"
or "area"
,
then the global envelope corresponds to these measures.
An object of class global_envelope
of combined_global_envelope
which can be printed and plotted directly. See global_envelope_test
for more details.
The global "erl"
, "cont"
, "area"
envelope tests allow
in principle a lower number of simulations to be used than the global "rank"
test
based on extreme ranks.
However, if feasible, we recommend some thousands of simulations in any case to achieve
a good power and repeatability of the test.
For the global "rank"
envelope test, Myllymäki et al. (2017) recommended to use
at least 2500 simulations for testing at the significance level alpha = 0.05 for single
function tests, experimented with summary functions for point processes.
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381–404. doi: 10.1111/rssb.12172
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27 (5): 1239-1255. doi: 10.1007/s11222-016-9683-9
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56 (3), 432-458. doi: 10.14736/kyb-2020-3-0432
# See ?global_envelope_test for more examples ## Testing complete spatial randomness (CSR) #------------------------------------------- if(require("spatstat.explore", quietly=TRUE)) { X <- unmark(spruces) nsim <- 2499 # Number of simulations # Generate nsim simulations under CSR, calculate centred L-function for the data and simulations env <- envelope(X, fun="Lest", nsim=nsim, savefuns=TRUE, correction="translate", transform=expression(.-r), simulate=expression(runifpoint(ex=X))) # The rank envelope test res <- rank_envelope(env) # Plot the result. plot(res) ## Advanced use: # Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env') curve_set <- crop_curves(env, r_min=1, r_max=7) # Do the rank envelope test res <- rank_envelope(curve_set); plot(res) }
# See ?global_envelope_test for more examples ## Testing complete spatial randomness (CSR) #------------------------------------------- if(require("spatstat.explore", quietly=TRUE)) { X <- unmark(spruces) nsim <- 2499 # Number of simulations # Generate nsim simulations under CSR, calculate centred L-function for the data and simulations env <- envelope(X, fun="Lest", nsim=nsim, savefuns=TRUE, correction="translate", transform=expression(.-r), simulate=expression(runifpoint(ex=X))) # The rank envelope test res <- rank_envelope(env) # Plot the result. plot(res) ## Advanced use: # Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env') curve_set <- crop_curves(env, r_min=1, r_max=7) # Do the rank envelope test res <- rank_envelope(curve_set); plot(res) }
Subtract the theoretical function or the mean of the functions
in the curve set. If the
curve_set
object contains already residuals
,
use_theo
ignored and the same object returned.
residual(curve_set, use_theo = TRUE)
residual(curve_set, use_theo = TRUE)
curve_set |
A |
use_theo |
Whether to use the theoretical summary function or the mean of the functions in the curve_set. |
The mean of the functions in the curve_set
is the mean of all functions.
If use_theo = TRUE
, but the component theo
does not exist in the
curve_set
, the mean of the functions is used silently.
A curve set object containing residual summary functions. theo is no longer included.
Year temperature curves
data("rimov")
data("rimov")
A curve_set
object with water temperatures in 365 days of the 36 years.
The component curve_set[['r']]
is a vector of days (from 1 to 365),
whereas curve_set[['obs']]
contains the water temperatures such that
each column gives year temperatures in a year.
The water temperature data sampled at the water level of Rimov reservoir in Czech republic every day for the 36 years between 1979 and 2014.
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56 (3), 432-458. doi: 10.14736/kyb-2020-3-0432
graph.fanova
data("rimov") groups <- factor(c(rep(1, times=12), rep(2, times=12), rep(3, times=12))) for(i in 1:3) assign(paste0("p", i), plot(subset(rimov, groups==i)) + ggplot2::labs(title=paste("Group ", i, sep=""), y="Temperature")) p3 if(require("patchwork", quietly=TRUE)) p1 + p2 + p3 # See example analysis in ?graph.fanova
data("rimov") groups <- factor(c(rep(1, times=12), rep(2, times=12), rep(3, times=12))) for(i in 1:3) assign(paste0("p", i), plot(subset(rimov, groups==i)) + ggplot2::labs(title=paste("Group ", i, sep=""), y="Temperature")) p3 if(require("patchwork", quietly=TRUE)) p1 + p2 + p3 # See example analysis in ?graph.fanova
Road crashes
data("roadcrash")
data("roadcrash")
A list
with
x = x-coordinates of road accidents
y = y-coordinates of road accidents
xrange = x coordinate limits of enclosing box (-774936.86,-727048.86)
yrange = y coordinate limits of enclosing box (-1201599.83,-1125679.83)
Vertices.x = x-coordinates of vertices of the linear network
Vertices.y = y-coordinates of vertices of the linear network
Edges = a 2 column matrix giving the ID (index) of the origin and destination vectices (in Vertices.x and Vertices.y)
Traffic = matrix of traffic volume
ForestDensity = matrix of forest density
BuildingDensity = matrix of building density
Mrkvička et al. (2023) worked with the database of road crashes reported to the Police in the Czech Republic from 1 January 2016 to 31 December 2020. The data available here is a subpattern of this full data set, included here with the permission of the Police in the Czech Republic. The full data is published as open data, see https://policyvisuals.eu/traffic-accidents-data-in-the-czech-republic/. The subpattern 7700 crashes lying on a linear network with 269 vertices and 354 lines. Further average traffic volume (number of vehicles per 24 hours), forest density and building density in the cell are available in the region of the linear network.
Mrkvička, T., Kraft, S., Blažek, V. and Myllymäki, M. (2023) Hotspots detection on a linear network with presence of covariates: a case study on road crash data.
if(require("spatstat.geom", quietly = TRUE) & require("spatstat.linnet", quietly = TRUE)) { data("roadcrash") win <- owin(xrange = roadcrash$xrange, yrange = roadcrash$yrange) X <- ppp(x = roadcrash$x, y = roadcrash$y, window = win) Vertices.pp <- ppp(x = roadcrash$Vertices.x, y = roadcrash$Vertices.y, window=win) L <- linnet(vertices=Vertices.pp, edges = roadcrash$Edges) PP <- lpp(X, L) z1 <- im(roadcrash$Traffic, xrange = roadcrash$xrange, yrange = roadcrash$yrange) z2 <- im(roadcrash$ForestDensity, xrange = roadcrash$xrange, yrange = roadcrash$yrange) z3 <- im(roadcrash$BuildingDensity, xrange = roadcrash$xrange, yrange = roadcrash$yrange) }
if(require("spatstat.geom", quietly = TRUE) & require("spatstat.linnet", quietly = TRUE)) { data("roadcrash") win <- owin(xrange = roadcrash$xrange, yrange = roadcrash$yrange) X <- ppp(x = roadcrash$x, y = roadcrash$y, window = win) Vertices.pp <- ppp(x = roadcrash$Vertices.x, y = roadcrash$Vertices.y, window=win) L <- linnet(vertices=Vertices.pp, edges = roadcrash$Edges) PP <- lpp(X, L) z1 <- im(roadcrash$Traffic, xrange = roadcrash$xrange, yrange = roadcrash$yrange) z2 <- im(roadcrash$ForestDensity, xrange = roadcrash$xrange, yrange = roadcrash$yrange) z3 <- im(roadcrash$BuildingDensity, xrange = roadcrash$xrange, yrange = roadcrash$yrange) }
Saplings data set
data("saplings")
data("saplings")
A data.frame
containing the locations (x- and y-coordinates) of 123 trees
in an area of 75 m x 75 m.
A pattern of small trees (height <= 15 m) originating from an uneven aged multi-species broadleaf nonmanaged forest in Kaluzhskie Zaseki, Russia.
The pattern is a sample part of data collected over 10 ha plot as a part of a research program headed by project leader Prof. O.V. Smirnova.
Grabarnik, P. and Chiu, S. N. (2002) Goodness-of-fit test for complete spatial randomness against mixtures of regular and clustered spatial point processes. Biometrika, 89, 411–421.
van Lieshout, M.-C. (2010) Spatial point process theory. In Handbook of Spatial Statistics (eds. A. E. Gelfand, P. J. Diggle, M. Fuentes and P. Guttorp), Handbooks of Modern Statistical Methods. Boca Raton: CRC Press.
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381-404. doi: 10.1111/rssb.12172
# This is an example analysis of the saplings data set #===================================================== # Example of Myllymaki et al. (2017, Supplement S4). if(require("spatstat.explore", quietly=TRUE)) { data("saplings") saplings <- as.ppp(saplings, W=square(75)) # First choose the r-distances for L (r) and J (rJ) functions, respectively. nr <- 500 rmin <- 0.3; rminJ <- 0.3 rmax <- 10; rmaxJ <- 6 rstep <- (rmax-rmin)/nr; rstepJ <- (rmaxJ-rminJ)/nr r <- seq(0, rmax, by=rstep) rJ <- seq(0, rmaxJ, by=rstepJ) #-- CSR test --# (a simple hypothesis) #--------------# # First, a CSR test using the L(r)-r function: # Note: CSR is simulated by fixing the number of points and generating nsim simulations # from the binomial process, i.e. we deal with a simple hypothesis. nsim <- 999 # Number of simulations env <- envelope(saplings, nsim=nsim, simulate=expression(runifpoint(ex=saplings)), # Simulate CSR fun="Lest", correction="translate", # T(r) = estimator of L with translational edge correction transform=expression(.-r), # Take the L(r)-r function instead of L(r) r=r, # Specify the distance vector savefuns=TRUE) # Save the estimated functions # Crop the curves to the interval of distances [rmin, rmax] # (at the same time create a curve_set from 'env') curve_set <- crop_curves(env, r_min=rmin, r_max=rmax) # Perform a global envelope test res <- global_envelope_test(curve_set, type="erl") # type="rank" and larger nsim was used in S4. # Plot the result. plot(res) + ggplot2::ylab(expression(italic(hat(L)(r)-r))) # -> The CSR hypothesis is clearly rejected and the rank envelope indicates clear # clustering of saplings. Next we explore the Matern cluster process as a null model. } if(require("spatstat.model", quietly=TRUE)) { #-- Testing the Matern cluster process --# (a composite hypothesis) #----------------------------------------# # Fit the Matern cluster process to the pattern (using minimum contrast estimation with the pair # correction function) fitted_model <- kppm(saplings~1, clusters="MatClust", statistic="pcf") summary(fitted_model) nsim <- 19 # 19 just for experimenting with the code!! #nsim <- 499 # 499 is ok for type = 'qdir' (takes > 1 h) # Make the adjusted directional quantile global envelope test using the L(r)-r function # (For the rank envelope test, choose type = "rank" instead and increase nsim.) adjenvL <- GET.composite(X=fitted_model, fun="Lest", correction="translate", transform=expression(.-r), r=r, type="qdir", nsim=nsim, nsimsub=nsim, r_min=rmin, r_max=rmax) # Plot the test result plot(adjenvL) + ggplot2::ylab(expression(italic(L(r)-r))) # From the test with the L(r)-r function, it appears that the Matern cluster model would be # a reasonable model for the saplings pattern. # To further explore the goodness-of-fit of the Matern cluster process, test the # model with the J function: # This takes quite some time if nsim is reasonably large. adjenvJ <- GET.composite(X=fitted_model, fun="Jest", correction="none", r=rJ, type="qdir", nsim=nsim, nsimsub=nsim, r_min=rminJ, r_max=rmaxJ) # Plot the test result plot(adjenvJ) + ggplot2::ylab(expression(italic(J(r)))) # -> the Matern cluster process not adequate for the saplings data # Test with the two test functions jointly adjenvLJ <- GET.composite(X=fitted_model, testfuns=list(L=list(fun="Lest", correction="translate", transform=expression(.-r), r=r), J=list(fun="Jest", correction="none", r=rJ)), type="erl", nsim=nsim, nsimsub=nsim, r_min=c(rmin, rminJ), r_max=c(rmax, rmaxJ), save.cons.envelope=TRUE) plot(adjenvLJ) }
# This is an example analysis of the saplings data set #===================================================== # Example of Myllymaki et al. (2017, Supplement S4). if(require("spatstat.explore", quietly=TRUE)) { data("saplings") saplings <- as.ppp(saplings, W=square(75)) # First choose the r-distances for L (r) and J (rJ) functions, respectively. nr <- 500 rmin <- 0.3; rminJ <- 0.3 rmax <- 10; rmaxJ <- 6 rstep <- (rmax-rmin)/nr; rstepJ <- (rmaxJ-rminJ)/nr r <- seq(0, rmax, by=rstep) rJ <- seq(0, rmaxJ, by=rstepJ) #-- CSR test --# (a simple hypothesis) #--------------# # First, a CSR test using the L(r)-r function: # Note: CSR is simulated by fixing the number of points and generating nsim simulations # from the binomial process, i.e. we deal with a simple hypothesis. nsim <- 999 # Number of simulations env <- envelope(saplings, nsim=nsim, simulate=expression(runifpoint(ex=saplings)), # Simulate CSR fun="Lest", correction="translate", # T(r) = estimator of L with translational edge correction transform=expression(.-r), # Take the L(r)-r function instead of L(r) r=r, # Specify the distance vector savefuns=TRUE) # Save the estimated functions # Crop the curves to the interval of distances [rmin, rmax] # (at the same time create a curve_set from 'env') curve_set <- crop_curves(env, r_min=rmin, r_max=rmax) # Perform a global envelope test res <- global_envelope_test(curve_set, type="erl") # type="rank" and larger nsim was used in S4. # Plot the result. plot(res) + ggplot2::ylab(expression(italic(hat(L)(r)-r))) # -> The CSR hypothesis is clearly rejected and the rank envelope indicates clear # clustering of saplings. Next we explore the Matern cluster process as a null model. } if(require("spatstat.model", quietly=TRUE)) { #-- Testing the Matern cluster process --# (a composite hypothesis) #----------------------------------------# # Fit the Matern cluster process to the pattern (using minimum contrast estimation with the pair # correction function) fitted_model <- kppm(saplings~1, clusters="MatClust", statistic="pcf") summary(fitted_model) nsim <- 19 # 19 just for experimenting with the code!! #nsim <- 499 # 499 is ok for type = 'qdir' (takes > 1 h) # Make the adjusted directional quantile global envelope test using the L(r)-r function # (For the rank envelope test, choose type = "rank" instead and increase nsim.) adjenvL <- GET.composite(X=fitted_model, fun="Lest", correction="translate", transform=expression(.-r), r=r, type="qdir", nsim=nsim, nsimsub=nsim, r_min=rmin, r_max=rmax) # Plot the test result plot(adjenvL) + ggplot2::ylab(expression(italic(L(r)-r))) # From the test with the L(r)-r function, it appears that the Matern cluster model would be # a reasonable model for the saplings pattern. # To further explore the goodness-of-fit of the Matern cluster process, test the # model with the J function: # This takes quite some time if nsim is reasonably large. adjenvJ <- GET.composite(X=fitted_model, fun="Jest", correction="none", r=rJ, type="qdir", nsim=nsim, nsimsub=nsim, r_min=rminJ, r_max=rmaxJ) # Plot the test result plot(adjenvJ) + ggplot2::ylab(expression(italic(J(r)))) # -> the Matern cluster process not adequate for the saplings data # Test with the two test functions jointly adjenvLJ <- GET.composite(X=fitted_model, testfuns=list(L=list(fun="Lest", correction="translate", transform=expression(.-r), r=r), J=list(fun="Jest", correction="none", r=rJ)), type="erl", nsim=nsim, nsimsub=nsim, r_min=c(rmin, rminJ), r_max=c(rmax, rmaxJ), save.cons.envelope=TRUE) plot(adjenvLJ) }
ggproto
objects for central region plot. Not to be used directly.
StatCentralRegion GeomCentralRegion GeomCentralRegionMulti
StatCentralRegion GeomCentralRegion GeomCentralRegionMulti
An object of class StatCentralRegion
(inherits from Stat
, ggproto
, gg
) of length 4.
An object of class GeomCentralRegion
(inherits from Geom
, ggproto
, gg
) of length 4.
An object of class GeomCentralRegionMulti
(inherits from Geom
, ggproto
, gg
) of length 4.
Return subsets of curve sets which meet conditions.
## S3 method for class 'curve_set' subset(x, subset, ...)
## S3 method for class 'curve_set' subset(x, subset, ...)
x |
A |
subset |
A logical expression indicating curves to keep. |
... |
Ignored. |
if(require("fda.usc", quietly=TRUE)) { # Prepare data data("poblenou") Free <- poblenou$df$day.festive == 1 | as.integer(poblenou$df$day.week) >= 6 MonThu <- poblenou$df$day.festive == 0 & poblenou$df$day.week %in% 1:4 Friday <- poblenou$df$day.festive == 0 & poblenou$df$day.week == 5 # Data as a curve_set cset <- curve_set(r=0:23, obs=t(poblenou[['nox']][['data']])) plot(subset(cset, MonThu)) plot(subset(cset, Friday)) plot(subset(cset, Free)) }
if(require("fda.usc", quietly=TRUE)) { # Prepare data data("poblenou") Free <- poblenou$df$day.festive == 1 | as.integer(poblenou$df$day.week) >= 6 MonThu <- poblenou$df$day.festive == 0 & poblenou$df$day.week %in% 1:4 Friday <- poblenou$df$day.festive == 0 & poblenou$df$day.week == 5 # Data as a curve_set cset <- curve_set(r=0:23, obs=t(poblenou[['nox']][['data']])) plot(subset(cset, MonThu)) plot(subset(cset, Friday)) plot(subset(cset, Free)) }