Title: | Spatial Sampling Design and Analysis |
---|---|
Description: | A design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. For additional details, see Dumelle et al. (2023) <doi:10.18637/jss.v105.i03>. |
Authors: | Michael Dumelle [aut, cre] , Tom Kincaid [aut], Tony Olsen [aut], Marc Weber [aut], Don Stevens [ctb], Denis White [ctb] |
Maintainer: | Michael Dumelle <[email protected]> |
License: | GPL (>= 3) |
Version: | 5.5.1 |
Built: | 2024-10-28 05:36:48 UTC |
Source: | https://github.com/usepa/spsurvey |
spsurvey implements a design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. This R package has been reviewed in accordance with U.S. Environmental Protection Agency policy and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
Maintainer: Michael Dumelle [email protected] (ORCID)
Authors:
Tom Kincaid [email protected]
Tony Olsen [email protected]
Marc Weber [email protected]
Other contributors:
Don Stevens [contributor]
Denis White [contributor]
Useful links:
Report bugs at https://github.com/USEPA/spsurvey/issues
Adjust initial survey design weights so that the final weights sum to a desired frame size. Adjusted weights proportionally scale the initial weights to sum to the desired frame size. Separate adjustments are applied to each category specified in wgtcat.
adjwgt(wgt, wgtcat = NULL, framesize, sites = NULL)
wgt |
Vector of initial weights for each site. These equal the reciprocal of the site's inclusion probability. |
wgtcat |
Vector containing each site's weight adjustment category name. The default is NULL. |
framesize |
Vector containing the known size of the frame for each category name in wgtcat. |
sites |
Vector indicating site use; sites marked FALSE receive an adjusted weight of 0. The default is NULL. |
Vector of adjusted weights, where the adjusted weight is set to 0 for sites whose value in the sites argument was set to FALSE.
Tony Olsen [email protected]
wgt <- runif(50)
wgtcat <- rep(c("A", "B"), c(30, 20))
framesize <- c(A = 15, B = 10)
sites <- rep(rep(c(TRUE, FALSE), c(9, 1)), 5)
adjwgt(wgt, wgtcat, framesize, sites)
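The proportional scaling described for adjwgt can be sketched in base R. This is a minimal illustration under the assumption that each used weight is multiplied by its category's frame size divided by the category's weight total; the helper name scale_weights is hypothetical and is not the package's implementation.

```r
# Hypothetical helper (an assumption, not spsurvey code): scale weights so
# that each category's used weights sum to that category's frame size.
scale_weights <- function(wgt, wgtcat, framesize,
                          sites = rep(TRUE, length(wgt))) {
  adj <- numeric(length(wgt))  # sites marked FALSE keep an adjusted weight of 0
  for (k in names(framesize)) {
    use <- sites & wgtcat == k
    adj[use] <- wgt[use] * framesize[[k]] / sum(wgt[use])
  }
  adj
}

set.seed(1)
wgt <- runif(50)
wgtcat <- rep(c("A", "B"), c(30, 20))
framesize <- c(A = 15, B = 10)
sites <- rep(rep(c(TRUE, FALSE), c(9, 1)), 5)

adj <- scale_weights(wgt, wgtcat, framesize, sites)
tapply(adj, wgtcat, sum)  # category totals match framesize
```

Because the scaling factor is constant within a category, the relative sizes of the initial weights within that category are preserved.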
Adjust weights for target sample units that do not respond and are missing at random within categories. The missing at random assumption implies that their sample weight may be assigned to specific categories of units that have responded (i.e., have been sampled). This is a class-based method for non-response adjustment.
adjwgtNR(wgt, MARClass, EvalStatus, TNRClass, TRClass)
wgt |
vector of weights for each sample unit that will be adjusted for non-response. Weights must be weights for the design as implemented. All weights must be greater than zero. |
MARClass |
vector that identifies for each sample unit the category that will be used in non-response weight adjustment for sample units that are known to be target. Within each missing at random (MAR) category, the missing sample units that are not sampled are assumed to be missing at random. |
EvalStatus |
vector of the evaluation status for each sample unit. Values must include the values given in TNRClass and TRClass. May include other values not required for the non-response adjustment. |
TNRClass |
subset of values in EvalStatus that identify sample units whose target status is known and that do not respond (i.e., are not sampled). |
TRClass |
Subset of values in EvalStatus that identify sample units whose target status is known and that respond (i.e., are target and sampled). |
Vector of sample unit weights that are adjusted for non-response and that has the same length as the input weights. Weights for sample units that did not respond but were known to be eligible are set to zero. Weights for all other sample units are also set to zero.
Tony Olsen [email protected]
set.seed(5)
wgt <- runif(40)
MARClass <- rep(c("A", "B"), rep(20, 2))
EvalStatus <- sample(c("Not_Target", "Target_Sampled", "Target_Not_Sampled"), 40, replace = TRUE)
TNRClass <- "Target_Not_Sampled"
TRClass <- "Target_Sampled"
adjwgtNR(wgt, MARClass, EvalStatus, TNRClass, TRClass)
# function that has an error check
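The class-based adjustment can be illustrated with a small base R sketch. It assumes that, within each MAR class, the weight of target units that did not respond is transferred proportionally to the target units that did respond, and that every other unit ends with weight zero. The helper nr_adjust and the data are hypothetical; this is not the package's code.

```r
# Hypothetical helper (an assumption, not spsurvey code): class-based
# non-response weight adjustment.
nr_adjust <- function(wgt, MARClass, EvalStatus, TNRClass, TRClass) {
  is_tnr <- EvalStatus %in% TNRClass  # target, did not respond
  is_tr <- EvalStatus %in% TRClass    # target, responded
  adj <- numeric(length(wgt))         # everything starts at zero
  for (k in unique(MARClass[is_tr])) {
    in_k <- MARClass == k
    target_total <- sum(wgt[in_k & (is_tr | is_tnr)])
    resp_total <- sum(wgt[in_k & is_tr])
    adj[in_k & is_tr] <- wgt[in_k & is_tr] * target_total / resp_total
  }
  adj
}

# Made-up data: class A has target weight 8, class B has target weight 4
wgt <- c(2, 2, 4, 4, 1, 3, 5)
MARClass <- c("A", "A", "A", "A", "B", "B", "B")
EvalStatus <- c("Target_Sampled", "Target_Not_Sampled", "Target_Sampled",
                "Not_Target", "Target_Sampled", "Target_Not_Sampled",
                "Not_Target")

adj <- nr_adjust(wgt, MARClass, EvalStatus,
                 TNRClass = "Target_Not_Sampled", TRClass = "Target_Sampled")
tapply(adj, MARClass, sum)  # target weight per class is preserved
```

Under these assumptions the responding units in each class carry the full weight of the known target units in that class.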
Calculate the average shifted histogram estimate of a density based on one-dimensional data from a survey design with weights.
ash1_wgt( x, wgt = rep(1, length(x)), m = 5, nbin = 50, ab = NULL, support = "Continuous" )
x |
Vector used to estimate the density. |
wgt |
Vector of weights for each observation from a probability sample. The default assigns equal weights (equal probability). |
m |
Number of empty bins to add to the ends when the range is not completely specified. The default is 5. |
nbin |
Number of bins for density estimation. The default is 50. |
ab |
Optional range for support associated with the density. Both
values may be equal to |
support |
Type of support. If equal to |
List containing the ASH density estimate. The list consists of:
tcen |
x-coordinate for center of bin |
f |
y-coordinate for density estimate height |
Tony Olsen [email protected]
Scott, D. W. (1985). "Averaged shifted histograms: effective nonparametric density estimators in several dimensions." The Annals of Statistics 13(3): 1024-1040.
x <- rnorm(100, 10, sqrt(10))
wgt <- runif(100, 10, 100)
rslt <- ash1_wgt(x, wgt)
plot(rslt)
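For readers unfamiliar with the method, the following base R sketch shows one way a weighted averaged shifted histogram can be computed in the sense of Scott (1985): weighted bin totals are smoothed with a triangular kernel spanning m bins. The helper ash1_sketch, its padding rule, and the extra delta element it returns are assumptions made for illustration; the package function may differ in how it handles ab and support.

```r
# Hypothetical helper (an assumption, not spsurvey code): weighted 1-D ASH.
ash1_sketch <- function(x, wgt = rep(1, length(x)), m = 5, nbin = 50) {
  stopifnot(nbin > 2 * m, length(x) == length(wgt))
  # Bin width chosen so that m empty bins pad each end of the data range
  delta <- diff(range(x)) / (nbin - 2 * m)
  a <- min(x) - m * delta
  # Bin index of each observation, clamped to 1..nbin
  idx <- pmin(pmax(floor((x - a) / delta) + 1, 1), nbin)
  # Weighted bin totals; empty bins sum to 0
  v <- unname(vapply(split(wgt, factor(idx, levels = seq_len(nbin))),
                     sum, numeric(1)))
  # Triangular kernel over neighbouring bins: weight 1 - |i| / m
  k <- seq_len(nbin)
  f <- numeric(nbin)
  for (i in -(m - 1):(m - 1)) {
    src <- k + i
    ok <- src >= 1 & src <= nbin
    f[ok] <- f[ok] + (1 - abs(i) / m) * v[src[ok]]
  }
  list(tcen = a + (k - 0.5) * delta,
       f = f / (sum(wgt) * m * delta),
       delta = delta)
}

set.seed(42)
x <- rnorm(100, 10, sqrt(10))
wgt <- runif(100, 10, 100)
est <- ash1_sketch(x, wgt)
sum(est$f) * est$delta  # the estimate integrates to approximately 1
```

Dividing by the sum of the weights rather than the sample size is what makes the estimate reflect the survey design instead of the raw sample.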
This function organizes input and output for the analysis of attributable risk (for categorical variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
attrisk_analysis( dframe, vars_response, vars_stressor, response_levels = NULL, stressor_levels = NULL, subpops = NULL, siteID = NULL, weight = "weight", xcoord = NULL, ycoord = NULL, stratumID = NULL, clusterID = NULL, weight1 = NULL, xcoord1 = NULL, ycoord1 = NULL, sizeweight = FALSE, sweight = NULL, sweight1 = NULL, fpc = NULL, popsize = NULL, vartype = "Local", conf = 95, All_Sites = FALSE )
dframe |
Data to be analyzed (analysis data). A data frame or sf object. |
vars_response |
Vector composed of character values that identify the
names of response variables in |
vars_stressor |
Vector composed of character values that identify the
names of stressor variables in |
response_levels |
List providing the category values (levels) for each
element in the |
stressor_levels |
List providing the category values (levels) for each
element in the |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
siteID |
Character value providing the name of the site ID variable in
|
weight |
Character value providing the name of the design weight
variable in |
xcoord |
Character value providing name of the x-coordinate variable in
|
ycoord |
Character value providing name of the y-coordinate variable in
|
stratumID |
Character value providing the name of the stratum ID
variable in |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing the name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing the name of the stage one size
weight variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value is 95. |
All_Sites |
A logical variable used when |
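The four fpc layouts described above can be written as R objects along the following lines. All numbers, stratum IDs, and cluster IDs are made up for illustration, and the reading of the single-stage value as a population size is an assumption.

```r
# All values and IDs below are hypothetical.

# Single-stage, unstratified: a single numeric value
# (assumed here to be the population size)
fpc_single <- 15000

# Single-stage, stratified: named list whose names match the stratum IDs
fpc_single_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage, unstratified: named vector. The first item is the number of
# clusters in the population (its name is arbitrary); each later item is the
# number of stage two units for the cluster whose ID it is named after.
fpc_two <- c(Ncluster = 200, Cluster1 = 100, Cluster2 = 100, Cluster3 = 100)

# Two-stage, stratified: named list of vectors in the two-stage format
fpc_two_strat <- list(
  Stratum1 = c(Ncluster = 200, Cluster1 = 100, Cluster2 = 100),
  Stratum2 = c(Ncluster = 150, Cluster3 = 80, Cluster4 = 80)
)
```

For the two-stage formats, the vector length is one more than the number of sampled clusters, which gives a quick consistency check against the analysis data.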
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and size of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
stressor variable
sample size
attributable risk estimate
attributable risk standard error (on the log scale)
attributable risk margin of error (on the log scale)
xx% (default 95%) lower confidence bound
xx% (default 95%) upper confidence bound
sum of design weights
number of observations in the poor response and poor stressor group
number of observations in the poor response and good stressor group
number of observations in the good response and poor stressor group
number of observations in the good response and good stressor group
weighted proportion of observations in the poor response and poor stressor group
weighted proportion of observations in the poor response and good stressor group
weighted proportion of observations in the good response and poor stressor group
weighted proportion of observations in the good response and good stressor group
Attributable risk measures the proportional reduction in the extent of poor condition of a response variable that presumably would result from eliminating a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Attributable risk is defined as one minus the ratio of two probabilities. The numerator of the ratio is the conditional probability that the response variable is in poor condition given that the stressor variable is in good condition. The denominator of the ratio is the probability that the response variable is in poor condition. Attributable risk values close to zero indicate that removing the stressor variable will have little or no impact on the probability that the response variable is in poor condition. Attributable risk values close to one indicate that removing the stressor variable will result in extensive reduction of the probability that the response variable is in poor condition.
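The definition above can be checked by hand with a small base R calculation. The counts are made up and every site is given weight 1 so the arithmetic stays visible; this computes only the point estimate and is not the package's estimator or its variance calculation.

```r
# Made-up data: 30 poor/poor, 10 poor/good, 20 good/poor, 40 good/good
# (response condition / stressor condition)
resp <- rep(c("Poor", "Poor", "Good", "Good"), c(30, 10, 20, 40))
stress <- rep(c("Poor", "Good", "Poor", "Good"), c(30, 10, 20, 40))
wgt <- rep(1, 100)

# Weighted probability that the response is in poor condition: 40 / 100
p_poor <- sum(wgt[resp == "Poor"]) / sum(wgt)

# Weighted probability of poor response given good stressor: 10 / 50
p_poor_given_good <- sum(wgt[resp == "Poor" & stress == "Good"]) /
  sum(wgt[stress == "Good"])

# Attributable risk: one minus the ratio of the two probabilities
attrisk <- 1 - p_poor_given_good / p_poor
attrisk  # 1 - 0.2 / 0.4 = 0.5
```

In this example, removing the stressor would presumably halve the extent of poor response condition.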
Tom Kincaid [email protected]
Van Sickle, J., & Paulsen, S. G. (2008). Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. Journal of the North American Benthological Society, 27(4), 920-931.
relrisk_analysis for relative risk analysis
diffrisk_analysis for risk difference analysis
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
  RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
  StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
attrisk_analysis(dframe,
  vars_response = myresponse,
  vars_stressor = mystressor,
  subpops = mysubpops,
  siteID = "siteID",
  weight = "wgt",
  xcoord = "xcoord",
  ycoord = "ycoord",
  stratumID = "stratum"
)
This function organizes input and output for the analysis of categorical variables. The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
cat_analysis( dframe, vars, subpops = NULL, siteID = NULL, weight = "weight", xcoord = NULL, ycoord = NULL, stratumID = NULL, clusterID = NULL, weight1 = NULL, xcoord1 = NULL, ycoord1 = NULL, sizeweight = FALSE, sweight = NULL, sweight1 = NULL, fpc = NULL, popsize = NULL, vartype = "Local", jointprob = "overton", conf = 95, All_Sites = FALSE )
dframe |
Data to be analyzed (analysis data). A data frame or sf object. |
vars |
Vector composed of character values that identify the
names of response variables in |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
siteID |
Character value providing name of the site ID variable in
the |
weight |
Character value providing name of the design weight
variable in |
xcoord |
Character value providing name of the x-coordinate variable in
the |
ycoord |
Character value providing name of the y-coordinate variable in
the |
stratumID |
Character value providing name of the stratum ID variable in
the |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing name of the stage one size weight
variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
jointprob |
Character value providing the choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value is 95. |
All_Sites |
A logical variable used when |
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and total of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
category of response variable
sample size
proportion estimate (in %)
standard error of proportion estimate
margin of error of proportion estimate
xx% (default 95%) lower confidence bound of proportion estimate
xx% (default 95%) upper confidence bound of proportion estimate
total estimate
standard error of total estimate
margin of error of total estimate
xx% (default 95%) lower confidence bound of total estimate
xx% (default 95%) upper confidence bound of total estimate
Tom Kincaid [email protected]
cont_analysis for continuous variable analysis
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  CatVar = rep(c("north", "south", "east", "west"), 25),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
cat_analysis(dframe,
  vars = myvars,
  subpops = mysubpops,
  siteID = "siteID",
  weight = "wgt",
  xcoord = "xcoord",
  ycoord = "ycoord",
  stratumID = "stratum",
  popsize = mypopsize
)
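To make the proportion and total estimates concrete, the point estimates for a single categorical variable can be reproduced approximately in base R as weighted sums. This sketch assumes the usual design-weighted estimators (category weight total, and that total as a percentage of all weight) and ignores stratification, popsize adjustments, and variance estimation; the column names in the result are hypothetical.

```r
set.seed(7)
dframe <- data.frame(
  wgt = runif(100, 10, 100),
  CatVar = rep(c("north", "south", "east", "west"), 25)
)

# Estimated total for each category: sum of design weights in that category
total_hat <- tapply(dframe$wgt, dframe$CatVar, sum)

# Estimated proportion (in %): category total relative to the overall total
prop_hat <- 100 * total_hat / sum(dframe$wgt)

round(cbind(proportion_pct = prop_hat, total = total_hat), 1)
```

The proportions sum to 100 and the totals sum to the overall weight total, which is a quick sanity check on any categorical analysis output.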
This function creates a CDF plot. Input data for the plots is provided by a data frame with the same structure as the "CDF" output from cont_analysis. Confidence limits for the CDF also are plotted.
cdf_plot( cdfest, var = NULL, subpop = NULL, subpop_level = NULL, units_cdf = "Percent", type_cdf = "Continuous", log = "", xlab = NULL, ylab = NULL, ylab_r = NULL, main = NULL, legloc = NULL, confcut = 0, conflev = 95, cex.main = 1.2, cex.legend = 1, ... )
cdfest |
Data frame with the same structure as the "CDF" output from cont_analysis. |
var |
If |
subpop |
If |
subpop_level |
If |
units_cdf |
Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". |
type_cdf |
Character string consisting of the value "Continuous" or "Ordinal" that controls the type of CDF plot. The default is "Continuous". |
log |
Character string consisting of the value "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x"). The default is "". |
xlab |
Character string providing the x-axis label. If this argument equals NULL, then the indicator name is used as the label. The default is NULL. |
ylab |
Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". |
ylab_r |
Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. |
main |
Character string providing the plot title. The default is NULL. |
legloc |
Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. |
confcut |
Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. |
conflev |
Numeric value of the confidence level used for confidence limits. The default is 95. |
cex.main |
Expansion factor for the plot title. The default is 1.2. |
cex.legend |
Expansion factor for the legend title. The default is 1. |
... |
Additional arguments passed to the |
A plot of a variable's CDF estimates and associated confidence limits.
Tom Kincaid [email protected]
cont_cdfplot for creating a PDF file containing CDF plots
cont_cdftest for CDF hypothesis testing
## Not run:
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
  vars = myvars,
  subpops = mysubpops,
  siteID = "siteID",
  weight = "wgt",
  xcoord = "xcoord",
  ycoord = "ycoord",
  stratumID = "stratum",
  popsize = mypopsize
)
keep <- with(myanalysis$CDF, Type == "Resource_Class" & Subpopulation == "Good")
par(mfrow = c(2, 1))
cdf_plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length",
  ylab_r = "Stream Length (km)",
  main = "Estimates for Resource Class: Good"
)
cdf_plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length",
  ylab_r = "Same",
  main = "Estimates for Resource Class: Good"
)
## End(Not run)
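As background for what the plot displays, a design-weighted empirical CDF on the percent scale can be sketched in base R, together with the rule described for confcut. The data frame layout and column names here are hypothetical and do not reproduce the structure of the "CDF" output from cont_analysis.

```r
set.seed(3)
y <- rnorm(100, 10, 1)
wgt <- runif(100, 10, 100)

# Weighted empirical CDF: cumulative share of weight at or below each value
ord <- order(y)
cdf <- data.frame(
  value = y[ord],
  percent = 100 * cumsum(wgt[ord]) / sum(wgt)
)

# confcut rule: confidence limits are drawn only where the CDF (percent scale)
# lies between confcut and 100 minus confcut
confcut <- 5
cdf$show_limits <- cdf$percent >= confcut & cdf$percent <= 100 - confcut

head(cdf)
```

With confcut equal to 0, show_limits is TRUE everywhere, which matches the description that limits are plotted for the complete range of the CDF.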
This function organizes input and output for the estimation of change between two samples (for categorical and continuous variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
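As a rough illustration of a change estimate for a continuous variable, the sketch below computes the difference in design-weighted means between two surveys in base R. The data are made up, the sign convention (second survey minus first) is an assumption, and the sketch covers only the point estimate, not the variance or the handling of repeat-visit sites.

```r
# Made-up data: four sites per survey with identical weights
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), each = 4),
  wgt = c(10, 20, 30, 40, 10, 20, 30, 40),
  ContVar = c(1, 2, 3, 4, 2, 3, 4, 5)
)

# Design-weighted mean within each survey
wmean <- sapply(
  split(dframe, dframe$surveyID),
  function(d) sum(d$wgt * d$ContVar) / sum(d$wgt)
)

# Assumed convention: change = second survey minus first survey
change <- wmean[["Survey 2"]] - wmean[["Survey 1"]]
change  # 4 - 3 = 1
```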
change_analysis( dframe, vars_cat = NULL, vars_cont = NULL, test = "mean", subpops = NULL, surveyID = "surveyID", survey_names = NULL, siteID = "siteID", weight = "weight", revisitwgt = FALSE, xcoord = NULL, ycoord = NULL, stratumID = NULL, clusterID = NULL, weight1 = NULL, xcoord1 = NULL, ycoord1 = NULL, sizeweight = FALSE, sweight = NULL, sweight1 = NULL, fpc = NULL, popsize = NULL, vartype = "Local", jointprob = "overton", conf = 95, All_Sites = FALSE )
dframe |
Data to be analyzed (analysis data). A data frame or sf object. |
vars_cat |
Vector composed of character values that identify the
names of categorical response variables in |
vars_cont |
Vector composed of character values that identify the
names of continuous response variables in |
test |
Character string or character vector providing the location
measure(s) to use for change estimation for continuous variables. The
choices are |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
surveyID |
Character value providing name of the survey ID variable in
|
survey_names |
Character vector of length two that provides the survey
names contained in the |
siteID |
Character value providing name of the site ID variable in
|
weight |
Character value providing name of the design weight
variable in |
revisitwgt |
Logical value that indicates whether each repeat visit
site has the same design weight in the two surveys, where
|
xcoord |
Character value providing name of the x-coordinate variable in
|
ycoord |
Character value providing name of the y-coordinate variable in
|
stratumID |
Character value providing name of the stratum ID variable in
|
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing name of the stage one size weight
variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
jointprob |
Character value providing the choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value
is |
All_Sites |
A logical variable used when |
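The example objects for the fpc argument did not survive in the text above, so the following is a minimal sketch of the four formats as the fpc description states them. All stratum names, cluster names, and counts are made-up values for illustration; in practice the names must match the stratum and cluster IDs in the analysis data.

```r
# Single-stage, unstratified design: a single numeric value.
fpc_single <- 15000

# Single-stage, stratified design: a named list, one vector per stratum.
# The list names must match the stratum IDs.
fpc_single_strat <- list("Stratum 1" = 9000, "Stratum 2" = 6000)

# Two-stage, unstratified design: a named vector with one more item than
# the number of sampled clusters. The first item is the number of clusters
# in the population (its name is arbitrary); each later item is the number
# of stage two units for the cluster with that name.
fpc_two_stage <- c(
  Ncluster = 200,
  "Cluster 1" = 100,
  "Cluster 2" = 100,
  "Cluster 3" = 75,
  "Cluster 4" = 125
)

# Two-stage, stratified design: a named list whose elements follow the
# two-stage vector format, one element per stratum.
fpc_two_stage_strat <- list(
  "Stratum 1" = c(Ncluster_1 = 100, "Cluster 1" = 125, "Cluster 2" = 100),
  "Stratum 2" = c(Ncluster_2 = 100, "Cluster 3" = 75, "Cluster 4" = 100)
)
```

Note that, as stated in the fpc description, the finite population correction factor is not used with the local mean variance estimator, so these objects only matter for the other variance estimators.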
List of change estimates composed of four items:
(1) catsum contains change estimates for categorical variables,
(2) contsum_mean contains estimates for continuous variables using the mean,
(3) contsum_total contains estimates for continuous variables using the total, and
(4) contsum_median contains estimates for continuous variables using the median.
The items in the list will contain NULL for estimates that were not
calculated. Each data frame includes estimates for all combinations of
population types, subpopulations within types, response variables, and
categories within each response variable (for categorical variables and
continuous variables using the median). Each change estimate is accompanied
by standard error and confidence interval estimates.
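To make the direction of the change estimates concrete (second survey minus first survey, proportions in percent), here is a minimal base R sketch of a weighted proportion difference. It is an illustration only: it uses simple weight sums, ignores variance estimation and repeat-visit sites, and is not the package's internal code. The helper name weighted_pct is hypothetical.

```r
set.seed(1)
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
  wgt = runif(200, 10, 100),
  CatVar = rep(c("North", "South"), 100)
)

# Hypothetical helper: weighted proportion (in %) of each category
# within a single survey, computed from sums of design weights.
weighted_pct <- function(d) {
  100 * tapply(d$wgt, d$CatVar, sum) / sum(d$wgt)
}

pct_1 <- weighted_pct(dframe[dframe$surveyID == "Survey 1", ])
pct_2 <- weighted_pct(dframe[dframe$surveyID == "Survey 2", ])

# Change estimate: second survey minus first survey.
diff_pct <- pct_2 - pct_1
diff_pct
```

Because the category proportions within each survey sum to 100, the differences across categories sum to zero.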
The catsum data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
category of response variable
proportion difference estimate (in %; second survey - first survey)
standard error of proportion difference estimate
margin of error of proportion difference estimate
xx% (default 95%) lower confidence bound of proportion difference estimate
xx% (default 95%) upper confidence bound of proportion difference estimate
total difference estimate (second survey - first survey)
standard error of total difference estimate
margin of error of total difference estimate
xx% (default 95%) lower confidence bound of total difference estimate
xx% (default 95%) upper confidence bound of total difference estimate
sample size in the first survey
proportion estimate (in %) from the first survey
standard error of proportion estimate from the first survey
margin of error of proportion estimate from the first survey
xx% (default 95%) lower confidence bound of proportion estimate from the first survey
xx% (default 95%) upper confidence bound of proportion estimate from the first survey
sample size in the second survey
total estimate from the first survey
standard error of total estimate from the first survey
margin of error of total estimate from the first survey
xx% (default 95%) lower confidence bound of total estimate from the first survey
xx% (default 95%) upper confidence bound of total estimate from the first survey
proportion estimate (in %) from the second survey
standard error of proportion estimate from the second survey
margin of error of proportion estimate from the second survey
xx% (default 95%) lower confidence bound of proportion estimate from the second survey
xx% (default 95%) upper confidence bound of proportion estimate from the second survey
total estimate from the second survey
standard error of total estimate from the second survey
margin of error of total estimate from the second survey
xx% (default 95%) lower confidence bound of total estimate from the second survey
xx% (default 95%) upper confidence bound of total estimate from the second survey
The contsum_mean data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
value of percentile
sample size at or below Value
mean difference estimate
standard error of mean difference estimate
margin of error of mean difference estimate
xx% (default 95%) lower confidence bound of mean difference estimate
xx% (default 95%) upper confidence bound of mean difference estimate
sample size in the first survey
mean estimate from the first survey
standard error of mean estimate from the first survey
margin of error of mean estimate from the first survey
xx% (default 95%) lower confidence bound of mean estimate from the first survey
xx% (default 95%) upper confidence bound of mean estimate from the first survey
sample size in the second survey
mean estimate from the second survey
standard error of mean estimate from the second survey
margin of error of mean estimate from the second survey
xx% (default 95%) lower confidence bound of mean estimate from the second survey
xx% (default 95%) upper confidence bound of mean estimate from the second survey
The contsum_total data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
value of percentile
sample size at or below Value
total difference estimate
standard error of total difference estimate
margin of error of total difference estimate
xx% (default 95%) lower confidence bound of total difference estimate
xx% (default 95%) upper confidence bound of total difference estimate
sample size in the first survey
total estimate from the first survey
standard error of total estimate from the first survey
margin of error of total estimate from the first survey
xx% (default 95%) lower confidence bound of total estimate from the first survey
xx% (default 95%) upper confidence bound of total estimate from the first survey
sample size in the second survey
total estimate from the second survey
standard error of total estimate from the second survey
margin of error of total estimate from the second survey
xx% (default 95%) lower confidence bound of total estimate from the second survey
xx% (default 95%) upper confidence bound of total estimate from the second survey
The contsum_median data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
category of response variable
proportion above or below median difference estimate (in %; second survey - first survey)
standard error of proportion above or below median difference estimate
margin of error of proportion above or below median difference estimate
xx% (default 95%) lower confidence bound of proportion above or below median difference estimate
xx% (default 95%) upper confidence bound of proportion above or below median difference estimate
total above or below median difference estimate (second survey - first survey)
standard error of total above or below median difference estimate
margin of error of total above or below median difference estimate
xx% (default 95%) lower confidence bound of total above or below median difference estimate
xx% (default 95%) upper confidence bound of total above or below median difference estimate
sample size in the first survey
proportion above or below median estimate (in %) from the first survey
standard error of proportion above or below median estimate from the first survey
margin of error of proportion above or below median estimate from the first survey
xx% (default 95%) lower confidence bound of proportion above or below median estimate from the first survey
xx% (default 95%) upper confidence bound of proportion above or below median estimate from the first survey
sample size in the second survey
total above or below median estimate from the first survey
standard error of total above or below median estimate from the first survey
margin of error of total above or below median estimate from the first survey
xx% (default 95%) lower confidence bound of total above or below median estimate from the first survey
xx% (default 95%) upper confidence bound of total above or below median estimate from the first survey
proportion above or below median estimate (in %) from the second survey
standard error of proportion above or below median estimate from the second survey
margin of error of proportion above or below median estimate from the second survey
xx% (default 95%) lower confidence bound of proportion above or below median estimate from the second survey
xx% (default 95%) upper confidence bound of proportion above or below median estimate from the second survey
total above or below median estimate from the second survey
standard error of total above or below median estimate from the second survey
margin of error of total above or below median estimate from the second survey
xx% (default 95%) lower confidence bound of total above or below median estimate from the second survey
xx% (default 95%) upper confidence bound of total above or below median estimate from the second survey
Tom Kincaid
trend_analysis for trend analysis
# Categorical variable example for three resource classes
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
  siteID = paste0("Site", 1:200),
  wgt = runif(200, 10, 100),
  xcoord = runif(200),
  ycoord = runif(200),
  stratum = rep(rep(c("Stratum 1", "Stratum 2"), c(2, 2)), 50),
  CatVar = rep(c("North", "South"), 100),
  All_Sites = rep("All Sites", 200),
  Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE)
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
change_analysis(dframe,
  vars_cat = myvars, subpops = mysubpops,
  surveyID = "surveyID", siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
This function organizes input and output for the analysis of continuous
variables. The analysis data, dframe, can be either a data frame or a
simple features (sf) object. If an sf object is used, coordinates are
extracted from the geometry column in the object, arguments xcoord and
ycoord are assigned values "xcoord" and "ycoord", respectively, and the
geometry column is dropped from the object.
cont_analysis(
  dframe,
  vars,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  pctval = c(5, 10, 25, 50, 75, 90, 95),
  statistics = c("CDF", "Pct", "Mean", "Total"),
  All_Sites = FALSE
)
dframe |
Data to be analyzed (analysis data). A data frame or sf object.
|
vars |
Vector composed of character values that identify the
names of response variables in |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
siteID |
Character value providing name of the site ID variable in
the |
weight |
Character value providing name of the design weight
variable in |
xcoord |
Character value providing name of the x-coordinate variable in
the |
ycoord |
Character value providing name of the y-coordinate variable in
the |
stratumID |
Character value providing name of the stratum ID variable in
the |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing name of the stage one size weight
variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
jointprob |
Character value providing the choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value
is |
pctval |
Vector of the set of values at which percentiles are
estimated. The default set is: |
statistics |
Character vector specifying desired estimates, where
|
All_Sites |
A logical variable used when |
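The example objects for the popsize argument are missing from the text above. The following base R sketch shows one plausible shape for each of the four cases it names (calibration, and post-stratification via a data frame, a table, or an xtabs object). The survey frame, its column names, and all totals are hypothetical, and the named-list form shown for calibration is an assumption about the expected format.

```r
# Hypothetical survey frame listing every unit in the population.
frame <- data.frame(
  Resource_Class = rep(c("Good", "Poor"), c(4000, 1500)),
  stratum = rep(c("Stratum1", "Stratum2"), length.out = 5500)
)

# Calibration (assumed format): a named list of marginal totals,
# one named vector per factor variable.
popsize_calib <- list(
  Resource_Class = c(Good = 4000, Poor = 1500),
  stratum = c(Stratum1 = 2750, Stratum2 = 2750)
)

# Post-stratification using a table: cross-classified population counts.
popsize_tab <- with(frame, table(Resource_Class, stratum))

# Post-stratification using an xtabs object: the same counts via a formula.
popsize_xtabs <- xtabs(~ Resource_Class + stratum, data = frame)

# Post-stratification using a data frame: one row per cell, with the
# population total in a column named Total.
popsize_df <- as.data.frame(popsize_tab, responseName = "Total")
popsize_df
```

The data frame form mirrors the mypopsize object used in the cont_analysis example, which pairs a Resource_Class column with a Total column.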
The analysis results. A list composed of one, two, three, or four
data frames that contain population estimates for all combinations of
subpopulations, categories within each subpopulation, and response
variables, where the number of data frames is determined by argument
statistics. The possible data frames in the output list are:
CDF: a data frame containing CDF estimates
Pct: a data frame containing percentile estimates
Mean: a data frame containing mean estimates
Total: a data frame containing total estimates
The CDF data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
value of response variable
sample size at or below Value
CDF proportion estimate (in %)
standard error of CDF proportion estimate
margin of error of CDF proportion estimate
xx% (default 95%) lower confidence bound of CDF proportion estimate
xx% (default 95%) upper confidence bound of CDF proportion estimate
CDF total estimate
standard error of CDF total estimate
margin of error of CDF total estimate
xx% (default 95%) lower confidence bound of CDF total estimate
xx% (default 95%) upper confidence bound of CDF total estimate
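As a rough illustration of the quantities listed for the CDF data frame, the sketch below computes a weighted empirical CDF in base R: for each observed value, the sample size at or below that value, the weighted total at or below it, and the corresponding proportion in percent. This is a simplified sketch, not the package's estimator, and it omits variance estimation. The column names Estimate_P and Estimate_U and the helper pct_value are hypothetical, and the percentile rule (smallest value whose CDF reaches the target) may differ from what the package uses.

```r
set.seed(2)
y <- rnorm(100, 10, 1)     # response values
w <- runif(100, 10, 100)   # design weights

values <- sort(unique(y))
cdf <- data.frame(
  Value = values,
  # sample size at or below Value
  nResp = vapply(values, function(v) sum(y <= v), integer(1)),
  # weighted total at or below Value
  Estimate_U = vapply(values, function(v) sum(w[y <= v]), numeric(1))
)
# weighted proportion (in %) at or below Value
cdf$Estimate_P <- 100 * cdf$Estimate_U / sum(w)

# Hypothetical helper: smallest Value whose CDF reaches p percent.
pct_value <- function(p) cdf$Value[which(cdf$Estimate_P >= p)[1]]

head(cdf)
pct_value(50)
```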
The Pct data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
value of percentile
sample size at or below Value
percentile estimate
standard error of percentile estimate
margin of error of percentile estimate
xx% (default 95%) lower confidence bound of percentile estimate
xx% (default 95%) upper confidence bound of percentile estimate
The Mean data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
sample size
mean estimate
standard error of mean estimate
margin of error of mean estimate
xx% (default 95%) lower confidence bound of mean estimate
xx% (default 95%) upper confidence bound of mean estimate
The Total data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
sample size
total estimate
standard error of total estimate
margin of error of total estimate
xx% (default 95%) lower confidence bound of total estimate
xx% (default 95%) upper confidence bound of total estimate
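The margin of error and confidence bound columns follow from the estimate, its standard error, and the conf argument. Assuming the usual normal-approximation construction implied by "Gaussian-based confidence level", the relationship can be sketched as below. The estimate and standard error are made-up numbers, and any truncation the package may apply to bounds is not shown.

```r
conf <- 95          # confidence level in percent, as in the conf argument
estimate <- 10.2    # made-up mean estimate
std_error <- 0.4    # made-up standard error

# Two-sided Gaussian multiplier for the requested confidence level.
mult <- qnorm(0.5 + (conf / 100) / 2)

margin_of_error <- mult * std_error
lcb <- estimate - margin_of_error   # lower confidence bound
ucb <- estimate + margin_of_error   # upper confidence bound

c(margin_of_error = margin_of_error, lcb = lcb, ucb = ucb)
```

With conf = 95 the multiplier is approximately 1.96, so the bounds sit about 1.96 standard errors on either side of the estimate.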
Tom Kincaid
cat_analysis for categorical variable analysis
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
cont_analysis(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize, statistics = "Mean"
)
This function creates a PDF file containing CDF plots. Input data for the
plots is provided by a data frame with the same structure as the "CDF"
output from cont_analysis. Plots are produced for every combination of
Type of population, Subpopulation within Type, and Indicator (every
combination of subpopulations, subpopulation levels, and variables).
cont_cdfplot(
  pdffile = "cdf2x2.pdf",
  cdfest,
  units_cdf = "Percent",
  ind_type = rep("Continuous", nind),
  log = rep("", nind),
  xlab = NULL,
  ylab = NULL,
  ylab_r = NULL,
  legloc = NULL,
  cdf_page = 4,
  width = 10,
  height = 8,
  confcut = 0,
  cex.main = 1.2,
  cex.legend = 1,
  ...
)
pdffile |
Name of the PDF file. The default is "cdf2x2.pdf". |
cdfest |
Data frame with the same structure as the "CDF"
output from |
units_cdf |
Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". |
ind_type |
Character vector consisting of the values "Continuous" or "Ordinal" that controls the type of CDF plot for each indicator. The default is "Continuous" for every indicator. |
log |
Character vector consisting of the values "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x") for each indicator. The default is "" for every indicator. |
xlab |
Character vector consisting of the x-axis label for each indicator. If this argument equals NULL, then indicator names are used as the labels. The default is NULL. |
ylab |
Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". |
ylab_r |
Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. |
legloc |
Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. |
cdf_page |
Number of CDF plots on each page, which must be chosen from the values: 1, 2, 4, or 6. The default is 4. |
width |
Width of the graphic region in inches. The default is 10. |
height |
Height of the graphic region in inches. The default is 8. |
confcut |
Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. |
cex.main |
Expansion factor for the plot title. The default is 1.2. |
cex.legend |
Expansion factor for the legend title. The default is 1. |
... |
Additional arguments passed to the |
A PDF file containing the CDF plots.
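The confcut rule described above (confidence limits are not plotted where the CDF, on the percent scale, is below confcut or above 100 minus confcut) can be illustrated with a small base R filter. The data frame and its column names are hypothetical stand-ins for CDF output, and this sketch only shows the selection logic, not the plotting code.

```r
# Hypothetical CDF estimates on the percent scale.
cdfest <- data.frame(
  Value = 1:9,
  Estimate_P = c(2, 5, 20, 40, 60, 80, 95, 98, 100)
)

confcut <- 5

# Confidence limits are kept only where the CDF estimate lies within
# [confcut, 100 - confcut]; outside that range they are suppressed.
show_limits <- cdfest$Estimate_P >= confcut &
  cdfest$Estimate_P <= 100 - confcut

cdfest[show_limits, ]
```

With confcut = 0 the condition holds for every row, which matches the statement that a value of zero plots confidence limits over the complete range of the CDF.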
Tom Kincaid
cdf_plot for plotting a cumulative distribution function (CDF)
cont_cdftest for CDF hypothesis testing
## Not run:
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize
)
cont_cdfplot("myanalysis.pdf", myanalysis$CDF, ylab_r = "Stream Length (km)")
## End(Not run)
This function organizes input and output for conducting inference regarding cumulative distribution functions (CDFs) generated by a probability survey. For every response variable and every subpopulation (domain) variable, differences between CDFs are tested for every pair of subpopulations within the domain. Data input to the function can be either a single survey or multiple surveys (two or more). If the data contain multiple surveys, then the domain variables will reference those surveys and (potentially) subpopulations within those surveys. The inferential procedures divide the CDFs into a discrete set of intervals (classes) and then utilize procedures that have been developed for analysis of categorical data from probability surveys. Choices for inference are the Wald, adjusted Wald, Rao-Scott first order corrected (mean eigenvalue corrected), and Rao-Scott second order corrected (Satterthwaite corrected) test statistics. The default test statistic is the adjusted Wald statistic. The input data argument can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
cont_cdftest(
  dframe,
  vars,
  subpops = NULL,
  surveyID = NULL,
  siteID = "siteID",
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  testname = "adjWald",
  nclass = 3
)
dframe |
Data frame containing survey design variables, response variables, and subpopulation (domain) variables. |
vars |
Vector composed of character values that identify the
names of response variables in the |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in the |
surveyID |
Character value providing name of the survey ID variable in
the |
siteID |
Character value providing name of the site ID variable in
the |
weight |
Character value providing name of the survey design weight
variable in the |
xcoord |
Character value providing name of the x-coordinate variable in
the |
ycoord |
Character value providing name of the y-coordinate variable in
the |
stratumID |
Character value providing name of the stratum ID variable in
the |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in the |
weight1 |
Character value providing name of the stage one weight
variable in the |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in the |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in the |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in the |
sweight1 |
Character value providing name of the stage one size weight
variable in the |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
jointprob |
Character value providing the choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where |
testname |
Name of the test statistic to be reported in the output
data frame. Choices for the name are: |
nclass |
Number of classes into which the CDFs will be divided
(binned), which must equal at least |
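The four layouts described for the fpc argument can be sketched in base R as follows. All numbers, stratum names, and cluster names are invented for illustration, and the single-stage value is assumed here to be a population size. In practice the names must match the stratum and cluster IDs in the analysis data.

```r
# Hypothetical values; names are illustrative only.

# Single-stage, unstratified: a single numeric value
# (assumed here to be the population size).
fpc_single <- 15000

# Single-stage, stratified: named list, one value per stratum.
fpc_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage, unstratified: the first item is the number of clusters in
# the population (its name is arbitrary); each later item is the number
# of stage two units in a sampled cluster, named by cluster ID.
fpc_two <- c(Ncluster = 200, Cluster1 = 100, Cluster2 = 75, Cluster3 = 125)

# Two-stage, stratified: named list of such vectors, one per stratum.
fpc_two_strat <- list(
  Stratum1 = c(Ncluster = 120, Cluster1 = 100, Cluster2 = 75),
  Stratum2 = c(Ncluster = 80, Cluster3 = 125, Cluster4 = 60)
)

# Consistency check against the sampled cluster IDs: the vector holds
# one more element than the number of sampled clusters.
sampled_clusters <- c("Cluster1", "Cluster2", "Cluster3")
fpc_ok <- length(fpc_two) == length(sampled_clusters) + 1 &&
  all(names(fpc_two)[-1] %in% sampled_clusters)
```

A check like fpc_ok is a cheap way to catch mismatched cluster names before calling an analysis function.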
Data frame of CDF test results for all pairs of subpopulations
within each population type for every response variable. The data frame
includes the test statistic specified by argument testname
plus its
degrees of freedom and p-value.
Tom Kincaid
cdf_plot
for visualizing CDF plots
cont_cdfplot
for making CDF plots output to pdfs
n <- 200
mysiteID <- paste("Site", 1:n, sep = "")
dframe <- data.frame(
  siteID = mysiteID,
  wgt = runif(n, 10, 100),
  xcoord = runif(n),
  ycoord = runif(n),
  stratum = rep(c("Stratum1", "Stratum2"), n / 2),
  Resource_Class = sample(c("Agr", "Forest", "Urban"), n, replace = TRUE)
)
ContVar <- numeric(n)
tst <- dframe$Resource_Class == "Agr"
ContVar[tst] <- rnorm(sum(tst), 10, 1)
tst <- dframe$Resource_Class == "Forest"
ContVar[tst] <- rnorm(sum(tst), 10.1, 1)
tst <- dframe$Resource_Class == "Urban"
ContVar[tst] <- rnorm(sum(tst), 10.5, 1)
dframe$ContVar <- ContVar
myvars <- c("ContVar")
mysubpops <- c("Resource_Class")
mypopsize <- data.frame(
  Resource_Class = rep(c("Agr", "Forest", "Urban"), rep(2, 3)),
  stratum = rep(c("Stratum1", "Stratum2"), 3),
  Total = c(2500, 1500, 1000, 500, 600, 450)
)
cont_cdftest(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum",
  popsize = mypopsize, testname = "RaoScott_First"
)
The covariance structure accounts for the panel design and four variance components: unit variation, period variation, unit by period interaction variation, and index (or residual) variation. It also provides for unit correlation across periods and period autocorrelation.
cov_panel_dsgn(
  paneldsgn = matrix(50, 1, 10),
  nrepeats = 1,
  unit_var = NULL,
  period_var = NULL,
  unitperiod_var = NULL,
  index_var = NULL,
  unit_rho = 1,
  period_rho = 0
)
paneldsgn |
A matrix (dimensions: number of panels (rows) by number of periods (columns)) containing the number of units visited for each combination of panel and period. The default is matrix(50, 1, 10), which is a single panel of 50 units visited 10 times (typically once per period). |
nrepeats |
Either |
unit_var |
The variance component estimate for unit. The default is
|
period_var |
The variance component estimate for period. The default is
|
unitperiod_var |
The variance component estimate for unit by period
interaction. The default is |
index_var |
The variance component estimate for index error. The
default is |
unit_rho |
Unit correlation across periods. The default is |
period_rho |
Period autocorrelation. The default is |
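The layout of the paneldsgn matrix (panels in rows, periods in columns) can be sketched in base R. The rotating design below is a made-up illustration of a multi-panel layout, not a recommended design.

```r
n_periods <- 10

# Single panel of 50 units visited in every period (the documented default).
always_revisit <- matrix(50, nrow = 1, ncol = n_periods)

# Hypothetical rotating design: 5 panels of 20 units, where panel i is
# visited in periods i and i + 5.
rotating <- matrix(0,
  nrow = 5, ncol = n_periods,
  dimnames = list(paste0("Panel", 1:5), paste0("Period", 1:n_periods))
)
for (i in 1:5) {
  rotating[i, seq(i, n_periods, by = 5)] <- 20
}

# Number of units visited in each period under each design.
colSums(always_revisit)
colSums(rotating)
```

Either matrix has the shape expected for paneldsgn. The row and column names are optional labels added here for readability.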
The covariance structure accounts for the panel design and four variance components: unit variation, period variation, unit by period interaction variation, and index (or residual) variation. It uses the model structure defined by Urquhart (2012).
If nrepeats is NULL, then no units are sampled more than once in a specific panel and period combination, and the unit by period and index variances are added together. Alternatively, the user may have estimated only the unit, period, and unit by period variance components, so that the index component is zero. The function calculates the covariance matrix for the simple linear regression. The standard error for a linear trend coefficient is the square root of its variance.
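The step from a covariance matrix to the standard error of a linear trend can be illustrated with ordinary matrix algebra. This sketch is not the package's implementation: it assumes a hypothetical covariance matrix Sigma for the period means and uses the generalized least squares result that the coefficient estimates have variance (X' Sigma^-1 X)^-1.

```r
periods <- 1:10

# Design matrix for a simple linear regression on period.
X <- cbind(intercept = 1, period = periods)

# Hypothetical covariance matrix of the 10 period means:
# independent periods with a common variance of 2.
Sigma <- diag(2, length(periods))

# Variance-covariance matrix of the GLS coefficient estimates.
beta_cov <- solve(t(X) %*% solve(Sigma) %*% X)

# Standard error of the linear trend (slope): the square root of the
# slope's variance, which is the second diagonal element.
trend_se <- sqrt(beta_cov[2, 2])
trend_se
```

With independent, equal-variance periods this reduces to the familiar sqrt(sigma^2 / sum((t - mean(t))^2)). Correlated periods change Sigma and therefore the standard error.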
A list containing the covariance matrix (cov) for the panel design, the input panel design (paneldsgn), the input nrepeats design (nrepeats.dsgn), and the function call.
Tony Olsen
Urquhart, N. S., W. S. Overton, et al. (1993) Comparing sampling designs for monitoring ecological status and trends: impact of temporal patterns. In: Statistics for the Environment. V. Barnett and K. F. Turkman. John Wiley & Sons, New York, pp. 71-86.
Urquhart, N. S. and T. M. Kincaid (1999). Designs for detecting trends from repeated surveys of ecological resources. Journal of Agricultural, Biological, and Environmental Statistics, 4(4), 404-414.
Urquhart, N. S. (2012). The role of monitoring design in detecting trend in long-term ecological monitoring studies. In: Design and Analysis of Long-term Ecological Monitoring Studies. R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.). Cambridge University Press, New York, pp. 151-173.
power_dsgn
for power calculations of multiple panel designs
This function organizes input and output for risk difference analysis (of
categorical variables). The analysis data,
dframe
, can be either a data frame or a simple features (sf
) object. If an
sf
object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord
and ycoord
are assigned values
"xcoord"
and "ycoord"
, respectively, and the geometry column is
dropped from the object.
diffrisk_analysis(
  dframe,
  vars_response,
  vars_stressor,
  response_levels = NULL,
  stressor_levels = NULL,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  conf = 95,
  All_Sites = FALSE
)
dframe |
Data to be analyzed (analysis data). A data frame or
|
vars_response |
Vector composed of character values that identify the
names of response variables in |
vars_stressor |
Vector composed of character values that identify the
names of stressor variables in |
response_levels |
List providing the category values (levels) for each
element in the |
stressor_levels |
List providing the category values (levels) for each
element in the |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
siteID |
Character value providing the name of the site ID variable in
|
weight |
Character value providing the name of the design weight
variable in |
xcoord |
Character value providing the name of the x-coordinate variable in
|
ycoord |
Character value providing the name of the y-coordinate variable in
|
stratumID |
Character value providing the name of the stratum ID
variable in |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing the name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing the name of the stage one size
weight variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value
is |
All_Sites |
A logical variable used when |
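The popsize formats named under the popsize argument can be sketched with base R objects. The variable names and totals below are invented. The data frame layout mirrors the mypopsize object in the cont_cdftest example (one row per combination of categories plus a Total column), and the named list of marginal totals for calibration is an assumed layout rather than a confirmed requirement.

```r
# Post-stratification totals as a data frame: one row per cell.
popsize_df <- data.frame(
  Ecoregion = rep(c("East", "Central", "West"), each = 2),
  Type = rep(c("Streams", "Rivers"), times = 3),
  Total = c(575, 175, 400, 100, 175, 75)
)

# The same totals as an xtabs object.
popsize_xtabs <- xtabs(Total ~ Ecoregion + Type, data = popsize_df)

# The same totals as a plain table.
popsize_table <- as.table(
  tapply(popsize_df$Total, popsize_df[c("Ecoregion", "Type")], sum)
)

# Assumed layout for calibration: marginal totals for each variable.
popsize_cal <- list(
  Ecoregion = sapply(split(popsize_df$Total, popsize_df$Ecoregion), sum),
  Type = sapply(split(popsize_df$Total, popsize_df$Type), sum)
)
```

All of these objects carry the same 2000 population units, so the choice between them is a matter of convenience.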
The analysis results. A data frame of risk difference estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and stressor variables. Each estimate is accompanied by its standard error, margin of error, and confidence bounds. The data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
stressor variable
sample size
risk difference estimate
risk estimate for poor condition stressor
risk estimate for good condition stressor
risk difference standard error
risk difference margin of error
xx% (default 95%) lower confidence bound
xx% (default 95%) upper confidence bound
sum of design weights
number of observations in the poor response and poor stressor group
number of observations in the poor response and good stressor group
number of observations in the good response and poor stressor group
number of observations in the good response and good stressor group
weighted proportion of observations in the poor response and poor stressor group
weighted proportion of observations in the poor response and good stressor group
weighted proportion of observations in the good response and poor stressor group
weighted proportion of observations in the good response and good stressor group
Risk difference measures the absolute strength of association between conditional probabilities defined for a response variable and a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Risk difference is defined as the difference between two conditional probabilities: the probability that the response variable is in poor condition given that the stressor variable is in poor condition and the probability that the response variable is in poor condition given that the stressor variable is in good condition. Risk difference values close to zero indicate that the stressor variable has little or no impact on the probability that the response variable is in poor condition. Risk difference values much greater than zero indicate that the stressor variable has a significant impact on the probability that the response variable is in poor condition.
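The definition above can be worked through by hand on a small weighted data set. This is a sketch of the point estimate only, using invented data. It does not reproduce the variance estimation or the internals of diffrisk_analysis().

```r
# Hypothetical analysis data: condition classes and design weights.
dat <- data.frame(
  response = c("Poor", "Poor", "Good", "Poor", "Good", "Good", "Poor", "Good"),
  stressor = c("Poor", "Poor", "Poor", "Good", "Good", "Good", "Poor", "Good"),
  wgt = c(10, 20, 10, 5, 15, 20, 10, 10)
)

# Weighted totals for each response-by-stressor cell.
cells <- xtabs(wgt ~ response + stressor, data = dat)

# Conditional probability of poor response given each stressor condition.
risk_stress_poor <- cells["Poor", "Poor"] / sum(cells[, "Poor"])
risk_stress_good <- cells["Poor", "Good"] / sum(cells[, "Good"])

# Risk difference: 40/50 - 5/50 = 0.7 for these data.
risk_diff <- risk_stress_poor - risk_stress_good
risk_diff
```

Here the probability of a poor response is 0.8 when the stressor is poor and 0.1 when it is good, so the risk difference of 0.7 points to a strong association.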
Tom Kincaid
attrisk_analysis
for attributable risk analysis
relrisk_analysis
for relative risk analysis
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
  RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
  StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
diffrisk_analysis(dframe,
  vars_response = myresponse, vars_stressor = mystressor,
  subpops = mysubpops, siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
This function prints the error messages vector in the analysis functions.
errorprnt(error_vec = get("error_vec", envir = .GlobalEnv))
error_vec |
Data frame that contains error messages. The default is
|
Printed errors.
Tom Kincaid
Select a spatially balanced sample from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites. For technical details, see Stevens and Olsen (2004).
grts(
  sframe,
  n_base,
  stratum_var = NULL,
  seltype = NULL,
  caty_var = NULL,
  caty_n = NULL,
  aux_var = NULL,
  legacy_var = NULL,
  legacy_sites = NULL,
  legacy_stratum_var = NULL,
  legacy_caty_var = NULL,
  legacy_aux_var = NULL,
  mindis = NULL,
  maxtry = 10,
  n_over = NULL,
  n_near = NULL,
  wgt_units = NULL,
  pt_density = NULL,
  DesignID = "Site",
  SiteBegin = 1,
  sep = "-",
  projcrs_check = TRUE
)
sframe |
A sampling frame as an |
n_base |
The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by |
stratum_var |
A character string containing the name of the column from
|
seltype |
A character string or vector indicating the inclusion probability type,
which must be one of the following: |
caty_var |
A character string containing the name of the column from
|
caty_n |
A named numeric vector indicating the expected sample size for each
level of |
aux_var |
A character string containing the name of the column from
|
legacy_var |
This argument can be used instead of |
legacy_sites |
An sf object with a |
legacy_stratum_var |
A character string containing the name of the column from
|
legacy_caty_var |
A character string containing the name of the column from
|
legacy_aux_var |
A character string containing the name of the column from
|
mindis |
A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and |
maxtry |
The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are |
n_over |
The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
|
n_near |
The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, |
wgt_units |
The units used to compute the design weights. These
units must be standard units as defined by the |
pt_density |
A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points. |
DesignID |
A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with |
SiteBegin |
A numeric value indicating the first number to use to match
with |
sep |
A character string that acts as a separator between
|
projcrs_check |
A check for whether the coordinates are projected. If |
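For a stratified design, the requirement that the names of n_base match the values of the stratification variable can be checked before sampling. The sketch below uses a made-up attribute table with an ELEV_CAT column whose levels match those in the grts() example. The helper names (sframe_attrs, names_ok, size_ok) are hypothetical.

```r
# Hypothetical attribute table of a point frame with a stratification column.
sframe_attrs <- data.frame(ELEV_CAT = rep(c("low", "high"), c(120, 75)))

# Stratified base sample sizes, named by stratum.
n_base <- c(low = 25, high = 30)

# The names of n_base should match the values of the stratification variable.
strata <- unique(sframe_attrs$ELEV_CAT)
names_ok <- setequal(names(n_base), strata)

# For a finite frame, each requested size should not exceed the number
# of frame units in that stratum.
frame_counts <- as.vector(table(sframe_attrs$ELEV_CAT)[names(n_base)])
size_ok <- all(n_base <= frame_counts)
```

If both checks pass, n_base and stratum_var = "ELEV_CAT" are at least mutually consistent.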
n_base
is the number of sites used to calculate
the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base
is typically the
number of sites in all panels that will be sampled in the same temporal period –
n_base
is not the total number of sites in all panels. The sum of n_base
and
n_over
is equal to the total number of sites to be visited for all panels plus
any replacement sites that may be required.
The sampling design sites and additional information about the sampling design. More specifically, it is a list with five elements:
sites_legacy
An sf object containing legacy sites. This is
NULL
if legacy sites were not included in the sample.
sites_base
An sf object containing the base sites. This is NULL
if n_base
equals the number of legacy sites.
sites_over
An sf object containing the reverse hierarchically
ordered replacement sites. This is NULL
if no reverse hierarchically
ordered replacement sites were included in the sample.
sites_near
An sf object containing the nearest neighbor
replacement sites. This is NULL
if no nearest neighbor replacement
sites were included in the sample.
design
A list documenting the specifications of this sampling design.
This can be checked to verify your sampling design ran as intended.
call
The original function call.
stratum_var
The name of the stratification variable in sframe
.
This equals NULL
if no stratification is used.
stratum
The unique strata. This equals "None"
if
the sampling design is unstratified.
n_base
The base sample size per stratum.
seltype
The selection type per stratum.
caty_var
The name of the unequal probability variable in sframe
.
This equals NULL
if no unequal probability variable is used.
caty_n
The expected sample sizes for each level of the
unequal probability grouping variable per stratum. This equals
NULL
when seltype
is not "unequal"
.
aux_var
The name of the proportional probability (auxiliary) variable in sframe
.
This equals NULL
if no proportional probability variable is used.
legacy
A logical variable indicating whether legacy sites
were included in the sample.
legacy_stratum_var
The name of the stratification variable in legacy_sites
.
Omitted if legacy sites are not used. This equals NULL
if legacy sites were used but
no stratification variable is used.
legacy_caty_var
The name of the unequal probability variable in legacy_sites
.
Omitted if legacy sites are not used. This equals NULL
if legacy sites were used but
no unequal probability variable is used.
legacy_aux_var
The name of the proportional probability (auxiliary)
variable in legacy_sites
.
Omitted if legacy sites are not used. This equals NULL
if legacy sites
were used but no proportional probability variable is used.
mindis
The minimum distance requirement desired. This
is NULL
when no minimum distance requirement was applied.
n_over
The reverse hierarchically ordered replacement
site sample sizes per stratum. If seltype
is unequal
,
this represents the expected sample sizes. This is NULL
when no reverse hierarchically ordered replacement sites were selected.
n_near
The number of nearest neighbor replacement sites
desired. This is NULL
when no nearest neighbor replacement
sites were selected.
When non-NULL
, the sites_legacy
, sites_base
,
sites_over
, and sites_near
objects contain the original columns
in sframe
and include a few additional columns. These additional columns
are
siteID
A site identifier (as named using the DesignID
and SiteBegin
arguments to grts()
).
siteuse
Whether the site is a legacy site (Legacy
), base
site (Base
), reverse hierarchically ordered replacement site
(Over
), or nearest neighbor replacement site (Near
).
replsite
The replacement site ordering. replsite
is
None
if the site is not a replacement site, Next
if it is
the next reverse hierarchically ordered replacement site to use, or
Near_
, where the word following _
indicates the ordering of sites closest to
the originally sampled site.
lon_WGS84
Longitude coordinates using the WGS84 coordinate
system (EPSG:4326). Only given if coordinates are projected.
lat_WGS84
Latitude coordinates using the WGS84 coordinate
system (EPSG:4326). Only given if coordinates are projected.
X
Longitude coordinates using the provided coordinate
system. Only given if coordinates are not projected (i.e., they are geographic or NA).
Y
Latitude coordinates using the provided coordinate
system. Only given if coordinates are not projected (i.e., they are geographic or NA).
stratum
A stratum indicator. stratum
is None
if the sampling design was unstratified. If the sampling design was stratified
,
stratum
indicates the stratum.
wgt
The design weight.
ip
The site's original inclusion probability (the reciprocal of wgt).
caty
An unequal probability grouping indicator. caty
is None
if the sampling design did not use unequal inclusion probabilities.
If the sampling design did use unequal inclusion probabilities, caty
indicates the unequal probability level.
aux
The auxiliary proportional probability variable. This
column is only returned if seltype
was proportional
in the
original sampling design.
If any columns in sframe
contain these names, those columns
from sframe
will be automatically prefixed with sframe_
in the sites
object. When output is printed, a summary of site counts by
the levels in stratum_var
and caty_var
is shown.
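The relation between the wgt and ip columns, and the idea of expected sample sizes per category, can be illustrated with simple arithmetic. This sketch assumes a finite frame in which every site in a category has the same inclusion probability. It does not reproduce how grts() computes or adjusts weights, for example when legacy or replacement sites are involved, and the frame counts are invented.

```r
# Hypothetical finite frame: 300 small lakes and 100 large lakes.
frame_n <- c(small = 300, large = 100)

# Expected sample sizes per category (analogous to caty_n).
caty_n <- c(small = 20, large = 20)

# Inclusion probability for a site in each category, and the design
# weight as its reciprocal.
ip <- caty_n / frame_n
wgt <- 1 / ip

# The weights of the sampled sites sum to the frame size.
total_wgt <- sum(wgt * caty_n)
total_wgt
```

Large lakes are sampled at a higher rate here, so each sampled large lake carries a smaller weight (5) than each sampled small lake (15).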
Tony Olsen
Stevens Jr., Don L. and Olsen, Anthony R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465), 262-278.
irs
to select a sample that is not spatially balanced
## Not run:
samp <- grts(NE_Lakes, n_base = 100)
print(samp)
strata_n <- c(low = 25, high = 30)
samp_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
samp_over <- grts(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)
## End(Not run)
An (sf
) MULTILINESTRING object of 244 segments of the
Illinois River in Arkansas and Oklahoma.
Illinois_River
244 rows and 2 variables:
STATE_NAME
State name.
geometry
MULTILINESTRING geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
An (sf
) POINT object of legacy sites for the Illinois
River data.
Illinois_River_Legacy
5 rows and 2 variables:
STATE_NAME
State name.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
Select a sample that is not spatially balanced from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Independent Random Sampling (IRS) algorithm. The IRS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites.
irs(
  sframe,
  n_base,
  stratum_var = NULL,
  seltype = NULL,
  caty_var = NULL,
  caty_n = NULL,
  aux_var = NULL,
  legacy_var = NULL,
  legacy_sites = NULL,
  legacy_stratum_var = NULL,
  legacy_caty_var = NULL,
  legacy_aux_var = NULL,
  mindis = NULL,
  maxtry = 10,
  n_over = NULL,
  n_near = NULL,
  wgt_units = NULL,
  pt_density = NULL,
  DesignID = "Site",
  SiteBegin = 1,
  sep = "-",
  projcrs_check = TRUE
)
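For a finite (point) frame with equal inclusion probabilities, independent random sampling amounts to a simple random draw that ignores location. The base R sketch below illustrates that idea on made-up coordinates. It is not the irs() implementation and leaves out stratification, legacy sites, replacement sites, and minimum distance.

```r
set.seed(1)

# Hypothetical point frame of 500 sites.
frame <- data.frame(
  id = seq_len(500),
  x = runif(500),
  y = runif(500)
)

n_base <- 50

# Equal-probability draw without replacement, ignoring site locations.
samp <- frame[sample(nrow(frame), n_base), ]

# Each sampled site represents N / n sites in the frame.
samp$wgt <- nrow(frame) / n_base
```

Because locations play no role in the draw, sampled sites can cluster by chance, which is the behavior that a spatially balanced design such as GRTS is meant to avoid.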
sframe |
A sampling frame as an |
n_base |
The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by |
stratum_var |
A character string containing the name of the column from
|
seltype |
A character string or vector indicating the inclusion probability type,
which must be one of the following: |
caty_var |
A character string containing the name of the column from
|
caty_n |
A named numeric vector indicating the expected sample size for each
level of |
aux_var |
A character string containing the name of the column from
|
legacy_var |
This argument can be used instead of |
legacy_sites |
An sf object with a |
legacy_stratum_var |
A character string containing the name of the column from
|
legacy_caty_var |
A character string containing the name of the column from
|
legacy_aux_var |
A character string containing the name of the column from
|
mindis |
A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and |
maxtry |
The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard IRS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are |
n_over |
The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
|
n_near |
The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, |
wgt_units |
The units used to compute the design weights. These
units must be standard units as defined by the |
pt_density |
A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points. |
DesignID |
A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with |
SiteBegin |
A numeric value indicating the first number to use to match
with |
sep |
A character string that acts as a separator between
|
projcrs_check |
A check for whether the coordinates are projected. If |
n_base
is the number of sites used to calculate
the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base
is typically the
number of sites in all panels that will be sampled in the same temporal period –
n_base
is not the total number of sites in all panels. The sum of n_base
and
n_over
is equal to the total number of sites to be visited for all panels plus
any replacement sites that may be required.
The sampling design sites and additional information about the sampling design. More specifically, it is a list with five elements:
sites_legacy
An sf object containing legacy sites. This is
NULL
if legacy sites were not included in the sample.
sites_base
An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
sites_over
An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
sites_near
An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
design
A list documenting the specifications of this sampling design. This can be checked to verify your sampling design ran as intended.
call
The original function call.
stratum_var
The name of the stratification variable in sframe. This equals NULL if no stratification is used.
stratum
The unique strata. This equals "None" if the sampling design is unstratified.
n_base
The base sample size per stratum.
seltype
The selection type per stratum.
caty_var
The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
caty_n
The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
aux_var
The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
legacy
A logical variable indicating whether legacy sites were included in the sample.
legacy_stratum_var
The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
legacy_caty_var
The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
legacy_aux_var
The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
mindis
The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
n_over
The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is unequal, this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
n_near
The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
When non-NULL, the sites_legacy, sites_base, sites_over, and sites_near objects contain the original columns in sframe and include a few additional columns. These additional columns are
siteID
A site identifier (as named using the DesignID and SiteBegin arguments to grts()).
siteuse
Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
replsite
The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
lon_WGS84
Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
lat_WGS84
Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
X
Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
Y
Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
stratum
A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
wgt
The design weight.
ip
The site's original inclusion probability (the reciprocal of wgt).
caty
An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
aux
The auxiliary proportional probability variable. This column is only returned if seltype was proportional in the original sampling design.
If any columns in sframe contain these names, those columns from sframe will be automatically prefixed with sframe_ in the sites object. When output is printed, a summary of site counts by the levels in stratum_var and caty_var is shown.
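As a small illustration of how the wgt and ip columns relate, the sketch below assumes an equal probability design that selects n = 30 sites from a frame of N = 195 units. Both numbers are chosen only for illustration; the code does not call irs() and is not output from it.

```r
# Hypothetical equal probability design: n sites drawn from N frame units
N <- 195
n <- 30

# Each site's inclusion probability and its design weight (the reciprocal)
ip <- rep(n / N, n)
wgt <- 1 / ip

# Under this assumption the design weights sum to the frame size
sum(wgt)
```

Under unequal or proportional selection the weights differ from site to site, but each ip is still the reciprocal of the corresponding wgt.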
Tony Olsen
grts
to select a sample that is spatially balanced
## Not run:
samp <- irs(NE_Lakes, n_base = 100)
print(samp)
strata_n <- c(low = 25, high = 30)
samp_strat <- irs(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
samp_over <- irs(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)
## End(Not run)
An sf MULTIPOLYGON object of 187 polygons consisting of shore segments in Lake Ontario.
Lake_Ontario
187 rows and 5 variables:
COUNTRY
Country.
RSRC_CLASS
Bay class.
PSTL_CODE
Postal code.
AREA_SQKM
Area in square kilometers
geometry
MULTIPOLYGON geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
This function calculates the variance-covariance matrix using the local mean estimator.
localmean_cov(zmat, weight_1st)
zmat |
Matrix of weighted response values or weighted residual values for the sample points. |
weight_1st |
List from the local mean weight function containing two
elements: a matrix named |
The local mean estimator of the variance-covariance matrix.
Tom Kincaid
This function calculates the local mean variance estimator.
localmean_var(z, weight_1st)
z |
Vector of weighted response values or weighted residual values for the sample points. |
weight_1st |
List from the local mean weight function containing two
elements: a matrix named |
The local mean estimator of the variance.
Tom Kincaid
This function calculates the index values of neighboring points and associated weights required by the local mean variance estimator.
localmean_weight(x, y, prb, nbh = 4)
x |
Vector of x-coordinates for location of the sample points. |
y |
Vector of y-coordinates for location of the sample points. |
prb |
Vector of inclusion probabilities for the sample points. |
nbh |
Number of neighboring points to use in the calculations. |
If ginv fails to return valid output, a NULL object. Otherwise, a list containing two elements: a matrix named ij composed of the index values of neighboring points and a vector named gwt composed of weights.
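A minimal usage sketch of how the local mean helpers fit together, assuming spsurvey is installed and that the functions have the signatures shown in this section. The coordinates, inclusion probabilities, and response values are simulated for illustration only. The functions are looked up from the package namespace so the sketch does not depend on whether they are exported.

```r
set.seed(1)
n <- 50
x <- runif(n)                 # simulated x-coordinates
y <- runif(n)                 # simulated y-coordinates
prb <- rep(n / 500, n)        # assumed equal inclusion probabilities
z <- rnorm(n, 10, 1) / prb    # simulated weighted response values

lm_var <- NULL
if (requireNamespace("spsurvey", quietly = TRUE)) {
  lm_weight_fun <- utils::getFromNamespace("localmean_weight", "spsurvey")
  lm_var_fun <- utils::getFromNamespace("localmean_var", "spsurvey")

  # Neighbor indices (ij) and weights (gwt) for the local mean estimator
  w1 <- lm_weight_fun(x, y, prb, nbh = 4)

  # localmean_weight() returns NULL if ginv fails, so check before use
  if (!is.null(w1)) {
    lm_var <- lm_var_fun(z, w1)
  }
}
lm_var
```

The same weight list can be passed to localmean_cov() together with a matrix of weighted responses when a variance-covariance matrix is needed.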
Tom Kincaid
An sf POINT object of 195 lakes in the Northeastern United States.
NE_Lakes
195 rows and 5 variables:
AREA
Lake area in hectares.
AREA_CAT
Lake area categories based on a hectare cutoff.
ELEV
Elevation in meters.
ELEV_CAT
Elevation categories based on a meter cutoff.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
A data frame of 195 lakes in the Northeastern United States.
NE_Lakes_df
195 rows and 6 variables:
AREA
Lake area in hectares.
AREA_CAT
Lake area categories based on a hectare cutoff.
ELEV
Elevation in meters.
ELEV_CAT
Elevation categories based on a meter cutoff.
XCOORD
x-coordinate using the WGS 84 coordinate reference system (EPSG: 4326).
YCOORD
y-coordinate using the WGS 84 coordinate reference system (EPSG: 4326).
An sf POINT object of 5 legacy sites for the NE Lakes data.
NE_Lakes_Legacy
5 rows and 5 variables:
AREA
Lake area in hectares.
AREA_CAT
Lake area categories based on a hectare cutoff.
ELEV
Elevation in meters.
ELEV_CAT
Elevation categories based on a meter cutoff.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
An sf POINT object of 96 lakes in the Pacific Northwest Region of the United States during the year 2017, from a subset of the Environmental Protection Agency's "National Lakes Assessment."
NLA_PNW
96 rows and 9 variables:
SITE_ID
A unique lake identifier.
WEIGHT
The sampling design weight.
URBAN
Urban category.
STATE
State name.
BMMI
Benthic MMI value.
BMMI_COND
Benthic MMI condition categories.
PHOS_COND
Phosphorus condition categories.
NITR_COND
Nitrogen condition categories.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
An sf POINT object of 353 stream segments in the Central United States during the years 2008 and 2013, from a subset of the Environmental Protection Agency's "National Rivers and Streams Assessment."
NRSA_EPA7
353 rows and 10 variables:
SITE_ID
A unique site identifier.
YEAR
Year of design cycle.
WEIGHT
Sampling design weights.
ECOREGION
Ecoregion.
STATE
State name.
BMMI
Benthic MMI value.
BMMI_COND
Benthic MMI categories.
PHOS_COND
Phosphorus condition categories.
NITR_COND
Nitrogen condition categories.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
Panel revisit design characteristics are summarized: number of panels, number of time periods, total number of sample events for the revisit design, total number of sample events for each panel, total number of sample events for each time period and cumulative number of unique units sampled by time periods.
pd_summary(object, visitdsgn = NULL, ...)
object |
Two-dimensional array from |
visitdsgn |
Two-dimensional array with same dimensions as |
... |
Additional arguments (S3 consistency) |
The revisit panel design and the visit design (if present) are summarized. Summaries can be useful to know the effort required to complete the survey design. See the values returned for the summaries that are produced.
List of seven elements.
n_panel
number of panels in revisit design
n_period
number of time periods in revisit design
n_total
total number of sample events across all panels and all time periods, accounting for visitdsgn, that will be sampled in the revisit design
n_periodunit
vector of the number of time periods a unit will be sampled in each panel
n_unitpnl
vector of the number of sample units, accounting for visitdsgn, that will be sampled in each panel
n_unitperiod
vector of the number of sample units, accounting for visitdsgn, that will be sampled during each time period
ncum_unit
vector of the cumulative number of unique units that will be sampled in time periods up to and including the current time period.
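To make the summaries above concrete, the following base R sketch computes comparable quantities for a small hand-built panel matrix with no visit design. The matrix, the variable names, and the way cumulative unique units are counted are assumptions made for illustration; this is not the package's implementation of pd_summary().

```r
# Hypothetical panel design: rows are panels, columns are time periods,
# entries are the number of units sampled (0 = panel not visited)
pd <- rbind(
  P1 = c(10, 0, 10, 0),
  P2 = c(0, 10, 0, 10),
  P3 = c(5, 5, 5, 5)
)

n_panel <- nrow(pd)                   # number of panels
n_period <- ncol(pd)                  # number of time periods
n_total <- sum(pd)                    # total sample events
events_per_panel <- rowSums(pd)       # sample events in each panel
events_per_period <- colSums(pd)      # sample events in each time period
periods_per_panel <- rowSums(pd > 0)  # periods in which each panel is sampled

# Cumulative unique units: count a panel's units once, at the first
# period in which that panel appears
panel_size <- apply(pd, 1, max)
first_period <- apply(pd > 0, 1, function(r) which(r)[1])
new_units <- sapply(seq_len(n_period), function(j) {
  sum(panel_size[which(first_period == j)])
})
cum_unique <- cumsum(new_units)
```

Here panels P1 and P3 enter in the first period and P2 in the second, so no new units are added after period two.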
Tony Olsen
# Serially alternating panel revisit design summary
sa_dsgn <- revisit_dsgn(20, panels = list(SA60N = list(
  n = 60, pnl_dsgn = c(1, 4), pnl_n = NA, start_option = "None"
)), begin = 1)
pd_summary(sa_dsgn)
# Add visit design where first panel is sampled twice at every time period
sa_visit <- sa_dsgn
sa_visit[sa_visit > 0] <- 1
sa_visit[1, sa_visit[1, ] > 0] <- 2
pd_summary(sa_dsgn, sa_visit)
This function plots sampling frames, design sites, and analysis data. If the left-hand side of the formula is empty, plots are of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, plots are of the left-hand side variable for each level of each right-hand side variable. This function is largely built on plot.sf(), and all spsurvey plotting methods can supply additional arguments to plot.sf(). For more information on plotting in sf, run ?sf::plot.sf(). Equivalent to sp_plot(); both are currently maintained for backwards compatibility.
## S3 method for class 'sp_frame'
plot(
  x, formula = ~1, xcoord, ycoord, crs,
  var_args = NULL, varlevel_args = NULL, geom = FALSE,
  onlyshow = NULL, fix_bbox = TRUE, ...
)
## S3 method for class 'sp_design'
plot(
  x, sframe = NULL, formula = ~siteuse, siteuse = NULL,
  var_args = NULL, varlevel_args = NULL, geom = FALSE,
  onlyshow = NULL, fix_bbox = TRUE, ...
)
x |
An object to plot. When plotting sampling frames an |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
xcoord |
Name of the x-coordinate (east-west) in |
ycoord |
Name of y (north-south)-coordinate in |
crs |
Projection code for |
var_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
varlevel_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
geom |
Should separate geometries for each level of the right-hand
side |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
fix_bbox |
Should the geometry bounding box be fixed across plots?
If a length-four vector with names "xmin", "ymin", "xmax", and "ymax" and values
indicating bounding box edges, the bounding box will be fixed as |
... |
Additional arguments to pass to |
sframe |
The sampling frame (an |
siteuse |
A character vector of site types to include when plotting design sites.
It can only take on values |
Michael Dumelle
## Not run:
data("NE_Lakes")
NE_Lakes <- sp_frame(NE_Lakes)
plot(NE_Lakes, formula = ~ELEV_CAT)
sample <- grts(NE_Lakes, 30)
plot(sample, NE_Lakes)
## End(Not run)
This function creates a CDF plot. Input data for the plots is provided by a data frame from the "CDF" output given by cont_analysis. Confidence limits for the CDF also are plotted. Equivalent to cdf_plot(); both are currently maintained for backwards compatibility.
## S3 method for class 'sp_CDF'
plot(
  x, var = NULL, subpop = NULL, subpop_level = NULL,
  units_cdf = "Percent", type_cdf = "Continuous", log = "",
  xlab = NULL, ylab = NULL, ylab_r = NULL, main = NULL,
  legloc = NULL, confcut = 0, conflev = 95,
  cex.main = 1.2, cex.legend = 1, ...
)
x |
Data frame from the "CDF" output given by
|
var |
If |
subpop |
If |
subpop_level |
If |
units_cdf |
Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". |
type_cdf |
Character string consisting of the value "Continuous" or "Ordinal" that controls the type of CDF plot. The default is "Continuous". |
log |
Character string consisting of the value "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x"). The default is "". |
xlab |
Character string providing the x-axis label. If this argument equals NULL, then the indicator name is used as the label. The default is NULL. |
ylab |
Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". |
ylab_r |
Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. |
main |
Character string providing the plot title. The default is NULL. |
legloc |
Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. |
confcut |
Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. |
conflev |
Numeric value of the confidence level used for confidence limits. The default is 95. |
cex.main |
Expansion factor for the plot title. The default is 1.2. |
cex.legend |
Expansion factor for the legend title. The default is 1. |
... |
Additional arguments passed to the |
A plot of a variable's CDF estimates and associated confidence limits.
Tom Kincaid
cont_cdfplot
for creating a PDF file containing CDF plots
cont_cdftest
for CDF hypothesis testing
## Not run:
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize
)
keep <- with(myanalysis$CDF, Type == "Resource_Class" & Subpopulation == "Good")
par(mfrow = c(2, 1))
plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length", ylab_r = "Stream Length (km)",
  main = "Estimates for Resource Class: Good"
)
plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length", ylab_r = "Same",
  main = "Estimates for Resource Class: Good"
)
## End(Not run)
Calculates the power for trend detection for one or more variables, for one or more panel designs, for one or more linear trends, and for one or more significance levels. The panel designs create a covariance model where the model includes variance components for units, periods, the interaction of units and periods, and the residual (or index) variance.
power_dsgn(
  ind_names, ind_values, unit_var, period_var, unitperiod_var, index_var,
  unit_rho = 1, period_rho = 0, paneldsgn, nrepeats = NULL,
  trend_type = "mean", ind_pct = NULL, ind_tail = NULL,
  trend = 2, alpha = 0.05
)
ind_names |
Vector of indicator names |
ind_values |
Vector of indicator mean values |
unit_var |
Vector of variance component estimates for unit variability for the indicators |
period_var |
Vector of variance component estimates for period variability for the indicators |
unitperiod_var |
Vector of variance component estimates for unit by period interaction variability for the indicators |
index_var |
Vector of variance component estimates for index (residual) error for the indicators |
unit_rho |
Correlation across units. Default is |
period_rho |
Correlation across periods. Default is |
paneldsgn |
A list of panel designs each as a matrix. Each element of
the list is a matrix with |
nrepeats |
Either |
trend_type |
Trend type is either |
ind_pct |
When |
ind_tail |
When trend_type is equal to |
trend |
Single value or vector of assumed percent change from
initial value in the indicator for each period. Assumes the trend is
expressed as percent per period. Note that the trend may be either positive
or negative. The default is |
alpha |
Single value or vector of significance level for linear
trend test, alpha, Type I error, level. The default is |
Calculates the power for detecting a change in the mean for different panel design structures. The model incorporates unit, period, unit by period, and index variance components as well as correlation across units and across periods. See references for methods.
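The trend argument is described as a percent change from the initial indicator value per period. The base R sketch below shows one reading of that description, assuming the change is linear, so that each period adds the same fraction of the initial value. The starting mean of 43 and trend of 1 percent are borrowed from this function's example call; the linear form and the variable names are assumptions and are not taken from the package source.

```r
ind_value <- 43   # initial indicator mean
trend <- 1.0      # assumed percent change per period
periods <- 1:20

# Assumed linear trend: each period adds trend% of the initial value
change_per_period <- ind_value * trend / 100
mean_by_period <- ind_value + change_per_period * (periods - 1)

# Indicator means at the first and last period
mean_by_period[c(1, 20)]
```

A negative trend value gives a declining sequence under the same assumption.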
A list with components trend_type, ind_pct, ind_tail, trend values across periods, periods (all periods included in one or more panel designs), significance levels, a five-dimensional array of power calculations (dimensions: panel design names, periods, indicator names, trend names, alpha_names), an array of indicator mean values for each trend, and the function call.
Tony Olsen
Urquhart, N. S., W. S. Overton, et al. (1993) Comparing sampling designs for monitoring ecological status and trends: impact of temporal patterns. In: Statistics for the Environment. V. Barnett and K. F. Turkman. John Wiley & Sons, New York, pp. 71-86.
Urquhart, N. S. and T. M. Kincaid (1999). Designs for detecting trends from repeated surveys of ecological resources. Journal of Agricultural, Biological, and Environmental Statistics, 4(4), 404-414.
Urquhart, N. S. (2012). The role of monitoring design in detecting trend in long-term ecological monitoring studies. In: Design and Analysis of Long-term Ecological Monitoring Studies. R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.). Cambridge University Press, New York, pp. 151-173.
ppd_plot
to plot power curves for panel designs
# Power for rotating panel with sample size 60
power_dsgn("Variable_Name",
  ind_values = 43, unit_var = 280, period_var = 4,
  unitperiod_var = 40, index_var = 90, unit_rho = 1, period_rho = 0,
  paneldsgn = list(NoR60 = revisit_dsgn(20,
    panels = list(NoR60 = list(
      n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None"
    )), begin = 1
  )),
  nrepeats = NULL, trend_type = "mean", trend = 1.0, alpha = 0.05
)
Plot power curves and relative power curves for trend detection for a set of panel designs, time periods, indicators, significance levels and trends. Trend may be based on percent change per period in the mean or percent change in the proportion of the cumulative distribution function above or below a fixed cut point. Types of plots are combinations of standard/relative, mean/percent, period/change and design/indicator. Input must be of class powerpaneldesign and is normally the output of function power_dsgn.
ppd_plot(
  object, plot_type = "standard", trend_type = "mean",
  xaxis_type = "period", comp_type = "design",
  dsgns = NULL, indicator = NULL, trend = NULL,
  period = NULL, alpha = NULL, ...
)
object |
List object of class |
plot_type |
Default is |
trend_type |
Character value for trend in mean ( |
xaxis_type |
Character value equal to |
comp_type |
Character value equal to |
dsgns |
Vector of names of panel designs that are to be plotted. Names
must be all, or a subset of, names of designs in |
indicator |
Vector of indicator names contained in |
trend |
|
period |
|
alpha |
A single value or vector of significance levels (as proportion,
e.g. |
... |
Additional arguments (S3 consistency) |
By default the plot function produces a standard power curve at the end of each time period on the x-axis with power on the y-axis. When more than one panel design is in object, the first panel design is used. When more than one indicator is in object, the first indicator is used. When more than one trend value is in object, the maximum trend value is used. When more than one significance level, alpha, is in object, the minimum significance level is used.
Control of the type of plot produced is governed by plot_type, trend_type, xaxis_type and comp_type. The number of plots produced is governed by the number of panel designs (dsgns) specified, the number of indicators (indicator) specified, the number of time periods (period) specified, the number of trend values (trend) specified and the number of significance levels (alpha) specified.
When the comparison type (comp_type) is equal to "design", all power curves specified by dsgns are plotted on the same plot. When comp_type is equal to "indicator", all power curves specified by indicator are plotted on the same plot. Typically, no more than 4-5 power curves should be plotted on the same plot.
One or more power curve plots are created and plotted. User must specify output graphical device if more than one plot is created. See Devices for graphical output options.
Tony Olsen
## Not run:
# Construct a rotating panel design with sample size of 60
R60N <- revisit_dsgn(20, panels = list(R60N = list(
  n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None"
)), begin = 1)
# Construct a fixed panel design with sample size of 60
F60 <- revisit_dsgn(20, panels = list(F60 = list(
  n = 60, pnl_dsgn = c(1, 0), pnl_n = NA, start_option = "None"
)), begin = 1)
# Power for rotating panel with sample size 60
Power_tst <- power_dsgn("Variable_Name",
  ind_values = 43, unit_var = 280, period_var = 4,
  unitperiod_var = 40, index_var = 90, unit_rho = 1, period_rho = 0,
  paneldsgn = list(R60N = R60N, F60 = F60),
  nrepeats = NULL, trend_type = "mean", trend = c(1.0, 2.0), alpha = 0.05
)
ppd_plot(Power_tst)
ppd_plot(Power_tst, dsgns = c("F60", "R60N"))
ppd_plot(Power_tst, dsgns = c("F60", "R60N"), trend = 1.0)
ppd_plot(Power_tst,
  plot_type = "relative", comp_type = "design",
  trend_type = "mean", trend = c(1, 2), dsgns = c("R60N", "F60"),
  indicator = "Variable_Name"
)
## End(Not run)
This function organizes input and output for relative risk analysis (of categorical variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
relrisk_analysis(
  dframe, vars_response, vars_stressor,
  response_levels = NULL, stressor_levels = NULL, subpops = NULL,
  siteID = NULL, weight = "weight", xcoord = NULL, ycoord = NULL,
  stratumID = NULL, clusterID = NULL, weight1 = NULL,
  xcoord1 = NULL, ycoord1 = NULL, sizeweight = FALSE,
  sweight = NULL, sweight1 = NULL, fpc = NULL, popsize = NULL,
  vartype = "Local", conf = 95, All_Sites = FALSE
)
dframe |
Data to be analyzed (analysis data). A data frame or
|
vars_response |
Vector composed of character values that identify the
names of response variables in |
vars_stressor |
Vector composed of character values that identify the
names of stressor variables in |
response_levels |
List providing the category values (levels) for each
element in the |
stressor_levels |
List providing the category values (levels) for each
element in the |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
siteID |
Character value providing the name of the site ID variable in
|
weight |
Character value providing the name of the design weight
variable in |
xcoord |
Character value providing name of the x-coordinate variable in
|
ycoord |
Character value providing name of the y-coordinate variable in
|
stratumID |
Character value providing the name of the stratum ID
variable in |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing the name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing the name of the stage one size
weight variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value
is |
All_Sites |
A logical variable used when |
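The fpc formats described in the arguments above can be sketched as plain R objects. This is only an illustration of the structure: the stratum names, cluster names, and counts below are hypothetical, and in a real analysis the names must match the stratum and cluster IDs in the analysis data.

```r
# Single-stage, unstratified design: a single numeric value.
fpc_single <- 15000

# Single-stage, stratified design: a named list with one value per stratum.
# Names (hypothetical here) must match the strata IDs.
fpc_single_strat <- list(Stratum_1 = 9000, Stratum_2 = 6000)

# Two-stage, unstratified design: a named vector whose first item is the
# number of clusters in the population (its name is arbitrary), followed by
# the number of stage two units for each sampled cluster.
fpc_two <- c(Ncluster = 200, Cluster_1 = 100, Cluster_2 = 75, Cluster_3 = 50)

# Two-stage, stratified design: a named list of vectors, each in the
# two-stage format shown above.
fpc_two_strat <- list(
  Stratum_1 = c(Ncluster_1 = 25, Cluster_1 = 100, Cluster_2 = 75),
  Stratum_2 = c(Ncluster_2 = 40, Cluster_3 = 50, Cluster_4 = 60)
)
```

With three sampled clusters, the two-stage unstratified vector has four items, which is the "one more than the number of clusters in the sample" rule stated in the fpc description.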
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and size of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
stressor variable
sample size
relative risk estimate
relative risk numerator estimate
relative risk denominator estimate
relative risk standard error
relative risk margin of error
xx% (default 95%) lower confidence bound
xx% (default 95%) upper confidence bound
sum of design weights
number of observations in the poor response and poor stressor group
number of observations in the poor response and good stressor group
number of observations in the good response and poor stressor group
number of observations in the good response and good stressor group
weighted proportion of observations in the poor response and poor stressor group
weighted proportion of observations in the poor response and good stressor group
weighted proportion of observations in the good response and poor stressor group
weighted proportion of observations in the good response and good stressor group
Relative risk measures the relative strength of association between conditional probabilities defined for a response variable and a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Relative risk is defined as the ratio of two conditional probabilities. The numerator of the ratio is the probability that the response variable is in poor condition given that the stressor variable is in poor condition. The denominator of the ratio is the probability that the response variable is in poor condition given that the stressor variable is in good condition. A relative risk value equal to one indicates that the response variable is independent of the stressor variable. Relative risk values greater than one measure the extent to which poor condition of the stressor variable is associated with poor condition of the response variable.
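The definition above can be worked through by hand with a weighted two-by-two table. The following base R sketch uses hypothetical weights and condition classes and is not the package's internal implementation; it ignores variance estimation entirely and only reproduces the point estimate as the ratio of two weighted conditional proportions.

```r
# Hypothetical design weights and condition classes for eight sites.
wgt      <- c(10, 20, 15, 5, 30, 10, 25, 5)
response <- c("Poor", "Poor", "Good", "Poor", "Good", "Good", "Poor", "Good")
stressor <- c("Poor", "Poor", "Poor", "Good", "Good", "Good", "Good", "Poor")

# Weighted 2 x 2 table: rows are response classes, columns are stressor classes.
tab <- tapply(wgt, list(response, stressor), sum)

# Numerator: P(response poor | stressor poor).
rr_num <- tab["Poor", "Poor"] / sum(tab[, "Poor"])
# Denominator: P(response poor | stressor good).
rr_den <- tab["Poor", "Good"] / sum(tab[, "Good"])

relrisk <- rr_num / rr_den
print(relrisk)
```

For these made-up data the numerator is 30/50 = 0.6 and the denominator is 30/70, so the relative risk estimate is 1.4: poor response condition is 1.4 times as likely where the stressor is poor as where it is good.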
Tom Kincaid [email protected]
attrisk_analysis
for attributable risk analysis
diffrisk_analysis
for risk difference analysis
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
  RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
  StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
relrisk_analysis(dframe,
  vars_response = myresponse, vars_stressor = mystressor,
  subpops = mysubpops, siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
Create a revisit design for panels in a survey that specifies the time periods in which the units of each panel are sampled. The design is found by searching for a D-optimal block design that is a member of the class of generalized Youden designs. The resulting design need not be a balanced incomplete block design. Based on an algorithmic idea by Cook and Nachtsheim (1989) and implemented by Robert Wheeler.
revisit_bibd(
  n_period,
  n_pnl,
  n_visit,
  nsamp,
  panel_name = "BIB",
  begin = 1,
  skip = 1,
  iter = 30
)
n_period |
Number of time periods for the survey design. Typically, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. (v, number of varieties/treatments in BIBD terms) |
n_pnl |
Number of panels (b, number of blocks in BIBD terms) |
n_visit |
Number of time periods to be visited in a panel (k, block size in BIBD terms) |
nsamp |
Number of samples in each panel. |
panel_name |
Prefix for name of each panel |
begin |
Numeric name of first sampling occasion, e.g. a specific period. |
skip |
Number of sampling occasions to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if |
iter |
Maximum number of iterations in search for D-optimal Generalized Youden Design. |
The function uses the find.BIB function from the crossdes package to search for a D-optimal block design. crossdes uses the AlgDesign package to search for balanced incomplete block designs.
A two-dimensional array of sample sizes to be sampled for each panel and each sampling occasion.
Tony Olsen [email protected]
Cook R. D. and C. Nachtsheim. (1989). Computer-aided blocking of factorial and response-surface designs. Technometrics 31(3), 339-346.
revisit_dsgn
to create a panel revisit design
revisit_rand
to create a panel revisit design with random assignment to panels and time periods
pd_summary
to summarize characteristics of a panel revisit design
# Balanced incomplete block design with 20 sample occasions, 20 panels,
# 3 visits to each unit, and 20 units in each panel.
revisit_bibd(n_period = 20, n_pnl = 20, n_visit = 3, nsamp = 20)
Create a revisit design for panels in a survey that specifies the time periods that members of each panel will be sampled. Three basic panel design structures may be created: always revisit panel, serially alternating panels, or rotating panels.
revisit_dsgn(n_period, panels, begin = 1, skip = 1)
n_period |
Number of time periods for the panel design. For example, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. |
panels |
List of lists where each list specifies a revisit panel
structure. Each sublist consists of four components: |
begin |
Numeric name of first sampling occasion, e.g. a specific period. |
skip |
Number of time periods to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if |
The function creates revisit designs using the concepts in McDonald (2003) to specify the revisit pattern across time periods for each panel. The panel revisit schedule is specified by a vector. Odd positions in vector specify the number of consecutive time periods when panel units are sampled. Even positions in vector specify the number of consecutive time periods when panel units are not sampled.
If the last even position is "0", then a single panel follows an always revisit panel structure. After satisfying the initial revisit schedule specified prior to the "0", units in a panel are always visited for the rest of the time periods. The simplest always revisit panel design is to revisit every sample unit on every time period, specified as pnl_dsgn = c(1, 0) or, using McDonald's notation, [1-0].
If the last even position is NA, the panels follow a rotating panel structure. For example, pnl_dsgn = c(1, NA) designates that sample units in a panel will be visited once and then never again, [1-n] in McDonald's notation. pnl_dsgn = c(1, 4, 1, NA) designates that sample units in a panel will be visited once, then not sampled on the next four time periods, then sampled again once at the next time period and then never sampled again, [1-4-1-n] in McDonald's notation.
If the last even position is > 0, the panels follow a serially alternating panel structure. For example, pnl_dsgn = c(1, 4) designates that sample units in a panel will be visited once, then not sampled during the next four time periods, then sampled once and not sampled for the next four time periods, with that cycle repeated until the end of the number of time periods, [1-4] in McDonald's notation. pnl_dsgn = c(2, 3, 1, 4) designates that the cycle has sample units in a panel being visited during two consecutive time periods, not sampled for three consecutive time periods, sampled for one time period and then not sampled on the next four time periods, with the cycle repeated until the end of the number of time periods, [2-3-1-4] in McDonald's notation.
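The three pnl_dsgn conventions described above (final entry "0", NA, or > 0) can be illustrated with a small base R helper. This is a minimal sketch, not the code used by revisit_dsgn: the function name visit_pattern is hypothetical, and it expands the pattern for a single panel that starts at time period 1, without staggering multiple panels or applying any start up option.

```r
# Hypothetical helper: expand a McDonald-style pnl_dsgn vector into a logical
# vector (TRUE = sampled) over n_period time periods for one panel.
visit_pattern <- function(pnl_dsgn, n_period) {
  stopifnot(length(pnl_dsgn) %% 2 == 0)
  k <- length(pnl_dsgn)
  last <- pnl_dsgn[k]
  # Run lengths before the final (even) position; odd positions are
  # "sampled" runs and even positions are "not sampled" runs.
  runs <- pnl_dsgn[-k]
  flags <- rep(seq_along(runs) %% 2 == 1, times = runs)
  if (is.na(last)) {
    # Rotating panel: never sampled again after the initial schedule.
    out <- c(flags, rep(FALSE, n_period))
  } else if (last == 0) {
    # Always revisit: sampled in every remaining time period.
    out <- c(flags, rep(TRUE, n_period))
  } else {
    # Serially alternating: repeat the full cycle until n_period is reached.
    cycle <- c(flags, rep(FALSE, last))
    out <- rep(cycle, length.out = n_period)
  }
  out[seq_len(n_period)]
}

# Time periods in which a [2-3-1-4] panel is sampled over 20 periods.
which(visit_pattern(c(2, 3, 1, 4), 20))
# Time periods in which a [1-4-1-n] panel is sampled over 10 periods.
which(visit_pattern(c(1, 4, 1, NA), 10))
```

Under these assumptions the [2-3-1-4] panel is sampled in periods 1, 2, 6, 11, 12, and 16, and the [1-4-1-n] panel is sampled in periods 1 and 6 only.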
The number of panels in a single panel design is specified by pnl_n. For an always revisit panel structure, a single panel is created and pnl_n is ignored. For a rotating panel structure, when pnl_n = NA, the number of panels is equal to n_period. Note that this should only be used when the rotating panel structure is the only panel design, i.e., no split panel design (see below for split panel details). If pnl_n = m is specified for a rotating panel design, then the number of panels will be m. For example, pnl_dsgn = c(1, 4, 1, NA) and pnl_n = 5 means that only 5 panels will be constructed and the last time period to be sampled will be time period 10. In McDonald's notation the panel design structure is [(1-4-1-n)^5]. If the number of time periods, n_period, is 20 and no other panel design structure is specified, then the last 10 time periods will not be sampled. For serially alternating panels, when pnl_n = NA, the number of panels will be the sum of the elements in pnl_dsgn (ignoring NA). If pnl_n is specified as m, then m panels will be created. For example, pnl_dsgn = c(1, 4, 1, 4) and pnl_n = 3, [(1-4-1-4)^3] in McDonald's notation, will create the first three panels of the 10 serially alternating panels specified by pnl_dsgn.
A serially alternating or rotating panel revisit design may not result in the same number of units being sampled during each time period, particularly during the initial start up period. The default is to not specify a start up option ("None"). Start up option "Partial_Begin" initiates the revisit design at the last time period scheduled for sampling in the first panel. For example, a [2-3-1-4] design starts at time period 6 instead of time period 1 under the "Partial_Begin" option. For a serially alternating panel structure, start up option "Partial_End" initiates the revisit design at the time period that begins the second serially alternating pattern. For example, a [2-3-1-4] design starts at time period 11 instead of time period 1. For a rotating panel structure design, use of "Partial_End" makes the assumption that the number of panels equals the number of time periods and adds units to the last "m" panels for time periods 1 to "m" as if the number of time periods was extended by "m", where "m" is one less than the sum of the panel design. For example, a [1-4-1-4-1-n] design would result in m = 10. Note that some designs with pnl_n not equal to the number of sample occasions can produce unexpected panel designs. See examples.
Different types of panel structures can be combined by specifying more than one list for the panels parameter; many authors term these split panels. The total number of panels is the sum of the number of panels in each of the panel structures specified by the split panel design.
A two-dimensional array of sample sizes to be sampled at each combination of panel and time period.
Tony Olsen [email protected]
McDonald, T. (2003). Review of environmental monitoring methods: survey designs. Environmental Monitoring and Assessment 85, 277-292.
revisit_bibd
to create a balanced incomplete block panel revisit design
revisit_rand
to create a revisit design with random assignment to panels and time periods
pd_summary
to summarize characteristics of a panel revisit design
# One panel of 60 sample units sampled at every time period: [1-0]
revisit_dsgn(20,
  panels = list(
    Annual = list(
      n = 60, pnl_dsgn = c(1, 0), pnl_n = NA,
      start_option = "None"
    )
  ),
  begin = 1
)

# Rotating panels of 60 units sampled once and never again: [1-n]. Number
# of panels equal n_period.
revisit_dsgn(20,
  panels = list(
    R60N = list(n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None")
  ),
  begin = 1
)

# Serially alternating panel with three visits to sample unit then skip
# next two time periods: [3-2]
revisit_dsgn(20,
  panels = list(
    SA60PE = list(
      n = 20, pnl_dsgn = c(3, 2), pnl_n = NA,
      start_option = "Partial_End"
    )
  ),
  begin = 1
)

# Split panel of sample units combining the first two panel designs: [1-0, 1-n]
revisit_dsgn(
  n_period = 20, begin = 2017,
  panels = list(
    Annual = list(
      n = 60, pnl_dsgn = c(1, 0), pnl_n = NA,
      start_option = "None"
    ),
    R60N = list(n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None")
  )
)
Create a revisit design for a survey that specifies the panels and time periods that will be sampled by random selection of panels and time periods. Three options for random assignments are "period", where the number of time periods to be sampled in a panel is fixed; "panel", where the number of panels to be sampled in a time period is fixed; and "none", where the number of panel-period combinations is fixed.
revisit_rand(
  n_period,
  n_pnl,
  rand_control = "period",
  n_visit,
  nsamp,
  panel_name = "Random",
  begin = 1,
  skip = 1
)
n_period |
Number of time periods for the survey design. Typically, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. (v, number of varieties (or treatments) in BIBD terms) |
n_pnl |
Number of panels |
rand_control |
Character value must be |
n_visit |
If |
nsamp |
Number of samples in each panel. |
panel_name |
Prefix for name of each panel |
begin |
Numeric name of first sampling occasion, e.g. a specific period. |
skip |
Number of sampling occasions to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if |
The revisit design for a survey is created by random selection of panels and time periods that will have sample events. The number of sample occasions that will be visited by a panel is random.
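One way to picture the "period" option, in which each panel is sampled in a fixed number of randomly chosen time periods, is the base R sketch below. It is an illustration under stated assumptions rather than the code inside revisit_rand: the values of n_period, n_pnl, n_visit, and nsamp and the panel labels are hypothetical, and the other two options ("panel" and "none") are not shown.

```r
set.seed(1)

# Hypothetical design dimensions.
n_period <- 6   # time periods
n_pnl    <- 4   # panels
n_visit  <- 2   # time periods sampled per panel (fixed under "period")
nsamp    <- 10  # units in each panel

# Panels in rows, time periods in columns; entries are sample sizes.
dsgn <- matrix(0,
  nrow = n_pnl, ncol = n_period,
  dimnames = list(paste0("Random_", seq_len(n_pnl)), seq_len(n_period))
)

# For each panel, pick n_visit distinct time periods at random.
for (i in seq_len(n_pnl)) {
  dsgn[i, sample(n_period, n_visit)] <- nsamp
}

dsgn
rowSums(dsgn > 0)  # every panel is visited exactly n_visit times
```

The result has the same shape as the value described for revisit_rand: a two-dimensional array of sample sizes for each panel and time period. Column totals vary from run to run because only the number of visits per panel is held fixed.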
A two-dimensional array of sample sizes to be sampled for each panel and each time period.
Tony Olsen [email protected]
revisit_bibd
to create a balanced incomplete block panel revisit design
revisit_dsgn
to create a panel revisit design
pd_summary
to summarize characteristics of a panel revisit design
revisit_rand(
  n_period = 20, n_pnl = 10, rand_control = "none", n_visit = 50,
  nsamp = 20
)
revisit_rand(
  n_period = 20, n_pnl = 10, rand_control = "panel", n_visit = 5,
  nsamp = 10
)
revisit_rand(
  n_period = 20, n_pnl = 10, rand_control = "period", n_visit = 5,
  nsamp = 10
)
This function measures the spatial balance (with respect to the sampling frame) of design sites using Voronoi polygons (Dirichlet tessellations).
sp_balance(
  object,
  sframe,
  stratum_var = NULL,
  ip = NULL,
  metrics = "pielou",
  extents = FALSE
)
object |
An |
sframe |
The sampling frame as an |
stratum_var |
The name of the stratum variable in |
ip |
Inclusion probabilities associated with each row of |
metrics |
A character vector of spatial balance metrics:
All spatial balance metrics have a lower bound of zero, which indicates perfect spatial balance. As the metric value increases, the spatial balance decreases. |
extents |
Should the extent (total units) within each Voronoi polygon
be returned? Defaults to |
A data frame with columns providing the stratum (stratum), spatial balance metric (metric), and spatial balance (value).
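To give a feel for how a metric such as the default "pielou" behaves, the sketch below computes an evenness-style index from the share of the frame that falls in each site's Voronoi polygon. The formula used here, one plus the sum of p * log(p) divided by log(n), is an assumption about how the index is defined, and the helper name pielou_balance and its inputs are hypothetical; sp_balance itself builds the polygons and handles inclusion probabilities.

```r
# Hypothetical helper: evenness-style spatial balance index.
# `extent` is the amount of the frame (e.g., number of frame units) assigned
# to each design site's Voronoi polygon; all values must be positive.
pielou_balance <- function(extent) {
  stopifnot(all(extent > 0))
  p <- extent / sum(extent)  # proportion of the frame in each polygon
  n <- length(p)
  1 + sum(p * log(p)) / log(n)
}

# Ten sites with equal extents: the index is (numerically) zero.
even <- pielou_balance(rep(1, 10))

# Four sites where one polygon holds most of the frame: the index is larger.
uneven <- pielou_balance(c(5, 1, 1, 1))

c(even = even, uneven = uneven)
```

This matches the behavior stated for all metrics: a value of zero indicates perfect spatial balance, and larger values indicate less spatial balance.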
Michael Dumelle [email protected]
## Not run:
sample <- grts(NE_Lakes, 30)
sp_balance(sample$sites_base, NE_Lakes)
strata_n <- c(low = 25, high = 30)
sample_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
sp_balance(sample_strat$sites_base, NE_Lakes,
  stratum_var = "ELEV_CAT", metrics = "rmse"
)
## End(Not run)
sp_frame objects
Turn sampling frames or analysis data into an sp_frame object or transform sp_frame objects back into their original object.
sp_frame(frame)
sp_unframe(sp_frame)
frame |
A sampling frame or analysis data |
sp_frame |
An |
The sp_frame() function assigns frame class sp_frame to be used by summary() and plot(). sp_frame objects can sometimes clash with other sf and tidyverse generics, so sp_unframe() removes class sp_frame, leaving the original classes of frame intact.
An sp_frame object.
NE_Lakes <- sp_frame(NE_Lakes)
class(NE_Lakes)
NE_Lakes <- sp_unframe(NE_Lakes)
class(NE_Lakes)
This function plots sampling frames, design sites, and analysis data. If the left-hand side of the formula is empty, plots are of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, plots are of the left-hand side variable for each level of each right-hand side variable.
This function is largely built on plot.sf(), and all spsurvey plotting methods can supply additional arguments to plot.sf(). For more information on plotting in sf, run ?sf::plot.sf(). Equivalent to spsurvey::plot(); both are currently maintained for backwards compatibility.
sp_plot(object, ...)

## Default S3 method:
sp_plot(
  object,
  formula = ~1,
  xcoord,
  ycoord,
  crs,
  var_args = NULL,
  varlevel_args = NULL,
  geom = FALSE,
  onlyshow = NULL,
  fix_bbox = TRUE,
  ...
)

## S3 method for class 'sp_design'
sp_plot(
  object,
  sframe = NULL,
  formula = ~siteuse,
  siteuse = NULL,
  var_args = NULL,
  varlevel_args = NULL,
  geom = FALSE,
  onlyshow = NULL,
  fix_bbox = TRUE,
  ...
)
object |
An object to plot. When plotting sampling frames or analysis data,
a data frame or |
... |
Additional arguments to pass to |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
xcoord |
Name of the x-coordinate (east-west) in |
ycoord |
Name of y (north-south)-coordinate in |
crs |
Projection code for |
var_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
varlevel_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
geom |
Should separate geometries for each level of the right-hand
side |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
fix_bbox |
Should the geometry bounding box be fixed across plots?
If a length-four vector with names "xmin", "ymin", "xmax", and "ymax" and values
indicating bounding box edges, the bounding box will be fixed as |
sframe |
The sampling frame (an |
siteuse |
A character vector of site types to include when plotting design sites.
It can only take on values |
Michael Dumelle [email protected]
## Not run:
data("NE_Lakes")
sp_plot(NE_Lakes, formula = ~ELEV_CAT)
sample <- grts(NE_Lakes, 30)
sp_plot(sample, NE_Lakes)
data("NLA_PNW")
sp_plot(NLA_PNW, formula = ~BMMI)
## End(Not run)
This function row binds the sites_legacy, sites_base, sites_over, and sites_near objects from a GRTS or IRS sample into a single sf object. This function is most useful when a single sf object that contains all design sites is desired (e.g., writing out a single shapefile using sf::write_sf()).
sp_rbind(object, siteuse = NULL)
object |
The design sites (output from |
siteuse |
A character vector of site types to return. Can contain
|
A single sf object containing all requested design sites.
Michael Dumelle [email protected]
## Not run:
sample <- grts(NE_Lakes, 50, n_over = 10)
sample <- sp_rbind(sample)
sf::write_sf(sample, "mypath/sample.shp")
## End(Not run)
sp_summary() summarizes sampling frames, design sites, and analysis data. The right-hand side of the formula specifies the variables (or factors) to summarize by. If the left-hand side of the formula is empty, the summary will be of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, the summary will be of the left-hand side variable for each level of each right-hand side variable. Equivalent to spsurvey::summary(); both are currently maintained for backwards compatibility.
sp_summary(object, ...)

## Default S3 method:
sp_summary(object, formula = ~1, onlyshow = NULL, ...)

## S3 method for class 'sp_design'
sp_summary(object, formula = ~siteuse, siteuse = NULL, onlyshow = NULL, ...)
object |
An object to summarize. When summarizing sampling frames,
an |
... |
Additional arguments to pass to |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
siteuse |
A character vector indicating the design sites
for which summaries are requested in |
If the left-hand side of the formula is empty, a named list containing summaries of the count distribution for each right-hand side variable is returned. If the left-hand side of the formula contains a variable, a named list containing five number summaries (numeric left-hand side) or tables (categorical or factor left hand side) is returned for each right-hand side variable.
Michael Dumelle [email protected]
## Not run:
data("NE_Lakes")
sp_summary(NE_Lakes, ELEV ~ 1)
sp_summary(NE_Lakes, ~ ELEV_CAT * AREA_CAT)
sample <- grts(NE_Lakes, 100)
sp_summary(sample, ~ ELEV_CAT * AREA_CAT)
## End(Not run)
This function prints the error messages vector in the grts and irs functions.
stopprnt(stop_df = get("stop_df", envir = .GlobalEnv), m = 1:nrow(stop_df))
stop_df |
Data frame that contains stop messages. The default is
|
m |
Vector of indices for stop messages that are to be printed. The
default is a vector containing the integers from 1 through the number of
rows in |
Printed errors
Tony Olsen [email protected]
summary() summarizes sampling frames, design sites, and analysis data. The right-hand side of the formula specifies the variables (or factors) to summarize by. If the left-hand side of the formula is empty, the summary will be of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, the summary will be of the left-hand side variable for each level of each right-hand side variable. Equivalent to sp_summary(); both are currently maintained for backwards compatibility.
## S3 method for class 'sp_frame'
summary(object, formula = ~1, onlyshow = NULL, ...)

## S3 method for class 'sp_design'
summary(object, formula = ~siteuse, siteuse = NULL, onlyshow = NULL, ...)
object |
An object to summarize. When summarizing sampling frames,
an |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
... |
Additional arguments to pass to |
siteuse |
A character vector indicating the design sites
for which summaries are requested in |
If the left-hand side of the formula is empty, a named list containing summaries of the count distribution for each right-hand side variable is returned. If the left-hand side of the formula contains a variable, a named list containing five number summaries (numeric left-hand side) or tables (categorical or factor left hand side) is returned for each right-hand side variable.
Michael Dumelle [email protected]
## Not run:
data("NE_Lakes")
summary(NE_Lakes, ELEV ~ 1)
summary(NE_Lakes, ~ ELEV_CAT * AREA_CAT)
sample <- grts(NE_Lakes, 100)
summary(sample, ~ ELEV_CAT * AREA_CAT)
## End(Not run)
This function organizes input and output for estimation of trend across time for a series of samples (for categorical and continuous variables). Trend is estimated using the analytical procedure identified by the model arguments. For categorical variables, the choices for the model_cat argument are: (1) simple linear regression, (2) weighted linear regression, and (3) generalized linear mixed-effects model. For continuous variables, the choices for the model_cont argument are: (1) simple linear regression, (2) weighted linear regression, and (3) linear mixed-effects model. The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
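The simple linear regression choice can be pictured as regressing one estimate per time period on the time period itself. The base R sketch below uses hypothetical yearly estimates and is not what trend_analysis runs internally; it ignores design weights, subpopulations, and the variance of each yearly estimate, all of which the function organizes for the user.

```r
# Hypothetical yearly estimates of the percent of a resource in "Good"
# condition, one value per survey year.
est <- data.frame(
  year = 2008:2013,
  pct_good = c(40, 42, 45, 44, 48, 50)
)

# Simple linear regression of the yearly estimate on year.
fit <- lm(pct_good ~ year, data = est)

# The slope is the estimated change in percent "Good" per year.
slope <- coef(fit)[["year"]]
slope

# Standard error and p-value for the slope.
summary(fit)$coefficients["year", ]
```

A weighted linear regression version would pass a weights argument to lm(), for example the inverse of each yearly estimate's variance; that choice of weights is an assumption made here for illustration.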
trend_analysis(
  dframe,
  vars_cat = NULL,
  vars_cont = NULL,
  subpops = NULL,
  model_cat = "SLR",
  cat_rhs = NULL,
  model_cont = "LMM",
  cont_rhs = NULL,
  siteID = "siteID",
  yearID = "year",
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  invprboot = TRUE,
  nboot = 1000,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)
dframe |
Data to be analyzed (analysis data). A data frame or
|
vars_cat |
Vector composed of character values that identify the names
of categorical response variables in |
vars_cont |
Vector composed of character values that identify the
names of continuous response variables in |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
model_cat |
Character value identifying the analytical procedure used
for trend estimation for categorical variables. The choices are:
|
cat_rhs |
Character value specifying the right hand side of the formula
for a generalized linear mixed-effects model. If a value is not provided,
the argument is assigned a value that specifies the Piepho and Ogutu (2002)
model. The default value is |
model_cont |
Character value identifying the analytical procedure used
for trend estimation for continuous variables. The choices are:
|
cont_rhs |
Character value specifying the right hand side of the
formula for a linear mixed-effects model. If a value is not provided, the
argument is assigned a value that specifies the Piepho and Ogutu (2002)
model. The default value is |
siteID |
Character value providing name of the site ID variable in
|
yearID |
Character value providing name of the time period variable in
|
weight |
Character value providing name of the design weight
variable in |
xcoord |
Character value providing name of the x-coordinate variable in
|
ycoord |
Character value providing name of the y-coordinate variable in
|
stratumID |
Character value providing name of the stratum ID variable in
|
clusterID |
Character value providing name of the cluster (stage one) ID
variable in |
weight1 |
Character value providing name of the stage one weight
variable in |
xcoord1 |
Character value providing name of the stage one x-coordinate
variable in |
ycoord1 |
Character value providing name of the stage one y-coordinate
variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing name of the size weight variable in
|
sweight1 |
Character value providing name of the stage one size weight
variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
invprboot |
Logical value that indicates whether the inverse probability
bootstrap procedure is used to calculate trend parameter estimates. This
bootstrap procedure is only available for the "LMM" option for continuous
variables. Inverse probability references the design weights, which
are the inverse of the sample inclusion probabilities. The default value
is |
nboot |
Numeric value for the number of bootstrap iterations. The
default is |
vartype |
Character value providing choice of the variance estimator,
where |
jointprob |
Character value providing choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where |
conf |
Numeric value for the Gaussian-based confidence level. The default is
|
All_Sites |
A logical variable used when |
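The four fpc layouts described in the argument list above can be sketched as plain R objects. This is only an illustration: every number and every stratum or cluster name below is hypothetical, and the single value in the single-stage case is assumed to be the population size. In a real call, the names must match the stratum and cluster IDs found in dframe.

```r
# Single-stage unstratified design: a single numeric value
# (assumed here to be the population size)
fpc_single <- 15000

# Single-stage stratified design: a named list with one value per stratum;
# the names must match the stratum IDs
fpc_single_strat <- list(Stratum_1 = 9000, Stratum_2 = 6000)

# Two-stage unstratified design: a named vector whose first item is the
# number of clusters in the population (its name is arbitrary) and whose
# remaining items give the number of stage two units in each sampled cluster
fpc_two <- c(
  Ncluster = 200,
  Cluster_1 = 100,
  Cluster_2 = 100,
  Cluster_3 = 50,
  Cluster_4 = 50
)

# Two-stage stratified design: a named list of such vectors, one per stratum
fpc_two_strat <- list(
  Stratum_1 = c(Ncluster_1 = 100, Cluster_1 = 125, Cluster_2 = 100),
  Stratum_2 = c(Ncluster_2 = 80, Cluster_3 = 50, Cluster_4 = 75)
)
```

For the two-stage unstratified case, the vector length is one more than the number of sampled clusters, as the argument description requires (here, four clusters and five items).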
The analysis results. A list composed of two data frames containing trend estimates for all combinations of population Types, subpopulations within Types, and response variables. For categorical variables, trend estimates are calculated for each category of the variable. The two data frames in the output list are:
catsum
data frame containing trend estimates for categorical variables
contsum
data frame containing trend estimates for continuous variables
For the SLR and WLR model options, the data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
trend estimate
trend standard error
trend xx% (default 95%) lower confidence bound
trend xx% (default 95%) upper confidence bound
trend p-value
intercept estimate
intercept standard error
intercept xx% (default 95%) lower confidence bound
intercept xx% (default 95%) upper confidence bound
intercept p-value
R-squared value
adjusted R-squared value
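The quantities listed above correspond to standard output from a linear model fit. The following is a minimal sketch, not the package's internal code, of how such a row could be assembled with lm from the stats package. The yearly estimates, their variances, the helper summarize_fit, and all column names are hypothetical. The confidence bounds use a normal quantile, following the Gaussian-based description of the conf argument, while the p-values are the t-based values that lm reports.

```r
# Hypothetical yearly design-based estimates (e.g., means) and their variances
est <- data.frame(
  year = c(2000, 2005, 2010, 2015, 2020),
  estimate = c(10.2, 10.4, 10.9, 11.1, 11.6),
  variance = c(0.04, 0.05, 0.03, 0.06, 0.04)
)

# Shift years so that the intercept refers to the first year
est$year0 <- est$year - min(est$year)

# SLR uses ordinary least squares; WLR weights each year by 1 / variance
slr <- lm(estimate ~ year0, data = est)
wlr <- lm(estimate ~ year0, data = est, weights = 1 / variance)

# Collect trend, intercept, and fit statistics from a fitted lm object
summarize_fit <- function(fit, conf = 95) {
  sm <- summary(fit)
  cf <- sm$coefficients
  z <- qnorm(0.5 + conf / 200) # Gaussian multiplier, e.g. about 1.96 for 95
  data.frame(
    trend_estimate = cf["year0", "Estimate"],
    trend_std_error = cf["year0", "Std. Error"],
    trend_lcb = cf["year0", "Estimate"] - z * cf["year0", "Std. Error"],
    trend_ucb = cf["year0", "Estimate"] + z * cf["year0", "Std. Error"],
    trend_p_value = cf["year0", "Pr(>|t|)"],
    intercept_estimate = cf["(Intercept)", "Estimate"],
    intercept_std_error = cf["(Intercept)", "Std. Error"],
    intercept_p_value = cf["(Intercept)", "Pr(>|t|)"],
    r_squared = sm$r.squared,
    adj_r_squared = sm$adj.r.squared
  )
}

res <- rbind(SLR = summarize_fit(slr), WLR = summarize_fit(wlr))
res
```

The two rows differ only through the weights argument, which is the sole difference between the SLR and WLR options.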
For the GLMM and LMM model options, contents of the data frames will vary depending on the model specified by arguments cat_rhs and cont_rhs. For the default PO model, the data frame contains the following variables:
subpopulation (domain) name
subpopulation name within a domain
response variable
trend estimate
trend standard error
trend xx% (default 95%) lower confidence bound
trend xx% (default 95%) upper confidence bound
trend p-value
intercept estimate
intercept standard error
intercept xx% (default 95%) lower confidence bound
intercept xx% (default 95%) upper confidence bound
intercept p-value
variance of the site intercepts
variance of the site trends
correlation of site intercepts and site trends
year variance
residual variance
generalized Akaike Information Criterion
For the simple linear regression (SLR) model, a design-based estimate of the category proportion (categorical variables) or the mean (continuous variables) is calculated for each time period (year). Four choices of variance estimator are available for calculating variance of the design-based estimates: (1) the local mean estimator, (2) the simple random sampling estimator, (3) the Horvitz-Thompson estimator, and (4) the Yates-Grundy estimator. For the Horvitz-Thompson and Yates-Grundy estimators, there are three choices for calculating joint inclusion probabilities: (1) the Overton approximation, (2) the Hartley-Rao approximation, and (3) the Brewer approximation. The lm function in the stats package is used to fit a linear model using a formula argument that specifies the proportion or mean estimates as the response variable and years as the regressor variable. For fitting the SLR model, the yearID variable from the dframe argument is modified by subtracting the minimum value of years from all values of the variable. Parameter estimates are extracted from the object returned by the lm function. For the weighted linear regression (WLR) model, the process is the same as the SLR model except that the inverse of the variances of the proportion or mean estimates is used as the weights argument in the call to the lm function.
For the LMM option, the lmer function in the lme4 package is used to fit a linear mixed-effects model for trend across years. For both the GLMM and LMM options, the default Piepho and Ogutu (PO) model includes fixed effects for intercept and trend (slope) and random effects for intercept and trend for individual sites, where the siteID variable from the dframe argument identifies sites. Correlation between the random effects for site intercepts and site trends is included in the model. Finally, the PO model contains random effects for year variance and residual variance. For the GLMM and LMM options, arguments cat_rhs and cont_rhs, respectively, can be used to specify the right hand side of the model formula. Internally, a variable named Wyear is created that is useful for specifying the cat_rhs and cont_rhs arguments. The Wyear variable is created by subtracting the minimum value of the yearID variable from all values of the variable.
If argument invprboot is FALSE, parameter estimates are extracted from the object returned by the lmer function. If argument invprboot is TRUE, the boot function in the boot package is used to generate bootstrap replicates using a function named bootfcn as the statistic argument passed to the boot function. For each bootstrap replicate, bootfcn calls the glmer or lmer function, as appropriate, using the specified model. Design weights identified by the weight argument for the trend_analysis function are passed as the weights argument for the boot function, which specifies importance weights. Using the design weights as the weights argument ensures that bootstrap replicates are representative of the survey population. Parameter estimates are calculated using the object returned by the boot function.
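The bootstrap step described above can be sketched with the boot function from the boot package. This is a simplified illustration under several assumptions, not the package's bootfcn: the data are simulated, the statistic refits an ordinary lm rather than calling lmer or glmer (a deliberate substitution that avoids a dependency on lme4), rows rather than sites are resampled, and summarizing the replicates by their mean and standard deviation is one possible choice. The names dat, trend_stat, and boot_out are hypothetical.

```r
library(boot)

set.seed(42)

# Hypothetical panel: 20 sites, each visited in 5 years, one weight per site
dat <- data.frame(
  siteID = rep(paste0("Site", 1:20), each = 5),
  year = rep(seq(2000, 2020, by = 5), times = 20),
  wgt = rep(runif(20, 10, 100), each = 5)
)

# Wyear mirrors the internal variable: year minus the minimum year
dat$Wyear <- dat$year - min(dat$year)
dat$y <- 10 + 0.05 * dat$Wyear + rnorm(nrow(dat))

# Statistic function: refit the trend model on the resampled rows and
# return the intercept and slope (lm stands in for lmer in this sketch)
trend_stat <- function(data, indices) {
  coef(lm(y ~ Wyear, data = data[indices, ]))
}

# Design weights are passed as importance weights, so rows with larger
# weights are resampled more often
boot_out <- boot(
  data = dat,
  statistic = trend_stat,
  R = 200,
  weights = dat$wgt
)

# One simple summary of the replicates (second column holds the slope)
trend_estimate <- mean(boot_out$t[, 2])
trend_std_error <- sd(boot_out$t[, 2])
```

Here R = 200 keeps the sketch fast; the nboot argument of trend_analysis plays the corresponding role.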
Tom Kincaid [email protected]
change_analysis for change analysis
# Example using a categorical variable with three resource classes and a
# continuous variable
mydframe <- data.frame(
  siteID = rep(paste0("Site", 1:40), rep(5, 40)),
  yearID = rep(seq(2000, 2020, by = 5), 40),
  wgt = rep(runif(40, 10, 100), rep(5, 40)),
  xcoord = rep(runif(40), rep(5, 40)),
  ycoord = rep(runif(40), rep(5, 40)),
  All_Sites = rep("All Sites", 200),
  Region = sample(c("North", "South"), 200, replace = TRUE),
  Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE),
  ContVar = rnorm(200, 10, 1)
)
myvars_cat <- c("Resource_Class")
myvars_cont <- c("ContVar")
mysubpops <- c("All_Sites", "Region")
trend_analysis(
  dframe = mydframe,
  vars_cat = myvars_cat,
  vars_cont = myvars_cont,
  subpops = mysubpops,
  model_cat = "WLR",
  model_cont = "SLR",
  siteID = "siteID",
  yearID = "yearID",
  weight = "wgt",
  xcoord = "xcoord",
  ycoord = "ycoord"
)
This function prints the warning messages from the grts(), irs(), and analysis functions.
warnprnt(warn_df = get("warn_df", envir = .GlobalEnv), m = 1:nrow(warn_df))
warn_df |
Data frame that contains warning messages. The default is
|
m |
Vector of indices for warning messages that are to be printed. The
default is a vector containing the integers from 1 through the number of
rows in |
Printed warnings.
Tom Kincaid [email protected]