Analyzing Data

If you have yet not read the “Start Here” vignette, please do so by running

vignette("start-here", "spsurvey")

Introduction

spsurvey’s analysis functions are used to analyze data in a variety of contexts. We focus mainly on using the NLA_PNW analysis data to introduce some of these analysis functions. The NLA_PNW data contains response variables measured at several lakes in the Pacific Northwest Region (PNW) of the United States. There are several variables in NLA_PNW you will use throughout this vignette:

  • SITE_ID: a unique site identifier
  • WEIGHT: the design weights
  • URBAN: urban land categories (Urban and Non-Urban)
  • STATE: state name (California, Oregon, and Washington)
  • BMMI: benthic macroinvertebrate multi-metric index
  • BMMI_COND: benthic macroinvertebrate multi-metric index condition categories (Poor and Good)
  • PHOS_COND: phosphorous condition categories (Poor and Good)
  • NITR_COND: nitrogen condition categories (Poor Fair, and Good)

Before proceeding, we load spsurvey by running

library(spsurvey)

Categorical Variable Analysis

Categorical variables are analyzed in spsurvey using the cat_analysis() function. The cat_analysis() function estimates the proportion of observations and the total units (i.e. extent) that belong to each level of the categorical variable (total units refer to the total number (point resources), total line length (linear network), or total area (areal network)). Several useful pieces of information are returned by cat_analysis(), including estimates, standard errors, margins of error, and confidence intervals. The analysis results contain columns with a .P and .U suffixes. The .P suffix corresponds to estimates of proportions for each category, while the .U suffix corresponds to estimates of total units (i.e. extent) for each category.

Unstratified Analysis

To estimate the proportion of total lakes in each nitrogen condition category and the total number of lakes in each nitrogen condition category, run

cat_ests <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT"
)
cat_ests
#>        Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 All_Sites     All Sites NITR_COND     Fair    24   23.69392   6.194024
#> 2 All_Sites     All Sites NITR_COND     Good    38   51.35111   7.430172
#> 3 All_Sites     All Sites NITR_COND     Poor    34   24.95496   5.919180
#> 4 All_Sites     All Sites NITR_COND    Total    96  100.00000   0.000000
#>   MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1        12.14006   11.55386   35.83399   2530.428   693.5593        1359.351
#> 2        14.56287   36.78824   65.91398   5484.120  1223.3708        2397.763
#> 3        11.60138   13.35359   36.55634   2665.103   658.0966        1289.846
#> 4         0.00000  100.00000  100.00000  10679.652  1416.2707        2775.840
#>   LCB95Pct.U UCB95Pct.U
#> 1   1171.077   3889.780
#> 2   3086.357   7881.883
#> 3   1375.258   3954.949
#> 4   7903.812  13455.491

The estimate of the proportion of lakes in Good condition is 51.35% with a 95% confidence interval of (36.8%, 65.9%), while the estimate of the total number of lakes in Good condition is 5484 lakes with a 95% confidence interval of (3086, 7882). In each case, the estimated standard error and margin of error is given. The confidence level can be changed using the conf argument. If more than one categorical variable is of interest, then vars can be a vector of variables and separate analyses are performed for each variable.

Sometimes the goal is to estimate proportions and totals separately for different subsets of the population – these subsets are called subpopulations. To estimate the proportion of total lakes and in each nitrogen condition category the total number of lakes in each nitrogen condition category separately for California, Oregon, and Washington lakes, run

cat_ests_sp <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  subpop = "STATE"
)
cat_ests_sp
#>     Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1  STATE    California NITR_COND     Fair     6   8.239162   4.712515
#> 2  STATE    California NITR_COND     Good     8  73.722638  13.359879
#> 3  STATE    California NITR_COND     Poor     5  18.038200  11.909638
#> 4  STATE    California NITR_COND    Total    19 100.000000   0.000000
#> 5  STATE        Oregon NITR_COND     Fair     8  27.152211   9.763817
#> 6  STATE        Oregon NITR_COND     Good    26  59.670307   9.717093
#> 7  STATE        Oregon NITR_COND     Poor    13  13.177483   4.901892
#> 8  STATE        Oregon NITR_COND    Total    47 100.000000   0.000000
#> 9  STATE    Washington NITR_COND     Fair    10  30.396139  11.938582
#> 10 STATE    Washington NITR_COND     Good     4  22.711979  11.878964
#> 11 STATE    Washington NITR_COND     Poor    16  46.891882  13.041148
#> 12 STATE    Washington NITR_COND    Total    30 100.000000   0.000000
#>    MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1         9.236359   0.000000   17.47552   208.4605   87.63415        171.7598
#> 2        26.184881  47.537757   99.90752  1865.2692  940.12797       1842.6170
#> 3        23.342462   0.000000   41.38066   456.3876  299.29104        586.5997
#> 4         0.000000 100.000000  100.00000  2530.1172  978.65831       1918.1350
#> 5        19.136729   8.015482   46.28894  1298.8470  526.66732       1032.2490
#> 6        19.045152  40.625155   78.71546  2854.3752  674.02641       1321.0675
#> 7         9.607532   3.569950   22.78501   630.3551  198.49966        389.0522
#> 8         0.000000 100.000000  100.00000  4783.5773  706.53216       1384.7776
#> 9        23.399190   6.996949   53.79533  1023.1210  437.69351        857.8635
#> 10       23.282341   0.000000   45.99432   764.4755  453.48899        888.8221
#> 11       25.560181  21.331701   72.45206  1578.3606  556.63044       1090.9756
#> 12        0.000000 100.000000  100.00000  3365.9571  741.89397       1454.0855
#>    LCB95Pct.U UCB95Pct.U
#> 1    36.70069   380.2202
#> 2    22.65222  3707.8861
#> 3     0.00000  1042.9873
#> 4   611.98220  4448.2523
#> 5   266.59801  2331.0960
#> 6  1533.30774  4175.4427
#> 7   241.30288  1019.4072
#> 8  3398.79970  6168.3549
#> 9   165.25751  1880.9845
#> 10    0.00000  1653.2976
#> 11  487.38500  2669.3362
#> 12 1911.87165  4820.0426

If more than one type of subpopulation is of interest, then subpop can be a vector of subpopulation variables and separate analyses are performed for each subpopulation. If both vars and subpops are vectors, separate analyses are performed for each variable and subpopulation combination.

Analysis results for all sites (ignoring subpopulations) can be presented alongside the subpopulation analysis results using the All_Sites argument:

cat_ests_sp <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  subpop = "STATE",
  All_Sites = TRUE
)
cat_ests_sp
#>         Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1      STATE    California NITR_COND     Fair     6   8.239162   4.712515
#> 2      STATE    California NITR_COND     Good     8  73.722638  13.359879
#> 3      STATE    California NITR_COND     Poor     5  18.038200  11.909638
#> 4      STATE    California NITR_COND    Total    19 100.000000   0.000000
#> 5      STATE        Oregon NITR_COND     Fair     8  27.152211   9.763817
#> 6      STATE        Oregon NITR_COND     Good    26  59.670307   9.717093
#> 7      STATE        Oregon NITR_COND     Poor    13  13.177483   4.901892
#> 8      STATE        Oregon NITR_COND    Total    47 100.000000   0.000000
#> 9      STATE    Washington NITR_COND     Fair    10  30.396139  11.938582
#> 10     STATE    Washington NITR_COND     Good     4  22.711979  11.878964
#> 11     STATE    Washington NITR_COND     Poor    16  46.891882  13.041148
#> 12     STATE    Washington NITR_COND    Total    30 100.000000   0.000000
#> 13 All_Sites     All Sites NITR_COND     Fair    24  23.693923   6.194024
#> 14 All_Sites     All Sites NITR_COND     Good    38  51.351112   7.430172
#> 15 All_Sites     All Sites NITR_COND     Poor    34  24.954965   5.919180
#> 16 All_Sites     All Sites NITR_COND    Total    96 100.000000   0.000000
#>    MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1         9.236359   0.000000   17.47552   208.4605   87.63415        171.7598
#> 2        26.184881  47.537757   99.90752  1865.2692  940.12797       1842.6170
#> 3        23.342462   0.000000   41.38066   456.3876  299.29104        586.5997
#> 4         0.000000 100.000000  100.00000  2530.1172  978.65831       1918.1350
#> 5        19.136729   8.015482   46.28894  1298.8470  526.66732       1032.2490
#> 6        19.045152  40.625155   78.71546  2854.3752  674.02641       1321.0675
#> 7         9.607532   3.569950   22.78501   630.3551  198.49966        389.0522
#> 8         0.000000 100.000000  100.00000  4783.5773  706.53216       1384.7776
#> 9        23.399190   6.996949   53.79533  1023.1210  437.69351        857.8635
#> 10       23.282341   0.000000   45.99432   764.4755  453.48899        888.8221
#> 11       25.560181  21.331701   72.45206  1578.3606  556.63044       1090.9756
#> 12        0.000000 100.000000  100.00000  3365.9571  741.89397       1454.0855
#> 13       12.140064  11.553860   35.83399  2530.4285  693.55933       1359.3513
#> 14       14.562870  36.788242   65.91398  5484.1199 1223.37078       2397.7627
#> 15       11.601379  13.353585   36.55634  2665.1033  658.09658       1289.8456
#> 16        0.000000 100.000000  100.00000 10679.6517 1416.27075       2775.8397
#>    LCB95Pct.U UCB95Pct.U
#> 1    36.70069   380.2202
#> 2    22.65222  3707.8861
#> 3     0.00000  1042.9873
#> 4   611.98220  4448.2523
#> 5   266.59801  2331.0960
#> 6  1533.30774  4175.4427
#> 7   241.30288  1019.4072
#> 8  3398.79970  6168.3549
#> 9   165.25751  1880.9845
#> 10    0.00000  1653.2976
#> 11  487.38500  2669.3362
#> 12 1911.87165  4820.0426
#> 13 1171.07716  3889.7798
#> 14 3086.35722  7881.8826
#> 15 1375.25770  3954.9489
#> 16 7903.81199 13455.4913

Stratified Analysis

To estimate the proportion of total lakes in each nitrogen condition category and the total number of lakes in each nitrogen condition category stratified by URBAN category (whether the lake is classified as Urban or Non-Urban), run

strat_cat_ests <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  stratumID = "URBAN"
)
strat_cat_ests
#>        Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 All_Sites     All Sites NITR_COND     Fair    24   23.69392   6.027083
#> 2 All_Sites     All Sites NITR_COND     Good    38   51.35111   7.472377
#> 3 All_Sites     All Sites NITR_COND     Poor    34   24.95496   5.882487
#> 4 All_Sites     All Sites NITR_COND    Total    96  100.00000   0.000000
#>   MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1        11.81287   11.88106   35.50679   2530.428   683.8837        1340.387
#> 2        14.64559   36.70552   65.99670   5484.120  1229.9485        2410.655
#> 3        11.52946   13.42550   36.48443   2665.103   653.6357        1281.102
#> 4         0.00000  100.00000  100.00000  10679.652  1440.4796        2823.288
#>   LCB95Pct.U UCB95Pct.U
#> 1   1190.041   3870.816
#> 2   3073.465   7894.775
#> 3   1384.001   3946.206
#> 4   7856.363  13502.940

To then compute these estimates separately for California, Oregon, and Washington, run

strat_cat_ests_sp <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  stratumID = "URBAN",
  subpop = "STATE"
)
strat_cat_ests_sp
#>     Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1  STATE    California NITR_COND     Fair     6   8.239162   5.481100
#> 2  STATE    California NITR_COND     Good     8  73.722638  13.653267
#> 3  STATE    California NITR_COND     Poor     5  18.038200  12.512366
#> 4  STATE    California NITR_COND    Total    19 100.000000   0.000000
#> 5  STATE        Oregon NITR_COND     Fair     8  27.152211   9.592356
#> 6  STATE        Oregon NITR_COND     Good    26  59.670307   9.845022
#> 7  STATE        Oregon NITR_COND     Poor    13  13.177483   5.325784
#> 8  STATE        Oregon NITR_COND    Total    47 100.000000   0.000000
#> 9  STATE    Washington NITR_COND     Fair    10  30.396139  11.383474
#> 10 STATE    Washington NITR_COND     Good     4  22.711979   9.039351
#> 11 STATE    Washington NITR_COND     Poor    16  46.891882  12.109215
#> 12 STATE    Washington NITR_COND    Total    30 100.000000   0.000000
#>    MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1         10.74276   0.000000   18.98192   208.4605   87.91849        172.3171
#> 2         26.75991  46.962726  100.00000  1865.2692  953.91452       1869.6381
#> 3         24.52379   0.000000   42.56199   456.3876  306.37270        600.4795
#> 4          0.00000 100.000000  100.00000  2530.1172  994.94678       1950.0599
#> 5         18.80067   8.351538   45.95288  1298.8470  531.02949       1040.7987
#> 6         19.29589  40.374417   78.96620  2854.3752  685.86488       1344.2705
#> 7         10.43835   2.739137   23.61583   630.3551  177.79364        348.4691
#> 8          0.00000 100.000000  100.00000  4783.5773  737.53853       1445.5490
#> 9         22.31120   8.084941   52.70734  1023.1210  435.45383        853.4738
#> 10        17.71680   4.995177   40.42878   764.4755  433.94100        850.5087
#> 11        23.73363  23.158256   70.62551  1578.3606  554.91047       1087.6045
#> 12         0.00000 100.000000  100.00000  3365.9571  748.29764       1466.6364
#>    LCB95Pct.U UCB95Pct.U
#> 1     36.1434   380.7775
#> 2      0.0000  3734.9073
#> 3      0.0000  1056.8671
#> 4    580.0574  4480.1771
#> 5    258.0483  2339.6457
#> 6   1510.1048  4198.6457
#> 7    281.8859   978.8242
#> 8   3338.0283  6229.1262
#> 9    169.6472  1876.5948
#> 10     0.0000  1614.9842
#> 11   490.7561  2665.9652
#> 12  1899.3207  4832.5935

Continuous Variable Analysis

Continuous variables are analyzed in spsurvey using the cont_analysis() function. The cont_analysis() function estimates cumulative distribution functions (CDFs), percentiles, and means of continuous variables. By default, all these quantities are estimated (though this can be changed using the statsitics argument to cont_analysis()). For the quantities requiring estimation, several useful pieces of information are returned by cont_analysis(), including estimates, standard errors, margins of error, and confidence intervals. The .P suffix corresponds to estimates of proportions for each variable, while the .U suffix corresponds to estimates of total units (i.e. extent) for each variable.

Unstratified Analysis

To estimate the cumulative distribution function (CDF), certain percentiles, and means of BMMI, run

cont_ests <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT"
)

To view the analysis results for the mean estimates, run

cont_ests$Mean
#>        Type Subpopulation Indicator nResp Estimate StdError MarginofError
#> 1 All_Sites     All Sites      BMMI    96 56.50929 1.782278        3.4932
#>   LCB95Pct UCB95Pct
#> 1 53.01609 60.00249

Similarly, the CDF and select percentile estimates can be viewed (the output is omitted here) by running

cont_ests$CDF
cont_ests$Pct

To visualize the CDF estimates, run

plot(cont_ests$CDF)

The solid line indicates the CDF estimates, and the dashed lines indicate lower and upper 95% confidence interval bounds for the CDF estimates. cdf_plot() can equivalently be used in place of plot() (cdf_plot() is currently maintained for backwards compatibility with previous spsurvey versions).

To estimate the CDF, certain percentiles, and means of BMMI separately for each state, run

cont_ests_sp <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT",
  subpop = "STATE"
)

To view the analysis results for the mean estimates, run

cont_ests_sp$Mean
#>    Type Subpopulation Indicator nResp Estimate StdError MarginofError LCB95Pct
#> 1 STATE    California      BMMI    19 50.48964 4.049094      7.936079 42.55357
#> 2 STATE        Oregon      BMMI    47 61.29675 2.581032      5.058730 56.23802
#> 3 STATE    Washington      BMMI    30 54.23036 3.143924      6.161977 48.06838
#>   UCB95Pct
#> 1 58.42572
#> 2 66.35548
#> 3 60.39234

Stratified Analysis

To estimate the CDF, certain percentiles, and means of BMMI for a design stratified by URBAN category, run

strat_cont_ests <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT",
  stratumID = "URBAN"
)

To view the analysis results for the mean estimates, run

strat_cont_ests$Mean
#>        Type Subpopulation Indicator nResp Estimate StdError MarginofError
#> 1 All_Sites     All Sites      BMMI    96 56.50929 1.795959      3.520015
#>   LCB95Pct UCB95Pct
#> 1 52.98928 60.02931

To then compute these estimates separately for each state, run

strat_cont_ests_sp <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT",
  stratumID = "URBAN",
  subpop = "STATE",
)

To view the analysis results for the mean estimates, run

strat_cont_ests_sp$Mean
#>    Type Subpopulation Indicator nResp Estimate StdError MarginofError LCB95Pct
#> 1 STATE    California      BMMI    19 50.48964 4.110046      8.055543 42.43410
#> 2 STATE        Oregon      BMMI    47 61.29675 1.630357      3.195441 58.10131
#> 3 STATE    Washington      BMMI    30 54.23036 2.930517      5.743708 48.48665
#>   UCB95Pct
#> 1 58.54519
#> 2 64.49219
#> 3 59.97407

Additional Analysis Approaches

Alongside the cat_analysis() and cont_analysis() functions, spsurvey offers functions for estimating attributable risk, relative risk, risk difference, change, and trend. Attributable risk analysis, relative risk analysis, and risk difference analysis quantify the attributable risk, relative risk, and risk difference, respectively, of environmental resources being in poor condition after exposure to a stressor. Attributable risk analysis is performed using the attrisk_analysis() function, relative risk analysis is performed using the relrisk_analysis() function, and risk difference analysis is performed using the diffrisk_analysis() function. Change and trend analysis capture the behavior of environmental resources between two samples, while trend analysis generalizes this approach to include more than two samples. Often, change and trend analyses are performed to study an environmental resource through time. Change analysis is performed using the change_analysis() function, and trend analysis is performed using the trend_analysis() function. The attrisk_analysis(), relrisk_analysis(), diffrisk_analysis(), change_analysis() and trend_analysis() functions all share very similar syntax with the cat_analysis() and cont_analysis() functions.

Attributable Risk, Relative Risk, and Risk Difference Analysis

The attributable risk is defined as $$1 - \frac{P(Response = Poor | Stressor = Good)}{P(Response = Poor)},$$ where P(⋅) is a probability and P(⋅|⋅) is a conditional probability. The attributable risk measures the proportion of the response variable in poor condition that could be eliminated if the stressor was always in good condition. To estimate the attributable risk of benthic macroinvertebrates with a phosphorous condition stressor, run

attrisk_ests <- attrisk_analysis(
  NLA_PNW, 
  siteID = "SITE_ID",
  vars_response = "BMMI_COND",
  vars_stressor = "PHOS_COND",
  weight = "WEIGHT"
)
attrisk_ests
#>        Type Subpopulation  Response  Stressor nResp  Estimate StdError_log
#> 1 All_Sites     All Sites BMMI_COND PHOS_COND    96 0.6201042     0.624808
#>   MarginofError_log  LCB95Pct  UCB95Pct WeightTotal Count_RespPoor_StressPoor
#> 1          1.224601 -0.292713 0.8883582    10679.65                         5
#>   Count_RespPoor_StressGood Count_RespGood_StressPoor Count_RespGood_StressGood
#> 1                         7                        18                        66
#>   Prop_RespPoor_StressPoor Prop_RespPoor_StressGood Prop_RespGood_StressPoor
#> 1               0.03971418               0.01738181                 0.158931
#>   Prop_RespGood_StressGood
#> 1                 0.783973

The relative risk is defined as $$\frac{P(Response = Poor | Stressor = Poor)}{P(Response = Poor | Stressor = Good)},$$ which measures the risk of the response variable being in poor condition relative to the stressor’s condition. To estimate the relative risk of benthic macroinvertebrates being in poor condition with a phosphorous condition category stressor, run

relrisk_ests <- relrisk_analysis(
  NLA_PNW, 
  siteID = "SITE_ID",
  vars_response = "BMMI_COND",
  vars_stressor = "PHOS_COND",
  weight = "WEIGHT"
)
relrisk_ests
#>        Type Subpopulation  Response  Stressor nResp Estimate Estimate_num
#> 1 All_Sites     All Sites BMMI_COND PHOS_COND    96 9.217166    0.1999252
#>   Estimate_denom StdError_log MarginofError_log LCB95Pct UCB95Pct WeightTotal
#> 1     0.02169053    0.8753855          1.715724 1.657555 51.25389    10679.65
#>   Count_RespPoor_StressPoor Count_RespPoor_StressGood Count_RespGood_StressPoor
#> 1                         5                         7                        18
#>   Count_RespGood_StressGood Prop_RespPoor_StressPoor Prop_RespPoor_StressGood
#> 1                        66               0.03971418               0.01738181
#>   Prop_RespGood_StressPoor Prop_RespGood_StressGood
#> 1                 0.158931                 0.783973

The risk difference is defined as P(Response = Poor|Stressor = Poor) − P(Response = Poor|Stressor = Good), which measures the risk of the response variable being in poor condition differenced by the stressor’s condition. To estimate the risk difference of benthic macroinvertebrates being in poor condition with a phosphorous condition category stressor, run

diffrisk_ests <- diffrisk_analysis(
  NLA_PNW, 
  siteID = "SITE_ID",
  vars_response = "BMMI_COND",
  vars_stressor = "PHOS_COND",
  weight = "WEIGHT"
)
diffrisk_ests
#>        Type Subpopulation  Response  Stressor nResp  Estimate
#> 1 All_Sites     All Sites BMMI_COND PHOS_COND    96 0.1782347
#>   Estimate_StressPoor Estimate_StressGood  StdError MarginofError   LCB95Pct
#> 1           0.1999252          0.02169053 0.1139557      0.223349 -0.0451143
#>    UCB95Pct WeightTotal Count_RespPoor_StressPoor Count_RespPoor_StressGood
#> 1 0.4015837    10679.65                         5                         7
#>   Count_RespGood_StressPoor Count_RespGood_StressGood Prop_RespPoor_StressPoor
#> 1                        18                        66               0.03971418
#>   Prop_RespPoor_StressGood Prop_RespGood_StressPoor Prop_RespGood_StressGood
#> 1               0.01738181                 0.158931                 0.783973

By default, the levels of the variables in vars_response and vars_stressor are assumed to equal "Poor" (event occurs) or "Good" (event does not occur). If those default levels do not match the levels of the variables in vars_response and vars_stressor, the levels of vars_response and vars_stressor must be explicitly stated using the response_levels and stressor_levels arguments, respectively. Similar to cat_analysis() and cont_analysis() from the previous sections, subpopulations and stratification are incorporated using subpops and stratumID, respectively. For more on attributable and relative risk in an environmental resource context, see Van Sickle and Paulsen (2008).

Change and Trend Analysis

To demonstrate change analysis, we use the NRSA_EPA7 data. There are several variables in NRSA_EPA7 you will use next:

  • SITE_ID: a unique site identifier
  • WEIGHT: the survey design weights
  • NITR_COND: nitrogen condition category (Good, Fair, and Poor)
  • BMMI: benthic macroinvertebrate multi-metric index
  • YEAR: probability sample (survey) year

To estimate the change between samples (time points) for BMMI (a continuous variable) and NITR_COND (a categorical variable), run

change_ests <- change_analysis(
  NRSA_EPA7,
  siteID = "SITE_ID",
  vars_cont = "BMMI",
  vars_cat = "NITR_COND",
  surveyID = "YEAR",
  weight = "WEIGHT"
)

The surveyID argument is the variable in the data distinguishing the different samples (YEAR in the previous example).

To view the analysis results for NITR_COND (the categorical variable), run

change_ests$catsum
#>   Survey_1 Survey_2      Type Subpopulation Indicator     Category  DiffEst.P
#> 1  2008-09  2013-14 All_Sites     All Sites NITR_COND         Fair -1.8867976
#> 2  2008-09  2013-14 All_Sites     All Sites NITR_COND         Good -2.9648182
#> 3  2008-09  2013-14 All_Sites     All Sites NITR_COND Not Assessed -0.2447633
#> 4  2008-09  2013-14 All_Sites     All Sites NITR_COND         Poor  5.0963791
#>   StdError.P MarginofError.P  LCB95Pct.P UCB95Pct.P  DiffEst.U StdError.U
#> 1  3.5366139       6.9316358  -8.8184334  5.0448382 -2919.8839  4990.4714
#> 2  4.7633367       9.3359683 -12.3007865  6.3711502 -4597.9365  6772.4175
#> 3  0.2072643       0.4062305  -0.6509938  0.1614672  -363.1305   306.1656
#> 4  5.8020788      11.3718655  -6.2754864 16.4682446  6598.4548 19777.9218
#>   MarginofError.U LCB95Pct.U UCB95Pct.U nResp_1 Estimate.P_1 StdError.P_1
#> 1       9781.1441 -12701.028   6861.260      37   11.2929433    2.7579622
#> 2      13273.6943 -17871.631   8675.758      22   18.5076549    3.6712147
#> 3        600.0735   -963.204    236.943       1    0.2447633    0.2072643
#> 4      38764.0144 -32165.560  45362.469     119   69.9546385    4.3636869
#>   MarginofError.P_1 LCB95Pct.P_1 UCB95Pct.P_1 Estimate.U_1 StdError.U_1
#> 1         5.4055067     5.887437   16.6984500   16754.1954    3998.9767
#> 2         7.1954486    11.312206   25.7031034   27457.9317    5388.5407
#> 3         0.4062305     0.000000    0.6509938     363.1305     306.1656
#> 4         8.5526691    61.401969   78.5073076  103784.6070   12266.4461
#>   MarginofError.U_1 LCB95Pct.U_1 UCB95Pct.U_1 nResp_2 Estimate.P_2 StdError.P_2
#> 1         7837.8504     8916.345    24592.046      28     9.406146     2.213884
#> 2        10561.3456    16896.586    38019.277      34    15.542837     3.035055
#> 3          600.0735        0.000      963.204       0     0.000000     0.000000
#> 4        24041.7926    79742.814   127826.400     112    75.051018     3.823919
#>   MarginofError.P_2 LCB95Pct.P_2 UCB95Pct.P_2 Estimate.U_2 StdError.U_2
#> 1          4.339133     5.067013     13.74528     13834.31     2985.463
#> 2          5.948599     9.594238     21.49144     22860.00     4102.349
#> 3          0.000000     0.000000      0.00000         0.00        0.000
#> 4          7.494743    67.556274     82.54576    110383.06    15514.525
#>   MarginofError.U_2 LCB95Pct.U_2 UCB95Pct.U_2
#> 1          5851.400     7982.911     19685.71
#> 2          8040.456    14819.539     30900.45
#> 3             0.000        0.000         0.00
#> 4         30407.911    79975.151    140790.97

Estimates are provided for the difference between the two samples and for each of the two individual samples (the _1 and _2 suffixes).

To view the analysis results for BMMI (the continuous variable), run

change_ests$contsum_mean
#>   Survey_1 Survey_2      Type Subpopulation Indicator  DiffEst StdError
#> 1  2008-09  2013-14 All_Sites     All Sites      BMMI 3.971559 2.561155
#>   MarginofError  LCB95Pct UCB95Pct nResp_1 Estimate_1 StdError_1
#> 1      5.019771 -1.048211  8.99133     179   25.88274   1.468744
#>   MarginofError_1 LCB95Pct_1 UCB95Pct_1 nResp_2 Estimate_2 StdError_2
#> 1        2.878686   23.00405   28.76142     174    29.8543   2.098166
#>   MarginofError_2 LCB95Pct_2 UCB95Pct_2
#> 1        4.112331   25.74196   33.96663

Though we don’t show an example here, the median can be estimated using the test argument in change_anlaysis().

Trend analysis generalizes change analysis to more than two samples (usually time points). Though we omit an example here, the arguments to trend_analysis() are very similar to the arguments for change_analysis(). One difference is that trend_analysis() contains arguments that specify which statistical model to apply to the estimates from each sample.

Variance Estimation

The default variance estimator in spsurvey is the local neighborhood variance estimator (Stevens and Olsen, 2003). The local neighborhood variance estimator incorporates the spatial locations of design sites into the variance estimation process. Because the local neighborhood variance estimator incorporates spatial locations, the resulting variance estimate tends to be smaller than variance estimates from approaches ignoring spatial locations. The local neighborhood variance estimator requires x-coordinates and y-coordinates of the design sites. When the analysis data is an sf object, the coordinates from the data’s geometry are used. When the analysis data is just a data frame, x-coordinates and y-coordinates (from an appropriate coordinate reference system) must be provided using the xcoord and ycoord arguments, respectively, of the analysis function being used.

Several additional variance estimation options are available in spsurvey’s analysis functions through the vartype and jointprob arguments.

References

Sickle, J. V., & Paulsen, S. G. (2008). Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. Journal of the North American Benthological Society, 27(4), 920-931.

Stevens Jr, D. L., & Olsen, A. R. (2003). Variance estimation for spatially balanced samples of environmental resources. Environmetrics, 14(6), 593-610.