Start Here

Installing and loading spsurvey

If this is your first time using the spsurvey package, run

install.packages("spsurvey")

to install the package. You only need to run this code once per version of R.

After the spsurvey package is installed, load it into R each new R session by running

library(spsurvey)

Citation information

If you used spsurvey in your work, please cite it. You can view the most recent citation by running

citation("spsurvey")
#> To cite spsurvey in publications use:
#> 
#>   Michael Dumelle, Tom Kincaid, Anthony R. Olsen, Marc Weber (2023).
#>   spsurvey: Spatial Sampling Design and Analysis in R. Journal of
#>   Statistical Software, 105(3), 1-29. doi:10.18637/jss.v105.i03
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {{spsurvey}: Spatial Sampling Design and Analysis in {R}},
#>     author = {Michael Dumelle and Tom Kincaid and Anthony R. Olsen and Marc Weber},
#>     journal = {Journal of Statistical Software},
#>     year = {2023},
#>     volume = {105},
#>     number = {3},
#>     pages = {1--29},
#>     doi = {10.18637/jss.v105.i03},
#>   }

spsurvey terminology

spsurvey implements a design-based approach to statistical inference, with a focus on spatial data. There are a few terms helpful to define before we move forward with spsurvey, as these terms will be used throughout the vignettes and documentation:

  • Survey design: All aspects of a survey from establishment of a need for data to the production of final results.
  • Sampling frame: The set of all sites (i.e. units) from which a sample is selected.
  • Sampling design: The process by which sites are selected from the sampling frame.
  • Design sites: The set of sites selected from the sampling frame according to the sampling design. Sometimes the term “sample” is used to refer to a collection of design sites (e.g. the term “GRTS sample” more formally means “the collection of design sites selected using the GRTS algorithm according to the sampling design”).
  • Analysis data: Data collected at the design sites. This includes information from the survey design such as design weights, stratification variables, subpopulation variables, etc.
  • Analysis results: The results from an appropriate statistical analysis of the analysis data.

Vignettes in spsurvey

There are three additional vignettes in spsurvey:

  1. Exploratory Data Analysis: Summarizing and Visualizing Sampling Frames, Design Sites, and Analysis Data
    • Use the summary() and plot() functions to summarize and visualize sampling frames, design sites, and analysis data
    • To view this vignette, run vignette("EDA", "spsurvey")
  2. Spatially Balanced Sampling
    • Use the grts() function to implement the Generalized Random Tessellation Stratified (GRTS) algorithm (Stevens and Olsen, 2004) to select spatially balanced samples
    • To view this vignette, run vignette("sampling", "spsurvey").
  3. Analyzing Data
    • Use several analysis functions to analyze data
    • To view this vignette, run vignette("analysis", "spsurvey").

These vignettes cover some of the core functions (and arguments within those functions) in spsurvey. To learn more about features of spsurvey that are not covered in these vignettes, we encourage you to read spsurvey’s documentation available for download here or viewable interactively on our website here. Help files for a particular function are viewable by running ?function_name after loading spsurvey. For example, to learn more about the grts() function, run ?grts.

Installing a previous version of spsurvey

The version 5.0.0 update to spsurvey implemented many significant changes to existing functions. As a result, some of your old code may not run properly while using version 5.0.0. Though we recommend adapting your code to work with the version 5.0.0, you may also install a previous version of spsurvey. For information regarding the installation of previous version of R packages, please see the RStudio support page here. Additionally, old versions of spsurvey are also available for download in the release tags section of our GitHub repository here.

sf objects

The sampling functions in spsurvey (grts() and irs()) require that your sampling frame is an sf object. An sf object (shorthand for a “simple features” object) is an R object with a unique structure used to conveniently store spatial data. sf objects are constructed using the sf package (Pebesma, 2018). The sf package is loaded and installed alongside the spsurvey package, so you do not need to run install.packages("sf") or library(sf) to access the sf package if spsurvey is already installed and loaded. For more on the sf package, see here.

Next we discuss a few ways to construct sf objects in R. The first is to read a shapefile directly into R using sf::read_sf(). The second is to use the sf::st_sf() function or the sf::st_as_sf() function to combine an appropriate R object (most commonly a data frame) and an appropriate geometry object into an sf object. To illustrate one approach for turning a data frame into an sf object, we start with NE_Lakes_df, a data frame in spsurvey that contains variables and geographic coordinates (latitude and longitude coordinates) for lakes in the Northeastern United States. To turn NE_Lakes_df into NE_Lakes_geo, an sf object with geographic coordinates, run

NE_Lakes_geo <- st_as_sf(NE_Lakes_df, coords = c("XCOORD", "YCOORD"), crs = 4326)
NE_Lakes_geo
#> Simple feature collection with 195 features and 4 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -73.64778 ymin: 41.07065 xmax: -69.96715 ymax: 42.73616
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>         AREA AREA_CAT   ELEV ELEV_CAT                   geometry
#> 1  10.648825    large 264.69     high POINT (-72.08896 42.55508)
#> 2   2.504606    small 557.63     high POINT (-73.18199 42.36727)
#> 3   3.979199    small  28.79      low POINT (-71.14074 42.15596)
#> 4   1.645657    small 212.60     high   POINT (-73.06726 41.783)
#> 5   7.489052    small 239.67     high  POINT (-72.2602 42.36255)
#> 6  86.533725    large 195.37     high POINT (-71.74634 41.87624)
#> 7   1.926996    small 158.96     high POINT (-73.48408 41.34238)
#> 8   6.514217    small  29.26      low POINT (-73.25487 41.20551)
#> 9   3.100221    small 204.62     high POINT (-72.20897 42.12512)
#> 10  1.868094    small  78.77      low POINT (-72.70233 42.18012)

The coords argument to sf::st_as_sf specifies the columns in NE_Lakes_df that are the x-coordinates and y-coordinates. The crs argument specifies the coordinate reference system, which we discuss in more detail next.

Coordinate reference systems

Spatial data and sf objects rely on coordinate reference systems. A coordinate reference system (CRS) provides a structure by which to identify unique locations on the Earth’s surface. CRSs are either geographic or projected. A geographic CRS uses longitude (east-west direction) and latitude (north-south direction) coordinates to represent location with respect to a specific ellipsoid or spheroid surface. Geographic CRSs are measured in degrees, not units like meters or feet – this has important consequences. For example, a one degree difference in latitude is different at different longitudes. Projected CRSs are measured in standard Cartesian coordinates with respect to a flat surface. They have x and y locations, an origin, and a unit of measurement (like meters or feet).

You can move between coordinate systems using sf::st_transform(). For example, we can transform NE_Lakes_geo (which uses a geographic CRS) to NE_Lakes (which uses a projected CRS) by running

NE_Lakes <- st_transform(NE_Lakes_geo, crs = 5070)
NE_Lakes
#> Simple feature collection with 195 features and 4 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1834001 ymin: 2225021 xmax: 2127632 ymax: 2449985
#> Projected CRS: NAD83 / Conus Albers
#> First 10 features:
#>         AREA AREA_CAT   ELEV ELEV_CAT                geometry
#> 1  10.648825    large 264.69     high POINT (1930929 2417191)
#> 2   2.504606    small 557.63     high POINT (1849399 2375085)
#> 3   3.979199    small  28.79      low POINT (2017323 2393723)
#> 4   1.645657    small 212.60     high POINT (1874135 2313865)
#> 5   7.489052    small 239.67     high POINT (1922712 2392868)
#> 6  86.533725    large 195.37     high POINT (1977163 2350744)
#> 7   1.926996    small 158.96     high POINT (1852292 2257784)
#> 8   6.514217    small  29.26      low POINT (1874421 2247388)
#> 9   3.100221    small 204.62     high POINT (1933352 2368181)
#> 10  1.868094    small  78.77      low POINT (1892582 2364213)

CRSs in R have traditionally been stored using EPSG codes or proj4string values. This meant that in order to transform your coordinates from one CRS to another, you needed two EPSG codes or proj4string values, one for each CRS. Recent updates to R’s handling of spatial data follow GDAL and PROJ (more information available here), and CRSs in sf objects are stored in R as lists with two components: input, which contains information regarding the EPSG code and proj4string; and wkt, an open geospatial standard format. For more information on CRSs and EPSG codes, see Pebesma (2018) and Lovelace et al. (2019). To search for various CRSs and EPSG codes, see here and here.

spsurvey will use the CRS from your sf object, so it is your responsibility to make sure the sf object has an appropriate CRS. If the CRS is not specified correctly, you may get misleading results.

References

Lovelace, R., Nowosad, J., & Muenchow, J. (2019). Geocomputation with R. CRC Press.

Pebesma, E., (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10 (1):439-446. https://doi.org/10.32614/RJ-2018-009

Stevens Jr, D. L. and Olsen, A. R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465):262-278.