--- title: "Start Here" author: "Michael Dumelle, Tom Kincaid, Anthony Olsen, and Marc Weber" output: html_document: theme: flatly number_sections: true highlighted: default toc: yes toc_float: collapsed: no smooth_scroll: no toc_depth: 2 vignette: > %\VignetteIndexEntry{Start Here} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE ) ``` # Installing and loading spsurvey If this is your first time using the spsurvey package, run ```{r, eval = FALSE} install.packages("spsurvey") ``` to install the package. You only need to run this code once per version of R. After the spsurvey package is installed, load it into R each new R session by running ```{r setup} library(spsurvey) ``` # Citation information If you used spsurvey in your work, please cite it. You can view the most recent citation by running ```{r} citation("spsurvey") ``` # spsurvey terminology spsurvey implements a design-based approach to statistical inference, with a focus on spatial data. There are a few terms helpful to define before we move forward with spsurvey, as these terms will be used throughout the vignettes and documentation: * Survey design: All aspects of a survey from establishment of a need for data to the production of final results. * Sampling frame: The set of all sites (i.e. units) from which a sample is selected. * Sampling design: The process by which sites are selected from the sampling frame. * Design sites: The set of sites selected from the sampling frame according to the sampling design. Sometimes the term "sample" is used to refer to a collection of design sites (e.g. the term "GRTS sample" more formally means "the collection of design sites selected using the GRTS algorithm according to the sampling design"). * Analysis data: Data collected at the design sites. This includes information from the survey design such as design weights, stratification variables, subpopulation variables, etc. * Analysis results: The results from an appropriate statistical analysis of the analysis data. # Vignettes in spsurvey There are three additional vignettes in spsurvey: 1. Exploratory Data Analysis: Summarizing and Visualizing Sampling Frames, Design Sites, and Analysis Data * Use the `summary()` and `plot()` functions to summarize and visualize sampling frames, design sites, and analysis data * To view this vignette, run `vignette("EDA", "spsurvey")` 2. Spatially Balanced Sampling * Use the `grts()` function to implement the Generalized Random Tessellation Stratified (GRTS) algorithm (Stevens and Olsen, 2004) to select spatially balanced samples * To view this vignette, run `vignette("sampling", "spsurvey")`. 3. Analyzing Data * Use several analysis functions to analyze data * To view this vignette, run `vignette("analysis", "spsurvey")`. These vignettes cover some of the core functions (and arguments within those functions) in spsurvey. To learn more about features of spsurvey that are not covered in these vignettes, we encourage you to read spsurvey's documentation available for download [here](https://cran.r-project.org/package=spsurvey) or viewable interactively on our website [here](https://usepa.github.io/spsurvey/). Help files for a particular function are viewable by running `?function_name` after loading spsurvey. For example, to learn more about the `grts()` function, run `?grts`. # Installing a previous version of spsurvey The version 5.0.0 update to spsurvey implemented many significant changes to existing functions. As a result, some of your old code may not run properly while using version 5.0.0. Though we recommend adapting your code to work with the version 5.0.0, you may also install a previous version of spsurvey. For information regarding the installation of previous version of R packages, please see the RStudio support page [here](https://support.posit.co/hc/en-us/articles/219949047-Installing-older-versions-of-packages). Additionally, old versions of spsurvey are also available for download in the release tags section of our GitHub repository [here](https://github.com/USEPA/spsurvey/releases). # `sf` objects The sampling functions in spsurvey (`grts()` and `irs()`) require that your sampling frame is an `sf` object. An `sf` object (shorthand for a "simple features" object) is an R object with a unique structure used to conveniently store spatial data. `sf` objects are constructed using the sf package (Pebesma, 2018). The sf package is loaded and installed alongside the spsurvey package, so you do not need to run `install.packages("sf")` or `library(sf)` to access the sf package if spsurvey is already installed and loaded. For more on the sf package, see [here](https://cran.r-project.org/package=sf). Next we discuss a few ways to construct `sf` objects in R. The first is to read a shapefile directly into R using `sf::read_sf()`. The second is to use the `sf::st_sf()` function or the `sf::st_as_sf()` function to combine an appropriate R object (most commonly a data frame) and an appropriate geometry object into an `sf` object. To illustrate one approach for turning a data frame into an `sf` object, we start with `NE_Lakes_df`, a data frame in spsurvey that contains variables and geographic coordinates (latitude and longitude coordinates) for lakes in the Northeastern United States. To turn `NE_Lakes_df` into `NE_Lakes_geo`, an `sf` object with geographic coordinates, run ```{r} NE_Lakes_geo <- st_as_sf(NE_Lakes_df, coords = c("XCOORD", "YCOORD"), crs = 4326) NE_Lakes_geo ``` The `coords` argument to `sf::st_as_sf` specifies the columns in `NE_Lakes_df` that are the x-coordinates and y-coordinates. The `crs` argument specifies the coordinate reference system, which we discuss in more detail next. # Coordinate reference systems Spatial data and `sf` objects rely on coordinate reference systems. A coordinate reference system (CRS) provides a structure by which to identify unique locations on the Earth's surface. CRSs are either geographic or projected. A geographic CRS uses longitude (east-west direction) and latitude (north-south direction) coordinates to represent location with respect to a specific ellipsoid or spheroid surface. Geographic CRSs are measured in degrees, *not units like meters or feet* -- this has important consequences. For example, a one degree difference in latitude is different at different longitudes. Projected CRSs are measured in standard Cartesian coordinates with respect to a flat surface. They have x and y locations, an origin, and a unit of measurement (like meters or feet). You can move between coordinate systems using `sf::st_transform()`. For example, we can transform `NE_Lakes_geo` (which uses a geographic CRS) to `NE_Lakes` (which uses a projected CRS) by running ```{r} NE_Lakes <- st_transform(NE_Lakes_geo, crs = 5070) NE_Lakes ``` CRSs in R have traditionally been stored using EPSG codes or `proj4string` values. This meant that in order to transform your coordinates from one CRS to another, you needed two EPSG codes or `proj4string` values, one for each CRS. Recent updates to R's handling of spatial data follow GDAL and PROJ (more information available [here](https://r-spatial.org/r/2020/03/17/wkt.html)), and CRSs in `sf` objects are stored in R as lists with two components: `input`, which contains information regarding the EPSG code and `proj4string`; and `wkt`, an open geospatial standard format. For more information on CRSs and EPSG codes, see Pebesma (2018) and Lovelace et al. (2019). To search for various CRSs and EPSG codes, see [here](https://epsg.io/) and [here](https://spatialreference.org). spsurvey will use the CRS from your `sf` object, so it is your responsibility to make sure the `sf` object has an appropriate CRS. If the CRS is not specified correctly, you may get misleading results. # References Lovelace, R., Nowosad, J., & Muenchow, J. (2019). Geocomputation with R. CRC Press. Pebesma, E., (2018). Simple Features for R: Standardized Support for Spatial Vector Data. *The R Journal*, 10 (1):439-446. https://doi.org/10.32614/RJ-2018-009 Stevens Jr, D. L. and Olsen, A. R. (2004). Spatially balanced sampling of natural resources. *Journal of the American Statistical Association*, 99(465):262-278.