In this document, users will be introduced to different methods to access computational toxicology and exposure data. The two main methods described are through the CompTox Chemicals Dashboard and the CTX APIs, via the R package ctxR. References to papers and additional information on resources can be found within the exposition of this document. For more detailed information on specific areas of data, please refer to the other vignettes included within ctxR.
Accessing chemical data is a vital step in many workflows related to chemical, biological, and environmental modeling. While there are many resources available from which one can pull data, the CompTox Chemicals Dashboard (CCD), built and maintained by the United States Environmental Protection Agency, is particularly well-designed and suitable for these purposes. Originally introduced in the paper “The CompTox Chemistry Dashboard: a community data resource for environmental chemistry”, the CCD contains information on over 1.2 million chemicals as of May 2024 and has been cited 612 times according to CrossRef. To learn more about the CCD, please visit the page About CCD.
The CCD includes chemical information from many different domains, including physicochemical, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay data. For information on data sources and current versions, please review the CCD Release Notes. The CCD provides a graphical user interface that allows for an interactive user experience and is easy to navigate. As such, users can explore the data available on the CCD without any programming background.
The CCD can be queried for one chemical at a time or using batch search.
In single-substance search, the user enters a full or partial chemical identifier (name, CASRN, InChIKey, or DSSTox ID) into a search box on the CCD homepage. Autocomplete can provide a list of possible matches. Figure 1 shows an example: the CCD landing page for the chemical Bisphenol A. This page is generated for Bisphenol A and links to but does not represent other chemicals. Each chemical in the CCD has its own page similar to this, with varying levels of detail depending on the information that is available.
The different domains of data available for this chemical are shown by the tabs on the left side of the page: for example, “Physchem Prop.” (physico-chemical properties), “Env. Fate/Transport” (environmental fate and transport data), and “Hazard Data” (in vivo hazard and toxicity data), among others.
In batch search, the user enters a list of search inputs, separated by new lines, into a search box. The user selects the type(s) of inputs by selecting one or more checkboxes; the options include chemical identifiers, monoisotopic masses, and molecular formulas. Then, the user selects “Display All Chemicals” to display the list of substances matching the batch-search inputs, or “Choose Export Options” to choose options for exporting the batch-search results as a spreadsheet. The exported spreadsheet may include data from most of the domains available on an individual substance’s CCD page.
The user can download the selected information in various formats, such as Excel (.xlsx), comma-separated values (.csv), or different types of chemical table files (e.g., MOL).
The web interface for batch search only allows input of 10,000 identifiers at a time. If a user wants to retrieve information for more than 10,000 chemicals, identifiers will need to be separated into multiple batches and searched separately.
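Splitting a long identifier list into batches can be scripted rather than done by hand. The following is a minimal sketch in base R, assuming the identifiers are held in a character vector. The helper name `split_into_batches` and the placeholder identifiers are made up for illustration and are not part of the CCD or ctxR.

```r
# Hypothetical helper: split a vector of identifiers into batches no larger
# than the web interface's 10,000-identifier limit.
split_into_batches <- function(ids, batch_size = 10000) {
  # Assign each identifier a batch number: 1 for the first `batch_size`
  # entries, 2 for the next `batch_size`, and so on.
  batch_index <- ceiling(seq_along(ids) / batch_size)
  unname(split(ids, batch_index))
}

# Placeholder identifiers for illustration only (not real DTXSIDs).
ids <- sprintf("ID%05d", 1:25000)
batches <- split_into_batches(ids)

length(batches)   # number of batches
lengths(batches)  # size of each batch
```

Each element of `batches` could then be pasted into the batch-search box in turn, which keeps every search under the input limit while preserving the original order of the identifiers.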
Researchers evaluating many chemicals may follow a workflow that looks something like this:
Because each of these workflow steps requires manual interaction with the search and download process, the risk of human error inevitably creeps in. Here are a few real-world possibilities (the authors can neither confirm nor deny that they have personally committed any of these errors):
Moreover, the manual stages of this kind of workflow are also non-transparent and not easily reproducible. Utilizing APIs for data exploration and retrieval can alleviate these concerns.
Recently, the US EPA’s Center for Computational Toxicology and Exposure (CCTE) developed a set of Application Programming Interfaces (APIs) that allows programmatic access to the CCD, bypassing the manual steps of the web-based batch search workflow. APIs effectively automate the process of accessing and downloading the data that populates the CCD.
The Computational Toxicology and Exposure (CTX) APIs are publicly available at no cost to the user. However, in order to use the CTX APIs, users must have an individual API key. The API key uniquely identifies the user to the CCD servers and verifies that the user has permission to access the database. Getting an API key is free, but requires contacting the API support team at [email protected].
The APIs are organized into sets of “endpoints” by data domains: Chemical, Hazard, and Bioactivity. An endpoint provides access to a specific set of information and data, e.g. physical-chemical properties for a chemical that the user specifies. A view from the Chemical API web interface is pictured below.
On the left side of each domain’s web interface page, there will be several different tabs listed depending on information requests available within the domain. In Figure 4, the Chemical Details Resource endpoint provides basic chemical information; the Chemical Property Resource endpoint provides more comprehensive physico-chemical property information; the Chemical Fate Resource endpoint provides chemical fate and transport information; and so on.
Authentication, found in the upper-left tab on each web interface page, is required to use the APIs. To authenticate in the API web interface, the user must input their unique API key. To request an API key, please contact the API support team at [email protected].
The APIs automate data access via requests using the Hypertext Transfer Protocol (HTTP), which enables communication between clients (e.g. your computer) and servers (e.g. the CCD).
In the CTX API web interface, the colored boxes next to each endpoint indicate the type of the associated HTTP method. GET is used to request data from a specific web resource (e.g. a specific URL); POST is used to send data to a server to create or update a web resource. For the CTX APIs, POST requests are used to perform multiple (batch) searches in a single API call; GET requests are used for non-batch searches.
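To make the GET versus POST distinction concrete, here is a rough sketch of how the two kinds of request might be written with the httr and jsonlite packages. The base URL, the endpoint path, and the `x-api-key` header name are assumptions made for illustration and should be checked against the CTX API web interface. The helper names are hypothetical and are not part of ctxR.

```r
# Assumed base URL and path for the chemical details endpoint; check the
# CTX API web interface for the authoritative values.
base_url <- "https://api-ccte.epa.gov"
details_path <- "chemical/detail/search/by-dtxsid/"

# GET: one chemical per request, identified in the URL itself.
get_details_single <- function(dtxsid, api_key) {
  httr::GET(
    url = paste0(base_url, "/", details_path, dtxsid),
    httr::add_headers("x-api-key" = api_key)  # assumed header name
  )
}

# POST: many chemicals per request, sent as a JSON array in the request body.
batch_body <- function(dtxsids) {
  as.character(jsonlite::toJSON(dtxsids))
}

get_details_batch <- function(dtxsids, api_key) {
  httr::POST(
    url = paste0(base_url, "/", details_path),
    httr::add_headers("x-api-key" = api_key),  # assumed header name
    httr::content_type_json(),
    body = batch_body(dtxsids)
  )
}

# The JSON body a batch request would carry. A real batch would list many
# DTXSIDs; only Bisphenol A is shown here.
batch_body("DTXSID7020182")
```

Nothing is sent to the server until `get_details_single()` or `get_details_batch()` is called with a valid key, so the block can be read and sourced without network access.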
You do not need to understand the details of POST and GET requests in order to use the API. Let’s consider constructing an API request to Get data by dtxsid under the Chemical Details Resource.
The web interface has two subheadings:
This request has an optional projection parameter, a string that can take one of five values (chemicaldetailall, chemicaldetailstandard, chemicalidentifier, chemicalstructure, ntatoolkit). Depending on the value of this string, the API can return different sets of information about the chemical. If the projection parameter is left blank, then a default set of chemical information is returned. The default return format is displayed below and includes a variety of fields with data types represented.
Pictured below is an example of returned Details for Bisphenol A with the chemicaldetailstandard value for projection selected.
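As an illustration of how the projection parameter might enter a request, the sketch below builds a request URL in base R and rejects any value outside the five listed above. The helper `build_details_url` is hypothetical (not part of ctxR), and the base URL and path are assumptions rather than values confirmed by this document.

```r
# The five projection values listed in the API web interface.
valid_projections <- c(
  "chemicaldetailall", "chemicaldetailstandard", "chemicalidentifier",
  "chemicalstructure", "ntatoolkit"
)

# Hypothetical helper (not part of ctxR): build the request URL for
# "Get data by dtxsid". The base URL and path are assumptions.
build_details_url <- function(dtxsid, projection = NULL,
                              base_url = "https://api-ccte.epa.gov") {
  url <- paste0(base_url, "/chemical/detail/search/by-dtxsid/", dtxsid)
  if (is.null(projection)) {
    return(url)  # blank projection: the API returns its default field set
  }
  if (!projection %in% valid_projections) {
    stop("Unknown projection: ", projection)
  }
  paste0(url, "?projection=", projection)
}

build_details_url("DTXSID7020182", projection = "chemicaldetailstandard")
```

Validating the value before the request is sent turns a typo in the projection name into an immediate, readable error instead of an unexpected server response.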
Formatting an HTTP request is not necessarily intuitive, nor worth the time for someone not already familiar with the process, so for many users these endpoints would require a significant investment of time and energy to learn. However, the R package ctxR offers a solution.
ctxR was developed to streamline the process of accessing the information available through the CTX APIs without requiring prior knowledge of how to use APIs.
Users can run install.packages("ctxR") to install from CRAN and then load the package with library(ctxR), or install the development version of ctxR like so:
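A typical installation might look like the following sketch. The GitHub repository location `USEPA/ctxR` is an assumption, and the development install is left commented out because it requires the devtools package.

```r
# Install the released version from CRAN if it is not already present.
if (!requireNamespace("ctxR", quietly = TRUE)) {
  install.packages("ctxR")
}

# Alternatively, install the development version from GitHub.
# The repository location is an assumption; devtools must be installed first.
# devtools::install_github("USEPA/ctxR")

# Load the package in each new R session.
library(ctxR)
```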
As previously described, a user must have an API key in order to access the CTX APIs. A FREE API key can be obtained by emailing the CTX API Admins. In the example code, the API key will be stored as the variable my_key.
For general use of the package, the user may use the function register_ctx_api_key() to store the API key in the current session or more permanently for access across sessions.
# This stores the key in the current session
register_ctx_api_key(key = '<YOUR API KEY>')
# This stores the key across multiple sessions and only needs to be run once. If the key changes, rerun this with the new key.
register_ctx_api_key(key = '<YOUR API KEY>', write = TRUE)
Once the API key is stored, it is hidden from display by default for protection. To change this, use the following functions as demonstrated.
# To show the API key
ctxR_show_api_key()
getOption('ctxR')$display_api_key
# To hide the API key
ctxR_hide_api_key()
getOption('ctxR')$display_api_key
Finally, to access the key, use the ctx_key() function.
As some quick start examples, we demonstrate the relative ease (compared to using the CCD or API web interface) of retrieving the information across endpoints for Bisphenol A using ctxR.
Tables output in each example have been filtered to only display the first few rows of data. For additional examples and more comprehensive documentation on each endpoint, consider reviewing the other ctxR vignettes for the data domain of interest.
In this section, several ctxR functions are used to access different types of information from the CTX Chemical API.
The function get_chemical_details() takes in either the DTXSID or DTXCID of a chemical and the user-specific API key. Relevant chemical details for Bisphenol A, which has DTXSID “DTXSID7020182”, are returned as a data.table.
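A call along these lines might look like the sketch below. The argument names `DTXSID` and `API_key` reflect our understanding of the ctxR function signature and should be checked against the package documentation. The request is wrapped in `if (interactive())` so that it is only sent from an interactive session.

```r
bpa_dtxsid <- "DTXSID7020182"  # Bisphenol A

# Guarded so the request is only sent from an interactive session; it needs
# network access, an installed ctxR, and a registered API key.
if (interactive()) {
  library(ctxR)
  my_key <- ctx_key()  # retrieve the key stored with register_ctx_api_key()

  bpa_details <- get_chemical_details(DTXSID = bpa_dtxsid, API_key = my_key)
  print(head(bpa_details))  # first few rows of the returned data.table
}
```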
The function get_chem_info() returns phys-chem properties for the selected chemical, and can be filtered to ‘experimental’ or ‘predicted’ if desired.
Here all phys-chem properties are returned for Bisphenol A.
The request can be filtered to return experimental results only.
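The two requests just described might be written as in the following sketch. The `type` argument name and its `"experimental"` value follow the filter options named above, but the exact signature is an assumption to verify against the ctxR documentation.

```r
bpa_dtxsid <- "DTXSID7020182"  # Bisphenol A

# Only run from an interactive session: the calls need network access,
# an installed ctxR, and a registered API key.
if (interactive()) {
  library(ctxR)
  my_key <- ctx_key()

  # All phys-chem properties (experimental and predicted).
  bpa_info <- get_chem_info(DTXSID = bpa_dtxsid, API_key = my_key)
  print(head(bpa_info))

  # Experimental results only.
  bpa_info_exp <- get_chem_info(
    DTXSID = bpa_dtxsid,
    type = "experimental",
    API_key = my_key
  )
  print(head(bpa_info_exp))
}
```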
In this section, several ctxR functions are used to access different types of information from the CTX Hazard API.
The function get_hazard_by_dtxsid() retrieves hazard data (all human or ecological toxicity data) for a given chemical based on input DTXSID. get_human_hazard_by_dtxsid() and get_ecotox_hazard_by_dtxsid() can filter returned hazard results for the given chemical to human or ecological toxicity data, respectively.
Here all hazard data is returned for Bisphenol A:
The request can be refined to return only human hazard results or only EcoTox results.
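The three hazard requests might be sketched as follows. The function names come from the text above, while the `DTXSID` and `API_key` argument names are assumed to match the rest of the package.

```r
bpa_dtxsid <- "DTXSID7020182"  # Bisphenol A

# Only run from an interactive session: the calls need network access,
# an installed ctxR, and a registered API key.
if (interactive()) {
  library(ctxR)
  my_key <- ctx_key()

  # All hazard data (human and ecological).
  bpa_hazard <- get_hazard_by_dtxsid(DTXSID = bpa_dtxsid, API_key = my_key)

  # Human toxicity data only.
  bpa_human <- get_human_hazard_by_dtxsid(DTXSID = bpa_dtxsid, API_key = my_key)

  # Ecological toxicity data only.
  bpa_eco <- get_ecotox_hazard_by_dtxsid(DTXSID = bpa_dtxsid, API_key = my_key)

  # Compare how many records each request returned.
  print(c(all = nrow(bpa_hazard), human = nrow(bpa_human), eco = nrow(bpa_eco)))
}
```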
In this section, several ctxR functions are used to access different types of information from the CTX Bioactivity API.
The function get_bioactivity_details() retrieves all bioactivity data for a given chemical based on input DTXSID.
The function get_bioactivity_details() can also be used to retrieve all bioactivity data for a given endpoint, based on input AEID (assay endpoint identifier).
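Both uses of get_bioactivity_details() might look like the sketch below. The `DTXSID`, `AEID`, and `API_key` argument names are assumptions based on the description above, and the AEID value is an arbitrary illustrative choice rather than an endpoint singled out by this document.

```r
bpa_dtxsid <- "DTXSID7020182"  # Bisphenol A
example_aeid <- 42             # arbitrary AEID chosen for illustration

# Only run from an interactive session: the calls need network access,
# an installed ctxR, and a registered API key.
if (interactive()) {
  library(ctxR)
  my_key <- ctx_key()

  # All bioactivity data for one chemical.
  bpa_bioactivity <- get_bioactivity_details(
    DTXSID = bpa_dtxsid,
    API_key = my_key
  )
  print(head(bpa_bioactivity))

  # All bioactivity data for one assay endpoint.
  aeid_bioactivity <- get_bioactivity_details(
    AEID = example_aeid,
    API_key = my_key
  )
  print(head(aeid_bioactivity))
}
```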
The ctxR package provides a streamlined approach to accessing data from the CCD for users with little or no prior experience using APIs.
For additional examples and more comprehensive documentation on each endpoint, consider reviewing the other ctxR vignettes for the data domain of interest.