ctxR: New Tool Helps Researchers Working with Data for Many Chemicals at a Time
Published September 3, 2024
Accessing chemical data is a vital step in chemical, biological, and environmental modeling. While there are numerous resources available to pull data from, the CompTox Chemicals Dashboard (Dashboard), built and maintained by EPA researchers, is particularly well-designed and suitable for scientists to use in their research.
The Dashboard includes information on over 1.2 million chemicals, spanning several data domains: physicochemical, environmental fate and transport (i.e., where chemicals go and how they get there), exposure, usage, in vivo (in a living organism) toxicity, and in vitro (outside of a living organism) bioassay data. Its interactive interface is easy to navigate, even for users without any programming experience. That interface makes small-scale chemical research easy and accessible; at a larger scale, however, the manual searching and downloading it requires can be time-consuming, inconvenient, and prone to human error.
To help alleviate these concerns, EPA developed a set of CompTox and Exposure Application Programming Interfaces (CTX APIs) that allow programmatic access to CompTox Chemicals Dashboard data, bypassing the manual steps of web-based searching. In effect, the APIs automate the process of retrieving and downloading the Dashboard data a user wants.
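For readers curious about what such a programmatic request involves, the sketch below builds (without sending) a single request for one chemical in Python. The base URL, endpoint path, header name, and the helper name `build_detail_request` are assumptions made for illustration rather than details from this article; the CTX API documentation is the authority on the actual request format.

```python
import urllib.parse
import urllib.request

# Assumed values for illustration only; confirm against the CTX API docs.
BASE_URL = "https://api-ccte.epa.gov"
DETAIL_PATH = "/chemical/detail/search/by-dtxsid/"


def build_detail_request(dtxsid, api_key):
    """Build (but do not send) a GET request for one chemical's details.

    `dtxsid` is a Dashboard substance identifier such as "DTXSID7020182".
    `api_key` is the caller's personal API key.
    """
    if not api_key:
        raise ValueError("an API key is required")
    # Percent-encode the identifier so it is safe to place in the URL path.
    url = BASE_URL + DETAIL_PATH + urllib.parse.quote(dtxsid.strip(), safe="")
    # Assumption: the key travels in an "x-api-key" request header.
    headers = {"x-api-key": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # "my-key" is a placeholder, not a working key.
    request = build_detail_request("DTXSID7020182", "my-key")  # bisphenol A
    print(request.get_method(), request.full_url)
```

Even this one-chemical case means getting the URL, the identifier encoding, and the authentication header right, and a real workflow would also need to send the request, handle errors, and parse the response for each of many chemicals.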
The CTX APIs are publicly available at no cost to the user. However, many researchers who use Dashboard data may not be familiar with APIs, and formatting an API request is neither necessarily intuitive nor worth the time for someone not already familiar with the process. To make the CTX APIs, and the Dashboard data, more easily accessible, EPA researchers created the R package “ctxR”.
ctxR was developed to streamline the process of accessing the information available through the CTX APIs without requiring prior knowledge of how to use APIs. This R package allows researchers to easily query chemical data from the CompTox Chemicals Dashboard in transparent, reproducible programmatic workflows.
“It’s now as easy as calling one function in R,” EPA data scientist Dr. Caroline Ring said. “If you are a researcher or practitioner of any kind that uses R for any chemical data analysis, uses data from the CompTox Chemicals Dashboard, then this package is useful for you.”
The team working on ctxR plans to keep developing the package in step with CTX API endpoint updates, as it did with the recent release of the CTX API exposure endpoints.
How Can ctxR Be Used?
ctxR is already being integrated into a workflow for prioritization of chemicals in biosolids, which are products of wastewater treatment plants. EPA assesses the potential human health and environmental risk posed by pollutants found in biosolids. Pollutants found in biosolids vary in space and time, depending on industrial and other inputs to individual wastewater treatment facilities.
To help assess pollutants in biosolids, EPA uses the Biosolids Screening Tool (BST). The BST is a screening-level model that can estimate human and ecological hazards based on potential exposures associated with biosolids or placement of biosolids in a surface disposal unit, such as a landfill. The results can be used to identify pollutants, pathways, and receptors of greatest interest and to inform decisions about the need to perform more refined modeling or to address data gaps or uncertainties. The BST requires physicochemical property values to run simulations, and pulling large amounts of this data manually takes time and introduces the risk of error. EPA began using ctxR in 2023, before it was publicly available, to pull large amounts of data from the Dashboard for screening purposes.
How Can I Access ctxR?
ctxR is free and publicly available on CRAN and GitHub.
For additional examples and more comprehensive documentation on each endpoint, consider reviewing the ctxR vignettes for the data domain of interest.
Learn More About the Science