Chemical Characterization and Informatics
The use of chemical structural information to predict properties and behavior of chemicals is a fundamental part of chemical safety decision making. The Chemical Characterization and Informatics (CCI) research area focuses on providing high-quality chemical structures and using computer models and chemical analogs to predict chemical properties.
Research includes models to predict physicochemical properties (e.g., molecular weight, melting point, boiling point, vapor pressure) and chemical transformation, as well as approaches for read-across (a data gap filling technique) and cross-species extrapolation of toxicity. Research improves the understanding of chemical fate and activity in human and ecological species and the environment using cheminformatics and bioinformatics, which are computational tools for interpreting and predicting chemical and biological data.
Research Efforts
Chemical Curation
Chemistry unifies data to support research across the Chemical Safety for Sustainability National Research Program. A well-curated database of chemical substance information is available as part of the ToxCast screening program. Data are generated on chemical identifiers, structures, physicochemical properties, and chemical transformation. These data are incorporated into established databases (e.g., the Distributed Structure-Searchable Toxicity Database (DSSTox)). Modeled data are used to predict potential metabolites and chemical transformation products (e.g., those generated by the Chemical Transformation Simulator (CTS)).
Cross-Species Extrapolation
Human and environmental risk assessments for chemicals use a limited number of model organisms to generate toxicity data, which are extrapolated to species of concern. For ecological assessments this can involve extrapolation of effects from a few representative species to other species. Approaches that rapidly maximize the use of existing data, such as the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, are necessary for cross-species chemical safety evaluations.
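The idea of using sequence comparison to extend data from a model organism to other species can be illustrated with a minimal sketch. It computes percent identity between protein fragments that are assumed to be aligned already, then ranks species by similarity to the model organism. The sequences, species labels, and the `percent_identity` helper are hypothetical and chosen for illustration only; this is not the SeqAPASS method, which is considerably more involved.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over aligned positions, skipping gaps ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # a gap in either sequence is not a comparable position
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical pre-aligned fragments of a protein target (illustrative only).
model_organism = "MKTAYIAKQR"
other_species = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKSAYLAKQR",
    "species_C": "MK-AYIGKHR",
}

# Rank species from most to least similar to the model organism.
ranking = sorted(
    ((name, percent_identity(model_organism, seq)) for name, seq in other_species.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, identity in ranking:
    print(f"{name}: {identity:.1f}% identity")
```

A higher identity to the model organism's protein target would, under this simplified view, suggest a greater likelihood that toxicity data transfer to that species.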
Structure-Activity Relationship Models
Quantitative structure-activity relationship (QSAR) models provide an automated method for estimating a wide range of endpoints relevant to chemical safety for data-poor chemicals. Endpoints may include toxicities, chemical behavior in the body, and environmental fate and physicochemical properties to support exposure modeling. Work includes development of automated workflows that transform raw experimental data into modeling data sets and then into QSAR models, as demonstrated in tools such as the Toxicity Estimation Software Tool (TEST).
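As a minimal sketch of what a QSAR model does, the example below fits a straight line relating a single structural descriptor (carbon count) to a physicochemical property (boiling point) for a few n-alkanes, then predicts the property for a chemical left out of the fit. The boiling points are approximate literature values, the single-descriptor linear model is an assumption made for illustration, and `fit_line` is a hypothetical helper; tools such as TEST use far richer descriptors and methods.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training set: pentane to octane (carbon count, approximate boiling point in °C).
carbons = [5, 6, 7, 8]
boiling_c = [36.1, 68.7, 98.4, 125.6]

slope, intercept = fit_line(carbons, boiling_c)

# Predict the boiling point of nonane (9 carbons), which was not in the training set.
predicted_nonane = slope * 9 + intercept
print(f"predicted boiling point of nonane: {predicted_nonane:.1f} °C")
```

The fit predicts roughly 157 °C for nonane, against a measured value near 151 °C. The gap shows why QSAR predictions are reported with an applicability domain: the model is least reliable when extrapolating beyond the chemicals it was trained on.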
Read-Across
Read-across is a commonly used data gap filling technique using endpoint information for one substance (the source substance) to predict the same endpoint for another substance (the target substance) which is ‘similar’ in some way, usually based on chemical structure similarity. Generalized Read-Across (GenRA) is an automated approach to make reproducible read-across predictions of toxicity.
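The source-to-target logic of read-across can be sketched as a similarity-weighted average over the most similar source substances. The example below assumes each substance is described by a set of structural features and uses Tanimoto (Jaccard) similarity; the feature sets, endpoint values, and function names are hypothetical, and the sketch is an illustration of the general technique rather than the GenRA implementation.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(target_features, sources, k=3):
    """Similarity-weighted average of an endpoint over the k most similar sources.

    sources: list of (features, endpoint_value) pairs for the source substances.
    """
    scored = [(tanimoto(target_features, feats), value) for feats, value in sources]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in neighbors)
    if total_similarity == 0:
        raise ValueError("no source substance shares any feature with the target")
    return sum(sim * value for sim, value in neighbors) / total_similarity


# Hypothetical source substances: (structural features, endpoint) with
# 1.0 meaning active and 0.0 meaning inactive for the endpoint of interest.
sources = [
    ({"C=O", "OH", "ring"}, 1.0),
    ({"C=O", "ring"}, 1.0),
    ({"NH2", "Cl"}, 0.0),
]
target = {"C=O", "OH", "ring", "Cl"}

prediction = read_across(target, sources, k=3)
print(f"predicted activity score for target: {prediction:.2f}")
```

Because the two most similar sources are active and the least similar one is inactive, the weighted score lands close to 1. Fixing the similarity measure and the number of neighbors in code is what makes such a prediction reproducible.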
Non-Targeted Analysis (NTA) Methods
NTA methods are developed to rapidly characterize a broad range of compounds including chemicals of immediate and emerging concern (e.g., per- and polyfluoroalkyl substances), real-world mixtures, and substances of unknown or variable composition, complex reaction products, or biological materials (UVCBs). High-resolution mass spectrometry is the primary NTA tool for identifying previously unknown or understudied chemicals. NTA methods can be applied to any type of sample, including consumer products, environmental matrices, and biological media.
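One basic step in a high-resolution mass spectrometry workflow is matching an observed mass-to-charge ratio (m/z) against the theoretical masses of candidate formulas within a parts-per-million (ppm) tolerance. The sketch below assumes negative-mode ionization with deprotonated ions ([M-H]-), a 5 ppm tolerance, and a two-entry candidate list; the observed m/z value and helper names are hypothetical. Real NTA workflows also use isotope patterns, fragmentation spectra, and retention time.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "F": 18.99840316,
    "S": 31.97207117,
}
PROTON_MASS = 1.00727647  # Da


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C8HF15O2' (no parentheses)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def match_candidates(observed_mz, candidates, tol_ppm=5.0):
    """Return (name, ppm_error) for candidates whose [M-H]- m/z is within tol_ppm."""
    hits = []
    for name, formula in candidates.items():
        theoretical_mz = monoisotopic_mass(formula) - PROTON_MASS
        ppm_error = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
        if abs(ppm_error) <= tol_ppm:
            hits.append((name, round(ppm_error, 2)))
    return hits


# Illustrative candidate list and a hypothetical observed m/z.
candidates = {"PFOA": "C8HF15O2", "PFOS": "C8HF17O3S"}
hits = match_candidates(412.9670, candidates)
print(hits)
```

A mass match alone only narrows the candidates to a formula; many structures can share that formula, which is why additional evidence is needed before an identification is reported.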
Per- and Polyfluoroalkyl Substances (PFAS)
PFAS are a group of chemicals used to make products that resist heat, oil, stains, grease, and water. Many PFAS will persist on geologic time scales and can bioaccumulate, or become concentrated inside the bodies of living things, to toxic levels. They have diverse structures and typically lack adequate information needed to inform risk evaluations on individual substances.
To address PFAS research needs, EPA is pursuing a categorization approach informed by structure, mechanistic information, and chemical behavior in the body. This research will be guided by the objectives of EPA’s PFAS Strategic Roadmap to meet EPA’s National PFAS Testing Strategy. EPA’s safer chemicals researchers are continuing to develop and refine PFAS categories to create expertly curated lists of PFAS and a chemical library of PFAS that can be used for testing.
Additionally, EPA researchers use quantitative structure-activity relationship (QSAR) models to estimate physicochemical properties of PFAS, model chemical breakdown products, and map them to parent substances. A QSAR relates the structure of a molecule to a process, such as biological activity or chemical reactivity.
Tools and Resources
Generalized Read-Across (GenRA)
GenRA is an automated approach to make reproducible read-across predictions of toxicity.
Computational Toxicology (CompTox) Chemicals Dashboard
The CompTox Chemicals Dashboard contains chemistry, toxicity, and exposure information for over one million chemicals, with over 300 chemical lists based on structure or category.
Other Tools and Data
- Distributed Structure-Searchable Toxicity (DSSTox) Database: Provides a high-quality public chemistry resource for supporting improved predictive toxicology.
- Toxicity Estimation Software Tool (TEST): Estimates the toxicity of chemicals using quantitative structure-activity relationship (QSAR) methodologies.
- Chemical Transformation Simulator (CTS): Web-based tool for predicting environmental and biological transformation pathways and physicochemical properties of organic chemicals.
- Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): Extrapolates from data-rich model organisms to thousands of other non-target species to evaluate potential chemical susceptibility.