
Language model-guided anticipation and discovery of unknown metabolites
November 12, 2024 @ 12:00 pm – 1:30 pm
Speaker: Michael Skinnider, Princeton University
Lunch is available beginning at 12 PM
Speaker to begin promptly at 12:30 PM
Abstract: Despite decades of study, large parts of the human metabolome remain unexplored. Mass spectrometry-based metabolomics routinely detects thousands of unidentified small molecules within human tissues and biofluids, but structure elucidation of novel metabolites remains a low-throughput endeavour. Here, we present an approach that leverages chemical language models to discover previously uncharacterized metabolites. We introduce DeepMet, a language model that learns the latent biosynthetic logic embedded within the chemical structures of known metabolites and exploits this understanding to anticipate the existence of as-yet undiscovered metabolites. Prospective synthesis of metabolites predicted to exist by DeepMet directs their targeted discovery. Integrating DeepMet with tandem mass spectrometry (MS/MS) data enables automated metabolite discovery within complex tissues. We demonstrate the potential for language models to accelerate the mapping of the metabolome by harnessing DeepMet to discover several dozen mammalian metabolites.