Publication date: Oct 13, 2024
These tasks included binary classification, regression, and generation tasks, with representations like SMILES strings for molecules and amino acid sequences for proteins. The Therapeutics Data Commons (TDC) offers datasets to help AI models predict drug properties, yet these models work independently. In therapeutics, specialized models like graph neural networks (GNNs) represent molecules as graphs for functions such as drug discovery. Interestingly, training with non-small molecule datasets, such as proteins, improved performance on small molecule tasks. Notably, Tx-LLM excelled in datasets combining SMILES molecular strings with text features like disease or cell line descriptions, likely due to its pretrained knowledge of the text. Current AI models focus on specialized tasks within this pipeline, but their limited scope can hinder performance. Tx-LLM excels in tasks combining molecular representations with text and shows positive transfer between different drug types.
Concepts | Keywords |
---|---|
Costly | Datasets |
Diverse | |
Pharmacokinetics | Drug |
Sharesdeveloping | End |
Large | |
Llm | |
Llms | |
Models | |
Molecules | |
Proteins | |
Tasks | |
Tdc | |
Therapeutic | |
Tuned | |
Tx |