Hidden challenges in evaluating spillover risk of zoonotic viruses using machine learning models.

Publication date: May 20, 2025

Machine learning models have been deployed to assess the zoonotic spillover risk of viruses by predicting their potential to infect humans. However, the lack of comprehensive datasets on viral infectivity poses a major challenge, limiting the range of viruses for which predictions can be made. In this study, we address this limitation through two key strategies: constructing expansive datasets across 26 viral families and developing the BERT-infect model, which leverages large language models pre-trained on extensive nucleotide sequences. Here we show that our approach substantially boosts model performance. This improvement is particularly notable for segmented RNA viruses, which are associated with severe zoonoses but have been overlooked due to limited data availability. Our model also exhibits high predictive performance even with partial viral sequences, such as high-throughput sequencing reads or contig sequences from de novo sequence assemblies, indicating the model’s applicability to mining zoonotic viruses from viral metagenomic data. Furthermore, models trained on data up to 2018 demonstrate robust predictive capability for most viruses identified after 2018. Nonetheless, high-resolution evaluation based on phylogenetic analysis reveals a general limitation of current machine learning models: they have difficulty flagging the human infection risk of specific zoonotic viral lineages, including SARS-CoV-2. Our study provides a comprehensive benchmark for viral infectivity prediction models and highlights unresolved issues in fully exploiting machine learning to prepare for future zoonotic threats.
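The two evaluation settings mentioned in the abstract, a temporal hold-out at 2018 and prediction from partial sequences, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `ViralRecord` layout, the function names `temporal_split` and `fragment`, and the window length and stride are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ViralRecord:
    """Hypothetical record layout; the field names are assumptions."""
    accession: str       # placeholder identifier, not a real accession
    sequence: str        # nucleotide sequence
    year: int            # year the virus was identified
    infects_human: bool  # infectivity label


def temporal_split(
    records: List[ViralRecord], cutoff_year: int = 2018
) -> Tuple[List[ViralRecord], List[ViralRecord]]:
    """Train on records up to and including the cutoff year, test on later ones."""
    train = [r for r in records if r.year <= cutoff_year]
    test = [r for r in records if r.year > cutoff_year]
    return train, test


def fragment(sequence: str, length: int, stride: int) -> List[str]:
    """Cut a sequence into fixed-length windows to mimic reads or contigs."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    if len(sequence) <= length:
        return [sequence]
    return [
        sequence[i:i + length]
        for i in range(0, len(sequence) - length + 1, stride)
    ]


# Toy data, invented purely to exercise the functions.
records = [
    ViralRecord("toy-1", "ACGTACGTAC", 2015, True),
    ViralRecord("toy-2", "TTGACCA", 2018, False),
    ViralRecord("toy-3", "GGCATTAGCA", 2019, True),
]
train, test = temporal_split(records)
windows = fragment(records[0].sequence, length=4, stride=2)
```

Splitting by identification year rather than at random keeps later viruses out of training, which is the property the post-2018 evaluation relies on. The abstract's lineage-level caveat still applies: a good aggregate score on the held-out set does not guarantee that every lineage is flagged.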

Concepts: Extensive, Nucleotide, Trained, Viruses, Zoonoses

Keywords: Comprehensive, Hidden, High, Infectivity, Learning, Model, Models, Predictive, Risk, Sequences, Spillover, Trained, Viral, Viruses, Zoonotic

Semantics

Type     Source  Name
disease  MESH    zoonotic spillover
disease  IDO     infectivity

