Misassembly of long reads undermines de novo-assembled ethnicity-specific genomes: validation in a Chinese Han population.

Publication date: Jun 05, 2019

An ethnicity is characterized by genomic fragments, single nucleotide polymorphisms (SNPs), and structural variations specific to it. However, the widely used ‘standard human reference genome’ GRCh37/38 is based on Caucasians. De novo-assembled reference genomes for specific ethnicities could therefore offer advantages for genetics and precision medicine, especially now that long-read sequencing techniques facilitate genome assembly. In this study, we assessed the de novo-assembled Chinese Han reference genome HX1 vis-à-vis the standard GRCh38 to determine whether it improves assembly quality and benefits ethnicity-specific applications. Surprisingly, all genomic sequencing datasets mapped better to GRCh38 than to HX1, even the datasets from the Chinese Han population. This gap was mainly due to massive structural misassembly of the HX1 reference genome rather than to SNPs between ethnicities, and the misassembly could not be corrected by short-read whole-genome sequencing (WGS). For example, HX1 and the other de novo-assembled personal genomes failed to assemble the mitochondrial genome as a single contig. We mapped 97.1% of dbSNP, 98.8% of ClinVar, and 97.2% of COSMIC variants to HX1. The non-synonymous ClinVar SNPs absent from HX1 fell in 140 genes with important functions in various diseases, and most were absent because essential exons had failed to assemble. In contrast, the HX1-specific regions were scarcely expressed in cell lines and in clinical samples from Chinese patients. Our results demonstrate that a de novo-assembled individual genome such as HX1 offers no advantage over the standard GRCh38 genome because of insufficient assembly quality, and it is therefore not recommended for common use.
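
The abstract does not state which metric underlies "mapped better". One way to make such a comparison concrete is to align the same reads to each reference and compare the fraction of primary alignments that are mapped. The Python sketch below assumes that metric; it is not the authors' pipeline. It reads SAM FLAG values, using bit 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary) from the SAM specification. The function names `flags_from_sam`, `mapped_fraction` and `rank_references` are hypothetical, and the toy FLAG values are invented, not data from the study.

```python
# Hypothetical sketch (assumed metric, not the authors' pipeline): compare how
# well the same reads map to two references by the fraction of primary SAM
# records that are mapped.

UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def flags_from_sam(lines):
    """Extract the integer FLAG (column 2) from SAM alignment lines, skipping headers."""
    flags = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and @-prefixed header lines
        fields = line.rstrip("\n").split("\t")
        flags.append(int(fields[1]))
    return flags


def mapped_fraction(flags):
    """Fraction of primary records (no secondary/supplementary bit) that are mapped."""
    primary = [f for f in flags if not f & (SECONDARY | SUPPLEMENTARY)]
    if not primary:
        return 0.0
    mapped = sum(1 for f in primary if not f & UNMAPPED)
    return mapped / len(primary)


def rank_references(flags_by_ref):
    """Return (reference, mapped fraction) pairs, best-mapped reference first."""
    rates = {ref: mapped_fraction(flags) for ref, flags in flags_by_ref.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy FLAG values invented for illustration; they are not data from the study.
    toy = {
        "GRCh38": [0, 16, 0, 4, 256],
        "HX1": [0, 4, 4, 16, 2048],
    }
    print(rank_references(toy))  # prints [('GRCh38', 0.75), ('HX1', 0.5)]
```

Secondary and supplementary records are excluded so that each read is counted once per reference. Otherwise a read with several partial alignments, which is common around misassembled regions, would inflate the denominator.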

Mai, Z., Liu, W., Ding, W., and Zhang, G. (2019). Misassembly of long reads undermines de novo-assembled ethnicity-specific genomes: validation in a Chinese Han population. Hum Genet. 04754.

Concepts: Assembly; Caucasians; Chinese; Contig; Exons; Genome; Genomic Sequencing; Han; Mitochondrial Genome; Reference Genome; Sequencing; Single Nucleotide Polymorphisms; SNPs

Keywords: Branches of biology; Life sciences; Molecular biology; Biotechnology; DNA; Bioinformatics; Genomics; Reference genome; Single-nucleotide polymorphism; Whole genome sequencing; DNA sequencing

Semantics

gene (UniProt): RASA1
gene (UniProt): RGS6
drug (DrugBank): Tropicamide