The complete sequence of a human genome!

A historic scientific milestone has been achieved: the first complete sequence of a human genome (!) was published in a special issue of Science. Scientists of the Telomere-to-Telomere (T2T) Consortium sequenced a fully homozygous human cell line, CHM13, generated by the erroneous fertilization of an enucleated egg, to reconstruct the complete genome. The new assembly, named T2T-CHM13, includes the 8% of the genome, largely heterochromatin, that was completely missing from the previous human genome assembly (GRCh38). This advance in our sequencing and understanding of the human genome is second in significance only to the original Human Genome Project.

Summary abstract from “A complete reference genome improves analysis of human genetic variation”

Dennis lab members, including Megan and graduate students Colin Shew and Daniela Soto, contributed to several of the published studies:

  • “A complete reference genome improves analysis of human genetic variation,” where the impact of the complete genome on the analysis of human genetic variation was thoroughly assessed. This was a truly collaborative effort led by four labs (Schatz, McCoy, Zook, and ours). In particular, Daniela performed a genome-wide analysis of collapsed duplications in GRCh38, finding 8 Mbp of sequence that is incorrectly represented in GRCh38 and fixed in T2T-CHM13. Further, we helped demonstrate significant improvements in variant calling across medically relevant genes.

  • “Complete genomic and epigenetic maps of human centromeres,” to which we contributed by generating CHM13 RNA-seq data, with Colin analyzing transcript abundance across the T2T-CHM13 transcriptome and showing expression of numerous genes near centromeres.

  • “The complete sequence of a human genome,” with Megan performing comparative genomic analysis of the primate-specific FRG1 gene expansion, which has several copies missing from GRCh38 that are resolved in T2T-CHM13.

This new genome opens many possibilities for scientists, who can now explore the most repetitive sequences of the human genome, such as centromeres. Particularly interesting for us is the full resolution of segmental duplications: large, historically duplicated segments that are a hallmark of great ape evolution and harbor genes associated with neurodevelopment. Moving forward, we are fully embracing T2T-CHM13 to answer questions about the evolution and disease implications of human-specific duplicated genes at unprecedented resolution (and hope many of you will too!).

Additional resources available for the new T2T-CHM13 genome include:

  • A UCSC Genome Browser

  • CHM13 open-access data available via GitHub

  • Dennis lab analysis available via GitHub

  • Genome stratifications of reference artefacts (including collapsed duplications) are here

  • Variant calls can be found here as well as on AnVIL

You can read Megan’s commentary about the implications of this scientific achievement in a perspective published in Genome Research. Also, our work, along with contributions by Prof. Chuck Langley, was highlighted by UC Davis here.