February 10, 2021 | Katharine Wrighton | Nature Portfolio
In 2020, almost 30 years after the launch of the Human Genome Project, Miga, Koren and colleagues published a paper describing the first gapless, telomere-to-telomere (T2T) assembly of a human chromosome, namely the X chromosome. This breakthrough was the work of the T2T consortium and brought together sequencing technologies that had been developed in the preceding 6 years.
In 2015, Chaisson et al. showed that long-read sequencing technology from Pacific Biosciences (PacBio) could be used to sequence a human genome, specifically that of the complete hydatidiform mole (CHM) cell line CHM1. Because CHM cells carry a duplicated paternal genome and no maternal genome, there is no need to assemble both haplotypes of a diploid genome, and the CHM genome became a key reference. Later that year, Berlin, Koren et al. reported the first de novo assembly of a human genome based on PacBio long reads alone. Then, in 2018, Jain et al. revealed that ultra-long-read nanopore sequencing (from Oxford Nanopore Technologies) could also be used to assemble a human genome de novo (Milestone 8). Finally, in 2019, Wenger, Peluso et al. introduced PacBio high-fidelity (HiFi) sequencing, which achieved 99.8% accuracy on the human genome reference standard HG002 with an average read length of 13.5 kb.
Although these technological advancements were reported to have closed gaps in the GRCh37 or GRCh38 versions of the human reference genome, no chromosome had been sequenced in full, owing to the difficulty of sequencing features such as large regions of repeat-rich DNA in centromeres and segmental duplications. Miga, Koren et al. reasoned that, by combining data generated by these different long-read sequencing technologies, they could increase the length of the contiguous sequences (contigs) used to assemble a reference genome, and thereby identify missing sequences and assemble a gapless chromosome.