It is Postdoctoberfest! This week, we are celebrating the vital contributions that postdoctoral scholars make to UC Santa Cruz, and to the advancement of science more broadly.
This Postdoctober, we are spotlighting Jean Monlong, who has been a postdoc with the Genomics Institute since 2018. His research in Benedict Paten’s lab is part of a major project to update the twenty-year-old human reference genome and transform it from a single sequence based largely on one person’s DNA to a more diverse and complete “pangenome.”
After four years of substantial contributions to the research of the Genomics Institute, Jean will be starting his own lab in May at the French National Institute of Health and Medical Research, the French equivalent of the NIH. We spoke with him about his research here at UC Santa Cruz:
What originally drew you to UC Santa Cruz?
For me, it was the pangenome project. Very few people were doing that kind of work, and UC Santa Cruz and the Paten lab were really leaders in it. I wanted to do pangenome work, and this was the best place to be.
In my Ph.D. work, I was trying to find differences between genomes that involve many bases/nucleotides, which was difficult to do with the traditional technology and methods. An example of the difficult variants I was looking for would be a large region of a genome that was either absent or repeated when comparing individuals. Then, at a conference, I saw David Haussler present about this pangenome work and I thought, “Wow, this would help so much with what I am doing now.”
What is a pangenome?
Pangenomes represent multiple genomes. You can picture this as the human reference genome augmented with genomic variation from other individuals. A pangenome has many advantages [over a traditional single reference genome]. It is more representative of the diverse genomic landscape of the human population, and it incorporates genomic variation, including difficult-to-analyze variation, early in the analysis of genome sequencing data. This helps us find variation in new samples, which we call “genotyping.”
What has been your role in the Human Pangenome Project?
I’ve been working on the Paten lab’s variation graph toolkit (vg), a pangenomic toolkit that provides a way to represent multiple genomes. Instead of storing ten genomes as ten completely separate sequences, we collapse the areas that are the same in all the sequences, something like 99%, and we make bubbles in the graph to show where the sequences have variant base pairs.
With other people in the lab, I’ve shown that we can use variation graphs to genotype difficult variants, and more recently that this approach is efficient enough to scale to thousands of genomes.
The next exciting step is to build a comprehensive pangenome using the latest and greatest new sequencing approaches. In much the same vein as the recent Telomere-to-Telomere Consortium’s complete human genome, the Human Pangenome Reference Consortium (HPRC) is attempting this feat across hundreds of diverse individuals. We’ve released a preprint of the first set of 47 genomes!
It is amazing to see this first batch of data when the consortium has existed for only a year or two. There is a lot in this first release, and even if a more complete pangenome is coming in the next few years, researchers can already start using this one.
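Editor's illustration: the variation-graph idea described in this answer can be sketched in a few lines of Python. This is a toy under strong assumptions, not how the vg toolkit is implemented: the sequences are assumed to be pre-aligned, equal in length, and to differ only by single-base substitutions, and the names `build_graph` and `spell` are hypothetical. Shared stretches are collapsed into a single node, and each position where the genomes differ becomes a small "bubble" with one node per allele.

```python
def build_graph(haplotypes):
    """Collapse pre-aligned, equal-length sequences into a tiny sequence graph.

    Toy assumption: the sequences differ only by single-base substitutions.
    Returns (nodes, edges, paths): node sequences indexed by node id,
    a set of (from_id, to_id) edges, and one walk of node ids per haplotype.
    """
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("toy example expects equal-length, pre-aligned sequences")

    nodes = []                          # node id is the index into this list
    paths = [[] for _ in haplotypes]    # one walk through the graph per haplotype
    pos = 0
    while pos < length:
        if len({h[pos] for h in haplotypes}) == 1:
            # Shared region: extend it as far as all haplotypes agree,
            # then store it once as a single node.
            end = pos
            while end < length and len({h[end] for h in haplotypes}) == 1:
                end += 1
            nodes.append(haplotypes[0][pos:end])
            for path in paths:
                path.append(len(nodes) - 1)
            pos = end
        else:
            # Variant site: open a bubble with one node per distinct base.
            allele_ids = {}
            for i, h in enumerate(haplotypes):
                base = h[pos]
                if base not in allele_ids:
                    nodes.append(base)
                    allele_ids[base] = len(nodes) - 1
                paths[i].append(allele_ids[base])
            pos += 1

    # An edge exists wherever some haplotype steps from one node to the next.
    edges = {(a, b) for path in paths for a, b in zip(path, path[1:])}
    return nodes, edges, paths


def spell(nodes, path):
    """Reconstruct a haplotype by concatenating the nodes along its path."""
    return "".join(nodes[node_id] for node_id in path)


haplotypes = ["ACGTACGT", "ACGTTCGT", "ACGTACGT"]
nodes, edges, paths = build_graph(haplotypes)
for path in paths:
    print(path, spell(nodes, path))
```

In this example the three sequences share everything except one base, so the graph stores 9 bases instead of 24, and each original sequence can still be recovered by walking its path. Real pangenome graphs also have to represent insertions, deletions, and larger rearrangements, which this sketch deliberately leaves out.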
What is next? What are you most excited about moving forward?
Now that we have shown that we can find genetic variants better with the pangenome, I think the next step for me is to use our pangenome tools for association studies, which are studies in which you try to link variants with a disease or a trait of a disease.
I’m particularly excited about the new genomic territories that we will finally be able to explore across the population thanks to new technologies, tools, and projects like the Human Pangenome Reference Consortium. We know that we are still missing a lot around the role of genetics in disease, so I’m eager to work on this problem with this approach. That’s why I will continue this research with a focus on applications to disease cohorts in my new lab at the French National Institute of Health and Medical Research in Toulouse (France), starting in May 2023.
In the Paten lab, I was fortunate to help with a project much closer to the clinic, in collaboration with Stanford and Google. It was a side project unrelated to the pangenome, but it follows the same idea that I am interested in: using the newest technology and tools to find the variants that we could not find before. The goal was to sequence the genomes of newborn babies in the neonatal intensive care unit as fast as possible to provide the doctors with a genomic diagnosis. We actually set a Guinness World Record by going from blood to genomic variants in a little over 5 hours! It was amazing to see how each step of the process (“wet” lab, sequencing machine, computational analysis, and variant curation) was pushed to its limits and “streamed” together.
It was very exciting to see that what we found actually made a difference. One of the clinicians was able to quickly get a patient on the list for a heart transplant based on what we found, because they were able to see that the condition was genetic rather than the result of reversible causes. It was something very new to me and something that I want to do more of in the future, although it will require some time to set up.