Data sharing was a core principle that led to the success of the Human Genome Project 20 years ago. Now scientists are struggling to keep information free.
Kendall Powell | Nature | February 10, 2021
David Haussler remembers crying in July 2000 as he watched the first fully assembled human genome stream across his computer screen. He and Jim Kent, a graduate student at the time, had built the first-ever web-based tool for exploring the three billion letters of the human genome. They had published the rough draft of the genome on the Internet a mere 11 days after finishing the herculean task of stitching it all together, a task assigned to them as part of the Human Genome Project (HGP), the international collaboration that had been working towards this goal for a decade. It would still be several months before the group published its analysis of the genome in the pages of Nature¹, but the data were ready to share.
“There it was, going out into the whole world,” recalls Haussler, scientific director of the University of California Santa Cruz Genomics Institute. Soon, every person in the world could explore it — chromosome by chromosome, gene by gene, base by base — on the web.
It was a historic moment, says Haussler. Before the HGP launched in the early 1990s, “there had not been a serious discussion about data sharing in biomedical research”, he says. “The standard was that a successful investigator held onto their own data as long as they could.”
That standard clearly wouldn’t work for such a large and collaborative effort. If countries or scientists hoarded the data they were producing, it would derail the project. So in 1996, the HGP researchers got together to lay out what became known as the Bermuda Principles, with all parties agreeing to make the human genome sequences available in public databases, ideally within 24 hours — no delays, no exceptions.
Fast-forward two decades, and the field is bursting with genomic data, thanks to improved technology both for sequencing whole genomes and for genotyping them by sequencing a few million select spots to quickly capture the variation within. These efforts have produced genetic readouts for tens of millions of individuals, and they sit in data repositories around the globe. The principles laid out during the HGP, and later adopted by journals and funding agencies, meant that anyone should be able to access the data created for published genome studies and use them to power new discoveries.
If only it were that simple.