We are pleased to announce our fourth data release for the coronavirus genome browser. (See also our first, second, and third releases.)
In line with our previous releases, these tracks include diverse data such as gene models, immunology, pathogenicity, and conservation. We would also like to bring attention to our recently released COVID-19 GWAS tracks for the GRCh37/hg19 and GRCh38/hg38 human assemblies. The COVID-19 GWAS track displays data aimed at identifying genetic determinants of SARS-CoV-2 infection susceptibility and disease severity.
Clicking on any of the track titles below will lead to the track description page, which includes additional information and allows for configuration of various display options.
This release includes the following tracks:
- PhyloCSF Genes – Curated conserved genes – This composite includes two tracks displaying curated SARS-CoV-2 protein-coding genes conserved within the Sarbecovirus subgenus as determined using PhyloCSF, FRESCo, and other comparative genomics methods, consistent with experimental evidence in SARS-CoV-2.
  - PhyloCSF Genes – This track shows the conserved protein-coding genes, namely ORF1a, ORF1ab, S, ORF3a, ORF3c (a.k.a. ORF3h, ORF3a*, and 3a.iORF1), E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF9b (a.k.a. ORF9a).
  - PhyloCSF Rejected Genes – This track shows other proposed genes that show neither the signature of conserved protein-coding genes nor persuasive experimental evidence, and are thus unlikely to be actual protein-coding genes, namely ORF3d, ORF3b, ORF14 (a.k.a. ORF9b, ORF9c), and ORF10.
- New ORFs based on RNA-seq and Ribo-seq by the Weizmann Institute tracks – The Weizmann ORFs (Open Reading Frames) track shows previously unannotated ORF predictions based on Ribo-seq and RNA-seq data. It contains four tracks comprising the predicted gene models and the data supporting them.
- icSHAPE RNA Structure – This track shows normalized icSHAPE reactivity data of in vivo and in vitro SARS-CoV-2 experiments.
- Validated epitopes from IEDB – This track shows epitope sequences displayed by various class I MHC alleles, as annotated by the National Institute of Allergy and Infectious Diseases (NIAID) Immune Epitope Database (IEDB).
- Potential pathogenic insertions and deletions from Gussow et al, PNAS 2020 – This track shows genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous deadly coronavirus outbreaks, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), from less pathogenic coronaviruses.
- Natural selection analysis from Sergei Pond’s research group – This track shows data from Pond et al, 2020 “Natural selection analysis of SARS-CoV-2/COVID-19”, in which the authors used several statistical techniques to identify selection sites of interest in SARS-CoV-2 data from GISAID.
- Phylogenetic Tree and Variants from High-coverage Sequences in Public Databases – This track displays a phylogenetic tree inferred from SARS-CoV-2 genome sequences from GenBank, COG-UK and the China National Center for Bioinformation, and variants found in the sequences. It uses the phylogenetic tree from the sarscov2phylo 28-08-20 release, pruned to include only public sequences. Since the public sequences are unrestricted, we can offer VCF files with sequence variants for download.
- Spike protein receptor-binding domain (S RBD) Deep Mutational Scanning – Tracks created from this data include:
  - S RBD Deep Mutational Scanning: Antibody Escape – This track shows deep mutational scanning data measuring the effect of mutations in the Spike RBD on antibody binding, using a yeast surface display system.
  - S RBD Deep Mutational Scanning: ACE2 Binding – This track shows deep mutational scanning data measuring the effect of all possible point (amino acid) RBD mutations on ACE2 binding affinity, using a yeast surface display system.
  - S RBD Deep Mutational Scanning: Expression – This track shows deep mutational scanning data measuring the effect of all possible point (amino acid) RBD mutations on protein expression, using a yeast surface display system.
- Updated – Problematic sites where masking or caution are recommended for analysis – This track shows locations in the SARS-CoV-2 genome that have been identified as problematic for analysis for various reasons. The data was updated to the most recent release from July 29th.
- Updated – Phylogenetic Tree and Variants from High-coverage Sequences in GISAID EpiCoV™ – This track displays a phylogenetic tree inferred from SARS-CoV-2 genome sequences collected by GISAID, and variants found in the sequences. It has now been updated to the sarscov2phylo 28-08-20 release.
We would like to thank the publication authors De Maio et al, Gussow et al, Pond et al, Starr et al, Sun et al, and Turakhia et al for making these data available. We would also like to thank Rob Lanfear, Irwin Jungreis, Qiangfeng Cliff Zhang, Jason Fernandes, Santrupti Nerli, Bjoern Peters, the Bloom Lab, the Weizmann Institute of Science and the GISAID Initiative.
These tracks are made possible by the worldwide efforts of scientists, including the Genome Browser team. We will continue to provide SARS-CoV-2 resources as they become available. For the latest data, see our development site. Note that content on our preview server has not undergone our QA process, and is subject to change at any time.