Abstract
Nanopore sequencing devices read individual RNA strands directly. This facilitates
identification of exon linkages and nucleotide modifications; however, using conventional direct
RNA nanopore sequencing, the 5′ and 3′ ends of poly(A) RNA cannot be identified
unambiguously. This is due in part to RNA degradation in vivo and in vitro that can obscure
transcription start and end sites. In this study, we aimed to identify individual full-length human
RNA isoforms among ~4 million nanopore poly(A)-selected RNA reads. First, to identify RNA
strands bearing 5′ m7G caps, we exchanged the biological cap for a modified cap attached to a
45-nucleotide oligomer. This oligomer adaptation method improved 5′ end sequencing and
ensured correct identification of the 5′ m7G-capped ends. Second, among these 5′-capped
nanopore reads, we screened for features consistent with a 3′ polyadenylation site. Combining
these two steps, we identified 294,107 individual high-confidence full-length RNA scaffolds from
human GM12878 cells, most of which (257,721) aligned to protein-coding genes. Of these,
4,876 scaffolds indicated unannotated isoforms that were often internal to longer, previously
identified RNA isoforms. Orthogonal data for m7G caps and open chromatin, such as CAGE and
DNase-HS seq, confirmed the validity of these high-confidence RNA scaffolds.
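
The two-step screen described above can be illustrated with a minimal sketch. It assumes each read has already been annotated with two boolean flags; the `Read` class, its field names, and the example reads are hypothetical and are not taken from the study's actual pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Read:
    read_id: str
    has_5p_adapter: bool     # hypothetical flag: 45-nt cap adapter detected at the 5' end
    has_3p_polya_site: bool  # hypothetical flag: 3' end consistent with a polyadenylation site


def full_length_candidates(reads: Iterable[Read]) -> List[Read]:
    """Keep only reads that pass both the 5' cap-adapter and 3' poly(A)-site screens."""
    capped = [r for r in reads if r.has_5p_adapter]     # step 1: 5'-capped reads
    return [r for r in capped if r.has_3p_polya_site]   # step 2: 3' end features


# Toy input: only r1 passes both screens.
reads = [
    Read("r1", True, True),
    Read("r2", True, False),
    Read("r3", False, True),
]
kept = [r.read_id for r in full_length_candidates(reads)]
print(kept)
```

Applying the screens in sequence mirrors the order given in the text: the 3′ criterion is evaluated only on reads that already carry the 5′ adapter.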