Transcriptome analysis is a crucial step in understanding gene expression and regulation in biological and disease studies. However, traditional RNA sequencing methods have limitations in resolving complex transcript structures, such as those generated by alternative splicing. This is partly because these methods rely on short-read sequencing, which is less successful at resolving difficult-to-sequence regions.
Newer technologies can produce much longer reads, which has enabled researchers to uncover disease indicators that short reads were unable to identify. One of these emerging technologies, PacBio long-read RNA sequencing, provides a more accurate and comprehensive view of the transcriptome than was previously possible.
PacBio RNA-sequencing technology can sequence full-length complementary DNA (cDNA) from bulk and single-cell transcriptomes, providing an unambiguous view of the transcriptome. This is valuable for exploring isoforms: the distinct transcripts that a single gene can produce, for example through alternative splicing.
Two RNA Technologies: MAS-Seq and Iso-Seq
MAS-Seq is a higher-throughput method of cDNA sequencing. It links sections of cDNA (typically 500–1,000 bp each) into longer fragments prior to sequencing, so that each sequenced molecule carries several transcripts. The result is an estimated 16-fold increase in throughput compared with sequencing the same cDNA without MAS-Seq.
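To make the throughput gain concrete, the toy sketch below joins short cDNA sequences into one long array and then splits the sequenced array back into its segments. The adapter string, the helper names, and the array size of 16 are assumptions chosen for illustration (the size simply mirrors the 16x figure above); this is not PacBio's implementation, which relies on dedicated adapters and software.

```python
# Toy model of cDNA concatenation and segmentation.
# ADAPTER and ARRAY_SIZE are illustrative assumptions, not real MAS-Seq values.
ADAPTER = "NNNN"   # hypothetical separator; the toy cDNAs contain only A/C/G/T
ARRAY_SIZE = 16    # number of cDNAs joined into one sequenced molecule


def concatenate(cdnas, adapter=ADAPTER):
    """Join cDNA sequences into one long molecule, separated by the adapter."""
    return adapter.join(cdnas)


def segment(array_read, adapter=ADAPTER):
    """Split a sequenced array back into its individual cDNA sequences."""
    return [piece for piece in array_read.split(adapter) if piece]


# Build ARRAY_SIZE distinct toy cDNAs.
cdnas = ["ACGT" * 5 + "ACGT"[i % 4] * (i + 1) for i in range(ARRAY_SIZE)]

array_read = concatenate(cdnas)
recovered = segment(array_read)

# One sequenced molecule now yields ARRAY_SIZE transcript reads instead of one.
print(len(recovered), recovered == cdnas)
```

In this simplified model, every sequenced molecule yields as many transcript reads as there are segments in the array, which is where the throughput gain comes from.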
Iso-Seq uses PacBio SMRT technology to sequence the entire cDNA molecule. The method works with viral, bacterial, and eukaryotic RNA. In a comparison study by an RNA long-read consortium, Iso-Seq identified long, rare isoforms with greater accuracy than other tools. This technology eliminates the need for transcript assembly, a step that short-read methods (100–200 bp per read) require.
Transcript assembly is challenging because many isoforms share highly similar structures, and the inferred transcripts are often inaccurate. By sequencing full-length cDNAs, the PacBio method can accurately resolve isoform structures without the need for computational inference.
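The ambiguity described above can be sketched with a toy example. The gene, exon names, and isoforms below are hypothetical, and each read is reduced to the run of consecutive exons it covers. A short read that spans a single junction is compatible with more than one isoform, while a read that spans the whole transcript matches exactly one.

```python
# Hypothetical gene with four exons and three isoforms (illustrative only).
ISOFORMS = {
    "iso1": ("E1", "E2", "E3", "E4"),
    "iso2": ("E1", "E3", "E4"),
    "iso3": ("E1", "E2", "E4"),
}


def compatible(fragment, isoforms=ISOFORMS):
    """Return the isoforms that contain `fragment` as a run of consecutive exons."""
    n = len(fragment)
    hits = []
    for name, exons in isoforms.items():
        windows = (exons[i:i + n] for i in range(len(exons) - n + 1))
        if any(window == fragment for window in windows):
            hits.append(name)
    return sorted(hits)


short_read = ("E1", "E2")               # covers a single exon junction
full_length_read = ("E1", "E2", "E4")   # covers the whole transcript

print(compatible(short_read))        # more than one candidate isoform
print(compatible(full_length_read))  # a single candidate isoform
```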
How Does Iso-Seq Work?
Typically, PacBio Iso-Seq follows these steps:
- RNA isolation: High-quality RNA is isolated from the biological sample of interest, such as a cell line or tissue.
- cDNA synthesis: The isolated RNA is reverse-transcribed into cDNA, typically using a template-switching approach to add a specific sequence to the 5' end. This added sequence gives each cDNA a known 5' adapter, which supports amplification of full-length molecules; some protocols also include a unique molecular identifier at this step.
- Size selection: The cDNA is then size-selected to enrich for full-length transcripts. This step helps ensure that the sequenced molecules represent complete mRNA transcripts.
- SMRT sequencing: The size-selected cDNA is subjected to PacBio's Single-Molecule Real-Time (SMRT) sequencing. This results in long reads that span the entire length of the cDNAs.
- Data analysis: After sequencing, the data is processed to generate full-length, high-quality transcripts. Bioinformatics tools correct errors and collapse redundant sequences, yielding a comprehensive view of the transcriptome.
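The collapsing part of the data analysis step can be illustrated with a minimal sketch. It assumes each full-length read has already been aligned and reduced to its chain of exon coordinates; the coordinates, the `collapse` helper, and the `min_support` parameter are hypothetical. Production pipelines work from real alignments and tolerate small differences at transcript ends, which this sketch does not.

```python
from collections import Counter

# Hypothetical full-length reads, each reduced to a chain of (start, end) exons.
reads = [
    ((100, 200), (300, 400), (500, 600)),
    ((100, 200), (300, 400), (500, 600)),
    ((100, 200), (500, 600)),
    ((100, 200), (300, 400), (500, 600)),
    ((100, 200), (500, 600)),
]


def collapse(reads, min_support=1):
    """Group identical exon chains and keep those with enough supporting reads."""
    counts = Counter(reads)
    return [(chain, n) for chain, n in counts.most_common() if n >= min_support]


isoforms = collapse(reads)
for chain, support in isoforms:
    print(f"{len(chain)} exons, supported by {support} reads")
```

Raising `min_support` is one simple way to drop isoforms seen in only a few reads, at the cost of losing genuinely rare transcripts.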
What Can You Do?
With PacBio’s Iso-Seq and MAS-Seq methods, you can:
- Distinguish between alternative splicing (AS) events, such as alternative start sites, alternative end sites, and intron retention
- Detect and identify allele-specific isoforms
- Identify differentially expressed isoforms
- Predict the functional impact of novel isoforms
These methods help scientists characterize the full-length transcript isoform sequences generated by alternative splicing. Alternative splicing generates functional diversity by expressing different combinations of exons from the same gene. This means that a single gene can produce multiple mRNA transcripts, each with a unique combination of exons. By studying the isoform structure of genes, researchers gain a better understanding of gene regulation, cellular differentiation, and disease mechanisms.
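As a rough sketch of how two kinds of splicing event can be read off a full-length isoform, the code below compares an isoform's exons with a reference exon structure. The coordinates (half-open intervals), the gene model, and both helper functions are hypothetical. It covers only exon skipping and intron retention, and is a simplification of what annotation tools do.

```python
def skipped_exons(reference, isoform):
    """Reference exons that overlap no exon of the isoform."""
    return [
        (ref_start, ref_end)
        for ref_start, ref_end in reference
        if not any(start < ref_end and ref_start < end for start, end in isoform)
    ]


def retained_introns(reference, isoform):
    """Reference introns that lie entirely inside a single isoform exon."""
    introns = [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(reference, reference[1:])
    ]
    return [
        (intron_start, intron_end)
        for intron_start, intron_end in introns
        if any(start <= intron_start and intron_end <= end for start, end in isoform)
    ]


# Hypothetical three-exon reference gene and two observed isoforms.
REFERENCE = [(100, 200), (300, 400), (500, 600)]
exon_skipping_isoform = [(100, 200), (500, 600)]
intron_retention_isoform = [(100, 400), (500, 600)]

print(skipped_exons(REFERENCE, exon_skipping_isoform))
print(retained_introns(REFERENCE, intron_retention_isoform))
```

Because each isoform comes from a single full-length read, both checks operate on an observed exon structure, with no need to infer which exons belong together.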
In the Field
MAS-Seq and Iso-Seq are already advancing research into rare Mendelian diseases. In 2023, researchers teamed up with PacBio scientists to develop a multiomic approach to understanding the molecular basis of undiagnosed diseases.
The group applied this approach, which spans the genome, methylome, epigenome, and transcriptome, to a participant in the Undiagnosed Diseases Network. With MAS-Seq, they were able to identify the disruption of four genes, each by a distinct mechanism. Without the combination of multiple omics technologies and the benefit of long reads, it would have been nearly impossible to identify all of these disruptions.
Iso-Seq is also particularly useful in plant research, where genomes can be much larger than the human genome. A 2017 study out of Malaysia used Iso-Seq to map the transcriptomes of three pitcher plants: a carnivorous species, a non-carnivorous species, and a hybrid cross of the two. Comparing them allowed the team to uncover insights into the traits unique to carnivorous plants. At the time, there was limited genetic information about this plant family, and the study generated full reference transcriptomes for all three plants.
PacBio Iso-Seq is particularly valuable in fields like genomics, transcriptomics, and functional genomics. It can provide insights into gene structure, alternative splicing, and gene expression, and it facilitates the discovery of novel transcripts. Long reads and full-length transcript information make it a powerful tool for characterizing complex transcriptomes and identifying rare or low-abundance transcripts.