Azenta Life Sciences and PacBio® held a virtual symposium titled Decoding the Complexity of Human Health – A HiFi Vision on October 5, 2021, exploring applications of high-fidelity (HiFi) long-read sequencing in human biomedical research. Speakers from top academic research labs and biotech companies shared their cutting-edge work with this next-generation sequencing (NGS) technology. Here, we recap the six presentations and discuss the themes that emerged from the event.
Presented by Michael Kladde, Ph.D., Professor of Biochemistry and Molecular Biology, University of Florida
Genetic and epigenetic analysis based on NGS can be performed at various scales: at the low end, multiplex PCR followed by NGS targets a few loci; at the high end, whole genome sequencing covers the entire genome. Mid-scale methods that capture dozens to hundreds of targets often rely on a Capture-Seq approach using RNA bait hybridization. Synthesizing these probes, which amount to a large set of custom modified oligos, can be costly. Furthermore, short-read sequencing offers only limited windows (<300 bp) for exploring the interplay between methylation and chromatin accessibility at the resolution of individual DNA molecules.
A new approach combines four technologies to simultaneously characterize endogenous DNA methylation and chromatin accessibility in about 150 targeted regions with minimal use of expensive modified oligos:
The figure below summarizes the workflow. Results show approximately 80% on-target sequencing for regions of 940 bp. High sequencing coverage enables the identification of rare epigenetic states (epiallele frequency of 1 in 1,000) and yields mechanistic insights. The longer reads also reveal epigenetic features that span greater distances, for example, the sliding of +1 nucleosomes on a bidirectional promoter.
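To see why deep coverage matters for rare epialleles, the following minimal sketch estimates the chance of sampling an epiallele present at 1 in 1,000 molecules. It assumes that reads are independent draws from the molecule pool (a simple binomial model). The function name `prob_at_least`, the example depths, and the threshold of three supporting reads are illustrative assumptions, not values from the presentation.

```python
from math import comb


def prob_at_least(k: int, depth: int, freq: float) -> float:
    """Probability of observing at least k reads from an epiallele.

    Assumes each of `depth` reads independently comes from the epiallele
    with probability `freq` (binomial sampling).
    """
    # Sum the probabilities of seeing 0 .. k-1 supporting reads,
    # then take the complement.
    p_fewer = sum(
        comb(depth, i) * freq**i * (1 - freq) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical per-target read depths, chosen only for illustration.
    for depth in (500, 1_000, 5_000, 10_000):
        p = prob_at_least(3, depth, 0.001)
        print(f"depth={depth:>6}: P(>=3 supporting reads) = {p:.3f}")
```

Under this model, the probability rises steeply with depth, which is consistent with the point that rare states only become observable at high per-target coverage.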
Presented by Mark Driscoll, Ph.D., Chief Scientific Officer, Shoreline Biome
Traditional amplicon-based approaches for bacterial identification, such as those targeting the V1-V9 regions of the 16S rRNA gene, are insufficient to distinguish strains. Shotgun whole genome sequencing has the potential for strain differentiation but only at high coverage, leading to greater cost and analytical complexity.
The StrainID method uses a 2.5 kb amplicon that encompasses the entire 16S gene, part of the 23S gene, and the internal transcribed spacer (ITS) between them (see figure below). Many bacteria have multiple copies of the 16S-ITS-23S locus in their genomes; for example, E. coli has seven and Klebsiella pneumoniae has eight. The set of amplicon sequence variants (ASVs) recovered by HiFi sequencing serves as a genetic fingerprint of the strain. Requiring 200 to 2,000 times less data per sample than high-coverage shotgun metagenomics, StrainID has sufficient sensitivity and resolution to identify novel bacteria and track them over time in infant fecal microbiomes [3].
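The fingerprint idea can be illustrated with a small sketch that treats each strain as the set of ASVs observed for its 16S-ITS-23S copies and compares two fingerprints by set overlap. This is a hypothetical illustration, not the StrainID analysis pipeline: the Jaccard measure, the function name `jaccard`, and the toy ASV labels are all assumptions made for the example.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two ASV fingerprints (sets of ASV labels)."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy fingerprints: each label stands for one distinct amplicon sequence.
    # Real fingerprints would hold full-length ~2.5 kb sequences or their hashes.
    week_1 = frozenset({"asv_01", "asv_02", "asv_03", "asv_04"})
    week_6 = frozenset({"asv_01", "asv_02", "asv_03", "asv_04"})
    other_strain = frozenset({"asv_01", "asv_02", "asv_09", "asv_10"})

    print("same strain over time:", jaccard(week_1, week_6))
    print("different strain:     ", jaccard(week_1, other_strain))
```

Because several locus copies can differ within one genome, two strains that share some ASVs can still be separated when the full sets are compared.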
Presented by Melanie Kirsche, Ph.D. Candidate in Michael Schatz's Lab, Johns Hopkins University
Structural variants (SVs) are large-scale genomic mutations that come in several varieties, including insertions, deletions, duplications, inversions, and translocations. Trio datasets, in which the genomes of a child and both parents are sequenced and compared, are often analyzed to find de novo variants. Structural variants are considerably trickier to identify than single nucleotide variants. Sequencing and mapping errors complicate the process of determining whether two or more SV calls represent the same variant. False positives are a major problem with standard variant calling programs, which can overestimate de novo SVs by as much as 200-fold.
Jasmine tackles this comparison problem by representing the variants as points in 2D space based on their chromosomal position and length (see figure below). An algorithm determines whether nearby points on the graph should be grouped together, in which case they represent the same structural variant. Jasmine outperforms widely used comparison methods, demonstrating more than a 5-fold decrease in Mendelian discordance in trio datasets [4]. It achieves the highest confidence in SV calling with PacBio HiFi reads, as this sequencing data is best suited for capturing structural variants (see next section).
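The grouping idea can be sketched in a few lines of Python. This is a simplified illustration and not Jasmine's implementation: it links the closest pairs first, uses a fixed Euclidean distance threshold, and never puts two calls from the same sample in one group. The `SV` class, the `merge_svs` function, the `max_dist` value, and the toy trio calls are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class SV:
    sample: str
    chrom: str
    svtype: str
    pos: int
    length: int


def merge_svs(svs, max_dist=100.0):
    """Group SV calls that are close in (position, length) space."""
    parent = list(range(len(svs)))           # union-find forest
    samples = [{sv.sample} for sv in svs]    # samples present in each group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    # Candidate links: same chromosome and type, within max_dist in 2D.
    edges = []
    for i, j in combinations(range(len(svs)), 2):
        a, b = svs[i], svs[j]
        if a.chrom == b.chrom and a.svtype == b.svtype:
            d = hypot(a.pos - b.pos, a.length - b.length)
            if d <= max_dist:
                edges.append((d, i, j))

    # Closest pairs first; skip links that would merge two calls from one sample.
    for _, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj and samples[ri].isdisjoint(samples[rj]):
            parent[rj] = ri
            samples[ri] |= samples[rj]

    groups = {}
    for i, sv in enumerate(svs):
        groups.setdefault(find(i), []).append(sv)
    return list(groups.values())


calls = [
    SV("child", "chr1", "DEL", 10_000, 500),
    SV("mother", "chr1", "DEL", 10_020, 510),
    SV("father", "chr1", "DEL", 50_000, 300),
    SV("child", "chr1", "DEL", 50_010, 295),
    SV("child", "chr2", "INS", 7_000, 1_200),
]
clusters = merge_svs(calls)
# A group seen only in the child is a candidate de novo variant.
de_novo = [c for c in clusters if {sv.sample for sv in c} == {"child"}]
print(len(clusters), "groups;", len(de_novo), "candidate de novo")
```

In this toy trio, the two chr1 deletions in the child each pair with a parental call, so only the chr2 insertion remains as a candidate de novo variant. A threshold that is too strict would leave matching calls unmerged and inflate the de novo count, which is the failure mode described above.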
Presented by Edd Lee, Director of Marketing, Rare and Inherited Disease, PacBio
Rare diseases, in aggregate, affect 10% of the population, yet more than half of cases remain unsolved after short-read exome or whole genome sequencing. The issue is particularly acute for diseases caused by structural variants. Large insertions, deletions, duplications, and translocations can be very difficult to identify using short reads. Several medically relevant genes lie in “dark” genomic regions that cannot be assembled or aligned using short-read sequencing methods.
PacBio sequencing technology produces highly accurate long reads up to 15 kb with greater than 99.9% accuracy. It has the best performance for detecting all variant classes, including single nucleotide variants (SNVs), indels, and structural variants (see figure below). In the 2020 precisionFDA challenge, PacBio HiFi reads outperformed both short reads and noisy long reads [5]. Of 193 medically relevant genes with coverage issues (i.e., those located in dark regions), PacBio sequencing can detect 152 with 100% coverage. Thus, this technology has the power to elucidate the genetics behind many rare and heritable diseases.
Presented by Xin Li, Ph.D., Associate Professor, University of Rochester Medical Center
Sperm were considered a bottleneck for the transmission of epigenetic information across generations. The cells lose most of their cytosol and histones during development, and methylation patterns from paternal DNA are erased in early embryogenesis. However, miRNA from sperm was recently discovered to convey transgenerational effects. Are there mRNA transcripts in sperm that can do the same? Answering this question requires first identifying whether sperm cells contain intact mRNA. Previous work suggested that mRNA in sperm is highly fragmented. Short-read sequencing suffers from the inability to distinguish between fragmented and intact mRNA, as RNA fragmentation is an early step in NGS library preparation.
Sequencing on the PacBio platform enables full-length HiFi reads of transcripts. This approach, called isoform sequencing (Iso-Seq), can therefore distinguish between fragmented and long intact mRNA. It also provides unambiguous information about the transcript’s start, polyadenylation, and splice sites from a single read (see figure below). Analysis of the sperm transcriptome with Iso-Seq identified 3,440 long intact RNA species, of which 2,479 were novel isoforms and 198 were novel loci [6].
Presented by Elizabeth Louie, Ph.D., Supervisor, Technical Applications, Azenta Life Sciences
Adeno-associated virus (AAV) is gaining popularity as a gene therapy vector for oncology applications. A recombinant AAV (rAAV) genome carries a transgene of up to ~4.5 kb, flanked by two inverted terminal repeat (ITR) regions. Contaminant DNA or mutated rAAV sequences introduced during viral packaging can affect clinical safety and efficacy. Thus, it’s imperative to examine genome integrity and measure the heterogeneity of packaged material as part of quality control.
Sequencing on the PacBio platform enables full-length HiFi reads of the rAAV genome, providing comprehensive analysis of mutations in the packaged DNA. It can identify truncation hotspots and recombination events as well as quantify the abundance of each variant (see figure below). ITR regions, which are notoriously unstable [7], can be sequenced with high accuracy. Azenta Life Sciences has optimized library preparation for the PacBio platform and developed comprehensive and customizable analysis pipelines to assess rAAV integrity as part of its viral genome sequencing services.
PacBio sequencing provides highly accurate long reads that are ideal for analyzing larger features of the genome, epigenome, or transcriptome, especially when accurate reconstruction from short reads isn’t feasible. By capturing more high-quality information per read, HiFi sequencing enables powerful tools to explore chromatin structure, monitor bacterial pathogens, identify structural variants, discover novel transcripts, and assess the quality of gene therapy products. For more details, watch the full recording of Decoding the Complexity of Human Health – A HiFi Vision.
Have a question about PacBio HiFi sequencing? Feel free to reach out to one of our technical experts. We’ll gladly discuss your project and help you figure out if long-read sequencing is the best solution.