Molecular biologists today have many options for analyzing the genome or transcriptome. Next generation sequencing (NGS), quantitative PCR (qPCR), and Sanger sequencing (in conjunction with PCR) are all widely used in genomics; however, choosing the best tool for your project isn’t always obvious. For example, all three technologies can provide useful information about a particular transcript: qPCR can measure its expression, PCR + Sanger can identify its sequence, and NGS can do both. Several factors deserve consideration, but the most important is the objective of your experiment: the assay type must be well suited to answering your biological question. To help with that decision, this article offers practical information about PCR + Sanger, qPCR, and NGS approaches.
Sanger sequencing remains the gold standard for DNA sequencing. It determines the sequence of one DNA strand by taking snapshots of its synthesis. The method relies on in vitro polymerization using fluorescent dye terminators and capillary electrophoresis to separate and detect the sequencing products. It can be paired with PCR to interrogate regions of the genome or transcriptome in a relatively fast two-step process. The PCR step is needed because Sanger sequencing requires a high number of copies of the target sequence, so genomic DNA cannot be analyzed directly. PCR amplifies the target DNA to a quantity sufficient for sequencing.
Input sample: Purified genomic DNA or cDNA. A region of interest, usually less than 1 kb, will be amplified via PCR, purified, and then sequenced. Sanger sequencing requires a homogeneous template for best results, so the PCR assay should be optimized to eliminate off-target amplification.
What can be targeted: A region of the genome or a transcript. Each Sanger sequencing reaction usually provides at least 500 bp of high-quality data. Amplicons larger than 500 bp can be fully analyzed using multiple sequencing reactions. As both PCR and sequencing rely on primers, the locus of interest (or more precisely, the primer binding sites) must be defined beforehand. However, the sequence between the primers can be unknown, enabling sequence discovery.
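The arithmetic behind covering a longer amplicon with several sequencing reactions can be sketched as follows. This is only an illustration: the function name, the 50 bp overlap between neighboring reads, and the idea of tiling reads in one direction with internal primers are assumptions made for the example, not part of any particular protocol. The 500 bp read length comes from the figure quoted above.

```python
def sanger_reactions_needed(amplicon_bp, read_bp=500, overlap_bp=50):
    """Estimate how many tiled Sanger reads are needed to cover an amplicon.

    Assumptions (hypothetical, for illustration only):
    - each reaction yields `read_bp` of usable sequence,
    - neighboring reads share `overlap_bp` so they can be assembled,
    - reads are tiled in a single direction.
    """
    if amplicon_bp <= 0:
        raise ValueError("amplicon_bp must be positive")
    if not 0 <= overlap_bp < read_bp:
        raise ValueError("overlap_bp must be non-negative and smaller than read_bp")
    if amplicon_bp <= read_bp:
        return 1
    # n reads cover overlap_bp + n * (read_bp - overlap_bp) bases in total.
    step = read_bp - overlap_bp
    remaining = amplicon_bp - overlap_bp
    return (remaining + step - 1) // step  # integer ceiling division


if __name__ == "__main__":
    for length in (400, 900, 1000):
        print(length, "bp amplicon:", sanger_reactions_needed(length), "reaction(s)")
```

In practice, many amplicons are simply sequenced from both ends with the forward and reverse PCR primers, so treat the result as a rough planning number.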
Number of targets: 1 per assay. No multiplexing is possible with the PCR + Sanger method. Only one target is amplified per PCR reaction per sample, and the resulting product is used for one or more sequencing reactions.
Output data: Chromatograms (also known as trace or AB1 files) are used to determine the sequence of the PCR product (in a SEQ or FASTA file). If more than one sequencing reaction is performed per amplicon, the data can be assembled into a contiguous sequence. Sanger sequencing can detect a certain degree of heterogeneity, but it doesn’t provide quantitative information. For example, a single nucleotide polymorphism (SNP) from a heterozygous individual would appear as a mixed base in the chromatogram—that is, two overlapping peaks at the same position. The data quickly becomes less interpretable with greater diversity in the amplicon population. Amplification and Sanger sequencing of the 16S rRNA gene from a complex microbial community would produce an indecipherable chromatogram of overlapping traces.
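To make the idea of a mixed base concrete, the sketch below turns the four peak heights at one chromatogram position into either a single base or a two-base IUPAC ambiguity code (for example, R for A/G). The function name, the input format, and the 0.3 secondary-peak ratio are assumptions chosen for illustration; real base-calling software uses more elaborate signal processing.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def call_base(peaks, min_secondary_ratio=0.3):
    """Call one position from peak heights, e.g. {"A": 980, "C": 12, "G": 870, "T": 9}.

    If the second-highest peak reaches `min_secondary_ratio` of the highest
    peak (a hypothetical cutoff), report a mixed base; otherwise report the
    base with the highest peak.
    """
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
    if top_height <= 0:
        return "N"  # no usable signal at this position
    if second_height / top_height >= min_secondary_ratio:
        return IUPAC_TWO_BASE[frozenset((top_base, second_base))]
    return top_base


if __name__ == "__main__":
    homozygous = {"A": 1000, "C": 20, "G": 50, "T": 10}
    heterozygous = {"A": 1000, "C": 20, "G": 900, "T": 10}
    print(call_base(homozygous))    # single clear peak
    print(call_base(heterozygous))  # two overlapping peaks
```

The call is qualitative: it flags that two bases are present, but the peak heights are not a reliable measure of their relative abundance, which matches the limitation described above.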
Polymerase chain reaction (PCR) amplifies a target region in an exponential manner. By measuring the amount of newly synthesized DNA after each cycle, the starting amount of target DNA/RNA can be calculated. Quantitative PCR, also known as real-time PCR, uses fluorescence-based imaging to measure the amount of amplified DNA in real time during the reaction. The fluorescent label can be either an intercalating dye, such as SYBR™, that preferentially binds to double-stranded DNA, or a probe, that is, a modified oligo with an attached fluorophore and quencher.
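The exponential relationship can be written as N_n = N_0 × (1 + E)^n, where N_0 is the starting copy number, n is the number of cycles, and E is the amplification efficiency (1.0 means perfect doubling each cycle). The symbols and function names below are introduced here for illustration, and the model ignores the plateau that real reactions reach in later cycles.

```python
def copies_after(start_copies, cycles, efficiency=1.0):
    """Idealized copy number after `cycles` rounds of PCR.

    efficiency = 1.0 corresponds to perfect doubling per cycle.
    """
    return start_copies * (1.0 + efficiency) ** cycles


def starting_copies(observed_copies, cycles, efficiency=1.0):
    """Back-calculate the starting amount from the amount seen after `cycles`."""
    return observed_copies / (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    amplified = copies_after(100, 10)
    print(amplified)                       # 100 copies doubled 10 times
    print(starting_copies(amplified, 10))  # recovers the starting amount
```

With perfect efficiency, 10 cycles multiply the target by 2^10 = 1024, which is why knowing the cycle number and the measured amount is enough to infer the input.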
Input sample: Purified genomic DNA or complementary DNA (cDNA). For gene expression analysis, total RNA is isolated and converted into cDNA via reverse transcription (RT). It’s possible to perform cDNA synthesis and qPCR amplification in the same tube, a method known as one-step RT-qPCR, which reduces sample handling.
What can be targeted: A region of the genome or a transcript. Typically, a region of 70-200 bp will be amplified, although longer amplicons are possible. The PCR primers and probe, if applicable, provide target specificity. As these oligos must be designed and synthesized prior to performing the assay, qPCR is limited to targeting known sequences.
Number of targets: 1 to 5 per reaction. Multiplexing is possible with multiple probes, each targeted to a unique sequence and conjugated to a different reporter dye.
Output data: The quantification cycle (Cq), also known as the threshold cycle (Ct), is the key result. It is calculated from the raw data (the plot of fluorescence over time) and represents the cycle number at which the signal from amplification exceeds background fluorescence. A lower Cq value corresponds to a higher amount of the input target sequence, for example higher gene expression in RNA samples. Quantitative PCR can also be used as a highly sensitive and reproducible detection assay with a well-defined limit of detection.
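A minimal sketch of how a Cq value relates to the fluorescence curve, and of why a lower Cq means more starting material, is shown below. The function names, the fixed threshold, and the linear interpolation between cycles are assumptions for illustration; instrument software typically applies baseline correction and its own thresholding, and expression studies usually also normalize to a reference gene, which is left out here.

```python
def quantification_cycle(fluorescence, threshold):
    """Return the fractional cycle at which the signal first crosses `threshold`.

    `fluorescence[0]` is the reading after cycle 1. Returns None if the
    signal never crosses the threshold.
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev belongs to cycle i, curr to cycle i + 1; interpolate between them.
            return i + (threshold - prev) / (curr - prev)
    return None


def fold_difference(cq_sample, cq_reference, efficiency=1.0):
    """Fold difference in starting target between two samples.

    Assumes both reactions amplify with the same efficiency
    (1.0 = perfect doubling per cycle).
    """
    return (1.0 + efficiency) ** (cq_reference - cq_sample)


if __name__ == "__main__":
    curve = [1, 1, 2, 4, 8, 16, 32]
    print(quantification_cycle(curve, threshold=6))
    print(fold_difference(cq_sample=22.0, cq_reference=25.0))
```

Under perfect doubling, a sample that crosses the threshold three cycles earlier than another started with 2^3 = 8 times as much target.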
Next generation sequencing is a broad term for several technologies and techniques that enable high-throughput sequence analysis. It provides both qualitative and quantitative data, combining the advantages of qPCR and Sanger sequencing. It can be used to comprehensively examine the entire genome/transcriptome or deployed in a targeted manner to analyze one or a few loci. The general strategy involves creating a library of DNA molecules, often large in number and diverse, with common flanking sequences containing binding sites for universal sequencing primers. The molecules are then sequenced in parallel, the details of which depend on the platform used.
Input sample: Genomic DNA or total RNA.
What can be targeted: Some or all of the genome/transcriptome. Whole genome sequencing and RNA sequencing (RNA-Seq) are comprehensive and unbiased methods to profile the genome and transcriptome, respectively. Many targeted NGS methods are available and may involve using hybridization baits or PCR for selection of one to thousands of loci prior to sequencing.
Number of targets: 1 to >10,000.
Output data: Raw data as FASTQ files. Bioinformatics is used to assemble sequences, calculate read counts, and perform downstream analysis (e.g., differential gene expression).
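To give a feel for the raw data, the sketch below reads a small FASTQ text (four lines per read: identifier, sequence, separator, quality string), converts quality characters to Phred scores, and tallies identical sequences. The helper names and the sample reads are invented for illustration, the quality offset of 33 assumes the common Phred+33 encoding, and counting identical sequences is only a crude stand-in for the alignment-based read counting used in real pipelines.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            return  # end of file (or trailing blank line)
        if not header.startswith("@"):
            raise ValueError("FASTQ record must start with '@': " + header)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Average Phred score of a quality string (Phred+33 assumed by default)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def count_reads_by_sequence(records):
    """Count how many reads share each exact sequence."""
    return Counter(sequence for _, sequence, _ in records)


# Invented example data: three short reads.
SAMPLE_FASTQ = "\n".join([
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "II5I",
    "@read3", "TTGA", "+", "!!!!",
]) + "\n"

if __name__ == "__main__":
    records = list(read_fastq(io.StringIO(SAMPLE_FASTQ)))
    for read_id, sequence, quality in records:
        print(read_id, sequence, mean_phred(quality))
    print(count_reads_by_sequence(records))
```

For real files, `open("reads.fastq")` (a placeholder path) can be passed in place of the `io.StringIO` object.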
| |PCR + Sanger |qPCR |NGS
|Number of targets per reaction |1 |1 to 5 |1 to >10,000
|Sequence length or data output |~500 bp per sequencing reaction per PCR product |Typically 70 to 200 bp |Up to >100 Gb
|Time |PCR: 1-3 hours; Sanger sequencing: ~8 hours | |Library prep: Hours to days; Sequencing: Hours to days
|Relative cost per reaction | | |$$ to $$$$
With so many genomics tools available today, it can be difficult to know which assay suits your project best. By considering a few factors, you can quickly zero in on the right answer. The guidelines in this article are by no means exhaustive, but they provide a good starting point when designing your genomics project.