Analysis: STRUCTURE Program
STRUCTURE is open-source software developed at Stanford University that we use to analyze the results of hybridization or genetic purity assays. For this type of study, we currently use a technique called amplified fragment length polymorphism (AFLP) that compares a sample with a reference population using 119 different genetic markers. The STRUCTURE program analyzes the results of AFLP assays to determine the relationship (the degree of similarity or difference) between the sample and the reference population.
DNA Purification: Spin-Column DNA Purification
Molecular DNA testing requires the extraction of DNA from larger samples. Although there are many methods for extracting DNA, the most commonly used method by far is spin-column DNA purification. At Pisces, we use this technique nearly exclusively. In spin-column purification, a sample is first treated with an agent that causes cell membranes to break down and is then placed in a spin column. The spin column, which consists simply of silica membranes, is spun in a centrifuge, and this procedure causes DNA to bind to the silica membranes. The spin-column technique is quick and efficient, and it produces high-purity DNA extract. However, there are some quirks involved in purifying DNA from eDNA samples. Humic acid, which occurs naturally in many environmental samples, can co-purify with DNA in spin columns and subsequently inhibit PCR reactions.
DNA Sequencing
DNA sequencing is the determination of the order of the nucleotides (adenine, cytosine, guanine, and thymine) in DNA. There are a number of methods for sequencing DNA, including Sanger (or capillary) sequencing and next generation sequencing (NGS). DNA sequencing is critical for PCR assays, because no matter what species we’re trying to detect, every assay must amplify a particular, unique DNA sequence in the target organism’s genome. In theory, there are a vast number of options for targets; for example, the human genome and the rainbow trout genome have about three and two billion base pairs, respectively. In practice, however, molecular biologists frequently select target sequences from within the mitochondrial genome, because each cell contains hundreds or even thousands of mitochondria and, consequently, hundreds or thousands of copies of the mitochondrial genome—a sharp contrast to the two copies of the nuclear genome in most cells.
DNA Sequencing: CO1 Gene
Every qPCR assay requires a DNA target. The CO1 gene (short for cytochrome c oxidase subunit 1) is a mitochondrial gene commonly used by molecular biologists as a target for eDNA assays. CO1 is a popular choice for several reasons. First, because it is a mitochondrial gene, there are hundreds of copies of the gene per cell, far more than the two copies of any given nuclear gene in each cell; hence, there is more DNA available to extract from a sample. Second, it codes for an enzyme critical to energy metabolism; consequently, it is present in all eukaryotes.
Third, a bit of history. In 1994, a team of researchers created primers for amplifying CO1 that work for nearly all eukaryotes. These primers (called the Folmer primers after one of the researchers who developed them) led to CO1 being sequenced in many organisms. In 2003, a biologist named Paul Hebert proposed the use of CO1 sequences (rather than morphology) in identifying species, a technique he dubbed DNA barcoding. DNA barcoding has led to the sequencing of CO1 in (to date) more than 234,000 animal species. These sequences are readily available in open-source databases like GenBank. Thus, it has become common practice for a molecular biologist designing a new eDNA assay to begin by searching GenBank for the CO1 sequence for the target organism.
Using CO1 as a target for eDNA assays is convenient, but it does have limitations. For example, because it is so well-conserved, there is greater danger of an assay with a CO1 target cross-reacting with the CO1 sequence of other (especially closely related) species. When the Wyoming Game and Fish Department asked us to develop an eDNA assay for the hornyhead chub, a locally endangered fish they are working to reestablish in historic habitats, we created a computer program to scan the entire mitochondrial genome to find areas of maximum divergence from closely related species and found that the best region for qPCR primers was not in CO1 but rather in the ND2 gene sequence.
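The kind of scan described above can be illustrated with a short sliding-window sketch. This is not the program we wrote for the hornyhead chub project; it is a minimal illustration that assumes the mitochondrial genomes have already been aligned to the same length, and the function names, window size, and step size are all hypothetical choices.

```python
def window_divergence(target, relative, window=100, step=10):
    """Return (start, fraction_mismatched) for each window of two aligned sequences."""
    if len(target) != len(relative):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(target) - window + 1, step):
        a = target[start:start + window]
        b = relative[start:start + window]
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        scores.append((start, mismatches / window))
    return scores


def most_divergent_window(target, relatives, window=100, step=10):
    """Find the window whose divergence from the MOST similar relative is largest."""
    if not relatives:
        raise ValueError("at least one related sequence is required")
    per_relative = [window_divergence(target, r, window, step) for r in relatives]
    best_start, best_score = None, -1.0
    for i, (start, _) in enumerate(per_relative[0]):
        # A region is only useful if it differs from every close relative,
        # so score each window by its worst case (minimum divergence).
        worst_case = min(scores[i][1] for scores in per_relative)
        if worst_case > best_score:
            best_start, best_score = start, worst_case
    return best_start, best_score


# Toy sequences invented for illustration only.
target = "AAAAAAAAAACCCCCCCCCC"
relative_1 = "AAAAAAAAAAGGGGGGGGGG"
relative_2 = "AAAAAAAAAACCCCCTTTTT"
print(most_divergent_window(target, [relative_1, relative_2], window=5, step=5))
```

Scoring each window by its minimum divergence across relatives favors regions that separate the target from all of its close relatives at once, which is the property a species-specific primer needs.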
DNA Sequencing: Next Generation (DNA) Sequencing (NGS)
Next generation (DNA) sequencing (NGS) refers to a range of techniques that use similar chemistry to capillary sequencing but different ways of reading the data. NGS techniques have much higher throughput than capillary sequencing; a typical NGS instrument can sequence three billion base pairs overnight. Thus, NGS can capture as much data in one day as 10 years of capillary sequencing did in the Human Genome Project. However, such an abundance of data requires much more elaborate analyses in order to make sense of it. For instance, if three billion base pairs are read by NGS, they will not all be unique; overlaps must be sorted out after the sequences have been read. Capillary sequencing, by contrast, requires a primer that identifies the gene or gene fragment being sequenced. Presumably, there is also a reason for choosing a particular gene or gene fragment—for example, in order to find a suitable portion to use as the target sequence in a qPCR assay.
DNA Sequencing: Sanger or Capillary (DNA) Sequencing
Capillary sequencing, also known as Sanger sequencing (named after the developer of the technique, the late Frederick Sanger, who won the Nobel Prize in Chemistry twice), relies on an ingenious chemical technique that labels DNA fragments with fluorescent dye; the color of the dye corresponds to the terminal base on a given fragment (A, C, G, or T). The dye-labeled fragments are moved through a very fine capillary by electrophoresis. As a fragment moves through the capillary, it passes a laser that excites the fluorescent dye. A camera captures the color of the dye, and based on the color, the sequencing apparatus records the base at the end of the fragment. Because the fragments move through the capillary in order according to their size (smallest to largest), the exact sequence of the target DNA can be determined. Capillary sequencing instruments typically handle 4 to 96 samples at a time and have the capacity to read about 1,000 bases per sample.
Capillary sequencing was used for the Human Genome Project; the project took approximately 10 years to sequence the three billion base pairs in the human genome. Although the technique has been around since the 1970s, it is more than 99.99% accurate and is still considered the gold standard for DNA sequencing. Capillary sequencers are still used at Pisces and many other labs around the world.
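As a rough illustration of why size ordering yields the sequence, the sketch below sorts dye-labeled fragments by length and reads off the terminal base of each. The dye-to-base color mapping and the function name are hypothetical; real instruments use their own dye sets and signal processing.

```python
# Hypothetical dye-to-base mapping, for illustration only.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_sequence(fragments):
    """Reconstruct a sequence from (length_in_bases, dye_color) pairs given in any order."""
    # The shortest fragments reach the detector first, so sorting by length
    # reproduces the order in which the laser and camera see them.
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(DYE_TO_BASE[color] for _, color in ordered)


# Four fragments ending at positions 1 through 4 of a toy template.
fragments = [(3, "yellow"), (1, "green"), (4, "red"), (2, "blue")]
print(read_sequence(fragments))  # ACGT
```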
Environmental DNA (eDNA)
Environmental DNA refers to a technique in molecular biology that utilizes the polymerase chain reaction (PCR) to detect the presence of organisms in an environment without the need to capture or even see them. All organisms shed bits and pieces of themselves, such as skin cells, hair, mucus, feces, and urine. Nearly all of this organic debris contains cells, and the cells contain DNA. By filtering a particular environment, we can collect organic debris, isolate DNA from it, and use the DNA to detect the presence of a species. In most instances, eDNA samples are collected by filtering water; sometimes they are collected by filtering air or soil.
Polymerase Chain Reaction (PCR)
Polymerase chain reaction (best known by its acronym, PCR) is a technique invented in 1983 that allows the exponential amplification of specific DNA sequences. First, DNA (a polymer) is melted, causing it to denature, or unzip into two separate strands. A pair of short synthetic nucleic acid molecules, called primers, is used to target the beginning and ending of the sequence to be amplified. A polymerase, an enzyme added to the reaction, catalyzes the synthesis of two copies of the target sequence (one from each strand of the original double-stranded DNA). These copies become templates for the creation of more copies, and a chain reaction occurs when the process is repeated. During the exponential phase of the reaction, the number of copies doubles in each cycle. The chain reaction results in billions of copies of the target DNA sequence in just a few hours. PCR enables molecular biologists to detect the presence or absence of minuscule quantities of DNA from a specific species in a sample, such as that of chytrid fungus on a swab or that of a quagga mussel on a tow net.
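The doubling arithmetic can be made concrete with a small calculation. The sketch below assumes an idealized reaction that stays in the exponential phase for every cycle; the function name and the efficiency parameter (1.0 means perfect doubling) are illustrative choices.

```python
def copies_after(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of cycles.

    Each cycle multiplies the copy number by (1 + efficiency),
    so an efficiency of 1.0 corresponds to doubling.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


# A single starting molecule passes one billion copies after 30 perfect cycles.
print(copies_after(1, 30))  # 1073741824.0
```

Real reactions eventually plateau as reagents are consumed, so this formula applies only to the exponential phase.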
Polymerase Chain Reaction (PCR): Internal Positive Control (IPC)
Internal positive controls (IPCs) are used in a PCR assay to test for inhibition of the PCR reaction and for correct reaction setup. To run an internal positive control, we add the reaction components for a second target, a control, into a PCR assay. If the control target is amplified when we run the assay, we know that the reaction was set up properly and that the PCR reaction was not inhibited. As an example, consider an eDNA sample that was tested for rainbow trout and returned a negative result. Without an IPC, we have no way of determining if this result was a false negative. With an IPC, we know more—if the IPC target was amplified, our unknown sample was truly negative for rainbow trout DNA; if the IPC target was not amplified, our PCR assay results are void, either because of PCR inhibition or because of an error in setup.
Polymerase Chain Reaction (PCR): Limit of Detection (LOD)
Limit of detection (LOD) is a way of expressing the sensitivity of a qPCR assay; a lower LOD indicates a more sensitive assay. LOD can be defined as the least amount of target DNA that can be detected at a given probability. A common expression for limit of detection for a qPCR assay is, “We can detect x molecules of target DNA 95% of the time when we run multiple reactions.” Most well-designed qPCR assays have a limit of detection of three molecules of target DNA. The fewer molecules of target DNA there are in a sample, the greater the odds that we will not withdraw any target DNA when we take an aliquot for an assay; this is known as stochastic (or random) sampling error. The more replicates we run of an assay, the lower the stochastic sampling error and the lower the limit of detection. However, running more replicates involves greater effort, time, and cost. At Pisces, we typically run three replicates of all eDNA assays.
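The sampling argument can be sketched with Poisson statistics, which is the usual model for drawing a small number of molecules in an aliquot. The sketch assumes that any reaction receiving at least one target molecule amplifies successfully, which is an idealization; the function names are hypothetical.

```python
import math


def p_detect_single(mean_copies):
    """Chance that one aliquot contains at least one target molecule (Poisson model)."""
    return 1.0 - math.exp(-mean_copies)


def p_detect_any(mean_copies, replicates):
    """Chance that at least one of several independent replicates contains a target molecule."""
    return 1.0 - math.exp(-mean_copies * replicates)


# With an average of three molecules per aliquot, about 95% of reactions receive one or more.
print(round(p_detect_single(3), 4))  # 0.9502
# With an average of one molecule per aliquot, three replicates reach the same overall odds.
print(round(p_detect_any(1, 3), 4))  # 0.9502
```

Under these assumptions, an average of three molecules per reaction is the point at which roughly 95% of aliquots contain target DNA, and running replicates raises the odds that at least one reaction captures a molecule.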
Polymerase Chain Reaction (PCR): Positive Controls—Genomic DNA versus Plasmid DNA versus gBlocks
Positive controls—samples that are known to contain a given target DNA sequence—are used to develop the standard curve for a qPCR assay or to confirm that an assay with an unknown sample is running properly. The source of the positive control (known sample) can vary as follows:
• Genomic DNA positive controls are prepared using a tissue sample from the target organism. For example, if an assay is intended to detect rainbow trout, genomic DNA can be extracted from a rainbow trout fin clip and used as a positive control by preparing a series of tenfold dilutions of rainbow trout DNA (see Standard Curve for more detail). Genomic DNA only provides a relative control, because we don’t know how many copies of the target sequence are in each of the dilutions used to create the standard curve. Consequently, it is difficult to compare qPCR results for assays with genomic DNA controls between labs or even within the same lab across any significant period of time.
• Plasmid DNA positive controls are prepared by inserting a synthetic copy of a target DNA sequence into plasmid DNA from E. coli. By calculating the weight of a plasmid, we can calculate the number of plasmids in a given quantity of DNA. Because each plasmid carries only one copy of the target sequence, the number of plasmids is equal to the number of copies of the target sequence in a given control. Therefore, plasmid DNA positive controls provide an absolute standard, and their use enables comparison between labs and assays carried out at different times.
• Gene synthesis companies can manufacture synthetic pieces of DNA to be used as positive controls, usually in about two days. gBlocks is a trade name for one such product (from Integrated DNA Technologies, Inc.), but the name is used ubiquitously, much as Kleenex is for tissues. gBlocks are ordered simply by typing the required target DNA sequence into an online ordering system. Like plasmid DNA positive controls, gBlocks provide an absolute standard that facilitates comparison of qPCR results.
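The copy-number arithmetic described for plasmid controls can be sketched as follows. The calculation assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, which is a commonly used approximation rather than an exact figure, and it assumes one copy of the target per plasmid. The function name and the example plasmid size are hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
GRAMS_PER_MOLE_PER_BP = 650.0  # approximate average for double-stranded DNA (assumption)


def copies_from_mass(mass_ng, length_bp):
    """Estimate the number of double-stranded DNA molecules in a given mass.

    length_bp is the full length of the molecule (vector plus insert for a plasmid).
    """
    grams = mass_ng * 1e-9
    molar_mass = length_bp * GRAMS_PER_MOLE_PER_BP  # g/mol for the whole molecule
    return grams / molar_mass * AVOGADRO


# One nanogram of a hypothetical 3,000 bp plasmid holds roughly 3 x 10^8 copies.
print(f"{copies_from_mass(1.0, 3000):.3e}")
```

The same calculation applies to a linear synthetic fragment such as a gBlock, using the fragment's own length.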
Polymerase Chain Reaction (PCR): Reaction Efficiency
If a qPCR assay has a reaction efficiency of 100%, then during the exponential phase of the PCR reaction, the number of copies of the target DNA will double with each cycle. Reaction efficiency is measured by checking the slope of the standard curve; a reaction efficiency of 100% results in a standard curve with a slope of about −3.32. In order for the measurements from a qPCR assay to be considered valid, the reaction efficiency must be greater than 90%. Assays are optimized to have reaction efficiency as close to 100% as possible by adjusting the amounts of the reaction components and the reaction (annealing) temperature.
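The link between slope and efficiency follows from the standard relationship E = 10^(−1/slope) − 1, where the slope is that of Cq plotted against log10 copy number. The short sketch below applies that formula; the function name is an illustrative choice.

```python
import math


def efficiency_from_slope(slope):
    """Reaction efficiency (1.0 means 100%) from the slope of a Cq vs. log10(copies) curve."""
    return 10 ** (-1.0 / slope) - 1.0


# Perfect doubling: a tenfold dilution costs log2(10) cycles, about 3.32.
perfect_slope = -1.0 / math.log10(2)
print(round(perfect_slope, 2))                          # -3.32
print(round(efficiency_from_slope(perfect_slope), 3))   # 1.0
# A steeper slope means each cycle yields less than a doubling.
print(round(efficiency_from_slope(-3.6), 3))            # 0.896
```

By this formula, a slope of −3.6 corresponds to an efficiency just under 90%.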
Polymerase Chain Reaction (PCR): Quantitative PCR (qPCR) versus Real-Time PCR versus Endpoint PCR
Quantitative PCR (qPCR) and real-time PCR are the same technique, described in two different ways; the terms are interchangeable. Real-time refers to the fact that qPCR measures the amplification of target DNA in real time during the exponential phase of a PCR reaction; quantitative refers to the results, which indicate how many molecules of the target DNA were present in the sample. These two terms differentiate qPCR/real-time PCR from endpoint PCR (a newer term for the original PCR technique). An endpoint PCR assay has a qualitative result; it detects the presence or absence of the target DNA sequence. A qPCR assay has a quantitative result; it indicates how many molecules of the target DNA were present at the beginning of the assay. For more information on PCR results, see Interpreting Results.
Polymerase Chain Reaction (PCR): SYBR Green Assay
The term SYBR Green assay refers to the second of the two most common methods by which qPCR measures the amplification of a target DNA sequence in real time. SYBR Green is a fluorescent dye that binds only to double-stranded DNA; its fluorescence dramatically increases when this binding takes place. When the DNA target is amplified as a qPCR reaction proceeds, there is more and more double-stranded DNA. When the SYBR Green dye binds to the double-stranded DNA, fluorescence increases. By measuring the level of fluorescence and comparing it with a standard curve, we can quantify the amount of target DNA present at the beginning of the assay. Unlike TaqMan assays, SYBR Green assays do not require the use of a probe, which simplifies assay design and makes SYBR Green assays less expensive. However, SYBR Green binds to all double-stranded DNA—not just the target sequence. Consequently, SYBR Green assays cannot have multiple targets or be multiplexed and are more susceptible to false positives. In addition, because SYBR Green causes both target and non-target DNA to fluoresce, the quantitative results of a SYBR Green assay may have a lower signal-to-noise ratio (that is, higher background fluorescence) than those of a TaqMan assay.
Like many decisions in fisheries, wildlife, and conservation biology, deciding when to use a TaqMan assay versus a SYBR Green assay is a matter of trade-offs:
• Because of the use of a probe that binds to the target sequence, TaqMan assays can have multiple targets or be multiplexed; however, the use of a probe makes TaqMan assays more complicated to design and more expensive.
• The fact that SYBR Green assays don’t use a probe means that they cannot have multiple targets or be multiplexed; however, the lack of a probe makes SYBR Green assays easier to design and less expensive.
• SYBR Green assays are no less accurate than TaqMan assays, but because SYBR Green binds to all double-stranded DNA, they are more susceptible to false positives.
Polymerase Chain Reaction (PCR): Standard Curve
Regardless of the method used in a qPCR assay (TaqMan or SYBR Green), the number of target DNA molecules in a sample is determined by measuring the fluorescence in the reaction and comparing it with a standard curve prepared for that assay. Fluorescence is measured during the exponential phase of a qPCR assay, when the number of copies of the target DNA sequence is doubling with each cycle. The reaction cycle at which the fluorescence can be discerned above any background fluorescence is referred to as the threshold cycle (Ct) or quantification cycle (Cq). To create a standard curve, assays are run with a series of tenfold dilutions (e.g., 10⁴, 10³, 10², 10¹, and 10⁰ copies per unit volume) of the target DNA (see positive controls). When the Cq values are plotted against the log copy numbers of the corresponding dilutions, the resulting line is the standard curve. The number of target DNA molecules in an assay with an unknown sample can then be determined by comparing the assay’s Cq value to the standard curve.
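Fitting and using a standard curve amounts to a straight-line regression of Cq on log10 copy number. The sketch below uses ordinary least squares in plain Python; the Cq values are invented for illustration (they follow a perfect slope of −3.32), and the function names are hypothetical rather than taken from any instrument software.

```python
import math


def fit_standard_curve(copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate starting copies for an unknown sample."""
    return 10 ** ((cq - intercept) / slope)


# Tenfold dilution series with made-up Cq values.
dilutions = [1e4, 1e3, 1e2, 1e1, 1e0]
cqs = [13.36, 16.68, 20.00, 23.32, 26.64]

slope, intercept = fit_standard_curve(dilutions, cqs)
print(round(slope, 2), round(intercept, 2))            # -3.32 26.64
print(round(copies_from_cq(20.0, slope, intercept)))   # 100
```

With real data the points will not fall exactly on a line, so the fitted slope also serves as a check on reaction efficiency.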
Polymerase Chain Reaction (PCR): TaqMan Assay
The term TaqMan assay refers to one of the two most common methods by which qPCR measures the amplification of a target DNA sequence in real time. The trade name TaqMan is a composite name drawn from (1) the Taq polymerase used in the process, and (2) the classic video game Pac-Man. The action of the video game is an illustration of how a TaqMan assay works. A TaqMan assay utilizes a probe—a nucleotide chain that binds to the target DNA sequence. The probe has a fluorescent dye on one end and a quencher molecule on the other end. When the dye and quencher are in close proximity, the quencher prevents the dye from fluorescing. As the Taq polymerase moves down a single strand of DNA synthesizing a double-stranded copy, it gobbles up the probe like Pac-Man, causing the dye and quencher molecules to separate and allowing the dye to fluoresce. Thus, fluorescence increases with each new copy of the target DNA sequence created. By measuring the fluorescence and comparing it with a standard curve, we can quantify the amount of target DNA present in the sample. TaqMan assays are more specific than other types of qPCR assays because the fluorescent probe binds specifically to the target DNA sequence. Because different color dyes can be used on different probes, TaqMan assays can have multiple target sequences and can also be multiplexed. For example, an eDNA sample could be tested for the presence of rainbow trout, brook trout, and brown trout DNA in one reaction using three different TaqMan assays as long as different color fluorescent dyes are used on the three probes.
Polymerase Chain Reaction (PCR): Threshold Cycle (Ct) or Quantification Cycle (Cq)
Threshold cycle (Ct) and quantification cycle (Cq) are synonymous terms referring to the cycle in a qPCR assay at which the level of fluorescence crosses a predetermined measurement threshold. Since the publication of guidelines for sharing the results of qPCR experiments in 2009, Cq is the officially preferred term. The threshold level is the level at which fluorescence is discernible above any background fluorescence in the assay. In a qPCR assay, the number of target DNA molecules in an unknown sample is determined by measuring the Cq value and comparing it to a standard curve. For example, suppose a standard curve shows that a control assay with 100 molecules of target DNA had a Cq of 20. If an unknown sample also has a Cq of 20, we know that there were 100 molecules of target DNA in the sample. The Cq value is inversely related to the number of target molecules in a sample: it falls linearly as the logarithm of the starting copy number rises, so each tenfold decrease in target DNA raises the Cq by about 3.3 cycles at 100% efficiency. For example, if a control assay with 100 molecules of target DNA has a Cq of 20, the next dilution in the control series, with 10 molecules of target DNA, might have a Cq of a little over 23.