USFWS
Genetic Monitoring for Managers
Alaska

 

Principles of Conservation Genetics

Genes and Genomes

The genetic code is like a mystery novel, history text, and time log all wrapped into one. The goal of a geneticist is to learn how to read these books written in a simple 4-letter alphabet and decode critical information about the species under study. In this section we will learn some of the basics of DNA structure and function so we can later see how this information is used by conservationists and managers.

What is DNA?

DNA, or deoxyribonucleic acid for those who want to impress their friends, is composed of a sequence of molecules called nucleotides. There are only 4 different nucleotides in DNA that create the genetic code and these are often referred to by a single letter: Adenine (A), Thymine (T), Cytosine (C), and Guanine (G). Yes, the text for your entire body is written in a code made up of 4 letters! These nucleotides can also be referred to as base pairs because the nucleotides bond in pairs that are always the same (A+T pairs and C+G pairs). Sequences of nucleotides act as a code that provides instructions to the cell to build proteins, and ultimately the entire organism (but this is getting ahead of ourselves).

Most of us are familiar with the spiral staircase shaped DNA molecule. This is simply the nucleotides joined together into long strands along with their complementary strands. The "rungs of the ladder" are the bonds between the A+T pairs or the C+G pairs. Double helix is another common term used for describing the twisted structure of the DNA molecule.

There are an estimated 3 billion base pairs in the human genome (a genome is simply all the DNA in the cell). If you think of the nucleotides as letters, it would take a person typing 60 words per minute, eight hours a day, around 50 years to type the human genome! For those of us typists that "hunt and peck" letters, we better start typing now! Amazingly, the majority of our cells contain all 3 billion of these bits of information (I can't say all the cells because our red blood cells don't have a nucleus, thus no nuclear DNA).
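For readers who like to check the math, here is a rough sketch of the typing estimate. It assumes an average typed "word" is 5 letters long (a common typing convention, not something stated above) and that our typist never takes a day off.

```python
# Rough check of the "typing the genome" estimate.
# Assumption (not from the text): one typed "word" averages 5 letters.
GENOME_LETTERS = 3_000_000_000   # base pairs in the human genome
WORDS_PER_MINUTE = 60
LETTERS_PER_WORD = 5             # assumed typing convention
HOURS_PER_DAY = 8


def years_to_type(letters: int = GENOME_LETTERS) -> float:
    """Return the years needed to type `letters` characters, typing every day."""
    letters_per_day = WORDS_PER_MINUTE * LETTERS_PER_WORD * 60 * HOURS_PER_DAY
    days = letters / letters_per_day
    return days / 365


if __name__ == "__main__":
    print(f"About {years_to_type():.0f} years of typing")
```

Under these assumptions the answer lands in the same ballpark as the 50 years quoted above; a slower typist or weekends off would stretch it further.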

Organization inside the Cell

The DNA inside the nucleus of a cell is not randomly scattered like toys in my kid's room. It is neatly organized into structures called chromosomes, each of which contains thousands of genes. Chromosomes come in pairs with one inherited from each parent (for this you can blame your parents). As a consequence, animals have two copies of each gene, one from each parent. In humans there are 23 pairs of chromosomes (46 total) in each cell, but the number of chromosomes varies widely between species. The number of chromosomes is not related to the developmental complexity or intelligence of the species - humans have 46 chromosomes, chickens have 78 chromosomes and king crabs have 208! Of course this number is small compared to some plants…after all, the Adder's Tongue Fern has approximately 1,440 chromosomes.

Sex is also determined by chromosomes. Most animal species have one set of sex chromosomes. In mammals these sex chromosomes are denoted by the letters X and Y. The key gene that triggers male development is located on the Y chromosome (yes the smaller of the two chromosomes shown below…directly related to intelligence, ability to ask for directions, and listening skills, right?), therefore individuals with two X chromosomes are female, and individuals with an X and a Y chromosome are male. This is the basis for many of the sex tests we will discuss later from forensic type samples (e.g., hairs). The figure below on the left shows how the DNA strands are wound tightly into chromosomes found in the cell's nucleus. The figures on the right are microscopic photographs of human chromosomes: the top shows the 46 chromosomes in a cell's nucleus, and the bottom photograph shows a close-up of tightly wound DNA strands in the X and Y sex chromosomes.

Making New DNA

For an organism to grow, its cells must divide and be able to copy (replicate) the DNA so that all cells contain identical genetic information. This replication is done in a few steps (which laboratory geneticists try to emulate - but again we are getting ahead of ourselves). In the first step (shown in the figure below) the strands are separated. Next, each strand's complementary DNA sequence is recreated by bonding the complementary base onto each base of the original strand. In this way, DNA acts as its own template to create copies of itself. One of the critical chemicals involved in DNA replication is an enzyme called a polymerase. We will mention this molecule later on, as it has become the most important tool in both geneticists' and medical researchers' toolboxes.
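The two replication steps described above can be sketched in a few lines of Python. This is a simplified illustration only: it treats a strand as a plain string of letters and ignores strand direction and everything else the cell's machinery handles. The function names are made up for this example.

```python
# Base-pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for every position of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())


def replicate(top: str) -> list[tuple[str, str]]:
    """Copy a double-stranded molecule, given one of its strands.

    Step 1: the two strands separate.
    Step 2: each strand acts as a template for a new partner strand.
    Returns the two daughter molecules as (strand, partner) pairs.
    """
    bottom = complement(top)                    # the original partner strand
    daughter_1 = (top, complement(top))         # old top + newly built bottom
    daughter_2 = (complement(bottom), bottom)   # newly built top + old bottom
    return [daughter_1, daughter_2]


if __name__ == "__main__":
    for strand, partner in replicate("GATTACA"):
        print(strand, partner)
```

Both daughter molecules come out identical to the original, which is the whole point: each old strand carries enough information to rebuild its partner.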
Now that we have a basic understanding of DNA structure we need to take just a minute to go over some important terminology that will help us later communicate …

Terminology

We refer to all of the genetic information possessed by an organism as its genome. A gene is a specific sequence of nucleotides in a genome that codes for a particular trait. A given gene is always found at a specific location (called a locus; the plural is loci) in a genome. A more general term is genetic marker, which refers to any place in the genome. This term, genetic marker, originated with the medical field, which early on couldn't readily identify genes but could determine if individuals had certain DNA sequences at certain places (loci). Some of these were associated with diseases. So, individuals that had these "genetic markers" would likely have the disease. As we advance our understanding of the genome, we can find the exact gene that is associated with the disease, increasing our ability to predict who will be susceptible to which diseases.

Obviously, each human has genes that express themselves differently. While we all have the genes for hair color, some of us have a version for blond hair, while others have a version for brown hair. We need a name for these different forms of genes. Thus, alleles are the different versions of a gene passed down from each parent and are essentially slight variations in the DNA sequence at the same locus. Each individual has 2 copies or alleles of each gene, thus an individual can have 2 copies of the same allele, or 2 different alleles. Organisms with two copies of the same allele are called homozygous, while organisms with two different alleles are heterozygous. The set of alleles for a given organism is called its genotype, while the set of physically observable traits that the organism displays is called its phenotype.
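To make the terminology concrete, here is a small sketch that labels each locus of a genotype as homozygous or heterozygous. The locus names and allele labels are invented for illustration, and `fraction_homozygous` is a hypothetical helper, not a standard statistic from any particular software package.

```python
def zygosity(alleles: tuple[str, str]) -> str:
    """Label one locus from its two alleles (one inherited from each parent)."""
    allele_1, allele_2 = alleles
    return "homozygous" if allele_1 == allele_2 else "heterozygous"


def fraction_homozygous(genotype: dict[str, tuple[str, str]]) -> float:
    """Return the share of loci in `genotype` that carry two identical alleles."""
    calls = [zygosity(alleles) for alleles in genotype.values()]
    return calls.count("homozygous") / len(calls)


# A made-up genotype: locus name -> the two alleles the individual carries.
SAMPLE = {
    "locus_1": ("2", "2"),
    "locus_2": ("1", "3"),
    "locus_3": ("4", "4"),
    "locus_4": ("1", "2"),
}

if __name__ == "__main__":
    for locus, alleles in SAMPLE.items():
        print(locus, alleles, zygosity(alleles))
    print("Fraction homozygous:", fraction_homozygous(SAMPLE))
```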

The figure below is called an electrophoretic gel; it is how geneticists look at different loci. Each column is a unique individual at 1 locus. In most studies tens or hundreds of loci are put together to form a barcode (i.e., a person or animal's genotype). Sample 1 has two copies of allele 2, whereas sample 2 has 2 different alleles. While the lack of genetic variation (i.e., having only 1 allele) is not so important at just one locus, if the majority of an individual's genome is homozygous this can spell trouble…

What is mitochondrial DNA?

Most of us think of nuclear DNA when we hear the term genetics. Nearly every cell in an individual has one nucleus and within the nucleus is a collection of chromosomes that contain all of the genetic information necessary to build that organism as described above. This is what we term nuclear DNA. However, organisms actually have two different types of DNA in their cells. There is a second type of DNA found in the mitochondria of the cells which we refer to as mitochondrial DNA or mtDNA.

Mitochondria are organelles found in nearly every cell that are responsible for producing cellular energy. Their ancestry is not fully understood, but mitochondria are thought to be descended from ancient bacteria, which were engulfed by cells in the evolutionary process. Yes, each human now has an alien being called a mitochondrion living in each of our cells. Mitochondrial DNA is a small circular structure (human mtDNA has roughly 16,600 base pairs) that primarily codes for cellular machinery and not for exterior traits. In fact, mitochondria are often called the powerhouses of the cell for their ability to generate chemical energy for the cells to use. The figure below shows the circular mitochondrial DNA molecule in a mammalian cell with its gene regions and some specific gene loci.

 

There are differences between nuclear and mtDNA that have important implications in their use in conservation genetics.

  1. There are hundreds to thousands of mitochondria (and their DNA) in each cell versus just one nucleus with its nuclear genome. Therefore mtDNA is often easier to obtain from poor quality genetic samples (hair, scat) than nuclear DNA. So when we want to design tests to determine which species left a hair, we try to make it a mtDNA test.
  2. mtDNA is maternally inherited. All mitochondria in an organism are derived from the mitochondria present in the female's egg cell. So, everyone reading this (male or female) has the mitochondrial DNA from their mother, grandmother, and great-grandmother. Sorry, but dad contributes nothing in terms of mtDNA. This makes mtDNA useful for investigating sex-specific dispersal or tracing maternal lineages over long time periods.
  3. Rather than having two alleles per gene as in nuclear DNA, individuals have one mitochondrial haplotype in their mtDNA that is passed down directly from the mother. Because of this, mtDNA is not subject to recombination (mixing of genes from different DNA strands or chromosomes). This reduces the amount of genetic variation in the mitochondria over time compared to nuclear DNA, making mtDNA an effective tool in answering questions about the historic characteristics of populations.

 

DNA Fingerprinting
Contemporary biologists may take for granted the ability to obtain either diagnostic or population genetic information from genetic samples, but it wasn't until Sir Alec Jeffreys began studying DNA variation and the evolution of gene families through the use of "hypervariable" regions of human DNA that molecular biologists were able to produce a genetic fingerprint (Jeffreys 1985a, b). Jeffreys discovered that particular regions of the human genome, which consist of short sequences repeated multiple times, also contain a "core sequence" that could be developed into a tool called a probe. Further, Jeffreys recognized that these probes could be used to explore multiple regions of the human genome that also contain tandem repeats (two or more nucleotides sequentially repeated, e.g., CATG, CATG, CATG), ultimately producing something biologically akin to a barcode, such that each individual has a unique genetic signature.

This technique became known as DNA fingerprinting, and instantly became a staple of forensic and paternity work worldwide. Two years after the development of DNA fingerprinting, Wetton et al. (1987) found these same Jeffreys probes to be useful for studying house sparrows (Passer domesticus) in the United Kingdom. Within a decade, DNA fingerprinting was common in many wildlife studies, rewriting conventional wisdom on mating systems and gene flow among populations (Wildt et al. 1987; Lynch 1988; Burke et al. 1989).

Initially, there were two major impediments to DNA fingerprinting for addressing wildlife issues. First, although Jeffreys probes provided a DNA "barcode," it was typically not possible to know which bars were associated with which locus. Thus, population genetic models that relied on locus-specific information needed to be adapted or abandoned. Second, high quantities of high-quality DNA were required for these barcodes to be visible on a standard electrophoresis gel. DNA fingerprinting from hair and other samples containing minimal or degraded DNA was therefore unreliable or impossible.

PCR
The next two breakthroughs, which have led to the recent boom in genetic techniques, were the development of the polymerase chain reaction (PCR) and the discovery of new classes of genetic markers (Saiki et al. 1988; Tautz 1989; Weber and May 1989). PCR is the "amplification" of a gene, part of a gene, or part of any section of the genome. The process can be likened to a molecular photocopy machine, where a few short DNA fragments are copied many times, ultimately allowing visualization of the PCR product on a standard electrophoresis gel. PCR is conducted in a thermal cycler, which heats and subsequently cools the reaction mixture to precise temperatures during multiple steps (figure 9.1). Originally, during the PCR process, a critical enzyme (called a polymerase) would break down when DNA strands were heated, making PCR extremely labor intensive, as more polymerase needed to be added during every cycle in the process. The process was greatly facilitated by the discovery of a thermostable enzyme called Taq polymerase (derived from the organism Thermus aquaticus, discovered in hot springs in Yellowstone National Park, USA), which allows the polymerase chain reaction to be subject to extreme temperature increases and decreases without disintegrating the polymerase.

Schematic illustrating the process of deriving individual identification from hair samples.

This process begins with DNA extraction, which produces DNA strands. Next, a particular region of the DNA is amplified using (in this case) microsatellite primers and the polymerase chain reaction (PCR). The PCR process involves three major steps: (1) the denaturing of the double-stranded DNA molecule; (2) the attachment of primers at a particular locus in the genome; and (3) the extension of these primers to produce a copy of the original locus. After multiple PCR cycles, millions, if not billions, of copies of the region are created, and can then be visualized on an electrophoresis gel. This gel image shows eight Canada lynx samples evaluated at one microsatellite locus. Even with one locus, multiple individuals can already be discerned, but samples 5 and 7 produce the same banding pattern at this locus. Ultimately, when additional loci were run, these two samples were determined to be from different individuals.
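Why do a few dozen cycles produce millions or billions of copies? In an idealized reaction every cycle doubles the number of copies of the target region. The sketch below works through that arithmetic. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling); real reactions fall short of this and eventually plateau, which this toy model ignores.

```python
def pcr_copies(starting_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate copies of the target region after a number of PCR cycles.

    Each cycle multiplies the copy count by (1 + efficiency), so perfect
    efficiency (1.0) doubles the count every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Ten starting copies, roughly what a single hair follicle might offer
    # (an illustrative number, not a measured one).
    for cycles in (10, 20, 30):
        print(cycles, "cycles:", f"{pcr_copies(10, cycles):,.0f}", "copies")
```

With perfect doubling, a single starting copy passes one million copies at cycle 20 and one billion at cycle 30.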


Primers
Primers are critical components of the PCR reaction. A forward and a reverse primer together act as bookends denoting the section of the genome to be copied. These primers can originate from either the mitochondrial or nuclear genomes of an organism. Deciding whether to examine sections of the nuclear or mitochondrial genome will depend largely on the goals of the study. For instance, if the goal is to determine species from hair or fecal samples, the mitochondrial genome is typically used. Mitochondrial DNA (mtDNA) is often less variable within a species than nuclear DNA but variable between species (see Wildlife Genetics in Practice: A Hypothetical Example later in this chapter). If the goal is to produce individual identification or fine-scaled population genetic information, however, nuclear DNA is often preferred.

Microsatellites
Currently, microsatellites are one of the most common genetic tools for producing individual identification from noninvasive genetic samples and for conducting population genetic analyses. Microsatellites belong to a class of genetic markers that contain variable numbers of tandem repeats; in general, these repeats are two to five base pairs in length (Figure 1). They are highly variable in nearly all vertebrates, which ultimately allows the differentiation of individuals within a population.
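What makes one microsatellite allele different from another is simply how many times the short motif repeats. As a sketch, the function below counts the longest back-to-back run of a motif in a sequence. The two allele sequences are invented for illustration; in the laboratory, alleles are usually told apart by fragment length on a gel rather than by reading the sequence.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    motif = motif.upper()
    # Find every uninterrupted stretch made only of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Two made-up alleles at the same locus: same flanking sequence,
# different numbers of the two-base repeat "CA".
ALLELE_A = "GGTC" + "CA" * 12 + "TTAG"
ALLELE_B = "GGTC" + "CA" * 15 + "TTAG"

if __name__ == "__main__":
    print("Allele A repeats:", longest_repeat_run(ALLELE_A, "CA"))
    print("Allele B repeats:", longest_repeat_run(ALLELE_B, "CA"))
    print("Length difference:", len(ALLELE_B) - len(ALLELE_A), "base pairs")
```

The three extra repeats make allele B six base pairs longer, and that length difference is what separates the two bands on a gel.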

Microsatellites have several advantages over Jeffreys DNA fingerprinting probes. First, microsatellite loci are codominant markers, meaning that the alleles from both chromosomes of a pair in diploid organisms are observed. Second, when used for individual identification (genotyping, in genetic terms), each pair of bars on the barcode is a separate microsatellite locus. Thus, a heterozygous individual (an individual with two different alleles) would have two bars (called bands or fragments) from a single microsatellite (see samples 1, 2, 5, 7 and 8 in figure 9.1). Alternatively, a homozygous individual has only one fragment at the microsatellite locus (see samples 3 and 4 in figure 9.1). The ability to distinguish loci from one another enables traditional population genetic models to estimate phenomena such as gene flow or relatedness (Wright 1969). Third, microsatellites are believed to be selectively neutral, conforming to many population genetic models. These properties, plus the ability to either inexpensively develop microsatellites for a particular species or use those already developed for related taxa, have made microsatellites a popular tool for molecular ecologists studying wildlife.

In summary, the coupling of PCR and microsatellite or mtDNA primers allows small amounts of DNA (e.g., from cells attached to the follicle of a single hair) to be transformed into a diagnostic identifier of individuals, species, and populations and makes genetic monitoring using small samples of DNA feasible.

Wildlife Genetics in Practice: A Hypothetical Example
Imagine a genetic sampling survey with the goal of estimating carnivore species diversity in a western forest. Samples in this hypothetical survey consist of feces (also called scat, pellets, dung, or turds, depending on the publication) located by scat detection dogs and hair snared at bait stations. In this section, we walk through the different analyses that are commonly conducted on such samples. It is important to note, however, that the particular molecular genetic techniques applied will depend on the objectives and species under study, as well as the expertise of the laboratory. One size doesn't fit all in molecular ecology.


Species Identification
The first question of interest to wildlife researchers is often, what species were detected by my survey? There are several ways in which a laboratory can ascertain species identification, but one of the most common is to sequence a region of the mitochondrial genome. For identifying carnivore species in particular, a standard approach is to use PCR with primers for the 16S rRNA region of the mitochondrial genome (following the protocols in Hoelzel and Green 1992; Mills et al. 2000). Most North American carnivores have a distinct sequence at the 16S rRNA region, and the majority of these species' sequences are entered into a national database (GenBank; National Center for Biotechnology Information 2007), which facilitates identification by matching sequences. For carnivores outside of North America, however, and for other taxa, reference sequences may not be available, although this is changing rapidly.
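The idea of "identification by matching sequences" can be sketched as follows: compare the unknown sequence to each reference and report the closest one. Everything here is a toy. The 20-base fragments are invented (they are not real 16S rRNA sequences), the function names are hypothetical, and the comparison assumes the sequences are already aligned to the same length. Real laboratories use dedicated search and alignment tools against databases such as GenBank.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the reference species most similar to `query`, with its score."""
    scores = {species: percent_identity(query, seq) for species, seq in references.items()}
    species = max(scores, key=scores.get)
    return species, scores[species]


# Made-up reference fragments, for illustration only.
REFERENCES = {
    "Canada lynx": "ACGTTGCAAGCTTAGGCATC",
    "bobcat": "ACGTTGCAAGCTTAGACATT",
    "gray wolf": "ACCTTGGAAGCATAGGTATC",
}

# An "unknown" hair sample that differs from the lynx reference at one base.
QUERY = "ACGTTGCAAGCTTAGGCATA"

if __name__ == "__main__":
    species, score = best_match(QUERY, REFERENCES)
    print(f"Closest reference: {species} ({score:.0f}% identical)")
```

A sensible workflow would also set a minimum score below which the sample is reported as unidentified, since the closest reference is not necessarily the right species when the true one is missing from the database.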

DNA barcoding is a new trend in molecular biology for species identification. With this approach, short, standardized DNA sequences, typically from a mitochondrial gene, are used to identify known species and to discover new species quickly and easily (Hebert et al. 2004; Savolainen et al. 2005). The initial goal of barcoding was to use a standardized region of the mitochondrial genome to uniquely identify all species, although this is proving difficult. Regardless, the barcoding databases established for many taxa will aid in developing surveys designed for carnivore species' detection worldwide.

DNA sequencing can be expensive. In some cases, however, we can reduce expenses by approximately 35% by using a restriction enzyme test to ascertain species identification. Here, as in sequencing, we amplify the 16S rRNA region, but we then immerse the DNA in particular enzymes which cut the DNA at diagnostic "restriction sites." We can identify species by examining the patterns of restriction enzyme-digested, PCR-amplified mtDNA (figure 9.2). This was the approach taken with the thousands of samples collected in the USDA Forest Service's National Lynx Survey (McKelvey et al. 1999). The downside is that research is required to develop such assays. Further, when nontarget species are encountered, they often cannot be identified to the species level and may even confound the identification of the target species. Given our hypothetical survey, with the goal of identifying every species that deposited a sample, we would likely sequence either 16S rRNA or another region of the mitochondrial genome and compare these sequences to known species sequences in a genetic database (e.g., GenBank). But if we only wanted to know whether or not the sample was deposited by a given target species, we might choose a restriction enzyme test (e.g., Paxinos et al. 1997; Mills et al. 2000; Dalen et al. 2004).
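A restriction enzyme test works because a species that carries the enzyme's recognition site gets cut into shorter fragments, while a species lacking the site stays whole. The sketch below shows the idea using the well-known enzyme EcoRI, which recognizes GAATTC and cuts after the first G. The two 20-base sequences are invented, and EcoRI is used only as a familiar example; it is not necessarily one of the enzymes used in the lynx assays cited above.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Cut `sequence` at every occurrence of `site` and return fragment lengths.

    `cut_offset` is how many bases into the site the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC).
    """
    cut_points = []
    start = sequence.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cut_points + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Made-up PCR products from two species; only species A carries the site.
SPECIES_A = "CCTT" + "GAATTC" + "GGCCTTAAGG"
SPECIES_B = "CCTTGACTTCGGCCTTAAGG"

if __name__ == "__main__":
    print("Species A fragments:", digest(SPECIES_A, "GAATTC", 1))
    print("Species B fragments:", digest(SPECIES_B, "GAATTC", 1))
```

On a gel, species A would show two shorter bands and species B a single full-length band, which is the kind of pattern difference the test reads.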

An electrophoresis gel showing the results of a restriction digest test to determine species identification. Without deploying any restriction enzymes, separation of felids and canids is possible. Using two different enzymes, Canada lynx, bobcats, and mountain lions can be discerned.

Gender Determination
Let's suppose that laboratory results show our hypothetical survey to have detected ten gray wolves (Canis lupus), eight Canada lynx (Lynx canadensis), four fishers (Martes pennanti), one elk (Cervus elaphus), and one bushy-tailed woodrat (Neotoma cinerea). The forest manager may want to know if there are any female lynx or gray wolves present in the forest (i.e., to assess whether these might be breeding populations). One of three genes is typically used to identify gender in carnivores. The first is the SRY gene (the testis determining factor), present only on the male Y chromosome. When a sample from a male is analyzed with SRY-specific primers, one band appears on an electrophoresis gel. If the sample is from a female, no bands appear. Unfortunately, a negative result (i.e., no band) can mean either that the sample originated from a female, or that it was of low quality and did not contain adequate amounts of DNA. Therefore, it is common for researchers to coamplify the SRY gene together with a microsatellite locus that amplifies regardless of gender. Failure of the microsatellite to amplify signals that the DNA was of poor quality and the results should be discarded. Multiple repeats of this process are recommended for accuracy.
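The decision logic of the SRY test, including the microsatellite control, can be written out as a short sketch. The rule for combining repeated runs (ignore failed runs, then require the remaining runs to agree) is an assumption made for this example; laboratories set their own replication rules.

```python
def call_sex(sry_band: bool, control_band: bool) -> str:
    """Interpret one gel lane from the SRY test with a microsatellite control."""
    if not control_band:
        # The control amplifies in both sexes, so no control band means
        # the DNA was too poor to trust this lane at all.
        return "fail"
    return "male" if sry_band else "female"


def consensus_call(replicates: list[tuple[bool, bool]]) -> str:
    """Combine repeated runs, each given as (sry_band, control_band).

    Assumed rule for this sketch: drop failed runs, then report a sex
    only if every remaining run agrees.
    """
    calls = [call_sex(sry, control) for sry, control in replicates]
    usable = [call for call in calls if call != "fail"]
    if not usable:
        return "fail"
    if len(set(usable)) == 1:
        return usable[0]
    return "inconclusive"


if __name__ == "__main__":
    # Three runs of one hair sample: no SRY band, control present each time.
    print(consensus_call([(False, True), (False, True), (False, True)]))
    # No bands at all: poor-quality DNA, not evidence of a female.
    print(consensus_call([(False, False)]))
```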

A second method for identifying sex is to sequence a gene in the zinc-finger region (ZF) of the X and Y chromosomes. In felids, the ZFY (male) band has a three-base pair deletion compared to the ZFX. Thus, a male lynx (and males of most other mammal species) will show two bands on an electrophoresis gel (i.e., a band for the X chromosome and a band for the Y chromosome, which vary in length because of the deletion on Y), whereas a female will only show one band (i.e., females have two X chromosomes, with no length variants; Pilgrim et al. 2005). Similar tests have been published for canids, cetaceans, and bovids.

Last, a similar gene, called the amelogenin gene (which codes for proteins found in tooth enamel) has a twenty-base pair deletion on the Y chromosome of some species. For example, this provides another gender determination test that works for felids (Pilgrim et al. 2005) as well as ursids (Poole et al. 2001).


Individual Identification
Our hypothetical research reveals that, of the eight lynx samples, five were produced by males and three by females. The next common question might be, how many individuals are represented by these samples? While several tools exist to provide individual identification, the most common are microsatellites. For lynx, a panel of six microsatellites is frequently used to determine individual identification (Schwartz et al. 2004). In a lynx study in Minnesota, six microsatellites provided a probability of identity of 1.55 × 10^-6 (M. Schwartz, unpubl. data), which translates to a probability of 1 in 645,161 that two randomly chosen lynx in the Minnesota population have identical genotypes. Given that surveys to date have detected fewer than two hundred lynx in Minnesota, the survey was deemed to have had sufficient power to distinguish individuals with six microsatellites. The number of microsatellites necessary for individual identification depends on the amount and distribution of genetic variation in the species (characterized by the probability of identity; Waits et al. 2001). In other work, as few as four microsatellites, or as many as ten, have been required to achieve a reasonable probability of identity, depending on the population and its history (e.g., small and inbred populations tend to have little variability).
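To see where a number like this comes from, the sketch below computes a probability of identity from allele frequencies using one commonly used form of the formula: for each locus, sum p^4 over alleles plus (2pq)^2 over every pair of different alleles, then multiply across loci. This assumes unrelated individuals, random mating, and independent loci, and it is not necessarily the exact calculation used in the Minnesota study. The allele frequencies are invented.

```python
from itertools import combinations
from math import prod


def locus_pid(freqs: list[float]) -> float:
    """Chance that two unrelated individuals share a genotype at one locus."""
    homozygote_matches = sum(p ** 4 for p in freqs)
    heterozygote_matches = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygote_matches + heterozygote_matches


def multilocus_pid(loci: list[list[float]]) -> float:
    """Multiply per-locus values, assuming the loci are independent."""
    return prod(locus_pid(freqs) for freqs in loci)


# Made-up allele frequencies for three loci (each list sums to 1).
LOCI = [
    [0.5, 0.5],
    [0.4, 0.3, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]

if __name__ == "__main__":
    pid = multilocus_pid(LOCI)
    print(f"Probability of identity: {pid:.2e} (about 1 in {1 / pid:,.0f})")
    # The conversion quoted in the text: 1.55e-6 is about 1 in 645,161.
    print(f"1 in {1 / 1.55e-6:,.0f}")
```

Loci with more alleles at more even frequencies give smaller per-locus values, which is why variable populations need fewer microsatellites than small, inbred ones.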

In cases where existing microsatellites have low variability, one solution can be to develop microsatellites specifically for the population of interest. Given that microsatellites show an ascertainment bias (i.e., they are more variable in the species and, in some cases, the population for which they are developed), this approach can result in variable microsatellites for the target population, thus requiring fewer microsatellites for individual identification. Today, a number of commercial companies can quickly develop variable microsatellites for a target population at a reasonable cost (e.g., $10,000-$15,000 USD).

The decision to develop microsatellites for a particular species, versus assessing whether suitable primers have already been developed from a closely related species, lies in the initial costs of developing markers, the availability of markers for a related species, and the purpose of the project. For instance, a recent study seeking a panel of microsatellites for sampling mountain beavers (Aplodontia rufa) found that, because there are no other members of this genus and typical hair samples consisted of single hairs (i.e., small amounts of DNA), it was advisable to develop a suite of microsatellites specifically for this species (Pilgrim et al. 2006).

Once sufficient power to discriminate between individuals is achieved, the resulting microsatellite genotypes can be compared to determine the number of unique individuals (figure 9.1). When employing microsatellites to identify individuals with noninvasively collected genetic samples, it is important that some method be used to ensure that the resultant data are error free (see below). It should be noted that other genetic tools can be used to distinguish individuals (see Avise 2004 for a description of these techniques), each with its own benefits and limitations.
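Comparing genotypes to count individuals amounts to grouping samples that match at every locus. The sketch below does exactly that, under the optimistic assumption that the genotypes are error free. The sample names and allele sizes are invented, and they mirror the earlier lynx example in which two samples match at the first locus but differ once a second locus is added.

```python
from collections import defaultdict

# One genotype = a list of loci, each locus = the two allele sizes observed.
Genotype = list[tuple[int, int]]


def group_by_genotype(samples: dict[str, Genotype]) -> list[list[str]]:
    """Group sample IDs whose genotypes match at every locus."""
    groups = defaultdict(list)
    for sample_id, genotype in samples.items():
        # Sort alleles within a locus so (150, 154) and (154, 150) match.
        key = tuple(tuple(sorted(locus)) for locus in genotype)
        groups[key].append(sample_id)
    return list(groups.values())


# Made-up hair samples scored at two microsatellite loci.
SAMPLES = {
    "hair_5": [(150, 154), (201, 205)],
    "hair_7": [(154, 150), (201, 209)],
    "hair_8": [(150, 154), (201, 205)],
}

if __name__ == "__main__":
    individuals = group_by_genotype(SAMPLES)
    print(len(individuals), "unique individuals:", individuals)
```

Here hair_5 and hair_8 are treated as the same animal, while hair_7 is separated by its second locus. Real data would need an allowance for genotyping errors before samples are declared different.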