Principles of Conservation Genetics
Genes and Genomes
The genetic code is like a mystery novel, history text, and time
log all wrapped into one. The goal of a geneticist is to learn
how to read these books written in a simple 4-letter alphabet
and decode critical information about the species under study.
In this section we will learn some of the basics of DNA structure
and function so we can later see how this information is used
by conservationists and managers.
What is DNA?
DNA, or deoxyribonucleic acid for those who want to impress
their friends, is composed of a sequence of molecules called nucleotides. There are only 4 different nucleotides in DNA
that create the genetic code, and these are often referred to by
a single letter: Adenine (A), Thymine (T), Cytosine (C), and Guanine
(G). Yes, the text for your entire body is written in a code made
up of 4 letters! These nucleotides are also referred to as
base pairs because they bond in pairs that are always
the same (A+T pairs and C+G pairs). Sequences of nucleotides
act as a code that provides instructions to the cell to build
proteins, and ultimately the entire organism (but we are getting
ahead of ourselves).
Most of us are familiar with the spiral staircase shaped DNA
molecule. This is simply the nucleotides joined together into
long strands along with their complementary strands. The "rungs
of the ladder" are the bonds between the A+T pairs or the
C+G pairs. Double helix is another common term used for describing
the twisted structure of the DNA molecule.
There are an estimated 3 billion base pairs in the human genome
(a genome is simply all the DNA in the cell). If you think of
the nucleotides as letters, it would take a person typing 60 words
per minute, eight hours a day, around 50 years to type the human
genome! Those of us typists who "hunt and peck" for
letters had better start typing now! Amazingly, the majority of
our cells contain all 3 billion of these bits of information (I
can't say all the cells because our red blood cells don't have a nucleus,
and thus no nuclear DNA).
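For the curious, the typing estimate can be checked with a few lines of arithmetic. This is only a rough sketch: the 5 letters per word and the typing every single day of the year are assumptions added here, not figures from the text.

```python
# Rough check of the typing estimate.
# Assumptions (not from the text): an average word is 5 letters long,
# and the typist works every day of the year.
GENOME_LETTERS = 3_000_000_000   # base pairs in the human genome
WORDS_PER_MINUTE = 60
LETTERS_PER_WORD = 5             # assumed
HOURS_PER_DAY = 8

letters_per_day = WORDS_PER_MINUTE * LETTERS_PER_WORD * 60 * HOURS_PER_DAY
years = GENOME_LETTERS / letters_per_day / 365

print(f"{letters_per_day:,} letters per day, about {years:.0f} years")
```

Under these assumptions the answer lands in the fifties, in line with the "around 50 years" quoted above.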
Organization inside the Cell
The DNA inside the nucleus of a cell is not randomly scattered
like toys in my kid's room. It is neatly organized into structures
called chromosomes, each of which contains hundreds to thousands of genes. Chromosomes come in
pairs with one inherited from each parent (for this you can blame
your parents). As a consequence, animals have two copies
of each gene. In humans there are 23 pairs of chromosomes (46
total) in each cell, but the number of chromosomes varies widely
between species. The number of chromosomes is not related to the
developmental complexity or intelligence of the species: humans
have 46 chromosomes, chickens have 78 chromosomes and king crabs
have 208! Of course these numbers are small compared to some plants;
the Adder's Tongue Fern has approximately 1,440 chromosomes.
Sex is also determined by chromosomes. Most animal species have
one set of sex chromosomes. In mammals these sex chromosomes are
denoted by the letters X and Y. All of the genes required for
male development are located on the Y chromosome (yes, the smaller
of the two chromosomes shown below;
directly related to intelligence,
ability to ask for directions, and listening skills, right?).
Therefore, individuals with two X chromosomes are female, and individuals
with an X and a Y chromosome are male. This is the basis for many
of the sex tests we will discuss later for forensic-type samples
(e.g., hairs). The figure below on the left shows how the
DNA strands are wound tightly into chromosomes found in the cell's
nucleus. The figures on the right are microscopic photographs
of human chromosomes: the top shows the 46 chromosomes in a cell's
nucleus, and the bottom photograph shows a close-up of tightly
wound DNA strands in the X and Y sex chromosomes.
Making New DNA
For an organism to grow, its cells must divide and be able to
copy (replicate) the DNA so that all cells contain identical genetic
information. This replication is done in a few steps (which laboratory
geneticists try to emulate, but again we are getting ahead of
ourselves). In the first step (shown in the figure below) the
strands are separated. Next, each strand's complementary DNA sequence
is recreated by bonding the complementary base onto each base of the original
strand. In this way, DNA acts as its own template to create copies
of itself. One of the critical enzymes involved in DNA
replication is called a polymerase. We will mention
this molecule later on, as it has become the most important tool
in both geneticists' and medical researchers' toolboxes.
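The template idea can be sketched in a few lines of code. This is a deliberately simplified illustration: it applies only the A+T and C+G pairing rule and ignores strand direction and everything the real cellular machinery does. The function names are made up for this example.

```python
# Pairing rule from the text: A bonds with T, and C bonds with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Build the partner strand by pairing each base with its complement."""
    return "".join(PAIRS[base] for base in strand)


def replicate(strand_a: str, strand_b: str):
    """Separate the two strands, then let each one template a new partner.

    Returns two double-stranded molecules, each as a (strand, partner) pair.
    """
    copy_1 = (strand_a, complement(strand_a))
    copy_2 = (complement(strand_b), strand_b)
    return copy_1, copy_2


original = ("ATGCCG", complement("ATGCCG"))
copy_1, copy_2 = replicate(*original)
print(original, copy_1, copy_2)
```

Because each old strand dictates its new partner, both copies come out identical to the original molecule.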
Now that we have a basic understanding of DNA structure we need
to take just a minute to go over some important terminology that
will help us communicate later.
We refer to all of the genetic information possessed by an organism
as its genome. A gene is a specific sequence
of nucleotides in a genome that codes for a particular trait.
A given gene is always found at a specific location (called a locus; plural loci) in a genome. A more general term is genetic
marker, which refers to any identifiable place in the genome. This
term, genetic marker, originated in the medical field, which early
on couldn't readily identify genes but could determine if individuals
had certain DNA sequences at certain places (loci). Some of these
were associated with diseases. So, individuals that had these
"genetic markers" would likely have the disease.
As we advance our understanding of the genome, we can find the
exact gene that is associated with the disease, increasing our
ability to predict who will be susceptible to which diseases.
Obviously, each human has genes that express themselves differently.
While we all have the genes for hair color, some of us have a
version for blond hair, while others have a version for brown hair. We need
a name for these different forms of genes. Thus, alleles are the different versions of a gene passed down from each parent
and are essentially slight variations in the DNA sequence at the
same locus. Each individual has 2 copies, or alleles, of each gene;
thus an individual can have 2 copies of the same allele, or 2
different alleles. Organisms with two copies of the same allele
are called homozygous,
while organisms with two different alleles are heterozygous.
The set of alleles for a given organism is called its genotype,
while the physically observable traits that the organism displays
are called its phenotype.
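To make the terminology concrete, here is a minimal sketch that labels each locus of one made-up individual as homozygous or heterozygous. The locus names and allele labels are hypothetical and chosen only for illustration.

```python
def zygosity(genotype):
    """Classify a single-locus genotype given as a pair of alleles."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def proportion_homozygous(genotypes):
    """Fraction of loci at which the individual carries two identical alleles."""
    calls = [zygosity(g) for g in genotypes]
    return calls.count("homozygous") / len(calls)


# Hypothetical individual: locus name -> the two alleles it carries there.
individual = {
    "locus A": ("2", "2"),
    "locus B": ("1", "3"),
    "locus C": ("4", "4"),
    "locus D": ("1", "2"),
}

for locus, genotype in individual.items():
    print(locus, genotype, zygosity(genotype))
print("Proportion homozygous:", proportion_homozygous(individual.values()))
```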
The figure below is called an electrophoretic gel; it is how
geneticists look at different loci. Each column is a unique individual
at 1 locus. In most studies tens or hundreds of loci are put together
to form a barcode (i.e., a person's or animal's genotype). Sample
1 has two copies of allele 2, whereas sample 2 has 2 different
alleles. While the lack of genetic variation (i.e., having only
1 allele) is not so important at just one locus, if the majority
of an individual's genome is homozygous this can spell trouble.
What is mitochondrial DNA?
Most of us think of nuclear DNA when we hear the term genetics.
Nearly every cell in an individual has one nucleus and within the nucleus
is a collection of chromosomes that contain all of the genetic
information necessary to build that organism as described above.
This is what we term nuclear DNA. However, organisms actually
have two different types of DNA in their cells. There is a second
type of DNA found in the mitochondria of the cells which we refer
to as mitochondrial DNA or mtDNA.
Mitochondria are organelles found in nearly every cell that are responsible
for producing cellular energy. Their ancestry is not fully understood,
but mitochondria are thought to be descended from ancient bacteria
that were engulfed by cells in the evolutionary process. Yes,
each of us now has an alien being called a mitochondrion living
in each of our cells. Mitochondrial DNA is a small circular structure
(human mtDNA has ~17,000 base pairs) that primarily codes for
cellular machinery and not for exterior traits. In fact, mitochondria
are often called the powerhouses of the cell for their ability
to generate chemical energy for the cells to use. The figure below
shows the circular mitochondrial DNA molecule in a mammalian cell
with its gene regions and some specific gene loci.
There are differences between nuclear and mtDNA that have important
implications in their use in conservation genetics.
- There are hundreds to thousands of mitochondria (and their
DNA) in each cell versus just one copy of the nuclear genome.
Therefore mtDNA is often easier to obtain from poor quality
genetic samples (hair, scat) than nuclear DNA. So when we want
to design tests to determine which species left a hair, we try
to make it a mtDNA test.
- mtDNA is maternally inherited. All mitochondria in an organism
are derived from the mitochondria present in the female's egg
cell. So, everyone reading this (male or female) has the
mitochondrial DNA of their mother, maternal grandmother, and maternal
great-grandmother. Sorry, but dad contributes nothing in terms of mtDNA.
This makes mtDNA useful for investigating sex-specific dispersal
or tracing maternal lineages over long time periods.
- Rather than having two alleles per gene as in nuclear DNA,
individuals have one mitochondrial haplotype in their mtDNA that is passed down directly from the mother.
Because of this, mtDNA is not subject to recombination (mixing of genes from different DNA strands or chromosomes).
This reduces the amount of genetic variation in the mitochondria
over time compared to nuclear DNA making mtDNA an effective
tool in answering questions about the historic characteristics
Contemporary biologists may take for granted the ability to obtain
either diagnostic or population genetic information from genetic
samples, but it wasn't until Sir Alec Jeffreys began studying
DNA variation and the evolution of gene families through the use
of "hypervariable" regions of human DNA that molecular
biologists were able to produce a genetic fingerprint (Jeffreys
1985a, b). Jeffreys discovered that particular regions of
the human genome, which consist of short sequences repeated multiple
times, also contain a "core sequence" that could be
developed into a tool called a probe. Further, Jeffreys recognized
that these probes could be used to explore multiple regions of
the human genome that also contain tandem repeats (two or more
nucleotides sequentially repeated, e.g., CATG, CATG, CATG), ultimately
producing something biologically akin to a barcode, such that each
individual has a unique genetic signature.
This technique became known as DNA fingerprinting, and instantly
became a staple of forensic and paternity work worldwide. Two
years after the development of DNA fingerprinting, Wetton
et al. (1987) found these same Jeffreys probes to be useful
for studying house sparrows (Passer domesticus) in the United
Kingdom. Within a decade, DNA fingerprinting was common in many
wildlife studies, rewriting conventional wisdom on mating systems
and gene flow among populations (Wildt
et al. 1987; Lynch
et al. 1989).
Initially, there were two major impediments to DNA fingerprinting
for addressing wildlife issues. First, although Jeffreys probes
provided a DNA "barcode," it was typically not possible
to know which bars were associated with which locus. Thus, population
genetic models that relied on locus-specific information needed
to be adapted or abandoned. Second, high quantities of high-quality
DNA were required for these barcodes to be visible on a standard
electrophoresis gel. DNA fingerprinting from hair and other samples
containing minimal or degraded DNA was therefore unreliable or
The next two breakthroughs, which have led to the recent boom
in genetic techniques, were the development of the polymerase
chain reaction (PCR) and the discovery of new classes of genetic
markers (et al. 1988; Tautz
and May 1989). PCR is the "amplification" of a gene,
part of a gene, or part of any section of the genome. The process
can be likened to a molecular photocopy machine, where a few short
DNA fragments are copied many times, ultimately allowing visualization
of the PCR product on a standard electrophoresis gel. PCR is conducted
in a thermal cycler, which heats and subsequently cools the reaction
mixture to precise temperatures during multiple steps (figure
9.1). Originally, during the PCR process, a critical enzyme (called
a polymerase) would break down when DNA strands were heated, making
PCR extremely labor intensive, as more polymerase needed to be
added during every cycle in the process. The process was greatly
facilitated by the discovery of a thermostable enzyme called Taq
polymerase (derived from the organism Thermus aquaticus, discovered
in hot springs in Yellowstone National Park, USA), which allows
the polymerase chain reaction to be subject to extreme temperature
increases and decreases without disintegrating the polymerase.
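The "molecular photocopy machine" idea is easy to put into numbers. The sketch below assumes every cycle doubles every copy perfectly, which real reactions only approximate, so treat the results as an upper bound. The function names are made up for this example.

```python
def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Copies present after a number of cycles, assuming perfect doubling."""
    return starting_copies * 2 ** cycles


def cycles_to_reach(target: int, starting_copies: int = 1) -> int:
    """Smallest number of cycles that yields at least `target` copies."""
    cycles = 0
    while copies_after(cycles, starting_copies) < target:
        cycles += 1
    return cycles


for n in (10, 20, 30):
    print(f"{n} cycles: {copies_after(n):,} copies from a single fragment")

print("Cycles to pass one million copies:", cycles_to_reach(1_000_000))
print("Cycles to pass one billion copies:", cycles_to_reach(1_000_000_000))
```

Even from a single starting fragment, idealized doubling passes a million copies at 20 cycles and a billion at 30, which is why a few cells on a hair can be enough.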
Schematic illustrating the process of deriving individual
identification from hair samples.
This process begins with DNA extraction, which produces DNA strands.
Next, a particular region of the DNA is amplified using (in this
case) microsatellite primers, and the polymerase chain reaction
(PCR). The PCR process involves three major steps: (1) the denaturing
of the double-stranded DNA molecule; (2) the attachment of primers
at a particular locus in the genome; and (3) the extension of
these primers to produce a copy of the original locus. After multiple
PCR cycles, millions, if not billions, of copies of the region are
created, and can then be visualized on an electrophoresis gel.
This gel image shows eight Canada lynx samples evaluated at one
microsatellite locus. Even with one locus, multiple individuals
can already be discerned, but samples 5 and 7 produce the same
banding pattern at this locus. Ultimately, when additional loci
were run, these two samples were determined to be from different
individuals.
Primers are critical components of the PCR reaction. A forward
and a reverse primer together act as bookends denoting the section
of the genome to be copied. These primers can originate from either
the mitochondrial or nuclear genomes of an organism. Deciding
whether to examine sections of the nuclear or mitochondrial genome
will depend largely on the goals of the study. For instance, if
the goal is to determine species from hair or fecal samples, the
mitochondrial genome is typically used. Mitochondrial DNA (mtDNA)
is often less variable within a species than nuclear DNA but variable
between species (see Wildlife Genetics in Practice: A Hypothetical
Example later in this chapter). If the goal is to produce individual
identification or fine-scaled population genetic information,
however, nuclear DNA is often preferred.
Currently, microsatellites are one of the most common genetic
tools for producing individual identification from noninvasive
genetic samples and for conducting population genetic analyses.
Microsatellites belong to a class of markers that contain variable
numbers of tandem repeats; in general, these repeats are two to
five base pairs in length (Figure 1). They are highly variable
in nearly all vertebrates, which ultimately allows the differentiation
of individuals within a population.
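To picture what "variable numbers of tandem repeats" means, the sketch below counts back-to-back copies of a repeat unit in two made-up alleles of the same locus. The sequences, the CA repeat unit, and the flanking bases are all hypothetical. In the laboratory, alleles are told apart by fragment length on a gel rather than by reading them like this.

```python
import re


def longest_repeat(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Two hypothetical alleles: same flanking sequence, different repeat counts.
allele_1 = "GGT" + "CA" * 12 + "TTG"
allele_2 = "GGT" + "CA" * 15 + "TTG"

for name, allele in (("allele 1", allele_1), ("allele 2", allele_2)):
    print(name, "repeats:", longest_repeat(allele, "CA"), "length:", len(allele))
```

The two alleles differ by three repeats, so their fragments differ by six base pairs, and that length difference is what shows up as separate bands.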
Microsatellites have several advantages over Jeffreys DNA fingerprinting
probes. First, microsatellite loci are codominant markers, meaning
that alleles from both chromosomes of a pair in diploid organisms
are observed. Second, when used for individual identification (genotyping,
in genetic terms), each pair of bars on the barcode is a separate
microsatellite locus. Thus, a heterozygous individual (an individual
with two different alleles) would have two bars (called bands
or fragments) from a single microsatellite (see samples 1, 2,
5, 7 and 8 in figure 9.1). Alternatively, a homozygous individual
has only one fragment at the microsatellite locus (see samples
3 and 4 in figure 9.1). The ability to distinguish loci from one
another enables traditional population genetic models to estimate
phenomena such as gene flow or relatedness (Wright
1969). Third, microsatellites are believed to be selectively
neutral, conforming to many population genetic models. These properties,
plus the ability to either inexpensively develop microsatellites
for a particular species or use those already developed for related
taxa, have made microsatellites a popular tool for molecular ecologists.
In summary, the coupling of PCR and microsatellite or mtDNA primers
allows small amounts of DNA (e.g., from cells attached to the
follicle of a single hair) to be transformed into a diagnostic
identifier of individuals, species, and populations and makes
genetic monitoring using small samples of DNA feasible.
Wildlife Genetics in Practice: A Hypothetical Example
Imagine a genetic sampling survey with the goal of estimating
carnivore species diversity in a western forest. Samples in this
hypothetical survey consist of feces (also called scat, pellets,
dung, or turds, depending on the publication) located by scat
detection dogs and hair snared at bait stations. In this section,
we walk through the different analyses that are commonly conducted
on such samples. It is important to note, however, that the particular
molecular genetic techniques applied will depend on the objectives
and species under study, as well as the expertise of the laboratory.
One size doesn't fit all in molecular ecology.
Species Identification
The first question of interest to
wildlife researchers is often, what species were detected by my
survey? There are several ways in which a laboratory can ascertain
species identification, but one of the most common is to sequence
a region of the mitochondrial genome. For identifying carnivore
species in particular, a standard approach is to use the PCR reaction
with primers for the 16S rRNA region of the mitochondrial genome
(following the protocols in Hoezel and Green
et al. 2000). Most North American carnivores have a distinct
sequence at the 16S rRNA region, and the majority of these species'
sequences are entered into a national database (GenBank;
National Center for Biotechnology Information 2007), which facilitates
identification by matching sequences. For carnivores outside of
North America, however, and for other taxa, reference sequences
may not be available, although this is changing rapidly.
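The matching step can be pictured with a toy example. Everything specific below is invented for illustration: the short reference sequences are not real 16S rRNA data, the 90% threshold is arbitrary, and a real laboratory would use an alignment search against GenBank rather than this position-by-position comparison.

```python
def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up reference sequences (NOT real data) keyed by species.
REFERENCE = {
    "Canis lupus": "ACGTTAGCCGATTACGGA",
    "Lynx canadensis": "ACGTCAGCTGATAACGGT",
    "Martes pennanti": "ATGTTGGCCGTTTACGAA",
}


def best_match(query: str, reference=REFERENCE, threshold: float = 0.9):
    """Return (species, score) for the closest reference, or (None, score)
    when even the best score falls below the assumed threshold."""
    species, score = max(
        ((name, identity(query, seq)) for name, seq in reference.items()),
        key=lambda pair: pair[1],
    )
    return (species, score) if score >= threshold else (None, score)


print(best_match("ACGTTAGCCGATTACGGC"))  # one base off the made-up wolf entry
print(best_match("TTTTTTTTTTTTTTTTTT"))  # resembles nothing in the reference
```

Returning no species when nothing matches well mirrors the situation where reference sequences for a taxon are simply not available.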
DNA barcoding is a new trend in molecular biology for species
identification. With this approach, short, standardized DNA sequences (typically
from a mitochondrial gene) are used to identify known species and
to discover new species quickly and easily (Hebert
et al. 2004; Savolainen
et al. 2005). The initial goal of barcoding was to use a standardized
region of the mitochondrial genome to uniquely identify all species,
although this is proving difficult. Regardless, the barcoding
databases established for many taxa will aid in developing surveys
designed for carnivore species' detection worldwide.
DNA sequencing can be expensive. In some cases, however, we can
reduce expenses by approximately 35% by using a restriction enzyme
test to ascertain species identification. Here, as in sequencing,
we amplify the 16S rRNA region, but we then immerse the DNA in
particular enzymes which cut the DNA at diagnostic "restriction
sites." We can identify species by examining the patterns
of restriction enzyme-digested, PCR-amplified mtDNA (figure 9.2).
This was the approach taken with the thousands of samples collected
in the USDA Forest Service's National Lynx Survey (McKelvey
et al. 1999). The downside is that research is required to
develop such assays. Further, when nontarget species are encountered,
they often cannot be identified to the species level and may
even confound the identification of the target species. Given
our hypothetical survey, with the goal of identifying every species
that deposited a sample, we would likely sequence either 16S rRNA
or another region of the mitochondrial genome and compare these
sequences to known species sequences in a genetic database (e.g.,
GenBank). But if we only wanted to know whether or not the sample
was deposited by a given target species, we might choose a restriction
enzyme test (e.g., Paxinos
et al. 1997; Mills
et al. 2000; Dalen
et al. 2004).
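The logic of a restriction enzyme test can be sketched as follows. The two amplicons are made-up sequences, and the EcoRI recognition site (GAATTC, cut after the first G) is used only because it is a familiar textbook enzyme. It is not claimed to be one of the enzymes used in the surveys cited above.

```python
def digest(sequence: str, site: str, cut_offset: int):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is how many bases into the site the enzyme cuts.
    Returns fragment lengths, longest first, as they would sort on a gel.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - start)
    return sorted(fragments, reverse=True)


# Hypothetical amplicons from two species: one carries the site, one does not.
species_x = "ACGT" * 5 + "GAATTC" + "TTGCA" * 3
species_y = "ACGT" * 5 + "GAGTTC" + "TTGCA" * 3

ECORI_SITE, ECORI_OFFSET = "GAATTC", 1
print("species X fragments:", digest(species_x, ECORI_SITE, ECORI_OFFSET))
print("species Y fragments:", digest(species_y, ECORI_SITE, ECORI_OFFSET))
```

A single base difference removes the site in the second amplicon, so it stays as one long fragment while the first is cut in two. That difference in banding pattern is what identifies the species.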
An electrophoresis gel showing the results of a restriction
digest test to determine species identification. Without deploying
any restriction enzymes, separation of felids and canids is possible.
Using two different enzymes, Canada lynx, bobcats, and mountain
lions can be discerned.
Let's suppose that laboratory results show our hypothetical survey
to have detected ten gray wolves (Canis lupus), eight Canada lynx
(Lynx canadensis), four fishers (Martes pennanti), one elk (Cervus
elaphus), and one bushy-tailed woodrat (Neotoma cinerea). The
forest manager may want to know if there are any female lynx or
gray wolves present in the forest (i.e., to assess whether these
might be breeding populations). One of three genes is typically
used to identify gender in carnivores. The first is the SRY gene
(the testis determining factor), present only on the male Y chromosome.
When a sample from a male is analyzed with SRY-specific primers,
one band appears on an electrophoresis gel. If the sample is from
a female, no bands appear. Unfortunately, a negative result (i.e.,
no band) can mean either that the sample originated from a female,
or that it was of low quality and did not contain adequate amounts
of DNA. Therefore, it is common for researchers to coamplify, along
with the SRY gene, a microsatellite locus that amplifies regardless
of gender. Failure of the microsatellite to amplify signals that
the DNA was of poor quality and the results should be discarded.
Multiple repeats of this process are recommended for accuracy.
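The decision rule described above can be written out explicitly. The first function follows the text directly. The second, which requires all usable replicates to agree, is an assumption made for this sketch, since the text only says that repeats are recommended.

```python
def call_sex(sry_band: bool, control_band: bool) -> str:
    """Interpret one reaction: SRY band plus a microsatellite control band."""
    if not control_band:
        # Control failed to amplify: DNA quality is suspect, discard the result.
        return "inconclusive"
    return "male" if sry_band else "female"


def consensus(replicates) -> str:
    """Combine repeated reactions, given as (sry_band, control_band) pairs.

    Assumed rule (not from the text): ignore failed reactions and accept
    a call only if every remaining replicate agrees.
    """
    calls = {call_sex(sry, control) for sry, control in replicates}
    calls.discard("inconclusive")
    return calls.pop() if len(calls) == 1 else "inconclusive"


print(consensus([(True, True), (True, True), (False, False)]))
print(consensus([(False, True), (False, True)]))
print(consensus([(True, True), (False, True)]))
```

Without the control, the third reaction in the first example would have looked like evidence of a female.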
A second method for identifying sex is to sequence a gene in
the zinc-finger region (ZF) of the X and Y chromosomes. In felids,
the ZFY (male) band has a three-base-pair deletion compared to
the ZFX. Thus, a male lynx (and males of most other mammal species) will
show two bands on an electrophoresis gel (i.e., a band for the
X chromosome and a band for the Y chromosome, which vary in length
because of the deletion on Y), whereas a female will only show
one band (i.e., females have two X chromosomes, with no length
variants; Pilgrim et al. 2005). Similar tests have been published
for canids, cetaceans, and bovids.
Last, a similar gene, called the amelogenin gene (which codes
for proteins found in tooth enamel), has a twenty-base-pair deletion
on the Y chromosome of some species. For example, this provides
another gender determination test that works for felids (Pilgrim
et al. 2005) as well as ursids (Poole et al. 2001).
Our hypothetical research reveals that, of the eight lynx samples,
five were produced by males and three by females. The next common
question might be, how many individuals are represented by these
samples? While several tools exist to provide individual identification,
the most common are microsatellites. For lynx, a panel of six
microsatellites is frequently used to determine individual identification
(Schwartz et al. 2004). In a lynx study in Minnesota, six microsatellites
provided a probability of identity of 1.55 × 10^-6 (M. Schwartz,
unpubl. data), meaning that the probability of two randomly
chosen lynx in the Minnesota population having identical genotypes
is about 1 in 645,161. Given that surveys to date have detected
fewer than two hundred lynx in Minnesota, the survey was deemed
to have had sufficient power to distinguish individuals with six
microsatellites. The number of microsatellites necessary for individual
identification depends on the amount and distribution of genetic
variation in the species (characterized by the probability of
identity; Waits et al. 2001). In other work, as few as four microsatellites,
or as many as ten, have been required to achieve a reasonable
probability of identity, depending on the population and its history
(e.g., small and inbred populations tend to have little variability).
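The numbers in this paragraph can be reproduced with a short calculation. The sketch also includes the standard Hardy-Weinberg expression for a single-locus probability of identity, multiplied across loci that are assumed to be independent. It is offered as an illustration of the concept, not as the method behind the unpublished Minnesota value, and the population size of 200 is simply the upper bound mentioned in the text.

```python
from math import comb, prod


def locus_pid(freqs):
    """Chance that two random individuals share a genotype at one locus,
    given its allele frequencies and assuming Hardy-Weinberg proportions."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygotes + heterozygotes


def multilocus_pid(loci):
    """Combine loci by multiplication, assuming they are independent."""
    return prod(locus_pid(freqs) for freqs in loci)


PID = 1.55e-6                      # value reported in the text
one_in = 1 / PID                   # the "1 in ..." form of the same number
pairs = comb(200, 2)               # distinct pairs among 200 animals
expected_matches = pairs * PID     # expected pairs sharing a genotype by chance

print(f"1 in {one_in:,.0f}")
print(f"{pairs:,} pairs, {expected_matches:.3f} expected chance matches")
```

With roughly 0.03 expected chance matches among all pairs of 200 animals, it is easy to see why six loci were judged sufficient.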
In cases where existing microsatellites have low variability,
one solution can be to develop microsatellites specifically for
the population of interest. Given that microsatellites show an
ascertainment bias (i.e., they are more variable in the species
and, in some cases, the population for which they are developed),
this approach can result in variable microsatellites for the target
population, thus requiring fewer microsatellites for individual
identification. Today, a number of commercial companies can quickly
develop variable microsatellites for a target population at a
reasonable cost (e.g., $10,000-$15,000 USD).
The decision to develop microsatellites for a particular species,
versus assessing whether suitable primers have already been developed
from a closely related species, lies in the initial costs of developing
markers, the availability of markers for a related species, and
the purpose of the project. For instance, a recent study seeking
a panel of microsatellites for sampling mountain beavers (Aplodontia
rufa) found that, because there are no other members of this genus
and typical samples consisted of single hairs (i.e., small
amounts of DNA), it was advisable to develop a suite of microsatellites
specifically for this species (Pilgrim et al. 2006).
Once sufficient power to discriminate between individuals is
achieved, the resulting microsatellite genotypes can be compared
to determine the number of unique individuals (figure 9.1). When
employing microsatellites to identify individuals with noninvasively
collected genetic samples, it is important that some method be
used to ensure that the resultant data are error free (see below).
It should be noted that other genetic tools can be used to distinguish
individuals (see Avise 2004 for a description of these techniques),
each having its own benefits and limitations.
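As a closing illustration, comparing genotypes to count unique individuals amounts to grouping samples that match at every locus examined. The allele sizes below are hypothetical, and the sketch assumes the genotypes are error free, which is exactly the assumption that real noninvasive studies must check.

```python
from collections import defaultdict

# Hypothetical genotypes: for each sample, the two allele sizes (in base
# pairs) observed at each of two microsatellite loci.
samples = {
    "sample 5": [(150, 154), (201, 205)],
    "sample 7": [(154, 150), (201, 209)],
    "sample 8": [(150, 154), (201, 205)],
}


def group_individuals(samples, loci=None):
    """Group samples whose genotypes match at the first `loci` loci
    (all loci when `loci` is None). Each group is one putative individual."""
    groups = defaultdict(list)
    for name, genotype in samples.items():
        used = genotype if loci is None else genotype[:loci]
        # Sort alleles within a locus so (150, 154) equals (154, 150).
        key = tuple(tuple(sorted(pair)) for pair in used)
        groups[key].append(name)
    return sorted(groups.values())


print("One locus: ", group_individuals(samples, loci=1))
print("Two loci:  ", group_individuals(samples))
```

With one locus all three samples look alike. Adding the second locus separates sample 7 from the other two, much as additional loci separated lynx samples 5 and 7 in the gel described earlier.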