The experiment described below explored the same concepts as the one described in Figure 13.1 in the textbook. Read the description of the experiment and answer the questions below the description to practice interpreting data and understanding experimental design.
Mirror Experiment activities practice skills described in the brief Experiment and Data Analysis Primers, which can be found by clicking on the “Resources” button on the upper right of your LaunchPad homepage. Certain questions in this activity draw on concepts described in the Statistics primer. Click on the “Key Terms” buttons to see definitions of terms used in the question, and click on the “Primer Section” button to pull up a relevant section from the primer.
As you have learned, researchers are able to sequence an organism’s genome by digesting or breaking it apart into smaller pieces, sequencing the resulting small genomic fragments, and then aligning these fragments based on sequence overlaps. Although it can be remarkably informative to look at the whole genome sequence of a single organism, researchers can also gain a great deal of information when they compare the whole genome sequences of two different species. They can identify mutations in the genetic sequences that are unique to one species or the other, and can isolate mutations that might be responsible for the different phenotypes observed in these organisms. How different are the sequences of the human and chimpanzee genomes? And if any differences exist between the whole genome sequences of humans and chimpanzees, can these differences help us understand how our species evolved?
The Chimpanzee Sequencing and Analysis Consortium (CSAC) – a group of over 60 researchers at different institutions – hypothesized that by comparing the whole genome sequences of humans and chimpanzees, mutations that were unique to humans or to chimpanzees could be identified. They could effectively determine how different the chimpanzee and human genomes were from one another.
Researchers isolated genetic material from the white blood cells of a chimpanzee. Using an approach similar to that outlined in Fig. 13.1, scientists constructed a preliminary whole genome sequence for this great ape. They compared the chimpanzee and human genomes, and counted the number of nucleotides that differed between these two sequences (Figure 1).
Photo credit: Anup Shah/Animals Animals-Earth Scenes
As you will learn in later chapters, various types of mutations can occur in DNA sequences. Some of these mutations are a change in a single nucleotide, and are referred to as point mutations. Other mutations can involve larger chromosomal segments. A portion of a chromosome can be deleted and effectively removed from an organism’s genome. Chromosome segments can also be duplicated, resulting in extra material within the genome. Often, chromosomal mutations can affect thousands of nucleotides at a time.
The CSAC determined that, as a result of the combined action of point mutations and chromosomal insertions and deletions, the human and chimpanzee genomes differ in sequence by approximately 4%. Some of these differences between the human and chimpanzee genomes can be attributed to normal allelic differences between individuals in the same species. However, 1-2% of these differences are actually lineage-specific, meaning that they are not the result of normal variation within species, but true sequence differences that are unique to either the chimpanzee or human. Researchers were also able to identify certain chromosomal regions where mutations were more likely to occur, predict the rate at which certain genes diverged or evolved, and isolate a handful of mutations that could contribute to distinct human and chimpanzee phenotypes.
The Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 437, 69-87.
As you learned in Fig. 13.1, to sequence a whole genome, researchers typically break the genome into small fragments, sequence them, and then reassemble these fragments based on sequence overlaps. Genomic fragments with overlapping sequences can be joined into a single continuous stretch of sequence known as a contig. The CSAC relied on contigs to sequence the chimpanzee genome.
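The idea of joining fragments by their overlaps can be sketched in a few lines of Python. This is only an illustration, not the assembly method the CSAC used: the function name `merge_by_overlap`, the `min_overlap` threshold, and the two short fragments are all invented for the example, and real assemblers must also handle sequencing errors, repeats, and millions of fragments.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two fragments when a suffix of `left` matches a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases exists. `min_overlap` must be at least 1.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest possible overlap first, then progressively shorter ones.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Hypothetical fragments, invented for illustration.
fragment_a = "ATGCGTAC"
fragment_b = "GTACCTGA"
print(merge_by_overlap(fragment_a, fragment_b))  # ATGCGTACCTGA
```

Here the last four bases of the first fragment (GTAC) match the first four bases of the second, so the two are merged into one 12-base contig.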
The sequences of human chromosomes 1, 2, 3, 4, etc. are very close to those of chimpanzee chromosomes 1, 2, 3, 4, etc. Researchers of the CSAC compared the sequences of corresponding human and chimpanzee chromosomes and counted the number of base pairs that differed between them. In this manner, scientists were able to determine the extent to which human and chimpanzee chromosomes differed – or diverged – from one another. If analogous human and chimpanzee chromosomes demonstrate a divergence value of 0.010, this means that the sequences of the chromosomes in these two species differ by about 1%. The higher the divergence value, the more nucleotide differences exist between the sequences.
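As a rough illustration of how a divergence value can be computed, the sketch below counts mismatched positions between two sequences and divides by the sequence length. It assumes the two sequences are already aligned and of equal length, and it counts only single-nucleotide differences, not insertions or deletions. The function name `divergence` and the toy sequences are invented for the example; the text does not describe the CSAC's actual pipeline.

```python
def divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Toy data: 100 bases that differ at exactly one position.
human_like = "ACGT" * 25
chimp_like = "T" + human_like[1:]
print(divergence(human_like, chimp_like))  # 0.01, i.e. about 1% divergence
```

One difference in 100 aligned bases gives a divergence value of 0.010, matching the 1% example in the paragraph above.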
When comparing the sequences of human and chimpanzee chromosomes, the CSAC evaluated the divergence value for entire chromosomes, and also for different chromosomal regions. In fact, scientists noted that certain chromosomal regions are more likely to experience mutations and diverge in both humans and chimpanzees. These researchers collected the following data for human and chimpanzee chromosome 1 (this is a rendering of the actual divergence graph depicted in Figure 2 of the paper); a regression line, which represents the overall divergence trend for all areas of chromosome 1, has been superimposed over these data.
Key term: Regression line. A line drawn on a scatterplot that depicts how, on average, the variable y changes as a function of the variable x across the whole set of data.
Correlation and Regression
Biologists are often also interested in the relation between two different measurements, such as height and weight, or the number of species on an island and the size of the island. Such data are often depicted as a scatter plot (Figure 5), in which the magnitude of one variable is plotted along the x-axis and the other along the y-axis, each point representing one paired observation.
Figure 5A shows the sort of data that would correspond to fingerprint ridge count (the number of raised skin ridges lying between two reference points in each fingerprint). While the data show some scatter, the overall trend is evident: there is a very strong association between the average fingerprint ridge count of parents and that of their offspring. The strength of association between two variables can be measured by the correlation coefficient, which theoretically ranges between +1 and –1. A correlation coefficient of +1 means a perfect positive relation (as one variable increases, the other increases proportionally), and a correlation coefficient of –1 implies a perfect negative relation (as one variable increases, the other decreases proportionally). Correlation coefficients of +1 or –1 are rarely observed in real data. In the case of fingerprint ridge count, the correlation coefficient is 0.9, which implies that the average fingerprint ridge count of offspring is almost (but not quite) equal to that of the parents. For a complex trait, this is a remarkably strong correlation.
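For readers who want to see how a correlation coefficient is calculated, here is a minimal Python sketch of the standard Pearson formula. The function name `pearson_r` and the small data sets are invented for illustration; they are not the fingerprint data from Figure 5.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up values: y rises (or falls) in exact proportion to x.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0  (perfect positive relation)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0 (perfect negative relation)
```

Real data with scatter, like those in Figure 5, give values strictly between these two extremes.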
Figure 5B represents data that would correspond to adult height. The data exhibit greater scatter than in Figure 5A; however, there is still a fairly strong resemblance between parents and offspring. The correlation coefficient in this case is 0.5. This value means that, on average, the offspring height is approximately halfway between the average of the parents and the average of the population as a whole.
The illustrations in Figure 5A and 5B also emphasize one limitation of the correlation coefficient. The correlation coefficient measures the strength of a straight-line (linear) relation. A nonlinear relation (one curving upward or downward) between two variables could be quite strong, but the data might still show a weak correlation.
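A small made-up example shows how extreme this limitation can be. In the sketch below, y is determined exactly by x (y equals x squared), yet the correlation coefficient comes out as zero because the relation is a symmetric curve rather than a straight line. The helper `pearson_r` and the data are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# y depends perfectly on x, but along a curve instead of a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))  # 0.0
```

Plotting the points would make the strong relation obvious, which is one reason to look at the scatter plot and not only at the coefficient.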
Each of the straight lines in Figure 5 is a regression line or, more precisely, a regression line of y on x. Each line depicts how, on average, the variable y changes as a function of the variable x across the whole set of data. The slope of the line tells you how many units y changes, on average, for a unit change in x. A slope of +1 implies that a one-unit change in x results in a one-unit change in y, and a slope of 0 implies that the value of x has no effect on the value of y. The slope of a straight line relating values of y to those of x is known as the regression coefficient.
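The slope and intercept of a regression line of y on x can be computed directly from the data. The sketch below assumes the line is fitted by ordinary least squares, which is the usual method but is not stated explicitly in the text; the function name `regression_line` and the data points are invented for the example.

```python
def regression_line(xs, ys):
    """Least-squares regression of y on x; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # the regression coefficient
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


# Made-up points that lie exactly on the line y = 2x + 1.
slope, intercept = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

A slope of 2 here means that y changes by two units, on average, for each one-unit change in x.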