A mutation is a change that occurs in our DNA sequence, either due to mistakes when the DNA is copied or as the result of environmental factors such as UV light and cigarette smoke.
A gene variant is a permanent change in the DNA sequence that makes up a gene. This type of genetic change used to be known as a gene mutation, but because changes in DNA do not always cause disease, it is thought that gene variant is a more accurate term. Variants can affect one or more DNA building blocks (nucleotides) in a gene.
Gene variants can be inherited from a parent or occur during a person’s lifetime:
Some genetic changes are described as new (de novo) variants; these variants are recognized in a child but not in either parent. In some cases, the variant occurs in a parent’s egg or sperm cell but is not present in any of their other cells. In other cases, the variant occurs in the fertilized egg shortly after the egg and sperm cells unite. (It is often impossible to tell exactly when a de novo variant happened.) As the fertilized egg divides, each resulting cell in the growing embryo will have the variant. De novo variants are one explanation for genetic disorders in which an affected child has a variant in every cell in the body, but the parents do not, and there is no family history of the disorder.
Variants acquired during development can lead to a situation called mosaicism, in which a set of cells in the body has a different genetic makeup than others. In mosaicism, the genetic change is not present in a parent’s egg or sperm cells, or in the fertilized egg, but happens later, anytime from embryonic development through adulthood. As cells grow and divide, cells that arise from the cell with the altered gene will have the variant, while other cells will not. When a proportion of somatic cells have a gene variant and others do not, it is called somatic mosaicism. Depending on the variant and how many cells are affected, somatic mosaicism may or may not cause health problems. When a proportion of egg or sperm cells have a variant and others do not, it is called germline mosaicism. In this situation, an unaffected parent can pass a genetic condition to their child.
Most variants do not lead to development of disease, and those that do are uncommon in the general population. Some variants occur often enough in the population to be considered common genetic variation. Several such variants are responsible for differences between people such as eye color, hair color, and blood type. Although many of these common variations in the DNA have no negative effects on a person’s health, some may influence the risk of developing certain disorders.
Genomes are dynamic entities that change over time as a result of the cumulative effects of small-scale sequence alterations caused by mutation and larger scale rearrangements arising from recombination. Mutation and recombination can both be defined as processes that result in changes to a genome, but they are unrelated and we must make a clear distinction between them:
Both mutation and recombination can have dramatic effects on the cell in which they occur. A mutation in a key gene may cause the cell to die if the protein coded by the mutant gene is defective (Section 14.1.2), and some recombination events lead to defining changes in the biochemical capabilities of the cell, for example by determining the mating type of a yeast cell or the immunological properties of a mammalian B or T lymphocyte. Other mutation and recombination events have a less significant impact on the phenotype of the cell and many have none at all. As we will see in Chapter 15, all events that are not lethal have the potential to contribute to the evolution of the genome but for this to happen they must be inherited when the organism reproduces. With a single-celled organism such as a bacterium or yeast, all genome alterations that are not lethal or reversible are inherited by daughter cells and become permanent features of the lineage that descends from the original cell in which the alteration occurred. In a multicellular organism, only those events that occur in germ cells are relevant to genome evolution. Changes to the genomes of somatic cells are unimportant in an evolutionary sense, but they will have biological relevance if they result in a deleterious phenotype that affects the health of the organism.
With mutations, the issues that we have to consider are: how they arise; the effects they have on the genome and on the organism in which the genome resides; whether it is possible for a cell to increase its mutation rate and induce programmed mutations under certain circumstances; and how mutations are repaired.
Mutations arise in two ways:
When considered purely as a chemical reaction, complementary base-pairing is not particularly accurate. Nobody has yet devised a way of carrying out the template-dependent synthesis of DNA without the aid of enzymes, but if the process could be carried out simply as a chemical reaction in a test tube then the resulting polynucleotide would probably have point mutations at 5–10 positions out of every hundred. This represents an error rate of 5–10%, which would be completely unacceptable during genome replication. The template-dependent DNA polymerases that carry out DNA replication must therefore increase the accuracy of the process by several orders of magnitude. This improvement is brought about in two ways:
Escherichia coli is able to synthesize DNA with an error rate of only 1 per 10⁷ nucleotide additions. Interestingly, these errors are not evenly distributed between the two daughter molecules, the product of lagging-strand replication being prone to about 20 times as many errors as the leading-strand replicate. This asymmetry might indicate that DNA polymerase I, which is involved only in lagging-strand replication (Section 13.2.2), has a less effective base selection and proofreading capability compared with DNA polymerase III, the main replicating enzyme (Francino and Ochman, 1997).
Not all of the errors that occur during DNA synthesis can be blamed on the polymerase enzymes: sometimes an error occurs even though the enzyme adds the ‘correct’ nucleotide, the one that base-pairs with the template. This is because each nucleotide base can occur as either of two alternative tautomers, structural isomers that are in dynamic equilibrium. For example, thymine exists as two tautomers, the keto and enol forms, with individual molecules occasionally undergoing a shift from one tautomer to the other. The equilibrium is biased very much towards the keto form but every now and then the enol version of thymine occurs in the template DNA at the precise time that the replication fork is moving past. This will lead to an ‘error’, because enol-thymine base-pairs with G rather than A (Figure 14.4). The same problem can occur with adenine, the rare imino tautomer of this base preferentially forming a pair with C, and with guanine, enol-guanine pairing with thymine. After replication, the rare tautomer will inevitably revert to its more common form, leading to a mismatch in the daughter double helix.
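The pairing rules described above can be written as a small lookup table. This is a toy sketch rather than a model of the chemistry: the function names are hypothetical, and only the three rare tautomers named in the text (enol-thymine, imino-adenine and enol-guanine) are included.

```python
# Toy model: which base is incorporated opposite a template base,
# depending on the tautomer present when the replication fork passes.
COMMON_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Rare-tautomer pairings named in the text (cytosine is not covered there,
# so it is deliberately absent from this table).
RARE_PAIR = {
    "T": "G",  # enol-thymine pairs with G
    "A": "C",  # imino-adenine pairs with C
    "G": "T",  # enol-guanine pairs with T
}

def copy_base(template_base, rare_tautomer=False):
    """Return the base added opposite template_base."""
    if rare_tautomer:
        return RARE_PAIR[template_base]
    return COMMON_PAIR[template_base]

def is_mismatch(template_base, new_base):
    """After the rare tautomer reverts to its common form, is the pair a mismatch?"""
    return COMMON_PAIR[template_base] != new_base

new_base = copy_base("T", rare_tautomer=True)
print(new_base, is_mismatch("T", new_base))
```

The polymerase has added the nucleotide that pairs with the template as it was at that instant. The mismatch appears only once the template base returns to its usual form.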
As stated above, the error rate for DNA synthesis in E. coli is 1 in 10⁷. The overall error rate for replication of the E. coli genome is only 1 in 10¹⁰ to 1 in 10¹¹, the improvement compared with the polymerase error rate being the result of the mismatch repair system (Section 14.2.3) that scans newly replicated DNA for positions where the bases are unpaired and hence corrects the few mistakes that the replication enzymes make. The implication is that only one uncorrected replication error occurs every 1000 times that the E. coli genome is copied.
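As a rough order-of-magnitude check, the rates quoted above (1 in 10^7 for the polymerase, 1 in 10^10 to 1 in 10^11 overall) can be converted into expected errors per round of genome replication. The genome size used here, about 4.6 million base pairs, is an assumption that the text does not state, and the function names are hypothetical.

```python
# Assumed approximate size of the E. coli genome in base pairs (not from the text).
GENOME_SIZE_BP = 4.6e6

def expected_errors_per_replication(error_rate, genome_size=GENOME_SIZE_BP):
    """Expected number of errors when the genome is copied once."""
    return error_rate * genome_size

def replications_per_error(error_rate, genome_size=GENOME_SIZE_BP):
    """Average number of genome copies made per uncorrected error."""
    return 1.0 / expected_errors_per_replication(error_rate, genome_size)

# Polymerase alone: a sizeable fraction of replications would carry an error.
print(expected_errors_per_replication(1e-7))

# With mismatch repair: errors become rare events.
for rate in (1e-10, 1e-11):
    print(rate, round(replications_per_error(rate)))
```

Under this assumed genome size, an uncorrected error is expected once in several thousand to tens of thousands of replications. That is of the same rough order as the 'one in 1000' figure given above, or somewhat rarer.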
Not all errors in replication are point mutations. Aberrant replication can also result in small numbers of extra nucleotides being inserted into the polynucleotide being synthesized, or some nucleotides in the template not being copied. Insertions and deletions are often called frameshift mutations because when one occurs within a coding region it can result in a shift in the reading frame used for translation of the protein specified by the gene (see Figure 14.12). However, it is inaccurate to use ‘frameshift’ to describe all insertions and deletions because they can occur anywhere, not just in genes, and not all insertions or deletions in coding regions result in frameshifts: an insertion or deletion of three nucleotides, or multiples of three, simply adds or removes codons or parts of adjacent codons without affecting the reading frame.
Insertion and deletion mutations can affect all parts of the genome but are particularly prevalent when the template DNA contains short repeated sequences, such as those found in microsatellites (Section 2.4.1). This is because repeated sequences can induce replication slippage, in which the template strand and its copy shift their relative positions so that part of the template is either copied twice or missed out. The result is that the new polynucleotide has a larger or smaller number, respectively, of the repeat units (Figure 14.5). This is the main reason why microsatellite sequences are so variable, replication slippage occasionally generating a new length variant, adding to the collection of alleles already present in the population.
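The outcome of a single slippage event can be sketched as a change of one repeat unit in the newly synthesized strand. This is a simplified illustration: the function name and the `direction` parameter are hypothetical, and real slippage can involve more than one unit.

```python
def slip(repeat_unit, copies, direction):
    """Return the repeat tract of the new strand after one slippage event.

    direction = +1: part of the template is copied twice (one extra unit)
    direction = -1: part of the template is missed out (one unit fewer)
    """
    if direction not in (+1, -1):
        raise ValueError("direction must be +1 or -1")
    new_copies = max(copies + direction, 0)
    return repeat_unit * new_copies

# A hypothetical microsatellite of five CA repeats.
longer = slip("CA", 5, +1)
shorter = slip("CA", 5, -1)
print(len(longer) // 2, len(shorter) // 2)
```

Each event yields a new length variant, which is how a population comes to carry many alleles at a microsatellite locus.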
Replication slippage is probably also responsible for the trinucleotide repeat expansion diseases that have been discovered in humans in recent years (Ashley and Warren, 1995). Each of these neurodegenerative diseases is caused by a relatively short series of trinucleotide repeats becoming elongated to two or more times its normal length. For example, the human HD gene contains the sequence 5′-CAG-3′ repeated between 6 and 35 times in tandem, coding for a series of glutamines in the protein product. In Huntington's disease this repeat expands to a copy number of 36–121, increasing the length of the polyglutamine tract and resulting in a dysfunctional protein (Perutz, 1999). Several other human diseases are also caused by expansions of polyglutamine codons (Table 14.1). Some diseases associated with mental retardation result from trinucleotide expansions in the leader region of a gene, giving a fragile site, a position where the chromosome is likely to break (Sutherland et al., 1998). Expansions involving intron and trailer regions are also known.
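The copy-number ranges quoted above for the HD gene can be applied to a sequence by counting its longest uninterrupted run of CAG units. This is a minimal sketch for illustration only, not a diagnostic method. The function names and the example sequences are hypothetical.

```python
import re

def longest_cag_run(seq):
    """Length, in repeat units, of the longest uninterrupted CAG tract."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_hd_repeat(n):
    """Classify a CAG copy number using the ranges quoted in the text."""
    if 6 <= n <= 35:
        return "normal range"
    if 36 <= n <= 121:
        return "expanded (Huntington's disease range)"
    return "outside the ranges quoted"

# Hypothetical sequences with 20 and 40 tandem CAG units.
for copies in (20, 40):
    seq = "ATG" + "CAG" * copies + "CCG"
    n = longest_cag_run(seq)
    print(n, classify_hd_repeat(n))
```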
How triplet expansions are generated is not precisely understood. The size of the insertion is much greater than occurs with normal replication slippage, such as that seen with microsatellite sequences, and once the expansion reaches a certain length it appears to become susceptible to further expansion in subsequent rounds of replication, so that the disease becomes increasingly severe in succeeding generations. The possibility that expansion involves formation of hairpin loops in the DNA has been raised, based on the observation that only a limited number of trinucleotide sequences are known to undergo expansion, and all of these sequences are GC-rich and so might form stable secondary structures. There is also evidence that at least one triplet expansion region - for Friedreich's ataxia - can form a triple helix structure (Gacy et al., 1998). Studies of similar triplet expansions in yeast have shown that these are more prevalent when the RAD27 gene is inactivated (Freudenreich et al., 1998), an interesting observation as RAD27 is the yeast version of the mammalian gene for FEN1, the protein involved in processing of Okazaki fragments (Section 13.2.2). This might indicate that a trinucleotide repeat expansion is caused by an aberration in lagging-strand synthesis.
Many chemicals that occur naturally in the environment have mutagenic properties and these have been supplemented in recent years with other chemical mutagens that result from human industrial activity. Physical agents such as radiation are also mutagenic. Most organisms are exposed to greater or lesser amounts of these various mutagens, their genomes suffering damage as a result.
The definition of the term ‘mutagen’ is a chemical or physical agent that causes mutations. This definition is important because it distinguishes mutagens from other types of environmental agent that cause damage to cells in ways other than by causing mutations (Table 14.2). There are overlaps between these categories (for example, some mutagens are also carcinogens) but each type of agent has a distinct biological effect. The definition of ‘mutagen’ also makes a distinction between true mutagens and other agents that damage DNA without causing mutations, for example by causing breaks in DNA molecules. This type of damage may block replication and cause the cell to die, but it is not a mutation in the strict sense of the term and the causative agents are therefore not mutagens.
Mutagens cause mutations in three different ways:
The range of mutagens is so vast that it is difficult to devise an all-embracing classification. We will therefore restrict our study to the most common types. For chemical mutagens these are as follows:
The most important types of physical mutagen are as follows:
When considering the effects of mutations we must make a distinction between the direct effect that a mutation has on the functioning of a genome and its indirect effect on the phenotype of the organism in which it occurs. The direct effect is relatively easy to assess because we can use our understanding of gene structure and expression to predict the impact that a mutation will have on genome function. The indirect effects are more complex because these relate to the phenotype of the mutated organism which, as described in Section 7.2.2, is often difficult to correlate with the activities of individual genes.
Many mutations result in nucleotide sequence changes that have no effect on the functioning of the genome. These silent mutations include virtually all of those that occur in intergenic DNA and in the non-coding components of genes and gene-related sequences. In other words, some 98.5% of the human genome (see Box 1.4) can be mutated without significant effect.
Mutations in the coding regions of genes are much more important. First, we will look at point mutations that change the sequence of a triplet codon. A mutation of this type will have one of four effects (Figure 14.11):
Deletion and insertion mutations also have distinct effects on the coding capabilities of genes (Figure 14.12). If the number of deleted or inserted nucleotides is three or a multiple of three then one or more codons are removed or added, the resulting loss or gain of amino acids having varying effects on the function of the encoded protein. Deletions or insertions of this type are often inconsequential but will have an impact if, for example, amino acids involved in an enzyme's active site are lost, or if an insertion disrupts an important secondary structure in the protein. On the other hand, if the number of deleted or inserted nucleotides is not three or a multiple of three then a frameshift results, all of the codons downstream of the mutation being taken from a different reading frame from that used in the unmutated gene. This usually has a significant effect on the protein function, because a greater or lesser part of the mutated polypeptide has a completely different sequence to the normal polypeptide.
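The difference between the two kinds of deletion can be seen by splitting a sequence into codons before and after the change. The sketch below uses a hypothetical seven-codon sequence, and the helper names are illustrative.

```python
def codons(seq):
    """Split a coding sequence into complete triplets, reading from position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def causes_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of three."""
    return indel_length % 3 != 0

def delete(seq, start, length):
    """Remove `length` nucleotides beginning at index `start`."""
    return seq[:start] + seq[start + length:]

gene = "ATGGCATTCGGAAAACTGTAA"  # hypothetical: ATG GCA TTC GGA AAA CTG TAA

# Three-nucleotide deletion: one codon is lost, the rest are unchanged.
print(causes_frameshift(3), codons(delete(gene, 3, 3)))

# One-nucleotide deletion: every downstream codon is read from a new frame.
print(causes_frameshift(1), codons(delete(gene, 3, 1)))
```

In the second case the original termination codon is no longer read in frame, so the mutated polypeptide could also differ in length from the normal one.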
It is less easy to make generalizations about the effects of mutations that occur outside of the coding regions of the genome. Any protein binding site is susceptible to point, insertion or deletion mutations that change the identity or relative positioning of nucleotides involved in the DNA-protein interaction. These mutations therefore have the potential to inactivate promoters or regulatory sequences, with predictable consequences for gene expression (Figure 14.13; Sections 9.2 and 9.3). Origins of replication could conceivably be made non-functional by mutations that change, delete or disrupt sequences recognized by the relevant binding proteins (Section 13.2.1) but these possibilities are not well documented. There is also little information about the potential impact on gene expression of mutations that affect nucleosome positioning (Section 8.2.1).
One area that has been better researched concerns mutations that occur in introns or at intron-exon boundaries. In these regions, single point mutations will be important if they change nucleotides involved in the RNA-protein and RNA-RNA interactions that occur during splicing of different types of intron (Sections 10.1.3 and 10.2.3). For example, mutation of either the G or T in the DNA copy of the 5′ splice site of a GU-AG intron, or of the A or G at the 3′ splice site, will disrupt splicing because the correct intron-exon boundary will no longer be recognized. This may mean that the intron is not removed from the pre-mRNA, but it is more likely that a cryptic splice site (see page 289) will be used as an alternative. It is also possible for a mutation within an intron or an exon to create a new cryptic site that is preferred over a genuine splice site that is not itself mutated. Both types of event have the same result: relocation of the active splice site, leading to aberrant splicing. This might delete part of the resulting protein, add a new stretch of amino acids, or lead to a frameshift. Several versions of the blood disease β-thalassemia are caused by mutations that lead to cryptic splice site selection during processing of β-globin transcripts.
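The dinucleotides named above can be checked in the DNA copy of an intron with a short function. This sketch looks only at the terminal GT and AG. Actual splice-site recognition depends on additional sequence features, so passing this check does not mean an intron will be spliced correctly. The function name and example intron are hypothetical.

```python
def splice_sites_intact(intron_dna):
    """True if the DNA copy of the intron begins with GT and ends with AG."""
    s = intron_dna.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")

intron = "GTAAGTCTTTTCAG"        # hypothetical GU-AG intron, written as DNA
mutant_5prime = "A" + intron[1:]  # G -> A at the 5' splice site
mutant_3prime = intron[:-1] + "C" # G -> C at the 3' splice site

for name, seq in [("wild type", intron),
                  ("5' mutant", mutant_5prime),
                  ("3' mutant", mutant_3prime)]:
    print(name, splice_sites_intact(seq))
```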
Now we turn to the indirect effects that mutations have on organisms, beginning with multicellular diploid eukaryotes such as humans. The first issue to consider is the relative importance of the same mutation in a somatic cell compared with a germ cell. Because somatic cells do not pass copies of their genomes to the next generation, a somatic cell mutation is important only for the organism in which it occurs: it has no potential evolutionary impact. In fact, most somatic cell mutations have no significant effect, even if they result in cell death, because there are many other identical cells in the same tissue and the loss of one cell is immaterial. An exception is when a mutation causes a somatic cell to malfunction in a way that is harmful to the organism, for instance by inducing tumor formation or other cancerous activity.
Mutations in germ cells are more important because they can be transmitted to members of the next generation and will then be present in all the cells of any individual who inherits the mutation. Most mutations, including all silent ones and many in coding regions, will still not change the phenotype of the organism in any significant way. Those that do have an effect can be divided into two categories:
Assessing the effects of mutations on the phenotypes of multicellular organisms can be difficult. Not all mutations have an immediate impact: some are delayed onset and only confer an altered phenotype later in the individual's life. Others display non-penetrance in some individuals, never being expressed even though the individual has a dominant mutation or is a homozygous recessive. With humans, these factors complicate attempts to map disease-causing mutations by pedigree analysis (Section 5.2.4) because they introduce uncertainty about which members of a pedigree carry a mutant allele.
Mutations in microbes such as bacteria and yeast can also be described as loss-of-function or gain-of-function, but with microorganisms this is neither the normal nor the most useful classification scheme. Instead, a more detailed description of the phenotype is usually attempted on the basis of the growth properties of mutated cells in various culture media. This enables most mutations to be assigned to one of four categories:
In addition to these four categories, many mutations are lethal and so result in death of the mutant cell, whereas others have no effect. The latter are less common in microorganisms than in higher eukaryotes, because most microbial genomes are relatively compact, with little non-coding DNA. Mutations can also be leaky, meaning that a less extreme form of the mutant phenotype is expressed. For example, a leaky version of the tryptophan auxotroph illustrated in Figure 14.15 would grow slowly on minimal medium, rather than not growing at all.
Is it possible for cells to utilize mutations in a positive fashion, either by increasing the rate at which mutations appear in their genomes, or by directing mutations towards specific genes? Both types of event might appear, at first glance, to go against the accepted wisdom that mutations occur randomly but, as we shall see, hypermutation and programmed mutations are possible without contravening this dogma.
Hypermutation occurs when a cell allows the rate at which mutations occur in its genome to increase. Several examples of hypermutation are known, one of these forming part of the mechanism used by vertebrates, including humans, to generate a diverse array of immunoglobulin proteins. We have already touched on this phenomenon in Section 12.2.1 when we examined the genome rearrangements that result in joining of the V, D, J and H segments of the immunoglobulin heavy- and light-chain genes (see Figure 12.15). Additional diversity is produced by hypermutation of the V-gene segments after assembly of the intact immunoglobulin gene (Figure 14.17), the mutation rate for these segments being 6–7 orders of magnitude greater than the background mutation rate experienced by the rest of the genome (Shannon and Weigert, 1998). This enhanced mutation rate appears to result from the unusual behavior of the mismatch repair system which normally corrects replication errors. At all other positions within the genome, the mismatch repair system corrects errors of replication by searching for mismatches and replacing the nucleotide in the daughter strand, this being the strand that has just been synthesized and so contains the error (see Section 14.2.3). At V-gene segments, the repair system changes the nucleotide in the parent strand, and so stabilizes the mutation rather than correcting it (Cascalho et al., 1998). The mechanism by which this is achieved has not yet been described.
An apparent increase in mutation rate arising from modifications to the normal DNA repair process does not contradict the dogma regarding the randomness of mutations. However, problems have arisen with reports, dating back to 1988 (Cairns et al., 1988), which suggested that E. coli is able to direct mutations towards genes whose mutation would be advantageous under the environmental conditions that the bacterium is encountering. The original experiments involved a strain of E. coli that has a nonsense mutation in the lactose operon, inactivating the proteins needed for utilization of this sugar (Research Briefing 14.1). The bacteria were spread on an agar medium in which the only carbon source was lactose. This meant that a cell could grow and divide only if a second mutation occurred in the lactose operon, reversing the effects of the nonsense mutation and therefore allowing the lactose enzymes to be synthesized. Mutations with this effect appeared to occur significantly more frequently than expected, and at a rate that was greater than mutations in other parts of the genomes of these E. coli cells.
These experiments suggested that bacteria can program mutations according to the selective pressures that they are placed under. In other words, the environment can directly affect the phenotype of the organism, as suggested by Lamarck, rather than operating through the random processes postulated by Darwin. With such radical implications, it is not surprising that the experiments have been debated at length, with numerous attempts to discover flaws in their design or alternative explanations for the results. Variations of the original experimental system have suggested that the results are authentic, and similar events in other bacteria have been described. Models based on gene amplification rather than selective mutation are being tested (Andersson et al., 1998), and attention has also been directed at the possible roles of recombination events such as transposition of insertion elements in the generation of programmed mutations (Foster, 1999).
In view of the thousands of damage events that genomes suffer every day, coupled with the errors that occur when the genome replicates, it is essential that cells possess efficient repair systems. Without these repair systems a genome would not be able to maintain its essential cellular functions for more than a few hours before key genes became inactivated by DNA damage. Similarly, cell lineages would accumulate replication errors at such a rate that their genomes would become dysfunctional after a few cell divisions.
Most cells possess four different categories of DNA repair system (Figure 14.18; Lindahl and Wood, 1999):
Most if not all organisms also possess systems that enable them to replicate damaged regions of their genome without prior repair. We will examine these systems in Section 14.2.5, and in Section 14.2.6 we will survey the human diseases that result from defects in DNA repair processes.
Most of the types of DNA damage that are caused by chemical or physical mutagens (Section 14.1.1) can only be repaired by excision of the damaged nucleotide followed by resynthesis of a new stretch of DNA, as shown in Figure 14.18B. Only a few types of damaged nucleotide can be repaired directly:
The direct types of damage reversal described above are important, but they form a very minor component of the DNA repair mechanisms of most organisms. This point is illustrated by the draft human genome sequences, which appear to contain just a single gene coding for a protein involved in direct repair (the MGMT gene), but which have at least 40 genes for components of the excision repair pathways (Wood et al., 2001). These pathways fall into two categories:
We will examine each of these pathways in turn.
Base excision is the least complex of the various repair systems that involve removal of one or more damaged nucleotides followed by resynthesis of DNA to span the resulting gap. It is used to repair many modified nucleotides whose bases have suffered relatively minor damage resulting from, for example, exposure to alkylating agents or ionizing radiation (Section 14.1.1). The process is initiated by a DNA glycosylase which cleaves the β-N-glycosidic bond between a damaged base and the sugar component of the nucleotide (Figure 14.20A). Each DNA glycosylase has a limited specificity (Table 14.3), the specificities of the glycosylases possessed by a cell determining the range of damaged nucleotides that can be repaired by the base excision pathway. Most organisms are able to deal with deaminated bases such as uracil (deaminated cytosine) and hypoxanthine (deaminated adenine), oxidation products such as 5-hydroxycytosine and thymine glycol, and methylated bases such as 3-methyladenine, 7-methylguanine and 2-methylcytosine (Seeberg et al., 1995). Other DNA glycosylases remove normal bases as part of the mismatch repair system (Section 14.2.3). Most of the DNA glycosylases involved in base excision repair are thought to diffuse along the minor groove of the DNA double helix in search of damaged nucleotides, but some may be associated with the replication enzymes.
A DNA glycosylase removes a damaged base by ‘flipping’ the structure to a position outside of the helix and then detaching it from the polynucleotide (Kunkel and Wilson, 1996; Roberts and Cheng, 1998). This creates an AP or baseless site (see Figure 14.10) which is converted into a single nucleotide gap in the second step of the repair pathway (Figure 14.20B). This step can be carried out in a variety of ways. The standard method makes use of an AP endonuclease, such as exonuclease III or endonuclease IV of E. coli or human APE1, which cuts the phosphodiester bond on the 5′ side of the AP site. Some AP endonucleases can also remove the sugar from the AP site, this being all that remains of the damaged nucleotide, but others lack this ability and so work in conjunction with a separate phosphodiesterase. An alternative pathway for converting the AP site into a gap utilizes the endonuclease activity possessed by some DNA glycosylases, which can make a cut at the 3′ side of the AP site, probably at the same time that the damaged base is removed, followed again by removal of the sugar by a phosphodiesterase.
The single nucleotide gap is filled by a DNA polymerase, using base-pairing with the undamaged base in the other strand of the DNA molecule to ensure that the correct nucleotide is inserted. In E. coli the gap is filled by DNA polymerase I and in mammals by DNA polymerase β (see Table 13.2; Sobol et al., 1996). Yeast seems to be unusual in that it uses its main DNA replicating enzyme, DNA polymerase δ, for this purpose (Seeberg et al., 1995). After gap filling, the final phosphodiester bond is put in place by a DNA ligase.
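The logic of the pathway (remove the damaged base, open a single-nucleotide gap, fill it by pairing with the other strand, then seal) can be sketched on strings. This is a deliberately simplified illustration with several assumptions. The two strands are written so that position i of one pairs with position i of the other. Damage is represented by uracil in DNA. The function name and the `is_damaged` parameter are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def base_excision_repair(damaged_strand, other_strand,
                         is_damaged=lambda base: base == "U"):
    """Repair damaged bases one at a time using the undamaged partner strand."""
    strand = list(damaged_strand)
    for i, base in enumerate(strand):
        if is_damaged(base):
            # Steps 1 and 2: the glycosylase removes the base, then the AP site
            # is converted into a single-nucleotide gap (modeled here as None).
            strand[i] = None
            # Step 3: the polymerase fills the gap by pairing with the other strand.
            strand[i] = COMPLEMENT[other_strand[i]]
            # Step 4: the ligase seals the backbone (nothing to model in a string).
    return "".join(strand)

# Uracil (deaminated cytosine) sits opposite G, so the repaired base is C.
damaged = "ATGUGT"
partner = "TACGCA"  # aligned position by position with `damaged`
print(base_excision_repair(damaged, partner))
```

The repaired base is chosen from the partner strand alone. This is why the pathway works only when the damage is confined to one strand.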
Nucleotide excision repair has a much broader specificity than the base excision system and is able to deal with more extreme forms of damage such as intrastrand crosslinks and bases that have become modified by attachment of large chemical groups. It is also able to correct cyclobutyl dimers by a dark repair process, providing those organisms that do not have the photoreactivation system (such as humans) with a means of repairing this type of damage.
In nucleotide excision repair, a segment of single-stranded DNA containing the damaged nucleotide(s) is excised and replaced with new DNA. The process is therefore similar to base excision repair except that it is not preceded by selective base removal, and a longer stretch of polynucleotide is excised. The best studied example of nucleotide excision repair is the short patch process of E. coli, so called because the region of polynucleotide that is excised and subsequently ‘patched’ is relatively short, usually 12 nucleotides in length.
Short patch repair is initiated by a multienzyme complex called the UvrABC endonuclease, sometimes also referred to as the ‘excinuclease’. In the first stage of the process a trimer comprising two UvrA proteins and one copy of UvrB attaches to the DNA at the damaged site. How the site is recognized is not known but the broad specificity of the process indicates that individual types of damage are not directly detected and that the complex must search for a more general attribute of DNA damage such as distortion of the double helix. UvrA may be the part of the complex most involved in damage location because it dissociates once the site has been found and plays no further part in the repair process. Departure of UvrA allows UvrC to bind (Figure 14.21), forming a UvrBC dimer that cuts the polynucleotide either side of the damaged site. The first cut is made by UvrB at the fifth phosphodiester bond downstream of the damaged nucleotide, and the second cut is made by UvrC at the eighth phosphodiester bond upstream, resulting in the 12 nucleotide excision, although there is some variability, especially in the position of the UvrB cut site. The excised segment is then removed, usually as an intact oligonucleotide, by DNA helicase II, which presumably detaches the segment by breaking the base pairs holding it to the second strand. UvrC also detaches at this stage, but UvrB remains in place and bridges the gap produced by the excision. The bound UvrB is thought to prevent the single-stranded region that has been exposed from base-pairing with itself, but alternative roles could be to prevent this strand from becoming damaged, or possibly to direct the DNA polymerase to the site that needs to be repaired. As in base excision repair, the gap is filled by DNA polymerase I and the last phosphodiester bond is synthesized by DNA ligase.
E. coli also has a long patch nucleotide excision repair system that involves Uvr proteins but differs in that the piece of DNA that is excised can be anything up to 2 kb in length. Long patch repair has been less well studied and the process is not understood in detail, but it is presumed to work on more extensive forms of damage, possibly regions where groups of nucleotides, rather than just individual ones, have become modified. The eukaryotic nucleotide excision repair process is also called ‘long patch’ but results in replacement of only 24–29 nucleotides of DNA. In fact, there is no ‘short patch’ system in eukaryotes and the name is used to distinguish the process from base excision repair. The system is more complex than in E. coli and the relevant enzymes do not seem to be homologs of the Uvr proteins. In humans at least 16 proteins are involved, with the downstream cut being made at the same position as in E. coli - the fifth phosphodiester bond - but with a more distant upstream cut, resulting in the longer excision. Both cuts are made by endonucleases that attack single-stranded DNA specifically at its junction with a double-stranded region, indicating that before the cuts are made the DNA around the damage site has been melted, presumably by a helicase (Figure 14.22). This activity is provided at least in part by TFIIH, one of the components of the RNA polymerase II initiation complex (see Table 9.5). At first it was assumed that TFIIH simply had a dual role in the cell, functioning separately in both transcription and repair, but now it is thought that there is a more direct link between the two processes (Lehmann, 1995; Svejstrup et al., 1996). This view is supported by the discovery of transcription-coupled repair, which repairs some forms of damage in the template strands of genes that are being actively transcribed. 
The first type of transcription-coupled repair to be discovered was a modified version of nucleotide excision, but it is now known that base excision repair is also coupled with transcription (Cooper et al., 1997). These discoveries do not imply that nontranscribed regions of the genome are not repaired. The excision repair processes protect the entire genome from damage, but it is entirely logical that special mechanisms should exist for directing the processes at genes that are being transcribed. The template strands of these genes contain the genome's biological information and maintaining their integrity should be the highest priority for the repair systems.
Each of the repair systems that we have looked at so far - direct, base excision and nucleotide excision repair - recognizes and acts upon DNA damage caused by mutagens. This means that they search for abnormal chemical structures such as modified nucleotides, cyclobutyl dimers and intrastrand crosslinks. They cannot, however, correct mismatches resulting from errors in replication because the mismatched nucleotide is not abnormal in any way: it is simply an A, C, G or T that has been inserted at the wrong position. As these nucleotides look exactly like any other nucleotide, the mismatch repair system that corrects replication errors has to detect not the mismatched nucleotide itself but the absence of base-pairing between the parent and daughter strands. Once it has found a mismatch, the repair system excises part of the daughter polynucleotide and fills in the gap, in a manner similar to base and nucleotide excision repair.
The scheme described above leaves one important question unanswered. The repair must be made in the daughter polynucleotide because it is in this newly synthesized strand that the error has occurred; the parent polynucleotide has the correct sequence. How does the repair process know which strand is which? In E. coli the answer is that the daughter strand is, at this stage, undermethylated and can therefore be distinguished from the parent polynucleotide, which has a full complement of methyl groups. E. coli DNA is methylated because of the activities of the DNA adenine methylase (Dam), which converts adenines to 6-methyladenines in the sequence 5′-GATC-3′, and the DNA cytosine methylase (Dcm), which converts cytosines to 5-methylcytosines in 5′-CCAGG-3′ and 5′-CCTGG-3′. These methylations are not mutagenic, the modified nucleotides having the same base-pairing properties as the unmodified versions. There is a delay between DNA replication and methylation of the daughter strand, and it is during this window of opportunity that the repair system scans the DNA for mismatches and makes the required corrections in the undermethylated, daughter strand (Figure 14.23).
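The recognition sequences for the two methylases are simple enough to locate with a pattern search. The following is a minimal sketch, assuming the sequence is supplied 5′→3′ as a plain string; the function name is hypothetical and only the start position of each motif is reported, not the individual nucleotide that is modified.

```python
import re

DAM_SITE = re.compile(r"GATC")      # 5'-GATC-3', target of Dam
DCM_SITE = re.compile(r"CC[AT]GG")  # 5'-CCAGG-3' and 5'-CCTGG-3', targets of Dcm


def methylation_sites(seq):
    """Return 0-based start positions of Dam and Dcm recognition sequences."""
    seq = seq.upper()
    dam = [m.start() for m in DAM_SITE.finditer(seq)]
    dcm = [m.start() for m in DCM_SITE.finditer(seq)]
    return {"dam": dam, "dcm": dcm}


print(methylation_sites("AAGATCCCAGGTTCCTGGGATC"))
# {'dam': [2, 18], 'dcm': [6, 13]}
```

In a newly replicated molecule these positions would be methylated on the parent strand only, which is the asymmetry that the repair system exploits.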
E. coli has at least three mismatch repair systems, called ‘long patch’, ‘short patch’ and ‘very short patch’, the names indicating the relative lengths of the excised and resynthesized segments. The long patch system replaces up to a kb or more of DNA and requires the MutH, MutL and MutS proteins, as well as the DNA helicase II that we met during nucleotide excision repair. MutS recognizes the mismatch and MutH distinguishes the two strands by binding to unmethylated 5′-GATC-3′ sequences (Figure 14.24). The role of MutL is unclear but it might coordinate the activities of the other two proteins so that MutH binds to 5′-GATC-3′ sequences only in the vicinity of mismatch sites recognized by MutS. After binding, MutH cuts the phosphodiester bond immediately upstream of the G in the methylation sequence and DNA helicase II detaches the single strand. There does not appear to be an enzyme that cuts the strand downstream of the mismatch; instead the detached single-stranded region is degraded by an exonuclease that follows the helicase and continues beyond the mismatch site. The gap is then filled in by DNA polymerase I and DNA ligase. Similar events are thought to occur during short and very short mismatch repair, the difference being the specificities of the proteins that recognize the mismatch. The short patch system, which results in excision of a segment less than 10 nucleotides in length, begins when MutY recognizes an A-G or A-C mismatch, and the very short repair system corrects G-T mismatches which are recognized by the Vsr endonuclease.
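The logic of the long patch system can be sketched in a few lines. This is a deliberately simplified illustration, not a description of the enzymes: it assumes the parent strand is written 3′→5′ and the daughter 5′→3′ so that the two strings align position by position, it treats every 5′-GATC-3′ in the daughter as unmethylated, it looks for a nick site only on the 5′ side of the mismatch, and the `overshoot` distance by which the exonuclease continues past the mismatch is an arbitrary, hypothetical parameter.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement (no reversal; strands are kept aligned)."""
    return "".join(PAIR[b] for b in seq)


def long_patch_repair(parent, daughter, overshoot=3):
    """Excise the daughter strand from a GATC nick through the mismatch
    and resynthesize it using the parent strand as template."""
    # 'MutS' step: positions where the two strands fail to base-pair
    mismatches = [i for i, (p, d) in enumerate(zip(parent, daughter))
                  if PAIR[p] != d]
    if not mismatches:
        return daughter
    # 'MutH' step: nick immediately upstream of the G of the nearest GATC
    nick = daughter.rfind("GATC", 0, mismatches[0])
    if nick == -1:
        raise ValueError("no GATC site on the 5' side of the mismatch")
    # Helicase/exonuclease step: degrade to a point beyond the last mismatch
    end = min(len(daughter), mismatches[-1] + 1 + overshoot)
    # Polymerase/ligase step: fill the gap by copying the parent
    return daughter[:nick] + complement(parent[nick:end]) + daughter[end:]


correct = "TTGATCAAGGCCTTAACC"
parent = complement(correct)
faulty = correct[:10] + "A" + correct[11:]   # replication error at position 10
print(long_patch_repair(parent, faulty) == correct)  # True
```

Because the patch is always copied from the parent, the repair restores the parental information regardless of which nucleotide was misincorporated.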
Eukaryotes have homologs of the E. coli Mut proteins and their mismatch repair processes probably work in a similar way (Kolodner, 2000). The one difference is that methylation might not be the method used to distinguish between the parent and daughter polynucleotides. Methylation has been implicated in mismatch repair in mammalian cells, but the DNA of some eukaryotes, including fruit flies and yeast, is not extensively methylated; it is thought that these organisms must therefore use a different method. Possibilities include an association between the repair enzymes and the replication complex, so that repair is coupled with DNA synthesis, or use of single-strand binding proteins that mark the parent strand.
A single-stranded break in a double-stranded DNA molecule, such as is produced during the base and nucleotide excision repair processes and by some types of oxidative damage, does not present the cell with a critical problem. The double helix retains its overall intactness and the break can be repaired by template-dependent DNA synthesis (Figure 14.25A). A double-stranded break is more serious because this converts the original double helix into two separate fragments which have to be brought back together again in order for the break to be repaired (Figure 14.25B). The two broken ends must be protected from further degradation, which could result in a deletion mutation appearing at the repaired break point. The repair processes must also ensure that the correct ends are joined: if there are two broken chromosomes in the nucleus, then the correct pairs must be brought together so that the original structures are restored. Experimental studies of mouse cells indicate that achieving this outcome is difficult and if two chromosomes are broken then misrepair resulting in hybrid structures occurs relatively frequently (Richardson and Jasin, 2000). Even if only one chromosome is broken, there is still a possibility that a natural chromosome end could be mistaken for a break and an incorrect repair made. This type of error is not unknown, despite the presence of special telomere-binding proteins that mark the natural ends of chromosomes (Section 2.2.1).
Double-strand breaks are generated by exposure to ionizing radiation and some chemical mutagens, and are also made by the cell, in a controlled fashion, during recombination events such as the genome rearrangements that join together immunoglobulin gene segments and T-cell receptor gene segments in B and T lymphocytes (Section 12.2.1). Progress in understanding the break repair system has been stimulated by studies of mutant human cell lines, which have resulted in the identification of various sets of genes involved in the process (Critchlow and Jackson, 1998). These genes specify a multi-component protein complex that directs a DNA ligase to the break (Figure 14.26). The complex includes a protein called Ku, made up of two non-identical subunits, which binds the DNA ends either side of the break (Walker et al., 2001). Ku binds to the DNA in association with the DNA-PKCS protein kinase, which activates a third protein, XRCC4, which interacts with the mammalian DNA ligase IV, directing this repair protein to the double-strand break.
The repair process is called non-homologous end-joining (NHEJ), the name indicating that there is no need for homology between the two molecules whose ends are being joined, unlike other end-joining mechanisms that we will encounter when we study recombination in Section 14.3. NHEJ is looked on as a type of recombination because, as well as repairing breaks, it can be used to join molecules or fragments that were not previously joined, producing new combinations. A version of the NHEJ system is probably used during construction of immunoglobulin and T-cell receptor genes, but the details are likely to be different because these programmed rearrangements of the genome involve intermediate structures, such as DNA hairpin loops, that are not seen during the repair of DNA breaks resulting from damage.
If a region of the genome has suffered extensive damage then it is conceivable that the repair processes will be overwhelmed. The cell then faces a stark choice between dying or attempting to replicate the damaged region even though this replication may be error-prone and result in mutated daughter molecules. When faced with this choice E. coli cells invariably take the second option, by inducing one of several emergency procedures for bypassing sites of major damage. The best studied of these bypass processes is the SOS response, which enables the cell to replicate its DNA even though the template polynucleotides contain AP sites and/or cyclobutyl dimers and other photoproducts resulting from exposure to chemical mutagens or UV radiation that would normally block, or at least delay, the replication complex. Bypass of these sites requires construction of a mutasome, comprising the UmuD′2C complex (also called DNA polymerase V, a trimer made up of two UmuD′ proteins and one copy of UmuC) and several copies of the RecA protein (Goodman, 2000). The latter is a single-stranded DNA-binding protein that coats the damaged strands, enabling the UmuD′2C complex to displace DNA polymerase III and carry out error-prone DNA synthesis until the damaged region has been passed and DNA polymerase III can take over once again (Figure 14.27).
The SOS response is primarily looked on as the last best chance that the bacterium has to replicate its DNA and hence survive under adverse conditions. However, the price of survival is an increased mutation rate because the mutasome does not repair damage, it simply allows a damaged region of a polynucleotide to be replicated. When it encounters a damaged position in the template DNA, the polymerase selects a nucleotide more or less at random, although with some preference for placing an A opposite an AP site: in effect the error rate of the replication process increases. It has been suggested that this increased mutation rate is the purpose of the SOS response, mutation being in some way an advantageous response to DNA damage, but this idea remains controversial (Chicurel, 2001).
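The consequence of error-prone bypass can be illustrated with a toy simulation. Everything numerical here is an assumption made for the sketch: the character `X` is used to mark an AP site in the template, and `a_bias` is a hypothetical parameter standing for the unspecified 'preference' for inserting an A. The template and product are written as aligned strings, ignoring strand polarity.

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def translesion_synthesis(template, a_bias=0.5, rng=None):
    """Copy a template; opposite an AP site ('X') insert A with
    probability a_bias, otherwise a nucleotide chosen at random."""
    rng = rng or random.Random()
    product = []
    for base in template:
        if base in PAIR:
            product.append(PAIR[base])          # normal base-pairing
        elif rng.random() < a_bias:
            product.append("A")                 # preferred insertion
        else:
            product.append(rng.choice("ACGT"))  # more or less random
    return "".join(product)


# With a_bias=1.0 an A is always placed opposite the damaged position.
print(translesion_synthesis("ATXGC", a_bias=1.0))  # TAACG
```

The damage is still present in the template after the copy has been made, which is the sense in which the mutasome bypasses rather than repairs.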
For some time, the SOS response was thought to be the only damage-bypass process in bacteria, but we now appreciate that at least two other E. coli polymerases act in a similar way, although with different types of damage. These are DNA polymerase II, which can bypass nucleotides bound to mutagenic chemicals such as N-2-acetylaminofluorene, and DNA polymerase IV (also called DinB), which can replicate through a region of template DNA in which the two parent polynucleotides have become misaligned (Lindahl and Wood, 1999; Hanaoka, 2001). Bypass polymerases have also been discovered in eukaryotic cells. These include DNA polymerase η, which can bypass cyclobutyl dimers (Johnson et al., 1999), and DNA polymerases ι and ζ, which work together to replicate through photoproducts and AP sites (Johnson et al., 2000).
The importance of DNA repair is emphasized by the number and severity of inherited human diseases that have been linked with defects in one of the repair processes. One of the best characterized of these is xeroderma pigmentosum, which results from a mutation in any one of several genes for proteins involved in nucleotide excision repair. Nucleotide excision is the only way in which human cells can repair cyclobutyl dimers and other photoproducts, so it is no surprise that the symptoms of xeroderma pigmentosum include hypersensitivity to UV radiation, patients suffering more mutations than normal on exposure to sunlight, which often leads to skin cancer (Lehmann, 1995). Trichothiodystrophy is also caused by defects in nucleotide excision repair, but this is a more complex disorder which, although not involving cancer, usually includes problems with both the skin and nervous system.
A few diseases have been linked with defects in the transcription-coupled component of nucleotide excision repair. These include breast and ovarian cancers, the BRCA1 gene that confers susceptibility to these cancers coding for a protein that has been implicated, at least indirectly, in transcription-coupled repair (Gowen et al., 1998), and Cockayne syndrome, a complex disease manifested by growth and neurologic disorders (Hanawalt, 2000). A deficiency in transcription-coupled repair has also been identified in humans suffering from the cancer-susceptibility syndrome called HNPCC (hereditary non-polyposis colorectal cancer; Mellon et al., 1996), although this disease was originally identified as a defect in mismatch repair (Kolodner, 1995). Ataxia telangiectasia, the symptoms of which include sensitivity to ionizing radiation, results from defects in the ATM gene, which is involved in the damage-detection process (Section 13.3.2). Other diseases that are associated with a breakdown in DNA repair are Bloom's and Werner's syndromes, which are caused by inactivation of a DNA helicase that may have a role in NHEJ (Shen and Loeb, 2000; Wu and Hickson, 2001), and Fanconi's anemia, which confers sensitivity to chemicals that cause crosslinks in DNA but whose biochemical basis is not yet known.
Without recombination, genomes would be relatively static structures, undergoing very little change. The gradual accumulation of mutations over a long period of time would result in small-scale alterations in the nucleotide sequence of the genome, but more extensive restructuring, which is the role of recombination, would not occur, and the evolutionary potential of the genome would be severely restricted.
Recombination was first recognized as the process responsible for crossing-over and exchange of DNA segments between homologous chromosomes during meiosis of eukaryotic cells (see Figure 5.15), and was subsequently implicated in the integration of transferred DNA into bacterial genomes after conjugation, transduction or transformation (Section 5.2.4). The biological importance of these processes stimulated the first attempts to describe the molecular events involved in recombination and led to the Holliday model (Holliday, 1964), with which we will begin our study of recombination.
The Holliday model refers to a type of recombination called general or homologous recombination. This is the most important version of recombination in nature, being responsible for meiotic crossing-over and the integration of transferred DNA into bacterial genomes.
The Holliday model describes recombination between two homologous double-stranded molecules, ones with identical or nearly identical sequences, but is equally applicable to two different molecules that share a limited region of homology, or a single molecule that recombines with itself because it contains two separate regions that are homologous with one another.
The central feature of the model is formation of a heteroduplex resulting from the exchange of polynucleotide segments between the two homologous molecules (Figure 14.28). The heteroduplex is initially stabilized by base-pairing between each transferred strand and the intact polynucleotide of the recipient molecule, this base-pairing being possible because of the sequence similarity between the two molecules. Subsequently the gaps are sealed by DNA ligase, giving a Holliday structure. This structure is dynamic, branch migration resulting in exchange of longer segments of DNA if the two helices rotate in the same direction.
Separation, or resolution, of the Holliday structure back into individual double-stranded molecules occurs by cleavage across the branch point. This is the key to the entire process because the cut can be made in either of two orientations, as becomes apparent when the three-dimensional configuration or chi form of the Holliday structure is examined (see Figure 14.28). These two cuts have very different results. If the cut is made left-right across the chi form as drawn in Figure 14.28, then all that happens is that a short segment of polynucleotide, corresponding to the distance migrated by the branch of the Holliday structure, is transferred between the two molecules. On the other hand, an up-down cut results in reciprocal strand exchange, double-stranded DNA being transferred between the two molecules so that the end of one molecule is exchanged for the end of the other molecule. This is the DNA transfer seen in crossing-over.
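The two outcomes of resolution can be made concrete with a symbolic sketch. It assumes each double-stranded molecule is represented as a pair of marker strings (top strand, bottom strand) rather than real complementary sequences, that the top strands were nicked and exchanged at position `nick`, and that the branch has migrated to position `branch`. The function name and the labels `"left-right"` and `"up-down"` are hypothetical, chosen to follow the orientations described in the text.

```python
def resolve_holliday(mol1, mol2, nick, branch, cut):
    """Return the two molecules produced by resolving a Holliday structure.

    mol1, mol2 : (top, bottom) strand pairs of equal length
    nick       : position at which the top strands were exchanged
    branch     : position reached by the branch point (nick <= branch)
    cut        : "left-right" or "up-down"
    """
    (top1, bot1), (top2, bot2) = mol1, mol2
    if not 0 <= nick <= branch <= len(top1):
        raise ValueError("need 0 <= nick <= branch <= molecule length")
    if cut == "left-right":
        # Only the segment covered by branch migration changes molecules.
        return ((top1[:nick] + top2[nick:branch] + top1[branch:], bot1),
                (top2[:nick] + top1[nick:branch] + top2[branch:], bot2))
    if cut == "up-down":
        # Reciprocal strand exchange: the ends of the molecules are swapped.
        return ((top1[:nick] + top2[nick:], bot1[:branch] + bot2[branch:]),
                (top2[:nick] + top1[nick:], bot2[:branch] + bot1[branch:]))
    raise ValueError("cut must be 'left-right' or 'up-down'")


m1 = ("AAAAAAAA", "aaaaaaaa")
m2 = ("BBBBBBBB", "bbbbbbbb")
print(resolve_holliday(m1, m2, nick=2, branch=5, cut="left-right"))
# (('AABBBAAA', 'aaaaaaaa'), ('BBAAABBB', 'bbbbbbbb'))
print(resolve_holliday(m1, m2, nick=2, branch=5, cut="up-down"))
# (('AABBBBBB', 'aaaaabbb'), ('BBAAAAAA', 'bbbbbaaa'))
```

In both outputs the region between positions 2 and 5 is heteroduplex; only the up-down cut exchanges the flanking double-stranded DNA.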
So far we have ignored one aspect of the Holliday model. This is the way in which the two double-stranded molecules interact at the beginning of the process to produce the heteroduplex. In the original scheme, the two molecules lined up with one another and single-stranded nicks appeared at equivalent positions in each helix. This produced free single-stranded ends that could be exchanged, resulting in the heteroduplex (Figure 14.29A). This feature of the model was criticized because no mechanism could be proposed for ensuring that the nicks occurred at precisely the same position on each molecule. The Meselson-Radding modification (Meselson and Radding, 1975) proposes a more satisfactory scheme whereby a single-stranded nick occurs in just one of the double helices, the free end that is produced ‘invading’ the unbroken double helix at the homologous position and displacing one of its strands, forming a D-loop (Figure 14.29B). Subsequent cleavage of the displaced strand at the junction between its single-stranded and base-paired regions produces the heteroduplex.
The Holliday model and Meselson-Radding modification refer to homologous recombination in all organisms but, as with many areas of molecular biology, the initial progress in understanding how the process is carried out in the cell was made with E. coli. The specific recombination system that has been studied has the circular E. coli genome as one partner and a linear chromosome fragment as the second partner, this being the situation that occurs during conjugation, transduction or transformation of bacterial cells (Section 5.2.4).
Mutation studies have identified a number of E. coli genes that, when inactivated, give rise to defects in homologous recombination, indicating that their protein products are involved in the process in some way. Three distinct recombination systems have been described, these being the RecBCD, RecE and RecF pathways, with RecBCD apparently being the most important in the bacterium (Camerini-Otero and Hsieh, 1995). In this pathway, recombination is initiated by the RecBCD enzyme, which has both nuclease and helicase activities. Its precise mode of action is uncertain: in the simplest model the enzyme binds to one end of the linear molecule and unwinds it until it reaches the first copy of the eight-nucleotide consensus sequence 5′-GCTGGTGG-3′ (rather confusingly called the chi site), which occurs once every 6 kb in E. coli DNA (Blattner et al., 1997). The nuclease activity of the enzyme then makes the single-stranded nick at a position approximately 56 nucleotides to the 3′ side of the chi site (Figure 14.30). Alternative proposals have the RecBCD enzyme making nicks as it progresses along the linear DNA, this activity being inhibited when the chi site is reached, the last of these progressive nicks being equivalent to the single nick envisaged in the first model (Eggleston and West, 1996).
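The simplest model lends itself to a short sketch. It assumes the linear molecule is given as a single string read in the direction in which the enzyme travels, and it ignores the question of which strand and which end RecBCD actually uses; `chi_nick_positions` and its `offset` parameter are hypothetical names, with the default of 56 taken from the figure quoted above. The final line compares the stated frequency of chi with the spacing expected for any eight-nucleotide sequence if all four nucleotides were equally frequent, which is itself an assumption.

```python
CHI = "GCTGGTGG"  # eight-nucleotide chi consensus, written 5'->3'


def chi_nick_positions(seq, offset=56):
    """List (chi_start, nick_position) for each chi site in seq.

    nick_position is the index lying `offset` nucleotides beyond the
    3' end of the site, or None if that falls outside the molecule.
    """
    results = []
    i = seq.find(CHI)
    while i != -1:
        nick = i + len(CHI) + offset
        results.append((i, nick if nick <= len(seq) else None))
        i = seq.find(CHI, i + 1)
    return results


print(chi_nick_positions("A" * 10 + CHI + "A" * 100))  # [(10, 74)]
print(4 ** len(CHI))  # 65536 bp expected spacing for a random 8-mer
```

On this rough reckoning a spacing of 6 kb means that chi is about ten times more common in E. coli DNA than a random eight-nucleotide sequence would be.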
Whatever the precise mechanism, the RecBCD enzyme produces the free single-stranded end which, according to the Meselson-Radding modification, invades the intact partner, in this case the circular E. coli genome. This stage is mediated by the RecA protein, which forms a protein-coated DNA filament that is able to invade the intact double helix and set up the D-loop (see Figure 14.30). An intermediate in formation of the D-loop is probably a triplex structure, a three-stranded DNA helix in which the invading polynucleotide lies within the major groove of the intact helix and forms hydrogen bonds with the base pairs it encounters (Camerini-Otero and Hsieh, 1995).
Branch migration is catalyzed by the RuvA and RuvB proteins, both of which attach to the branch point of the Holliday structure. X-ray crystallography studies suggest that four copies of RuvA bind directly to the branch, forming a core to which two RuvB rings, each consisting of eight proteins, attach, one to either side (Figure 14.31; Rafferty et al., 1996). The resulting structure might act as a ‘molecular motor’, rotating the helices in the required manner so that the branch point moves. The RecG protein also has a role in branch migration but it is not clear if this is in conjunction with RuvAB, or as part of an alternative mechanism (Eggleston and West, 1996).
Branch migration does not appear to be a random process, but instead stops preferentially at the sequence 5′-(A/T)TT(G/C)-3′. This sequence occurs frequently in the E. coli genome, so presumably migration does not halt at the first instance of the motif that is reached. When branch migration has ended, the RuvAB complex detaches and is replaced by two RuvC proteins (see Figure 14.31) which carry out the cleavage that resolves the Holliday structure. The cuts are made between the second T and the (G/C) components of the recognition sequence.
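A short sketch shows why a motif of this kind is expected to be common. It assumes that the recognition sequence is 5′-(A/T)TT(G/C)-3′, with the cut falling between the second T and the G or C, and that all four nucleotides are equally frequent; the function name is hypothetical.

```python
import re

RUVC_SITE = re.compile(r"[AT]TT[GC]")  # assumed consensus 5'-(A/T)TT(G/C)-3'


def ruvc_cut_positions(seq):
    """Return, for each site, the index of the G/C nucleotide;
    the cut lies immediately on the 5' side of that index."""
    return [m.start() + 3 for m in RUVC_SITE.finditer(seq.upper())]


print(ruvc_cut_positions("CCATTGCCGGTTTCAA"))  # [5, 13]

# Probability of the motif at any given position, for equal base frequencies
p = (2 / 4) * (1 / 4) * (1 / 4) * (2 / 4)
print(1 / p)  # 64.0, i.e. about one site every 64 nucleotides
```

With a site expected roughly every 64 nucleotides, a migrating branch must pass many copies of the motif before resolution takes place.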
Although the Holliday model for homologous recombination, either in its original form or as modified by Meselson and Radding, explains most of the results of recombination in all organisms, it has a few inadequacies, which prompted the development of alternative schemes. In particular, it was thought that the Holliday model could not explain gene conversion, a phenomenon first described in yeast and fungi but now known to occur with many eukaryotes. In yeast, fusion of a pair of gametes results in a zygote that gives rise to an ascus containing four haploid spores whose genotypes can be individually determined. If the gametes have different alleles at a particular locus then under normal circumstances two of the spores will display one genotype and two will display the other genotype, but sometimes this expected 2 : 2 segregation pattern is replaced by an unexpected 3 : 1 ratio (Figure 14.32). This is called gene conversion because the ratio can only be explained by one of the alleles ‘converting’ from one type to the other, presumably by recombination during the meiosis that occurs after the gametes have fused.
The double-strand break model provides an opportunity for gene conversion to take place during the recombination process. It initiates not with a single-strand nick, as in the Holliday scheme, but with a double-strand cut that breaks one of the partners in the recombination into two pieces (Figure 14.33). This might appear to be a drastic move to make but it has been shown that the protein responsible for the cut is a Type II DNA topoisomerase (Section 13.1.2) which forms covalent linkages with the two pieces of DNA and hence prevents them drifting completely apart. After the double-stranded cut, one strand in each half of the molecule is trimmed back by a 5′→3′ exonuclease, so each end now has a 3′ overhang of approximately 500 nucleotides. One of these invades the homologous DNA molecule in a manner similar to that envisaged by the Meselson-Radding scheme, setting up a Holliday junction that can migrate along the heteroduplex if the invading strand is extended by a DNA polymerase. To complete the heteroduplex, the other broken strand (the one not involved in the Holliday junction) is also extended. Note that both DNA syntheses involve extension of strands from the partner that suffered the double-stranded cut, using as templates the equivalent regions of the uncut partner. This is the basis of the gene conversion because it means that the polynucleotide segments removed from the cut partner by the exonuclease have been replaced with copies of the DNA from the uncut partner.
The resulting heteroduplex has a pair of Holliday structures that can be resolved in a number of ways, some resulting in gene conversion and others giving a standard reciprocal strand exchange. An example leading to gene conversion is shown in Figure 14.33.
The double-strand break model has been sufficiently well characterized in yeast for there to be little doubt that it occurs, at least in a form approximating to that shown in Figure 14.33. Some of the proteins involved in recombination in yeast are very similar to their counterparts in E. coli - eukaryotic RAD51, for example, has sequence similarity with RecA and is believed to work in the same way (Baumann and West, 1998) - prompting the suggestion that recombination in all organisms follows the double-strand break system. As yet there is little evidence to support this idea, particularly for the larger chromosomes of higher eukaryotes, and many geneticists resist the suggestion that vertebrate DNA undergoes frequent double-strand breaks during meiosis.
A region of extensive homology is not a prerequisite for recombination: the process can also be initiated between two DNA molecules that have only very short sequences in common. This is called site-specific recombination and it has been extensively studied because of the part that it plays during the infection cycle of bacteriophage λ.
After injecting its DNA into an E. coli cell, bacteriophage λ can follow either of two infection pathways (Section 4.2.1). One of these, the lytic pathway, results in the rapid synthesis of λ coat proteins, combined with replication of the λ genome, leading to death of the bacterium and release of new phages within about 20 minutes of the initial infection. In contrast, if the phage follows the lysogenic pathway, new phages do not immediately appear. The bacterium divides as normal, possibly for many cell divisions, with the phage in a quiescent form called the prophage. Eventually, possibly as the result of DNA damage or some other stimulus, the phage becomes active again, replicating its genome, directing synthesis of coat proteins, and bursting from the cell.
During the lysogenic phase the λ genome becomes integrated into the E. coli chromosome. It is therefore replicated whenever the E. coli DNA is copied, and so is passed on to daughter cells as if a standard part of the bacterium's genome. Integration occurs by site-specific recombination between the att sites, one on the λ genome and one on the E. coli chromosome, which have at their center an identical 15-bp sequence (Figure 14.34). Because this is recombination between two circular molecules, the result is that one bigger circle is formed; in other words the λ DNA becomes integrated into the bacterial genome. A second site-specific recombination between the two att sites, now both contained in the same molecule, reverses the original process and releases the λ DNA, which can now return to the lytic mode of infection and direct synthesis of new phages.
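The topology of integration and excision can be sketched with strings standing in for circular molecules. The sketch assumes each circle is written as a linear string whose ends are understood to be joined, and that the core sequence occurs exactly once in each; `CORE` is an arbitrary 15-character placeholder, not the actual att core, and all function names are hypothetical.

```python
CORE = "ACGTTGCAAGCTTGA"  # placeholder 15-nucleotide core, not the real sequence


def rotate_to(circle, site):
    """Rewrite a circular molecule so that it starts at `site`."""
    doubled = circle + circle
    i = doubled.find(site)
    if i == -1:
        raise ValueError("site not present in this molecule")
    return doubled[i:i + len(circle)]


def integrate(host, phage, core=CORE):
    """Recombine two circles at their core sequences into one circle."""
    return rotate_to(host, core) + rotate_to(phage, core)


def excise(lysogen, core=CORE):
    """Recombine the two core copies in one circle, giving two circles."""
    circle = rotate_to(lysogen, core)
    second = circle.find(core, len(core))
    if second == -1:
        raise ValueError("two copies of the core are required")
    return circle[:second], circle[second:]


def same_circle(a, b):
    """True if a and b are rotations of the same circular sequence."""
    return len(a) == len(b) and a in b + b


host = "ggccaatt" + CORE + "ttaaccgg"
phage = "atat" + CORE + "cgcg"
lysogen = integrate(host, phage)
print(len(lysogen) == len(host) + len(phage), lysogen.count(CORE))  # True 2
released_host, released_phage = excise(lysogen)
print(same_circle(released_host, host), same_circle(released_phage, phage))  # True True
```

The integrated molecule carries the core sequence at both junctions between phage and bacterial DNA, which is what makes the reverse reaction possible.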
The recombination event is catalyzed by a specialized Type I topoisomerase (Section 13.1.2) called integrase (Kwon et al., 1997), a member of a diverse family of recombinases present in bacteria, archaea and yeast. The enzyme makes a staggered double-stranded cut at equivalent positions in the λ and bacterial att sites. The two short single-stranded overhangs are then exchanged between the DNA molecules, producing a Holliday junction which migrates a few base pairs along the heteroduplex before being cleaved. This cleavage, providing that it is made in the appropriate orientation, resolves the Holliday structure in such a way that the λ DNA becomes inserted into the E. coli genome. A similar process underlies excision, which is also carried out by integrase, but in conjunction with a second protein, ‘excisionase’, coded by the λ xis gene. If integrase could carry out excision on its own then it would probably excise the λ DNA as soon as it had integrated it.
Transposition is not a type of recombination but a process that utilizes recombination, the end result being the transfer of a segment of DNA from one position in the genome to another. A characteristic feature of transposition is that the transferred segment is flanked by a pair of short direct repeats (Figure 14.35) which, as we will see, are formed during the transposition process.
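In Section 2.4.2 we examined the various types of transposable element known in eukaryotes and prokaryotes and discovered that these could be broadly divided into three categories on the basis of their transposition mechanism (Figure 14.36): replicative transposition, in which a copy of the element is inserted at the new site while the original remains in place; conservative transposition, in which the element is cut out of its original position and pasted into the new one; and retrotransposition, in which transposition occurs via an RNA intermediate.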
We will now examine the recombination events that are responsible for each of these three types of transposition.
A number of models for replicative and conservative transposition have been proposed over the years but most are modifications of a scheme originally outlined by Shapiro (1979). According to this model, the replicative transposition of a bacterial element such as a Tn3-type transposon or a transposable phage (Section 2.4.2) is initiated by one or more endonucleases that make single-stranded cuts either side of the transposon and in the target site where the new copy of the element will be inserted (Figure 14.37). At the target site the two cuts are separated by a few base pairs, so that the cleaved double-stranded molecule has short 5′ overhangs.
Ligation of these 5′ overhangs to the free 3′ ends either side of the transposon produces a hybrid molecule in which the original two DNAs - the one containing the transposon and the one containing the target site - are linked together by the transposable element flanked by a pair of structures resembling replication forks. DNA synthesis at these replication forks copies the transposable element and converts the initial hybrid into a co-integrate, in which the two original DNAs are still linked. Homologous recombination between the two copies of the transposon uncouples the co-integrate, separating the original DNA molecule (with its copy of the transposon still in place) from the target molecule, which now contains a copy of the transposon. Replicative transposition has therefore occurred.
A modification of the process just described changes the mode of transposition from replicative to conservative (see Figure 14.37). Rather than carrying out DNA synthesis, the hybrid structure is converted back into two separate DNA molecules simply by making additional single-stranded nicks either side of the transposon. This cuts the transposon out of its original molecule, leaving it ‘pasted’ into the target DNA.
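The difference in outcome between the two modes can be sketched with plain strings. This is only a toy illustration, not a model of the strand-level chemistry: the names `Outcome`, `replicative` and `conservative` are hypothetical, the sequences are invented, host DNA is written in lower case and the transposon in upper case, and the donor molecule is shown simply rejoined after conservative transposition, which is an assumption made for readability.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    donor: str   # molecule that originally carried the transposon
    target: str  # molecule that receives the transposon


def _insert_at_staggered_cut(target: str, element: str, cut: int, stagger: int) -> str:
    """Insert `element` at a staggered cut in `target`.

    The `stagger` bases lying between the two single-stranded cuts end up
    on both sides of the element, giving the flanking direct repeats.
    """
    site = target[cut:cut + stagger]
    return target[:cut] + site + element + site + target[cut + stagger:]


def replicative(donor: str, target: str, element: str, cut: int, stagger: int = 5) -> Outcome:
    """Donor keeps its copy; the target gains a new copy."""
    return Outcome(donor, _insert_at_staggered_cut(target, element, cut, stagger))


def conservative(donor: str, target: str, element: str, cut: int, stagger: int = 5) -> Outcome:
    """Element is cut out of the donor and pasted into the target."""
    start = donor.index(element)
    # Assumption: the donor is shown rejoined; its real fate is not modelled.
    donor_after = donor[:start] + donor[start + len(element):]
    return Outcome(donor_after, _insert_at_staggered_cut(target, element, cut, stagger))


if __name__ == "__main__":
    element = "TACGGCAT"
    donor = "ccgga" + element + "ttagc"
    target = "aagctgatcctt"
    print(replicative(donor, target, element, cut=4))
    print(conservative(donor, target, element, cut=4))
```

In both cases the target ends up with the element flanked by a duplicated copy of the target site; only the donor differs, retaining the element after replicative transposition and losing it after conservative transposition.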
From the human perspective, the most important retroelements are the retroviruses, which include the human immunodeficiency viruses that cause AIDS and various other virulent types. Most of what we know about retrotransposition refers specifically to retroviruses, although it is believed that other retroelements, such as retrotransposons of the Ty1/copia and Ty3/gypsy families, transpose by similar mechanisms.
The first step in retrotransposition is synthesis of an RNA copy of the inserted retroelement (Figure 14.38). The long terminal repeat (LTR) at the 5′ end of the element contains a TATA sequence which acts as a promoter for transcription by RNA polymerase II (Section 9.2.2). Some retroelements also have enhancer sequences (Section 9.3) that are thought to regulate the amount of transcription that occurs. Transcription continues through the entire length of the element, up to a polyadenylation sequence (Section 10.1.2) in the 3′ LTR. The transcript now acts as the template for RNA-dependent DNA synthesis, catalyzed by a reverse transcriptase enzyme coded by part of the pol gene of the retroelement (see Figure 2.26). Because this is synthesis of DNA, a primer is required (Section 13.2.2), and as during genome replication, the primer is made of RNA rather than DNA. During genome replication, the primer is synthesized de novo by a polymerase enzyme (see Figure 13.12), but retroelements do not code for RNA polymerases and so cannot make primers in this way. Instead they use one of the cell's tRNA molecules as a primer, which one depending on the retroelement: the Ty1/copia family of elements always use tRNAMet but other retroelements use different tRNAs.
The tRNA primer anneals to a site within the 5′ LTR (see Figure 14.38). At first glance this appears to be a strange location for the priming site because it means that DNA synthesis is directed away from the central region of the retroelement and so results in only a short copy of part of the 5′ LTR. In fact, when the DNA copy has been extended to the end of the LTR, a part of the RNA template is degraded and the DNA overhang that is produced re-anneals to the 3′ LTR of the retroelement which, being a long terminal repeat, has the same sequence as the 5′ LTR and so can base-pair with the DNA copy. DNA synthesis now continues along the RNA template, eventually displacing the tRNA primer. Note that the result is a DNA copy of the entire template, including the priming site: the template switching is, in effect, the strategy that the retroelement uses to solve the ‘end-shortening’ problem, the same problem that chromosomal DNAs address through telomere synthesis (Section 13.2.4).
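The reason the template switch works can be checked with a few lines of code. The sketch below follows the simplified picture used in the text, in which the transcript carries the same LTR sequence at both ends; the function names, the `primer_offset` parameter and the sequences are all hypothetical, and nothing beyond the first switch is modelled.

```python
# RNA base -> complementary DNA base
PAIRS = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the DNA strand complementary to `rna`, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(rna))


def can_anneal(dna: str, rna: str) -> bool:
    """True if `dna` (5'->3') is fully complementary to `rna` (5'->3')."""
    return reverse_transcribe(rna) == dna


def first_template_switch(ltr: str, internal: str, primer_offset: int):
    """Model the short first DNA copy and test whether it fits the 3' LTR.

    `primer_offset` is the position within the 5' LTR at which the tRNA
    primer sits; synthesis runs from there to the 5' end of the transcript.
    """
    transcript = ltr + internal + ltr
    short_copy = reverse_transcribe(transcript[:primer_offset])
    # Equivalent stretch of the 3' LTR, where the copy is expected to land.
    three_prime_ltr_start = len(ltr) + len(internal)
    landing_site = transcript[three_prime_ltr_start:three_prime_ltr_start + primer_offset]
    return short_copy, can_anneal(short_copy, landing_site)


if __name__ == "__main__":
    copy, fits = first_template_switch("GGUCUCUCU", "AAACCCGGG", primer_offset=6)
    print(copy, fits)
```

Because the two LTRs are identical, the short copy made from the 5′ LTR is necessarily complementary to the matching stretch of the 3′ LTR, whereas it would not pair with an arbitrary part of the internal region.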
Completion of synthesis of the first DNA strand results in a DNA-RNA hybrid. The RNA is partially degraded by an RNase H enzyme, coded by another part of the pol gene. The RNA that is not degraded, usually just a single fragment attached to a short polypurine sequence adjacent to the 3′ LTR, primes synthesis of the second DNA strand, again by reverse transcriptase, which is able to act as both an RNA- and DNA-dependent DNA polymerase. As with the first round of DNA synthesis, second-strand synthesis initially results in a DNA copy of just the LTR, but a second template switch, to the other end of the molecule, enables the DNA copy to be extended until it is full length. This creates a template for further extension of the first DNA strand, so that the resulting double-stranded DNA is a complete copy of the internal region of the retroelement plus the two LTRs.
All that remains is to insert the new copy of the retroelement into the genome. It was originally thought that insertion occurred randomly, but it now appears that although no particular sequence is used as a target site, integration occurs preferentially at certain positions (Devine and Boeke, 1996). Insertion involves removal of two nucleotides from the 3′ ends of the double-stranded retroelement by the integrase enzyme (coded by yet another part of pol). The integrase also makes a staggered cut in the genomic DNA so that both the retroelement and the integration site now have 5′ overhangs (Figure 14.39). These overhangs might not have complementary sequences but they still appear to interact in some way so that the retroelement becomes inserted into the genomic DNA. The interaction results in loss of the retroelement overhangs and filling in of the gaps that are left, which means that the integration site becomes duplicated into a pair of direct repeats, one at either end of the inserted retroelement.
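The pair of direct repeats left by integration is a signature that can be searched for in sequence data. The following is a minimal sketch under stated assumptions: `target_site_duplication` is a hypothetical helper, the sequences are invented, the element is assumed to occur exactly once in the sequence, and the removal of two nucleotides from the 3′ ends of the retroelement is not modelled.

```python
def target_site_duplication(seq: str, element: str, max_len: int = 20) -> str:
    """Return the longest direct repeat immediately flanking `element` in `seq`.

    Compares the k bases just upstream of the element with the k bases just
    downstream, for k = 1..max_len, and keeps the longest exact match.
    Returns an empty string if no flanking repeat is found.
    """
    start = seq.index(element)          # first base of the element
    end = start + len(element)          # first base after the element
    best = ""
    for k in range(1, max_len + 1):
        if start - k < 0 or end + k > len(seq):
            break                       # ran off one end of the sequence
        left = seq[start - k:start]
        right = seq[end:end + k]
        if left == right:
            best = left
    return best


if __name__ == "__main__":
    element = "TGTACCGGTACA"            # invented retroelement, upper case
    # Genomic DNA in lower case; the integration site 'cgta' is duplicated.
    integrated = "ccatg" + "cgta" + element + "cgta" + "agctt"
    print(target_site_duplication(integrated, element))
```

A matching repeat on both sides of an element is consistent with insertion at a staggered cut, although a short match can also arise by chance, so in practice the length of the repeat matters.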
1. Distinguish between the terms ‘mutation’, ‘DNA repair’ and ‘recombination’.
2. Explain how errors in DNA replication can lead to mutations.
3. Giving examples, summarize the key features of trinucleotide repeat expansion diseases.
4. List the various types of chemical and physical agents that have mutagenic properties. Give at least one example of each type of agent and describe the types of mutation that they cause.
5. Describe the various effects that a mutation can have on the coding properties of a genome.
6. Distinguish between the effects of mutations on the somatic and germ cells of multicellular organisms.
7. Name and define the four major types of mutant phenotype recognized in bacteria.
8. Describe, with examples, what is meant by the terms ‘hypermutation’ and ‘programmed mutation’.
9. Distinguish between the various types of DNA repair mechanism that are known.
10. Compare and contrast the direct DNA repair systems of bacteria and eukaryotes.
11. Give detailed descriptions of the base excision and nucleotide excision repair processes of bacteria and eukaryotes.
12. Describe the mismatch repair processes of bacteria and eukaryotes, paying attention to the ways in which the daughter and parent strands are recognized in the two types of organism.
13. Define the term ‘non-homologous end-joining’ and explain how this process results in the repair of double-strand breaks in DNA molecules.
14. How can DNA damage be bypassed during genome replication in Escherichia coli and eukaryotes?
15. Discuss the links between DNA repair and human disease.
16. Draw a fully annotated diagram of the Holliday model for homologous recombination.
17. In what way does the Meselson-Radding modification improve the Holliday model for homologous recombination?
18. Describe the functions of each of the proteins thought to be involved in homologous recombination in Escherichia coli.
19. Draw a fully annotated diagram of the double-strand break model for recombination in yeast.
20. Describe how site-specific recombination underlies insertion and excision of the λ genome into and out of the Escherichia coli chromosome.
21. Explain how recombination events can result in the replicative or conservative transposition of a DNA sequence.
22. Draw a fully annotated diagram illustrating the transposition mechanism of a retrovirus.
1. Explore the current knowledge concerning trinucleotide repeat expansion diseases, including hypotheses that attempt to explain why triplet expansion in these genes leads to a disease.
2. ‘Not all mutations have an immediate impact: some are delayed onset and only confer an altered phenotype later in the individual's life. Others display non-penetrance in some individuals, never being expressed even though the individual has a dominant mutation or is a homozygous recessive.’ Devise mechanisms to explain how mutations can exhibit delayed onset or non-penetrance.
3. Evaluate the evidence for programmed mutations.
4. The bacterium Deinococcus radiodurans is highly resistant to radiation and to other physical and chemical mutagens. Discuss how these special properties of D. radiodurans are reflected in its genome sequence. (See White O, Eisen JA, Heidelberg JF, et al. Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science, 286, 1571-1577.)
5. Assess the general importance of the double-strand break model for gene conversion in yeast. Is there evidence for this type of recombination in organisms other than yeast?