6. Sequencing Genomes

Learning outcomes

When you have read Chapter 6, you should be able to:

- Distinguish between the two methods used to sequence DNA
- Give a detailed description of chain termination sequencing and an outline description of the chemical degradation method
- Describe the key features of automated DNA sequencing and evaluate the importance of automated sequencing in genomics research
- State the strengths and limitations of the shotgun, whole-genome shotgun and clone contig methods of genome sequencing
- Describe how a small bacterial genome can be sequenced by the shotgun method, using the Haemophilus influenzae project as an example
- Outline the various ways in which a clone contig can be built up
- Explain the basis of the whole-genome shotgun approach to genome sequencing, with emphasis on the steps taken to ensure that the resulting sequence is accurate
- Give an account of the development of the Human Genome Project up to the publication of the draft sequence in February 2001
- Debate the ethical, legal and social issues raised by the human genome projects

6.1. The Methodology for DNA Sequencing
6.2. Assembly of a Contiguous DNA Sequence
6.3. The Human Genome Projects

6.1. The Methodology for DNA Sequencing

Figure 6.1. Polyacrylamide gel electrophoresis can resolve single-stranded DNA molecules that differ in length by just one nucleotide. The banding pattern is produced after separation of single-stranded DNA molecules by denaturing polyacrylamide gel electrophoresis. The molecules are labeled with a radioactive marker and the bands visualized by autoradiography. The bands gradually get closer together towards the top of the ladder. In practice, molecules up to about 1500 nucleotides in length can be separated if the electrophoresis is continued for long enough.

Figure 6.2. Chain termination DNA sequencing. (A) Chain termination sequencing involves the synthesis of new strands of DNA that are complementary to a single-stranded template. (B) Strand synthesis does not proceed indefinitely because the reaction mixture contains small amounts of a dideoxynucleotide, which blocks further elongation because it has a hydrogen atom rather than a hydroxyl group attached to its 3′-carbon. (C) Strand synthesis in the presence of ddATP results in chains that are terminated opposite Ts in the template. This ‘A’ family of terminated chains is loaded into one lane of a polyacrylamide gel, alongside the families of terminated chains from the T, G and C reactions. (D) In the methodology shown here, the banding pattern is visualized by autoradiography, the terminated chains having become radioactively labeled by inclusion of a labeled dNTP in the strand synthesis reactions. The sequence, shown on the right, is read by noting which lane each band lies in, starting at the bottom of the autoradiograph and moving band by band towards the top.
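The logic of reading a chain termination ladder can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real software: the primer is ignored, the template is written 3′→5′ so that the new strand grows from left to right, and the function names `terminated_lengths` and `read_ladder` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_lengths(template, dd_base):
    """Lengths of the chains that end with the given dideoxynucleotide.

    The template is written 3'->5', so position i of the new strand
    pairs with template[i]. A chain of length i + 1 terminates wherever
    the new strand carries dd_base (for ddATP: opposite a T in the template).
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]

def read_ladder(template):
    """Read the four lanes from the bottom (shortest chain) to the top."""
    bands = {}
    for dd_base in "ACGT":  # one lane per dideoxynucleotide reaction
        for length in terminated_lengths(template, dd_base):
            bands[length] = dd_base
    return "".join(bands[length] for length in sorted(bands))

template = "TACGGT"  # hypothetical template, 3'->5'
print(terminated_lengths(template, "A"))  # bands in the 'A' lane
print(read_ladder(template))              # sequence of the new strand, 5'->3'
```

Each chain length occurs in exactly one lane, which is why sorting the bands by length and noting the lane recovers the sequence of the newly synthesized strand.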

Figure 6.3. Obtaining single-stranded DNA by cloning in a bacteriophage M13 vector. M13 vectors can be obtained in two forms: the double-stranded replicative molecule and the single-stranded version found in bacteriophage particles. The replicative form can be manipulated in the same way as a plasmid cloning vector (Section 4.2.1) with new DNA inserted by restriction followed by ligation. The recombinant vector is introduced into Escherichia coli cells by transfection. Once inside an E. coli cell, the double-stranded vector replicates and directs synthesis of single-stranded copies, which are packaged into phage particles and secreted from the cell. The phage particles can be collected from the culture medium after centrifuging to pellet the bacteria. The protein coats of the phages are removed by treating with phenol, and the single-stranded version of the recombinant vector is purified for use in DNA sequencing.

Figure 6.4. One way of using PCR to prepare template DNA for chain termination sequencing. The PCR is carried out with one normal primer (shown in red), and one primer that is labeled with a metallic bead (shown in brown). After PCR, the labeled strands are purified with a magnetic device. For more details about PCR, see Section 4.3.

Figure 6.5. Different types of primer for chain termination sequencing. (A) A universal primer anneals to the vector DNA, adjacent to the position at which new DNA is inserted. A single universal primer can therefore be used to sequence any DNA insert, but only provides the sequence of one end of the insert. (B) One way of obtaining a longer sequence is to carry out a series of chain termination experiments, each with a different internal primer that anneals within the DNA insert.

Figure 6.6. Thermal cycle sequencing. PCR is carried out with just one primer and with a dideoxynucleotide present in the reaction mixture. The result is a family of chain-terminated strands (the ‘A’ family in the reaction shown). These strands, along with the products of the C, G and T reactions, are electrophoresed as in the standard methodology (see Figure 6.2D).

Figure 6.7. Automated DNA sequencing with fluorescently labeled dideoxynucleotides. (A) The chain termination reactions are carried out in a single tube, with each dideoxynucleotide labeled with a different fluorophore. In the automated sequencer, the bands in the electrophoresis gel move past a fluorescence detector, which identifies which dideoxynucleotide is present in each band. The information is passed to the imaging system. (B) The printout from an automated sequencer. The sequence is represented by a series of peaks, one for each nucleotide position. In this example, a green peak is an ‘A’, blue is ‘C’, black is ‘G’, and red is ‘T’.

Figure 6.8. Pyrosequencing. The strand synthesis reaction is carried out in the absence of dideoxynucleotides. Each dNTP is added individually, along with a nucleotidase enzyme that degrades the dNTP if it is not incorporated into the strand being synthesized. Incorporation of a nucleotide is detected by a flash of chemiluminescence induced by the pyrophosphate released from the dNTP. The order in which nucleotides are added to the growing strand can therefore be followed.
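The cycle of dNTP additions in pyrosequencing can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the template is written 3′→5′, dNTPs are dispensed in the fixed order A, C, G, T, and a run of identical nucleotides is recorded as a single flash with a proportionally larger signal. The function name `pyrosequence` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pyrosequence(template, order="ACGT"):
    """Simulate dNTP additions against a template written 3'->5'.

    Returns the deduced sequence of the new strand and the list of
    flashes as (dNTP, number of nucleotides incorporated).
    """
    position = 0
    flashes = []
    while position < len(template):
        for dntp in order:
            incorporated = 0
            # The dNTP is incorporated for as long as it pairs with the template.
            while (position < len(template)
                   and COMPLEMENT[template[position]] == dntp):
                incorporated += 1
                position += 1
            if incorporated:
                flashes.append((dntp, incorporated))
            # Otherwise the nucleotidase degrades the dNTP: no flash is recorded.
    read = "".join(dntp * count for dntp, count in flashes)
    return read, flashes

read, flashes = pyrosequence("TACGGT")  # hypothetical template
print(read)
print(flashes)
```

Additions that produce no flash contribute nothing to the read, so the order of the flashes alone gives the order of nucleotides in the growing strand.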

Figure 6.9. A possible way of using chip technology in DNA sequencing. The chip carries an array of every possible 8-mer oligonucleotide. The DNA to be sequenced is labeled with a fluorescent marker and applied to the chip, and the positions of hybridizing oligonucleotides determined by confocal microscopy. Each hybridizing oligonucleotide represents an 8-nucleotide sequence motif that is present in the probe DNA. The sequence of the probe DNA can therefore be deduced from the overlaps between the sequences of these hybridizing oligonucleotides. See Technical Note 5.1 for more information on DNA chips.
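How a sequence might be deduced from overlapping oligonucleotides can be sketched as follows. The example uses 4-mers rather than 8-mers to keep it short, and it assumes an idealized case in which every oligonucleotide hybridizes perfectly and no motif occurs more than once in the probe DNA. The function names `spectrum` and `reconstruct` are hypothetical.

```python
def spectrum(sequence, k):
    """The set of k-mers that would hybridize to the probe DNA."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def reconstruct(kmers):
    """Rebuild the sequence by chaining k-mers that overlap by k - 1 bases."""
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the only one whose prefix is not the suffix of another.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique starting oligonucleotide")
    by_prefix = {}
    for kmer in kmers:
        if kmer[:-1] in by_prefix:
            raise ValueError("ambiguous overlap: a repeated motif is present")
        by_prefix[kmer[:-1]] = kmer
    sequence = starts[0]
    for _ in range(len(kmers) - 1):
        following = by_prefix.get(sequence[-(k - 1):])
        if following is None:
            raise ValueError("chain of overlaps is broken")
        sequence += following[-1]
    return sequence

hybridizing = spectrum("ATGCCAGT", 4)  # hypothetical probe DNA
print(sorted(hybridizing))
print(reconstruct(hybridizing))
```

The errors raised for ambiguous overlaps point to the practical weakness of the approach: repeated motifs in the probe DNA prevent a unique reconstruction.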

6.2. Assembly of a Contiguous DNA Sequence

Figure 6.10. The way in which the shotgun approach was used to obtain the DNA sequence of the Haemophilus influenzae genome. H. influenzae DNA was sonicated and fragments with sizes between 1.6 and 2.0 kb purified from an agarose gel and ligated into a plasmid vector to produce a clone library. End sequences were obtained from clones taken from this library, and a computer used to identify overlaps between sequences. This resulted in 140 sequence contigs, which were assembled into the complete genome sequence, as shown in Figure 6.11. For further details, see Fleischmann et al. (1995).
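The computer step, identifying overlaps between end sequences and merging them into contigs, can be illustrated with a simple greedy procedure. This is a sketch only, and not the software used in the H. influenzae project: it assumes error-free reads that all come from the same strand, and the function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no overlaps left: each entry is a contig
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

reads = ["ATGCCAG", "CCAGTTAC", "TTACGGA"]  # hypothetical end sequences
print(greedy_assemble(reads))
print(greedy_assemble(["AAAACCCC", "GGGGTTTT"]))  # no overlap: two contigs
```

When no overlap of at least `min_len` nucleotides remains, the sequences that are left correspond to separate contigs, with gaps between them that have to be closed by other means.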

Figure 6.11. Assembly of the complete Haemophilus influenzae genome sequence by spanning the gaps between individual sequence contigs. (A) ‘Sequence gaps’ are gaps that can be closed by further sequencing of clones already present in the library. In this example, the end-sequences of contigs 1 and 2 lie within the same plasmid clone, so further sequencing of this DNA insert with internal primers (see Figure 6.5B) will provide the sequence to close the gap. (B) ‘Physical gaps’ are stretches of sequence that are not present in the clone library, probably because these regions are unstable in the cloning vector that was used. Two strategies for closing these gaps are shown. On the left, a second clone library, prepared with a bacteriophage λ vector rather than a plasmid vector, is probed with oligonucleotides corresponding to the ends of the contigs. Oligonucleotides 1 and 7 both hybridize to the same clone, whose insert must therefore contain DNA spanning the gap between contigs 1 and 4. On the right, PCRs are carried out with pairs of oligonucleotides. Only numbers 1 and 7 give a PCR product, confirming that the contig ends represented by these two oligonucleotides are close together in the genome. The PCR product or the insert from the λ clone could be sequenced to close the gap between contigs 1 and 4.

Figure 6.12. Chromosome walking. The library comprises 96 clones, each containing a different insert. To begin the walk, the insert from one of the clones is used as a hybridization probe against all the other clones in the library. In the example shown, clone A1 is the probe; it hybridizes to itself and to clones E7 and F6. The inserts from the last two clones must therefore overlap with the insert from clone A1. To continue the walk, the probing is repeated but this time with the insert from clone F6. The hybridizing clones are A1, F6 and B12, showing that the insert from B12 overlaps with the insert from F6.
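The walk can be modeled by treating each insert as an interval on the chromosome and each hybridization as a test for overlap. The coordinates below are invented for illustration (the real positions are unknown until the walk is done), and the names `LIBRARY`, `hybridizing_clones` and `walk` are hypothetical. Only five of the 96 clones are included.

```python
# Hypothetical insert positions in kb; clone C3 lies elsewhere in the genome.
LIBRARY = {
    "E7": (0, 40),
    "A1": (30, 70),
    "F6": (60, 100),
    "B12": (90, 130),
    "C3": (200, 240),
}

def hybridizing_clones(probe, library):
    """Clones whose inserts overlap the probe insert (the probe included)."""
    probe_start, probe_end = library[probe]
    return sorted(name for name, (start, end) in library.items()
                  if start < probe_end and probe_start < end)

def walk(start, library):
    """Probe with each newly found clone in turn and collect the contig."""
    contig = [start]
    to_probe = [start]
    while to_probe:
        probe = to_probe.pop(0)
        for name in hybridizing_clones(probe, library):
            if name not in contig:
                contig.append(name)
                to_probe.append(name)
    return contig

print(hybridizing_clones("A1", LIBRARY))  # A1 itself, E7 and F6
print(hybridizing_clones("F6", LIBRARY))  # A1, B12 and F6
print(walk("A1", LIBRARY))                # clones reached from A1
```

Clone C3 is never reached because its insert does not overlap any clone in the growing contig, which is how the end of a contig shows up during a walk.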

Figure 6.13. Chromosome walking by PCR. The two oligonucleotides anneal within the end region of insert number 1. They are used in PCRs with all the other clones in the library. Only clone 15 gives a PCR product, showing that the inserts in clones 1 and 15 overlap. The walk would be continued by sequencing the fragment from the other end of clone 15, designing a second pair of oligonucleotides, and using these in a new set of PCRs with all the other clones.

Figure 6.14. Combinatorial screening of clones in microtiter trays. In this example, a library of 960 clones has to be screened by PCR. Rather than carrying out 960 individual PCRs, the clones are grouped as shown and just 296 PCRs are performed. In most cases, the results enable positive clones to be identified unambiguously. In fact, if there are few positive clones, then sometimes they can be identified by just the ‘row’ and ‘column’ PCRs. For example, if positive PCRs are obtained with tray 2 row A, tray 6 row D, tray 2 column 7, and tray 6 column 9, then it can be concluded that there are two positive clones, one in tray 2 well A7 and one in tray 6 well D9. The ‘well’ PCRs are needed if there are two or more positive clones in the same tray.
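The arithmetic behind the figure of 296 PCRs, and the way positive clones are decoded, can be sketched as follows. The layout is an assumption that the caption does not state: ten trays of 96 wells, each with 8 rows (A to H) and 12 columns, with one pooled PCR per row of each tray, one per column of each tray, and one per well position across all trays. These numbers are consistent with 960 clones and 296 PCRs. The function names `pcr_count` and `candidates` are hypothetical.

```python
def pcr_count(trays=10, rows=8, cols=12):
    """Number of pooled PCRs under the assumed layout."""
    row_pcrs = trays * rows    # one pool per row of each tray
    column_pcrs = trays * cols # one pool per column of each tray
    well_pcrs = rows * cols    # one pool per well position, across all trays
    return row_pcrs + column_pcrs + well_pcrs

def candidates(positive_rows, positive_cols):
    """Candidate wells per tray from the row and column PCRs alone.

    positive_rows holds (tray, row letter) pairs and positive_cols holds
    (tray, column number) pairs.
    """
    result = {}
    for tray in {t for t, _ in positive_rows}:
        rows = sorted(r for t, r in positive_rows if t == tray)
        cols = sorted(c for t, c in positive_cols if t == tray)
        result[tray] = [f"{r}{c}" for r in rows for c in cols]
    return result

print(pcr_count())
# The example from the caption: one positive clone in each of two trays.
print(candidates({(2, "A"), (6, "D")}, {(2, 7), (6, 9)}))
# Two positive clones in one tray give four candidate wells.
print(candidates({(2, "A"), (2, "C")}, {(2, 7), (2, 9)}))
```

In the last case the row and column results cannot distinguish wells A7 and C9 from wells A9 and C7, which is why the ‘well’ PCRs are needed when a tray contains two or more positive clones.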

Figure 6.15. Four clone fingerprinting techniques.

Figure 6.16. Avoiding errors when the whole-genome shotgun approach is used. In Figure 5.2B, we saw how easy it would be to ‘jump’ between repeat sequences when assembling the master sequence by the standard shotgun approach. The result of such an error would be to lose all the sequence between the two repeats that had mistakenly been linked together. This type of error is avoided in the whole-genome shotgun approach by ensuring that the two end-sequences of a cloned DNA fragment (of 10 kb or so) both appear on the master sequence at their expected positions in the unique DNA sequences to either side of a genome-wide repeat. If one of the end-sequences is missing, then an error has been made when assembling the master sequence.
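The end-sequence check can be illustrated with a toy example in which an assembly has jumped between two copies of a repeat. This is a sketch under simplifying assumptions: the sequences are a few nucleotides long rather than kilobases, both end-sequences are given on the same strand, and each occurs only once in the genome. The function name `suspicious_pairs` and all sequences are hypothetical.

```python
def suspicious_pairs(assembly, end_pairs, insert_size, tolerance):
    """Flag cloned fragments whose end-sequences do not fit the assembly."""
    flagged = []
    for left, right in end_pairs:
        i = assembly.find(left)
        j = assembly.find(right)
        if i == -1 or j == -1:
            flagged.append((left, right, "end-sequence missing"))
            continue
        span = j + len(right) - i  # distance covered by the fragment
        if abs(span - insert_size) > tolerance:
            flagged.append((left, right, "unexpected spacing"))
    return flagged

unique1, repeat, unique2, unique3 = "ACGTTGCA", "GGGGCCCC", "TATATCGA", "CATGCATG"
correct = unique1 + repeat + unique2 + repeat + unique3
collapsed = unique1 + repeat + unique3  # assembly jumped between the repeats

# One cloned fragment with its ends in the unique DNA either side of a repeat.
end_pairs = [("GTTGCA", "TATCGA")]

print(suspicious_pairs(correct, end_pairs, insert_size=22, tolerance=2))
print(suspicious_pairs(collapsed, end_pairs, insert_size=22, tolerance=2))
```

In the collapsed assembly the second end-sequence cannot be found, which signals that the sequence between the two repeats has been lost.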

Figure 6.17. Scaffolds are intermediates in sequence assembly by the whole-genome shotgun approach. Two scaffolds are shown. Each comprises a series of sequence contigs separated by sequence gaps, with the scaffolds themselves separated by physical gaps.

Figure 6.18. The random nature of sequence generation by the whole-genome shotgun approach means that some parts of the genome are covered by more mini-sequences than other parts.
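The uneven coverage can be seen in a small simulation. This is a minimal sketch with invented numbers: the genome length, the length and number of mini-sequences, and the random seed are all assumptions chosen for illustration, and the function name `coverage_depth` is hypothetical.

```python
import random

def coverage_depth(genome_len, read_len, n_reads, seed=0):
    """Count how many randomly placed mini-sequences cover each position."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    depth = [0] * genome_len
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        for position in range(start, start + read_len):
            depth[position] += 1
    return depth

depth = coverage_depth(genome_len=1000, read_len=50, n_reads=100)
print("average coverage:", sum(depth) / len(depth))
print("lowest coverage:", min(depth))
print("highest coverage:", max(depth))
print("positions not covered:", depth.count(0))
```

Although the average coverage here is fivefold, the lowest and highest values differ, and any positions with zero coverage would appear as gaps in the assembled sequence.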

6.3. The Human Genome Projects

Figure 6.19. Some YAC clones contain segments of DNA from different parts of the human genome.
