Mid-term Examination

Presentation Transcript


  1. Mid-term Examination • Mid-term on October 14th, 2011 • One-hour in-class exam, 100 points • Closed book • Essay-type questions plus some short answers • Will cover ALL chapters done until then • Will count towards 30% of the final grade • (No chapter-by-chapter exams/quizzes until midterm) © 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458

  2. Class paper • Each discussion group will consist of four to five students (you will form your own group and let me know by email) • Each of you will see a movie that uses DNA, genes, genomics or genetic engineering as a theme (e.g., Jurassic Park), write a 3-5 page overview of that movie, and submit it to me electronically by November 4th, 2011. • You will discuss the movie that you selected with the group • Each group will select only ONE movie to present to the class, and only ONE of you will present it • One representative per group will present a 10-minute PowerPoint talk sometime between December 5-9, 2011. • Tell the class about your movie selection: its main theme, the plot and how it fits with the topic of the class. • Provide your interpretation of the accuracies and discrepancies of the science depicted in the movie. • If you were the writer/director, how would you improve it to portray the science more accurately (without making it a complete flop)? • Questions? If needed, I could email you a list of possible movies that you could use for this. I have a list of some 30-40 movies.

  3. Chapter 4: Genome Sequencing • Strategies and procedures for sequencing entire genomes

  4. Contents • The Human Genome Project • Sequencing strategies • Large-scale sequencing • Accuracy and coverage • EST sequencing • Sequence annotation

  5. Background • Field of genomics began with decision to sequence human genome • Size of human genome is 3 billion base pairs, which necessitated new ways to do sequencing • Approaches to sequencing the human genome • Scale up existing techniques • Develop new sequencing techniques • Start with smaller genomes as warm-up projects

  6. Goals of the Human Genome Project • Sequence the entire genome • Not just transcribed or disease genes: discussion on regulation of gene expression (see next slide) • Sequencing should be performed with a high level of accuracy • One error in 10,000 bases • Develop genomic resources that would be useful for ALL genes • Example: collections of physical markers • Develop economies of scale: a few big centers

  7. Central Dogma in Molecular Biology

  8. Scale-up of existing technologies • There has been remarkable improvement in sequencing efficiency since the invention of DNA sequencing in 1975 • The amount of sequencing that one person can perform has increased dramatically • 1980: 0.1–1 kb per year • 1985: 2–10 kb per year • 1990: 25–50 kb per year • 1996: 100–200 kb per year • 2000: 500–1,000 kb per year • Almost all large-scale sequencing is still based on Sanger chain-termination technology

  9. New technologies • A high-priority goal at the beginning of the Human Genome Project was to develop new mapping and sequencing technologies • To date, no major breakthrough technology has been developed • Possible exception: whole-genome shotgun sequencing applied to large genomes (Celera) • Pyrosequencing (454)

  10. Automated sequencers • Perhaps the most important contribution to large-scale sequencing was the development of automated sequencers • Most use Sanger sequencing method • Fluorescently labeled reaction products • Capillary electrophoresis for separation • Most commonly used automated sequencers are the following: • ABI • MegaBACE (GE Healthcare)

  11. Automated sequencers: ABI 3700 • Made by Applied Biosystems • Most widely used automated sequencer: • 96 capillaries • Robotic loading from 384-well plates (4 loads of 96) • Two to three hours per run • 600–700 bases per read • 268,800 bp per full plate (700 × 96 × 4) • [Figure labels: robotic arm and syringe; 96 glass capillaries; 96-well plate; load bar]
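
The throughput figure on this slide is simple arithmetic, and the same arithmetic gives a rough feel for the size of a genome project. The sketch below uses the read length, capillary count and plate format from the slide; the number of machines and the number of plates per day are hypothetical values chosen only for illustration.

```python
# Values from the slide.
READ_LENGTH_BP = 700      # upper end of the 600-700 base read length
CAPILLARIES = 96          # reads produced per run
LOADS_PER_PLATE = 4       # one 384-well plate = 4 loads of 96 wells


def bases_per_plate(read_length=READ_LENGTH_BP,
                    capillaries=CAPILLARIES,
                    loads=LOADS_PER_PLATE):
    """Raw bases produced by working through one 384-well plate."""
    return read_length * capillaries * loads


def days_to_sequence(genome_bp, coverage, machines, plates_per_day):
    """Days of raw production sequencing, ignoring failures and finishing.

    `machines` and `plates_per_day` are hypothetical inputs, not slide data.
    """
    total_bases = genome_bp * coverage
    bases_per_day = bases_per_plate() * machines * plates_per_day
    return total_bases / bases_per_day


if __name__ == "__main__":
    print(bases_per_plate())  # 700 * 96 * 4
    # Hypothetical center: 100 machines, 2 plates per machine per day,
    # 3 billion bp genome at 10x coverage.
    print(round(days_to_sequence(3_000_000_000, 10, 100, 2)))
```

Real projects also lose reads to failed reactions and low-quality sequence, so an estimate like this is a lower bound on effort rather than a schedule.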

  12. Automated sequencers: MegaBACE • Made by Amersham • 96 capillaries • Robotic loading from 384–well plate • Two to four hours per run • Can read up to 800 bases

  13. Automatic gel reading • Top image: confocal detection by the MegaBACE sequencer of fluorescently labeled DNA • Bottom image: computer image of sequence read by automated sequencer

  14. Steps in genomic sequencing • Library making • Large-insert library from genome/chromosome • Production sequencing • Generate fragments to be sequenced • Perform sequencing reactions • Determine sequence • Finishing • Assemble into continuous sequence • Fill gaps

  15. FACS: Fluorescence-activated cell sorter

  16. Library making • Library of genomic fragments made in vector • BAC, PAC, or YAC • Usually have several-fold coverage (representation) • Every DNA sequence on five to eight different clones • Difficult and inefficient to sequence straight from large fragment • Need to break into manageable pieces • Random shearing to make 1-2 kb fragments • By nebulization or sonication

  17. Fragments for sequencing • Generally use 2–10 kb pieces for sequencing • Clone into sequencing vector • Contains binding sites for sequencing primers • Can be single stranded or double stranded

  18. Sequence assembly • Random sequences • First assemble into overlapping sequence • Then create one continuous sequence • Program used for this operation named PHRAP • Analyzes each position to determine the following: • Quality of sequence • Consistency of sequence of same region • Acquired from different random fragments
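
The idea of assembling random reads into overlapping and then continuous sequence can be sketched with a toy greedy merge. This is not how PHRAP works internally; it is a minimal illustration that assumes error-free reads, a single strand, and no repeats, and the function names are made up for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the overlap is shorter than `min_len`.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of a from here matches the start of b, we have an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Production assemblers additionally handle both strands, sequencing errors, and base quality, which is where the per-position analysis listed on the slide comes in.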

  19. Sequence assembly readout • Consensus building
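
Consensus building can be pictured as a vote at each position among the reads that cover it, with better-quality base calls counting for more. The sketch below assumes the reads are already placed on the contig (an offset per read) and that each base carries a numeric quality value; this input layout and the simple summed-quality vote are assumptions for illustration, not PHRAP's actual scoring.

```python
from collections import defaultdict


def consensus(aligned_reads):
    """Build a consensus from reads already placed on a contig.

    aligned_reads: list of (offset, bases, qualities) tuples, where
    `offset` is the contig position of the read's first base.
    Positions covered by no read are skipped, so the caller should
    check for gaps separately.
    """
    # votes[position][base] = summed quality of reads calling that base
    votes = defaultdict(lambda: defaultdict(int))
    for offset, bases, quals in aligned_reads:
        for i, (base, q) in enumerate(zip(bases, quals)):
            votes[offset + i][base] += q
    return "".join(
        max(votes[pos].items(), key=lambda kv: kv[1])[0]
        for pos in sorted(votes)
    )


if __name__ == "__main__":
    reads = [
        (0, "ACGT", [30, 30, 30, 30]),
        (2, "GTAA", [30, 30, 30, 30]),
        (1, "CTTA", [10, 10, 10, 10]),  # low-quality read with one miscall
    ]
    print(consensus(reads))
```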

  20. Finishing I • Process of assembling raw sequence reads into accurate contiguous sequence • Required to achieve 1/10,000 accuracy • Manual process • Look at sequence reads at positions where programs can’t tell which base is the correct one • Fill gaps • Ensure adequate coverage • [Figure labels: gap; single stranded]

  21. Finishing II • To fill gaps in sequence, design primers and sequence from primer • To ensure adequate coverage, find regions where there is not sufficient coverage and use specific primers for those areas • [Figure labels: gap; primer; primer]
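
Finding the regions that still need directed sequencing amounts to computing read depth along the contig and reporting stretches below a threshold. The following is a minimal sketch under the assumption that each read is given as a half-open (start, end) interval on the contig; the threshold of 2 is an arbitrary example value.

```python
def low_coverage_regions(contig_length, read_spans, min_depth=2):
    """Return half-open (start, end) regions whose depth is below min_depth.

    read_spans: iterable of (start, end) positions of reads on the contig,
    with `end` exclusive.
    """
    depth = [0] * contig_length
    for start, end in read_spans:
        for pos in range(max(start, 0), min(end, contig_length)):
            depth[pos] += 1

    regions, run_start = [], None
    for pos, d in enumerate(depth):
        if d < min_depth and run_start is None:
            run_start = pos                    # a thin region begins
        elif d >= min_depth and run_start is not None:
            regions.append((run_start, pos))   # the thin region ends
            run_start = None
    if run_start is not None:
        regions.append((run_start, contig_length))
    return regions


if __name__ == "__main__":
    spans = [(0, 10), (5, 15), (8, 20)]
    print(low_coverage_regions(20, spans, min_depth=2))
```

Each reported region is a candidate for a custom primer placed just outside it, as described on the slide.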

  22. Verification • Region verified for the following: • Coverage • Sequence quality • Contiguity • Determine restriction-enzyme cleavage sites • Generate restriction map of sequenced region • Must agree with fingerprint generated of clone during mapping step
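
The restriction-map check can be sketched as an in-silico digest: cut the assembled sequence at every recognition site and compare the predicted fragment sizes with the sizes observed in the clone fingerprint. The example assumes a linear molecule and uses EcoRI (site GAATTC, cut after the first G); the 5% size tolerance is a hypothetical value standing in for gel measurement error.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Fragment sizes from cutting a linear sequence at every `site`.

    cut_offset is the cut position within the site (1 for EcoRI: G^AATTC).
    """
    cuts, start = [], 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(sequence)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


def matches_fingerprint(predicted, observed, tolerance=0.05):
    """True if both size lists agree fragment by fragment within tolerance."""
    if len(predicted) != len(observed):
        return False
    return all(
        abs(p - o) <= tolerance * o
        for p, o in zip(sorted(predicted), sorted(observed))
    )


if __name__ == "__main__":
    seq = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
    sizes = digest(seq)
    print(sizes)
    print(matches_fingerprint(sizes, [14, 5, 7]))
```

A mismatch in fragment count or size suggests a misassembly, such as a collapsed repeat or a missing segment.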

  23. Map-based sequencing I • Human Genome Project adopted a map-based strategy • Start with well-defined physical map • Produce shortest tiling path for large-insert clones • Assemble the sequence for each clone • Then assemble the entire sequence, based on the physical map

  24. Map-based sequencing II • Construct clone map and select mapped clones • Generate several thousand sequence reads per clone • Assemble

  25. Whole-genome shotgun sequencing I • Developed by Celera • Subsidiary of Applied Biosystems, maker of automated sequencers • No mapping • Instead, the whole genome is sheared • Randomly sequenced

  26. Whole-genome shotgun sequencing II • Generate tens of millions of sequence reads • Assemble

  27. Whole-genome shotgun sequencing III • Major challenge: assembly • Repetitive elements are the biggest problem • Performed on very high-speed computers, using novel software • Key to assembly is paired reads from both sides • Sequence both ends of each clone
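
Paired reads help because the two end reads of one clone are a known distance apart. If the two reads land in different contigs, that distance constrains how the contigs are ordered and how large the gap between them is. The sketch below shows only the gap arithmetic for a single pair, assuming both contigs are in the same orientation and the insert size is known exactly; the function name and parameters are illustrative.

```python
def estimate_gap(insert_size, fwd_start_in_a, len_a, rev_end_in_b):
    """Estimate the gap between contig A and contig B from one read pair.

    insert_size:     length of the cloned fragment (known approximately)
    fwd_start_in_a:  position in contig A where the forward read starts
    len_a:           length of contig A
    rev_end_in_b:    distance from the start of contig B to the far end
                     of the reverse read
    """
    part_in_a = len_a - fwd_start_in_a   # insert bases lying inside contig A
    return insert_size - part_in_a - rev_end_in_b


if __name__ == "__main__":
    # A 2 kb clone: 500 bp falls at the end of contig A, 300 bp at the
    # start of contig B, so the remainder spans the gap.
    print(estimate_gap(2000, 9500, 10000, 300))
```

In practice insert sizes vary, so many pairs are combined and a negative estimate is read as a sign that the contigs overlap or that the pair spans a misassembled repeat.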

  28. Controversy: Map-based sequencing vs. whole-genome shotgun sequencing • Celera used publicly funded sequence to produce its published draft of the human genome • Scientists who worked on the map-based effort claimed Celera couldn’t have produced a draft without access to the public sequence • Celera scientists claimed that they could have produced an accurate draft even without the public sequence (example: the Drosophila genome)

  29. Hybrid approach • Combines aspects of both map-based and whole-genome shotgun approaches • Map clones • Sequence some of the mapped clones • Do whole-genome sequencing • Combine information from both methods • Use sequence from mapped clones as scaffold to assemble whole-genome shotgun reads • Used for sequencing the mouse genome

  31. Completed genomes as of 2002

  32. New genomes finished (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj) • Over 400 genomes are sequenced (Science 313: 1897, 2006) • About 1600 genomes are being sequenced • Poplar (Black cottonwood) five years back • Rat • Chicken • Dog • Chimpanzee • (http://www.nslij-genetics.org/seq/)

  34. Completed genomes as of today (http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes) • Plants: 17 • Arabidopsis, red algae, green algae, rice, poplar, grape, papaya, bryophyte, cucumber, corn, apple and soybean (brachypodium, A. lyrata, potato, date palm) • Animals: >50 • Mosquito, honeybee, silkworm, dog, nematode, tunicate, fruit fly, chicken, human, opossum, mouse, chimpanzee, rat, sea urchin, puffer fish • Hundreds of bacteria, fungi, etc.

  35. Sizes of genomes and numbers of genes

  36. Sequencing parameters • Difficulty and cost of large-scale sequencing projects depend on the following parameters: • Accuracy • How many errors are tolerated • Coverage • How many times the same region is sequenced • The two parameters are related • More coverage usually means higher accuracy • Accuracy is also dependent on the finishing effort

  37. Sequence accuracy • Highly accurate sequences are needed for the following: • Diagnostics • e.g., forensics or identifying disease alleles in a patient • Protein-coding prediction • One insertion or deletion changes the reading frame • Lower accuracy sufficient for homology searches • Differences in sequence are tolerated by search programs
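
The point about reading frames is easy to see by translating a short coding sequence with and without a single-base error. The sketch below builds the standard genetic code table and translates from the first base until a stop codon; the example sequences are invented for illustration.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    correct = "ATGGCCAAAGGTTAA"
    substitution = "ATGGCTAAAGGTTAA"   # one miscalled base (C -> T)
    deletion = "ATGCCAAAGGTTAA"        # one base missing after ATG
    for name, seq in [("correct", correct),
                      ("substitution", substitution),
                      ("deletion", deletion)]:
        print(name, translate(seq))
```

Here the substitution happens to be silent, while the single deletion changes every downstream amino acid and removes the stop codon from the frame, which is why gene prediction needs higher accuracy than a homology search.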

  38. Sequence accuracy and sequencing cost • Level of accuracy determines cost of project • Increasing accuracy from one error in 100 to one error in 10,000 increases costs three- to fivefold • Need to determine appropriate level of accuracy for each project • If reference sequence already exists, then a lower level of accuracy should suffice • Can find genes in genome, but not their exact position

  39. Sequencing coverage • Coverage is the number of times the same region is sequenced • Ideally, one wants an equal number of sequences in each direction • To obtain accuracy of one error in 10,000 bases, one needs the following: • 10x coverage • Stringent finishing • Complete sequence • Base-perfect sequencing = 1 error in 10K bp
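
Error rates such as "one error in 10,000 bases" are commonly expressed on the Phred quality scale, Q = -10 log10(p), where p is the probability that a base is wrong. The slides do not use this scale, so it is introduced here only as a convenient way to compare the accuracy levels they mention.

```python
import math


def phred_quality(error_probability):
    """Phred quality score Q = -10 * log10(p)."""
    return -10 * math.log10(error_probability)


def error_probability(quality):
    """Inverse of phred_quality: p = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    for errors_per_base in (1 / 100, 1 / 10_000):
        print(errors_per_base, round(phred_quality(errors_per_base)))
```

On this scale, rough-draft accuracy of one error in 100 corresponds to Q20 and the finished-sequence target of one error in 10,000 corresponds to Q40.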

  40. Rough-draft and skimming sequence • Rough-draft sequence refers to an average of 5x coverage • Skimming is 1–3x coverage • Obtains 67%–97% of the sequence • On average, 99% accurate • Of greatest use when the sequence can be compared to a reference sequence • For example, chimpanzee genome compared with human genome
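
The relationship between coverage and the fraction of the genome obtained can be approximated with the idealized Lander-Waterman (Poisson) model, in which a base is missed with probability e^(-c) at coverage c. This model is an assumption brought in for illustration; it treats reads as uniformly random and ignores cloning bias, so its numbers are close to, but not the same as, the percentages on the slide.

```python
import math


def fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1 - math.exp(-coverage)


def reads_needed(genome_bp, read_length_bp, coverage):
    """Number of reads required to reach a given average coverage."""
    return math.ceil(coverage * genome_bp / read_length_bp)


if __name__ == "__main__":
    for c in (1, 3, 5, 10):
        print(f"{c}x coverage: {fraction_covered(c):.4f} of the sequence")
    # Reads of 700 bases for a 3 billion bp genome at 5x coverage.
    print(reads_needed(3_000_000_000, 700, 5))
```

Under this model, 1x and 3x coverage give roughly 63% and 95% of the sequence, and going from 5x to 10x closes most of the remaining holes, which is consistent with the slide's distinction between skimming, rough draft, and finished sequence.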

  41. Industrialization of sequencing • Most large-scale sequencing projects divide tasks among different teams • Large-insert libraries • Production sequencing • Finishing • Sequencing machines run 24/7 • Many tasks performed by robots

  42. Book of Tree Life • Tuskan et al., Science 313:1596-1604, 2006 • Chapter 3: Poplar, >45,000 genes • Genome size: 480 Mbp in 19 linkage groups (LGs) • Genome duplications: 8,000 gene pairs • Overrepresentation of cell wall genes

  43. EST sequencing I • Idea: sequence only “important” genes • Those genes expressed in a particular tissue • Sequence random cDNAs made from RNA extracted from tissue of interest • [Figure labels: muscle; mRNA; cDNA libraries; “New” Biolims (BIOlogical Laboratory Information Management System); robotized stations; DNA sequencers]

  44. EST sequencing II • Make cDNA library • Select clones at random • Sequence in from one or both ends • One-pass sequencing • The resulting sequence = expressed sequence tag (EST) • [Figure labels: cDNA, 3’ and 5’ ends; partial sequence = EST]

  45. EST sequencing: pros and cons • Advantages • Relatively inexpensive • Certainty that sequence comes from transcribed gene • Information about tissue and developmental stage • Disadvantages • No regulatory (promoter) information • Usually less than 60% of genes found in EST collections (unknown functions) • Location of sequence in genome unknown • [Slide annotations: snapshot of transcription activity; mixture of cells]

  46. Sequence annotation • Annotation performed on completed sequence • Computer programs used to find the following: • Genes • Exons and introns • Regulatory sequences • Repetitive elements

  47. Interesting facts about the Human Genome Project (http://www.genome.gov/11006943) • Whose DNA was sequenced for the Human Genome Project? This is intentionally not known to protect the volunteers who provided DNA samples for sequencing. The sequence is derived from the DNA of several volunteers. To ensure that the identities of the volunteers cannot be revealed, a careful process was developed to recruit the volunteers and to collect and maintain the blood samples that were the source of the DNA. The volunteers responded to local public advertisements near the laboratories where the DNA "libraries" were prepared. Candidates were recruited from a diverse population. The volunteers provided blood samples after being extensively counseled and then giving their informed consent. About 5 to 10 times as many volunteers donated blood as were eventually used, so that not even the volunteers would know whether their sample was used. All labels were removed before the actual samples were chosen.

  48. Is the human genome completely sequenced? Yes - within the limits of today's technology, the human genome is as complete as it can be. Small gaps that are unrecoverable in any current sequencing method remain, amounting for about 1 percent of the gene-containing portion of the genome, or euchromatin. New technologies will have to be invented to obtain the sequence of these regions. • However, the gene-containing portion of the genome is complete in nearly every functional way for the purposes of scientific research and is freely and publicly available. Even though the Human Genome Project is now completed, scientists will continue to develop and apply new technologies to the few remaining refractory problems. For its part, NHGRI will continue to support a wide range of research to develop new sequencing technologies, to interpret the human sequence and to use the newfound understanding of the human genome to improve human health.

  49. How much did the Human Genome Project cost U.S. taxpayers? • In 1990, Congress established funding for the Human Genome Project and set a target completion date of 2005. Although estimates suggested that the project would cost a total of $3 billion over this period, the project ended up costing less than expected, about $2.7 billion in FY 1991 dollars. Additionally, the project is being completed more than two years ahead of schedule. • It is also important to consider that the Human Genome Project will likely pay for itself many times over on an economic basis - if one considers that genome-based research will play an important role in seeding biotechnology and drug development industries, not to mention improvements in human health.

  50. What will the next 50 years of medical science look like? • Having the essentially complete sequence of the human genome is similar to having all the pages of a manual needed to make the human body. The challenge to researchers and scientists now is to determine how to read the contents of all these pages and then understand how the parts work together and to discover the genetic basis for health and the pathology of human disease. In this respect, genome-based research will eventually enable medical science to develop highly effective diagnostic tools, to better understand the health needs of people based on their individual genetic make-ups, and to design new and highly effective treatments for disease. • Individualized analysis based on each person's genome will lead to a very powerful form of preventive medicine. We'll be able to learn about risks of future illness based on DNA analysis. Physicians, nurses, genetic counselors and other health-care professionals will be able to work with individuals to focus efforts on the things that are most likely to maintain health for a particular individual. That might mean diet or lifestyle changes, or it might mean medical surveillance. But there will be a personalized aspect to what we do to keep ourselves healthy. Then, through our understanding at the molecular level of how things like diabetes or heart disease or schizophrenia come about, we should see a whole new generation of interventions, many of which will be drugs that are much more effective and precise than those available today.
