


BINF 101 Introduction to Bioinformatics

Arthur W. Chou

Dept of Math and Computer Science

Clark University

February 11, 2008


Human Genome Project

  • Historical context

  • Goals of the HGP

  • Strategy

  • Results

  • Impact on biomedical research



February 2001

« Finished » sequence

April 2003


Brief history of HGP

  • 1984 to 1986 - first proposed at US DOE meetings

  • 1988 - endorsed by US National Research Council

    (Funded by NIH and US DOE; $3 billion set aside)

  • 1990 - Human Genome Project started (NHGRI)

    Later – UK, France, Japan, Germany, China

  • 1998 - Celera announces a 3-year plan to complete the project years early

  • First draft published in Science and Nature in February, 2001

  • Finished Human Genome sequence published in Nature 2003.


Goals of HGP

  • Create a genetic and physical map of the 24 human chromosomes (22 autosomes, X & Y)

  • Identify the entire set of genes & map them all to their chromosomes

  • Determine the nucleotide sequence of the estimated 3 billion base pairs

  • Analyze genetic variation among humans

  • Map and sequence the genomes of model organisms


Model organisms

  • Bacteria (E. coli, Haemophilus influenzae, several others)

  • Yeast (Saccharomyces cerevisiae)

  • Plant (Arabidopsis thaliana)

  • Round worm (Caenorhabditis elegans)

  • Fruit fly (Drosophila melanogaster)

  • Mouse (Mus musculus)


Goals of HGP (II)

  • Develop new laboratory and computing technologies to make all this possible

  • Disseminate genome information

  • Consider ethical, legal, and social issues associated with this research


Two Competing Strategies for Human Genome

  • Hierarchical shotgun [Public human genome project]

    Map First, Sequence Later

    • Create a set of mapped large-insert clones to use as sequencing substrate

  • Whole-genome Shotgun [Celera project]

    Sequence First, Map Last

    • Create a genomic library (or libraries), sequence the clone ends at random, and use a computational approach to assemble the random fragments into contiguous stretches of sequence



Map First Sequence Later

  • Sort chromosomes

  • For each chromosome clone large fragments of DNA

  • Map clones

  • Identify set of clones that span the chromosome

  • Shotgun sequence each clone

  • Finish (close gaps)
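
The step "identify set of clones that span the chromosome" can be pictured as an interval-cover problem. The sketch below is only an illustration, assuming each mapped clone is reduced to a hypothetical (name, start, end) coordinate triple; the function name `tiling_path` and the coordinates are made up for the example, and real physical maps were built from fingerprint and STS data rather than exact positions.

```python
def tiling_path(clones, region_start, region_end):
    """Greedy interval cover: pick few clones that together span a region.

    clones: list of (name, start, end) tuples with hypothetical map
    coordinates (end is exclusive).
    Returns the chosen clone names in order, or None if a gap in the
    map makes full coverage impossible.
    """
    remaining = sorted(clones, key=lambda c: c[1])  # order by start position
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones that start at or before the covered point,
        # keep the one that reaches farthest to the right.
        while i < len(remaining) and remaining[i][1] <= covered_to:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered_to:
            return None  # no clone extends the covered stretch: a gap
        path.append(best[0])
        covered_to = best[2]
    return path


clones = [("A", 0, 150), ("B", 100, 220), ("C", 120, 300),
          ("D", 280, 400), ("E", 290, 380)]
print(tiling_path(clones, 0, 400))
```

Clones B and E are skipped because C and D reach farther from the same covered point, which is the sense in which the selected set is a minimal tiling path.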


Assembling Genome Sequencing Data

STS: Sequence-Tagged Sites


Sequenced-clone contigs are merged to form scaffolds of known order and orientation


Sequence First Map Last (Whole Genome Shotgun)

  • Isolate genomic DNA

  • Construct clone libraries of varying sizes

    • Make sure library is random or nearly so

  • Sequence both ends of each clone

  • Assemble the sequences computationally

  • Finish (close gaps)


Whole-genome shotgun sequencing

  • Whole genome randomly sheared three times

    • Plasmid library constructed with ~2 kb inserts

    • Plasmid library with ~10 kb inserts

    • BAC library with ~200 kb inserts

  • Computer program assembles sequences into chromosomes

  • No physical map construction


Genome Sequencing Strategies: WGS

  • Restrict and make small- and large-insert clone libraries

  • End-sequence all clones and retain pairing information (“mate-pairs”)

  • Find sequence overlaps among all clone end sequences

  • Collapse overlaps into contigs (WGS contigs)
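
The "find overlaps, collapse into contigs" steps can be illustrated with a toy greedy assembler. This is a minimal sketch under strong assumptions: reads are error-free, all come from the same strand, and there are no repeats. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, and this is not the method used by any production assembler, which must also handle sequencing errors, reverse complements, and mate-pair constraints.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared stretch only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))
```

Reads that share no overlap of at least `min_len` bases stay as separate contigs, which is how gaps show up in the output.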



Genome Sequencing Strategies

Constructing Supercontigs (scaffolds)


Working Draft Sequence

(Figure label: gaps)


The Genome is Who We Are on the Inside!

Information coded in DNA

  • Chromosomes consist of DNA

    • molecular strings of A, C, G, & T

    • base pairs, A-T, C-G

  • Genes

    • DNA sequences that encode proteins

    • less than 3% of human genome
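
Because A pairs with T and C pairs with G, one strand of DNA determines the other. A minimal sketch of that rule in Python (the function name `reverse_complement` is chosen for this example only):

```python
# Translation table for the base-pairing rule: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

The reversal is needed because the two strands run in opposite directions, so the partner strand is conventionally written back to front.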



5000 bases per page

CACACTTGCATGTGAGAGCTTCTAATATCTAAATTAATGTTGAATCATTATTCAGAAACAGAGAGCTAACTGTTATCCCATCCTGACTTTATTCTTTATG AGAAAAATACAGTGATTCC

AAGTTACCAAGTTAGTGCTGCTTGCTTTATAAATGAAGTAATATTTTAAAAGTTGTGCATAAGTTAAAATTCAGAAATAAAACTTCATCCTAAAACTCTGTGTGTTGCTTTAAATAATC

AGAGCATCTGC TACTTAATTTTTTGTGTGTGGGTGCACAATAGATGTTTAATGAGATCCTGTCATCTGTCTGCTTTTTTATTGTAAAACAGGAGGGGTTTTAATACTGGAGGAACAA

CTGATGTACCTCTGAAAAGAGA AGAGATTAGTTATTAATTGAATTGAGGGTTGTCTTGTCTTAGTAGCTTTTATTCTCTAGGTACTATTTGATTATGATTGTGAAAATAGAATTTATCC

CTCATTAAATGTAAAATCAACAGGAGAATAGCAAAAACTTATGAGATAGATGAACGTTGTGTGAGTGGCATGGTTTAATTTGTTTGGAAGAAGCACTTGCCCCAGAAGATACACAAT

GAAATTCATGTTATTGAGTAGAGTAGTAATACAGTGTGTTCCCTTGTGAAGTTCATAACCAAGAATTTTAGTAGTGGATAGGTAGGCTGAATAACTGACTTCCTATC ATTTTCAGGTT

CTGCGTTTGATTTTTTTTACATATTAATTTCTTTGATCCACATTAAGCTCAGTTATGTATTTCCATTTTATAAATGAAAAAAAATAGGCACTTGCAAATGTCAGATCACTTGCCTGTGGT

CATTCGGGTAGAGATTTGTGGAGCTAAGTTGGTCTTAATCAAATGTCAAGCTTTTTTTTTTCTTATAAAATATAGGTTTTAATATGAGTTTTAAAATAAAATTAATTAGAAAAAGGCAA

ATTACTCAATATATATAAGGTATTGCATTTGTAATAGGTAGGTATTTCATTTTCTAGTTATGGTGGGATATTATTCAGACTATAATTCCCAATGAAAAAACTTTAAAAAATGCTAGTGA

TTGCACACTTAAAACACCTTTTAAAAAGCATTGAGAGCTTATAAAATTTTAATGAGTGATAAAACCAAATTTGAAGAGAAAAGAAGAACCCAGAGAGGTAAGGATATAACCTTACC

AGTTGCAATTTGCCGATCTCTACAAATATTAATATTTATTTTGACAGTTTCAGGGTGAATGAGAAAGAAACCAAAACCCAAGACTAGCATATGTTGTCTTCTTAAGGAGCCCTCCCCT

AAAAGATTGAGATGACCAAATCTTATACTCTCAGCATAAGGTGAACCAGACAGACCTAAAGCAGTGGTAGCTTGGATCCACTACTTGGGTTTGTGTGTGGCGTGACTCAGGTAATCT

CAAGAATTGAACATTTTTTTAAGGTGGTCCTACTCATACACTGCCCAGGTATTAGGGAGAAGCAAATCTGAATGCTTTATAAAAATACCCTAAAGCTAAATCTTACAATATTCTCAAG

AACACAGTGAA ACAAGGCAAAATAAGTTAAAATCAACAAAAACAACATGAAACATAATTAGACACACAAAGACTTCAAACATTGGAAAATACCAGAGAAAGATAATAAATAT

TTTACTCTTTAAAAATTTAGTTAAAAGCTTAAACTAATTGTAGAGAAAA AACTATGTTAGTATTATATTGTAGATGAAATAAGCAAAACATTTAAAATACAAATGTGATTACTTAAAT

TAAATATAATAGATAATTTACCACCAGATTAGATACCATTGAAGGAATAATTAATATACTGAAATACAGGTCAGTAGAATTTTTTTCAATTCAGCATGGAGATGTAAAAAATGAAAA

TTAATGCAAAAAATAAGGGCACAAAAAGAAATGAGTAATTTTGATCAGAAATGTATTAAAATTAATAAACTGGAAATTTGACATTTAAAAAAAGCATTGTCATCCAAGTAGATGTG

TCTATTAAATAGTTGTTCTCATATCCAGTAATGTAATTATTATTCCCTCTCATGCAGTTCAGATTCTGGGGTAATCTTTAGACATCAGTTTTGTCTTTTATATTATTTATTCTGTTTACTAC

ATTTTATTTTGCTAATGATATTTTTAATTTCTGACATTCTGGAGTATTGCTTGTAAAAGGTATTTTTAAAAATACTTTATGGTTATTTTTGTGATTCCTATTCCTCTATGGACACCAAGGCT

ATTGACATTTTCTTTGGTTTCTTCTGTTACTTCTATTTTCTTAGTGTTTATATCATTTCATAGATAGGATATTCTTTATTTTTTATTTTTATTTAAATATTTGGTGATTCTTGGTTTTCTCAGCC

ATCTATTGTCAAGTGTTCTTATTAAGCATTATTATTAAATAAAGATTATTTCCTCTAATCACATGAGAATCTTTATTTCCCCCAAGTAATTGAAAATTGCAATGCCATGCTGCCATGTGG

TACAGCATGGGTTTGGGCTTGCTTTCTTCTTTTTTTTTTAACTTTTATTTTAGGTTTGGGAGTACCTGTGAAAGTTTGTTATATAGGTAAACTCGTGTCACCAGGGTTTGTTGTACAGATCA

TTTTGTCACCTAGGTACCAAGTACTCAACAATTATTTTTCCTGCTCCTCTGTCTCCTGTCACCCTCCACTCTCAAGTAGACTCCGGTGTCTGCTGTTCCATTCTTTGTGTCCATGTGTTCTC

ATAATTTAGTTCCCCACTTGTAAGTGAGAACATGCAGTATTTTCTAGTATTTGGTTTTTTGTTCCTGTGTTAATTTGCCCAGTATAATAGCCTCCAGCTCCATCCATGTTACTGCAAAGAA

CATGATCTCATTCTTTTTTATAGCTCCATGGTGTCTATATACCACATTTTCTTTATCTAAACTCTTATTGATGAGCATTGAGGTGGATTCTATGTCTTTGCTATTGTGCATATTGCTGCAAG

AACATTTGTGTGCATGTGTCTTTATGGTAGAATGATATATTTTCTTCTGGGTATATATGCAGTAATGCGATTGCTGGTTGGAATGGTAGTTCTGCTTTTATCTCTTTGAGGAATTGCCATG

CTGCTTTCCACAATAGTTGAACTAACTTACACTCCCACTAACAGTGTGTAAGTGTTTCCTTTTCTCCACAACCTGCCAGCATCTGTTATTTTTTGACATTTTAATAGTAGCCATTTTAACT

GGTATGAAATTATATTTCATTGTGGTTTTAATTTGCATTTCTCTAATGATCAGTGATATTGAGTTTGTTTTTTTTCACATGCTTGTTGGCTGCATGTATGTCTTCTTTTAAAAAGTGTCTGTT

CATGTACTTTGCCCACATTTTAATGGGGTTGTTTTTCTCTTGTAAATTTGTTTAAATTCCTTATAGGTGCTGGATTTTAGACATTTGTCAGACGCATAGTTTGCAAATAGTTTCTCCCATTC

TGTAGGTTGTCTGTTTATTTTGTTAATAGTTTCTTTTGCTATGCAGAAGCTCTTAATAAGTTTAATGAGATCCTGATATGTTAGGCTTTGTGTCCCCACCCAAATCTCATCTTGAATTATA

TCTCCATAATCACCACATGGAGAGACCAGGTGGAGGTAATTGAATCTGGGGGTGGTTTCACCCATGCTGTTCTTGTGATAGTGAATGAGTTCTCACGAGATCTAATGGTTTTATGAGG

GGCTCTTCCCAGCTTTGCCTGGTACTTCTCCTTCCTGCCGCTTTGTGAAAAAGGTGCATTGCGTCCCTTTCACCTTCTTCTATAATTGTAAGTTTCCTGAGGCCTTCCCAGCCATGCTGAA

CTTCAAGTCAATTAAACCTTTTTCTTTATAAATTACTCAGTCTCTGGTGGTTCTTTATAGCAGTGTGAAAATGGACTAATGAAGTTCCCATTTATGAATTTTTGCTTTTGTTGCAATTGCTT

TTGACATCTTAGTCATGAAATCCTTGCCTGTTCTAAGTACAGGACGGTATTGCCTAGGTTGTCTTCCAGGGTTTTTCTAATTTTGTGTTTTGCATTTAAGTGTTTAATCCATCTTGAGTTGA

TTTTTGTATATTGTGTAAGGAAGGGGTCCAGTTTCAATCTTTTGCATATGGCTAGTTAGTTATCCCAGTACCATTTATTGAAAAGACAGTCTTTTCCCCATCGCTCGTTTTTGTCAGTTTT

ATTGATGATCAGATAATCATAGCTGTGTGGCTTTATTTCTGGGTTCTTTATTCTGTTCTATTGGTTTATGTCCCTGTTTTTGTGCCAGTACCATGCTGTTTTGGTTAACATAGCCCTGTAGT

ATAGTTTGAGGTCAGATAGCCTGATGCTTCCAGCTTTGTTCTTTTTCTTAAGATTGCCTTGGCTATTTGGCCTCTTTTTTGGTTCCACATGAATTTTAAAACAGTTGTTTCTAGTTTTTGAA

GAATGTCATTGGTAGTTTGATAGAAATAGCATTTAATCTGTAAATTGATTTGTGCAGTATGGCCTTTTAATGATATTGATTCTTCCTATCCATGAGCATGATATGTTTTCCATTTTGTTTG

TATCCTCTCTGATTTCTTTGTGCAGTGTTTTGTAATTCTCAT TGTAGAGATTTTTCACCTCCCTGGTTAGTTGTATTTTACCCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT

TGCCTTCCTGATTTGACTGC CAGCTTGGTTACTGTTGGTTTATAGAAATGCTAGTGATTTTTGTACATTG ATTTTCTTTCTAAAACTTTGCTGAAGTTTTTTTTATTAGCAGAAGGAGCT

TTGGGGCTGAGACTATGGGGTTTTCTAGATATAGAATCATGTCAGCTTCAAATAGGGATAATTTTACTTCCTCTCTTCCTATTTGGATGCCCTTTATTTCTTTCTCTTGCCTGATTACTCTG

GCTGGGATTTCCTATGTTGAATAGGAGT CATGAGAGAGGGCATCAAATCTACACATATCAAATACTAACCTTGAATGTCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT


How much data make up the human genome?

  • 3 pallets × 40 boxes per pallet × 5,000 pages per box × 5,000 bases per page = 3,000,000,000 bases!

  • Accurate sequence requires 6-fold coverage.

  • Now: shred 18 pallets and reassemble.
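
The pallet arithmetic can be checked directly. The numbers below are the ones on the slide; the variable names are just for this sketch.

```python
PALLETS = 3
BOXES_PER_PALLET = 40
PAGES_PER_BOX = 5000
BASES_PER_PAGE = 5000
COVERAGE = 6  # each base is sequenced about six times on average

genome_bases = PALLETS * BOXES_PER_PALLET * PAGES_PER_BOX * BASES_PER_PAGE
sequenced_bases = genome_bases * COVERAGE
pallets_to_shred = PALLETS * COVERAGE

print(f"{genome_bases:,} bases in one copy of the genome")
print(f"{sequenced_bases:,} bases to sequence at {COVERAGE}-fold coverage")
print(f"{pallets_to_shred} pallets to shred and reassemble")
```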


Important features of Human Genome

  • 20,000 – 25,000 protein-coding genes (2006)

  • Proteome (full set of proteins) more complex than those of invertebrates.

    • pre-existing components arranged into richer architectures.

  • Hundreds of genes seem to come from horizontal transfer from bacteria.


Human races have similar genes

  • Genome sequence centers have sequenced significant portions of at least three races.

    • DNA from 5 humans: 2 males and 3 females; 2 Caucasians and one each of African, Asian, and Hispanic

  • Range of polymorphisms within a race can be much greater than the range of differences between any two individuals of different races.

  • Very few genes are race specific.




Questions Remain about the Human Genome

  • Difficult to precisely estimate number of genes at this time

    • Small genes are hard to identify

    • Some genes are rarely expressed and do not have normal codon usage patterns – thus hard to detect
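
"Codon usage pattern" here means how often each three-base codon appears in a coding sequence. A minimal sketch of how such a table could be tallied, assuming the sequence is already in frame from its first base (the function name `codon_usage` is hypothetical; real gene finders compare tables like this against a reference profile, which is not shown):

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


print(codon_usage("ATGGCTGCTTAA"))
```

A short gene contributes only a handful of codons to such a table, which is one way to see why small genes are hard to identify.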


Trends in Genomics

(Figure labels: Data Acquisition, Computation, Annotation, Data Integration; timeline markers: Now, Next)



Impact of Human Genome on Biomedical domain


Applications to medicine and biology

  • Disease genes

    • human genomic sequence in public databases allows rapid identification of disease genes in silico

  • Drug targets

    • pharmaceutical industry has depended upon a limited set of drug targets to develop new therapies

    • new targets can now be found in silico

  • Basic biology


Genomic Medicine

  • Anticipatory, not reactive

  • Predictive, preventive and personalized

  • Knowledge from genomics and derivative disciplines

  • Screening of individuals and populations

  • New analytical technologies and bioinformatics approaches


Genomic Medicine: Challenges

  • Requires change in culture

    • Practitioners

      • Preventive medicine approach

      • Less independence

    • Population

      • Lifestyle

      • Participation in large clinical trials


Genomic Medicine: Challenges

  • Requires change in medical data analysis

    • Driven by lab data

    • Huge amount of information

    • Pattern recognition


Genomic Medicine: Challenges

  • Requires change in diagnostic approach

    • Trust in computer algorithms

    • Evaluation of matrices and dynamic complex systems, not linear pathways


Genomic Medicine: Consequences

  • Drugs tailored to an individual’s genetic make-up to improve efficacy and reduce side effects

  • Reduction of the burden of chronic illness

  • Decrease in the prevalence of common complex diseases


Genomic Medicine: Consequences

  • Increase in health disparities between those with and without access to genomic medicine

  • Technological spin-offs in other areas of medicine including infectious diseases


Personalized Medicine

  • Personalized medicine may be in our future (August 25, 2006, Baltimore Sun):

    • An example of personalized medicine is Herceptin, a breast cancer treatment that blocks cancer-promoting proteins. Herceptin can be effective when given to the 20 percent to 30 percent of women whose tumors produce high levels of targeted proteins, and those women can be identified by genetic testing before treatment.

    • But most breast cancer patients would not benefit from Herceptin therapy, so it is not routinely prescribed.

    • The smaller the potential number of beneficiaries, the more expensive it is to develop drugs and procedures for them, which can mean that there are therapies out there ready to help patients that cannot reach them.


Personalized Medicine (continued)

  • Another example of personalized medicine, according to a recent report in the Proceedings of the National Academy of Sciences, is bucindolol, a drug once thought to be useful in treating heart failure that initially failed but now may find new life.

    While bucindolol flunked clinical trials with heart failure patients, two academic researchers found that it helps a subset of patients who share a particular genetic variation.


Gene Therapy

Nine of eleven French children with severe combined immune deficiency (SCID or the “bubble boy syndrome”) were treated with gene therapy. Their signs and symptoms resolved, and they were able to have normal lives. Two of the nine developed leukemia.


Gene Therapy

  • Patient with mild ornithine transcarbamylase (OTC) deficiency died in adenoviral gene therapy clinical trial.

  • Hemophilia trial with adeno-associated virus vector

    • Evidence of gene in participant’s semen

    • The gene was not found in sperm cells (where it would create a chance of fathering a genetically altered child), but the Recombinant DNA Advisory Committee (RAC) could not rule out the possibility.


Gene Therapy

  • Significant concerns have been raised about gene therapy.

  • Alternative strategies are being explored.

    • Protein-replacement therapy

    • Small molecule treatments

    • Therapeutic stem cell interventions


New Hope

  • Biotech's bright hope: Scientists are newly optimistic that gene therapy will help fight the most serious diseases. (LA Times, August 28, 2006)

    • To the shrill whine of a high-speed drill, neurosurgeon Dr. Paul Larson makes two nickel-sized holes in Shirley Cooper's skull. Guided by a computerized MRI map, he plunges a long, thin needle through one hole and deep into the brain — and empties the syringe.

    • The experimental treatment Cooper is undergoing is intended to reverse the process of Parkinson’s disease. Parkinson's destroys cells in the brain that make dopamine. Scientists have engineered a harmless, stripped-down virus to carry a gene that will boost brain dopamine through the enzyme it encodes: amino acid decarboxylase, or AADC.


New Hope (continued)

  • Gene therapy is making a comeback after a series of serious setbacks that threatened to permanently derail human tests. In recent years, European scientists have cured more than two dozen patients suffering from three rare, and in some cases lethal, immune disorders.

  • Spurred by this success, plus the development of new techniques aimed at making the therapy safer and more effective, more than 300 gene therapy trials, including the one for Parkinson's at UC San Francisco, are underway in the U.S. and abroad.


Cloning Humans

On December 27, 2002, at a press conference at a Holiday Inn in Hollywood, FL, the birth of a healthy 7 lb baby girl on December 26, 2002, was announced. Nicknamed Eve, the baby was born at an undisclosed location outside of the USA to a 31 year old American woman from whom Eve was cloned.


Cloning Humans

The announcement of the birth of the first cloned human was made by Dr. Brigitte Boisselier, a chemist and the head of a private company, Clonaid. She said that four other women were pregnant with Clonaid-created clones, and 20 more women were scheduled for implantation of cloned fetuses in January, 2003.


Cloning Humans

Clonaid was founded by Claude Vorilhon, a former race car driver. He also formed the Raëlians, a worldwide “atheistic religion” of 55,000 followers. They believe that memories and consciousness can be transferred from an individual to their clone, and therefore cloning is the path to eternal life.


Cloning Humans

  • Two additional groups have claimed that they are close to creating human clones.

  • No other primate has been cloned.

  • Health risks well documented for other mammals.

    • Clones

    • Mothers


Cloning Humans

  • The Institute of Medicine (IOM) committee report:

    • Reproductive cloning should be banned with criminal penalties because of risks.

    • Therapeutic cloning should be encouraged.

    • The issue of cloning and the data from other mammals should be re-evaluated in 3-5 years.


Biology’s Industrial Revolution

Traditional biology

  • Individual investigators

  • Hypothesis-driven

  • Data hard to get

  • Experimental design

  • Iterative solution

Genomics

  • Large, interdisciplinary teams

  • Data-driven

  • Abundant data

  • Experimental design

  • Experimental protocol

  • Data management

  • Data analysis


The Paradigm Shift in Biology

The new paradigm, now emerging, is that all the ‘genes’ will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical. An individual scientist will begin with a theoretical conjecture, only then turning to experiment to follow or test that hypothesis.

- Walter Gilbert, “Toward a paradigm shift in biology,” Nature 349:99 (1991).


Sequence-based Biology

  • Genome sequences provide the basis for “sequence-based biology”

    • Description of every gene and gene product

    • Assignment of function

    • Insights into non-coding regulatory regions

    • Comparative genomics

    • Variation within a species


Credits

D. Whitman, P. Hubbard, C. Bult, D. Church,

M. Zorn, C. Stoeckert, M. Gerstein,

P. Green, D. Kulp, T. O’Brien

