1 / 60

bud mishra - PowerPoint PPT Presentation

  • Updated On :

(I). Computational Systems Biology of Cancer:. Professor of Computer Science, Mathematics and Cell Biology ¦ Courant Institute, NYU School of Medicine, Tata Institute of Fundamental Research, and Mt. Sinai School of Medicine. Bud Mishra. War on Cancer.

Related searches for bud mishra

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'bud mishra' - Pat_Xavi

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Slide1 l.jpg


Computational Systems Biology of Cancer:

Bud mishra l.jpg

Professor of Computer Science, Mathematics and Cell Biology


Courant Institute, NYU School of Medicine, Tata Institute of Fundamental Research, and Mt. Sinai School of Medicine

Bud Mishra

War on cancer l.jpg
War on Cancer

  • “Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.”

    • US Secretary of Defense, Mr. Donald Rumsfeld, Quoted completely out of context.

Introduction cancer and genomics l.jpg

Introduction: Cancer and Genomics:

What we know & what we do not

“Cancer is a disease of the genome.”

Outline l.jpg

  • Genomics

  • Genome Modification & Repair

  • Segmental Duplications Models

Genomics l.jpg

  • Genome:

    • Hereditary information of an organism is encoded in its DNA and enclosed in a cell (unless it is a virus). All the information contained in the DNA of a single organism is its genome.

  • DNA molecule can be thought of as a very long sequence of nucleotides or bases:

    S = {A, T, C, G}

Complementarity l.jpg

  • DNA is a double-stranded polymer and should be thought of as a pair of sequences over S.

  • However, there is a relation of complementarity between the two sequences:

    • A , T, C , G

Dna structure l.jpg
DNA Structure.

  • The four nitrogenous bases of DNA are arranged along the sugar- phosphate backbone in a particular order (the DNA sequence), encoding all genetic instructions for an organism.

  • Adenine (A) pairs with thymine (T), while cytosine (C) pairs with guanine (G).

  • The two DNA strands are held together by weak bonds between the bases.

Structure and components l.jpg
Structure and Components

  • Complementary base pairs

    (A-T and C-G)

    • Cytosine and thymine are smaller (lighter) molecules, called pyrimidines

    • Guanine and adenine are bigger (bulkier) molecules, called purines.

    • Adenine and thymine allow only for double hydrogen bonding, while cytosine and guanine allow for triple hydrogen bonding.

Inert rigid l.jpg
Inert & Rigid

  • Thus the chemical (hydrogen bonding) and the mechanical (purine to pyrimidine) constraints on the pairing lead to the complementarity and makes the double stranded DNA both chemically inert and mechanically quite rigid and stable.

The central dogma l.jpg






The Central Dogma

  • The central dogma(due to Francis Crick in 1958) states that these information flows are all unidirectional:

    “The central dogma states that once `information' has passed into protein it cannot get out again.”

The central dogma13 l.jpg






The Central Dogma

“…The transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein, may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.”

The new synthesis l.jpg




Genome Evolution






Part-lists, Annotation, Ontologies

The New Synthesis

Cancer initiation and progression l.jpg
Cancer Initiation and Progression

Mutations, Translocations, Amplifications, Deletions

Epigenomics (Hyper & Hypo-Methylation)

Alternate Splicing

Cancer Initiation and Progression

Proliferation, Motility, Immortality, Metastasis, Signaling

Multi step nature of cancer l.jpg
Multi-step Nature of Cancer:

  • Cancer is a stepwise process, typically requiring accumulation of mutations in a number of genes.

  • ~6-7 independent mutations typically occur over several decades:

    • Conversion of proto-oncogenes to oncogenes

    • Inactivation of tumor suppressor gene

Amplifications deletions l.jpg
Amplifications & Deletions

Mutation in a TSG


Conversion of a


Deletion of a TSG

Deletion of a TSG

The cancer genome atlas l.jpg
The Cancer Genome Atlas

  • Obtain a comprehensive description of the genetic basis of human cancer.

    • Identify and characterize all the sites of genomic alteration associated at significant frequency with all major types of cancers.

The cancer genome atlas20 l.jpg
The Cancer Genome Atlas

  • Increase the effectiveness of research to understand

    • tumor initiation and progression,

    • susceptibility to carcinogensis,

    • development of cancer therapeutics,

    • approaches for early detection of tumors &

    • the design of clinical trials.

Specific goals l.jpg
Specific Goals

  • Identify all genomic alterations significantly associated with all major cancer types.

  • Such knowledge will propel work by thousands of investigators in cancer biology, epidemiology, diagnostics and therapeutics.

To achieve this goal l.jpg

Create large collection of appropriate, clinically annotated samples from all major types of cancer; and

Characterize each sample in terms of:

All regions of genomic loss or amplification,

All mutations in the coding regions of all human genes,

All chromosomal rearrangements,

All regions of aberrant methylation, and

Complete gene expression profile, as well as other appropriate technologies.

To Achieve this goal …

Biomedical rationale l.jpg
Biomedical Rationale samples from all major types of cancer; and

  • Cancer is a heterogeneous collection of heterogeneous diseases.

    • For example, prostate cancer can be an indolent disease remaining dormant throughout life or an aggressive disease leading to death.

    • However, we have no clear understanding of why such tumors differ.

Biomedical rationale24 l.jpg
Biomedical Rationale samples from all major types of cancer; and

  • Cancer is fundamentally a disease of genomic alteration.

    • Cancer cells typically carry many genomic alterations that confer on tumors their distinctive abilities (such as the capacity to proliferate and metastasize, ignoring the normal signals that block cellular growth and migration) and liabilities (such as unique dependence on certain cellular pathways, which potentially render them sensitive to certain treatments that spare normal cells).

History l.jpg
History samples from all major types of cancer; and

  • 1960s

    • The genetic basis of cancer was clear from cytogenetic studies that showed consistent translocations associated with specific cancers (notably the so-called Philadelphia chromosome in chronic myelogenous leukemia).

  • 1970s

    • Recognize specific cancer-causing mutations through recombinant DNA revolution of the 1970s.

    • The identification of the first vertebrate and human oncogenes and the first tumor suppressor genes,

    • These discoveries have elucidated the cellular pathways governing processes such as cell-cycle progression, cell-death control, signal transduction, cell migration, protein translation, protein degradation and transcription.

  • For no human cancer do we have a comprehensive understanding of the events required.

Scientific foundation for a human cancer genome project l.jpg

Gene resequencing samples from all major types of cancer; and.

Specific gene classes (such as kinases and phosphatases) in particular cancer types.

Epigenetic changes.

Loss of function of tumor suppressor genes by epigenetic modification of the genome — such as DNA methylation and histone modification.

Genomic loss and amplification.

Consistent association with genomic loss or amplification in many specific regions, indicating that these regions harbor key cancer associated genes

Chromosome rearrangements.

Activate kinase pathways through fusion proteins or inactivating differentiation programs through gene disruption.

Hematological malignancies: a single stereotypical translocation in some diseases (such as CML) and as many as 20 important translocations in others (such as AML).

Adult solid tumors have not been as well characterized, in part owing to technical hurdles.

Scientific Foundation for a Human Cancer Genome Project

Human genome structure l.jpg

Human Genome Structure samples from all major types of cancer; and

Slide28 l.jpg
EBD samples from all major types of cancer; and

  • J.B.S. Haldane (1932):

    • “A redundant duplicate of a gene may acquire divergent mutations and eventually emerge as a new gene.”

  • Susumu Ohno (1970):

    • “Natural selection merely modified, while redundancy created.”

Evolution by duplication l.jpg

UTP samples from all major types of cancer; and





Living cells


Functional modules




Motifs in cellular etworks

Metabolic pathways

Regulatory motifs

Interaction clusters

Non-random distribution in Genomic data


Short words frequency

Protein family size

Long range orrelation

Genome Evolution

Evolution by Duplication

Human condition l.jpg
Human Condition samples from all major types of cancer; and

Mer scape l.jpg
Mer-scape… samples from all major types of cancer; and

  • Overlapping words of different sizes are analyzed for their frequencies along the whole human genome

    • Red: 24-mers,

    • Green: 21-mers

    • Blue:18 mers

    • Gray:15 mers

  • To the very left is a ubiquitous human transposon Alu. The high frequency is indicative of its repetitive nature.

  • To the very right is the beginning of a gene. The low frequency is indicative of its uniqueness in the whole genome.

Doublet repeats l.jpg
Doublet Repeats samples from all major types of cancer; and

  • Serendipitous discovery of a new uncataloged class of short duplicate sequences; doublet repeats.

    • almost always < 100 bp

  • (Top) . The distance between the two loci of a doublet is plotted versus the chromosomal position of the first locus.

  • (Bottom) : Distribution of doublets (black) and segmental duplications (red) across human chromosome 2

Segmental duplications l.jpg
Segmental Duplications samples from all major types of cancer; and

  • 3.5% ~ 5% of the human genome is found to contain

    • segmental duplications, with length > 5 or 1kb, identity > 90%.

  • These duplications are estimated to have emerged about 40Mya under neutral assumption.

  • The duplications are mostly interspersed (non-tandem), and happen both inter- and intra-chromosomally.


From [Bailey, et al. 2002]

The model l.jpg

f samples from all major types of cancer; and- -

f - -

deletion or mutation


f + -

f + -

Duplication by recombination between other repeats or other mechanisms

deletion or mutation


f ++

f ++

Duplication by recombination between repeats

Mutation accumulation in the duplicated sequences

The Model

The mathematical model l.jpg

Time after duplication samples from all major types of cancer; and









f - -












f + -













f ++






The Mathematical Model

0 ≤ d < ε

ε ≤ d < 2ε

(k-1)ε ≤ d < kε

h1: proportion of duplications by repeat recombination;

h1++: proportion of duplications by recombination of the specific repeat;

h1- -: proportion of duplications by recombination of other repeats;

h0: proportion of duplications by other repeat-unrelated mechanism;

h0++: proportion of h0 with common specific repeat in the flanking regions;

h0+-: proportion of h0 with no common specific repeat in the flanking regions;

h0- -: proportion of h0 with no specific repeat in the flanking regions;

α: mutation rate in duplicated sequences;

β: insertion rate of the specific repeat;

γ: mutation rate in the specific repeat;

d: divergence level of duplications;

ε: divergence interval of duplications.

Model fitting l.jpg

Alu samples from all major types of cancer; and


f - -

f - -

f + -

f + -

f ++

f ++



Model Fitting

  • The model parameters (αAlu, βAlu, γAlu, αL1, βL1, γL1) are estimated from the reported mutation and insertion rates in the literature.

  • The relative strengths of the alternative hypotheses can be estimated by model fitting to the real data.

  • h1Alu ≈ 0.76; h1++Alu≈0.3; h1L1 ≈ 0.76; h1++L1≈0.35.

Polya s urn l.jpg

Random drawn samples from all major types of cancer; and

Put back


Polya’s Urn

Repetitive random eccentric god l.jpg

F1 (decide an initial position) samples from all major types of cancer; and

F3 (decide copy number)

F2 (decide selected length)

k copies

F4 (decide insertion positions)

Repetitive Random Eccentric GOD

  • Genome Organizing Devices (GOD)

  • Polya’s Urn Model:

    • F’s: functions deciding probability distributions

Dna polymerase stuttering replication slippage l.jpg

Insertion (duplication): samples from all major types of cancer; and


DNA polymerase stuttering(replication slippage)

normal replication

DNA polymerase

normal replication

polymerase pausing and dissociation

polymerase pausing and dissociation

3’ realignment and polymerase reloading

3’ realignment and polymerase reloading

Transposons causes deletions and duplications l.jpg
Transposons samples from all major types of cancer; and~ causes: deletions and duplications

  • Transposon actions in

  • genomic DNA:

Donor DNA

  • Duplication:


DNA intermediates

Target DNA

Transposase cuts in target DNA

  • Deletion:




Transposon looping out

Transposon inserted

Transposon deleted

DNA is repaired-resulting in a duplication of the transposon and target site

Dna mismatch repair mechanism prevents duplications and deletions l.jpg

MSH6 samples from all major types of cancer; and











DNA mismatch repair mechanism~ prevents duplications and deletions

Mismatch recognition on daughter strand

Degradation of the mismatched daughter strand in the a-loop

DNA a-loop formation by translocation through the proteins

Refilling the gap by DNA polymerase

Corrected daughter strand

DNA polymerase

Graph model l.jpg
Graph Model samples from all major types of cancer; and

A. Deletion: With probability p0

  • Graph description

  • G = {V, E}, G is a directed multi-graph;

  • V  {Vi, all n-mer’s, i = 1…4n}, ;

  • E  { (Vi , Vj ), when Vi represents the n-mer immdiately upstream of the n-mer represented by Vj in the genomic sequence};

  • ki = incoming (or outgoing) degree of node i (Vi) = copy number of the n-mer represented by Vi;

  • During the graph evolution, at each iteration, one of the following happens: deletion (with probability p0), duplication (with probability p1) or substitution (with probability q). p0 + p1 + q = 1.

  • For an arbitrary node Vi , the probabilities of one of the above events happens is as follows:

B. Duplication:With probability p1

C. Substitution: With probability q

Model fitting43 l.jpg

Real distribution from genome analysis samples from all major types of cancer; and

Expected distribution from random sequences

Model fitting results

Model Fitting

Model fitted parameter l.jpg
Model Fitted Parameter samples from all major types of cancer; and

  • The substitution rate, q, increases with the sizes of mer’s.

  • The ratio between duplication and deletion rate, p1/p0, increases with sizes of mer’s.

  • The substitution rate, q, tends to decrease when the genome sizes are larger. Especially, q is much smaller in eukaryotic genomes than in prokaryotic genomes.

J b s haldane l.jpg
J.B.S. Haldane samples from all major types of cancer; and

  • “If I were compelled to give my own appreciation of the evolutionary process…, I should say this: In the first place it is very beautiful. In that beauty, there is an element of tragedy…In an evolutionary line rising from simplicity to complexity, then often falling back to an apparently primitive condition before its end, we perceive an artistic unity …

  • “To me at least the beauty of evolution is far more striking than its purpose.”

    • J.B.S. Haldane, The Causes of Evolution. 1932.

Human cancer genome l.jpg

Human Cancer Genome samples from all major types of cancer; and

Cancer l.jpg

Normal epithelial mucosa samples from all major types of cancer; and

Neoplastic polyp


A challenge l.jpg
A Challenge samples from all major types of cancer; and

  • “At present, description of a recently diagnosed tumor in terms of its underlying genetic lesions remains a distant prospect. Nonetheless, we look ahead 10 or 20 years to the time when the diagnosis of all somatically acquired lesions present in a tumor cell genome will become a routine procedure.”

    • Douglas Hanahan and Robert Weinberg

      • Cell, Vol. 100, 57-70, 7 Jan 2000

Karyotyping l.jpg
Karyotyping samples from all major types of cancer; and

Cgh comparative genomic hybridization l.jpg
CGH: samples from all major types of cancer; andComparative Genomic Hybridization.

  • Equal amounts of biotin-labeled tumor DNA and digoxigenin-labeled normal reference DNA are hybridized to normal metaphase chromosomes

  • The tumor DNA is visualized with fluorescein and the normal DNA with rhodamine

  • The signal intensities of the different fluorochromes are quantitated along the single chromosomes

  • The over-and underrepresented DNA segments are quantified by computation of tumor/normal ratio images and average ratio profiles



Cgh comparative genomic hybridization51 l.jpg
CGH: Comparative Genomic Hybridization. samples from all major types of cancer; and

Microarray analysis of cancer genome l.jpg

Normal DNA samples from all major types of cancer; and

Tumor DNA

Normal LCR

Tumor LCR



Microarray Analysis of Cancer Genome

  • Representations are reproducible samplings of DNA populations in which the resulting DNA has a new format and reduced complexity.

    • We array probes derived from low complexity representations of the normal genome

    • We measure differences in gene copy number between samples ratiometrically

    • Since representations have a lower nucleotide complexity than total genomic DNA, we obtain a stronger specific hybridization signal relative to non-specific and noise

Copy number fluctuation l.jpg
Copy Number Fluctuation samples from all major types of cancer; and










Measuring gene copy number differences between complex genomes l.jpg

samples from all major types of cancer; and

Genomic DNA patient samples

Hybridize to 500,000 features microarray

Statsitical Analysis

Measuring gene copy number differences between complex genomes

  • Compare the genomes of diseased and normal samples

  • Error Control:

    • The use of representations augmenting microarrays

    • Representations reproducibly sample the genome thereby reducing its complexity. This increases the signal-to-noise ratio and improves sensitivity

    • Statistical Modeling the sources of Noise

    • Bayesian Analysis

Nattering nabobs of negativism l.jpg
Nattering Nabobs of Negativism samples from all major types of cancer; and

  • Some scientists are concerned about the cost and the possibility that a project of this scale could take money away from smaller ones…

    • Craig Venter, who led a private project to determine the human DNA blueprint in competition with the human genome project, said it would make more sense to look at specific families of genes known to be involved in cancer.

    • Lee Hood, president of the Institute for Systems Biology, has called the premise of the Cancer Genome Project “naïve,” suggesting that signal-to-noise issues its researchers are likely to encounter will be “absolutely enormous.”

Challenges l.jpg
Challenges samples from all major types of cancer; and

  • ArrayCGH data is noisy!

    • Better computational biology algorithms

    • Better statistical modeling…In some technologies, the noise is systematic (e.g., affy SNP-chips)

    • Better bio-technologies (GRIN: Genomics, Robotics, Informatics, Nanotechnology)

  • No efficient technology for epigenomics, translocations or de novo mutations.

  • Algorithms for multi-locus association studies

  • Systems view of cancer by integrating data from multiple sources:

    • Genomic, Epigenomic, Transcriptomic, Metabolomic & Proteomic.

    • Regulatory, Metabolic and Signaling pathways

Mishra s mystical 3m s l.jpg
Mishra’s Mystical 3M’s samples from all major types of cancer; and

  • Rapid and accurate solutions

    • Bioinformatic, statistical, systems, and computational approaches.

    • Approaches that are scalable, agnostic to technologies, and widely applicable

  • Promises, challenges and obstacles—




Discussions l.jpg

Discussions samples from all major types of cancer; and


Answer to cancer l.jpg
Answer to Cancer samples from all major types of cancer; and

  • “If I know the answer I'll tell you the answer, and if I don't, I'll just respond, cleverly.”

    • US Secretary of Defense, Mr. Donald Rumsfeld.

To be continued l.jpg

To be continued… samples from all major types of cancer; and