
My Master’s Work

Richa Tiwari

Outline of the talk

  • Analysis of Phylogeny Tree Evaluation Approaches (Project done in CS641).

  • Proteomics and 2-D Gel Electrophoresis (Study done for CS)

  • Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under differing light and CO2 levels (Project done for BST676).

Analysis of Phylogeny Tree Evaluation Approaches

Phylogenetic Analysis

  • Alignment of the sequences

  • Determining whether the sequences are related

  • Choosing the most appropriate tree-building algorithm

  • Scrutinizing the tree to determine the level of confidence

Algorithmic Method

  • Defines an algorithm that leads to the determination of a tree.

Criteria Based Method

  • Defines a criterion for comparing different phylogenies, so that phylogenies can be ranked and compared.

Maximum Parsimony Method

  • “The most parsimonious tree explains the observed character distribution with the minimum tree length.”

  • Tree selection criterion: minimum tree length (fewest character-state transformations)
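As a rough illustration of the minimum-tree-length criterion, the sketch below counts character-state changes on a fixed rooted binary tree with Fitch's small-parsimony algorithm. It is not DNAPARS itself: the function names, the nested-tuple tree encoding and the three-site toy alignment are assumptions made for this example.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree   -- nested 2-tuples of taxon names, e.g. (("Human", "Chimp"), "Orang")
    states -- dict mapping each taxon name to its state at this site
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):            # leaf: the observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                           # children agree: no change needed
            return common
        changes += 1                         # children disagree: one change
        return left | right

    visit(tree)
    return changes


def tree_length(tree, alignment):
    """Sum the per-site Fitch counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Toy data (invented for illustration only)
alignment = {"Human": "AAG", "Chimp": "AAG", "Gorilla": "ACG", "Orang": "ACT"}
tree_a = (("Human", "Chimp"), ("Gorilla", "Orang"))
tree_b = (("Human", "Gorilla"), ("Chimp", "Orang"))
lengths = {"tree_a": tree_length(tree_a, alignment),
           "tree_b": tree_length(tree_b, alignment)}
best = min(lengths, key=lengths.get)         # the shorter tree is preferred
```

Under the parsimony criterion, the candidate with the smaller length is preferred. A real program also has to search over many tree topologies, which this sketch does not attempt.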

Maximum Likelihood (ML)

  • ML evaluates the probability that the chosen evolutionary model will have generated the observed sequences.

  • Evolutionary Model: Accounts for the changes in sequences.

  • Phylogenies are then inferred by finding those trees that yield the highest likelihood.
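A minimal sketch of the likelihood idea for the simplest possible case, two aligned sequences, assuming the Jukes-Cantor (JC69) model as the evolutionary model. DNAML evaluates whole trees and is far more elaborate; the function names, the grid search and the toy sequences here are illustrative assumptions.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by distance d under JC69."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay    # probability that a site keeps its base
    p_diff = 0.25 - 0.25 * decay    # probability of changing to one specific other base
    log_l = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the base observed in seq_a
        log_l += math.log(0.25 * (p_same if x == y else p_diff))
    return log_l


def ml_distance(seq_a, seq_b, step=0.001, max_d=2.0):
    """Grid search for the distance that maximizes the JC69 likelihood."""
    grid = [i * step for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))


# Toy sequences (invented): 2 differences in 10 sites
d_hat = ml_distance("ACGTACGTAC", "ACGTACGTTT")
```

For two sequences the maximum can also be written in closed form, d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites; the grid search lands next to that value.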

Distance Based Method

  • Distance-based methods estimate the distance between two taxa, that is, the total number of changes accumulated since they last shared an ancestor.

  • They are cluster-based methods.
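The first step of a distance-based method can be sketched as follows: compute pairwise distances from an alignment, then pick the closest pair as the first cluster. The Jukes-Cantor correction is assumed here for simplicity; the helper names and sequences are made up for illustration and are not taken from DNADIST or NEIGHBOR.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def jukes_cantor(p):
    """Correct an observed proportion p (< 0.75) for multiple hits at a site."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Map each unordered taxon pair to its corrected distance."""
    return {
        (a, b): jukes_cantor(p_distance(alignment[a], alignment[b]))
        for a, b in combinations(sorted(alignment), 2)
    }


def closest_pair(matrix):
    """The pair a clustering method would join first."""
    return min(matrix, key=matrix.get)


# Toy alignment (invented for illustration)
toy = {"Human": "ACGTACGT", "Chimp": "ACGTACGA", "Rhesus": "ATGTTCGA"}
first_cluster = closest_pair(distance_matrix(toy))
```

Neighbor joining refines this idea by correcting each distance for the average divergence of the two taxa from all others before choosing a pair.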

Software used

The PHYLIP package was used to compare the three phylogeny methods. Programs used from the package are:

  • Maximum Parsimony: DNAPARS

  • Maximum Likelihood: DNAML

  • Distance-based: DNADIST and NEIGHBOR

  • Tree drawing: DRAWGRAM

  • Consensus tree construction: CONSENSE

Using sample data

[Figure: trees built from the sample data by maximum parsimony, maximum likelihood, and the distance-based method]


Consensus tree for the given example

[Figure: consensus trees of Human, Chimp, Gorilla, Orang and Rhesus built with the Parsimony Method, Maximum Likelihood, and Distance Based/Neighbor joining; the visible clades have support 1.0]



  • Reliability of branch length estimates

    NJ and ML > MP

  • Computational speed (n>500)

    NJ/DNADIST: 0.005 seconds

    DNAPARS: 0.5 seconds

    DNAML: 230.0 seconds



  • Our experiments indicate that the Distance Based method is better than the other two methods in terms of speed, simplicity, and performance on large numbers of taxa.

  • With a fast computer and a large dataset, the Maximum Likelihood method is preferable to Maximum Parsimony.

Proteomics and 2-D gel Electrophoresis



  • The entire set of proteins expressed by the genome in a cell, organ or organism is referred to as the proteome.

  • Proteomics: methods that discover and quantify proteins and their biochemical changes.

Application of Proteomics

  • Protein Mining

  • Network Mapping

  • Mapping Protein Modifications

Proteomics Analysis


2-D Gel Electrophoresis

The horizontal position tells us about the charge of a protein, whereas the intensity of the gel spot tells us about the amount of that protein in the system.


1. Prepare protein sample in solution

2. Separate proteins (in each dimension)

    I. Based on pH

        Using isoelectric focusing (IEF)

        Using immobilized pH gradient (IPG) strips

    II. Based on molecular weight (size)

        Using gel electrophoresis

3. Stain proteins to enable visualization.

Introduction to the project

  • This project focuses on 2D gel electrophoretic separation of proteins.

  • We analyzed a few randomly chosen spots from the 2D gels of rat mammary tissue.

  • We used statistical methods to measure the variance in pI of the same protein across different gels.

  • We analyzed the reasons for these differences.

  • We inferred the relationship between the experimental values and the predicted values.

Images of the gels used in the project.

One of the gels with Protein Spots

The gels we used came from a previously completed experiment. From each of the three gels, 28 random protein spots were selected based on their intensity.

Mass Spectrometry

Differentially expressed proteins identified by image analysis were excised from 2D gels and digested with trypsin. The resulting peptide fragments were analyzed on a MALDI mass spectrometer (MS). The MALDI spectrum displays a “peptide fingerprint” of the protein using the corresponding peptide masses.
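To make the fingerprint idea concrete, here is a small sketch of an in-silico trypsin digest followed by a naive mass comparison. It assumes the commonly cited cleavage rule (cut after K or R unless the next residue is P); the function names, the example sequence, the mass values and the 0.5 Da tolerance are all hypothetical, and real search engines use much more elaborate scoring.

```python
def trypsin_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):               # trailing peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def count_matches(observed_masses, theoretical_masses, tolerance=0.5):
    """Number of observed peaks within `tolerance` of any theoretical peptide mass."""
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical_masses)
        for obs in observed_masses
    )


# Made-up sequence and masses, for illustration only
peptides = trypsin_digest("MKRPLAKGGR")
hits = count_matches([1000.2, 1500.0, 2000.9], [1000.0, 2000.0])
```

A candidate protein whose theoretical peptides explain many of the observed peaks is a plausible identification; a database search repeats this comparison for every protein entry.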

MALDI-TOF MS


Proteins were identified by entering the peptide masses (ions from the MALDI spectrum) into a peptide mapping database. Examples of such protein search engines are:

  • Mascot (very popular, and the one used in this project)

  • Sequest

  • Aldente

  • ProteinLynx

  • Phenyx

Image of a search database



  • We tabulated the results from the internet database search alongside those obtained from the experiment.

  • We observed that neither the pI values nor the molecular weights were the same in all gels for the same protein.

  • The pI values of the three gels were quite similar but they were different from the predicted pI values.

  • In a 2D gel the position of a protein spot can shift for various reasons, so the measured molecular weight and pI values may differ as well.

Graphical representation of pI values of three gels

Graph showing the variance among the predicted pI and observed pI



  • The experimental pI values from the three gels did not differ much from one another.

  • We therefore conclude that little of the difference is due to non-biological causes.

  • For a few protein spots the internet search returned the same protein name, yet our experiment gave different results. This may be because different groups (such as phosphate or sulphate) were attached to the protein; other causes are also possible.

  • The average deviation between the three observed pI values and the predicted pI value was calculated as:

    {(pI (gel 12_5)- pred. pI) + (pI (gel 12_5)- pred. pI) + (pI (gel 12_5)- pred. pI)} / 3

    This gave the results shown in the next slide. We obtained positive as well as negative values for the deviations.
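The calculation above amounts to a mean signed deviation, which can be sketched as below. The gel labels and pI numbers are placeholders invented for this example, not values from the experiment.

```python
def average_deviation(observed_pis, predicted_pi):
    """Mean signed deviation of the observed pI values from the predicted pI."""
    observed_pis = list(observed_pis)
    return sum(obs - predicted_pi for obs in observed_pis) / len(observed_pis)


# Hypothetical pI readings for one protein spot on three gels
observed = {"gel_1": 5.8, "gel_2": 5.9, "gel_3": 6.0}
deviation = average_deviation(observed.values(), predicted_pi=6.1)
```

A negative result means the spot ran at a lower (more acidic) pI than predicted, and a positive result means the opposite. Keeping the sign, rather than taking absolute values, is what lets the direction of the shift be interpreted.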

Average deviations between the three gels and the predicted pI

  • We interpret this to mean that the proteins were modified mostly by negatively charged groups, which lowered their pI values.

  • The addition of one phosphate group to a serine, threonine, or tyrosine residue typically decreases the isoelectric point by 0.1 pH unit.

Regression results

  • A statistical test was performed to determine which of the three gels was closest to the predicted pI values, that is, in which gel the proteins had been modified least.

  • The test was a calibration test. We prepared a regression model for each gel. The inverse regression equation used was:

    Predicted pI = (Observed pI from gel – Intercept) / slope
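A minimal sketch of the calibration step, assuming each gel's model is an ordinary least-squares line of observed pI against database-predicted pI, which is then inverted. The function names and the numbers are illustrative assumptions, not the project's data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def calibrate(observed_pi, intercept, slope):
    """Inverse regression: recover the predicted pI from an observed pI."""
    return (observed_pi - intercept) / slope


# Made-up training pairs for one gel: database pI (x) and observed pI (y)
database_pi = [4.0, 5.0, 6.0, 7.0]
gel_pi = [4.1, 5.0, 5.9, 6.8]
intercept, slope = fit_line(database_pi, gel_pi)
estimate = calibrate(5.45, intercept, slope)
```

Comparing the calibrated estimates from each gel with the database values shows which gel tracks the predictions most closely.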

Predicted pI values from the Calibration test and internet database

  • The results showed that all three gels predicted almost the same pI values, and that these were quite far from the original predicted pI values.

  • These similarities among the three gels indicate that the difference between predicted and experimental pI values is due mainly to chemical modification of the proteins rather than to non-biological factors.

Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under differing light and CO2 levels.

Introduction: Transcription Factor

  • Transcription factors are proteins that regulate gene expression by binding to promoter regions upstream of genes.

  • They are composed of two essential functional regions:

    DNA binding domain – It binds to DNA.

    Activator Domain – It interacts with other regulatory proteins, thereby affecting the efficiency of DNA binding.

bZIP proteins

  • bZIP proteins are a class of transcription factors with a leucine zipper motif: a periodic repetition of a leucine residue at every seventh position, forming an alpha-helical conformation.

  • The segment that comprises the basic region and the periodic array of leucine residues is referred to as ‘basic-region leucine zipper’ or bZIP motif.
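The "leucine at every seventh position" pattern can be scanned for with a few lines of code. This is only a sketch: the function name, the minimum of four repeats and the artificial test sequence are assumptions, and real zippers tolerate substitutions that this strict check would miss.

```python
def heptad_leucine_runs(sequence, min_repeats=4):
    """Return (start_index, repeat_count) for runs of leucines spaced 7 residues apart."""
    runs = []
    for start, residue in enumerate(sequence):
        if residue != "L":
            continue
        if start >= 7 and sequence[start - 7] == "L":
            continue                         # already counted in an earlier run
        count, pos = 1, start + 7
        while pos < len(sequence) and sequence[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


# Artificial sequence: a leucine followed by six alanines, repeated four times
toy_sequence = "LAAAAAA" * 4
runs = heptad_leucine_runs(toy_sequence)
```

Each reported run gives the 0-based position of the first leucine and the number of consecutive heptad repeats found from there.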

Some facts

  • There are 792 bZIP proteins recorded in the nonredundant database.

  • The number of bZIP proteins in selected organisms is as follows:

    yeast – 16

    fruit fly – 110

    plant (Arabidopsis thaliana) – 75

    human – 114



  • The Arabidopsis genome sequence contains 75 distinct members of the bZIP family, of which ~50 are not well studied.

  • Using common domains the bZIP family can be subdivided into 10 groups: Groups A - S.

C & S protein interaction

  • Ehlert et al. measured interactions between C and S proteins.

  • C and S1 proteins heterodimerized.

  • Two S2 proteins dimerized.

Effect of Light & CO2 on C & S proteins

  • Carbohydrate signaling

    Increase of carbohydrate partitioning in elevated CO2, and a decrease in low light.

  • Seed development

    The photosensory system detects the quality, quantity, direction and duration of light, and controls the developmental pattern.

  • Stress

    Light-dependent generation of active oxygen species leads to a type of stress called photo-oxidative stress.

Experiment Selection Criteria

  • a) Chose C and S bZIP proteins

    • Coexpression Engine:

  • b) Selected tissue and array type

  • c) Chose specific experiment

a) Chose C and S bZIP proteins

b) Selected tissue and array type

c) Chose specific experiment: NASC Experiments



  • Biologically feasible comparisons due to similar:

    • Tissue types

    • Experiment conditions

  • Statistical:

    • Measurement protocol

    The tool used

    • Co-expression Analysis Tool, version 2.0, developed at the Section on Statistical Genetics, UAB, and built mainly to analyze co-expression in Arabidopsis.

    • NASC experiments that used Affymetrix GeneChip profiling to study the effect of light and CO2 on leaf development in Arabidopsis were used.

    • The tool uses a database built from the Nottingham Arabidopsis Stock Center (NASC) AffyWatch Service.

    • Version 2, used in this project, contains a total of 566 microarray chips, of which 486 ATH1 microarray chips were used.

    NASC Experiments used

    • Four experiments examined developing leaf insertions under varying conditions of light and CO2.

    • Samples were taken at 0, 2, 4, 12, 24, 48 and 96 hours, using a batch of 24 plants.

    • Four replicates were produced for each of the seven time points per experiment.

    Working of the tool

    • Linear regression analysis is done on the probe sets.

    • The regression yields three important values: the slope parameter (direction of co-expression), the p-value (confidence in the correlation), and the R squared value (strength of the correlation).
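    A rough sketch of what such a regression produces for one pair of probe sets, assuming simple least squares of one expression profile on another. The tool's internals are not described here, so the function name and the toy expression values are assumptions; the t statistic is returned in place of the p-value, which would be read off a t distribution with n - 2 degrees of freedom.

```python
import math


def coexpression_regression(x, y):
    """Simple linear regression of expression profile y on profile x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                        # sign gives direction of co-expression
    r_squared = (sxy * sxy) / (sxx * syy)    # strength of the linear relationship
    if r_squared < 1.0:
        t_stat = math.copysign(
            math.sqrt(r_squared * (n - 2) / (1.0 - r_squared)), slope)
    else:
        t_stat = math.copysign(math.inf, slope)   # perfect fit
    return {"slope": slope, "r_squared": r_squared, "t_stat": t_stat}


# Toy expression values across five chips (invented)
fit = coexpression_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

    A positive slope with a high R squared suggests the two genes rise and fall together across the chips, while a negative slope suggests opposite behaviour.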



    • 4 genes of C group, 5 genes of S1 group and 3 genes of S2 group were studied in the project.

    • We submit the AGI IDs, the tissue type (here leaf) and the experiment numbers (in our case 156, 157, 158 and 159) to the tool.

    • Our genes of interest are regressed on all 22,810 ATH1 probe sets, and a p-value, R squared value and slope parameter are obtained for each.

    • The genes were then sorted by R squared value and p-value and ranked so that the higher the R squared value, the higher the rank.

    • Using an arbitrary cut-off, the top-ranked 15% of genes were identified as highly co-expressed.
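    The ranking and cut-off can be sketched as follows. The record layout, the tie-break on p-value and the rounding down of the 15% count are assumptions made for this example rather than details taken from the tool.

```python
def highly_coexpressed(results, percent=15):
    """Rank by R squared (descending), break ties by p-value, keep the top `percent`."""
    ranked = sorted(results, key=lambda r: (-r["r_squared"], r["p_value"]))
    n_top = max(1, len(ranked) * percent // 100)   # integer arithmetic, rounds down
    return ranked[:n_top]


# Twenty made-up regression results
results = [
    {"probe": f"probe_{i}", "r_squared": i / 100, "p_value": 0.01}
    for i in range(20)
]
top = highly_coexpressed(results)
top_probes = [r["probe"] for r in top]
```

    With 20 candidate probe sets, a 15% cut-off keeps the three with the highest R squared values.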



    • Genes coding for dimerizing proteins should be coexpressed at the same time.

    • If proteins from groups C and S1 heterodimerize, then their genes should be coexpressed at the same time.

    Table 2: Mapping information between AtbZIP : AGI : ATH Probeset : AtbZIP Group Ids

    Table 3: Regression estimates between Group C AtbZIP63 (245925_at) and probes in Groups S1, C and S2.

    Table 4: Regression estimates between Group C AtbZIP25 (251848_at) and probes in Groups S1, C and S2.

    Regression estimates between Group C AtbZIP9 (246962_s_at) and probes in Groups S1, C and S2.



    • bZIP1 (Group S1) coexpresses well with bZIP63 (Group C) under ambient CO2 and low light, but the same coexpression is weak under elevated CO2 and ambient light.

    • Also, very minimal interaction was found between genes of Group C (bZIP25, bZIP10, bZIP9, and bZIP63) and bZIP9 (Group C



    • This bZIP study was a good litmus test for the SSG Coexpression Tool.

    • The results presented in this study provide evidence that a good, if not significant, number of AtbZIP proteins that interact as heterodimers are co-regulated under varying stress conditions.

    • This study shows that coexpression patterns in genes can be studied by pooling publicly available microarray data, and that a simple linear regression procedure is feasible for this purpose.



    • The varying trends in coexpression raise some questions:

      • Different genes are expressed in different tissues. Is a study on leaf alone enough to support our hypothesis?

      • Time-course data is valuable and should be accounted for in the analysis. However, this kind of analysis requires more observations recorded at different time points.

    • Linear regression works well, but would a robust time-series approach be more appropriate for our study?
