
My Master’s Work

Richa Tiwari

Outline of the talk
  • Analysis of Phylogeny Tree Evaluation Approaches (Project done in CS641).
  • Proteomics and 2-D Gel Electrophoresis (Study done for CS)
  • Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under conditions of differential light and CO2 levels (Project done for BST676).
Phylogenetic Analysis
  • Aligning the sequences
  • Determining whether the sequences are related
  • Choosing the most appropriate tree-building algorithm
  • Scrutinizing the tree to determine the level of confidence
Algorithmic Method
  • Defines an algorithm that leads to the determination of a tree.

Criteria Based Method

  • Defines a criterion for comparing different phylogenies, so that phylogenies can be ranked and compared.
Maximum Parsimony Method
  • “The most parsimonious tree will explain the observed character distribution with a tree that has the minimum tree length.”
  • Tree selection criterion: minimum tree length

(fewest character-state transformations)
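To make the tree-length criterion concrete, here is a minimal sketch of Fitch's small-parsimony algorithm. For a fixed rooted binary tree it counts the fewest state changes needed at each site and sums them over sites. The sketch assumes unordered states, no gaps, and trees written as nested tuples. It is not DNAPARS's own code, and DNAPARS also searches over tree topologies.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, number of changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can share a state: no extra change at this node.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the Fitch change counts over all alignment columns (parsimony tree length)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total
```

Take the hypothetical alignment `{"A": "AAG", "B": "AAC", "C": "GTC", "D": "GTC"}`. The topology `(("A", "B"), ("C", "D"))` needs 3 changes and `(("A", "C"), ("B", "D"))` needs 5, so the first is the more parsimonious tree.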

Maximum Likelihood (ML)
  • ML evaluates the probability that the chosen evolutionary model will have generated the observed sequences.
  • Evolutionary Model: Accounts for the changes in sequences.
  • Phylogenies are then inferred by finding those trees that yield the highest likelihood.
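The likelihood criterion is easiest to see in the smallest possible case: two aligned DNA sequences joined by a single branch of length t. The sketch below assumes the Jukes-Cantor (JC69) model. Under JC69 a site keeps its base with probability 1/4 + 3/4·e^(−4t/3). The sketch finds the maximum-likelihood t by a simple grid search. DNAML uses a richer model and also searches over tree topologies, so this only illustrates the criterion.

```python
import math
from typing import Tuple


def jc69_log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Log-likelihood of two aligned DNA sequences joined by a branch of length t (JC69)."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e  # probability the site shows the same base at both ends
    p_diff = 0.25 - 0.25 * e  # probability it shows one particular different base
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the equilibrium frequency of the starting base under JC69.
        log_l += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return log_l


def ml_branch_length(seq1: str, seq2: str, t_max: float = 3.0,
                     steps: int = 3000) -> Tuple[float, float]:
    """Grid search for the branch length in (0, t_max] with the highest log-likelihood."""
    best_t, best_ll = 0.0, -math.inf
    for i in range(1, steps + 1):  # start above 0, where p_diff would be zero
        t = t_max * i / steps
        ll = jc69_log_likelihood(seq1, seq2, t)
        if ll > best_ll:
            best_t, best_ll = t, ll
    return best_t, best_ll
```

Consider two hypothetical 100-site sequences that differ at 20 sites. The grid search lands within one grid step of the closed-form JC69 estimate −3/4·ln(1 − 4/3 · 0.2) ≈ 0.233.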
Distance Based Method
  • Distance-based methods estimate, for each pair of taxa, the total amount of change since the two last shared a common ancestor.
  • It is a clustering-based method: taxa are grouped according to these pairwise distances.
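As an example of a pairwise distance that corrects for multiple substitutions at the same site, the sketch below computes Jukes-Cantor (JC69) distances, d = −3/4·ln(1 − 4p/3). Here p is the proportion of aligned sites that differ. DNADIST offers several substitution models, and JC69 is used here only because it is the simplest. The sketch ignores gaps and ambiguous bases.

```python
import math
from typing import Dict, List, Tuple


def jc69_distance(s1: str, s2: str) -> float:
    """Jukes-Cantor distance between two aligned sequences of equal length."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: Dict[str, str]) -> Tuple[List[str], List[List[float]]]:
    """Return taxon names and the symmetric matrix of pairwise JC69 distances."""
    names = list(alignment)
    matrix = [[jc69_distance(alignment[a], alignment[b]) for b in names] for a in names]
    return names, matrix
```

The resulting matrix is the kind of input a clustering step such as neighbor joining takes.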
Software used


We used the following programs from the PHYLIP package to compare the three phylogeny methods:

  • Maximum parsimony: DNAPARS
  • Maximum likelihood: DNAML
  • Distance-based: DNADIST and NEIGHBOR
  • Tree drawing: DRAWGRAM
  • Consensus tree: CONSENSE
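The NEIGHBOR step can be illustrated with a minimal neighbor-joining sketch. Each round joins the pair of nodes that minimizes Q(i, j) = (n − 2)·d(i, j) − Σ_k d(i, k) − Σ_k d(j, k), estimates the two new branch lengths, and replaces the pair with a new node. The sketch assumes a complete, symmetric distance matrix with at least three taxa and breaks ties by taking the first pair found. It is not PHYLIP's implementation. The five-taxon matrix in the usage note is a standard textbook toy example, not data from this project.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [row[:] for row in dist]  # work on a copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Pick the pair (i, j) with the smallest Q value.
        best_q, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        dij = d[bi][bj]
        # Branch lengths from the two joined nodes to their new parent.
        li = 0.5 * dij + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = dij - li
        parent = f"({nodes[bi]}:{li:.4f},{nodes[bj]}:{lj:.4f})"
        # Distances from the new parent to every node.
        new_row = [0.5 * (d[bi][k] + d[bj][k] - dij) for k in range(n)]
        keep = [k for k in range(n) if k not in (bi, bj)]
        d = [[d[a][b] for b in keep] + [new_row[a]] for a in keep]
        d.append([new_row[k] for k in keep] + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Join the last three nodes at a single central node.
    l0 = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    l1 = d[0][1] - l0
    l2 = d[0][2] - l0
    return f"({nodes[0]}:{l0:.4f},{nodes[1]}:{l1:.4f},{nodes[2]}:{l2:.4f});"
```

Take the toy matrix for taxa a to e with rows (0, 5, 9, 9, 8), (5, 0, 10, 10, 9), (9, 10, 0, 8, 7), (9, 10, 8, 0, 3) and (8, 9, 7, 3, 0). The function returns `(d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);`.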
Using sample data…

[Figure: trees inferred from the sample data by the maximum parsimony, maximum likelihood and distance-based (neighbor-joining) methods, and the consensus tree for the example. Taxa shown include Human, Chimp, Gorilla, Orang and Rhesus.]

  • Reliability of branch-length estimates: NJ and ML > MP
  • Computational speed (n > 500):
    • NJ/DNADIST: 0.005 seconds
    • DNAPARS: 0.5 seconds
    • DNAML: 230.0 seconds

  • Our experiments indicate that the distance-based method is better than the other two in terms of speed, simplicity and performance with large numbers of taxa.
  • We can also say that, given a fast computer and a large dataset, the maximum likelihood method is preferable to maximum parsimony.

Proteomics
  • The entire set of proteins expressed by the genome in a cell, organ or organism is referred to as the proteome.
  • Proteomics: methods that discover and quantify proteins and their biochemical changes.
Application of Proteomics
  • Protein Mining
  • Network Mapping
  • Mapping Protein Modifications
Proteomics Analysis


2-D Gel Electrophoresis

The horizontal position of a spot tells us about the charge of the protein (its isoelectric point, pI), the vertical position tells us about its molecular weight, and the intensity of the spot tells us about the amount of that protein in the system.


1. Prepare the protein sample in solution.

2. Separate the proteins in two dimensions:

I. By pH (isoelectric point), using isoelectric focusing (IEF) on immobilized pH gradient (IPG) strips

II. By molecular weight (size), using gel electrophoresis

3. Stain the proteins to enable visualization.

Introduction to the project
  • This project focuses on the 2D gel electrophoretic separation of proteins.
  • We analyzed a few randomly chosen spots from 2D gels of rat mammary tissue.
  • We used statistical methods to find the variance in pI of the same protein across different gels.
  • We analyzed the reasons for these differences.
  • We inferred the relationship between the experimental values and the predicted values.
The gels we used came from a previously completed experiment. Twenty-eight random protein spots were selected from each of the three gels based on their intensity.

Mass Spectrometry

Differentially expressed proteins identified by image analysis were excised from the 2D gels and digested with trypsin. The resulting peptide fragments were analyzed on a MALDI mass spectrometer (MS). The MALDI spectrum displays a “peptide fingerprint” of the protein, made up of the corresponding peptide masses.

Proteins were identified by entering the peptide masses (ions from the MALDI spectrum) into a peptide mapping database. Examples of such protein search engines include:
  • Mascot (very popular; also used in this project)
  • Sequest
  • Aldente
  • ProteinLynx
  • Phenyx
  • We tabulated the results of the online database search alongside those obtained from the experiment.
  • We observed that the pI values and molecular weights of the same protein were not the same in all gels.
  • The pI values from the three gels were quite similar to one another, but they differed from the predicted pI values.
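The peptide-mass-fingerprint search described above can be sketched in a few steps:

1. Digest each candidate protein in silico with a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages).
2. Compute each peptide's monoisotopic [M+H]+ mass.
3. Count how many observed MALDI masses fall within a tolerance of a theoretical mass.

This is only an illustration under those assumptions. Real engines such as Mascot use probability-based scoring and handle missed cleavages and modifications. The protein sequences in the usage note are hypothetical, and the 0.2 Da tolerance is an arbitrary default.

```python
import re
from typing import Dict, List

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # charge carrier for singly protonated [M+H]+ ions


def tryptic_digest(protein: str) -> List[str]:
    """Cut after K or R unless followed by P; no missed cleavages."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def mh_mass(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed: List[float], protein: str, tol: float = 0.2) -> int:
    """Number of observed masses within tol Da of some theoretical tryptic peptide."""
    theoretical = [mh_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_candidates(observed: List[float], candidates: Dict[str, str]) -> List[str]:
    """Candidate protein names, best-matching first."""
    return sorted(candidates, key=lambda name: count_matches(observed, candidates[name]),
                  reverse=True)
```

For example, `tryptic_digest("AAKPAARGGK")` gives `["AAKPAAR", "GGK"]`. The K before P is not cleaved.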
In a 2D gel, the position of a protein spot can shift for various reasons, so the apparent molecular weight and pI values may also differ.
  • The pI values from the three gels (the experimental values) were not very different from each other.
  • We can therefore infer that variation due to non-biological factors in the experiment was very small.
  • For a few protein spots, the database search returned the same protein name, but our experiment gave different values. This may be because different groups (such as phosphate or sulphate) were attached to the protein, although there can be other reasons too.
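For each protein, the average deviation between the pI values observed in the three gels and the predicted pI value was calculated as:

$$\bar{d} = \frac{1}{3}\sum_{g=1}^{3}\left(\mathrm{pI}_{g} - \mathrm{pI}_{\mathrm{pred}}\right)$$

where pI_g is the pI observed in gel g (e.g. gel 12_5) and pI_pred is the predicted pI.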

This gave the results shown in the next slide. We obtained positive as well as negative values for the deviations.
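The averaging step can be written as a short function. The sketch below also reports the standard deviation across gels, which is one simple way to look at gel-to-gel variation in pI. The spot ID and pI values in the usage note are hypothetical, not the project's measurements.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def average_deviation(observed: List[float], predicted: float) -> float:
    """Mean of (observed pI - predicted pI) over the gels; negative means lower observed pI."""
    return mean(pi - predicted for pi in observed)


def summarize(spots: Dict[str, Tuple[List[float], float]]) -> Dict[str, Tuple[float, float]]:
    """Map spot ID -> (average deviation from predicted pI, standard deviation across gels)."""
    return {
        spot: (average_deviation(obs, pred), stdev(obs))
        for spot, (obs, pred) in spots.items()
    }
```

For a hypothetical spot observed at pI 5.0, 5.2 and 5.1 with a predicted pI of 5.4, `summarize` gives an average deviation of about −0.3 and a standard deviation of about 0.1 across gels.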

We interpret this to mean that the proteins were modified more often by negatively charged groups, which lowered their pI values.
  • Adding one phosphate group to a serine, threonine or tyrosine residue typically decreases a protein's isoelectric point by about 0.1 pH unit.
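Predicted pI values like those compared here are usually computed from the sequence. The method finds the pH at which the net charge, summed over all ionizable groups with the Henderson-Hasselbalch equation, is zero. The sketch below finds that pH by bisection. The pKa values are illustrative textbook-style numbers (an assumption, since tools use different sets). Each phosphate is crudely modeled as two acidic groups with assumed pKa values of 1.2 and 6.5. The sketch therefore shows only the direction of the effect, not the 0.1-unit figure.

```python
from typing import Dict

# Illustrative pKa values (assumption); published tools use different sets.
PKA_POSITIVE: Dict[str, float] = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE: Dict[str, float] = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_PHOSPHATE = (1.2, 6.5)  # assumed pKa values for the two acidic protons of a phosphate


def net_charge(seq: str, ph: float, n_phospho: int = 0) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    pos = [PKA_POSITIVE["Nterm"]] + [PKA_POSITIVE[aa] for aa in seq if aa in PKA_POSITIVE]
    neg = [PKA_NEGATIVE["Cterm"]] + [PKA_NEGATIVE[aa] for aa in seq if aa in PKA_NEGATIVE]
    neg += list(PKA_PHOSPHATE) * n_phospho
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in pos)
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in neg)
    return charge


def isoelectric_point(seq: str, n_phospho: int = 0, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; the net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid, n_phospho) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

For a short hypothetical peptide, adding one phosphate lowers the computed pI. How large the shift is depends on how many other ionizable groups titrate near the pI.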
Regression results
  • A statistical test was performed to determine which of the three gels gave pI values closest to the predicted pI values, that is, in which gel the proteins had been least modified.
  • The test was a calibration test. We fitted a regression model for each gel. The inverse regression equation used was:

$$\widehat{\mathrm{pI}}_{\mathrm{pred}} = \frac{\mathrm{pI}_{\mathrm{obs}} - \text{intercept}}{\text{slope}}$$

The results showed that all three gels gave almost the same calibrated pI values, and that these were quite far from the original predicted pI values.
  • This similarity among the three gels indicates that the difference between the predicted and experimental pI values is not mainly due to non-biological factors, but to chemical modifications of the proteins.
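Below is a minimal sketch of the calibration step. It assumes each gel's observed pI values are regressed on the predicted values by ordinary least squares, so that observed = intercept + slope × predicted. That fit is then inverted with the inverse regression equation from the slide. The function names and example values are hypothetical, and the software actually used in the project is not stated.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(predicted: List[float], observed: List[float]) -> Tuple[float, float]:
    """Least-squares fit of observed = intercept + slope * predicted; returns (intercept, slope)."""
    mx, my = mean(predicted), mean(observed)
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    return my - slope * mx, slope


def inverse_predict(observed_pi: float, intercept: float, slope: float) -> float:
    """Calibrated pI = (observed pI - intercept) / slope."""
    return (observed_pi - intercept) / slope
```

Fitting one line per gel and comparing the three intercept/slope pairs shows whether the gels distort pI in the same way.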

Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under conditions of differential light and CO2 levels.

Introduction: Transcription Factors
  • Transcription factors are proteins involved in the regulation of gene expression; they bind to promoter regions upstream of genes.
  • They are composed of two essential functional regions:

DNA binding domain – It binds to DNA.

Activator Domain – It interacts with other regulatory proteins, thereby affecting the efficiency of DNA binding.

bZIP proteins
  • bZIP proteins are a class of transcription factors with a leucine zipper motif, in which a leucine residue is repeated at every seventh position, forming an alpha-helical conformation.
  • The segment that comprises the basic region and the periodic array of leucine residues is referred to as ‘basic-region leucine zipper’ or bZIP motif.
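The "leucine at every seventh position" pattern can be checked with a simple scan for leucines spaced exactly seven residues apart. This is a simplified sketch under two assumptions: a run must contain at least four leucines (an arbitrary threshold), and no substitutions are allowed. Real zippers often tolerate other residues at some heptad positions, and the bZIP motif also requires the adjacent basic region, which this scan ignores. The test sequence is hypothetical.

```python
from typing import List, Tuple


def leucine_heptad_runs(seq: str, min_repeats: int = 4) -> List[Tuple[int, int]]:
    """Return (start index, number of leucines) for runs of L spaced exactly 7 apart."""
    runs = []
    for i, aa in enumerate(seq):
        if aa != "L":
            continue
        if i >= 7 and seq[i - 7] == "L":
            continue  # this leucine continues a run that started earlier
        count, j = 0, i
        while j < len(seq) and seq[j] == "L":
            count += 1
            j += 7
        if count >= min_repeats:
            runs.append((i, count))
    return runs
```

For the hypothetical sequence `"AAAAAAL" * 5`, the scan reports one run starting at index 6 with five leucines.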
Some facts
  • There are 792 bZIP proteins recorded in the nonredundant database.
  • The numbers of bZIP proteins in selected organisms are as follows:
    • Yeast: 16
    • Fruit fly: 110
    • Plant (Arabidopsis thaliana): 75
    • Human: 114
  • The Arabidopsis genome sequence contains 75 distinct members of the bZIP family, about 50 of which are not well studied.
  • Based on common domains, the bZIP family can be subdivided into 10 groups, designated A to I and S.
C & S protein interaction
  • Ehlert et al. measured interactions between C and S proteins.
  • Group C and group S1 proteins heterodimerized.
  • Two S2 proteins dimerized.
Effect of Light & CO2 on C & S proteins
  • Carbohydrate signaling

Increase of carbohydrate partitioning in elevated CO2, and a decrease in low light.

  • Seed development

The photosensory system detects the quality, quantity, direction and duration of light, and controls developmental patterns.

  • Stress

Light-dependent generation of active oxygen species causes a type of stress called photo-oxidative stress.

Experiment Selection Criteria
  • a) Chose C and S bZIP proteins
    • Coexpression Engine:
  • b) Selected tissue and array type
  • c) Chose specific experiment
  • Biologically feasible comparisons due to similar:
      • Tissue types
      • Experiment conditions
  • Statistical:
      • Measurement protocol
The tool used
  • Co-expression Analysis Tool, version 2.0, developed at the Section on Statistical Genetics, UAB

Built mainly to analyze co-expression in the Arabidopsis plant.

  • We used NASC experiments that applied Affymetrix GeneChip profiling to study the effect of light and CO2 on leaf development in Arabidopsis.
The tool uses a database built from the Nottingham Arabidopsis Stock Center (NASC) AffyWatch service.
  • Version 2, used in this project, contains a total of 566 microarray chips, of which 486 ATH1 microarray chips were used.
NASC Experiments used
  • Four experiments were conducted to examine developing leaf insertions under varying conditions of light and CO2.
  • Sampling was done at 0, 2, 4, 12, 24, 48 and 96 hours, using a batch of 24 plants.
  • Four replicates were produced for each of the seven time points per experiment.
Working of the tool
  • Linear regression analysis is done on the probe sets.
  • The regression gives three important values: the slope parameter (indicating the direction of co-expression), the p-value (the confidence in the correlation) and the R-squared value (the strength of the correlation).
  • Four genes of group C, five genes of group S1 and three genes of group S2 were studied in the project.
  • We submitted the AGI IDs, the tissue type (here, leaf) and the experiment numbers (in our case 156, 157, 158 and 159) to the tool.
  • Each gene of interest is regressed on all 22,810 ATH1 probe sets, yielding a p-value, an R-squared value and a slope parameter for each.
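The per-probe-set regression can be sketched in pure Python in three steps:

1. Fit y = a + b·x by least squares.
2. Take R-squared as the squared correlation.
3. Get a two-sided p-value for the slope from a t statistic with n − 2 degrees of freedom.

The Student t tail is computed through the regularized incomplete beta function, using the standard continued-fraction expansion. This is an assumption about how such values are usually computed, not the SSG tool's own code. In the tool's terms, x would be one probe set's expression across chips and y the gene of interest.

```python
import math
from typing import Sequence, Tuple


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of a Student t statistic with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def regress(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, R-squared, two-sided p-value for the slope) of y regressed on x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        return slope, 1.0, 0.0  # perfect fit: residual variance is zero
    df = n - 2
    t = math.sqrt(r2 * df / (1.0 - r2))  # |t| for the slope
    return slope, r2, t_two_sided_p(t, df)
```

As a sanity check, `t_two_sided_p(2.228, 10)` is close to 0.05, the familiar two-sided 5% critical value for 10 degrees of freedom.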
The genes were then sorted by R-squared value and p-value and ranked so that:

The higher the R-squared value, the higher the rank.

  • Using an arbitrary cut-off, the top 15% of ranked genes were identified as highly co-expressed.
  • Genes coding for dimerizing proteins should be coexpressed at the same time.
  • If group C and S1 proteins heterodimerize, their genes should be coexpressed at the same time.
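The ranking and cut-off step can be sketched as below. The sketch assumes each probe set's regression result is stored as (slope, R-squared, p-value). Probe sets are ranked by R-squared, with ties broken by the smaller p-value, and the top 15% is rounded up to a whole number of probe sets. The tie-breaking and rounding rules are assumptions, and the probe-set IDs in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def top_coexpressed(results: Dict[str, Tuple[float, float, float]],
                    fraction: float = 0.15) -> List[str]:
    """Return probe-set IDs in the top `fraction` when ranked by R-squared (then p-value)."""
    ranked = sorted(results, key=lambda pid: (-results[pid][1], results[pid][2]))
    # round() guards against floating-point noise in fraction * count before rounding up.
    k = max(1, math.ceil(round(fraction * len(ranked), 9)))
    return ranked[:k]
```

With 20 hypothetical probe sets `p0` to `p19` whose R-squared values increase with the index, the function keeps `p19`, `p18` and `p17`.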

Table 3: Regression estimates between Group C AtbZIP63 (245925_at) and probes in Groups S1, C and S2.


Table 4: Regression estimates between Group C AtbZIP25 (251848_at) and probes in Groups S1, C and S2.

  • bZIP1 (Group S1) coexpresses well with bZIP63 (Group C) under ambient CO2 and low light, but the same coexpression is weak under elevated CO2 and ambient light.
  • Also, very little interaction was found between the genes of Group C (bZIP25, bZIP10, bZIP9 and bZIP63) and bZIP9 (Group C).
  • This bZIP study was a good litmus test for the SSG Coexpression Tool.
  • The results of this study provide evidence that a good, if not statistically significant, number of AtbZIP proteins that interact as heterodimers are co-regulated under varying stress conditions.
  • This study shows that coexpression patterns in genes can be studied by pooling publicly available microarray data, and that a simple linear regression procedure is feasible for this purpose.
  • The varying coexpression trends raise some questions:
    • Different genes are expressed in different tissues. Is a study of leaf tissue alone sufficient to support our hypothesis?
    • Time-course data is valuable and should be accounted for in the analysis. However, this kind of analysis requires more observations recorded at different time points.
  • Linear regression works well, but would a robust time-series approach be more appropriate for our study?