
Welcome to Introduction to Bioinformatics: Monday, 11 October

## PowerPoint Slideshow about 'Welcome to Introduction to Bioinformatics Monday, 11 October' - JasminFlorian


Welcome to Introduction to Bioinformatics: Monday, 11 October

- Characteristics of PSSMs
- How to make a PSSM
- Uncertainty and information
- How to score a sequence

Problem sets (Blast, Modeling)

Prediction of regulatory site

GTA…(8)…TAC

…(20-24)…TAnnnT

Differentiation in cyanobacteria: What does NtcA bind to?

Herrero et al (2001) J Bacteriol 183:411-425

Differentiation in cyanobacteria

Sequence upstream from hetQ

ttctatgagaatataaaattttccttaagtttct

aaaaccgaccattctgatgaataagtccggtttt

tgctttttcgctttatttatctatatttccaagt

ggggtgacaactatcttgccaatattgtcgttat

gaaaaaatctGTAacatgagaTACacaatagcatttatatttgcttTAgtaTctctctcttgggtggg

GTA…(8)…TAC: NtcA binding site

…(20-24)…TAnnnT: Promoter
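The two patterns above can be combined into a single regular expression and tried against the hetQ upstream fragment. This is a minimal sketch: the names `UPSTREAM` and `NTCA_PROMOTER`, the named groups, and the use of Python's `re` module are illustrative assumptions, and only the line of sequence carrying the capitalized site is searched.

```python
import re

# Last line of the hetQ upstream sequence shown on the slide.
UPSTREAM = "gaaaaaatctGTAacatgagaTACacaatagcatttatatttgcttTAgtaTctctctcttgggtggg"

# GTA, 8 arbitrary nucleotides, TAC, 20-24 arbitrary nucleotides, then TAnnnT.
NTCA_PROMOTER = re.compile(
    r"GTA(?P<gap>.{8})TAC(?P<spacer>.{20,24})TA.{3}T",
    re.IGNORECASE,  # the slide mixes upper and lower case
)

match = NTCA_PROMOTER.search(UPSTREAM)
if match:
    print("site starts at index", match.start())
    print("matched text:", match.group(0))
    print("spacer length:", len(match.group("spacer")))
else:
    print("no match")
```

A regular expression like this answers only yes or no: a site that differs from the pattern at a single position is rejected outright.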

Differentiation in cyanobacteria: Integration of signals through HetR

[Diagram labels: -N; NtcA; HetQ; ???; HetR (master regulator); position in cell cycle; level of PatS; level of HetN; genes needed for differentiation]

Stockholm

- Did you go for it? YES
- Did it bind NtcA? YES
- Did killing the site prevent heterocysts? NO
- Fame and fortune? NO
- Reasonable paper? YES

If hetQ isn’t the golden link, then what is?

[Diagram: -N → NtcA → ??? → HetR → genes needed for differentiation]

- Gene preceded by NtcA-binding site

- Blocking NtcA-binding affects gene expression

- Gene product required for hetR expression

Regexps may also “overfit” the model: they can be too strict and miss real binding sites.

Scenario 1: The aftermath

How to find?

- Search for GTA…(N8)…TAC…(N20-24)…TAnnnT?

Table 1: Examples of position-specific scoring matrices from sequence alignment

A. Sequence alignment

ATTTAGTATCAAAAATAACAATTC

GTTCTGTAACAAAGACTACAAAAC

ATTTTGTAGCTACTTATACTATTT

AAGCTGTAACAAAATCTACCAAAT

CATTTGTACAGTCTGTTACCTTTA

Position-specific scoring matrices: A better way

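B. Table of occurrences

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 3 | 2 | 0 | 0 | 1 | 0 | 0 | 5 | 2 | 1 | 3 | 4 | 3 | 2 | 2 | 1 | 1 | 5 | 0 | 2 | 4 | 2 | 2 | 1 |
| C | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 5 | 2 | 0 | 0 | 0 | 2 |
| G | 1 | 0 | 1 | 0 | 0 | 5 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| T | 0 | 3 | 4 | 3 | 4 | 0 | 5 | 0 | 1 | 0 | 1 | 1 | 0 | 2 | 2 | 2 | 4 | 0 | 0 | 1 | 1 | 3 | 3 | 2 |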
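The table of occurrences can be reproduced by counting nucleotides column by column. The sketch below assumes the alignment consists of the five 24-nucleotide sequences of Table 1A; `ALIGNMENT` and `count_table` are hypothetical names chosen for illustration.

```python
from collections import Counter

# The five aligned sequences of Table 1A, one string per row.
ALIGNMENT = [
    "ATTTAGTATCAAAAATAACAATTC",
    "GTTCTGTAACAAAGACTACAAAAC",
    "ATTTTGTAGCTACTTATACTATTT",
    "AAGCTGTAACAAAATCTACCAAAT",
    "CATTTGTACAGTCTGTTACCTTTA",
]


def count_table(alignment):
    """Return {nucleotide: [count in column 1, count in column 2, ...]}."""
    table = {nt: [] for nt in "ACGT"}
    # zip(*alignment) walks the alignment one column at a time.
    for column in zip(*alignment):
        counts = Counter(column)
        for nt in "ACGT":
            table[nt].append(counts[nt])
    return table


occurrences = count_table(ALIGNMENT)
for nt in "ACGT":
    print(nt, *occurrences[nt])
```

Each column sums to five, the number of sequences, which is a quick check that the alignment was read correctly.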

Position-specific scoring matrices: A better way

[Diagram labels: NtcA sequence alignment; ???; HetR]

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG

AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT

TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC

TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC

GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC

CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA

TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA

AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG

AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA

TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG

CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC

GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG

GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT

CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA

ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA

TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT

CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA

CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG

AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC

CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA

TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT

Good match to NtcA-site: CTCCGTAAAC CTCTAAC...

[Diagram labels: NtcA-based PSSM; Anabaena genome]

Position-specific scoring matrices: A better way

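B. Table of occurrences

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 1 | 0 | 0 | 5 | 2 | 1 | 3 | 4 | 3 |
| C | 2 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 2 |
| G | 0 | 0 | 5 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| T | 3 | 4 | 0 | 5 | 0 | 1 | 0 | 1 | 1 | 0 |

C. Position-specific scoring matrix (B = 0)

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | .20 | 0 | 0 | 1.0 | .40 | .20 | .60 | .80 | .60 |
| C | .40 | 0 | 0 | 0 | 0 | .20 | .80 | 0 | 0 | .40 |
| G | 0 | 0 | 1.0 | 0 | 0 | .20 | 0 | .20 | 0 | 0 |
| T | .60 | .80 | 0 | 1.0 | 0 | .20 | 0 | .20 | .20 | 0 |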
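Going from the table of occurrences to the B = 0 matrix is a division of every count by the number of aligned sequences. The sketch below assumes N = 5, which is the sum of every column in the counts; `COUNTS` and `to_frequencies` are illustrative names.

```python
# Ten-column table of occurrences (these match columns 4-13 of Table 1A).
COUNTS = {
    "A": [0, 1, 0, 0, 5, 2, 1, 3, 4, 3],
    "C": [2, 0, 0, 0, 0, 1, 4, 0, 0, 2],
    "G": [0, 0, 5, 0, 0, 1, 0, 1, 0, 0],
    "T": [3, 4, 0, 5, 0, 1, 0, 1, 1, 0],
}
N = 5  # number of aligned sequences (assumed from the column sums)


def to_frequencies(counts, n):
    """Convert raw counts to per-position frequencies (no pseudocounts)."""
    return {nt: [c / n for c in row] for nt, row in counts.items()}


pssm = to_frequencies(COUNTS, N)
for nt in "ACGT":
    print(nt, " ".join(f"{value:.2f}" for value in pssm[nt]))
```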

Position-specific scoring matrices: A better way

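Table 2: Scoring a sequence with a PSSM

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| urt-71 | T | A | G | T | A | T | C | A | A | A |
| Score | .6 | .2 | 1 | 1 | 1 | .2 | .8 | .6 | .8 | .6 |
| With pseudocounts | .51 | .24 | .75 | .79 | .79 | .24 | .61 | .51 | .65 | .51 |
| Normalized | 1.6 | .75 | 4.2 | 2.5 | 2.5 | .75 | 3.4 | 1.6 | 2.0 | 1.6 |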

Position-specific scoring matrices: A better way

Score = .60 * .20 * 1.0 * …
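The product above can be written out as a short function. This is a sketch using the B = 0 matrix and the urt-71 sequence from Table 2; `PSSM` and `score` are names introduced here for illustration.

```python
import math

# Position-specific scoring matrix with B = 0 (table C).
PSSM = {
    "A": [0, 0.20, 0, 0, 1.0, 0.40, 0.20, 0.60, 0.80, 0.60],
    "C": [0.40, 0, 0, 0, 0, 0.20, 0.80, 0, 0, 0.40],
    "G": [0, 0, 1.0, 0, 0, 0.20, 0, 0.20, 0, 0],
    "T": [0.60, 0.80, 0, 1.0, 0, 0.20, 0, 0.20, 0.20, 0],
}


def score(sequence, pssm):
    """Multiply the matrix entries selected by each nucleotide of the sequence."""
    return math.prod(pssm[nt][i] for i, nt in enumerate(sequence))


urt_71 = "TAGTATCAAA"
print(score(urt_71, PSSM))        # product of .6, .2, 1, 1, 1, .2, .8, .6, .8, .6
print(score("AAGTATCAAA", PSSM))  # A never occurs at position 1, so the product is 0
```

The second call shows the weakness of B = 0: a single nucleotide that was never observed at a position forces the whole score to zero.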

Position-specific scoring matrices: Introduction of pseudocounts

A?

$q_{G,6}$ = 5 real counts

$p_G$ = ? pseudocounts

Score(position, nucleotide) = (q + p) / (N + B)

p = pseudocounts = B * (overall frequency of nucleotide)

[A] = 0.32, [T] = 0.32, [C] = 0.18, [G] = 0.18

B = total number of pseudocounts

= square root of N? or = 0.1?
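The effect of the two candidate values of B can be checked numerically. The sketch below applies the formula (q + p) / (N + B) with the background frequencies given above and assumes N = 5 sequences; `pssm_entry` is a hypothetical helper name.

```python
import math

# Overall nucleotide frequencies from the slide.
BACKGROUND = {"A": 0.32, "C": 0.18, "G": 0.18, "T": 0.32}


def pssm_entry(q, nucleotide, n, b):
    """Matrix entry from q real counts plus p = b * background pseudocounts."""
    p = b * BACKGROUND[nucleotide]
    return (q + p) / (n + b)


N = 5  # number of sequences in the alignment (assumption from the count table)
for b in (math.sqrt(N), 0.1):
    always_g = pssm_entry(5, "G", N, b)  # G observed in all five sequences
    never_a = pssm_entry(0, "A", N, b)   # A never observed at that position
    print(f"B = {b:.1f}: G with 5 counts -> {always_g:.2f}, A with 0 counts -> {never_a:.3f}")
```

With B equal to the square root of N the unseen nucleotide keeps a score near 0.1, whereas B = 0.1 leaves it close to zero, so the choice of B sets how forgiving the matrix is toward variants absent from the training set.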

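C. Position-specific scoring matrix (B = 0)

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| A | 0 | .20 | 0 | 0 | 1.0 | .40 | .20 |
| C | .40 | 0 | 0 | 0 | 0 | .20 | .80 |
| G | 0 | 0 | 1.0 | 0 | 0 | .20 | 0 |
| T | .60 | .80 | 0 | 1.0 | 0 | .20 | 0 |

D. Position-specific scoring matrix (B = √N = 2.2)

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| A | .099 | .24 | .099 | .099 | .79 | .38 | .24 |
| C | .33 | .056 | .056 | .056 | .056 | .19 | .61 |
| G | .056 | .056 | .75 | .056 | .056 | .19 | .056 |
| T | .51 | .65 | .099 | .79 | .099 | .24 | .099 |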

Position-specific scoring matrices: Normalization

How to account for similarity due to similar base composition?

Compare Score(PSSM) / Score(background frequency)

0.79 / 0.32 ≈ 2.5
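Normalization divides each pseudocount-adjusted score by the background frequency of the nucleotide it belongs to. The sketch below recomputes the "Normalized" row of Table 2 from its pseudocount row; the variable names are illustrative, and small differences from the table come from the table's rounding.

```python
# Overall nucleotide frequencies used as the background model.
BACKGROUND = {"A": 0.32, "C": 0.18, "G": 0.18, "T": 0.32}

sequence = "TAGTATCAAA"  # urt-71
with_pseudocounts = [0.51, 0.24, 0.75, 0.79, 0.79, 0.24, 0.61, 0.51, 0.65, 0.51]

# Ratio of the PSSM score to the score expected from base composition alone.
normalized = [
    score / BACKGROUND[nt] for nt, score in zip(sequence, with_pseudocounts)
]

for position, (nt, value) in enumerate(zip(sequence, normalized), start=1):
    print(position, nt, f"{value:.2f}")
```

A ratio above 1 means the nucleotide is more common at that position of the binding site than in the genome overall. For T at position 4 the ratio is 0.79 / 0.32, about 2.47, in line with the 2.5 listed in Table 2.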

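E. Position-specific scoring matrix (B = 0.1)

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| A | .006 | .20 | .006 | .006 | .99 | .40 | .20 | .59 |
| C | .40 | .004 | .004 | .004 | .004 | .20 | .79 | .004 |
| G | .004 | .004 | .98 | .004 | .004 | .20 | .004 | .20 |
| T | .59 | .79 | .006 | .99 | .006 | .20 | .006 | .20 |

F. Position-specific scoring matrix: Log-odds form (B = 0.1)

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| A | 2.2 | 0.7 | 2.2 | 2.2 | 0.0 | 0.4 | 0.7 | 0.2 |
| C | 0.4 | 2.5 | 2.5 | 2.5 | 2.5 | 0.7 | 0.1 | 2.5 |
| G | 2.5 | 2.5 | 0.0 | 2.5 | 2.5 | 0.7 | 2.5 | 0.7 |
| T | 0.2 | 0.1 | 2.2 | 0.0 | 2.2 | 0.7 | 2.2 | 0.7 |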

Position-specific scoring matrices: Log odds form

Log odds = -log(score)

Score * score * score … becomes log + log + log …
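Taking logarithms turns the product of per-position scores into a sum, which is easier to compute and less prone to underflow for long sites. The sketch below assumes base-10 logarithms, an inference from table F (the entry .006 in table E corresponds to 2.2 there); `neg_log10` is an illustrative name.

```python
import math


def neg_log10(score):
    """Negative base-10 logarithm of a single matrix entry."""
    return -math.log10(score)


# Entries of table E (B = 0.1) for T, T, G at positions 1-3.
scores = [0.59, 0.79, 0.98]

product_form = neg_log10(math.prod(scores))   # log of the multiplied scores
sum_form = sum(neg_log10(s) for s in scores)  # sum of the individual logs

print(round(neg_log10(0.006), 1))            # 2.2, as in table F
print(math.isclose(product_form, sum_form))  # True
```

Because of the minus sign, a smaller total in this form means a better match to the matrix.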

Position-specific scoring matrices: Expand training set through orthologs

Table 3: Training set including sequences from two Nostocs

71-devB CATTACTCCTTCAATCCCTCGCCCCTCATTTGTACAGTCTGTTACCTTTACCTGAAACAGATGAATGTAGAATTTA

Np-devB CCTTGACATTCATTCCCCCATCTCCCCATCTGTAGGCTCTGTTACGTTTTCGCGTCACAGATAAATGTAGAATTCA

71-glnA AGGTTAATATTACCTGTAATCCAGACGTTCTGTAACAAAGACTACAAAACTGTCTAATGTTTAGAATCTACGATAT

Np-glnA AGGTTAATATAACCTGATAATCCAGATATCTGTAACATAAGCTACAAAATCCGCTAATGTCTACTATTTAAGATAT

71-hetC GTTATTGTTAGGTTGCTATCGGAAAAAATCTGTAACATGAGATACACAATAGCATTTATATTTGCTTTAGTATCTC

71-nirA TATTAAACTTACGCATTAATACGAGAATTTTGTAGCTACTTATACTATTTTACCTGAGATCCCGACATAACCTTAG

Np-nirA CATCCATTTTCAGCAATTTTACTAAAAAATCGTAACAATTTATACGATTTTAACAGAAATCTCGTCTTAAGTTATG

71-ntcB ATTAATGAAATTTGTGTTAATTGCCAAAGCTGTAACAAAATCTACCAAATTGGGGAGCAAAATCAGCTAACTTAAT

Np-ntcB TTATACAAATGTAAATCACAGGAAAATTACTGTAACTAACTATACTAAATTGCGGAGAATAAACCGTTAACTTAGT

71-urt ATTAATTTTTATTTAAAGGAATTAGAATTTAGTATCAAAAATAACAATTCAATGGTTAAATATCAAACTAATATCA

Np-urt TTATTCTTCTGTAACAAAAATCAGGCGTTTGGTATCCAAGATAACTTTTTACTAGTAAACTATCGCACTATCATCA

Position-specific scoring matrices: Decrease complexity through info analysis

Uncertainty: $H_c = -\sum_i p_{ic} \log_2(p_{ic})$

$H_1 = -\left[\tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11}\right] = 1.87$

$H_{31} = -\left[\tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{8}{11}\log_2\tfrac{8}{11}\right] = 1.28$

Information content $= \sum_c (H_{\max} - H_c)$ (summed over all columns)
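The two worked columns can be verified with a few lines of code. The sketch below takes the nucleotide counts of columns 1 and 31 of Table 3 in the order A, C, G, T; it assumes that the maximum uncertainty is log2(4) = 2 bits for four equally likely nucleotides, and the function names are illustrative.

```python
import math

H_MAX = math.log2(4)  # 2 bits: uncertainty of a column with no preference


def uncertainty(counts):
    """Shannon uncertainty (bits) of one column from its nucleotide counts."""
    total = sum(counts)
    # Nucleotides with zero count contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_content(columns):
    """Sum of (H_MAX - H_c) over the given columns."""
    return sum(H_MAX - uncertainty(counts) for counts in columns)


column_1 = [4, 3, 1, 3]   # counts of A, C, G, T in column 1
column_31 = [1, 1, 1, 8]  # counts of A, C, G, T in column 31

print(round(uncertainty(column_1), 2))   # 1.87
print(round(uncertainty(column_31), 2))  # 1.28
print(round(information_content([column_1, column_31]), 2))
```

Column 1 is close to the 2-bit maximum and carries little information, while column 31, dominated by T, carries much more. Columns like the first are candidates for dropping when reducing the complexity of the matrix.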
