
MATLAB Bioinformatics Tools



Rob Henson

The MathWorks, Inc.

- Development manager for Bioinformatics group at The MathWorks
- Natick, MA

- Software developer
- Background in algorithm design and software engineering

- Write software for bioinformatics
- Sequence analysis
- Microarray data analysis

- Some consulting
- Bioinformatics algorithm design
- Machine learning tools
- E.g. neural networks, HMMs, etc.

>> map = eye(128);

>> spy(map(seq1,seq2))

Why does this work?

How could we make this better?

- Does map need to be 128?
- What is the right value?

- Can we use less memory?
- How do we deal with bad inputs?
- Can we extend this to look for longer patterns?
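One way to explore the first few questions: indexing `eye(128)` with character codes returns 1 exactly where the row code equals the column code, so `map(seq1,seq2)` is the match matrix. The sketch below is not taken from the slides. It assumes plain ASCII sequences (the two example sequences are made up), checks the inputs before indexing, and compares the lookup-table result with a direct comparison that never builds the 128-by-128 table.

```matlab
seq1 = 'ACGTTGCA';   % hypothetical example sequences
seq2 = 'ACGATGCA';

% Guard against bad inputs: every character code must be a valid index
% into the 128-by-128 map (codes 1..128 cover 7-bit ASCII).
allCodes = double([seq1 seq2]);
if any(allCodes < 1 | allCodes > 128)
    error('Sequences must contain only ASCII characters.');
end

% Original idea: use the character codes as row and column indices.
map = eye(128);
dots1 = map(double(seq1), double(seq2));

% Same result without the lookup table: compare every pair of characters.
% The result is a logical numel(seq1)-by-numel(seq2) matrix.
dots2 = bsxfun(@eq, seq1(:), seq2(:).');

% Both approaches should mark the same positions.
assert(isequal(logical(dots1), dots2));

spy(dots2)
```

For sequences that contain only upper-case letters, a smaller map would also do, since the largest code needed is that of 'Z' (90).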

- edit
- dbstop
- profiler
- Getting help
- Documentation
- Technical Support Knowledge Base
- Newsgroup

function matches = dotplot(seq1,seq2,window,stringency)
% DOTPLOT Visualize sequence matches.
%   DOTPLOT(S,T) plots the sequence matches of sequences S and T.
%
%   DOTPLOT(S,T,WINDOW,NUM) plots sequence matches when there
%   are at least NUM matches in a window of size WINDOW. For nucleotide
%   sequences a WINDOW of 11 and NUM of 7 is recommended in the
%   literature.
%
%   MATCHES = DOTPLOT(...) returns the number of dots in the dotplot
%   matrix.
%
%   Example:
%       moufflon = getgenbank('AB060288','sequence',true);
%       takin = getgenbank('AB060290','sequence',true);
%       dotplot(moufflon,takin,11,7)
%
%   This shows the similarities between prion protein (PrP) nucleotide
%   sequences of two ruminants, the moufflon and the golden takin.
%
%   See also NWALIGN, SWALIGN.
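The slides show only the help text, not the body of the function. As a rough sketch of how the WINDOW and NUM filtering described above could work, the code below counts matches along each diagonal window and keeps positions that reach the threshold. The example sequences, the small window and stringency values, and the choice to center the window on each position are all assumptions made for illustration.

```matlab
seq1 = 'ACGTTGCAACGTTGCA';   % hypothetical example sequences
seq2 = 'ACGATGCAACGTAGCA';
window = 5;                  % window size (assumed, kept small for the demo)
stringency = 4;              % minimum matches per window (assumed)

% Match matrix: 1 where seq1(i) equals seq2(j).
dots = double(bsxfun(@eq, seq1(:), seq2(:).'));

% Convolving with an identity kernel sums the matches along each
% diagonal run of length WINDOW, centered on every position.
counts = conv2(dots, eye(window), 'same');

% Keep only positions with enough matches in their window.
filtered = counts >= stringency;
matches = nnz(filtered);     % number of dots that would be plotted

spy(filtered)
```

Raising the stringency toward the window size removes more of the isolated chance matches and leaves the longer diagonal runs.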

- Amino acid composition
- histc function

- Molecular weight
- Indexing and sum function

- Hydrophobicity
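As a small illustration of the amino acid composition item, the sketch below counts how often each letter occurs by passing the character codes to `histc` with one bin per letter from A to Z. The binning and the bar chart are assumptions about how the slide intended `histc` to be used. Newer MATLAB releases recommend `histcounts` instead of `histc`.

```matlab
seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';

% One bin edge per letter; histc counts the codes equal to each edge.
edges = double('A'):double('Z');
counts = histc(double(upper(seq)), edges);

% Show only the letters that actually occur in the sequence.
present = counts > 0;
bar(counts(present));
set(gca, 'XTick', 1:nnz(present), ...
         'XTickLabel', num2cell(char(edges(present))));
ylabel('Count');
```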

A: 89.000
R: 174.000
N: 132.000
D: 133.000
C: 121.000
Q: 146.000
E: 147.000
G: 75.000
H: 155.000
I: 131.000
L: 131.000
K: 146.000
M: 149.000
F: 165.000
P: 115.000
S: 105.000
T: 119.000
W: 204.000
Y: 181.000
V: 117.000

http://cn.expasy.org/tools/pscale/Molecularweight.html

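mw = [ 89.0900    % A
        0         % B (no value)
      121.1500    % C
      133.1000    % D
      147.1300    % E
      165.1900    % F
       75.0700    % G
      155.1600    % H
      131.1700    % I
        0         % J (no value)
      146.1900    % K
      131.1700    % L
      149.2100    % M
      132.1200    % N
        0         % O (no value)
      115.1300    % P
      146.1500    % Q
      174.2000    % R
      105.0900    % S
      119.1200    % T
        0         % U (no value)
      117.1500    % V
      204.2300    % W
        0         % X (no value)
      181.1900];  % Y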

seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';
seqmw = mw(seq-'A'+1);
plot(seqmw)
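The "indexing and sum" idea can be carried one step further to estimate the weight of the whole chain. The sketch below rebuilds the same lookup table in a compact form and sums the per-residue values. Subtracting one water molecule per peptide bond is an addition not shown on the slides, and the value 18.015 for water is an assumed constant.

```matlab
% Lookup table indexed by letter (A = 1 ... Z = 26); unused letters stay 0.
letters = 'ACDEFGHIKLMNPQRSTVWY';
mw = zeros(1, 26);
mw(letters - 'A' + 1) = [ 89.09 121.15 133.10 147.13 165.19  75.07 155.16 ...
                         131.17 146.19 131.17 149.21 132.12 115.13 146.15 ...
                         174.20 105.09 119.12 117.15 204.23 181.19];

seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';

% Indexing: one weight per residue, in sequence order.
residueMw = mw(seq - 'A' + 1);

% Sum, then remove one water per peptide bond (assumption, see above).
waterMw = 18.015;
proteinMw = sum(residueMw) - (numel(seq) - 1) * waterMw;

fprintf('Estimated molecular weight: %.2f\n', proteinMw);
```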

1. Create a hydrophobicity plot

You can get the amino acid values from http://cn.expasy.org/cgi-bin/protscale.pl

Use Kyte & Doolittle’s values.

Create a function that has two inputs, the sequence and the window size. The function will create a hydrophobicity plot. The help for the function is on the next slide…

function hydrophobic(sequence, window_length)
% HYDROPHOBIC plots the hydrophobicity of an amino acid sequence
%   HYDROPHOBIC(SEQUENCE,WINDOW_LENGTH) creates a hydrophobicity plot of
%   SEQUENCE using a smoothing window of length WINDOW_LENGTH.
%
%   SEQUENCE must be a valid amino acid sequence. If SEQUENCE contains any
%   symbols other than the standard 20 amino acid letters, the function
%   will give an error message. SEQUENCE can be either upper or lower case.
%
%   WINDOW_LENGTH must be an odd positive integer.
%

2. Modify the function to return the maximum and minimum hydrophobicity values in the plot.

Make appropriate changes to the help for the function.
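One possible way to approach the two exercises is sketched below as a script rather than a finished function. It is only an illustration, not the intended solution: the Kyte & Doolittle values are typed in by hand, the window length of 9 is an arbitrary choice, and using a plain moving average that drops the incomplete windows at both ends is an assumption.

```matlab
% Kyte & Doolittle hydropathy values, indexed by letter (A = 1 ... Z = 26).
% NaN marks letters that are not one of the 20 standard amino acids.
letters = 'ARNDCQEGHILKMFPSTWYV';
values  = [ 1.8 -4.5 -3.5 -3.5  2.5 -3.5 -3.5 -0.4 -3.2  4.5 ...
            3.8 -3.9  1.9  2.8 -1.6 -0.8 -0.7 -0.9 -1.3  4.2];
kd = nan(1, 26);
kd(letters - 'A' + 1) = values;

sequence = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';
window_length = 9;           % assumed example value

% Input checks described in the help text.
sequence = upper(sequence);
if any(sequence < 'A' | sequence > 'Z') || any(isnan(kd(sequence - 'A' + 1)))
    error('SEQUENCE must contain only the 20 standard amino acid letters.');
end
if window_length < 1 || mod(window_length, 2) ~= 1
    error('WINDOW_LENGTH must be an odd positive integer.');
end

% Per-residue values, then a moving average over complete windows only.
h = kd(sequence - 'A' + 1);
smoothed = conv(h, ones(1, window_length) / window_length, 'valid');

% Each averaged value belongs to the residue at the center of its window.
half = (window_length - 1) / 2;
positions = (1 + half):(numel(sequence) - half);

plot(positions, smoothed);
xlabel('Residue position');
ylabel('Mean hydrophobicity');

% Exercise 2: extreme values of the plotted curve.
hmax = max(smoothed);
hmin = min(smoothed);
```

Wrapping this in `function [hmax, hmin] = hydrophobic(sequence, window_length)` would give the two return values that the second exercise asks for.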

- Alignment significance
- Alignment algorithms such as Smith-Waterman and Needleman-Wunsch always find some alignment. How do we know if what they find is significant or simply random?
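A common way to approach this question is to compare the observed score with scores obtained from shuffled sequences. The sketch below is one such permutation test and is not taken from the slides. It assumes the Bioinformatics Toolbox function `swalign` with its default scoring, a made-up second sequence, and an arbitrary number of trials.

```matlab
seq1 = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';
seq2 = 'MEEPQSDPSIEPPLSQETFSDLWKLLPENNVLSP';   % hypothetical second sequence

% Score of the real alignment (first output of swalign is the score).
observed = swalign(seq1, seq2);

% Scores against shuffled copies of seq2. Shuffling keeps the length and
% composition but destroys the order of the residues.
nTrials = 100;               % assumed; more trials give a finer estimate
randomScores = zeros(nTrials, 1);
for k = 1:nTrials
    shuffled = seq2(randperm(numel(seq2)));
    randomScores(k) = swalign(seq1, shuffled);
end

% Fraction of random scores at least as high as the observed one.
% Adding 1 to both counts keeps the estimate from being exactly zero.
pEstimate = (sum(randomScores >= observed) + 1) / (nTrials + 1);

fprintf('Observed score %.1f, estimated p = %.3f\n', observed, pEstimate);
```

A small estimate suggests that the alignment scores higher than would be expected from sequences of the same composition in random order.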