
15. Strings

Operations

Subscripting

Concatenation

Search

Numeric-String Conversions

Built-Ins: int2str, num2str, str2double


Previous Dealings

N = input('Enter Degree: ')

title('The Sine Function')

disp( sprintf('N = %2d',N) )


A String is an Array of Characters

'Aa7*>@ x!'

One character per array element: A a 7 * > @ (blank) x !

This string has length 9.


Why are Strings Important?

  1. Numerical data is often encoded as strings

  2. Genomic calculation/search


Numerical Data is Often Encoded in Strings

For example, a file containing Ithaca weather data begins with the string

W07629N4226

Longitude: 76° 29' West

Latitude: 42° 26' North


What We Would Like to Do

W07629N4226

Get hold of the substring '07629'.

Convert it to floating-point format so that it can be used in numerical calculations.


Format Issues

9 as an IEEE floating point number:

0100000blablahblah01001111000100010010

9 as a character:

01000otherblabla

The two representations are different.
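
The gap between the two representations can be observed directly. The following is a minimal sketch, not part of the original slides; the variable names are chosen here for illustration only.

```matlab
% The character '9' and the number 9 are stored differently.
c = '9';                   % a character
code = double(c);          % its character code, 57
x = str2double(c);         % the numeric value, 9

sameAsNumber = (c == 9);   % false: this compares the code 57 with 9
fprintf('code = %d, value = %d, equal = %d\n', code, x, sameAsNumber);
```

This is why a substring such as '07629' has to be converted before it can take part in arithmetic.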


Genomic Computations

Looking for patterns in a DNA sequence:

'ATTCTGACCTCGATC'

Pattern: ACCT


Genomic Computations

Quantifying Differences:

ATTCTGACCTCGATC

ATTGCTGACCTCGAT

Remove?



Strings Can Be Assigned to Variables

S = 'N = 2'

N = 2;

S = sprintf('N = %1d',N)

Either way, S holds 'N = 2'.

sprintf produces a formatted string using fprintf rules.


Strings Have a Length

s = 'abc';

n = length(s); % n = 3

s = ''; % the empty string

n = length(s) % n = 0

s = ' '; % single blank

n = length(s) % n = 1


Concatenation

This:

S = 'abc';

T = 'xy';

R = [S T]

is the same as this:

R = 'abcxy'


Repeated Concatenation

This:

s = '';
for k=1:5
    s = [s 'z'];
end

is the same as this:

s = 'zzzzz'


Replacing and Appending Characters

s = 'abc';

s(2) = 'x' % s = 'axc'

t = 'abc';

t(4) = 'd' % t = 'abcd'

v = '';

v(5) = 'x' % v has length 5: four padding characters (char(0)), then 'x'


Extracting Substrings

s = 'abcdef';

x = s(3) % x = 'c'

x = s(2:4) % x = 'bcd'

x = s(length(s)) % x = 'f'


Colon Notation

s( starting location : ending location )


Replacing Substrings

s = 'abcde';

s(2:4) = 'xyz' % s = 'axyze'

s = 'abcde';

s(2:4) = 'wxyz' % Error: three positions cannot hold four characters
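
When the replacement has a different length than the substring it replaces, one possible workaround is to rebuild the string by concatenation. This sketch is not from the slides; it only combines the subscripting and concatenation operations shown earlier.

```matlab
s = 'abcde';

% Keep s(1), insert the new text, then keep everything after position 4.
s = [s(1) 'wxyz' s(5:end)];

disp(s)   % awxyze
```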


Question Time

s = 'abcde';
for k=1:3
    s = [ s(4:5) s(1:3)];
end

What is the final value of s?

A. abcde B. bcdea C. eabcd D. deabc
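
One way to check an answer is to run the loop and print s after every pass. The sketch below adds only the fprintf call to the code in the question.

```matlab
s = 'abcde';
for k = 1:3
    % Move the last two characters to the front.
    s = [s(4:5) s(1:3)];
    fprintf('After pass %d: %s\n', k, s);
end
```

The passes produce deabc, bcdea, and eabcd in turn, which corresponds to choice C.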


Problem: DNA Strand

x is a string made up of the characters 'A', 'C', 'T', and 'G'.

Construct a string y obtained from x by replacing each A by T, each T by A, each C by G, and each G by C.

x: ACGTTGCAGTTCCATATG

y: TGCAACGTCAAGGTATAC


function y = Strand(x)

% x is a string consisting of

% the characters A, C, T, and G.

% y is a string obtained by

% replacing A by T, T by A,

% C by G and G by C.


Comparing Strings

Built-in function strcmp

strcmp(s1,s2) is true if the strings s1 and s2 are identical.


How y is Built Up

x: ACGTTGCAGTTCCATATG

y: TGCAACGTCAAGGTATAC

Start: y: ''

After 1 pass: y: T

After 2 passes: y: TG

After 3 passes: y: TGC


y = '';   % start with the empty string
for k=1:length(x)
    if strcmp(x(k),'A')
        y = [y 'T'];
    elseif strcmp(x(k),'T')
        y = [y 'A'];
    elseif strcmp(x(k),'C')
        y = [y 'G'];
    else
        y = [y 'C'];
    end
end
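
As an alternative to the character-by-character loop, the same replacement can be sketched with a lookup table indexed by character code. This is a different technique from the one on the slides, and the table name comp is hypothetical.

```matlab
x = 'ACGTTGCAGTTCCATATG';

% Hypothetical lookup table: the entry at a base's character code is its partner.
comp = blanks(128);
comp(double('ACGT')) = 'TGCA';

% Index the table with the codes of x to translate every character at once.
y = comp(double(x));
disp(y)   % TGCAACGTCAAGGTATAC
```

The loop version makes the rule explicit; the table version trades that for brevity and assumes x contains only the four expected characters.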


A DNA Search Problem

Suppose S and T are strings, e.g.,

S: 'ACCT'

T: 'ATGACCTGA'

We'd like to know if S is a substring of T and, if so, where the first occurrence is.


function k = FindCopy(S,T)

% S and T are strings.

% If S is not a substring of T,

% then k=0.

% Otherwise, k is the smallest

% integer so that S is identical

% to T(k:k+length(S)-1).


A DNA Search Problem

S: 'ACCT'

T: 'ATGACCTGA'

strcmp(S,T(1:4)) is false

strcmp(S,T(2:5)) is false

strcmp(S,T(3:6)) is false

strcmp(S,T(4:7)) is true


Pseudocode

First = 1; Last = length(S);

while S is not identical to T(First:Last)
    First = First + 1;
    Last = Last + 1;
end


Subscript Error

S: 'ACCT'

T: 'ATGACTGA'

strcmp(S,T(6:9)) % Error: T has only 8 characters

There's a problem if S is not a substring of T.


Pseudocode

First = 1; Last = length(S);

while Last<=length(T) && ...
      ~strcmp(S,T(First:Last))
    First = First + 1;
    Last = Last + 1;
end


Post-Loop Processing

Loop ends when this is false:

Last<=length(T) && ...

~strcmp(S,T(First:Last))


Post-Loop Processing

The loop ends for one of two reasons.

if Last>length(T)
    % No match found
    k=0;
else
    % There was a match
    k=First;
end
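
Putting the loop and the post-loop test together gives the body of FindCopy. The sketch below runs that logic as a script on the two example strings from the slides; the cell array targets and the result vector k are introduced here only for illustration.

```matlab
S = 'ACCT';
targets = {'ATGACCTGA', 'ATGACTGA'};   % the second contains no copy of S
k = zeros(1, length(targets));

for j = 1:length(targets)
    T = targets{j};
    First = 1;
    Last = length(S);
    % && short-circuits, so T(First:Last) is never read past the end of T.
    while Last <= length(T) && ~strcmp(S, T(First:Last))
        First = First + 1;
        Last = Last + 1;
    end
    if Last > length(T)
        k(j) = 0;        % no match found
    else
        k(j) = First;    % the first match starts here
    end
    fprintf('%s in %s: k = %d\n', S, T, k(j));
end
```

For comparison, the built-in strfind(T,S) returns the starting positions of all occurrences of S in T, or an empty array if there are none.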


Numeric/String Conversion


String-to-Numeric Conversion

An example…

Convention:

W07629N4226

Longitude: 76° 29' West

Latitude: 42° 26' North


String-to-Numeric Conversion

s = 'W07629N4226';

s1 = s(2:4);

x1 = str2double(s1);

s2 = s(5:6);

x2 = str2double(s2);

Longitude = x1 + x2/60

There are 60 minutes in a degree.
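
The same idea extends to the latitude. In the sketch below, the latitude positions (8:9 for degrees, 10:11 for minutes) are an assumption read off the example string, and the variable names are chosen here for illustration.

```matlab
s = 'W07629N4226';

lonDeg = str2double(s(2:4));     % 76
lonMin = str2double(s(5:6));     % 29
latDeg = str2double(s(8:9));     % 42 (assumed position)
latMin = str2double(s(10:11));   % 26 (assumed position)

% There are 60 minutes in a degree.
Longitude = lonDeg + lonMin/60;
Latitude  = latDeg + latMin/60;

fprintf('Longitude %.4f %s, Latitude %.4f %s\n', ...
        Longitude, s(1), Latitude, s(7));
```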


Numeric-to-String Conversion

x = 1234;

s = int2str(x); % s = '1234'

x = pi;

s = num2str(x,'%5.3f'); % s = '3.142'


Problem

Given a date in the format 'mm/dd', specify the next day in the same format.


y = Tomorrow(x)

x        y

02/28    03/01

07/13    07/14

12/31    01/01


Get the Day and Month

month = str2double(x(1:2));

day = str2double(x(4:5));

Thus, if x = '02/28' then month is assigned the numerical value of 2 and day is assigned the numerical value of 28.


% Number of days in each month (February is taken as 28 days)
L = [31 28 31 30 31 30 31 31 30 31 30 31];
if day+1<=L(month)
    % Tomorrow is in the same month
    newDay = day+1;
    newMonth = month;
else
    % Tomorrow is in the next month
    newDay = 1;
    if month<12
        newMonth = month+1;
    else
        newMonth = 1;
    end
end


The New Day String

Compute newDay (numerical) and convert…

d = int2str(newDay);
if length(d)==1
    d = ['0' d];
end


The New Month String

Compute newMonth (numerical) and convert…

m = int2str(newMonth);
if length(m)==1
    m = ['0' m];
end
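
The pieces above can be assembled into the body of Tomorrow. The sketch below runs them as a script on the three example dates; the final step that joins m and d with a slash is an assumption, since the slides stop before showing it, and the cell arrays dates and next are introduced here for illustration.

```matlab
dates = {'02/28', '07/13', '12/31'};
next = cell(size(dates));
L = [31 28 31 30 31 30 31 31 30 31 30 31];   % February taken as 28 days

for j = 1:length(dates)
    x = dates{j};
    month = str2double(x(1:2));
    day = str2double(x(4:5));
    if day+1 <= L(month)
        newDay = day+1;  newMonth = month;     % same month
    elseif month < 12
        newDay = 1;      newMonth = month+1;   % next month
    else
        newDay = 1;      newMonth = 1;         % wrap to January
    end
    d = int2str(newDay);
    if length(d)==1, d = ['0' d]; end          % pad to two digits
    m = int2str(newMonth);
    if length(m)==1, m = ['0' m]; end
    next{j} = [m '/' d];                       % assumed final step
    fprintf('%s -> %s\n', x, next{j});
end
```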



Some Other Useful String Functions

str = 'Cs 1112';

length(str) % 7

isletter(str) % [1 1 0 0 0 0 0]

isspace(str) % [0 0 1 0 0 0 0]

lower(str) % 'cs 1112'

upper(str) % 'CS 1112'

ischar(str) % Is str a char array? True (1)

strcmp(str(1:2),'cs') % Compare strings str(1:2) & 'cs'. False (0)

strcmp(str(1:3),'CS') % False (0)


ASCII Characters (American Standard Code for Information Interchange)

ascii code    Character

    :             :

   65            'A'

   66            'B'

   67            'C'

    :             :

   90            'Z'

    :             :

ascii code    Character

    :             :

   48            '0'

   49            '1'

   50            '2'

    :             :

   57            '9'

    :             :


Character vs ASCII Code

str = 'Age 19' % a 1-d array of characters

code = double(str) % convert chars to ascii values

str1 = char(code) % convert ascii values to chars
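
Running these conversions shows the round trip between characters and codes. This is a small sketch using the same variable names as the slide.

```matlab
str = 'Age 19';
code = double(str);   % [65 103 101 32 49 57]
str1 = char(code);    % back to 'Age 19'

disp(code)
disp(str1)
```

Note that the blank has code 32 and the digit characters '1' and '9' have codes 49 and 57, not the values 1 and 9.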


Arithmetic and Relational Ops on Characters

  • 'c'-'a' gives 2

  • '6'-'5' gives 1

  • letter1='e'; letter2='f';

  • letter1-letter2 gives -1

  • 'c'>'a' gives true

  • letter1==letter2 gives false

  • 'A' + 2 gives 67

  • char('A'+2) gives 'C'
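
Character arithmetic also gives a way to turn digit characters into numbers by hand. The sketch below is an illustration only (str2double is the usual tool); the names vals and value are hypothetical.

```matlab
s = '07629';

% Subtracting '0' maps each digit character to its numeric value.
vals = s - '0';            % [0 7 6 2 9]

% Accumulate the digits from left to right.
value = 0;
for k = 1:length(vals)
    value = 10*value + vals(k);
end
disp(value)   % 7629
```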


Example: toUpper

Write a function toUpper(cha) to convert character cha to upper case if cha is a lower case letter. Return the converted letter. If cha is not a lower case letter, simply return the character cha.

Hint: Think about the distance between a letter and the base letter 'a' (or 'A'). E.g.,

a b c d e f g h …

A B C D E F G H …

distance = 'g'-'a' = 6 = 'G'-'A'

Of course, do not use the MATLAB function upper!


function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a lower case letter then up is just cha.

up = cha;
% cha is lower case if it is between 'a' and 'z'
if ( cha >= 'a' && cha <= 'z' )
    % Find distance of cha from 'a'
    offset = cha - 'a';
    % Go same distance from 'A'
    up = char('A' + offset);
end
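
The same offset idea can be applied to every character of a string. The sketch below inlines the toUpper logic in a loop so that it runs as a standalone script; the test string is made up for illustration.

```matlab
str = 'Cs 1112, go!';
up = str;

for k = 1:length(str)
    cha = str(k);
    if cha >= 'a' && cha <= 'z'
        offset = cha - 'a';           % distance of cha from 'a'
        up(k) = char('A' + offset);   % same distance from 'A'
    end
end

disp(up)   % CS 1112, GO!
```

Digits, blanks, and punctuation fall outside the range 'a' to 'z', so they are copied unchanged.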

