Outline
• Lab 1 Solution
• Program 2
• Scoping
• Algorithm efficiency
• Sorting
• Hashes
• Review for midterm
• Quiz 3

BINF 634 Fall 2013 - LECTURE06

Lab 1 Solution

BINF634 Fall 2013 Regular Expression Lab (Key)

All problems except number 9 are worth 11 points. Number 9 is worth 12 points.

1) Write a Perl regular expression that would match only the strings: “bat”, “at”, and “t”.

/^(b?a)?t$/

(The simpler /^b?a?t$/ would also match “bt”.)

2) Write a Perl regular expression to recognize any string that contains the substring “jeff”.

/jeff/

3) Write a Perl regular expression that would match the strings: “bat”, “baat”, “baaat”, “baa…aat”, etc. (strings that start with b, followed by one or more a’s, ending with a t).

/^ba+t$/

4) Write a Perl regular expression that matches the strings: “hog”, “Hog”, “hOg”, “HOG”, “hOG”, etc. (That is, “hog” written in any combination of uppercase or lowercase letters.)

/^[hH][oO][Gg]$/

5) Write a Perl regular expression that matches any positive number (with or without a decimal point). Hint #1: if there is a decimal point, there must be at least one digit following the decimal point. Hint #2: Since the dot “.” matches any character, you must use \. to match a decimal point.

/^\d+(\.\d+)?$/

6) Write a Perl regular expression to match any integer that doesn’t end in 8.

/^\d*[0-79]$/

(A negated class such as [^8] would also accept a non-digit as the last character.)

7) Write a Perl regular expression to match any line with exactly two words (or numbers) separated by any amount of whitespace (spaces or tabs). There may or may not be whitespace at the beginning or end of the line.

/^\s*\w+\s+\w+\s*$/
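Anchored patterns like the ones in the key are easy to check against a handful of sample strings. The following is a minimal sketch, not part of the lab: the helper name matching and the sample strings are assumptions made for illustration, and the two patterns are the key's answers to problems 3 and 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the lab): returns the candidate strings
# that the compiled pattern $re matches.
sub matching {
    my ($re, @candidates) = @_;
    return grep { $_ =~ $re } @candidates;
}

# Problem 3: b, one or more a's, then t, and nothing else.
my $ba_t = qr/^ba+t$/;

# Problem 5: digits with an optional fractional part.
my $number = qr/^\d+(\.\d+)?$/;

print join(' ', matching($ba_t,   qw/ bt bat baaat bath /)), "\n";   # bat baaat
print join(' ', matching($number, qw/ 42 3.14 3. .5 1.2.3 /)), "\n"; # 42 3.14
```

Trying strings that should fail (such as “bt” or “3.”) is as important as trying strings that should match, because it shows whether the anchors and quantifiers are tight enough.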

Program 2 Discussions
• Questions on Program 2?
• Discussions on the permute function

Be Careful With Scope

Scoping

#!/usr/bin/perl
use strict;
use warnings;

my $x = 23;
print "value in main body is $x \n";
mysub($x);
print "value in main body is $x \n";
exit;

sub mysub{
    print "value in subroutine is $x \n";
    $x=33;
}

Output:

value in main body is 23
value in subroutine is 23
value in main body is 33

Here $x is declared at file scope, so mysub reads and changes the same variable as the main body.

#!/usr/bin/perl
use strict;
use warnings;

{
    my $x = 23;
    print "value in main body is $x \n";
    mysub($x);
    print "value in main body is $x \n";
    exit;
}

sub mysub{
    print "value in subroutine is $x \n";
    $x=33;
}

This will not compile: $x is declared with "my" inside the bare block, so it is not visible inside mysub, and "use strict" rejects the undeclared $x.

Be Careful With Scope (cont.)

Scoping

#!/usr/bin/perl
use strict;
use warnings;

{
    my $x = 23;
    print "value in main body is $x \n";
    mysub($x);
    print "value in main body is $x \n";
    exit;
}

sub mysub{
    my($x) = @_;
    $x=33;
    print "value in subroutine is $x \n";
}

Output:

value in main body is 23
value in subroutine is 33
value in main body is 23

mysub now works on its own copy of the argument, so the $x in the main block is unchanged.
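When a subroutine really does need to change a value for its caller, two common options are to return the new value or to pass a reference. The sketch below illustrates both under the same strict, block-scoped setup as the slides; the names mysub_return and mysub_ref are hypothetical and do not appear in the lecture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of mysub: works on a private copy and hands
# the new value back to the caller.
sub mysub_return {
    my ($x) = @_;        # private copy of the argument
    $x = 33;
    return $x;
}

# Hypothetical variant of mysub: receives a reference and changes the
# caller's variable through it.
sub mysub_ref {
    my ($x_ref) = @_;    # copy of the reference, still pointing at the caller's $x
    $$x_ref = 33;
    return;
}

{
    my $x = 23;

    my $y = mysub_return($x);
    print "after mysub_return: x is $x, y is $y\n";   # x is 23, y is 33

    mysub_ref(\$x);
    print "after mysub_ref: x is $x\n";               # x is 33
}
```

Either way the data flow is visible at the call site, which is harder to see when a subroutine silently modifies a file-scoped variable.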

Data Structures and Algorithm Efficiency

Algorithm Efficiency

Algorithm is O(N^2), where N = size of the lists

# An inefficient way to compute intersections
my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;
my @intersection = ();

for my $i (@a) {
    for my $j (@b) {
        if ($i eq $j) {
            push @intersection, $i;
            last;
        }
    }
}

print "@intersection\n";
exit;

Output:

A C D G H I J K X Z

Data Structures and Algorithm Efficiency

Algorithm Efficiency

Algorithm is O(N), where N = size of the lists

# A better way to compute intersections
my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;
my @intersection = ();

# "mark" each item in @a
my %mark = ();
for my $i (@a) { $mark{$i} = 1 }

# intersection = any "marked" item in @b
for my $j (@b) {
    if (exists $mark{$j}) {
        push @intersection, $j;
    }
}

print "@intersection\n";
exit;

Output:

A C D G H I J K X Z

version 1

version 2


Demonstration

Algorithm Efficiency

• Unix commands:
• /usr/bin/time
• diff
• cmp

% wc -l list1 list2
   24762 list1
   12381 list2
   37143 total

% /usr/bin/time intersect1.pl list1 list2 > out1
22.91 real 22.88 user 0.02 sys

% /usr/bin/time intersect2.pl list1 list2 > out2
0.06 real 0.05 user 0.00 sys

Ratio of user times: 22.88 / 0.05 ≈ 458
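The same comparison can be made from inside a single script. The sketch below wraps the two intersection approaches in subroutines and times each with Time::HiRes. The subroutine names and the synthetic input lists are assumptions for illustration (the lecture used the files list1 and list2), and the timings will vary by machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Nested-loop intersection, O(N^2): every item of the first list is
# compared against items of the second until a match is found.
sub intersect_nested {
    my ($a_ref, $b_ref) = @_;
    my @intersection;
    for my $i (@$a_ref) {
        for my $j (@$b_ref) {
            if ($i eq $j) { push @intersection, $i; last; }
        }
    }
    return @intersection;
}

# Hash-based intersection, O(N): mark the first list, then scan the second.
sub intersect_hash {
    my ($a_ref, $b_ref) = @_;
    my %mark = map { $_ => 1 } @$a_ref;
    return grep { exists $mark{$_} } @$b_ref;
}

# Synthetic input (an assumption, not the lecture's data files).
my @list1 = map { "id$_" } 1 .. 2000;
my @list2 = map { "id$_" } grep { $_ % 2 == 0 } 1 .. 2000;

for my $version ([nested => \&intersect_nested], [hash => \&intersect_hash]) {
    my ($name, $code) = @$version;
    my $start  = time();
    my @common = $code->(\@list1, \@list2);
    printf "%-6s %5d common items, %.4f seconds\n",
        $name, scalar(@common), time() - $start;
}
```

Note that the two versions can return the common items in different orders: the nested loop follows the first list, while the hash version follows the second.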

Hashes and Efficiency

Hashes

• Hashes provide a very fast way to look up information associated with a set of scalar values (keys)
• Examples:
• Count how many times each word appears in a file
• Also: whether or not a certain word appeared in a file
• Count how many times each codon appears in a DNA sequence
• Whether a given codon appears in a sequence
• How many times an item appears in a given list
• Intersections
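
As an illustration of the codon examples in the list above, here is a minimal sketch that counts codons with a hash. The subroutine name count_codons, the choice of reading frame 0, and the sample sequence are all assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping codons in reading frame 0.
# Returns a reference to a hash mapping codon => count.
# Trailing bases that do not fill a complete codon are ignored.
sub count_codons {
    my ($dna) = @_;
    my %count;
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        $count{ uc substr($dna, $i, 3) }++;
    }
    return \%count;
}

my $dna   = 'ATGGCCATGTAA';          # made-up example sequence
my $count = count_codons($dna);

# How many times does each codon appear?
for my $codon (sort keys %$count) {
    print "$codon: $count->{$codon}\n";
}

# Does a given codon appear at all?
print "GCC appears\n"   if exists $count->{GCC};
print "TTT is absent\n" unless exists $count->{TTT};
```

Each lookup or increment is a single hash operation, so the whole count takes one pass over the sequence.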


Examples

Hashes

• Write a subroutine get_intersection(\@a, \@b) that returns the intersection of two lists.
• Write a subroutine first_list_only(\@a, \@b) that returns the items that are in list @a but not in @b.
• Write a subroutine unique(@a) that returns the unique items in list @a (that is, removes the duplicates).
• Write a subroutine dups($n, @a) that returns a list of items that appear in @a at least $n times.
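One possible way to approach these four exercises with hashes is sketched below. This is not an official solution: details such as the order of the returned items, and whether duplicates are kept in first_list_only, are choices made here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Items that occur in both lists, each reported once, in the order of @$b_ref.
sub get_intersection {
    my ($a_ref, $b_ref) = @_;
    my %in_a = map { $_ => 1 } @$a_ref;
    my %seen;
    return grep { $in_a{$_} && !$seen{$_}++ } @$b_ref;
}

# Items of the first list that do not occur in the second
# (duplicates in the first list are kept in this sketch).
sub first_list_only {
    my ($a_ref, $b_ref) = @_;
    my %in_b = map { $_ => 1 } @$b_ref;
    return grep { !$in_b{$_} } @$a_ref;
}

# Each distinct item once, in order of first appearance.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Distinct items that appear at least $n times.
sub dups {
    my ($n, @items) = @_;
    my %count;
    $count{$_}++ for @items;
    return grep { $count{$_} >= $n } unique(@items);
}

my @first  = qw/ A B C A D A B /;
my @second = qw/ B D E /;

print join(' ', get_intersection(\@first, \@second)), "\n";  # B D
print join(' ', first_list_only(\@first, \@second)), "\n";   # A C A A
print join(' ', unique(@first)), "\n";                       # A B C D
print join(' ', dups(2, @first)), "\n";                      # A B
```

All four rely on the same idea as the faster intersection program: build a hash in one pass, then answer membership or count questions with constant-time lookups.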

Sorting

Sorting

• sort LIST -- returns list sorted in string order
• sort BLOCK LIST -- compares according to BLOCK
• sort SUBNAME LIST -- compares according to subroutine SUBNAME

Sorting: Our First Attempt

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort @unsorted;
    print "@unsorted \n";
    print "@sorted \n";
    exit;
}

Output:

17 8 2 111
111 17 2 8

The default sort compares its items as strings, so "111" comes before "17" and "2" comes before "8".

The Comparison Operator

Sorting

1. $a <=> $b compares numerically: it returns 0 if equal, 1 if $a > $b, -1 if $a < $b

2. The "cmp" operator gives the same kind of result for strings

3. $a and $b are special global variables: do NOT declare them with "my" and do NOT modify them.
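Because both operators return 0 for a tie, comparisons can be chained with || so that a second key decides ties on the first. The sketch below shows this; the subroutine name by_length_then_alpha and the sample data are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric versus string comparison of the same pair of values.
printf "%d %d\n", (2 <=> 10), ("2" cmp "10");   # -1 1

# Hypothetical comparison subroutine: shorter strings first;
# when the lengths are equal (<=> returns 0), fall back to cmp.
sub by_length_then_alpha {
    return length($a) <=> length($b) || $a cmp $b;
}

my @dna    = qw/ TTTT GT CTCAT AC ACGT /;
my @sorted = sort by_length_then_alpha @dna;
print "@sorted\n";                              # AC GT ACGT TTTT CTCAT
```

The first line of output shows why the choice of operator matters: 2 is numerically less than 10, but the string "2" sorts after the string "10".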

Sorting Numerically

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort { $a <=> $b } @unsorted;
    print "@unsorted \n";
    print "@sorted \n";
    exit;
}

Output:

17 8 2 111
2 8 17 111

Sorting Using a Subroutine

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort numerically @unsorted;
    print "@unsorted \n";
    print "@sorted \n";
    exit;
}

sub numerically { $a <=> $b }

Output:

17 8 2 111
2 8 17 111

Sorting Descending

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@reversesorted) = reverse sort numerically @unsorted;
    print "@unsorted \n";
    print "@reversesorted \n";
    exit;
}

sub numerically { $a <=> $b }

Output:

17 8 2 111
111 17 8 2

Sorting DNA by Length

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;

    ## Sort @dna by length:
    @dna = sort { length($a) <=> length($b) } @dna;
    print "@dna\n"; # Output: GT TTTT CTCAT TATAATG
    exit;
}

Output:

GT TTTT CTCAT TATAATG

Sorting DNA by Number of T’s (Largest First)

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;
    @dna = sort { ($b =~ tr/Tt//) <=> ($a =~ tr/Tt//) } @dna;
    print "@dna\n"; # Output: TTTT TATAATG CTCAT GT
    exit;
}

Output:

TTTT TATAATG CTCAT GT

Sorting DNA by Number of T’s (Largest First) (Take 2)

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;
    @dna = reverse sort {
        ($a =~ tr/Tt//) <=> ($b =~ tr/Tt//) } @dna;
    print "@dna\n"; # Output: TTTT TATAATG CTCAT GT
    exit;
}

Output:

TTTT TATAATG CTCAT GT

Sorting Strings Without Regard to Case

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    # Sort strings without regard to case:
    my(@unsorted) = qw/ mouse Rat HUMAN eColi /;
    my(@sorted) = sort { lc($a) cmp lc($b) } @unsorted;
    print "@unsorted \n";
    print "@sorted \n";
    exit;
}

Output:

mouse Rat HUMAN eColi
eColi HUMAN mouse Rat

Sorting Hashes by Value

Sorting

#!/usr/bin/perl
use strict;
use warnings;

{
    my(%sales_amount) = ( auto=>100, kitchen=>2000, hardware=>200 );

    sub bysales { $sales_amount{$b} <=> $sales_amount{$a} }

    for my $dept (sort bysales keys %sales_amount) {
        printf "%s:\t%4d\n", $dept, $sales_amount{$dept};
    }
    exit;
}

Output (department and amount are separated by a tab, shown here with 8-column tab stops):

kitchen:        2000
hardware:        200
auto:    100

Review for Midterm BINF634

Midterm

• Material
• Tisdall Chapters 1-9
• Wall Chapter 5
• Lecture notes
• The exam will be open book and notes
• You cannot work together on it
• You cannot use outside material
• You will have the full period to take the midterm
• You will be asked to program


Some Example Questions

Midterm

• Given two DNA fragments contained in $DNA1 and $DNA2, how can we concatenate these to make a third string $DNA3?

• What does this line of code do?

$RNA =~ s/T/U/ig;

• What does this statement do?

$revcom =~ tr/ACGT/TGCA/;

• What do these four lines do?

@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
unshift (@bases, $base1);
print "@bases\n\n";

• What does this code snippet do if COND is true?

unless(COND){
    #do something
}

• What does this code fragment do?

$protein = join('', @protein);

• What does this code fragment do?

$myfile = "myfile";
open(MYFILE, ">$myfile");

• What does this code fragment do?

while($DNA =~ /a/ig){$a++}

• What is the effect of using the command

use strict;

at the beginning of your program?

• What is contained in the reserved variable $0 and in the array @ARGV?

• What is the difference between “pass by value” and “pass by reference”?

• What is a pointer and what does it mean to dereference a pointer?

• How do you invoke perl with the debugger?

• Given an array @verbs, what is going on here?

$verbs[rand @verbs]

• Niklaus Wirth, Algorithms + Data Structures = Programs, Prentice Hall 1976.
• Dated in its choice of language (Pascal), but very well written and understandable