Searching
• Why does SSAHA compute locations of all n-mers?
• agttc occurs at (1,3) (1,18) (2,6) (3,13), …
• What is the cost of precomputation?
• Why are these ordered pairs sometimes stored on disk?
• Time-space tradeoff
• How do we search a string once for an n-mer?
• Java String method indexOf
• There are two versions!
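The precomputation asked about above can be sketched as a table from each n-mer to the ordered pairs where it starts. This is a minimal illustration, not SSAHA's actual implementation: the class name NmerIndex, the in-memory HashMap, and the numbering (1-based strand number, 0-based offset) are all assumptions. The sketch indexes every overlapping n-mer, so a strand of length L contributes L - n + 1 entries. That is the space paid up front so a later lookup is a single table access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NmerIndex {
    // Maps each n-mer to the (strand, offset) pairs where it starts.
    // Strand numbers are 1-based and offsets 0-based; both are assumptions.
    static Map<String, List<int[]>> build(List<String> strands, int n) {
        Map<String, List<int[]>> index = new HashMap<>();
        for (int i = 0; i < strands.size(); i++) {
            String s = strands.get(i);
            // Visit every start position that leaves room for n characters.
            for (int k = 0; k + n <= s.length(); k++) {
                String nmer = s.substring(k, k + n);
                index.computeIfAbsent(nmer, key -> new ArrayList<>())
                     .add(new int[] {i + 1, k});
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> strands = Arrays.asList("ccagttcgg", "agttcagttc");
        Map<String, List<int[]>> index = build(strands, 5);
        // prints (1,2) then (2,0) then (2,5), one pair per line
        for (int[] loc : index.get("agttc")) {
            System.out.println("(" + loc[0] + "," + loc[1] + ")");
        }
    }
}
```

For real genomes the table can outgrow memory, which is one reason the pairs might be kept on disk.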
What’s good, what’s bad? EcoRI

    ArrayList<String> list = new ArrayList<String>();
    String eco = "GAATTC";
    for (String s : strands) {
        for (int k = 0; k <= s.length() - eco.length(); k++) {
            // start at location k in s and location j in eco
            boolean match = true;
            for (int j = 0; j < eco.length(); j++) {
                if (eco.charAt(j) != s.charAt(k + j)) {
                    match = false;
                }
            }
            if (match) {
                list.add(s);
            }
        }
    }
Concepts from EcoRI
• What is the boolean match variable used for?
• Flag or state variable, value tells us something
• Once match is false, should we keep searching?
• Correctness and efficiency
• Why does the code fail?
• A test case causes failure, how can we fix this?
• Can we use another state variable? First-time?
• What else can the language provide for us?
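One possible answer to the questions above, as a sketch. It assumes the failing test case is a strand that contains GAATTC more than once, which the EcoRI loop adds to the list once per occurrence. The class and method names (EcoRISearch, strandsWithSite) are hypothetical. Two break statements handle both concerns: the inner one stops comparing after the first mismatch (efficiency), and the outer one stops scanning a strand once it has been added (correctness).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EcoRISearch {
    // Returns each strand that contains the site at least once, listed once.
    static List<String> strandsWithSite(List<String> strands, String site) {
        List<String> list = new ArrayList<>();
        for (String s : strands) {
            for (int k = 0; k <= s.length() - site.length(); k++) {
                boolean match = true;
                for (int j = 0; j < site.length(); j++) {
                    if (site.charAt(j) != s.charAt(k + j)) {
                        match = false;
                        break;  // one mismatch settles it for this k
                    }
                }
                if (match) {
                    list.add(s);
                    break;      // stop scanning s so it is added only once
                }
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> strands = Arrays.asList("GAATTCGAATTC", "CCCC", "AGAATTCA");
        // prints [GAATTCGAATTC, AGAATTCA]
        System.out.println(strandsWithSite(strands, "GAATTC"));
    }
}
```

Without the second break, the first strand in the usage example would appear in the list twice.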
Language constructs/idioms
• You need to stop a loop early
• Found what we’re looking for, calculation done
• Return early from method, break from loop
• You need to search in one string for another
• Why are two loops needed?
• Stopping point for one loop? The other?
• Common idiom, language provides a solution!
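The two idioms above can be put side by side in a small sketch. The names EarlyExit, matchesAt, containsByHand, and containsByLibrary are hypothetical. The hand-written version needs two loops: one picks a starting location in s, the other compares characters from that location. Each returns as soon as its answer is known, so no flag variable is needed. The library version hands both loops to indexOf.

```java
class EarlyExit {
    // Inner loop: does site occur in s starting exactly at location k?
    static boolean matchesAt(String s, String site, int k) {
        if (k < 0 || k + site.length() > s.length()) {
            return false;       // site would run off the end of s
        }
        for (int j = 0; j < site.length(); j++) {
            if (site.charAt(j) != s.charAt(k + j)) {
                return false;   // early return: mismatch found
            }
        }
        return true;
    }

    // Outer loop: try each start; stops at the last k where site still fits.
    static boolean containsByHand(String s, String site) {
        for (int k = 0; k <= s.length() - site.length(); k++) {
            if (matchesAt(s, site, k)) {
                return true;    // early return: found what we wanted
            }
        }
        return false;
    }

    // The library idiom that replaces both loops.
    static boolean containsByLibrary(String s, String site) {
        return s.indexOf(site) >= 0;
    }
}
```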
Two versions of indexOf
• One parameter: s.indexOf("GAATTC")
• Returns first location at which GAATTC is found
• Returns -1 if not found, why is this ok?
• Two parameters: s.indexOf("GAATTC", 16)
• First location at or after index 16
• Returns -1 if not found, why is this ok?
• Loop to find all occurrences: what’s the first value of the position/index second parameter?
• When do we stop the loop?
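A sketch of the find-all loop asked about above, using both versions of indexOf. The names FindAll and allIndexes are hypothetical, and the guard for an empty search string is an added assumption (without it the loop would never end, because indexOf always finds the empty string). The first search starts at index 0, each later search starts one past the previous hit, and the loop stops when indexOf returns -1. Since valid indexes are never negative, -1 cannot be confused with a real location.

```java
import java.util.ArrayList;
import java.util.List;

class FindAll {
    // Returns every index at which site occurs in s, in increasing order.
    static List<Integer> allIndexes(String s, String site) {
        List<Integer> found = new ArrayList<>();
        if (site.isEmpty()) {
            return found;                   // assumption: treat "" as no hits
        }
        int pos = s.indexOf(site);          // same as s.indexOf(site, 0)
        while (pos != -1) {
            found.add(pos);
            pos = s.indexOf(site, pos + 1); // pos + 1 also finds overlapping hits
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [0, 8]
        System.out.println(allIndexes("GAATTCAAGAATTC", "GAATTC"));
    }
}
```

Restarting at pos + 1 rather than pos + site.length() is a design choice: it reports overlapping occurrences too, which matters for patterns such as AA in AAAA.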
Eric Lander
• Leader of HGP
• Westinghouse winner at 17
• MacArthur Fellow
• NAS member
• City of Medicine Award!
• Math major at Princeton
• PhD Math as Rhodes Scholar
• Managerial Economics Prof at HBS 1981-1990
• Erdős number 2, Bacon number?