1 / 27

Data Structures and Algorithms

Data Structures and Algorithms. Outline. What is a data structure Examples elementary data structures hash tables Computer capabilities What is an algorithm Pseudocode/examples naïve alignment (and debugging). Data Structures.

ophrah
Download Presentation

Data Structures and Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Structures and Algorithms

  2. Outline • What is a data structure • Examples • elementary data structures • hash tables • Computer capabilities • What is an algorithm • Pseudocode/examples • naïve alignment (and debugging)

  3. Data Structures • Informal definition: an organization of information, usually in computer memory, to improve or simplify algorithm performance. Associated data structure algorithms typically exist to maintain the properties of data structures (search, insert, delete, push, pop, etc.)

  4. Data Structures • Elementary data structures • arrays • linear replication of a data type • useful for holding related items of identical type • multi-dimensional • conceptually, naturally maps to computer memory • Abstractions -- stacks and queues

  5. Arrays – allocation of space An array of chars (bytes): 0 1 2 3 4 5 6 7 8 9 A A A T G C T G A T 0 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 2 2 2 2 An array of integers: 0 1 2 3 4 5 6 7 8 9 100 200 250 269 300 11 12 13 1 15

  6. Languages In high level language such as C, data types are declared: int a, b, c; c = a+b; Perl: $c=$a+$b; Note that Perl does not require the specification of data type (however, as we will see later, this is useful for rapid prototyping, but can also be conducive to programming mistakes)

  7. Examples of C/Perl arrays C: char Dna[6]; char tissue[4][6]; Dna[0] = 'A'; Dna[1] = 'A'; strcpy(tissue[0],"liver"); strcpy(tissue[1],"kidney"); Perl: @Dna = (A, A, A, T, C, G); @tissue = (“liver”, “kidney”, “heart”, “brain”); $tissue[0]=“liver”; $tissue[1]=“kidney”;

  8. Java char myArray[]; // Note how the type declaration is de-coupled from the memory allocation myArray = new char[10]; myArray[0]='A';

  9. Stack Example • push 15 • push 6 • push 9 • push 2 • pop returns 2 • LIFO

  10. Queue Example • Enqueue(15) • Enqueue(6) • Enqueue(9) • Enqueue(2) • DeQueue returns 15 • FIFO

  11. Linked List • A linked list is a data structure in which the objects are arranged in a linear order, however, the order is encoded within the data structure itself by a “pointer” (as opposed to array indices). • “dynamic” • “sparse”

  12. Linked List

  13. Hash Table or Associative Array • A hash table is similar to an array, in that it is a linear collection of data types, with individual elements selected by some index value (key). Unlike arrays, the index values (keys) are arbitrary. • “hash function” maps keys to elements • do not have to search for values, but there is overhead of “hash function” • O(1) to examine an arbitrary position

  14. Array VS Hash Keys Values

  15. Hash Table Example %aminos = ( "TTT", "F", # Key Value pairs "TTC", "F", "TTA", "L", "TTG", "L", "CTT", "L", "CTC", "L", "CTA", "L", "CTG", "L", "ATT", "I", "ATC", "I", "ATA", "I", "ATG", "M", "GTT", "V", "GTC", "V", "GTA", "V", "GTG", "V", "TCT", "S", "TCC", "S", "TCA", "S", "TCG", "S“)

  16. Complicated extensions drug_target study_id clone date gene_identity id_technique cell_source pathology special_conditions regulation confirmation_diff_expression ocular_expr_profile cytogenetics genotyping_status priority reference Objects or Records

  17. Data Structures and Abstraction Applications Objects Communication and Data Sharing API Objects Data

  18. What computers/software can and cannot do • Can • simple (a=a+1) • fast (1 instruction in 1*10-9 s) • repetitive • Cannot • associate (a cloud looks like Mickey Mouse) • vision • however, we can define sets of rules that can stratify (becomes very complicated and difficult) • algorithms (computers) are black and white, and the world is gray

  19. Algorithms • Informally, an algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output. • finite set of steps, one or more operations per step • An algorithm is correct if, for every input instance, it halts with the correct output. • Example: sorting • Input: A sequence of n numbers (a1, a2, ….,an). • Output: A permutation (reordering) (a1’, a2’, …,an’) of the input sequence such that a1’<=a2’<=…<=an’.

  20. Algorithms • How to validate? • Mathematically prove (usually impractical) • Case base proving/testing • How to devise? • mimic a human procedure • follow a template • create • How to analyze? • complexity analysis • profiling

  21. Pseudocode • An abstract, informal representation of algorithms as computational operations that is similar to C, Pascal, Perl (or other programming languages). • Examples: • naïve sequence search/alignment • insertion sort (sort a hand of cards)

  22. Naïve Alignment • ATC • AAATCG NO • ATC • AAATCG NO • ATC • AAATCG YES

  23. Algorithms-- naïve alignment -- first try • Example – naïve sequence search and alignment • align some small number (10 nucleotides) -- called the "query" to some large number (3 billion nts) -- called the "subject" • 10 s with BLAT (uses significantly more efficient algorithm) snt[] = array of subject nucleotides qnt[] = array of query nucleotides for i = 0 to length(query) #i will be index for query sequence j=0 while (snt[i + j ] == qnt[j]) # but here, j is index for query sequence??? j=j+1 if (j == length (query)) found sequence at position i end query = ATC subject = AAATCG

  24. Algorithms- Refinement snt[] = array of subject nucleotides qnt[] = array of query nucleotides for i = 0 to length(subject) – length(query) j=0 while (snt[i + j ] == qnt[j]) j=j+1 if (j == length (query)) found sequence at position i end query = ATC subject = AAATCG Modern machine could do this, but what if query, subject are 100 nucleotides, and 30 billion? This can be done, but it will not scale to 100 seconds, because you can no longer hold 30 billion nucleotides in memory. You will have to swap portions of the 30 billion back to disk, and read in a new portion This overhead will adversely affect the performance of the algorithm

  25. Naïve Alignment j=1 j=0 • ATC • AAATCG NO • ATC • AAATCG NO • ATC • AAATCG YES • snt[] = array of subject nucleotides • qnt[] = array of query nucleotides • for i = 0 to length(subject) – length(query) • j=0 • while (snt[i + j ] == qnt[j]) • j=j+1 • if (j == length (query)) • found sequence at position i • end i=0 j=0 i=1

  26. End

More Related