HI5100 Data Structures for BioInformatics Lesson 7 A Program to Transcribe a DNA Sequence into RNA
OBJECTIVES • In this lesson you will: • Develop an algorithm. • Translate the algorithm into a Python program.
DNA Sequence Representation • You can represent a DNA sequence as a string • DNA is composed of four nucleotides (bases) • Adenine: A • Cytosine: C • Guanine: G • Thymine: T • The single letters are standard IUB / IUPAC nucleic acid codes • IUB (International Union of Biochemistry) • IUPAC (International Union of Pure and Applied Chemistry, http://www.iupac.org/index_to.html)
DNA Sequence ACCGATACGCCACTTAACAG
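A minimal sketch of the string representation, assuming Python 3 style print() calls: the sequence from this slide is stored in an ordinary string and inspected with len() and indexing. The helper name is_valid_dna is hypothetical and does not come from the slides.

```python
# Store the example sequence as an ordinary Python string.
DNA1 = "ACCGATACGCCACTTAACAG"


def is_valid_dna(seq):
    """Return True if seq contains only the base codes A, C, G and T.

    Hypothetical helper for illustration; not part of the original slides.
    """
    return all(base in "ACGT" for base in seq)


print(len(DNA1))           # number of bases in the sequence
print(DNA1[0])             # first base; Python counts positions from 0
print(is_valid_dna(DNA1))  # checks every letter against the four codes
```

Because the sequence is just a string, every string operation Python offers (length, indexing, slicing, searching) is available for working with it.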
DNA to RNA • Very complex cellular mechanism • Fairly simple from a programming standpoint: • Change all the T’s in the sequence to U’s
Understand the Problem: Take 1 • From a computer processing perspective • The computer must: • Examine each letter in the sequence • Determine if it is a T • If it is, replace it with a U
Example ACCGATACGCCACTTAACAG First T is in position 5 (counting from 0, as Python does)
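The "examine each letter" step could be sketched as the loop below, which records where the Ts are. This is an illustration assuming Python 3 style print() calls; the variable name t_positions is introduced here and is not from the slides.

```python
DNA1 = "ACCGATACGCCACTTAACAG"

# Examine each letter in turn and record where the Ts are.
t_positions = []
for position, base in enumerate(DNA1):
    if base == "T":
        t_positions.append(position)

print(t_positions)      # positions are counted from 0
print(DNA1.find("T"))   # str.find returns the position of the first T
```

The built-in str.find method does the same search for a single letter and returns -1 when the letter is absent.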
Example • Can you replace the letter in position 5 with a different letter? • ACCGATACGCCACTTAACAG • Proposed replacement: U • First T was in position 5
Try It
    # DNA_to_RNA.py
    def main():
        DNA1 = "ACCGATACGCCACTTAACAG"
        print(DNA1)
        # Try to overwrite the T in position 5 with a U
        DNA1[5] = "U"
        print(DNA1)

    main()
Results • Python stops at the assignment with a TypeError: 'str' object does not support item assignment • Translation: You cannot assign a value to one item in a string sequence.
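One way to see this result without the program stopping is to catch the error, as in the sketch below. It assumes Python 3 style print() calls; the variable name message is introduced here for illustration.

```python
DNA1 = "ACCGATACGCCACTTAACAG"

try:
    DNA1[5] = "U"              # attempt to overwrite the T in position 5
except TypeError as error:
    message = str(error)       # Python's explanation of what went wrong
    print("TypeError:", message)

print(DNA1)                    # the original string is unchanged
```

Strings in Python are immutable, so any "change" has to be made by building a new string.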
Understand the Problem: Take 2 • The computer must: • Examine each letter in the sequence • Determine if it is a T • If it is, create a new string with a copy of everything up to the T • Concatenate a “U” • Continue looking through the original string for more Ts
Example • Original string: ACCGATACGCCACTTAACAG • New string so far: ACCGA + U • Continue looking in original string
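The first copy-and-concatenate step in this example might look like the following sketch, which uses slicing to copy everything before the first T. It assumes Python 3 style print() calls; the names RNA1 and remaining are introduced here for illustration.

```python
DNA1 = "ACCGATACGCCACTTAACAG"

t_pos = DNA1.find("T")         # position of the first T, counting from 0
RNA1 = DNA1[:t_pos] + "U"      # copy everything before the T, then add a U
print(RNA1)

# The rest of the original string still has to be examined for more Ts.
remaining = DNA1[t_pos + 1:]
print(remaining)
```

The slice DNA1[:t_pos] stops just before position t_pos, so the T itself is never copied.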
Build the Algorithm in Pseudocode
    Set start_pos to 0
    Find the position of the next T
    Assign it to t_pos
    Copy characters from start_pos to t_pos-1 into a new string
    Concatenate a "U"
    Set start_pos to t_pos+1
    Continue looking at characters in original string
    Repeat from second line
• Think of the pseudocode as a rough draft of the final algorithm
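One possible translation of the pseudocode draft into Python is sketched below. It is an illustration under assumptions, not the course's reference solution: the function name dna_to_rna is made up here, str.find is used for "find the position of the next T", and the final step that copies the characters after the last T is an addition the draft pseudocode does not spell out.

```python
def dna_to_rna(dna):
    """Build an RNA string from dna by replacing every T with a U."""
    rna = ""
    start_pos = 0
    t_pos = dna.find("T", start_pos)       # -1 means no T was found
    while t_pos != -1:
        rna = rna + dna[start_pos:t_pos]   # characters start_pos .. t_pos-1
        rna = rna + "U"                    # concatenate a U in place of the T
        start_pos = t_pos + 1
        t_pos = dna.find("T", start_pos)   # look for the next T
    # Assumed final step, not in the draft: copy what follows the last T.
    rna = rna + dna[start_pos:]
    return rna


print(dna_to_rna("ACCGATACGCCACTTAACAG"))
```

Testing a sketch like this quickly shows why the extra final step matters: without it, everything after the last T would be lost.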
Or as a Flowchart – p. 2 • Think of the flowchart as a rough draft of the algorithm at this point
Pseudocode / Flowchart • Once you have the algorithm developed to a point where you can write some code, proceed with • Write code • Test • Write code • Test
Assignment 2 • Use the pseudocode and flowchart drafts of the DNA to RNA algorithm to build a small program that: • Reads DNA sequences from a text file • Converts each sequence into RNA • Saves the RNA sequence to a text file • Prints the DNA sequence on one line • Directly underneath on the next line prints the RNA • Continues reading, converting, writing to file, and printing until there is no more data in the text file
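The read, convert, write, and print loop could be organized roughly as in the sketch below. Several things here are assumptions rather than part of the assignment: the input is taken to hold one sequence per line, the output file name RNAout.txt and the function names are hypothetical, and str.replace stands in for the algorithm drafted in the pseudocode. The demonstration writes a throwaway input file in a temporary directory so the sketch runs on its own.

```python
import os
import tempfile


def dna_to_rna(dna):
    # Stand-in for this sketch only: str.replace swaps every T for a U.
    return dna.replace("T", "U")


def convert_file(dna_path, rna_path):
    """Read one DNA sequence per line (an assumed layout), save and print the RNA."""
    with open(dna_path) as dna_file, open(rna_path, "w") as rna_file:
        for line in dna_file:          # stops when there is no more data
            dna = line.strip()
            if not dna:
                continue               # skip blank lines
            rna = dna_to_rna(dna)
            rna_file.write(rna + "\n")
            print(dna)                 # DNA on one line
            print(rna)                 # RNA directly underneath


# Demonstration with a throwaway input file, not the course's real data.
work_dir = tempfile.mkdtemp()
dna_path = os.path.join(work_dir, "DNAtest.csv")
rna_path = os.path.join(work_dir, "RNAout.txt")   # hypothetical output name
with open(dna_path, "w") as sample:
    sample.write("ACCGATACGCCACTTAACAG\nTTAGC\n")
convert_file(dna_path, rna_path)
```

Keeping the conversion in its own function means the file-handling loop does not change when the stand-in is replaced by the find-and-copy algorithm.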
Assignment 2 • A text file with several test strings is provided as DNAtest.csv • Add data to DNAtest.csv so that all important special test cases are demonstrated, for example: • A sequence that starts with “T” • A sequence that ends with “T” • A sequence with no Ts • A sequence that is all Ts
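The special cases listed above can also be checked in code before they go into the data file. The sketch below is one way to do that, assuming Python 3 style print() calls; the four short sequences are made-up examples, the function name dna_to_rna is hypothetical, and str.replace is used only as an independent check on the result.

```python
def dna_to_rna(dna):
    """Copy dna letter by letter, writing a U wherever a T appears."""
    rna = ""
    for base in dna:
        rna = rna + ("U" if base == "T" else base)
    return rna


# Made-up sequences, one per special case named on the slide.
special_cases = {
    "starts with T": "TACCG",
    "ends with T": "ACCGT",
    "no Ts": "ACCGA",
    "all Ts": "TTTT",
}

for label, dna in special_cases.items():
    rna = dna_to_rna(dna)
    # str.replace serves as an independent check on the result.
    assert rna == dna.replace("T", "U"), label
    print(label, dna, rna)
```

Cases at the very start and very end of a string are where position arithmetic such as t_pos-1 and t_pos+1 is most likely to go wrong, which is why they deserve their own test data.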
Summary • That covers computing with Python in a nutshell • Now we are ready to tackle some data structures!
HI5100 Data Structures for BioInformatics Lesson 7 End of Slides for Lesson 7