A Program to Transcribe a DNA Sequence into RNA

HI5100 Data Structures for BioInformatics Lesson 7 A Program to Transcribe a DNA Sequence into RNA

OBJECTIVES • In this lesson you will: • Develop an algorithm. • Translate the algorithm into a Python program.

DNA Sequence Representation • You can represent a DNA sequence as a string • DNA is composed of four nucleotides (bases) • Adenine: A • Cytosine: C • Guanine: G • Thymine: T • The single letters are standard IUB / IUPAC nucleic acid codes • IUB (International Union of Biochemistry) • IUPAC (International Union of Pure and Applied Chemistry, http://www.iupac.org/index_to.html)

DNA Sequence ACCGATACGCCACTTAACAG

DNA to RNA • Very complex cellular mechanism • Fairly simple from a programming standpoint: • Change all the T’s in the sequence to U’s
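For reference, the "change all the T's to U's" rule can be written in one call with Python's built-in `str.replace`. This is only a sketch of the end result; the rest of the lesson builds the same conversion by hand to practice algorithm development.

```python
dna = "ACCGATACGCCACTTAACAG"

# str.replace returns a new string; the original is left unchanged.
rna = dna.replace("T", "U")

print(dna)
print(rna)
```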

Understand the Problem: Take 1 • From a computer processing perspective • The computer must: • Examine each letter in the sequence • Determine if it is a T • If it is, replace it with a U

Example ACCGATACGCCACTTAACAG • First T is in position 5 (counting from 0, as Python does)
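A quick way to confirm the position is to ask Python for it. The sketch below uses the built-in `str.find`, which counts from 0; the variable names are chosen here for illustration.

```python
dna = "ACCGATACGCCACTTAACAG"

# str.find returns the zero-based index of the first match, or -1 if none.
first_t = dna.find("T")

print(first_t)       # 5
print(dna[first_t])  # T
```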

Example • Can you replace the letter in position 5 with a different letter, such as U? • ACCGATACGCCACTTAACAG • First T was in position 5

Try It

# DNA_to_RNA.py
def main():
    DNA1 = "ACCGATACGCCACTTAACAG"
    print(DNA1)
    DNA1[5] = "U"  # try to overwrite the T at index 5
    print(DNA1)

main()

Results • Python stops with TypeError: 'str' object does not support item assignment • Translation: You cannot assign a value to one item in a string sequence.
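The sketch below reproduces the failure in a controlled way and shows one workaround: building a new string from slices of the old one. The names `message` and `patched` are illustrative, not part of the lesson's program.

```python
dna = "ACCGATACGCCACTTAACAG"

try:
    dna[5] = "U"  # strings are immutable, so this raises TypeError
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Slicing never modifies dna; it builds a new string around the replaced letter.
patched = dna[:5] + "U" + dna[6:]
print(patched)
```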

Understand the Problem: Take 2 • The computer must: • Examine each letter in the sequence • Determine if it is a T • If it is, create a new string with a copy of everything up to the T • Concatenate a “U” • Continue looking through the original string for more T's

Example • Original string: ACCGATACGCCACTTAACAG • New string: ACCGA + U • Continue looking in original string

Build the Algorithm in Pseudocode

Set start_pos to 0
Find the position of the next T
Assign it to t_pos
Copy characters from start_pos to t_pos-1 into a new string
Concatenate a "U"
Set start_pos to t_pos+1
Continue looking at characters in original string
Repeat from second line

• Think of the pseudocode as a rough draft of the final algorithm
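One possible translation of this draft into Python is sketched below. It is not the only way to write it. The function name `transcribe` is made up for illustration, and the final step that copies the characters after the last T is an addition the draft pseudocode does not spell out.

```python
def transcribe(dna):
    """Return a copy of dna with every T replaced by U, using find and slices."""
    rna = ""
    start_pos = 0
    t_pos = dna.find("T", start_pos)       # -1 means no T was found
    while t_pos != -1:
        # The slice end is exclusive, so this copies start_pos .. t_pos-1.
        rna += dna[start_pos:t_pos] + "U"
        start_pos = t_pos + 1
        t_pos = dna.find("T", start_pos)
    # Added step (not in the draft): copy whatever follows the last T.
    rna += dna[start_pos:]
    return rna

result = transcribe("ACCGATACGCCACTTAACAG")
print(result)
```

Tracing the loop by hand on a short sequence is a good check that `start_pos` and `t_pos` move the way the pseudocode says they should.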

Or as a Flowchart – p. 1

Or as a Flowchart – p. 2 • Think of the flowchart as a rough draft of the algorithm at this point

Pseudocode / Flowchart • Once you have the algorithm developed to a point where you can write some code, proceed with • Write code • Test • Write code • Test

Assignment 2 • Use the pseudocode and flowchart drafts of the DNA to RNA algorithm to build a small program that: • Reads DNA sequences from a text file • Converts each sequence into RNA • Saves the RNA sequence to a text file • Prints the DNA sequence on one line • Directly underneath on the next line prints the RNA • Continues reading, converting, writing to file, and printing until there is no more data in the text file
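The read, convert, write, print loop could be organized roughly as follows. This is a sketch under several assumptions: the input holds one sequence per line, the file names and sample data are made up for the demo, and `str.replace` stands in for the conversion step you are asked to write yourself.

```python
import os
import tempfile

def convert_file(in_path, out_path):
    """Read one DNA sequence per line; write and print the RNA for each."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:                # the loop ends when the data runs out
            dna = line.strip()
            if not dna:
                continue                   # skip blank lines
            rna = dna.replace("T", "U")    # stand-in for your own algorithm
            outfile.write(rna + "\n")
            print(dna)
            print(rna)

# Demo with a throwaway input file (hypothetical names and data).
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "sample_dna.txt")
out_path = os.path.join(workdir, "sample_rna.txt")
with open(in_path, "w") as f:
    f.write("ACCGATACGCCACTTAACAG\nTTTT\n")

convert_file(in_path, out_path)
```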

Assignment 2 • A text file with several test strings is provided as DNAtest.csv • Add data to DNAtest.csv so that all important special test cases are demonstrated, for example: • A sequence that starts with “T” • A sequence that ends with “T” • A sequence with no Ts • A sequence that is all Ts
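The special cases listed above can also be checked in code before they go into the data file. In this sketch the four short sequences are invented for illustration, and `transcribe` is a placeholder built on `str.replace`; swap in your own function to test it.

```python
def transcribe(dna):
    # Placeholder conversion; replace with your own implementation.
    return dna.replace("T", "U")

# Each key is a DNA test sequence; each value is the RNA we expect back.
cases = {
    "TACG": "UACG",  # starts with T
    "ACGT": "ACGU",  # ends with T
    "ACGA": "ACGA",  # no T's
    "TTTT": "UUUU",  # all T's
}

results = {dna: transcribe(dna) for dna in cases}
for dna, expected in cases.items():
    assert results[dna] == expected, (dna, results[dna])
print("all special cases pass")
```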

Summary • That covers computing with Python in a nutshell • Now we are ready to tackle some data structures!

HI5100 Data Structures for BioInformatics Lesson 7 End of Slides for Lesson 7
