DNA ALPHABET
• DNA is the principal constituent of the genome. It may be regarded as a complex set of instructions for creating an organism.
• Four different bases (nucleic acid bases, which distinguish the four nucleotides) appear in DNA: adenine (A), guanine (G), cytosine (C), and thymine (T).
• Rule for base pairs (bp): A pairs with T and G pairs with C, giving four base-pair configurations: A-T, T-A, G-C, C-G.
• Each base pair carries a single piece of information in the DNA molecule; read three at a time, they specify the amino acids used to build proteins.
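DNA DOUBLE HELIX
• DNA is a double helix made of two complementary strands. Because each base pairs with exactly one partner base, the whole DNA molecule can be reconstructed from just one of its two strands.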
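The pairing rule, and the claim that one strand is enough to rebuild the other, can be illustrated with a short Python sketch. It assumes plain A/C/G/T input (no ambiguity codes), and the names `complement` and `reverse_complement` are illustrative helpers, not taken from any particular library. The reversed form reflects the standard convention that the two strands run in opposite directions, so the partner strand is usually written reversed.

```python
# Base-pairing rule from the slides: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for every base of `strand`, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction.

    The two strands run in opposite directions, so the complement is reversed.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    strand = "AAAGTCTGAC"
    partner = complement(strand)
    print(partner)                        # TTTCAGACTG
    print(reverse_complement(strand))     # GTCAGACTTT
    # Complementing twice recovers the original strand: one strand suffices.
    print(complement(partner) == strand)  # True
```

A base outside A/C/G/T raises a `KeyError` here; a real tool would also need to handle ambiguity codes such as N.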
CODONS
• The basic unit of the genetic code is the DNA base pair (bp). A human gene can range in size from thousands to hundreds of thousands of base pairs.
• Human DNA comprises approximately 3 billion base pairs (the Human Genome Project was the effort to decode all of these roughly 3 billion nucleotide base pairs).
• Three consecutive bases (three base pairs) form a codon, which codes for one amino acid (a low-level instruction). For example, the codon AGA corresponds to the base pairs A-T, G-C, A-T.
• Sequences of codons direct the assembly of amino acids into polypeptides and proteins; other genes produce functional RNA rather than protein.
• The products so formed mediate the growth and development of the organism.
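To make the codon idea concrete, here is a hedged Python sketch that cuts a sequence into consecutive triplets and spells out the base pairs of one codon, reproducing the AGA example. The `frame` parameter (the offset at which reading starts) and the helper names are illustrative assumptions. Translating codons into amino acids would additionally need the full 64-codon genetic code table, which is deliberately omitted.

```python
# Base-pairing rule: A-T and G-C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def split_codons(seq: str, frame: int = 0) -> list[str]:
    """Split `seq` into consecutive three-base codons, starting at offset `frame`.

    Bases left over at the end that cannot fill a whole codon are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def base_pairs(codon: str) -> list[str]:
    """Show each base of a codon together with its partner, e.g. 'A-T'."""
    return [f"{base}-{PAIR[base]}" for base in codon.upper()]


if __name__ == "__main__":
    print(split_codons("AAAGTCTGAC"))           # ['AAA', 'GTC', 'TGA']
    print(split_codons("AAAGTCTGAC", frame=1))  # ['AAG', 'TCT', 'GAC']
    print(base_pairs("AGA"))                    # ['A-T', 'G-C', 'A-T']
```

Shifting `frame` by one base changes every codon, which is why the starting point of reading matters.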
DNA SEQUENCE
• A DNA sequence is a succession of letters representing the structure of a DNA molecule or strand. The possible letters are A, C, G, and T, representing the four nucleotide subunits of a DNA strand (adenine, cytosine, guanine, thymine). They are typically printed next to one another without gaps, as in the sequence AAAGTCTGAC. Such a coded sequence is sometimes referred to as genetic information. A succession of more than four nucleotides is likely to be called a sequence.
• In genetics terminology, DNA sequencing is the process of determining the nucleotide order of a given DNA fragment.
• The DNA sequence encodes the information living things need to survive and reproduce. Determining the sequence is therefore useful both in 'pure' research into why and how organisms live and in applied subjects.
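Because a DNA sequence is just a string over the four letters A, C, G and T, it can be checked and summarized with ordinary string tools. This is a minimal Python sketch, assuming the input contains only those letters (in either case); `is_dna_sequence` and `base_counts` are hypothetical helper names.

```python
from collections import Counter

DNA_LETTERS = frozenset("ACGT")


def is_dna_sequence(seq: str) -> bool:
    """True if `seq` is non-empty and uses only the letters A, C, G and T."""
    return bool(seq) and set(seq.upper()) <= DNA_LETTERS


def base_counts(seq: str) -> dict[str, int]:
    """Count how often each of the four bases occurs in `seq`."""
    if not is_dna_sequence(seq):
        raise ValueError(f"not a DNA sequence: {seq!r}")
    counts = Counter(seq.upper())
    return {base: counts[base] for base in "ACGT"}


if __name__ == "__main__":
    print(is_dna_sequence("AAAGTCTGAC"))  # True
    print(is_dna_sequence("AAAGUCUGAC"))  # False (U is an RNA base)
    print(base_counts("AAAGTCTGAC"))      # {'A': 4, 'C': 2, 'G': 2, 'T': 2}
```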
String Searching Algorithms
• A DNA or RNA molecule can be represented as a string of nucleotide letters.
• String searching algorithms find the places where one or several strings (patterns) occur within a larger string (the text).
• Naïve string search: the simplest and least efficient way to see where one string occurs inside another is to check each position where it could start, one by one. First, check whether the pattern matches the text starting at the first character; if not, check starting at the second character; if not, at the third character; and so forth. For a text of length n and a pattern of length m, this takes up to (n - m + 1) × m character comparisons in the worst case.
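The naïve search described above translates almost line for line into code. This Python sketch is one straightforward rendering; the function name `naive_search` and the choice to report every (possibly overlapping) 0-based match position are assumptions for illustration.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based position in `text` where `pattern` starts.

    Every possible starting position is tried in turn, and overlapping
    occurrences are all reported.
    """
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        # Compare the pattern with the text at this start, stopping at the first mismatch.
        k = 0
        while k < m and text[start + k] == pattern[k]:
            k += 1
        if k == m:  # every character matched
            positions.append(start)
    return positions


if __name__ == "__main__":
    print(naive_search("AAAGTCTGAC", "TC"))   # [4]
    print(naive_search("AAAGTCTGAC", "AA"))   # [0, 1]
    print(naive_search("AAAGTCTGAC", "GGG"))  # []
```

Faster algorithms avoid re-examining text characters after a partial match, but this version mirrors the slide's description directly.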
DNA Sequence alignment
• Sequence alignment is an arrangement of two or more sequences that highlights their similarity. The sequences are padded with gaps (usually denoted by dashes) so that, wherever possible, columns contain identical or similar characters from the sequences involved. Example (a vertical bar marks an identical column):

    tcctctgcctctgccatcat---caaccccaaagt
    |||| ||| ||||| |||||   ||||||||||||
    tcctgtgcatctgcaatcatgggcaaccccaaagt

• Alignment is commonly used to study how DNA sequences have evolved from a common ancestor. Mismatches in the alignment correspond to mutations, and gaps correspond to insertions or deletions.
• The term sequence alignment may also refer to the process of constructing such an alignment, or of finding significant alignments in a database of potentially unrelated sequences.
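Once an alignment such as the example above has been constructed, its columns can be classified mechanically into matches, mismatches (candidate mutations) and gaps (candidate insertions or deletions). This Python sketch assumes both rows are already padded to equal length and that "-" is the only gap symbol. It does not build the alignment itself; that needs a dynamic-programming method such as Needleman-Wunsch, which is not shown. The function names are hypothetical.

```python
def match_line(top: str, bottom: str) -> str:
    """Build the middle line of an alignment display: '|' where the two columns are identical."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(top, bottom))


def classify_columns(top: str, bottom: str) -> dict[str, int]:
    """Count identical columns, mismatched columns and columns containing a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    summary = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            summary["gap"] += 1       # insertion or deletion
        elif a == b:
            summary["match"] += 1
        else:
            summary["mismatch"] += 1  # possible point mutation
    return summary


if __name__ == "__main__":
    top = "tcctctgcctctgccatcat---caaccccaaagt"
    bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
    print(top)
    print(match_line(top, bottom))
    print(bottom)
    print(classify_columns(top, bottom))  # {'match': 29, 'mismatch': 3, 'gap': 3}
```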
BIOINFORMATICS
• Bioinformatics was born of the need for high-powered computing to help organize, analyze, and store biological information, primarily DNA and protein sequence data.
• The gene sequence database in the United States is called GenBank and is administered by the National Center for Biotechnology Information (NCBI).
• Besides storing biological information, the database can be used to help analyze genes, their functions, and their evolution.
• The sequence of a DNA fragment that has been cloned and sequenced is entered into a search program called BLAST (Basic Local Alignment Search Tool) to determine whether 1) it has already been cloned, or 2) it is related to an already known gene. If it is a new gene sequence, its relatedness to other known sequences might help determine its biological function.
• The BLAST program lines up the query sequence with matching sequences in the database in an alignment and shows similar nucleotides by connecting them with a line. This gives an estimate of how closely the genes are related.
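BLAST itself is more elaborate than this description suggests: it finds short word matches ("seeds"), extends them heuristically, and reports statistical significance (E-values). As a hedged toy illustration of the underlying idea of scoring how well a query lines up with database entries, the Python sketch below swaps in the exact Smith-Waterman local-alignment score and ranks a small database by it. The scoring values (match +2, mismatch -1, gap -2), the database entries, and all names are assumptions for illustration; this is not how BLAST is implemented.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score of `a` against `b` (linear gap penalty).

    Only two rows of the dynamic-programming table are kept at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap in a
            # Local alignment: scores never drop below 0, so an alignment can start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score `query` against every entry and return (name, score) pairs, best first."""
    scored = [(name, local_alignment_score(query.upper(), seq.upper()))
              for name, seq in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy "database": entry names and sequences are made up.
    toy_db = {
        "entry_1": "TTTTTTTTTT",
        "entry_2": "CCGATTACACC",
        "entry_3": "GATTCCA",
    }
    for name, score in rank_database("GATTACA", toy_db):
        print(name, score)
```

With these made-up entries, entry_2 (which contains the query exactly) ranks first, entry_3 (one mismatch) second, and entry_1 last. Scoring every entry exhaustively like this is far too slow for a database the size of GenBank, which is the reason BLAST relies on heuristics.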