
Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005



Presentation Transcript


  1. Welcome to Introduction to Bioinformatics, Wednesday, 13 April 2005. Rehash of Exam 1 (selected). Rehash of Exam 2 (selected). Discussion of DGPB, Chapter 6.

  2. Exam 1, Problem 6. 6c. Find first nucleotides of genes that don’t encode protein. (LOAD-SHARED-FILE noncoding-genes-of) (DEFINE "nc-genes" (NONCODING GENES OF S7942) (FOR EACH (gene IN nc-genes)) WITH beginning = SEQUENCE OF gene 1 – 3 DO COLLECT beginning (DEFINE variable AS value)

  3. Exam 1, Problem 6. 6c. Find first nucleotides of genes that don’t encode protein. (LOAD-SHARED-FILE "noncoding-genes-of") (DEFINE nc-genes AS (NONCODING-GENES-OF S7942)) (FOR-EACH gene IN nc-genes AS beginning = (SEQUENCE-OF gene FROM 1 TO 3) COLLECT beginning) (DEFINE variable AS value)

  4. Exam 1, Problem 6. 6c. Find first nucleotides of genes that don’t encode protein. (LOAD-SHARED-FILE "noncoding-genes-of") (DEFINE nc-genes AS (NONCODING-GENES-OF S7942)) (FOR-EACH gene IN nc-genes AS beginning = (SEQUENCE-OF gene FROM 1 TO 3) COLLECT beginning) :: ("GCG" "GCG" "GGA" "GCC" "GCC" "GGA" "GCG" "GGG" "GCC" "GGG" "GCG" "GCC" "GCG" "GGA" "GGG" "GCG" "GGG" "TCC" "GGT" "GGG" "GGG" "AAA" "GGA" "CCA" "TCC" "GGC" "GGC" "CGC" "CGG" "GGG" "GGG" "GCG" "AAA" "GGG" "GGG" "GGT" "TCC" "GGC" "TGG" "GGG" "GCG" "GGG" "GCC" "GGG" "GCC" "GGG" "CGG" "CGG" "GGG" "GCG" "GGG" "GGG")
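For readers without access to BioBIKE, the FOR-EACH ... COLLECT pattern in Problem 6c can be sketched in Python as a list comprehension. This is only an analogue, not the course's code: the function name `first_nucleotides` and the three example sequences are hypothetical stand-ins, not real S7942 data.

```python
from collections import Counter


def first_nucleotides(sequences, n=3):
    """Return the first n nucleotides of each sequence, in input order."""
    return [seq[:n] for seq in sequences]


# Hypothetical stand-ins for noncoding-gene sequences (NOT real S7942 data).
example_genes = ["GCGGAAGTAG", "GGAGCGGTAG", "GCGTCCTTAA"]

starts = first_nucleotides(example_genes)
print(starts)           # ['GCG', 'GGA', 'GCG']
print(Counter(starts))  # tally of how often each starting triplet occurs
```

Tallying the triplets, as in the last line, is one way to make a pattern such as the many G-rich starts in the result list easier to see.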

  5. Exam 1, Problem 8. 8. Thermophilus extremus G+C% content = 80%. 8a. Frequency of cutting of MseI (TTAA): G + C = 0.8, A + T = 0.2, so A = 0.1 and T = 0.1. Expected frequency of TTAA = 0.1 × 0.1 × 0.1 × 0.1 = 10^-4
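The arithmetic in 8a can be sketched in Python. The sketch assumes, as the slide does, that A and T are equally frequent, that G and C are equally frequent, and that positions are independent. The function names are illustrative, not from the course.

```python
def base_frequencies(gc_fraction):
    """Per-base frequencies, assuming A = T and G = C."""
    g_or_c = gc_fraction / 2
    a_or_t = (1 - gc_fraction) / 2
    return {"A": a_or_t, "T": a_or_t, "G": g_or_c, "C": g_or_c}


def expected_site_frequency(site, gc_fraction):
    """Probability that a given position starts the site, assuming independent positions."""
    freqs = base_frequencies(gc_fraction)
    probability = 1.0
    for base in site:
        probability *= freqs[base]
    return probability


# MseI recognizes TTAA; at 80% G+C each of its four bases has frequency 0.1.
print(expected_site_frequency("TTAA", 0.8))  # approximately 1e-4
```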

  6. Exam 1, Problem 8 8. Thermophilus extremus G+C% content = 80%. 8d. Test answer with 1000 random DNA sequences 1000-nucleotides in length (G+C% = 80%) (FOR-EACH iteration FROM 1 TO 1000 AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 1000) AS counts = (COUNT-OF "TTAA" IN seq) SUM counts) :: 103 ??? hits per trial?

  7. Exam 1, Problem 8 8. Thermophilus extremus G+C% content = 80%. 8d. Test answer with 1000 random DNA sequences 1000-nucleotides in length (G+C% = 80%) (FOR-EACH iteration FROM 1 TO 1000 AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 1000) AS counts = (COUNT-OF "TTAA" IN seq) SUM counts) :: 103 0.103 hits per trial?

  8. Exam 1, Problem 8 8. Thermophilus extremus G+C% content = 80%. 8e. Test answer with 1000 random DNA sequences 3000-nucleotides in length (G+C% = 80%) (FOR-EACH iteration FROM 1 TO 1000 AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 3000) AS counts = (COUNT-OF "TTAA" IN seq) SUM counts) :: 314 ??? hits per trial?

  9. Exam 1, Problem 8 8. Thermophilus extremus G+C% content = 80%. 8e. Test answer with 1000 random DNA sequences 3000-nucleotides in length (G+C% = 80%) (FOR-EACH iteration FROM 1 TO 1000 AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 3000) AS counts = (COUNT-OF "TTAA" IN seq) SUM counts) :: 314 0.314 hits per trial?
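The simulations in 8d and 8e can be approximated in Python. This is a sketch under stated assumptions, not the BioBIKE implementation: `random_dna` and `total_hits` are hypothetical helpers, the 1:1:4:4 weights mirror the A 1 T 1 C 4 G 4 arguments on the slides, and the seed is arbitrary, so the totals will differ from the 103 and 314 reported there.

```python
import random


def random_dna(length, rng, weights=(1, 1, 4, 4)):
    """Random sequence over A, T, C, G with the given relative weights (80% G+C by default)."""
    return "".join(rng.choices("ATCG", weights=weights, k=length))


def total_hits(site, n_trials, length, rng):
    """Sum of occurrences of site over n_trials random sequences.

    str.count counts non-overlapping matches; TTAA cannot overlap itself,
    so for this site the count equals the number of occurrences.
    """
    return sum(random_dna(length, rng).count(site) for _ in range(n_trials))


rng = random.Random(2005)  # arbitrary seed, only for reproducibility
hits_1000 = total_hits("TTAA", 1000, 1000, rng)
hits_3000 = total_hits("TTAA", 1000, 3000, rng)

# Hits per trial should land near 0.1 and 0.3 respectively.
print(hits_1000 / 1000, hits_3000 / 1000)
```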

  10. Exam 1, Problem 8. 8. Thermophilus extremus G+C% content = 80%. 8f. Interpret your results in light of the definition of E-value (or Expect value). Your results: expected frequency = 10^-4. 1000 DNA sequences of 1000 nucleotides → 0.103 hits per trial (E-value 0.1). 1000 DNA sequences of 3000 nucleotides → 0.314 hits per trial (E-value 0.3). E-value = (expected frequency) · (search space)
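The E-value definition on this slide can be checked against the simulation results with a few lines of Python. In this sketch the search space is taken to be the sequence length, as the slide's 0.1 and 0.3 figures imply. Strictly, a 4-nucleotide site has (length − 3) possible start positions, which changes the result only slightly.

```python
def e_value(expected_frequency, search_space):
    """E-value = (expected frequency) * (search space)."""
    return expected_frequency * search_space


expected_frequency = 1e-4  # from 8a: 0.1 * 0.1 * 0.1 * 0.1

# (sequence length, hits per trial reported on the slides)
for length, observed in [(1000, 0.103), (3000, 0.314)]:
    print(length, e_value(expected_frequency, length), observed)
```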

  11. Exam 1, Problem 10 Examine Fig. 4.11 in the text, focusing on the spot labeled TDH1.

  12. Exam 1, Problem 10 Examine Fig. 4.11 in the text, focusing on the spot labeled TDH1. 10b. From what you can learn of the function of the gene, why might this result make sense?

  13. [Diagram labels: glucose, glycolysis, glyceraldehyde-3-phosphate dehydrogenase, gluconeogenesis, pyruvate]

  14. Exam 2, Problem 4 Consider Chi-Squared. 4a. Define a function that calculates chi-squared scores, given two input arguments… (DEFINE-FUNCTION chi-square (observed expected-freqs) (LET* ((total (SUM-OF observed)) (expected (FOR-EACH freq IN expected-freqs COLLECT (* freq total)))) (FOR-EACH O IN observed FOR E IN expected AS diff = (- O E) AS numerator = (* diff diff) SUM (/ numerator E))))
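The same calculation can be sketched in Python for comparison. The function mirrors the structure of the BioBIKE definition above: scale the expected frequencies by the observed total, then sum (O − E)² / E. The usage line assumes the example behind the 1.44 score is the 28:22:28:22 A:C:G:T count quoted in 4b, tested against equal expected frequencies of 0.25. That pairing is an inference, not something the slides state.

```python
def chi_square(observed, expected_freqs):
    """Chi-squared score of observed counts against expected frequencies."""
    total = sum(observed)
    # Convert expected frequencies into expected counts.
    expected = [freq * total for freq in expected_freqs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed example: 100 nucleotides, A:C:G:T = 28:22:28:22, equal expected frequencies.
score = chi_square([28, 22, 28, 22], [0.25, 0.25, 0.25, 0.25])
print(round(score, 2))  # 1.44
```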

  15. Exam 2, Problem 4 Consider Chi-Squared. 4b. How do you interpret the 1.44 result from my example? The probability of getting a value of 1.44 is likely to occur in the gene 100-nt population

  16. Exam 2, Problem 4 Consider Chi-Squared. 4b. How do you interpret the 1.44 result from my example? This means that there is a > 5% chance that the population given fits the expected ratios.

  17. Exam 2, Problem 4 Consider Chi-Squared. 4b. How do you interpret the 1.44 result from my example? there is a 5% chance that the A:C:G:T ratio of 28:22:28:22 is due to random chance.
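One way to weigh these interpretations is to convert the score into a p-value. The Python sketch below assumes four nucleotide categories, and therefore 3 degrees of freedom, and uses the closed-form upper-tail probability of the chi-squared distribution for that case. The function name is illustrative and is not part of the course material.

```python
import math


def chi_square_p_value_df3(score):
    """Upper-tail probability P(X >= score) for chi-squared with 3 degrees of freedom.

    Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    half = score / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * score / math.pi) * math.exp(-half)


print(chi_square_p_value_df3(1.44))   # p-value for the example score
print(chi_square_p_value_df3(7.815))  # close to 0.05, the usual cutoff for df = 3
```

Under these assumptions a score of 1.44 gives a p-value of roughly 0.7: deviations at least this large would arise by chance in most samples drawn from a population with the expected ratios, so the data give no reason to reject those ratios.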

  18. Exam 2, Problem 5 5h. Rerun the program you wrote in 5d but using a single population: random DNA sequences.

  19. DGPB 6.1 Associating proteins with functions [Diagram: a gene flanked by UPTAG and DOWNTAG sequences; AGTCGT…TGTAACG…CGTGC… and AGTCGT…CATGGGA…CGTGC…]

  20. DGPB 6.1 Associating proteins with functions

  21. DGPB 6.1 Associating proteins with functions

  22. DGPB 6.1 Associating proteins with functions

  23. DGPB 6.1 Associating proteins with functions Sampling problem
