Forensics and CS

Presentation Transcript


  1. Forensics and CS Philip Chan

  2. CSI: Crime Scene Investigation • www.cbs.com/shows/csi/ • high tech forensics tools • DNA profiling • Use as evidence in court cases

  3. DNA • Deoxyribonucleic Acid • Each person's DNA is unique (except for identical twins) • DNA samples can be collected at crime scenes • About 0.1% of human DNA varies from person to person

  4. Forensics Analysis • Focus on loci (locations) of the DNA • Values at those loci (the DNA profile) are recorded for comparing DNA samples. • Two DNA profiles from the same person have matching values at all loci. • Do more loci or fewer loci give more accurate identification? • Tradeoffs? • FBI uses 13 core loci • http://www.cstl.nist.gov/biotech/strbase/fbicore.htm

  5. We do not want to wrongly accuse someone • How can we find out how likely it is that another person has the same DNA profile? • How many people are in the world? • How low does the probability need to be for a DNA profile to be unique in the world? • Low probability doesn’t mean impossible • Just very unlikely

  6. Review of basic probability • Joint probability of two independent events • P(A,B) = ?

  7. Review of basic probability • Joint probability of two independent events • P(A,B) = P(A) * P(B) • Independent events: knowing the outcome of one event provides no information about the other event • P(Die1=1, Die2=1) • = P(Die1=1) * P(Die2=1) • = 1/6 * 1/6 = 1/36.

  8. Enumerating the events 36 events, each is equally likely, so 1/36

  9. Joint probability • P(Die1=even, Die2=6) = ?

  10. Joint probability • P(Die1=even, Die2=6) • = 1/2 * 1/6 = 1/12 • P(Die1=1, Die2=5, Die3=4) = ?

  11. Joint probability • P(Die1=even, Die2=6) • = 1/2 * 1/6 = 1/12 • P(Die1=1, Die2=5, Die3=4) • = (1/6)^3 = 1/216
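
The dice answers above can also be checked by enumerating every equally likely outcome and counting the ones that match. This is a minimal Java sketch; the class and method names (`DiceJoint`, `countEvenThenSix`, `countOneFiveFour`) are made up for illustration and do not come from the slides.

```java
/** Hypothetical helper: counts equally likely dice outcomes to check joint probabilities. */
final class DiceJoint {

    /** Counts two-dice outcomes where die 1 is even and die 2 shows 6. */
    static int countEvenThenSix() {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                if (d1 % 2 == 0 && d2 == 6) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Counts three-dice outcomes that show exactly (1, 5, 4). */
    static int countOneFiveFour() {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                for (int d3 = 1; d3 <= 6; d3++) {
                    if (d1 == 1 && d2 == 5 && d3 == 4) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two dice have 6 * 6 = 36 outcomes; three dice have 6 * 6 * 6 = 216.
        System.out.println(countEvenThenSix() + " out of 36");
        System.out.println(countOneFiveFour() + " out of 216");
    }
}
```

Three matching outcomes out of 36 gives 3/36 = 1/12, and one out of 216 gives 1/216, which agrees with multiplying the individual probabilities.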

  12. DNA profile probability • How to estimate?

  13. DNA profile probability • How to estimate? • Assuming loci are independent • P(Locus1=value1, Locus2=value2, ...) • = P(Locus1=value1) * P(Locus2=value2) * ...

  14. DNA profile probability • How to estimate? • Assuming loci are independent • P(Locus1=value1, Locus2=value2, ...) • = P(Locus1=value1) * P(Locus2=value2) * ... • How to estimate P(Locus1=value1)?

  15. DNA profile probability • How to estimate? • Assuming loci are independent • P(Locus1=value1, Locus2=value2, ...) • = P(Locus1=value1) * P(Locus2=value2) * ... • How to estimate P(Locus1=value1)? • a random sample of size N from the population and • find out how many people out of N have value1 at Locus1
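
The estimate described above can be sketched in Java as follows. The data layout is an assumption made only for illustration: each person is an `int[]` holding one value per locus, and the sample values are invented. The names `ProfileProbability`, `locusFrequency`, and `profileProbability` are hypothetical.

```java
/** Hypothetical sketch: estimates a profile's probability from a random sample. */
final class ProfileProbability {

    /** Estimates P(locus = value) as the fraction of sampled people with that value. */
    static double locusFrequency(int[][] sample, int locus, int value) {
        int matches = 0;
        for (int[] person : sample) {
            if (person[locus] == value) {
                matches++;
            }
        }
        return (double) matches / sample.length;
    }

    /** Multiplies the per-locus estimates, assuming the loci are independent. */
    static double profileProbability(int[][] sample, int[] profile) {
        double p = 1.0;
        for (int locus = 0; locus < profile.length; locus++) {
            p *= locusFrequency(sample, locus, profile[locus]);
        }
        return p;
    }

    public static void main(String[] args) {
        // Invented sample: N = 4 people, 2 loci each (a real profile would use more loci).
        int[][] sample = { {7, 12}, {7, 15}, {9, 12}, {7, 12} };
        int[] profile = {7, 12};
        // Locus 0: 3 of 4 people have 7. Locus 1: 3 of 4 people have 12.
        // Estimated probability: 3/4 * 3/4 = 0.5625.
        System.out.println(profileProbability(sample, profile));
    }
}
```

With only two loci and four people the estimate is large; the product shrinks quickly as more independent loci are multiplied in.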

  16. Database of DNA profiles

  17. Problem Formulation • Given • A sample profile (e.g. collected from the crime scene) • A database of known profiles • Find • The probability of the sample profile if it matches a known profile in the database

  18. Breaking Down the Problem • Find • The probability of the sample profile if it matches a known profile in the database • What are the subproblems?

  19. Breaking Down the Problem • Find • The probability of the sample profile if it matches a known profile in the database • What are the subproblems? • Subproblem 1 • Find whether the sample profile matches • 1a: ? • 1b: ? • Subproblem 2 • Calculate the probability of the profile

  20. Breaking Down the Problem • Find • The probability of the sample profile if it matches a known profile in the database • What are the subproblems? • Subproblem 1 • Find whether the sample profile matches • 1a: check entries in the database • 1b: check if all 13 loci match in each entry • Subproblem 2 • Calculate the probability of the profile
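
Subproblems 1a and 1b map naturally onto two small methods. The sketch below assumes, purely for illustration, that a profile is an `int[]` with one value per locus and that the database is an `int[][]`; the names `ProfileMatcher`, `sameProfile`, and `findMatch` are hypothetical.

```java
/** Hypothetical sketch of subproblem 1: does the sample match any database entry? */
final class ProfileMatcher {

    /** Subproblem 1b: true only if the two profiles agree at every locus. */
    static boolean sameProfile(int[] a, int[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int locus = 0; locus < a.length; locus++) {
            if (a[locus] != b[locus]) {
                return false; // one mismatching locus is enough to rule out a match
            }
        }
        return true;
    }

    /** Subproblem 1a: checks the entries one by one; returns the matching index or -1. */
    static int findMatch(int[][] database, int[] sample) {
        for (int i = 0; i < database.length; i++) {
            if (sameProfile(database[i], sample)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Invented 3-locus profiles; the slides use 13 loci.
        int[][] database = { {7, 12, 9}, {8, 15, 9}, {7, 12, 11} };
        System.out.println(findMatch(database, new int[] {8, 15, 9}));
        System.out.println(findMatch(database, new int[] {8, 15, 10}));
    }
}
```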

  21. Simpler Problem for 1a (very common) • Given • an array of integers (e.g. student IDs) • an integer (e.g. an ID) • Find • whether the integer is in the array
      int[] directory; // student IDs
      int id;          // to be found
      boolean found;   // true if id is in directory

  22. Linear/Sequential Search • Check one by one • Stop if you find it • Stop if you run out of items to check • Not found
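
One way to write the steps above in Java, reusing the variable names `directory`, `id`, and `found` from the previous slide. The class and method names (`LinearSearch`, `contains`) are illustrative assumptions.

```java
/** Minimal sketch of linear (sequential) search over an int array. */
final class LinearSearch {

    /** Returns true if id occurs in directory, checking the items one by one. */
    static boolean contains(int[] directory, int id) {
        boolean found = false;
        int i = 0;
        // Stop as soon as the id is found, or when no items are left to check.
        while (i < directory.length && !found) {
            if (directory[i] == id) {
                found = true;
            }
            i++;
        }
        return found; // still false here means "not found"
    }

    public static void main(String[] args) {
        int[] directory = {4021, 1187, 3305, 2750}; // invented student IDs
        System.out.println(contains(directory, 3305));
        System.out.println(contains(directory, 9999));
    }
}
```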

  23. Number of Checks (speed of algorithm) • Consider N items in the array • Best-case scenario • When does it occur? How many checks?

  24. Number of Checks (speed of algorithm) • Consider N items in the array • Best-case scenario • When does it occur? How many checks? • First item; 1 check • Worst-case scenario • When does it occur? How many checks?

  25. Number of Checks (speed of algorithm) • Consider N items in the array • Best-case scenario • When does it occur? How many checks? • First item; 1 check • Worst-case scenario • When does it occur? How many checks? • Last item or not there; N checks • Average-case scenario • Average of all cases • (1 + 2 + … + N) / N = (N + 1) / 2
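
The three scenarios can be measured by counting checks directly. This sketch assumes every item is an equally likely search target and that the item is always present when averaging; under those assumptions the average of 1, 2, …, N is (N + 1) / 2. The names `LinearSearchChecks`, `checks`, and `averageChecks` are hypothetical.

```java
/** Hypothetical sketch: counts how many items linear search examines. */
final class LinearSearchChecks {

    /** Returns the number of items examined before the search stops. */
    static int checks(int[] items, int target) {
        int count = 0;
        for (int item : items) {
            count++;
            if (item == target) {
                break; // found: stop checking
            }
        }
        return count;
    }

    /** Averages the checks over all n successful searches in an array of n distinct items. */
    static double averageChecks(int n) {
        int[] items = new int[n];
        for (int i = 0; i < n; i++) {
            items[i] = i;
        }
        long total = 0;
        for (int target = 0; target < n; target++) {
            total += checks(items, target);
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        int[] items = {4, 8, 15, 16, 23};
        System.out.println("best:  " + checks(items, 4));   // first item
        System.out.println("worst: " + checks(items, 99));  // not there
        System.out.println("avg:   " + averageChecks(9));   // (1 + ... + 9) / 9
    }
}
```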

  26. Can we do better? Faster algorithm? • What if the array is sorted, items are in an order • E.g. a phone book

  27. Binary Search • Check the item at midpoint • If found, done • Otherwise, eliminate half and repeat

  28. Breaking down the problem • While more items and not found • What are the two subproblems?

  29. Breaking down the problem • While more items and not found • Eliminate half of the items • Find the mid point
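
The loop and its two subproblems can be sketched in Java as below. The array must already be sorted in ascending order. The variable names `low`, `high`, and `mid` and the class name `BinarySearch` are illustrative choices, not taken from the slides.

```java
/** Minimal sketch of binary search over a sorted int array. */
final class BinarySearch {

    /** Returns true if id occurs in directory, which must be sorted in ascending order. */
    static boolean contains(int[] directory, int id) {
        int low = 0;
        int high = directory.length - 1;
        while (low <= high) {                  // while there are more items to check
            int mid = low + (high - low) / 2;  // subproblem: find the midpoint
            if (directory[mid] == id) {
                return true;                   // found: done
            }
            // Subproblem: eliminate the half that cannot contain id.
            if (directory[mid] < id) {
                low = mid + 1;                 // discard the lower half
            } else {
                high = mid - 1;                // discard the upper half
            }
        }
        return false;                          // no items left: not found
    }

    public static void main(String[] args) {
        int[] directory = {1187, 2750, 3305, 4021, 5120}; // invented IDs, sorted
        System.out.println(contains(directory, 4021));
        System.out.println(contains(directory, 3000));
    }
}
```

Computing the midpoint as `low + (high - low) / 2` instead of `(low + high) / 2` avoids integer overflow on very large arrays.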

  30. Number of checks (Speed of algorithm) • Best-case scenario • When does it occur? How many checks?

  31. Number of checks (Speed of algorithm) • Best-case scenario • When does it occur? How many checks? • In the middle; 1 check

  32. Number of checks (Speed of algorithm) • Best-case scenario • When does it occur? How many checks? • In the middle; 1 check • Worst-case scenario • When does it occur? How many checks?

  33. Number of checks (Speed of algorithm) • Best-case scenario • When does it occur? How many checks? • In the middle; 1 check • Worst-case scenario • When does it occur? How many checks? • Keep dividing into two halves until a half has only one item • ? checks

  35. Number of checks (Speed of algorithm) T(1) = 1 T(N) = T(N/2) + 1

  37. Number of checks (Speed of algorithm) T(1) = 1 T(N) = T(N/2) + 1 = [ T(N/4) + 1 ] + 1

  39. Number of checks (Speed of algorithm) T(1) = 1 T(N) = T(N/2) + 1 = [ T(N/4) + 1 ] + 1 = [ [ T(N/8) + 1] + 1] + 1

  40. Number of checks (Speed of algorithm) T(1) = 1 T(N) = T(N/2) + 1 = [ T(N/4) + 1 ] + 1 = [ [ T(N/8) + 1] + 1] + 1 = … any pattern?

  41. Number of checks (Speed of algorithm) T(1) = 1 T(N) = T(N/2) + 1 = [ T(N/4) + 1 ] + 1 = [ [ T(N/8) + 1] + 1] + 1 = … = T(N/2^k) + k

  42. Number of checks (Speed of algorithm) T(1) = 1 T(N) = T(N/2) + 1 = [ T(N/4) + 1 ] + 1 = [ [ T(N/8) + 1] + 1] + 1 = … = T(N/2^k) + k • N/2^k gets smaller and eventually becomes 1

  43. Number of checks (Speed of algorithm) T(1) = 1 T(N) = T(N/2) + 1 = [ T(N/4) + 1 ] + 1 = [ [ T(N/8) + 1] + 1] + 1 = … = T(N/2^k) + k • N/2^k gets smaller and eventually becomes 1 • solve for k

  44. Number of Checks (Speed of Algorithm) • N/2^k = 1, N = 2^k, k = ?

  45. Number of Checks (Speed of Algorithm) • N/2^k = 1, N = 2^k, k = log_2 N

  46. Number of Checks (Speed of Algorithm) • N/2^k = 1, N = 2^k, k = log_2 N • T(N) = T(N/2^k) + k = T(1) + log_2 N = ? + log_2 N

  47. Number of Checks (Speed of Algorithm) • N/2^k = 1, N = 2^k, k = log_2 N • T(N) = T(N/2^k) + k = T(1) + log_2 N = 1 + log_2 N
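
The closed form can be sanity-checked by evaluating the recurrence directly and comparing it with 1 + log_2 N. This sketch only covers N that are powers of two, which is the case the derivation assumes; the names `BinarySearchChecks`, `t`, and `closedForm` are hypothetical.

```java
/** Hypothetical sketch: compares the recurrence T(N) with the closed form 1 + log_2 N. */
final class BinarySearchChecks {

    /** Evaluates T(1) = 1, T(N) = T(N/2) + 1 directly. */
    static int t(int n) {
        return (n <= 1) ? 1 : t(n / 2) + 1;
    }

    /** Closed form 1 + log_2 N, valid here only when n is a power of two. */
    static int closedForm(int n) {
        // For n = 2^k, the number of trailing zero bits is exactly k.
        return 1 + Integer.numberOfTrailingZeros(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1024; n *= 2) {
            System.out.println("N=" + n + "  T(N)=" + t(n) + "  1+log2(N)=" + closedForm(n));
        }
    }
}
```

For example, N = 1024 = 2^10 gives 11 checks in the worst case, compared with 1024 checks for linear search.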

  48. Sorting (arranging the items in a desired order) • How is the phone book arranged? • Why? • Why not arranged by numbers?

  49. Sorting (arranging the items in a desired order) • How is the phone book arranged? • Why? • Why not arranged by numbers? • Order • Alphabetical • Low to high numbers • DNA profile with 13 loci?

  50. Sorting • Imagine you have a thousand numbers in an array • How would you systematically sort them?
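
One possible answer to the question above is selection sort: repeatedly find the smallest remaining number and move it to the front. The slides shown here do not say which method they have in mind, so this choice and the names `SelectionSort` and `sort` are assumptions for illustration only.

```java
/** Illustrative sketch of selection sort; one of many systematic ways to sort. */
final class SelectionSort {

    /** Sorts the array in place, from low to high. */
    static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            // Find the index of the smallest item in a[i..end].
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Swap it into position i; a[0..i] is now sorted.
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] numbers = {42, 7, 19, 3, 25};
        sort(numbers);
        System.out.println(java.util.Arrays.toString(numbers));
    }
}
```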
