
So Much Data

So Much Data, So Little Time. Bernard Chazelle, Princeton University.


Presentation Transcript


  1. So Much Data So Little Time Bernard Chazelle Princeton University

  2. So Many Slides (before lunch) So Little Time Bernard Chazelle Princeton University

  3. math algorithms experimentation 2006 computation

  4. Computers have two problems

  5. 1. They don’t have steering wheels

  6. 2. End of Moore’s Law 2020 party’s over !

  7. algorithms experimentation computation

  8. This is not me: 32 × 17 = 224 + 320 = 544

  9. FFT RSA

  10. The Era of the Algorithm

  11. Data

  12. Data: unevenly priced, noisy, big, uncertain, low entropy

  13. Data: unevenly priced, noisy, big, uncertain, low entropy

  14. Sloan Digital Sky Survey 4 petabytes (~1MG) 10 petabytes/yr Biomedical imaging 150 petabytes/yr

  15. My A(9,9)-th paper Collected works of Micha Sharir

  16. Sublinear Algorithms: massive input, output; sample tiny fraction

  17. Shortest Paths [C-Liu-Magen ’03] New York Delphi

  18. Ray Shooting Optimal! Volume Intersection Point location

  19. Approximate MST [C-Rubinfeld-Trevisan ’01] Optimal!

  20. Reduces to counting connected components

  21. $\mathrm{E}[X] = \text{no. connected components}$ and $\mathrm{var}[X] \ll (\text{no. connected components})^2$, so the estimator $X$ is a good estimator, whp, of # connected components

  22. input space worst case average case (uniform)

  23. worst case

  24. average case = actuarial view

  25. “OK, if you elect NOT to have the surgery, the insurance company offers 6 days and 7 nights in Barbados.”

  26. Self-Improving Algorithms: arbitrary, unknown random source

  27. Yes ! This could be YOU, too !

  28. 0110101100101000101001001010100010101001 time $T_1$, time $T_2$, time $T_3$, time $T_4$; $\mathrm{E}[T_k] \to$ optimal expected time for random source

  29. Clustering [ Ailon-C-Liu-Comandur ’05 ] K-median over Hamming cube

  30. minimize sum of distances

  31. minimize sum of distances NP-hard

  32. [ Kumar-Sabharwal-Sen ’04 ] $\mathrm{COST} \le (1 + \varepsilon)\,\mathrm{OPT}$

  33. How to achieve linear limiting time? Input space $\{0,1\}^{dn}$. Identify core. Tail: prob < O(dn)/KSS; use KSS

  34. Store sample of precomputed KSS Nearest neighbor Incremental algorithm

  35. Main difficulty: How to spot the tail?

  36. 011010110110101010110010101010110100111001101010010100010 011010110***110101010110010101010***10011100**10010***010 Bring in da noise !

  37. 011010110110101010110010101010110100111001101010010100010 011010110***110101010110010101010***10011100**10010***010 encode

  38. 011010110110101010110010101010110100111001101010010100010 011010110***110101010110010101010***10011100**10010***010 decode

  39. 011010110110101010110010101010110100111001101010010100010 error correcting codes

  40. 011010110110101010110010101010110100111001101010010100010 Data inaccessible before noise What makes you think it’s wrong?

  41. 011010110110101010110010101010110100111001101010010100010 Data inaccessible before noise must satisfy some property (eg, convex, bipartite) but does not quite

  42. f(x) = ? f =access function x data f(x)

  43. f(x) = ? f =access function x f(x)

  44. f(x) = ? x f(x) But life being what it is…

  45. f(x) = ? x f(x)

  46. Humans Define distance from any object to data class

  47. no undo f(x) = ? filter x x1, x2,… g(x) f(x1), f(x2),… g is access function for:

  48. Online Data Reconstruction early decisions are crucial !

  49. Monotone function: $[n]^d \to \mathbb{R}$. Filter requires polylog(n) lookups [ Ailon-C-Liu-Comandur ’04 ]
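
Slides 19 to 21 reduce approximate MST to counting connected components with a sampling-based estimator. The sketch below illustrates one standard way such an estimator can work, and is not taken from the slides: pick random vertices, measure the size of each one's component with a breadth-first search, and average the reciprocals. Summing 1/(component size) over all vertices gives exactly the number of components, so n times the sample average has the right expectation when the search is not cut short. The names `component_size` and `estimate_components`, and the parameters `samples` and `cap`, are hypothetical choices for this illustration. The cap on the search is an assumption that keeps each lookup cheap, at the cost of overestimating when components are larger than the cap.

```python
import random
from collections import deque


def component_size(adj, start, cap):
    """Return min(size of start's component, cap) using a truncated BFS."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < cap:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) >= cap:
                    break  # stop exploring once the cap is reached
                queue.append(v)
    return len(seen)


def estimate_components(adj, samples, cap, rng):
    """Estimate the number of connected components from `samples` random vertices.

    adj is an adjacency list (list of neighbor lists) for an undirected graph.
    """
    n = len(adj)
    total = 0.0
    for _ in range(samples):
        u = rng.randrange(n)  # uniform random vertex
        total += 1.0 / component_size(adj, u, cap)
    return n * total / samples
```

Only `samples` vertices are touched, and each search visits at most `cap` vertices, so the work does not depend on the size of the graph. How to choose `samples` and `cap` for a given accuracy is left open here.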
