
So Much Data


malo




Presentation Transcript


  1. So Much Data. www.sims.berkeley.edu/research/projects/how-much-info reports 1-2 exabytes per year, or 250 MB/yr per person on earth. (Phrased as “everyone on earth writes something the size of Moby Dick 250 times a year” it makes no sense; phrased as “everyone on earth makes 15 minutes of video each year” it doesn’t sound so bad.)
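As a rough check on the per-person figure in slide 1, the arithmetic can be sketched as below. The world population of 6 billion is an assumption (roughly the figure for 2000, not stated on the slide), and the function names are made up for illustration. The video bitrate is only what the slide's "15 minutes of video" comparison implies, not a figure from the source.

```python
# Back-of-the-envelope check of the per-person data volume.
EXABYTE = 10**18   # bytes, decimal units
MEGABYTE = 10**6   # bytes, decimal units

# Assumption: about 6 billion people, roughly the world population in 2000.
WORLD_POPULATION = 6_000_000_000


def per_person_mb(total_exabytes: float, population: int = WORLD_POPULATION) -> float:
    """Megabytes per person per year for a given yearly total in exabytes."""
    return total_exabytes * EXABYTE / population / MEGABYTE


def implied_bitrate_mbit_per_s(megabytes: float, minutes: float) -> float:
    """Bitrate implied by treating `megabytes` as `minutes` of video."""
    return megabytes * 8 / (minutes * 60)


low = per_person_mb(1.0)    # about 167 MB per person per year
high = per_person_mb(2.0)   # about 333 MB per person per year

# 250 MB as 15 minutes of video implies roughly 2.2 Mbit/s.
video_rate = implied_bitrate_mbit_per_s(250, 15)

# 250 MB as 250 copies of Moby Dick implies about 1 MB per copy.
moby_dick_mb = 250 / 250

print(f"{low:.0f}-{high:.0f} MB per person per year")
print(f"implied video bitrate: {video_rate:.1f} Mbit/s")
```

Under that population assumption, the quoted 250 MB/yr sits in the middle of the 1-2 exabyte range.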

  2. What kind of media? Paper: 23-240 TB/yr, mostly office documents. Film: 58-427 TB/yr, mostly home snapshots. Optical: 31-83 TB/yr, mostly music CDs. Magnetic: 577-1693 TB/yr, mostly for computers (300 TB of camcorder tape). Disk drives: 2500 petabytes per year, 55% for desktop. (In 2000 they said disk was $10/GB and would reach $1 in 2005; I saw 76 cents/GB last week.)
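The disk price forecast in slide 2 ($10/GB in 2000, $1/GB in 2005) implies a steady yearly decline, which can be worked out as in the sketch below. It assumes a constant geometric decline between the two quoted prices, which the slide does not claim, and the helper names are hypothetical.

```python
import math


def annual_price_factor(start_price: float, end_price: float, years: float) -> float:
    """Constant yearly multiplier that takes start_price to end_price."""
    return (end_price / start_price) ** (1 / years)


def projected_price(start_price: float, factor: float, years_elapsed: float) -> float:
    """Price after years_elapsed under a constant yearly multiplier."""
    return start_price * factor ** years_elapsed


def years_to_reach(start_price: float, target_price: float, factor: float) -> float:
    """Years until the projected price falls to target_price."""
    return math.log(target_price / start_price) / math.log(factor)


# Forecast quoted on the slide: $10/GB in 2000 falling to $1/GB in 2005.
factor = annual_price_factor(10.0, 1.0, 5)   # about 0.63 per year
decline_pct = (1 - factor) * 100             # about 37% cheaper each year

# On that curve, 76 cents/GB is reached about 5.6 years after 2000.
years_to_76_cents = years_to_reach(10.0, 0.76, factor)

print(f"yearly multiplier: {factor:.2f} ({decline_pct:.0f}% decline)")
print(f"years to reach $0.76/GB: {years_to_76_cents:.1f}")
```

The slide does not say when "last week" was, so the sketch only places the 76 cents/GB observation on the forecast curve; it does not judge whether the forecast was ahead of or behind schedule.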

  3. Disk prices

  4. How much online? About 100M books have been published; perhaps 200K have been digitized, half available free and half for pay. (Half in French, by the way). Very little music or video is online legally. The Web is about 10-20 TB of text; images 5X that; “deep web” or “dark matter” may be 100X as much.

  5. Strategies for finding things. (1) Search engines: back-of-book indexes, now Google. (2) Human guidance: once citations, now hyperlinks. (3) Knowledge structures: encyclopedias; thesauri; someday we might see PRECIS, CYC, or Semantic Web actually work. Ranking as a way of combining 1 and 2 seems useful. As for the Semantic Web, Dave Parnas once wrote that “a data base is something that works, a knowledge base is something that doesn’t work.”
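One way to picture the remark in slide 5 about ranking as a combination of search (text matching) and human guidance (hyperlinks) is the toy sketch below. It is not any real search engine's method: the scoring formula, the field names `text` and `inbound_links`, and the sample documents are all invented for illustration.

```python
import math


def text_score(query: str, text: str) -> float:
    """Fraction of the document's words that match a query term."""
    terms = query.lower().split()
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(words.count(term) for term in terms) / len(words)


def link_score(inbound_links: int) -> float:
    """Dampened count of hyperlinks pointing at the document."""
    return math.log1p(inbound_links)


def rank(query: str, docs: list[dict]) -> list[str]:
    """Order matching documents by text match boosted by link evidence."""
    scored = [
        (text_score(query, doc["text"]) * (1 + link_score(doc["inbound_links"])), doc["name"])
        for doc in docs
    ]
    # Documents with no text match are dropped, however well linked.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical documents, made up for this example.
docs = [
    {"name": "a", "text": "whale hunting in the pacific", "inbound_links": 0},
    {"name": "b", "text": "notes on the whale and its many habits", "inbound_links": 20},
    {"name": "c", "text": "disk prices keep falling", "inbound_links": 50},
]

ranking = rank("whale", docs)
print(ranking)
```

Here document "b" matches the query less densely than "a" but is ranked first because other pages link to it, while "c" is excluded because links alone do not make it relevant.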

  6. What have you looked for? Tell us something you searched for that you couldn’t find. Was the problem that it (probably) (a) isn’t known, or (b) isn’t digitized and online, or (c) is restricted by legal or business rules, or (d) you couldn’t find it?

  7. How should things be found? For something that you wanted to find, and believe was probably known, and probably available, how would you have liked to phrase the query? What prompted your interest? How can you formalize that interest? What kind of data description would you need?
