What is Information? The Nature, Growth and Characteristics of Information


  1. What is Information? The Nature, Growth and Characteristics of Information • University of California, Berkeley, School of Information Management and Systems • SIMS 202: Information Organization and Retrieval • Lecture authors: Marti Hearst & Ray Larson

  2. What is Information? • There is no “correct” definition • Can involve philosophy, psychology, signal processing, physics • Cookie Monster’s definition: “news or facts about something” • Oxford English Dictionary • information: informing, telling; thing told, knowledge, items of knowledge, news • knowledge: knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known

  3. Assignment 1 Discussion • What is information, according to your background or area of expertise?

  4. Types of Information • Differentiation by form. • Differentiation by content. • Differentiation by quality. • Differentiation by associated information.

  5. Information Properties • Information can be communicated electronically • Broadcasting • Networking • Information can be easily duplicated and shared • Problems of Ownership • Problems of Control • (Adapted from ‘Silicon Dreams’ by Robert W. Lucky)

  6. Intuitive Notion (Losee 97) • Information must • Be something, although the exact nature (substance, energy, or abstract concept) is not clear • Be “new”: repetition of previously received messages is not informative • Be “true”: false or counterfactual information is “mis-information” • Be “about” something • This human-centered approach emphasizes meaning and use of message

  7. Information from the Human Perspective • Levels in cognitive processing • perception • observation/attention • reasoning, assimilating, forming inferences • Knowledge: “justified true belief” • Belief: an idea held based on some support; an internally accepted statement, result of inductive processes combining observed facts with a reasoning process • Does information require a human mind? • Communication and information transfer among ants • A tree falls in the forest … is there information there? • Existence of quarks

  8. Meaning vs. Form • Form of information as the information itself • Meaning of a signal vs. the signal itself • What aspects of a document are information? • Representation (Norman 93) • Why do we write things down? • Socrates thought writing would obliterate serious thought • Sounds and gestures fade away • Artifacts help us to reason • Anything not present in the representation can be ignored • Things left out of the representation are often what we don’t know how to represent

  9. Information • Consider Borges’ infinite Library of Babel… • It has all possible data combinations of letters • Does it therefore contain all possible information? • What about all possible knowledge? • What about Wisdom? • Is the Internet a prototype Library of Babel?

  10. Information Hierarchy • Wisdom • Knowledge • Information • Data

  11. Information Hierarchy • Data • The raw material of information • Information • Data organized and presented by someone • Knowledge • Information read, heard or seen and understood • Wisdom • Distilled and integrated knowledge and understanding

  12. Information • “Where is the Life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?” -- T.S. Eliot, “The Rock” • Where is the information we have lost in data?

  13. Information Theory • Claude Shannon, 1940s, studying communication • Ways to measure information • Communication: producing the same message at its destination as that seen at its source • Problem: a “noisy channel” can distort the message • Between transmitter and receiver, the message must be encoded • Semantic aspects are irrelevant • (Diagram: Message source → Transmitter → Channel → Receiver → Destination, with Noise entering the Channel)
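
One common way to make "ways to measure information" concrete is Shannon entropy: the average number of bits per symbol implied by how often each symbol occurs. The sketch below is an illustration and is not taken from the lecture; the function name `shannon_entropy_bits` and the sample strings are assumptions. It looks only at symbol frequencies, which mirrors the point that semantic aspects are irrelevant to the measure.

```python
import math
from collections import Counter


def shannon_entropy_bits(message: str) -> float:
    """Average information per symbol, in bits, from symbol frequencies."""
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of -p * log2(p), where p is the relative frequency.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # A constant message carries no surprise; a varied one carries more.
    for text in ("aaaaaaaa", "abababab", "abcdefgh"):
        print(text, shannon_entropy_bits(text))
```

A message that repeats one symbol scores zero bits per symbol, in line with the earlier remark that repetition of previously received messages is not informative.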

  14. Information Theory • Better called “Communication Theory” • Communication may be over time and space • (Diagram 1: Message Source → Message → Encoding → Channel → Decoding → Message → Destination) • (Diagram 2: Message Source → Message → Encoding (writing/indexing) → Storage → Decoding (Retrieval/Reading) → Message → Destination) • (The diagram also labels Noise)
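
The encode, channel, decode path can be sketched with a toy error-correcting scheme. The example assumes a 3x repetition code with majority-vote decoding, a stand-in chosen for brevity that is not named in the slides; `encode`, `channel`, `decode`, and the flipped positions are all hypothetical. Noise is modeled deterministically as a set of bit positions that get flipped.

```python
from typing import Iterable, List


def encode(bits: List[int], repeat: int = 3) -> List[int]:
    """Repeat every bit so that isolated flips can be outvoted later."""
    return [b for b in bits for _ in range(repeat)]


def channel(signal: List[int], flipped_positions: Iterable[int]) -> List[int]:
    """Noisy channel: flip the bits at the given positions."""
    flipped = set(flipped_positions)
    return [b ^ 1 if i in flipped else b for i, b in enumerate(signal)]


def decode(signal: List[int], repeat: int = 3) -> List[int]:
    """Majority vote within each group of repeated bits."""
    groups = [signal[i:i + repeat] for i in range(0, len(signal), repeat)]
    return [1 if sum(g) * 2 > len(g) else 0 for g in groups]


if __name__ == "__main__":
    message = [1, 0, 1, 1]
    received = channel(encode(message), flipped_positions={0, 4, 11})
    print(decode(received) == message)
```

With at most one flip per group of three, the destination recovers the source message; two flips in the same group defeat this particular code.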

  15. What kinds of information are there? • Text: books, periodicals, WWW, memos, ads • published/refereed • Film • Photos, other Images • Broadcast TV, Radio • Telephone Conversations • Databases

  16. How much information is there? (Estimates courtesy Hal Varian and Peter Lyman: http://www.sims.berkeley.edu/emc)

  17. How Much Information? • Stored Information: Print, Film, Optical, Magnetic • Communicated: Internet, Broadcast, Phone, Mail

  18. Print • Annual Production • Books 968,735 = 8 Terabytes (compressed image) • Newspapers 22,643 = 25 Terabytes • Journals 40,000 = 2 Terabytes • Magazines 80,000 = 10 Terabytes • Office Documents 7.5x10^9 pages (i.e. 7,500,000,000) = 195 Terabytes • TOTAL 240 Terabytes (1200 scanned, 24 text)

  19. Print • Library of Congress Printed book collection • About 18 Million books • About 130 Terabytes (compressed image) • For all of LC we should also assume • 13M photographs, 5MB each = 65 TB • 4M maps, say 200 TB • 500K files, 1GB each = 500 TB • 3.5M sound recordings, ~2000 TB • Grand total: 3 petabytes (~3000 terabytes) • Books in Print • 3.2 Million titles • About 26 Terabytes
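
The Library of Congress grand total can be rechecked from the component figures on the slide. This is a minimal sketch assuming decimal units (1 TB = 1,000 GB = 1,000,000 MB); the dictionary and variable names are illustrative.

```python
# Figures copied from the slide; unit conversion factors are an assumption.
MB_PER_TB = 1_000_000
GB_PER_TB = 1_000

lc_collection_tb = {
    "printed books": 130,
    "photographs": 13_000_000 * 5 / MB_PER_TB,  # 13M items at 5 MB each
    "maps": 200,
    "files": 500_000 * 1 / GB_PER_TB,           # 500K items at 1 GB each
    "sound recordings": 2000,
}

lc_total_tb = sum(lc_collection_tb.values())

if __name__ == "__main__":
    print(f"{lc_total_tb:,.0f} TB, or about {lc_total_tb / 1000:.1f} PB")
```

The components sum to 2,895 TB, which the slide rounds to 3 petabytes.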

  20. Film and Image • Film • Photographs = 410 Petabytes per year • Movies = 16 Terabytes (Commercial Production of about 4000 films) • X-Rays = 17.2 Petabytes

  21. Optical Media • CD-Music 90,000 items = 58 TB • CD-ROM 1,000 items = 3 TB • DVD-Video 5,000 items = 22 TB • Total 83 TB • Total compressed 29 TB

  22. Magnetic Media • Audio Tape 184,200,000 = 184.2 Petabytes • Video Tape 355,000,000 = 1420 • Floppy disks = 0.07 • Removable disks = 1.69 • Hard Disks = 500

  23. Totals Stored Per Year

      Medium     Type of content     Terabytes/Year   Terabytes/Year
                                     Upper Bound      Lower Bound
      Paper      Books                           8                1
                 Newspapers                     25                2
                 Periodicals                    12                1
                 Office documents              195               19
                 SUBTOTAL                      240               23
      Film       Photographs               410,000           41,000
                 Cinema                         16               16
                 X-Rays                     17,200           17,200
                 SUBTOTAL                  427,216           58,216
      Optical    Music CDs                      58                6
                 Data CDs                        3                3
                 DVDs                           22               22
                 SUBTOTAL                       83               31
      Magnetic   Camcorder                 300,000          300,000
                 Disk drives             1,393,000          277,210
                 SUBTOTAL                1,693,000          577,210
      TOTAL                              2,120,539          635,480
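
The subtotals and totals in this table can be recomputed from the individual rows. The sketch below copies the row values from the slide; the data structure and the function names `subtotal` and `grand_total` are illustrative choices.

```python
# (upper bound, lower bound) in terabytes per year, copied from the table.
STORED_TB_PER_YEAR = {
    "Paper": {
        "Books": (8, 1),
        "Newspapers": (25, 2),
        "Periodicals": (12, 1),
        "Office documents": (195, 19),
    },
    "Film": {
        "Photographs": (410_000, 41_000),
        "Cinema": (16, 16),
        "X-Rays": (17_200, 17_200),
    },
    "Optical": {
        "Music CDs": (58, 6),
        "Data CDs": (3, 3),
        "DVDs": (22, 22),
    },
    "Magnetic": {
        "Camcorder": (300_000, 300_000),
        "Disk drives": (1_393_000, 277_210),
    },
}


def subtotal(medium):
    """Return (upper, lower) terabytes per year for one medium."""
    rows = list(STORED_TB_PER_YEAR[medium].values())
    return (sum(u for u, _ in rows), sum(l for _, l in rows))


def grand_total():
    """Return (upper, lower) terabytes per year across all media."""
    subtotals = [subtotal(m) for m in STORED_TB_PER_YEAR]
    return (sum(u for u, _ in subtotals), sum(l for _, l in subtotals))


if __name__ == "__main__":
    for medium in STORED_TB_PER_YEAR:
        print(medium, subtotal(medium))
    print("TOTAL", grand_total())
```

The recomputed values match the slide's subtotals and totals. On these numbers, magnetic media account for roughly 80% of the upper-bound total.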

  24. Internet Traffic -- Historical (chart in Tb; x-axis labels include Nov ‘92 and Apr ‘95; annotated points: Dec 1996 = 1500 Tb, Dec 1997 = 3000 Tb)

  25. Internet Traffic Percentage (chart; x-axis labels include Nov ‘92 and Apr ‘95)

  26. Currently... • There are an estimated 2.5 Billion pages on the Web • About 25-50 Terabytes (surface web) • About 7,500 further Terabytes in web-accessed DBs. • 610 Billion email messages per year = 11,285 TB • Internet Traffic is doubling every 100 days • An estimated 62 Million Americans now use the internet (US Commerce Dept 1998) • Radio took 38 years to get 50 M listeners, TV took 13 years, the Net took 4 years...
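
Two of the figures on this slide can be unpacked with simple arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes) and a 365-day year; the function name `growth_factor` and the variable names are illustrative.

```python
def growth_factor(doubling_days: float, elapsed_days: float) -> float:
    """Multiplicative growth after elapsed_days, given a fixed doubling time."""
    return 2 ** (elapsed_days / doubling_days)


# Traffic that doubles every 100 days, compounded over one year.
yearly_traffic_growth = growth_factor(doubling_days=100, elapsed_days=365)

# Average email size implied by 610 billion messages totalling 11,285 TB.
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabytes
average_email_bytes = 11_285 * BYTES_PER_TB / (610 * 10 ** 9)

if __name__ == "__main__":
    print(f"traffic grows about {yearly_traffic_growth:.1f}x per year")
    print(f"average email is about {average_email_bytes:,.0f} bytes")
```

Under these assumptions, doubling every 100 days compounds to roughly a 12.5-fold increase per year, and the email figure implies an average message of about 18.5 kilobytes.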

  27. Internet - Recent Statistics • 5 M Level 2 Domains (NW June 1999) • 43.2 Million Hosts (NW January 1999) • 206/246 IP countries (NW July 1998) • 300 Million Users (Newsbytes, Mar 2000) • (830 Million Telephone Terminations) • Source: Vint Cerf

  28. Internet Hosts (000s) 1989-2006 • Source: Vint Cerf

  29. Projected Voice and Data Traffic Gb/s • Source: America's Network, May 15, 1998

  30. Users on the Internet - May 1999 • CAN/US - 90.65M • Europe - 40.09M • Asia/Pac - 26.97M • Latin Am - 5.29M • Africa - 1.14M • Mid-east - 0.88M • Total - 165M • Source: Vint Cerf

  31. Language Distribution of Web Content • Source: Jack Xu: Excite

  32. Language Distribution on a 634 Million Web Pages Corpus

  33. Sources on Information, Computer, and Network Use • http://www.sims.berkeley.edu/emc/ • http://www.cs.cmu.edu/afs/cs.cmu.edu/user/bam/www/numbers.html (Statistical snippets extracted from the news) • http://www.wcom.com/about_the_company/cerfs_up/ (Vint Cerf’s pages) • http://www.firstmonday.dk/issues/issue3_10/coffman/index.html (The size and growth rate of the Internet, by K.G. Coffman and Andrew Odlyzko)

  34. Human Memory • Landauer 86: Human brain holds 200MB • looked at rate of information intake and rate of forgetting, and amount of information adults need for normal tasks • 6B people on earth implies total memory of all people alive about 1,200 petabytes • Another way: • estimate that people take in a byte/sec • lifetime 25,000 days or 2B sec • result is 2 GB (doesn’t count synthesizing new info)
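
Both human-memory estimates reduce to a few multiplications. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and takes the 2 billion second lifetime from the slide; the variable names are illustrative.

```python
MB, GB, PB = 10 ** 6, 10 ** 9, 10 ** 15  # assumption: decimal units
SECONDS_PER_DAY = 24 * 60 * 60

# Estimate 1: 200 MB per brain, 6 billion people alive.
per_person_bytes = 200 * MB
population = 6 * 10 ** 9
total_memory_pb = per_person_bytes * population // PB

# Estimate 2: one byte per second over a lifetime of about 2 billion seconds.
lifetime_seconds = 2 * 10 ** 9
lifetime_intake_gb = 1 * lifetime_seconds // GB
lifetime_days = lifetime_seconds / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"all people alive: {total_memory_pb:,} PB")
    print(f"lifetime intake: {lifetime_intake_gb} GB")
    print(f"2 billion seconds is about {lifetime_days:,.0f} days")
```

Two billion seconds works out to roughly 23,000 days, or a little over 63 years, so the second estimate corresponds to a single adult lifetime of intake.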

  35. Information Overload • “The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth.” (Varian & Lyman) • “The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)

  36. Information Organization and Retrieval • To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for. • Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known. • To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right. • Information is (1) informing, telling; thing told, knowledge, items of knowledge, news. • Source: The Oxford English Dictionary, cf. Rowley

  37. Information Life Cycle (diagram) • Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating • Other labels: Creation, Active, Semi-Active, Inactive, Retention/Mining, Utilization, Searching, Disposition, Discard

  38. Authoring/Modifying • Converting Data+Information+Knowledge to New Information. • Creating information from observation, thought. • Editing and Publication. • Gatekeeping

  39. Organizing/Indexing • Collecting and Integrating information. • Affects Data, Information and Metadata. • “Metadata” Describes data and information. • More on this later. • Organizing Information. • Types of organization? • Indexing

  40. Storing/Retrieving • Information Storage • How and where is information stored? • Retrieving Information. • How is information recovered from storage? • How to find needed information • Linked with Accessing/Filtering stage

  41. Distribution/Networking • Transmission of information • How is information transmitted? • Networks vs Broadcast.

  42. Accessing/Filtering • Using the organization created in the O/I stage to: • Select desired (or relevant) information • Locate that information • Retrieve the information from its storage location (often via a network)
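
The select, locate, and retrieve steps can be sketched against a toy organization built in an indexing step. The example assumes a simple inverted index (term to storage locations), which is one possible organization and is not specified in the slides; `STORAGE`, `build_index`, `access`, and the shelf names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy collection: storage location -> document text.
STORAGE = {
    "shelf-1": "information theory measures communication over a noisy channel",
    "shelf-2": "metadata describes data and information",
    "shelf-3": "retrieval recovers information from storage",
}


def build_index(storage):
    """Organizing/Indexing: map each term to the locations that contain it."""
    index = defaultdict(set)
    for location, text in storage.items():
        for term in text.lower().split():
            index[term].add(location)
    return index


def access(index, storage, query):
    """Accessing/Filtering: select, locate, then retrieve matching documents."""
    terms = query.lower().split()
    if not terms:
        return {}
    # Select and locate: keep only locations that contain every query term.
    locations = set.intersection(*(index.get(term, set()) for term in terms))
    # Retrieve: fetch each selected document from its storage location.
    return {location: storage[location] for location in sorted(locations)}


if __name__ == "__main__":
    index = build_index(STORAGE)
    print(access(index, STORAGE, "information storage"))
```

Requiring every query term to match is a deliberately strict filter; a real system would more likely rank partial matches than drop them.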

  43. Using/Creating • Using Information. • Transformation of Information to Knowledge. • Knowledge to New Data and New Information.

  44. Key issues in this course • How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs. • Retrieving • How to describe information resources or information-bearing objects in ways so that they may be effectively used by those who need to use them. • Organizing

  45. Key Issues (Information Life Cycle diagram) • Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating • Other labels: Creation, Active, Semi-Active, Inactive, Retention/Mining, Utilization, Searching, Disposition, Discard

  46. Next Week • Introduction to IR • The search process