Information Seeking Behavior LIS 510
Introduction • Every day we are deluged by data • It is received through our five senses, which are continuously at work • Wide variety of input sources • Written material (hard copy and electronic) • Auditory (speech, radio, CDs, etc.) • Imagery (photographs, graphs, etc.) • Video (TV, movies, etc.)
Information Overload • “The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)
Information Theory • Claude Shannon, 1940s, studying communication • Ways to measure information • Communication: producing the same message at its destination as that seen at its source • Problem: a “noisy channel” can distort the message • Between transmitter and receiver, the message must be encoded • Semantic aspects are irrelevant [Diagram: Message source -> Transmitter -> Channel (with Noise) -> Receiver -> Destination]
Information Theory • Better called “Communication Theory” • Communication may be over time and space [Diagram 1: Source -> Message -> Encoding -> Channel -> Decoding -> Message -> Destination] [Diagram 2: Source -> Message -> Encoding (writing/indexing) -> Storage -> Decoding (retrieval/reading) -> Message -> Destination]
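One of the ways to measure information that came out of Shannon's work is entropy: the average number of bits per symbol a source produces. The following is a minimal sketch in Python, assuming symbol probabilities are estimated from the character frequencies of a single message; the function name `shannon_entropy` is illustrative and not from the slides.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Return the entropy of a message in bits per symbol.

    Probabilities are estimated from the symbol frequencies in the
    message itself (an assumption made for this sketch).
    """
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Only frequencies matter here, not meaning: semantic aspects are ignored.
print(shannon_entropy("aaaa"))  # a single repeated symbol carries no uncertainty
print(shannon_entropy("abab"))  # two equally likely symbols
```

Because the measure depends only on how often each symbol occurs, two messages with very different meanings can have the same entropy, which is the sense in which semantic aspects are irrelevant.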
What kinds of information are there? • Text • books, periodicals, WWW, memos, ads • published/refereed • Film • Photos, other Images • Broadcast TV, Radio • Telephone Conversations • Databases
How Much Information? • Stored Information • Print • Film • Optical • Magnetic • Communicated Information • Internet • Broadcast • Phone • Mail
Print • Annual Production • Books 968,735 = 8 Terabytes (compressed image) • Newspapers 22,643 = 25 Terabytes • Journals 40,000 = 2 Terabytes • Magazines 80,000 = 10 Terabytes • Office Documents 12x10^9 pages = 312 Terabytes • TOTAL 357 Terabytes (1824 scanned, 35 text)
Print • Library of Congress Printed book collection • About 18 Million books • About 130 Terabytes (compressed image) • For all of LC we should also assume • 13M photographs, 5MB each = 65 TB • 4M maps, say 200 TB • 500K files, 1GB each = 500 TB • 3.5M sound recordings, ~2000 TB • Grand total: 3 petabytes (~3000 terabytes) • Books in Print • 3.2 Million titles • About 26 Terabytes
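The Library of Congress grand total can be reproduced from the per-collection figures. A minimal sketch in Python, assuming decimal units (1 TB = 10^12 bytes); the dictionary keys are labels chosen for this example.

```python
MB, GB, TB = 10**6, 10**9, 10**12

# Terabytes per collection, using the figures on the slide.
collections_tb = {
    "printed books": 130,                        # stated directly
    "photographs": 13_000_000 * 5 * MB / TB,     # 13M items at 5 MB each
    "maps": 200,                                 # stated directly
    "files": 500_000 * 1 * GB / TB,              # 500K items at 1 GB each
    "sound recordings": 2000,                    # stated as approximate
}

total_tb = sum(collections_tb.values())
print(f"Total: {total_tb:,.0f} TB, about {total_tb / 1000:.1f} PB")
```

The sum comes to 2,895 TB, which the slide rounds to 3 petabytes.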
Film and Image • Film • Photographs = 410 Petabytes per year • Movies = 16 Terabytes (Commercial Production of about 4000 films) • X-Rays = 12 Petabytes
Optical Media • CD-Music 90,000 items = 58 TB • CD-ROM 3,000 items = 3 TB • DVD-Video 5,000 items = 22 TB • Total 83 TB
Magnetic Media • Audio Tape 184,200,000 = 184.2 Petabytes • Video Tape 355,000,000 = 1420 • Floppy disks = 0.07 • Removable disks = 1.69 • Hard Disks = 500
Medium     Type of content     Terabytes/Year, Upper Bound   Terabytes/Year, Lower Bound
Paper      Books                                         8                             7
           Newspapers                                   25                            20
           Periodicals                                  12                            12
           Office documents                            312                           312
           SUBTOTAL                                    357                           351
Film       Photographs                             410,000                       100,000
           Cinema                                       16                            16
           X-Rays                                   12,000                        12,000
           SUBTOTAL                                422,000                       112,016
Optical    Music CDs                                    58                            40
           Data CDs                                      3                             3
           DVDs                                         22                            22
           SUBTOTAL                                     83                            65
Magnetic   Camcorder                               300,000                       300,000
           Disk drives                           2,555,000                     1,000,200
           SUBTOTAL                              2,855,000                     1,300,200
TOTAL                                            3,277,440                     1,412,632
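The subtotals and totals in this table can be checked by summing the rows. A minimal sketch in Python follows; the names `ESTIMATES`, `subtotal` and `grand_total` are illustrative, and the disk-drive lower bound is assumed to be 1,000,200, the value implied by the magnetic lower-bound subtotal of 1,300,200.

```python
# Terabytes per year as (upper bound, lower bound), transcribed from the table.
# Assumption: the disk-drive lower bound is 1,000,200, as implied by the
# magnetic lower-bound subtotal of 1,300,200.
ESTIMATES = {
    "Paper": {
        "Books": (8, 7),
        "Newspapers": (25, 20),
        "Periodicals": (12, 12),
        "Office documents": (312, 312),
    },
    "Film": {
        "Photographs": (410_000, 100_000),
        "Cinema": (16, 16),
        "X-Rays": (12_000, 12_000),
    },
    "Optical": {
        "Music CDs": (58, 40),
        "Data CDs": (3, 3),
        "DVDs": (22, 22),
    },
    "Magnetic": {
        "Camcorder": (300_000, 300_000),
        "Disk drives": (2_555_000, 1_000_200),
    },
}


def subtotal(medium):
    """Sum the upper and lower bounds for one medium."""
    rows = list(ESTIMATES[medium].values())
    return sum(upper for upper, _ in rows), sum(lower for _, lower in rows)


def grand_total():
    """Sum the per-medium subtotals into overall upper and lower bounds."""
    subtotals = [subtotal(medium) for medium in ESTIMATES]
    return sum(u for u, _ in subtotals), sum(l for _, l in subtotals)


for medium in ESTIMATES:
    print(medium, subtotal(medium))
print("TOTAL", grand_total())
```

Summing the film rows gives 422,016 for the upper bound, where the table shows the rounded figure 422,000, so the computed upper-bound total is 3,277,456 rather than 3,277,440. The lower-bound figures match the table exactly.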
Current Size of Web • There are an estimated 2.1 Billion pages on the Web • About 21 Terabytes • About 7,500 further Terabytes in web-accessed DBs • 610 Billion email messages per year = 11,285 TB • Internet traffic is doubling every 100 days • An estimated 62 Million Americans now use the Internet • Radio took 38 years to get 50 M listeners, TV took 13 years, the Net took 4 years
Human Memory • Landauer (1986): the human brain holds about 200 MB • looked at rate of information intake and rate of forgetting, and amount of information adults need for normal tasks • 6B people on earth implies a total memory of all people alive of about 1,200 petabytes • Another way: • estimate that people take in a byte/sec • lifetime of 25,000 days, or about 2B sec • result is 2 GB (doesn’t count synthesizing new info)
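Both memory estimates are simple back-of-the-envelope calculations. A minimal sketch in Python, assuming decimal units and a lifetime of 25,000 days (about 68 years), which is the figure consistent with roughly 2 billion seconds; the variable names are illustrative.

```python
MB, GB, PB = 10**6, 10**9, 10**15
SECONDS_PER_DAY = 24 * 60 * 60

# Estimate 1: Landauer's per-person figure scaled to the world population.
memory_per_person = 200 * MB
population = 6 * 10**9
total_pb = memory_per_person * population / PB

# Estimate 2: intake of one byte per second over a lifetime.
lifetime_days = 25_000          # assumed; about 68 years
bytes_per_second = 1
lifetime_seconds = lifetime_days * SECONDS_PER_DAY
lifetime_gb = lifetime_seconds * bytes_per_second / GB

print(f"All people alive: {total_pb:,.0f} PB")
print(f"Lifetime: {lifetime_seconds:,} seconds, about {lifetime_gb:.2f} GB")
```

The first estimate gives 1,200 PB. The second gives 2.16 billion seconds and about 2.16 GB, which the slide rounds to 2B sec and 2 GB.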
Data and Information • These two terms are quite often used interchangeably • used without any definitions or explanation • There are no standard definitions for these two terms • Two possible definitions:
Data and Information (cont.) • Data • items such as text, facts, numbers, images or sounds that may or may not be useful for a particular purpose • Information • data which has been processed so that its form and content are appropriate for a particular purpose
Intuitive Notion • Information must • Be something, although the exact nature (substance, energy, or abstract concept) is not clear; • Be “new”: repetition of previously received messages is not informative • Be “true”: false or counterfactual information is “mis-information” • Be “about” something • This human-centered approach emphasizes meaning and use of message
Knowledge • Quite often the terms information and knowledge are used interchangeably • One possible definition of knowledge • a combination of information, instincts, rules, ideas, procedures and experience that guide actions and decisions
Knowledge (cont.) • Two types of knowledge • Tacit • also called implicit, private or personal knowledge • knowledge held by an individual; may not have been articulated or may not be articulatable • For example, how does Michael Jordan accomplish his “slam dunks”?
Knowledge (cont.) • Explicit • also called public or social knowledge • expressed in a form that makes it available to others • usually in a written form, but may be in other forms such as verbal
Continuum • Quite often data, information and knowledge are expressed as a continuum: • Data => Information => Knowledge
Pyramid • Data, information and knowledge are also depicted as a pyramid • a distillation occurs as we move up the pyramid • data is “raw material” • as data is processed, information is distilled from it and the resulting amount is smaller in size; the same result is experienced in going from information to knowledge
Wisdom • Long term goal should be the acquisition of wisdom • but there is not much discussion in the literature or in the media • The current situation was aptly described by T.S. Eliot: • “Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?”
Wisdom (cont.) • Wisdom connotes the ability to acquire and use knowledge and information judiciously, possessing the power of judging rightly and following the soundest course of action based on knowledge, skill, experience and understanding.
Information Hierarchy • Data • The raw material of information • Information • Data organized and presented by someone • Knowledge • Information read, heard or seen and understood • Wisdom • Distilled and integrated knowledge and understanding
What is Data? • Represented by shapes or symbols that require cognitive skill to decipher • May not provide a context to fully understand its meaning • e.g. 10,000,000 5,000,000
What is Information? • Involves process of reception, recognition and conversion • May involve a ‘novelty’ factor: a new piece of data • May have multiple interpretations resulting in ‘public’ and ‘private’ information • e.g. Joe won $10,000,000 in the lottery last year and $5,000,000 more this year.
What is Knowledge? • Is created/acquired from a collection of information • Knowledge builds on a foundation of accurate information and can be passed on to others • e.g. Joe has been paying a lot of taxes because of his lottery winnings and the brand new mansion he bought.
What is Wisdom/Insight? • Represents highest level of complexity in chain of concepts • Difficult to impart via a storage medium • Argued to exist only within an individual • e.g. He who has money has friends.
Where is the Life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? -- T.S. Eliot, “The Rock” Where is the information we have lost in data?
Whom Do People Ask for Information? • People immediately present • People they know • People they trust • “Gatekeepers” • People in authority generally • People with cognitive authority • Teachers • Librarians
How Do People Ask for Information? • At the moment of need • By the easiest available route • By what they expect will give them the most suitable answer • By what they expect will give them the most accessible answer
Information and People • Information reinforces social bonds • People exchange familiar information • People continue to believe erroneous information • People say they value information (more than they use it) • People want a known available source
Limits to Information • People do not want information that will upset them • People do not want information that might upset them • People do not want more information than they can store • People do not want more information than they can process • People must eventually stop getting information and act on what they know
Dangers of Information • Information might be erroneous • Information might be deliberately misleading • Information might be contradictory • Information might be so excessive as to paralyze action • Information may cost more than it is worth • Relying on authority may be better than information • Possessing information may make one too conspicuous a social figure • Possessing information may make one a challenge to authority
Storing Information • People do not want more information than they can store • Immediate storage: • Short and long term memory • Active knowledge • People need more information than they can store immediately: • “At hand” • “In the library” • “On the web”
Information Wants and Needs • What people truly need • What people recognize they need • What people are willing to admit they need • What people truly want now • What people think they want now • What people say they want now
Standard Model Assumptions • Maximizing precision and recall simultaneously • The information need remains static • The value is the resulting document set
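Precision and recall, the two measures the standard model tries to maximize, have simple set-based definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch in Python; the function name and the document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs
    judged against a set of relevant document IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgments.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Both measures are computed against a fixed set of relevant documents, which is where the assumption of a static information need enters.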
Problems with Standard Model • Users learn during the search process • Scanning titles of retrieved documents • Reading retrieved documents • Viewing lists of related topics • Navigating hyperlinks • Some users don’t like long disorganized lists of documents
Berry-Picking as an Information Seeking Strategy • Standard IR model • Assumes the information need remains the same throughout the search process • Berry-picking model • Interesting information is scattered like berries among bushes • The query is continually shifting
Berry-Picking Model (cont.) • The query is continually shifting • New information may yield new ideas and new directions • The information need • Is not satisfied by a single, final retrieved set • Is satisfied by a series of selections and bits of information found along the way
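The contrast between the two models can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions, not a formal statement of the berry-picking model: documents are sets of terms, a "search" matches any document sharing a term with the query, and the query is replaced by the terms of each newly picked document. The function name, document IDs and terms are all hypothetical.

```python
def berry_pick(collection, start_query, max_steps=5):
    """Toy berry-picking loop: keep what is found along the way and
    let each find reshape the query."""
    query = set(start_query)
    basket = []  # selections accumulated over the whole search
    for _ in range(max_steps):
        # "Search": unpicked documents sharing at least one term with the query.
        found = [doc for doc in collection
                 if doc not in basket and query & collection[doc]]
        if not found:
            break
        pick = found[0]
        basket.append(pick)
        # New information yields new directions: the query shifts.
        query = set(collection[pick])
    return basket


# Hypothetical collection: document ID -> terms it contains.
docs = {
    "d1": {"information", "seeking"},
    "d2": {"seeking", "browsing"},
    "d3": {"browsing", "hyperlinks"},
    "d4": {"cooking"},
}

static_result = [doc for doc in docs if {"information"} & docs[doc]]
evolving_result = berry_pick(docs, {"information"})
print(static_result)
print(evolving_result)
```

With a fixed query the searcher retrieves only `d1`, while the shifting query gathers `d1`, `d2` and `d3` one selection at a time, so the need is met by the accumulated basket and not by any single retrieved set.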