haystack per user information environments n.
Skip this Video
Loading SlideShow in 5 Seconds..
Haystack: Per-User Information Environments PowerPoint Presentation
Download Presentation
Haystack: Per-User Information Environments

Loading in 2 Seconds...

play fullscreen
1 / 37

Haystack: Per-User Information Environments - PowerPoint PPT Presentation

  • Uploaded on

Haystack: Per-User Information Environments. David Karger. Motivation. Indices search by keyword Taxonomies classify by subject Cool site of the day. A lot like libraries... Library catalogues Dewey digital New book shelf, suggested reading. Web Search Tools.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Haystack: Per-User Information Environments' - audra

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
web search tools

search by keyword


classify by subject

Cool site of the day

A lot like libraries...

Library catalogues

Dewey digital

New book shelf, suggested reading

Web Search Tools

Is a universal library enough?

library web limitations
Library/web Limitations
  • Huge
    • Too many answers, mostly irrelevant
  • Only published material
    • Miss info known to few, leading-edge content
  • Rigid
    • All get same search results
    • Even if come back and try again

The library is the last place we look

start with bookshelf
Start with Bookshelf
  • I try solving problems using my data:
    • Information gathered personally
    • High quality, easy for me to understand
    • Not limited to publicly available content
  • My organization:
    • Personal annotations and metadata
    • Choose own subject arrangement
    • Optimize for my kind of searching
  • Adapts to my needs
then turn to a friend
Then Turn to a Friend
  • Leverage
    • They organize information for their own use
    • Let them find things for me too
  • Shared vocabulary
    • They know me and what I want
  • Personal expertise
    • They know things not in any library
  • Trust
    • Their recommendations are good
last to library web
Last to Library/web
  • Answer usually there
    • But hard to find
    • Wish: rearrange to suit my needs
    • Wish: help from my friends in looking
    • E.g. NY public library catalogue
  • Individualized access: The best tools adapt to individual ways of organizing and seeking data.
  • Individualized knowledge: People know much more than they publish. That knowledge is useful to them and others.
  • End user: understands their data the best, so should control organization and presentation
problems with current tools
Problems with Current Tools
  • Applications designed by few for use by many
    • Developers decide what information is important
    • Provide model to hold that information
    • Provide interfaces to view/manipulate that info
  • Users discover uses/needs for other info
    • Tool cannot store, cannot support interaction
  • Users discover connections between info
    • If connected info is in different applications, neither app can record connection
  • People could do a lot more with information, if environment let them record/use what they know
haystack approach
Haystack Approach
  • Data Model
    • Define rich data model that lets user represent all interesting info
    • Rich search capabilities
    • Machine readable so agents can augment/share/exchange info
  • User Interface
    • Strengthen UI tools to show rich data model to user
    • And let them navigate/manipulate/share it
  • Adaptation
    • People are lazy, unwilling to “waste time” telling system what to do, even if it could help them later
    • System must introspect about user actions, deduce user needs and preferences, and self-adjust to provide better behavior
  • Collaboration
    • As system gathers information from one user, share with others
    • Rich data model maximizes useful knowledge transfer
data model

Data Model

A semantic web of information

  • Tremendous amount of information is relational
    • Named relationships
      • Written by, married to, traveling to, owned by…
    • Collections
      • Directories, bookmarks, menus, albums
      • Families, workgroups,
    • Web links
  • People can take huge advantage of navigating relationships
  • Network of relationships much more “structured” than a textual description, but much less regular than a spreadsheet/database
the haystack data model






D. Karger





The Haystack Data Model
  • W3C RDF/DAML standard
  • Arbitrary objects, connected by named links
    • A semantic web
    • Links can be linked
  • No fixed schema
    • User extensible
    • Add annotations
    • Create brand new attributes
rdf lowers barriers
RDF Lowers Barriers
  • Location Independent
    • Universal Locators, even for local data (as may become non-local)
  • Application Independent
    • Simple, common language suitable for variety of information types
    • Enables interlinking and exchange of information from all apps
  • Extensible
    • Can add attributes as needed, leave them out if unimportant
  • Enables powerful search
    • Based on broad variety of attributes
  • Support for data agents
    • Extract information from raw data
    • Make available for search and other forms of navigation
where does data come from
Where does data come from?
  • Pull from outside sources
    • Web, databases, news feeds…
  • Active user input
    • Interfaces let user add data, note relationships
  • Mining data from prior data
    • Plug-in agents opportunistically extract data
  • Passive observation of user
    • Plug-ins to other interfaces record user actions
  • Other Users


Machine Learning Services

Web Viewer

Haystack UI

Web Observer

Mail Observer

Data Extraction Services

RDF Store



Data Sources

user interface

User Interface

Uniform Access to All Information

current barriers to information flow
Current Barriers to Information Flow
  • Partitions by Location
    • Some data on this computer, some on that
    • Remote access always noticeable, distracting
  • Partitions by Application
    • Mail reader for this, web browser for that, text editor for those
    • To-do list, but without needed elements
  • Invisibility
    • Where did I put that file?
    • Tendency for objects to have single (inappropriate) location (folder)
  • Missing attributes
    • Too lazy to add keywords that would aid searching later
goal task based interface
Goal: Task-Based Interface
  • When working on X, all information relevant to X (and no other) should be at my fingertips
    • Planning the day: to-do list, news articles, urgent email, seminars
    • Editing a paper: relevant citations, email from coauthors, prior versions
    • Hacking: code modules, documentation, working notes, email threads
  • Location, source and format of data irrelevant
sign of need email usage
Sign of Need: Email Usage
  • Email as to-do list
    • Anything not yet “done” kept there
    • Reminder email to ourselves
    • Single interface containing numerous document types
  • Overflowing Inboxes
    • Navigate only by brute-force scanning
    • Unsafe file/categorize anything: out of sight, out of mind
interface options
Interface Options
  • Folders
    • Out of sight, out of mind
    • Still need applications to see data
    • Which is the right folder?
  • Desktops
    • Allow arbitrary data types
    • But coupling between applications & data types too light
    • A smear of many tasks, so hard to focus
      • Hundreds of icons, tens of windows, huge menus
      • No partitioning
  • Databases
    • OK if you have a degree in database administration
    • Interface is impoverished---long lists of tuples
user interface architecture

View 2

UI data


Mapping 2

User Interface Architecture
  • Views: Data about how to display data
  • Views are persistent, manipulable data


UI data

Data to be displayed

Underlying information

semantic user interface

View for Favorites collection

View for cnn.com

View for yahoo.com

View for ~/documents/thesis.pdf

Semantic User Interface
  • Present information by assembling different views together
  • Information manipulation decoupled from presentation
    • New views can be added without mucking with data types
    • New data types can be added without designing new UIs
  • Uniform support for features like context menus
    • Actions apply to objects on screen in various “roles”
    • E.g. as word, as title of mail message, as member of collection
persistence of views
Persistence of Views
  • Views are data like all other data
  • Stored persistently, manipulated by user
  • User can customize a view
    • View for particular task can be cloned from another
    • Can evolve over time to need of task
    • To an extent previously limited to sophisticated UI designer
  • Views can be shared
    • Once someone determines “right” way to look at data, others can benefit
role of schemata
Role of Schemata
  • Benefits
    • Help people look at information the right way
    • Help creators avoid creation mistakes
  • Risks of Enforcement
    • Deters lazy users from entering data
    • Prevents creative users from stretching the boundaries
  • Is there a middle ground?
    • Can schemata be “advisory”?
  • One or many?
    • If each user makes own schema, how translate?


Learning from the User over Time

  • Haystack is ideally positioned to adapt to user
    • RDF data model provides rich attribute set for learning
    • In particular, can record user actions with information
      • (the flexible UI can capture easily)
    • Extensive record can be built up over time
  • Introspect on that information
    • Make Haystack adapt to needs, skills, and preferences of that user
observe user
Observe User
  • Instrument all interfaces, report user actions to haystack
    • Mail sent, files edited, web pages browsed
  • Discover quality
    • What does the user visit often?
  • Discover semantic relationships
    • What gets used at the same time?
  • Discover search intent
    • Which results were actually used?
learning from queries
Learning from Queries
  • Searching involves a dialogue
    • First query doesn’t work
    • So look at the results, change the query
    • Iterate till home in on desired results
  • Haystack remembers the dialogue
    • instead of first query attempt, use last one
    • record items user picked as good matches
    • on future, similar searches, have better query plus examples to compare to candidate results
    • Use data to modify queries to big search engines, filter results coming back
  • Haystack can be a lens for viewing data from the rest of the world
    • Stored content shows what user knows/finds useful
    • Selectively spider “good” sites
    • Filter results coming back
      • Compare to objects user has found useful in the past
    • Can learn over time
  • Example - personalized news service


Haystack’s Ulterior Motive

hidden knowledge
Hidden Knowledge
  • People know a lot that they are
    • Willing to share
    • But too lazy to publish
  • Haystack passively collects that knowledge
    • Without interfering with user
  • Once there, share it!
    • RDF---uniform language for data exchange
  • Challenges
    • As people individualize systems, semantics diverge
    • Who is the “expert” on a topic? (collaborative filtering)
  • I want info on probabilistic models in data mining
    • My haystack doesn’t know, but “probability” is in lots of email I got from Tommi Jaakola
    • Tommi told his haystack that “Bayesian” refers to “probability models”
    • Tommi has read several papers on Bayesian methods in data mining
    • Some are by Daphne Koller
    • I read/liked other work by Koller
    • My Haystack queries “Daphne Koller Bayes” on Yahoo
    • Tommi’s haystack can rank the results for me…
  • Rich data Model
    • Lets user represent all interesting info
    • Supports sophisticated searches
    • Accessible to information agents
  • User Interface
    • Extensibly shows rich data model to user
    • Lets them navigate/manipulate it
  • Adaptability
    • System may introspect about user actions, deduce user needs and preferences, and self-adjust to provide better behavior
  • Collaboration
    • As system gathers information from one user, share with others
    • Rich data model maximizes useful knowledge transfer
more info

More Info


(initial release available for download)