
Will XML and Information Retrieval Make Society Transparent?


Gregory B. Newby
School of Information and Library Science
University of North Carolina at Chapel Hill
http://ils.unc.edu/gbnewby




Basic Premise

  • Information retrieval will be facilitated by XML because of the additional structure that XML adds.

  • This will result in better IR abilities compared to plain text or HTML

IR is Not Database Retrieval







  • Structured data

  • Natural language: (semi-)structured or unstructured

Information Retrieval in One Slide

  • IR is about matching information to information needs

    • Information may be contained in documents, extracts, document surrogates, or newly-created documents

    • Information needs may be poorly defined, changeable, and context-specific

  • We evaluate IR systems by the numbers of relevant documents they identify

    • Recall: proportion of all relevant documents that are retrieved

    • Precision: proportion of documents that are retrieved that are judged as relevant
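
The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `precision_recall` and the document ids are made up for the example, and relevance judgments are assumed to be available as a simple set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids the system returned
    relevant:  iterable of document ids judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 5 judged relevant, 3 in common.
p, r = precision_recall(["d1", "d2", "d3", "d7"], ["d1", "d2", "d3", "d4", "d5"])
print(p, r)  # precision is 3/4, recall is 3/5
```

Returning 0.0 for an empty retrieved or relevant set is a convenience chosen here; other conventions exist.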

Why IR Sucks

  • Human language is ambiguous

    • Polysemy: The same word can mean different things

    • Synonymy: Different words can mean the same thing

  • The topic or aboutness of a document is hard to assess

  • Queries are short and ambiguous

  • Information needs are moving and vague targets
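
A toy exact-match search shows how polysemy and synonymy hurt retrieval. The three documents and the `keyword_match` helper are invented for this sketch; real IR systems use far richer matching.

```python
# Hypothetical three-document collection.
docs = {
    "d1": "the bank raised interest rates",
    "d2": "we walked along the river bank",
    "d3": "the automobile was parked outside",
}


def keyword_match(query, docs):
    """Return ids of documents containing every query term (exact word match)."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(t in text.lower().split() for t in terms)
    )


# Polysemy: "bank" matches both the financial and the river sense.
polysemy_hits = keyword_match("bank", docs)

# Synonymy: a query for "car" misses the document that says "automobile".
synonymy_hits = keyword_match("car", docs)

print(polysemy_hits, synonymy_hits)
```

The first query retrieves a document the user probably did not want (lower precision); the second misses one the user probably did want (lower recall).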

Things that help IR

  • Structure: matching based on known types of content (e.g., a list vs. discourse)

  • Relationships: Knowing how groups of documents are related

  • Metadata: terms or phrases that are of assuredly high importance

  • User knowledge: context, user models, history…
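
As a rough illustration of how structure and metadata can help, the sketch below searches a small XML collection either anywhere in a document or only inside a keyword element. The element names (`doc`, `title`, `kw`, `body`), the sample records, and the `search` function are assumptions made for this example; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection; element names are illustrative, not from any real DTD.
XML = """
<collection>
  <doc id="d1">
    <title>Python in the wild</title>
    <keywords><kw>snakes</kw><kw>zoology</kw></keywords>
    <body>Field notes on the python and other constrictors.</body>
  </doc>
  <doc id="d2">
    <title>Scripting for librarians</title>
    <keywords><kw>programming</kw><kw>python</kw></keywords>
    <body>An introduction to automating catalog tasks.</body>
  </doc>
</collection>
"""


def search(root, term, field=None):
    """Return ids of docs containing term, optionally only inside `field` elements."""
    term = term.lower()
    hits = []
    for doc in root.iter("doc"):
        # With no field given, search all text in the document.
        scope = doc.iter(field) if field else [doc]
        text = " ".join(" ".join(el.itertext()) for el in scope)
        if term in text.lower():
            hits.append(doc.get("id"))
    return hits


root = ET.fromstring(XML)
anywhere = search(root, "python")                # both documents mention the word
as_keyword = search(root, "python", field="kw")  # only the programming document
print(anywhere, as_keyword)
```

Restricting the match to a metadata field trades some recall for precision, which is the kind of gain the slide attributes to structure.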

Transparency through Information Access (utopic view)

  • What if organizations (government, corporations, etc.) are less able to hide their actions?

  • What if individuals’ information is readily accessible to all?

  • What if nearly all information that is generated is available to all seekers?

Inequity through Information Access (dystopic view)

  • Organizations share their data only when and with whom they choose

  • Individuals’ information is hoarded by businesses, government and the people themselves

  • Information is available on a fee and authority basis

XML can’t make societal decisions…

  • But XML brings about the opportunity for such decisions to be made

    • If information is readily available to all, XML will help make it more searchable

    • If information is only available to the privileged, XML will make them more powerful

XML Uncertainties

  • Will XML be used for markup? Or only at the back end?

  • Will standards such as Z39.50 or EDI make it easier for sharing XML data? Or will translation & mapping be difficult?

  • What sort of variety will exist in DTDs? How difficult will it be for IR and database systems to map between DTDs?
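
To make the DTD-mapping question concrete, here is a small sketch that renames elements from two differently structured records into one shared vocabulary. Both record layouts, the `MAPPINGS` table, and the `normalize` function are hypothetical; real mappings between DTDs are rarely this one-to-one.

```python
import xml.etree.ElementTree as ET

# Hypothetical element-name mappings from two DTDs to one shared vocabulary.
MAPPINGS = {
    "library": {"heading": "title", "creator": "author"},
    "publisher": {"name": "title", "writer": "author"},
}


def normalize(xml_text, source):
    """Parse one record and rename its child elements to the shared field names."""
    mapping = MAPPINGS[source]
    record = ET.fromstring(xml_text)
    return {
        mapping[child.tag]: (child.text or "").strip()
        for child in record
        if child.tag in mapping  # elements with no mapping are dropped
    }


a = normalize(
    "<item><heading>Moby-Dick</heading><creator>Melville</creator></item>",
    "library",
)
b = normalize(
    "<book><name>Moby-Dick</name><writer>Melville</writer><isbn>0</isbn></book>",
    "publisher",
)
print(a == b)  # both records reduce to the same title/author fields
```

Note that the second record's `isbn` element is lost because the shared vocabulary has no slot for it, a small instance of the translation difficulty the slide asks about.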

XML stakeholders: Big organizations

  • Organizations with lots of internal data

    • (The IRS; Time-Warner; others big & small)

  • These organizations will benefit from XML + IR by being able to match database-type items with IR-type information needs.

    • E.g., “for people who purchase these products, what email and chat messages have they exchanged?”

XML stakeholders: Organizations who share

  • Organizations who broker, repackage or resell information will benefit from XML + IR

    • (Credit bureaus, investigative services…)

  • XML will make it easier to submit IR queries against multiple datasets and merge the results

    • E.g., “See what this person’s public Web pages say before deciding whether to hire him or her.”
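
Merging results from several datasets might look like the following sketch. It assumes each source returns `(doc_id, score)` pairs on its own scale and uses min-max normalization, which is just one simple merging strategy chosen for illustration; the source names, scores, and function names are all made up.

```python
def normalize_scores(results):
    """Min-max scale one source's (doc_id, score) list to the range [0, 1]."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If all scores are equal, treat every result as a top result.
    return [(doc, (s - lo) / span if span else 1.0) for doc, s in results]


def merge(sources):
    """Merge per-source result lists into one ranking, best score first."""
    merged = []
    for name, results in sources.items():
        merged.extend((score, name, doc) for doc, score in normalize_scores(results))
    # Sort by descending score; break ties by source name, then document id.
    merged.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(name, doc) for _, name, doc in merged]


# Hypothetical results from two sources with incompatible score scales.
sources = {
    "web": [("page1", 12.0), ("page2", 3.0)],
    "news": [("story1", 90.0), ("story2", 60.0), ("story3", 30.0)],
}
ranking = merge(sources)
print(ranking)
```

Normalizing per source keeps one dataset's larger raw scores from dominating the merged list, though it also hides genuine differences in quality between sources.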

XML stakeholders: Individuals

  • Ultimately, lots of the most valuable information is by or about individuals

    • (Lifestyle, health, purchasing, travel…)

  • IR systems that understand us better will be able to serve us better

    • E.g., “recommend a book based on my past reading, movies and available time to read.”

What we know, revisited

  • IR sucks, but is better to the extent that language is disambiguated and structure is present

  • People have information needs, but have trouble expressing those needs

  • Documents can address some needs, but often real-world information needs are better met by assembling answers from diverse sources

What we don’t know, revisited

  • XML: In the background or the foreground?

  • How will organizations share XML data (will they?)

  • What external forces might make data in all forms more accessible across organizations and to individuals?


  • Despite problems, IR has continued to make good progress

  • Despite problems, XML appears to be making a strong contribution to storing, organizing and presenting data of all types

  • With IR, XML will be more searchable for a variety of purposes

  • With XML, IR will gain better precision and ability to serve the needs of individuals and organizations
