
INFM 700: Session 6 Taxonomies and Metadata

INFM 700: Session 6, Taxonomies and Metadata. Paul Jacobs, The iSchool, University of Maryland. Wednesday, Oct. 16, 2013. This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.





Presentation Transcript



  2. Today’s Topics • Nature and types of metadata • General-purpose taxonomies (ontologies, thesauri, …) • Special-purpose taxonomies & thesauri • Practical use of taxonomies and metadata Metadata Taxonomies & Thesauri Practical Uses

  3. Metadata • Literally “data about data” • “a set of data that describes and gives information about other data” ― Oxford English Dictionary • Why do we need this? • Types of metadata • Descriptive/subjective/content (e.g., author, subject, keywords, …) • Administrative (e.g., owner, rights, cost, creation date, version, …) • Technical (e.g., format, size, dependencies, programs) • . . . . • In practical terms: • Metadata helps users locate, navigate, and interpret content • Metadata helps organizations manage content • Metadata helps systems manipulate content
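
One way to make the three metadata types above concrete is a record that keeps them in separate groups, each serving a different audience (users, organizations, systems). This is a minimal Python sketch; the class and field names (`ContentRecord`, `size_bytes`, and so on) and the sample values are hypothetical and are not taken from any particular metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class DescriptiveMetadata:
    # Helps users locate, navigate, and interpret content
    title: str
    author: str
    subject: str
    keywords: List[str] = field(default_factory=list)

@dataclass
class AdministrativeMetadata:
    # Helps organizations manage content
    owner: str
    rights: str
    created: date
    version: int = 1

@dataclass
class TechnicalMetadata:
    # Helps systems manipulate content
    file_format: str
    size_bytes: int

@dataclass
class ContentRecord:
    content_id: str
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata
    technical: TechnicalMetadata

def find_by_keyword(records, keyword):
    """Use descriptive metadata to locate content by keyword."""
    return [r.content_id for r in records if keyword in r.descriptive.keywords]

# Hypothetical example record; all field values are placeholders.
record = ContentRecord(
    content_id="doc-001",
    descriptive=DescriptiveMetadata("Session 6 slides", "Paul Jacobs",
                                    "Information architecture", ["taxonomy", "metadata"]),
    administrative=AdministrativeMetadata("Course owner", "CC BY-NC-SA 3.0 US", date(2013, 10, 16)),
    technical=TechnicalMetadata("PDF", 2_000_000),
)

print(find_by_keyword([record], "taxonomy"))  # ['doc-001']
```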

  4. Early Example of Metadata

  5. Related Terms & Techniques • Taxonomies • Anything organized in some sort of hierarchical structure • Tagging • Adding almost any kind of metadata to content, but now often descriptive and user-provided • Thesauri • Focus on relations between terms • Focus on “concepts” • Ontologies • Usually model a specific domain or part of the world • Generally machine-readable • (The list runs in order of increasing complexity and richness.)

  6. Menagerie of Terms • Classification • Hierarchies • Epistemology • Directories • Controlled vocabularies • Knowledge representation • Let’s focus on significant differences, on advantages and disadvantages, and on how each is useful. Let’s not quibble over exactly what to call each.

  7. Segue – Metadata to Taxonomies • What do taxonomies, thesauri, etc., have to do with metadata?

  8. Taxonomies • Organization of objects according to some principle • Familiar examples: • Linnaean taxonomy (for living organisms) • Web directories (e.g., Yahoo or ODP) • Corporate directories • Organization charts • Organizational structures previously discussed

  9. Thesauri: Motivation • “Semantic gap” between concepts and words • Words are used to evoke concepts • Concrete objects: MacBook Pro, iPhone • Abstract ideas: freedom, peace • (Diagram labels: Concepts, Ideas, Words, Meaning)

  10. Words and concepts • The semantic gap: what’s the problem? • Synonymy – roughly, different words or phrases can be used to express similar ideas (e.g., “notebook”, “laptop”) • Polysemy – roughly, the same word can have different meanings (e.g., “line”: fishing, code, queue, . . .) • Taxonomies try to group similar concepts • “Tags” often assign words to concepts, making it easier to find related concepts • Controlled vocabularies avoid ambiguity (like a specific tag set) • Thesauri represent attempts to better organize mappings between words and concepts • Do these present precision or recall problems?
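
Roughly, synonymy tends to hurt recall (a search for “notebook” misses documents that only say “laptop”), while polysemy tends to hurt precision (a search for “line” mixes fishing, code, and queues). A controlled vocabulary addresses both by mapping each entry term to exactly one preferred term. The following is a minimal Python sketch of that idea; the vocabulary entries and the functions `normalize` and `index_document` are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical controlled vocabulary: entry term (lowercase) -> preferred term.
VOCABULARY = {
    "laptop": "laptop",
    "notebook": "laptop",              # synonyms collapse to one preferred term
    "notebook computer": "laptop",
    "fishing line": "line (fishing)",  # a polysemous word is split into qualified terms
    "line of code": "line (code)",
    "queue": "line (queue)",
}

def normalize(term: str) -> Optional[str]:
    """Map free text to its preferred term, or None if the vocabulary lacks it."""
    return VOCABULARY.get(term.strip().lower())

def index_document(doc_id: str, terms, index: Dict[str, Set[str]]) -> None:
    """Index a document under preferred terms only, ignoring unknown terms."""
    for t in terms:
        preferred = normalize(t)
        if preferred is not None:
            index.setdefault(preferred, set()).add(doc_id)

index: Dict[str, Set[str]] = {}
index_document("d1", ["Notebook"], index)
index_document("d2", ["laptop"], index)
index_document("d3", ["queue"], index)

# Either synonym retrieves both documents (helps recall).
print(sorted(index[normalize("notebook")]))  # ['d1', 'd2']
# The bare, ambiguous word "line" is not matched to any sense (protects precision).
print(normalize("line"))  # None
```

The cost of this design is maintenance: every new synonym or sense has to be added to the vocabulary by hand.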

  11. Some Real Examples • Content tagging and social media (e.g., flickr, del.icio.us) • Special-purpose classification schemes and thesauri (e.g., Art & Architecture Thesaurus (AAT), UMLS) • General semantic tools and classification schemes (e.g., Princeton WordNet, Roget’s Thesaurus)

  12. Think for a sec… • You are developing a content-rich site and need organization and labeling schemes to help users view/browse/learn/find stuff – what do you do? • Define your own tagging/organization scheme? • Let the users define their own? • Leave it all to a search engine? • Use some existing scheme? • . . .

  13. Flickr – popular tags

  14. Flickr – related tags

  15. Del.icio.us – related tags
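
How Flickr and del.icio.us actually computed their related-tag lists is not stated here. One simple approach, assumed for this sketch, is tag co-occurrence: two tags are related if they are often applied to the same item. The data and the functions `cooccurrence_counts` and `related_tags` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged items: item id -> set of user-assigned tags.
tagged_items = {
    "photo1": {"beach", "sunset", "ocean"},
    "photo2": {"beach", "ocean", "surf"},
    "photo3": {"sunset", "sky"},
    "photo4": {"beach", "sand"},
}

def cooccurrence_counts(items):
    """Count how often each unordered pair of tags appears on the same item."""
    counts = Counter()
    for tags in items.values():
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts

def related_tags(tag, items, top_n=3):
    """Rank the other tags by how often they co-occur with `tag`."""
    scores = Counter()
    for (a, b), n in cooccurrence_counts(items).items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return [t for t, _ in scores.most_common(top_n)]

print(related_tags("beach", tagged_items))  # "ocean" comes first (it co-occurs twice)
```

Raw counts favor very common tags; a production system would likely normalize them, which this sketch does not do.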

  16. Art & Architecture Thesaurus http://www.getty.edu/research/conducting_research/vocabularies/aat/

  17. UMLS (Unified Medical Language System) Source: National Library of Medicine (NIH) • Three Knowledge Sources, used separately or together: • Metathesaurus: 1 million+ biomedical concepts from over 100 sources • Semantic Network: 135 broad categories and 54 relationships between them • SPECIALIST Lexicon + Tools: lexical information and programs for language processing

  18. UMLS (Unified Medical Language System) Source: National Library of Medicine (NIH) • Began in 1986 as a long-term R&D project • Designed for systems developers • Develop multi-purpose tools to enhance understanding of medical meaning across systems • Overcome barriers to effective retrieval of machine-readable information • Overcome the variety of ways the same concepts are expressed in machine-readable and human language

  19. UMLS Uses Source: National Library of Medicine (NIH) • Information retrieval • Thesaurus construction • Natural language processing • Automated indexing • Electronic health records (EHR) • Distribution mechanism for • HIPAA, CHI, PHIN regulatory standards • SNOMED CT

  20. UMLS Metathesaurus http://www.nlm.nih.gov/research/umls/

  21. UMLS Metathesaurus http://www.nlm.nih.gov/research/umls/

  22. UMLS Thesaurus Browser http://www.nlm.nih.gov/research/umls/

  23. Think for a sec… (the questions from slide 12, revisited) • Define your own tagging/organization scheme? • Let the users define their own? • Leave it all to a search engine? • Use some existing scheme? • . . .

  24. Applying IA Principles • Focus on users and user needs – users are different, and have different models • Focus on content – concepts are different, too – different levels, words, complexity, vagueness • Examples: • What’s the difference between laptop, PDA, phone, and convergence device? • When is “cancer research” “oncology”? • When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?

  25. Standard Thesaurus Structure • Broader terms (IS-A): Computer • Preferred term and its synonyms (variants, AKA): Notebook, Laptop • Narrower terms (IS-A): Desktop Replacement, Ultraportable, Tablet PC
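
The structure above (broader terms, a preferred term with its variants, narrower terms) is often recorded with the conventional thesaurus relations BT, NT, and UF ("used for"). A minimal Python sketch, assuming Laptop is the preferred term and Notebook its variant; the classes and `expand_query` are hypothetical, and expanding a query to narrower terms is just one possible use of the structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ThesaurusEntry:
    preferred: str
    broader: List[str] = field(default_factory=list)   # BT: IS-A parents
    narrower: List[str] = field(default_factory=list)  # NT: IS-A children
    variants: List[str] = field(default_factory=list)  # UF / AKA: synonyms

class Thesaurus:
    def __init__(self):
        self.entries: Dict[str, ThesaurusEntry] = {}
        self.lookup: Dict[str, str] = {}  # any known term (lowercase) -> preferred term

    def add(self, entry: ThesaurusEntry) -> None:
        self.entries[entry.preferred] = entry
        self.lookup[entry.preferred.lower()] = entry.preferred
        for v in entry.variants:
            self.lookup[v.lower()] = entry.preferred

    def expand_query(self, term: str) -> List[str]:
        """Return the preferred term plus its narrower terms; unknown terms pass through."""
        preferred = self.lookup.get(term.lower())
        if preferred is None:
            return [term]
        return [preferred] + self.entries[preferred].narrower

t = Thesaurus()
t.add(ThesaurusEntry(
    preferred="Laptop",
    broader=["Computer"],
    narrower=["Desktop Replacement", "Ultraportable", "Tablet PC"],
    variants=["Notebook"],
))
print(t.expand_query("notebook"))
# ['Laptop', 'Desktop Replacement', 'Ultraportable', 'Tablet PC']
```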

  26. IA Uses of Thesauri • For organization • For navigation • For indexing content • For searching

  27. Poly-Hierarchies • Concepts can have multiple parents • Example (from the Shoah Foundation’s thesaurus of Holocaust terms): Auschwitz II-Birkenau (Poland : Death Camp) sits under both Cracow (Poland : Voivodship) and German death camps; below it are Block 25 (Auschwitz II-Birkenau) and Kanada (Auschwitz II-Birkenau)
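
In code, a poly-hierarchy is a directed acyclic graph rather than a tree: each node stores a set of parents. The sketch below uses the slide's example terms; the functions `ancestors` and `paths_to_root` are hypothetical. It also hints at one practical disadvantage: a node can have several breadcrumb paths, so "where am I?" has more than one answer.

```python
from collections import defaultdict

AUSCHWITZ = "Auschwitz II-Birkenau (Poland : Death Camp)"

# child -> set of parents; a poly-hierarchy allows more than one parent per node.
parents = defaultdict(set)

def add_edge(parent, child):
    parents[child].add(parent)

add_edge("Cracow (Poland : Voivodship)", AUSCHWITZ)
add_edge("German death camps", AUSCHWITZ)
add_edge(AUSCHWITZ, "Block 25 (Auschwitz II-Birkenau)")
add_edge(AUSCHWITZ, "Kanada (Auschwitz II-Birkenau)")

def ancestors(node):
    """All nodes reachable by following parent links (visited set avoids repeats)."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def paths_to_root(node):
    """Every root-to-node path; in a poly-hierarchy there may be several."""
    if not parents.get(node):
        return [[node]]
    return [path + [node] for p in sorted(parents[node]) for path in paths_to_root(p)]

for path in paths_to_root("Block 25 (Auschwitz II-Birkenau)"):
    print(" > ".join(path))
```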

  28. Poly-Hierarchies • What are the advantages and disadvantages? • What’s the relationship to polysemy?

  29. Practical Uses & Implementation • What are we trying to do (e.g., help users find stuff)? • What tools are at our disposal (e.g., tags, XML, databases)? • Given the above, how do we use/implement hierarchies and thesauri?

  30. Faceted Hierarchies • Alternative to single and poly-hierarchies • Basic idea: • Describe objects along multiple facets • Each facet has its associated hierarchy • Issues: • What’s a facet? • How do you navigate faceted hierarchies?
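
The basic idea (each object described along several facets, each facet with its own hierarchy) can be sketched in a few lines of Python. Selecting a facet value matches items tagged with that value or anything below it in the facet's hierarchy, and selections on different facets are combined with AND. The furniture facets, items, and function names are hypothetical.

```python
# Hypothetical facet hierarchies: facet -> {value: parent value (None = top level)}.
FACET_HIERARCHIES = {
    "material": {"wood": None, "oak": "wood", "pine": "wood", "metal": None, "steel": "metal"},
    "room": {"living room": None, "office": None},
}

# Hypothetical catalog items, each described along every facet.
ITEMS = [
    {"name": "armchair", "material": "oak", "room": "living room"},
    {"name": "desk chair", "material": "steel", "room": "office"},
    {"name": "bookcase", "material": "pine", "room": "office"},
]

def is_under(facet, value, selected):
    """True if `value` equals `selected` or lies below it in the facet's hierarchy."""
    tree = FACET_HIERARCHIES[facet]
    while value is not None:
        if value == selected:
            return True
        value = tree.get(value)  # move up one level; top-level values map to None
    return False

def filter_items(items, selections):
    """Keep items that match every selected facet value (AND across facets)."""
    return [it for it in items
            if all(is_under(f, it[f], v) for f, v in selections.items())]

print([it["name"] for it in filter_items(ITEMS, {"material": "wood"})])
# ['armchair', 'bookcase']
print([it["name"] for it in filter_items(ITEMS, {"material": "wood", "room": "office"})])
# ['bookcase']
```

Narrowing or broadening is then just adding, removing, or moving a selection up or down one facet's hierarchy.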

  31. Faceted Browsing Example

  32. Faceted Browsing Example • Demo: http://flamenco.berkeley.edu/demos.html

  33. Advantages of Facets • Integrates searching and browsing • Easy to build complex queries • Easy to narrow, broaden, shift focus • Helps users avoid getting lost • Helps to prevent “categorization wars”

  34. Relationship to IA? • Typical system: Database, Web Server, Application Server, Network; ontologies are implicitly “hidden” here! • Example (air travel): Trip; Flight (From: Origin, Departure Time; To: Destination, Arrival Time); Airplane (Type, Capacity), linked by Part-of and Equipment relations • Rule: Arrival Time is always after Departure Time • Rule: Distance from Origin to Destination is typically > 100 miles
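
One way to surface the ontology “hidden” in such a system is to write its concepts and rules down explicitly as types and checks. The sketch below reads the slide's diagram as: flights are part of a trip, and each flight's equipment is an airplane. That reading, the `distance_miles` attribute, the time-zone assumption, and all sample values are assumptions for illustration. Because the second rule says “typically”, a violation is reported as a warning rather than an error.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Airplane:
    aircraft_type: str
    capacity: int

@dataclass
class Flight:
    origin: str
    destination: str
    departure_time: datetime   # assumed to share one time zone with arrival_time (e.g. UTC)
    arrival_time: datetime
    distance_miles: float      # hypothetical attribute, needed to check the second rule
    equipment: Airplane

    def check_rules(self) -> List[str]:
        """Apply the slide's two rules: one strict, one only 'typical'."""
        problems = []
        if self.arrival_time <= self.departure_time:
            problems.append("error: arrival time must be after departure time")
        if self.distance_miles <= 100:
            problems.append("warning: distance is not typical (<= 100 miles)")
        return problems

@dataclass
class Trip:
    flights: List[Flight] = field(default_factory=list)  # flights are part of a trip

plane = Airplane(aircraft_type="Jet", capacity=150)
bad = Flight("Airport A", "Airport B",
             datetime(2013, 10, 16, 9, 0), datetime(2013, 10, 16, 8, 0),
             distance_miles=600, equipment=plane)
short = Flight("Airport A", "Airport C",
               datetime(2013, 10, 16, 9, 0), datetime(2013, 10, 16, 9, 40),
               distance_miles=60, equipment=plane)
trip = Trip([bad, short])
for f in trip.flights:
    print(f.destination, f.check_rules())
```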

  35. Putting it all together… • Two-layer architecture: Database and Web Server on a network (e.g., MySQL, Apache, PHP) • Three-layer architecture: Database, Application Server, and Web Server on a network

  36. Popular Implementation • Presentation: PHP/HTML • Content and metadata: SQL database

  37. Encoding Hierarchies • Store the hierarchy in an RDBMS as a table, Hierarchy(child, parent) (figure: example tree with nodes A through H, rooted at A) • Finding the children of A: SELECT child FROM Hierarchy WHERE parent = 'A' → B, C • Finding the parent of G: SELECT parent FROM Hierarchy WHERE child = 'G' → D • Finding the siblings of D: find its parent, then find that parent’s children
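
These queries can be tried end to end with Python's built-in `sqlite3` module. Only the edges A→B, A→C, and D→G come from the queries, and C→D from the breadcrumb on slide 39; the remaining edges (C→E, B→F, D→H) are assumptions, since the tree figure did not survive. The helper names `children`, `parent`, and `siblings` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per edge; child is the key, so each node has at most one parent.
conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT)")
# Partly assumed tree (see above); the root A has no row of its own.
edges = [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"),
         ("F", "B"), ("G", "D"), ("H", "D")]
conn.executemany("INSERT INTO Hierarchy (child, parent) VALUES (?, ?)", edges)

def children(node):
    rows = conn.execute(
        "SELECT child FROM Hierarchy WHERE parent = ? ORDER BY child", (node,))
    return [r[0] for r in rows]

def parent(node):
    row = conn.execute(
        "SELECT parent FROM Hierarchy WHERE child = ?", (node,)).fetchone()
    return row[0] if row else None  # the root has no parent

def siblings(node):
    # Find the parent, then find its children, excluding the node itself.
    p = parent(node)
    return [c for c in children(p) if c != node] if p is not None else []

print(children("A"))  # ['B', 'C']
print(parent("G"))    # D
print(siblings("D"))  # ['E']
```

Making `child` the primary key enforces a strict hierarchy; a poly-hierarchy (slide 27) would instead need a composite key on (child, parent).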

  38. Encoding Metadata • Store items in a table, Items (figure: the same example tree with nodes A through H)

  39. Content → Presentation • Tables: Hierarchy(child, parent) and Content(id, attribute1, attribute2, attribute3, …) • Generated page (mock-up): “You are here: A > C > D”; Related: D, E; Contents at D (figure: example tree with nodes A through H)
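
The page elements in this mock-up can be generated from the two tables with a few queries. In this sketch, Content gets a hypothetical `node` column standing in for whichever attribute links an item to the hierarchy, “Related” is assumed to mean the node plus its siblings (which matches the mock-up's D, E), and the tree and content rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT);
    CREATE TABLE Content (id INTEGER PRIMARY KEY, title TEXT, node TEXT);
""")
# Assumed tree edges (child, parent) and content rows, for illustration only.
conn.executemany("INSERT INTO Hierarchy VALUES (?, ?)",
                 [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C")])
conn.executemany("INSERT INTO Content (title, node) VALUES (?, ?)",
                 [("Item 1", "D"), ("Item 2", "D"), ("Item 3", "E")])

def parent(node):
    row = conn.execute("SELECT parent FROM Hierarchy WHERE child = ?", (node,)).fetchone()
    return row[0] if row else None

def breadcrumb(node):
    """Follow parent links up to the root (assumes no cycles), then reverse."""
    trail = []
    while node is not None:
        trail.append(node)
        node = parent(node)
    return " > ".join(reversed(trail))

def related(node):
    """Here 'related' means the node and its siblings (an assumed definition)."""
    rows = conn.execute("SELECT child FROM Hierarchy WHERE parent = ? ORDER BY child",
                        (parent(node),))
    return [r[0] for r in rows]

def contents_at(node):
    rows = conn.execute("SELECT title FROM Content WHERE node = ? ORDER BY id", (node,))
    return [r[0] for r in rows]

print("You are here:", breadcrumb("D"))    # You are here: A > C > D
print("Related:", related("D"))            # Related: ['D', 'E']
print("Contents at D:", contents_at("D"))  # Contents at D: ['Item 1', 'Item 2']
```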

  40. Faceted Browsing • Page mock-up: Filter by: Facet1 (possible values), Facet2 (possible values); Matching Results • Tables: Hierarchy(child, parent) and Content(id, attribute1, attribute2, attribute3, …)
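
The “Filter by” panel maps naturally onto GROUP BY queries over the Content table: for each facet, list its possible values under the current filters, with a count of how many results each would leave. This minimal sketch assumes the attribute columns of Content serve directly as facets; the column names, data, and functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Content table in which each attribute column serves as a facet.
conn.execute("CREATE TABLE Content (id INTEGER PRIMARY KEY, name TEXT, material TEXT, room TEXT)")
conn.executemany("INSERT INTO Content (name, material, room) VALUES (?, ?, ?)", [
    ("armchair", "wood", "living room"),
    ("desk chair", "metal", "office"),
    ("bookcase", "wood", "office"),
    ("ottoman", "fabric", "living room"),
])

# Column names cannot be bound as SQL parameters, so only whitelisted facets are allowed.
FACETS = {"material", "room"}

def _where(filters):
    unknown = set(filters) - FACETS
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    clauses = [f"{facet} = ?" for facet in filters]
    sql = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return sql, list(filters.values())

def facet_counts(facet, filters):
    """Possible values of `facet` under the current filters, with result counts."""
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    where, params = _where(filters)
    sql = f"SELECT {facet}, COUNT(*) FROM Content{where} GROUP BY {facet} ORDER BY {facet}"
    return conn.execute(sql, params).fetchall()

def matching_results(filters):
    where, params = _where(filters)
    rows = conn.execute(f"SELECT name FROM Content{where} ORDER BY id", params)
    return [r[0] for r in rows]

selected = {"room": "office"}
print(facet_counts("material", selected))  # [('metal', 1), ('wood', 1)]
print(matching_results(selected))          # ['desk chair', 'bookcase']
```

Showing counts next to each value lets users see in advance whether a click will narrow the results or leave them empty.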

  41. Recap • Metadata • General function • Types of metadata • Taxonomies and thesauri • Role in organizing, navigating, and searching content • General-purpose taxonomies • Special-purpose taxonomies • Practical use & implementation
