Metadata: Promise and Practice • Jeffrey Beall • Nebraska Library Association, Technical Services Round Table, Spring Meeting, April 25, 2008
Outline • Introduction • 8 theses of my talk • About me • Metadata and high-quality information retrieval; value of browse displays • Four types of searching in libraries • The weaknesses of full-text searching • The future of cataloging and the debate • Next-generation library interfaces
Favorite funny subject headings • Golf and war • Electric donkeys • Infants -- Congresses • World Wide Web -- Early works to 1800 • Automobile driving -- Religious aspects • Dance -- France • Women, Kukukuku (Changed to: Women, Hamtai) • Ugly contests • Host-fungus relationships
Favorite funny subject headings (continued) • Weapons of mass destruction -- Safety measures • Pomegranate seeds in literature • Infants -- Books and reading • Eskimos -- Hunting • Headache patients’ writings • Bird surveys • Violin -- Methods (Fiddling) • Global warming -- Fiction • Body, Human -- Catalogs • Mentally ill parents • Appalachian Region -- Intellectual life
Favorite funny subject headings (continued) • Tax exemption -- Taxation • Dinosaurs as pets • Labor disputes -- Poetry • Crappie fishing • Reality -- Fiction • Historic buildings -- Design and construction • Public toilets in motion pictures • Domestic asses • Hurling managers • Uranus probes • 110 10 |a United States. |b Office of Solid Waste
Theses • Libraries should provide high-quality information discovery and information retrieval. • The best way to achieve this is with systems that sufficiently exploit rich, standard, and comprehensive metadata. • Rich, standard, and comprehensive metadata requires controlled vocabularies for subject metadata, name disambiguation, granularity of description, and collocation.
Theses (continued) • Full-text searching, while not devoid of value, is a low-quality information retrieval and information discovery (IR/ID) system for the type of searching done in libraries, especially serious research and scholarship. • At this time, computers, which do not understand the nuances of human language, cannot create metadata of sufficient quality for use in library IR systems.
Theses (continued) • Information discovery often requires mediation. IR systems don’t have to be dumbed down and made simple. Many things in the world are complicated, so it’s natural that the organization of information reflects that. It’s okay to have to learn to use a library catalog or other IR system.
Theses (continued) • Library IR systems should not abandon alphabetical browse displays in favor of relevance ranking. • The creation, maintenance, and sharing of metadata for intellectual resources should not be made so complicated that it reduces the amount or quality of metadata being created.
About me Auraria Campus
The value of metadata • Elements of metadata • The value of rich metadata • The library technology graveyard – analyses of low-quality, emerging library technologies • Defining quality in library IR systems
The value of left-anchored browse displays • Simplicity • Structure • Parsing advantage • References • Truncation • Concept consolidation • Collocation of inverted terms • Typographical errors • Classification display • Completeness • Skill transference
The four categories of searching in libraries • Deterministic searching • Full-text searching • Metatext searching • Metadata-enhanced stochastic searching
Deterministic searching • An author, title, subject, number search in an online library catalog • Only searches metadata; results sorted alphanumerically • Can use cross-references
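A deterministic, left-anchored browse can be sketched in a few lines. This is only an illustration, not how any particular catalog is implemented: the headings, the "see" references, and the function names `build_browse_index` and `browse` are all assumptions made up for the example.

```python
import bisect

# Illustrative data only; a real catalog would load these from authority records.
AUTHORIZED = ["Cooking", "Cooking, French", "Cooks", "Dance", "Donkeys"]
SEE_REFERENCES = {"Cookery": "Cooking", "Asses": "Donkeys"}


def normalize(heading):
    """Fold case so the browse order ignores capitalization."""
    return heading.casefold()


def build_browse_index(authorized, see_references):
    """Return one alphabetical list of (sort_key, display_line) entries."""
    entries = [(normalize(h), h) for h in authorized]
    entries += [(normalize(variant), f"{variant} -- see {target}")
                for variant, target in see_references.items()]
    return sorted(entries)


def browse(index, query, size=3):
    """Jump to where the query would file alphabetically, then list forward."""
    keys = [key for key, _ in index]
    start = bisect.bisect_left(keys, normalize(query))
    return [display for _, display in index[start:start + size]]


index = build_browse_index(AUTHORIZED, SEE_REFERENCES)
print(browse(index, "cook"))
# ['Cookery -- see Cooking', 'Cooking', 'Cooking, French']
```

Because the display is positioned by filing order rather than by an exact match, a query that matches no heading still lands somewhere in the alphabetical list, next to the entries that file nearest to it.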
Full-text searching • Matches words in a search with words in documents • Advantages: free, good for rare terms, good for casual information seeking • Also called stochastic or probabilistic searching
Metatext searching • Is a full-text search, but only of metadata • A keyword search in a library catalog is an example • Advantages: good for rare words; good for novice searchers • Disadvantages: may miss abbreviated terms; uses full-text techniques, but searches only the metadata, not the full text of the resource itself
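A minimal sketch of a metatext (keyword) search, assuming a tiny in-memory set of catalog records. The records, field names, and function names here are hypothetical, chosen only to show the abbreviation problem named above.

```python
import re
from collections import defaultdict

# Hypothetical catalog records; only these metadata fields are searched.
RECORDS = {
    1: {"title": "Dept. of Agriculture yearbook",
        "subject": "Agriculture -- United States"},
    2: {"title": "Department store management",
        "subject": "Retail trade"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(records):
    """Map each word found in any metadata field to the IDs of its records."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for value in fields.values():
            for word in tokenize(value):
                index[word].add(record_id)
    return index


def keyword_search(index, query):
    """Return IDs of records whose metadata contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_keyword_index(RECORDS)
print(keyword_search(index, "agriculture"))             # {1}
print(keyword_search(index, "department agriculture"))  # set()
```

The second query finds nothing because record 1 spells the word as "Dept." in its metadata, so the keyword "department" never matches it.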
Metadata-enhanced stochastic searching • Is a full-text search but also uses metadata to limit results • Google advanced search is an example • Google staff mode – how do they encode metadata? What's their metadata scheme?
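The idea of a full-text match that is then limited by metadata can be sketched as follows. This says nothing about how Google or any other search engine encodes its metadata; the documents, the `language` and `year` fields, and the function names are assumptions for illustration.

```python
import re

# Hypothetical documents: free text plus a few metadata fields.
DOCUMENTS = [
    {"id": "a", "text": "Mercury levels in freshwater fish",
     "language": "en", "year": 2006},
    {"id": "b", "text": "Mercury, the planet nearest the sun",
     "language": "en", "year": 1998},
    {"id": "c", "text": "Le mercure dans les poissons",
     "language": "fr", "year": 2006},
]


def words(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"\w+", text.lower()))


def full_text_match(doc, query):
    """True when every query word occurs somewhere in the document text."""
    return words(query) <= words(doc["text"])


def limited_search(documents, query, **limits):
    """Full-text match first, then keep only hits whose metadata equals the limits."""
    hits = []
    for doc in documents:
        if not full_text_match(doc, query):
            continue
        if all(doc.get(field) == value for field, value in limits.items()):
            hits.append(doc["id"])
    return hits


print(limited_search(DOCUMENTS, "mercury"))             # ['a', 'b']
print(limited_search(DOCUMENTS, "mercury", year=2006))  # ['a']
```

The metadata limit narrows the result set, but it cannot recover document "c", which never matched the query word in the first place.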
The weaknesses of full-text searching • The synonym problem • The homonym problem • Inability to search by facets • Spamming • The "aboutness" problem • Figurative language • Word lists • Abstract topics
The weaknesses of full-text searching (continued) • The incognito problem • Difficult-to-search paired topics • Search engine variability • The opaque web
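Two of the weaknesses listed above, the synonym problem and the homonym problem, can be shown with a toy comparison between word matching and a search on assigned subject headings. The four documents and their headings are illustrative assumptions, not records from any real catalog.

```python
# Hypothetical documents, each with free text and assigned subject headings.
DOCUMENTS = {
    "d1": {"text": "Repairing your car at home", "subjects": {"Automobiles"}},
    "d2": {"text": "Automobile maintenance basics", "subjects": {"Automobiles"}},
    "d3": {"text": "Mercury in thermometers", "subjects": {"Mercury"}},
    "d4": {"text": "Mercury and its orbit", "subjects": {"Mercury (Planet)"}},
}


def full_text_search(documents, word):
    """Return IDs of documents whose text contains the word exactly."""
    word = word.lower()
    return sorted(doc_id for doc_id, doc in documents.items()
                  if word in doc["text"].lower().split())


def subject_search(documents, heading):
    """Return IDs of documents that were assigned the given subject heading."""
    return sorted(doc_id for doc_id, doc in documents.items()
                  if heading in doc["subjects"])


# Synonym problem: the word search misses the document that says "car".
print(full_text_search(DOCUMENTS, "automobile"))      # ['d2']
print(subject_search(DOCUMENTS, "Automobiles"))       # ['d1', 'd2']

# Homonym problem: the word search mixes the element with the planet.
print(full_text_search(DOCUMENTS, "mercury"))         # ['d3', 'd4']
print(subject_search(DOCUMENTS, "Mercury (Planet)"))  # ['d4']
```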
Miscellaneous • What computers still cannot do • Gresham's Law • Still need metadata surrogates • The debate about the future of cataloging • My strategy • "Next-generation" library catalogs
WorldCat.org Example of a next-generation, FRBRized search engine • Facets • Metatext search • Hope for catalogers • Can also be sorted by author, title, date
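Facets and alternative sort orders both depend on structured metadata in the result set. The sketch below is not WorldCat's implementation; the result records, field names, and helper functions are hypothetical and only show the general idea.

```python
from collections import Counter

# Hypothetical result set returned by some earlier search.
RESULTS = [
    {"author": "Smith, Ann", "title": "Metadata basics",
     "date": 2005, "format": "Book"},
    {"author": "Jones, Bo", "title": "Cataloging today",
     "date": 2007, "format": "Book"},
    {"author": "Adams, Cy", "title": "Indexing the web",
     "date": 2001, "format": "Article"},
]


def facet_counts(results, field):
    """Count how many results carry each value of one metadata field."""
    return Counter(record[field] for record in results)


def sort_results(results, field):
    """Return the results ordered by one metadata field."""
    return sorted(results, key=lambda record: record[field])


print(dict(facet_counts(RESULTS, "format")))
# {'Book': 2, 'Article': 1}
print([record["author"] for record in sort_results(RESULTS, "author")])
# ['Adams, Cy', 'Jones, Bo', 'Smith, Ann']
```

Neither operation is possible on bare text: each one needs a named field whose values were recorded consistently when the metadata was created.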
email@example.com Discussion …