Knowledge Organization in Digital Libraries (II) • Digital Libraries, INFO 653, Week 6 • Xia Lin • College of Information Science and Technology, Drexel University
Approaches: • Keyword Indexing • Making search engines functional • Metadata (bottom-up) • Extending traditional subject indexing • Classification (Top-down) • Using a structured classification frame to provide hierarchical browsing and access. • Ontology Approach
Keyword Indexing • Highly automated process. • Use every meaningful word to index documents. • Make search engines functional. • Make large amounts of information accessible.
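A minimal sketch of keyword indexing, assuming that "meaningful word" means any word not on a stop list. The stop list, the sample documents, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep only words that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(docs):
    """Map every meaningful word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every meaningful query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Illustrative documents, not taken from any real collection.
docs = {
    "d1": "Metadata in digital libraries",
    "d2": "Classification of digital collections",
    "d3": "The thesaurus and the library catalog",
}
index = build_index(docs)
```

Calling `search(index, "digital libraries")` intersects the posting sets for "digital" and "libraries", which is the simplest form of Boolean AND retrieval.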
Metadata Approach • Digital Object Identifiers • Dublin Core • Subject tag • Description tag • RDF Data model • Resource
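A small sketch of a Dublin Core description with subject and description elements, built with Python's standard `xml.etree.ElementTree`. The `record` wrapper element and the field values are assumptions for illustration; the namespace URI is the one used for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a record whose children are Dublin Core elements.

    `fields` is a list of (element name, value) pairs, e.g. ("subject", "...").
    The <record> wrapper is a hypothetical container, not part of Dublin Core.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = dublin_core_record([
    ("title", "Knowledge Organization in Digital Libraries (II)"),
    ("subject", "Digital libraries"),
    ("description", "Week 6 lecture slides"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```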
Classification Approach • Use Current Classification Scheme • LC Classification • Dewey Classification • Most projects are not completed • A mile wide and an inch deep • Use ad-hoc classification schemes • Yahoo style hierarchical list • Use automatic classification
Ontology Approach • Ontologies • Define not only concepts but also relationships of concepts. • Define both links and types of links.
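One way to picture "links and types of links" is as a list of typed links between concepts. The sketch below assumes that representation; the concept names are illustrative, and the two link types are borrowed from the labels in the Work Force ontology figure.

```python
from collections import namedtuple

# A typed link: source concept, type of link, target concept.
Link = namedtuple("Link", ["source", "link_type", "target"])

# Illustrative links; the concepts are examples, not a published ontology.
links = [
    Link("Dewey Classification", "example-of", "Classification Scheme"),
    Link("LC Classification", "example-of", "Classification Scheme"),
    Link("Classification Scheme", "is-part-of", "Knowledge Organization"),
]


def targets(links, source, link_type):
    """Concepts reached from `source` by following links of one type."""
    return [l.target for l in links if l.source == source and l.link_type == link_type]


def sources(links, target, link_type):
    """Concepts that point to `target` with links of one type."""
    return [l.source for l in links if l.target == target and l.link_type == link_type]


def link_types(links):
    """The set of link types the ontology defines."""
    return {l.link_type for l in links}
```

Because the link type is stored explicitly, a program can ask different questions of the same pair of concepts, which a plain hyperlink cannot support.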
Ontology • An ontology is a specification of a conceptualization. • An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. • An ontology is a commitment to use the shared vocabulary in a coherent and consistent manner.
Work Force Digital Library Ontology • [Figure: ontology diagram. Concept nodes: Workforce Programs; Cases that worked; Lessons learned; Concepts (taxonomy and ontology); Policy and regulation; Documents; Projects; Info Resources; Government; Guides, Handbooks; Presentations; Organizations; People (Peter Creticos); Events (conferences, workshops, ...). Link types: example-of, describes, represents, refers-to, sponsors, uses, is-part-of, initiates, is-related-to, write, includes.]
Why Develop an Ontology? • To enable a machine to use the knowledge in some application. • To enable multiple machines to share their knowledge. • To help yourself understand some area of knowledge better. • To help other people understand some area of knowledge. • To help people reach a consensus in their understanding of some area of knowledge.
Ontology and thesaurus • An ontology inherits the ideas, purposes, and functions of the thesaurus. • An ontology extends relationships among concepts beyond those in a thesaurus (NT, BT, RT, synonyms). • An ontology is intended to be consumed by both humans and machines.
Topic Maps • A key component of the Semantic Web • A new ISO standard • ISO 13250 Topic Maps • XML-like syntax • XML Schema • XTM: XML Topic Maps
XTM Topic Maps • XML Topic Maps (XTM) defines an abstract model and an XML grammar for topic maps. • XTM does not define topic maps at the implementation level. • Each implementation may interpret XTM differently or define its own “metadata” within the framework of XTM.
TAO of Topic Maps • <topicmap> • TOPIC • topname • basename • dispname • sortname • OCCURS • ASSOC • assocrl • facet • fvalue • addthms • </topicmap>
Topic Maps for Knowledge Representation • Establishing an associative network between resources which represent concepts • Organizing legacy resources into a new information/knowledge space, by relating them to topics, and associating those topics, in a structured way • Enabling disparate sets of information resources to be used together, by interrelating them using a unifying conceptual framework
Topic Map Implementation • Why is topic map implementation hard? • There are no “magic” solutions for content representation. • It is labor-intensive and involves many manual activities to create a complete TAO. • There are no good tools for topic map creation. • XML is not designed to let end-users work directly on objects contained in an XML file.
Topic Maps and Thesaurus • Different Directions of indexing • Thesaurus: assign descriptors to documents • Topic maps: associate occurrences to terms • Different structures • Thesaurus: mainly a hierarchy plus some cross-references • Topic Maps: more link types
All Together • [Figure labels: Libraries; Keyword indexing; Classification; Thesaurus; Metadata; Knowledge Organizing; Ontology; XML; RDF; Topic Maps; Semantic Web]
Personal Research Projects • Explore solutions to make knowledge organizing practical • Knowledge Class • KEPT • Knowledge Middleware
Knowledge Class • Purposes • to customize knowledge organization and access, • to supplement and complement existing devices for Web users, and • to explore the possibility of combining existing methods of knowledge organization with advanced Web technology.
Knowledge Class • Design Principles • balance of browsing and searching • balance of manual indexing and automatic indexing • balance of personal (topical) information space and the whole web space
Knowledge Class • Three components • an organizing framework • a dynamic web interface • Search strategies for each term
Knowledge Class Features • A hierarchical structure of subject terms constructed on classification principles • Multiple levels of knowledge organization: expandable and contractible branches of the hierarchy allow varying levels of depth • Static links to remote resources and related sites or pages • Dynamic links to target information through search engines such as Google, AltaVista, InfoSeek, Yahoo!, and Lycos • Coded search strategies for terms • Use of scope terms for classes and for branches
Knowledge Class Features • Referral links among terms within a knowledge class, and potentially among knowledge classes, to assist cross-referencing. • Instant switching among search engines available over the Web, to allow access to the variety of resources covered by different search engines.
A Knowledge Class for Digital Libraries • Developed by students two years ago
Yahoo Categories: • References – Libraries – Digital Libraries: • Cataloging Electronic Resources@ • Conferences (5) • Electronic Literature@ • Electronic Theses and Dissertations (ETDs) (14) • Metadata@ • Organizations (2) • Projects and Collections (33)
IFLA page: • Resources and Projects • Cataloguing & Indexing of Electronic Resources • Electronic Text & Journal Archives • Metadata Resources
Digital Libraries: a Selected Resource Guide • Overview and general resources • Project planning & management • Architecture • Technology • Standards and guidelines • Archiving & Preservation • Metadata • Intellectual property rights.
Northern Light folders • Digital Libraries • Special collections • Conferences • dlib.org • dlib.org.ar • uh.edu • rutgers.edu • stanford.edu • stfx.ca • vt.edu • uni-trier.de • ucla.edu • Class notes & Assignments • all others...
Digital libraries by William Y. Arms: Table of Contents 1 Libraries, Technology, and People 2 The Internet and the World Wide Web 3 Libraries and Publishers 4 Innovation and Research 5 People, Organizations, and Change 6 Economic and Legal Issues 7 Access Management and Security 8 User Interfaces and Usability 9 Text 10 Information Retrieval and Descriptive Metadata 11 Distributed Information Discovery 12 Object Models, Identifiers, and Structural Metadata 13 Repositories and Archives 14 Digital Libraries and Electronic Publishing Today
Practical Digital Libraries: Books, Bytes, and Bucks by Michael Lesk: Table of Contents 1 Evolution of Libraries 2 Text Access Methods 3 Images of Pages 4 Multimedia Storage and Access 5 Knowledge Representation Methods 6 Distribution 7 Usability and Retrieval Evaluation 8 Collections and Preservation 9 Economics 10 Intellectual Property Rights 11 International Activities 12 Future: Ubiquity, Diversity, Creativity, and Public Policy
How Do I Build a Thesaurus? • Use existing dictionaries and thesauri to decide on the terms and their relationships. • Collect a set of representative documents and try to index them; take the set of indexing terms as your preliminary list. • Review and organize the preliminary term set: • decide on preferred terms and make USE references from the variants and synonyms; • build hierarchical and associative relationships among the preferred terms. • Produce a draft list, test, and revise.
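The review step (preferred terms, USE references, hierarchical and associative relationships) can be sketched as a small data structure. This is one possible representation, not a standard thesaurus format; the class and method names are hypothetical, and the variant "Reuse of materials" is invented for the example.

```python
class Thesaurus:
    """Preferred terms with USE references, broader terms (BT), and related terms (RT)."""

    def __init__(self):
        self.use = {}      # variant or synonym -> preferred term
        self.broader = {}  # preferred term -> set of broader terms
        self.related = {}  # preferred term -> set of related terms

    def add_use(self, variant, preferred):
        """Record a USE reference from a variant to its preferred term."""
        self.use[variant] = preferred

    def add_broader(self, term, broader_term):
        """Record a hierarchical relationship: `broader_term` is a BT of `term`."""
        self.broader.setdefault(term, set()).add(broader_term)

    def add_related(self, term_a, term_b):
        """Record an associative relationship; RT links are symmetric."""
        self.related.setdefault(term_a, set()).add(term_b)
        self.related.setdefault(term_b, set()).add(term_a)

    def preferred(self, term):
        """Follow a USE reference if one exists; otherwise the term is already preferred."""
        return self.use.get(term, term)

    def narrower(self, term):
        """Narrower terms (NT) are derived by inverting the BT relationships."""
        return sorted(t for t, bts in self.broader.items() if term in bts)


thesaurus = Thesaurus()
thesaurus.add_use("Reuse of materials", "Recycling")  # hypothetical variant
thesaurus.add_broader("Recycling", "Waste Disposal")
thesaurus.add_broader("Waste Disposal", "Sanitation")
thesaurus.add_related("Recycling", "Pollution")
```

Storing only BT and deriving NT keeps the two directions of the hierarchy from drifting out of sync during the test-and-revise step.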
Scope terms • Each knowledge class can have one scope term to limit the search scope: • Technology -- will be searched by technologies AND “digital libraries” in the kclass of Digital Libraries. • Each branch of a knowledge class can have one scope term: • Issues – in the Technology branch will be searched by “Issues and Technology and digital libraries”
Data Format (first year) • Sample record: --,mutual funds,mutual-funds Investment-trusts Unit-trusts,http://www.brill.com,1 • Fields: • 1. Hierarchical level • 2. Display term • 3. Search term (synonyms) • 4. URL • 5. Search strategy code
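A sketch of how one first-year record could be parsed into its five fields. Two details are assumptions, since the slide does not spell them out: that the hierarchical level is the number of hyphens in the first field, and that synonyms in the search-term field are separated by spaces. The `Entry` class and `parse_record` function are hypothetical names.

```python
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    level: int               # 1. hierarchical level
    display_term: str        # 2. display term
    search_terms: List[str]  # 3. search term and synonyms
    url: str                 # 4. URL
    strategy: int            # 5. search strategy code


def parse_record(line):
    """Parse one comma-separated record into an Entry."""
    level_mark, display, search, url, code = next(csv.reader([line]))
    return Entry(
        level=len(level_mark),        # assumption: "--" means level 2
        display_term=display,
        search_terms=search.split(),  # assumption: space-separated synonyms
        url=url,
        strategy=int(code),
    )


sample = "--,mutual funds,mutual-funds Investment-trusts Unit-trusts,http://www.brill.com,1"
entry = parse_record(sample)
```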
Second year -- last year’s student project
<topicmap title="Digital Libraries">
  <topic id="General Resources" type="Main category">
    <topic id="Bibliography">
      <topname>
        <basename>Bibliography</basename>
        <dispname>Bibliography</dispname>
        <sortname></sortname>
      </topname>
      <occurs> </occurs>
      <topic id="IFLA bibliography" type="reference">
        <topname>
          <basename>IFLA bibliography</basename>
          <dispname>IFLA bibliography</dispname>
          <sortname></sortname>
        </topname>
        <occurs> type="website" href="http://www.ifla.org/II/diglib.htm" </occurs>
      </topic>
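A sketch of how a nested topic structure like the student project's could be generated as well-formed XML with Python's standard `xml.etree.ElementTree`. It reuses the element names from the slide (`topicmap`, `topic`, `topname`, `basename`, `dispname`, `occurs`). Placing `type` and `href` as attributes on `occurs` is an assumption made so that the output parses; the `add_topic` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_topic(parent, topic_id, topic_type=None, href=None, occurs_type="website"):
    """Append a <topic> with its names, and an occurrence if a URL is given."""
    attrs = {"id": topic_id}
    if topic_type:
        attrs["type"] = topic_type
    topic = ET.SubElement(parent, "topic", attrs)
    topname = ET.SubElement(topic, "topname")
    ET.SubElement(topname, "basename").text = topic_id
    ET.SubElement(topname, "dispname").text = topic_id
    if href:
        # Assumption: the occurrence type and URL are stored as attributes.
        ET.SubElement(topic, "occurs", {"type": occurs_type, "href": href})
    return topic


topicmap = ET.Element("topicmap", {"title": "Digital Libraries"})
general = add_topic(topicmap, "General Resources", "Main category")
bibliography = add_topic(general, "Bibliography")
add_topic(bibliography, "IFLA bibliography", "reference",
          href="http://www.ifla.org/II/diglib.htm")

xml_text = ET.tostring(topicmap, encoding="unicode")
reparsed = ET.fromstring(xml_text)  # round trip confirms the output is well formed
```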
Search Strategy • Keyword search: • 0 search term + branch scope term + class scope term • 1 search term + class scope term • 2 search term only • Phrase search: • 3 search term (as a phrase) + branch scope term + class scope term • 4 search term (as a phrase) + class scope term • 5 search term (as a phrase) • Hierarchical search: • 6 search term + all its children + branch scope term + class scope term • 7 search term + all its children + class scope term • 8 search term + all its children • No search: • 9 no search; no link for this display term (label only) • Search terms + display term: • 10 same as 0, except that the display term is also added to the query • 11 same as 1, except that the display term is also added to the query • 12 … …
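The strategy codes can be read as a small lookup: codes 0 to 8 combine a search mode (keyword, phrase, hierarchical) with a scope (branch and class, class only, none). The sketch below follows that reading. How terms are joined is an assumption: scope terms are combined with AND and children with OR, and multi-word scope terms are passed in already quoted. Codes beyond 11 are elided on the slide, so the sketch rejects them.

```python
def build_query(code, search_term, display_term=None, children=(),
                branch_scope=None, class_scope=None):
    """Build a query string for one display term, or None when code 9 means no search."""
    if code == 9:
        return None  # label only, no link
    if code in (10, 11):
        # Same as 0 or 1, with the display term added to the query.
        base = build_query(code - 10, search_term, None, children,
                           branch_scope, class_scope)
        return f"{base} AND {display_term}" if display_term else base
    if code not in range(9):
        raise ValueError(f"strategy code {code} is not defined in this sketch")

    mode, scope = divmod(code, 3)  # mode: 0 keyword, 1 phrase, 2 hierarchical
    if mode == 0:
        core = search_term
    elif mode == 1:
        core = f'"{search_term}"'
    else:
        # Assumption: a term and its children are alternatives, joined with OR.
        core = "(" + " OR ".join([search_term, *children]) + ")"

    parts = [core]
    if scope == 0 and branch_scope:
        parts.append(branch_scope)
    if scope <= 1 and class_scope:
        parts.append(class_scope)
    return " AND ".join(parts)


query = build_query(0, "Issues", branch_scope="Technology",
                    class_scope='"digital libraries"')
```

With code 0, the term "Issues" in the Technology branch of the Digital Libraries class yields a query scoped by both the branch and the class, in line with the scope-term example for that branch.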
Digital Libraries • General Resources • Technology • Projects • Indexing & Cataloging • Knowledge representation • Metadata Resources • Collections and Repositories • Digital Preservation • Economic and legal issues • Intellectual Property Rights • People and organizations
Next Version • Convert to XML • Use topic map standards • Improve the editing tool
Next Integration: KEPT • [Figure: architecture of the Knowledge-Enabled Personalization Tool (KEPT). Labels: RDF-ISO Standards; OAI protocol; Knowledge Repository; Topic Map Editor; Information Resources; Drag and drop; Relational Database; Thesauri; Ontologies; Topic maps; …….; Hierarchical Generator; Co-occurrence Mapping; Web Browser; XML Schema; XML; XSLT; Searching/Browsing Interface; Search engines; XML Application Server; HTTP Server]
New Interface • [Screenshot of the search interface. Search: Recycling. Primary Source: ERIC Thesaurus (choices shown: TopicMap, ERIC Thesaurus, ERIC Database). Secondary Source: MeSH.] • Related Terms: Conservation (Environment); Depleted Resources; Ecology; Natural Resources; Pollution; Recycling; Solid Wastes; Waste Disposal; Waste Water; Wastes; Water Treatment • Broader Terms: Sanitation; Waste Disposal • Co-occurrence Terms (for Recycling): Environmental Education; Waste Disposal; Conservation (Environment); Science Education; Natural Resources; Solid Wastes; Ecology; Pollution; Learning Activities; Higher Education; Wastes; Instructional Materials; Conservation Education; Energy; Environment • MeSH Terms matched “Pollution”: Air Pollution; Air Pollution, Indoor; Indoor Air Pollution; Air Pollution, Radioactive; Environmental Pollution; Pollution, Environmental; Tobacco Smoke Pollution; Air Pollution, Tobacco Smoke; Environmental Pollution, Tobacco Smoke; Environmental Smoke Pollution, Tobacco; Environmental Tobacco Smoke Pollution; Water Pollution; Thermal Water Pollution; Water Pollution, Thermal; Water Pollution, Chemical; Chemical Water Pollution; Water Pollution, Radioactive • Other terms shown in the display: Recycling Ecology Wastes Waste Water Waste disposal Pollution Air pollution Water pollution Indoor pollution Energy Natural Resources Water Power Conservation Education Attitudes Motivations ……
The Knowledge Middleware • A centralized repository that integrates diverse knowledge structures • A set of mapping tools and protocols for crosswalks among various thesauri • A dynamic knowledge base for semantic neighborhoods that uses term occurrences and co-occurrences • A web-based authoring and editing tool for building personalized topic maps from existing knowledge structures in the repository • A visual search interface for content-based searching with the help of knowledge structures in the repository
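A minimal sketch of how a semantic neighborhood might be derived from term co-occurrences, assuming that two terms co-occur when they are assigned to the same record. The records below are invented for illustration, and the function names are hypothetical; the slides do not describe the actual method used in the middleware.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how many records each unordered pair of terms shares."""
    counts = Counter()
    for terms in records:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def neighbors(counts, term, top_n=5):
    """Terms that co-occur most often with `term`, with their counts."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return scores.most_common(top_n)


# Invented indexing data: each record is the list of terms assigned to one document.
records = [
    ["Recycling", "Waste Disposal", "Environmental Education"],
    ["Recycling", "Waste Disposal", "Pollution"],
    ["Recycling", "Environmental Education", "Waste Disposal"],
]
counts = cooccurrence_counts(records)
neighborhood = neighbors(counts, "Recycling")
```

Raw counts favor frequent terms; a fuller version would probably normalize by each term's overall occurrence count, which is why the slide mentions both occurrences and co-occurrences.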
Conclusions • Knowledge organizing is one of the major challenges of digital libraries. • There is increasing demand for formalized (marked-up) knowledge. • There is a growing number of tools and specifications for subject access (or knowledge access) to the Web and to digital libraries.