Formatting and structuring knowledge (Chapter 2 of the textbook Organizing Knowledge: An Introduction to Managing Access to Information)
The tools for formatting and structuring knowledge • Databases • Bibliographic relationships • Text analysis • Text markup • Metadata
Databases • A database is a collection of information that is organized so that it can easily be accessed, managed and updated. • Some databases hold publicly accessible information, such as abstracting and indexing databases, full texts of reports, and directories; others are shared within an organization or a group of organizations.
Databases (cont.) • There are two main types of databases: • Reference databases: these refer or point the user to another source, such as a document, an organization or an individual, for additional information or for the full text of a document. • Source databases: these contain the original source data, and are one type of electronic document.
Reference databases • These include: • Bibliographic databases: contain citations or bibliographic references and sometimes abstracts of the literature; they tell the user what has been written and where it was published. • Catalogue databases: show the holdings of a given library or library network, but give little information on the contents of those documents. • Referral databases: contain references to information or data, such as the names and addresses of organizations and other directory-type data. • Information content decreases down this list (bibliographic, catalogue, referral).
Source databases • Data are available in machine-readable form instead of printed form. • Source databases can be grouped according to their content: • Numeric databases (e.g., statistics, survey data) • Full-text databases (e.g., journal articles, newsletters) • Text-numeric databases (a mix of textual and numeric data) • Multimedia databases (e.g., sound, video, pictures)
The inverted file • Useful for searching complex text-based databases, where the searcher does not know the form in which the search key may have been entered in the database and essentially has to guess the most appropriate form. • The inverted file is similar to an index. • In the inverted file approach there may be two or three separate files: • Two-file approach (text file, index file) • Three-file approach (text file, intermediate file, index file)
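A minimal sketch of the two-file approach, assuming the "text file" is simply a mapping from document IDs to text and the "index file" maps each term to the IDs of the documents that contain it. The documents and function names are hypothetical. Right-truncation is included because it addresses the problem the slide describes: the searcher must guess the form of the search key (for example "library" versus "libraries").

```python
import re
from collections import defaultdict

# Hypothetical "text file": document IDs mapped to their full text.
documents = {
    1: "Organizing knowledge in libraries",
    2: "Knowledge management systems",
    3: "Library catalogues and indexing",
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_file(docs):
    """Build the 'index file': each term maps to the sorted IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    """Return IDs of documents containing every search term (Boolean AND)."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

def truncated_search(index, prefix):
    """Right-truncation: match every indexed term that starts with the prefix."""
    ids = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix.lower()):
            ids.update(doc_ids)
    return sorted(ids)

index = build_inverted_file(documents)
print(search(index, "library"))          # [3]  (misses "libraries")
print(truncated_search(index, "librar"))  # [1, 3]
```

The search never scans the documents themselves; it only looks terms up in the index, which is why the inverted file behaves like a back-of-book index.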
The relational model • Relational databases have been widely adopted in database systems. • In a relational system, information is held in a set of relations, or tables. • The rows in a table correspond to records and the columns correspond to fields.
a) Catalogued-book relation occurrences
Year   Title              ISBN            Author
2007   Alchemy            0-82112-462-3   O. Ahmad
2007   Expert systems     0-84131-460-7   R. Ali
2003   Computer science   0-69213-517-8   S. Saleh
2004   Bibliography       0-93112-345-9   M. Omar
b) Order-book relation occurrences
ISBN            Quantity ordered   Order no.
0-82112-462-3   1                  644
0-84131-460-7   4                  644
0-69213-517-8   3                  645
0-93112-345-9   2                  646
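The two relations in the example can be loaded into Python's built-in `sqlite3` module to show how a shared field links tables. This is a sketch: the table and column names are our own choices, and SQLite stands in for whatever relational DBMS a library would actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each relation is a table: rows are records, columns are fields.
cur.execute("CREATE TABLE catalogued_book (isbn TEXT PRIMARY KEY, title TEXT, author TEXT, year INTEGER)")
cur.execute("CREATE TABLE order_book (order_no INTEGER, isbn TEXT REFERENCES catalogued_book(isbn), quantity INTEGER)")

cur.executemany(
    "INSERT INTO catalogued_book VALUES (?, ?, ?, ?)",
    [
        ("0-82112-462-3", "Alchemy", "O. Ahmad", 2007),
        ("0-84131-460-7", "Expert systems", "R. Ali", 2007),
        ("0-69213-517-8", "Computer science", "S. Saleh", 2003),
        ("0-93112-345-9", "Bibliography", "M. Omar", 2004),
    ],
)
cur.executemany(
    "INSERT INTO order_book VALUES (?, ?, ?)",
    [
        (644, "0-82112-462-3", 1),
        (644, "0-84131-460-7", 4),
        (645, "0-69213-517-8", 3),
        (646, "0-93112-345-9", 2),
    ],
)

# The ISBN field appears in both relations, so a join recovers
# the title and quantity of each book on order 644.
rows = cur.execute(
    """
    SELECT o.order_no, b.title, o.quantity
    FROM order_book AS o
    JOIN catalogued_book AS b ON o.isbn = b.isbn
    WHERE o.order_no = 644
    ORDER BY b.title
    """
).fetchall()
print(rows)  # [(644, 'Alchemy', 1), (644, 'Expert systems', 4)]
```

Titles are stored once, in the catalogued-book relation; the order-book relation holds only the ISBN, which avoids duplicating bibliographic data.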
The object-orientated model • The object-oriented approach to programming and database design constructs systems and databases as collections of reusable, interacting objects. • The object-oriented approach is attractive because: • Objects are easy to change and develop without necessarily changing any other part of the system • New objects can easily be created from existing objects • Objects can be copied or transferred into new systems with little difficulty
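A small Python sketch of the second advantage, creating new objects from existing ones. The `Document` and `Report` classes and the report number are hypothetical illustrations, not part of any particular database product. `Report` inherits everything from `Document` and changes only what differs, so `Document` itself is left untouched.

```python
class Document:
    """A reusable object bundling data (attributes) with behaviour (methods)."""

    def __init__(self, title, author):
        self.title = title
        self.author = author

    def citation(self):
        return f"{self.author}. {self.title}."


class Report(Document):
    """A new object type created from an existing one: it inherits
    title, author and citation() and adds only what is different."""

    def __init__(self, title, author, report_no):
        super().__init__(title, author)
        self.report_no = report_no

    def citation(self):
        # Extends the inherited behaviour without changing Document itself.
        return f"{super().citation()} Report no. {self.report_no}."


book = Document("Alchemy", "O. Ahmad")
report = Report("Expert systems", "R. Ali", 12)
print(book.citation())    # O. Ahmad. Alchemy.
print(report.citation())  # R. Ali. Expert systems. Report no. 12.
```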
Complex database structures • Standard database design focuses on a limited range of data types, such as integers and text. • Other data types, such as images, audio and video, present special challenges. • Multimedia DBMSs (MM-DBMSs) are used to manage these different data types. • MM-DBMSs seek to use a range of technologies, such as: • relational technology for tables • text databases for documents • image storage devices for graphics and animation
Text and multimedia • A language has a vocabulary of words, a syntax and semantics. • Syntax: a set of rules for stringing words together to make meaningful statements. • Semantics: the study of meaning in language. • Common structural patterns in texts include: • Problem-solution: in its simplest form, a problem is stated and a solution is proposed. • General-particular: a generalization is made and supported with one or more examples. • These structural patterns are often found in combination.
Documents • A document is a record of knowledge, information or data, or a creative expression. • Characteristics of electronic documents: • Easily manipulatable • Internally and externally linkable, through hyperlinks • Readily transformable • Inherently searchable • Instantly transportable • Infinitely replicable
Bibliographic relationships • Research has identified seven categories of relationship between two or more documents: • Equivalence relationships: hold between exact copies that can be used interchangeably, such as reproductions of the same typeset document, e.g., photocopies, reprints, faxes, e-mail, microfilm and microfiche. • Derivative (horizontal) relationships: hold between different expressions of a work, such as editions, translations, adaptations and arrangements. • Descriptive relationships: include critical and evaluative reviews, criticism and interpretation, annotated editions, commentaries and analyses (these are all new works).
Bibliographic relationships (cont.) • Whole-part relationships: hierarchical relationships between a whole and its component parts. • Accompanying relationships: e.g., supplements, indexes, and individual maps within magazines. • Sequential relationships: e.g., sequels, or successive titles of a serial. • Shared-characteristic relationships: hold between different works that share an attribute, such as title, subject or author.
Bibliographic relationships • The Functional Requirements for Bibliographic Records (FRBR) identified a different set of relationships, built on the definitions of four concepts: • Work: the distinct intellectual or artistic creation. • Expression: the intellectual or artistic realization of a work, through which the work can be read, seen, heard or felt. • Manifestation: the format in which one of the expressions of the work can be found (e.g., HTML or PDF). • Item: a single exemplar of a manifestation.
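The four FRBR concepts nest inside one another, which a hierarchy of Python dataclasses can mirror. This is only a sketch under our own assumptions: the field names, languages and item identifiers are hypothetical, and real FRBR records carry many more attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single exemplar of a manifestation (e.g., one copy or one file)."""
    identifier: str


@dataclass
class Manifestation:
    """A format in which an expression is embodied (e.g., HTML or PDF)."""
    fmt: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of the work, e.g., the original text or a translation."""
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The distinct intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def formats(self):
        """List every (language, format) pair in which the work is available."""
        return [(e.language, m.fmt) for e in self.expressions for m in e.manifestations]


work = Work("Alchemy", [
    Expression("English", [Manifestation("HTML", [Item("copy-1")]),
                           Manifestation("PDF", [Item("copy-2"), Item("copy-3")])]),
    Expression("Arabic", [Manifestation("PDF", [Item("copy-4")])]),
])
print(work.formats())  # [('English', 'HTML'), ('English', 'PDF'), ('Arabic', 'PDF')]
```

Following the links downward answers questions such as "which formats exist for this work?", while each lower level still belongs to exactly one parent.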
Text analysis • Text analysis can automate processes such as: • Extracting keywords • Preparing document representations, e.g., by processing the text to generate an abstract • Determining various characteristics of a text, e.g., its level of reading difficulty
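One widely used measure of reading difficulty is the Flesch Reading Ease score, 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), where higher scores mean easier text. The slides do not name this formula; we use it here as an example. The syllable counter below is a crude vowel-group heuristic, so treat the scores as approximate.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


easy = "The cat sat on the mat. It was fat."
hard = ("Organizational information management necessitates comprehensive "
        "classification methodologies.")
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))  # True
```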
Approaches to text analysis • Statistical analysis: based on counting the frequency of particular words in the text. • Structural (knowledge-based) analysis: scans the text for words, phrases or sentences that occupy significant positions within the text.
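A minimal Python sketch contrasting the two approaches for keyword extraction. The statistical version ranks words by frequency. The structural version is deliberately simplified: it treats the title as the only "significant position" and adds a fixed bonus to title words. The stop list, the weight of 3 and the sample text are all assumptions.

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "with"}


def words(text):
    """Lower-cased words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def statistical_keywords(text, n=3):
    """Statistical analysis: rank words by how often they occur."""
    return [w for w, _ in Counter(words(text)).most_common(n)]


def structural_keywords(title, body, n=3, title_weight=3):
    """Simplified structural analysis: words in a significant position
    (here, the title) get extra weight on top of their body frequency."""
    scores = Counter(words(body))
    for w in words(title):
        scores[w] += title_weight
    return [w for w, _ in scores.most_common(n)]


title = "Searching with an index"
body = ("Databases store records. Databases are searched through an index. "
        "The index lists terms and records.")
print(statistical_keywords(body))        # ['databases', 'records', 'index']
print(structural_keywords(title, body))  # ['index', 'searching', 'databases']
```

Ties are broken by first occurrence, which `Counter.most_common` guarantees. Position weighting moves "index" to the top even though its frequency alone does not distinguish it.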
Text markup and encoding • Electronic text at its most basic uses the ASCII character set. • Applying markup to plain (ASCII) text enables electronic documents to be stored and reused efficiently. • Markup is of two kinds: • Procedural markup: defines how the document will finally be presented, usually for a particular application. • Descriptive markup: defines the headings, contents list, paragraphs and other elements that make up the structure of the document.
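To see why descriptive markup supports reuse, here is a short Python sketch. Its tags say only what each piece of text *is* (a heading, a paragraph), so the same source can be rendered in different ways. Procedural markup would instead embed display instructions such as font size directly in the text. The tag names `report`, `heading` and `para` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Descriptive markup: tags label the role of each piece of text.
SOURCE = """<report>
  <heading>Inverted files</heading>
  <para>An inverted file is similar to an index.</para>
</report>"""


def to_html(root):
    """One presentation: map structural elements to HTML display tags."""
    parts = []
    for el in root:
        tag = "h1" if el.tag == "heading" else "p"
        parts.append(f"<{tag}>{escape(el.text)}</{tag}>")
    return "\n".join(parts)


def to_plain_text(root):
    """Another presentation of the same source: upper-case headings."""
    return "\n".join(el.text.upper() if el.tag == "heading" else el.text for el in root)


root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_plain_text(root))
```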
SGML • SGML (Standard Generalized Markup Language): used for embedding descriptive markup within a document, and thus for describing the structure of a document. • SGML formally describes the role of each piece of text, using labels enclosed within <angle brackets>. • It is a descriptive, not a procedural, markup language.
HTML • HTML (HyperText Markup Language): a subset of SGML (formally, an SGML document type definition) that has been specially developed for creating World Wide Web documents. • HTML is used to define the display of web documents, including features such as font size and type, background and text colors, the use of bold and italic, and page layout.
XML • XML (eXtensible Markup Language): a version of SGML that can be used on the web. Compared with HTML, XML is extensible in the sense that new markup tags can be created to facilitate searching and exchange of information. • An XML implementation typically consists of three parts: the XML document, a document type definition (DTD) and a style sheet (XSL).
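In this Python sketch, the element names `catalogue`, `book`, `author` and `year` are tags we invented for illustration, which is exactly the kind of extension HTML's fixed tag set does not allow. Because the tags describe content, a search can target a specific field. The sketch uses the standard-library `ElementTree` module and its limited XPath support.

```python
import xml.etree.ElementTree as ET

# Hypothetical, purpose-made tags describing catalogue content.
catalogue = ET.fromstring("""
<catalogue>
  <book isbn="0-82112-462-3"><title>Alchemy</title><author>O. Ahmad</author><year>2007</year></book>
  <book isbn="0-84131-460-7"><title>Expert systems</title><author>R. Ali</author><year>2007</year></book>
  <book isbn="0-69213-517-8"><title>Computer science</title><author>S. Saleh</author><year>2003</year></book>
</catalogue>
""".strip())

# Field-specific search: titles of books whose <year> is 2007.
titles_2007 = [b.findtext("title") for b in catalogue.findall("book")
               if b.findtext("year") == "2007"]
print(titles_2007)  # ['Alchemy', 'Expert systems']

# The same idea with an XPath predicate on a child element's text.
isbn = catalogue.find("book[author='R. Ali']").get("isbn")
print(isbn)  # 0-84131-460-7
```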
DTD • DTDs (document type definitions) are SGML or XML applications that define the structure of a particular type of document, using markup. • An XML schema is a richer form of DTD that defines not only the structure but also the content and semantics (meaning) of documents.
Both DTDs and XML schemas define: • Elements that might be part of a particular document type • Element names and whether they are repeatable • The content of elements • What kinds of markup can be omitted • Tag attributes and their default values • Names of permissible entities
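The toy checker below covers three items from the list: which child elements may appear, whether they are repeatable, and attribute defaults. The rule table is a hand-written, hypothetical stand-in for what a DTD declares, and the roughly equivalent DTD declarations appear in the comments. This is not a real DTD processor. Actual validation needs a validating parser, such as the `DTD` class in the third-party lxml library.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table. Roughly equivalent DTD declarations:
#   <!ELEMENT book (title, author+)>
#   <!ATTLIST book lang CDATA "en">
RULES = {
    "book": {
        "children": [("title", False), ("author", True)],  # (name, repeatable?)
        "attr_defaults": {"lang": "en"},
    }
}


def check(element, rules):
    """Check the element's children against a simple ordered sequence model,
    then fill in default attribute values. Assumes element.tag has a rule.
    Returns a list of error messages (empty if the element conforms)."""
    rule = rules[element.tag]
    errors = []
    children = [c.tag for c in element]
    pos = 0
    for name, repeatable in rule["children"]:
        count = 0
        while pos < len(children) and children[pos] == name:
            pos += 1
            count += 1
        if count == 0:
            errors.append(f"missing <{name}>")
        elif count > 1 and not repeatable:
            errors.append(f"<{name}> is not repeatable")
    if pos < len(children):
        errors.append(f"unexpected <{children[pos]}>")
    for attr, default in rule["attr_defaults"].items():
        element.attrib.setdefault(attr, default)
    return errors


good = ET.fromstring("<book><title>Alchemy</title><author>A</author><author>B</author></book>")
bad = ET.fromstring("<book><title>X</title><title>Y</title></book>")
print(check(good, RULES), good.get("lang"))  # [] en
print(check(bad, RULES))  # ['<title> is not repeatable', 'missing <author>']
```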
Metadata: What is metadata? • Metadata is data about data, created to describe or represent the attributes and contents of an information package. • Metadata is a form of document representation, but it is not a document surrogate in the way that a catalogue entry is. • Metadata is linked directly to the resource and allows direct access to it.
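Here is a minimal Python sketch of a metadata record, keyed by element names from the Dublin Core set (title, creator, date, format, identifier). The slides do not mention Dublin Core; we chose it as a familiar example. The URL is a hypothetical placeholder. The `identifier` element reflects the slide's point that metadata links directly to the resource it describes.

```python
from urllib.parse import urlparse

# Metadata record using Dublin Core element names as keys.
record = {
    "title": "Alchemy",
    "creator": "O. Ahmad",
    "date": "2007",
    "format": "application/pdf",
    "identifier": "https://example.org/books/alchemy.pdf",  # hypothetical URL
}


def describe(rec):
    """Build a one-line description from the metadata alone."""
    return f"{rec['creator']} ({rec['date']}). {rec['title']} [{rec['format']}]"


def resource_location(rec):
    """Return (host, path) of the resource the record points to directly."""
    url = urlparse(rec["identifier"])
    return url.netloc, url.path


print(describe(record))           # O. Ahmad (2007). Alchemy [application/pdf]
print(resource_location(record))  # ('example.org', '/books/alchemy.pdf')
```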