
INF 389F—ORGANIZATION OF RECORDS INFORMATION

Professor Fran Miksa. November 18, 2003. Data, Metadata, Metadata Formats, and Databases.



  1. INF 389F—ORGANIZATION OF RECORDS INFORMATION Professor Fran Miksa November 18, 2003 Data, Metadata, Metadata Formats, and Databases

  2. Data & Databases • Data are strings of characters that record assertions, etc., about something. • Data in computers are strings of codes representing characters. • Databases are computer programs that allow us to manipulate data in computers. • IE (information entity) data are data pertaining to IEs including, especially, attribute data. • IE databases are databases whose data pertain to IEs as objects.

  3. How Do Data Become Machine-Readable (i.e., Computerized)? • Basic question--We want the computer to write data into its memory, but how does it do that? • Substitution codes: a basic approach to representing data in computers.

  4. Computer Switches as Codes • 1 switch, 2 positions (on/off): how many different signals are possible? (2^1 = 2 possible signals) • 2 switches, always used together = 2^2 = 4 possible signals (because of 4 possible switch-setting combinations) • 3 switches = 2^3 = 8 possible signals • 4 switches = 2^4 = 16 possible signals • 5 switches = 2^5 = 32 possible signals • 6 switches = 2^6 = 64 possible signals • 7 switches = 2^7 = 128 possible signals • 8 switches = 2^8 = 256 possible signals • Of course, in each case, the meanings of the switch-combination signals have to be agreed upon.
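The counts on this slide can be checked by listing every switch setting. A minimal sketch in Python; the function name `switch_settings` is illustrative and not from the lecture:

```python
from itertools import product

def switch_settings(n):
    """Return every on/off (1/0) combination for n switches used together."""
    return list(product((0, 1), repeat=n))

for n in range(1, 9):
    # n two-position switches give 2**n distinct settings
    print(n, "switches ->", len(switch_settings(n)), "possible signals")
```

Each added switch doubles the number of settings, which is why the count is a power of two.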

  5. Where are Codes Placed? • First, each switch or spot is called a ‘bit’ (BInary digiT) • Second, each set of basic bits in a given character set of codes is called a ‘byte’ • Bit codes can be transferred to/triggered/“set” as a series of “switches” on a computer “chip.” • Codes can be represented as magnetized or not magnetized positions (i.e., spots, locations) on a magnetic surface such as a disk. • The bits of each byte are kept together as a unit.
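To make the bit/byte distinction concrete, the sketch below prints the eight bits of the byte that encodes one character. It assumes an 8-bit byte and a character whose code fits in it (as ASCII characters do); `to_bits` is a hypothetical helper name:

```python
def to_bits(ch):
    """Return the 8-bit pattern of a character's code as a string of 0s and 1s."""
    return format(ord(ch), "08b")

print(to_bits("A"))  # 01000001
```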

  6. Coding for Colors in Graphics • Colors are also encoded the same way, though the # of bits used for each coding may vary—for example, 8 bit, 16 bit, 24 bit codes for colors. • 8 bit color codes mean that each point that is coded has 256 bit combinations to represent all the colors, or all the shades in a “grayscale.” • 16 bit color codes have 65,536 bit combinations (in groups of 16 bits), and 24 bit color codes have 16,777,216 bit combinations (in groups of 24 bits).
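The figures on this slide follow from the same power-of-two rule used for switches. A small sketch (the function name `color_combinations` is an assumption made for illustration):

```python
def color_combinations(bits):
    """Number of distinct codes available when each point is coded with `bits` bits."""
    return 2 ** bits

for bits in (8, 16, 24):
    print(f"{bits}-bit color: {color_combinations(bits):,} combinations")
```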

  7. Bits Used in Graphics • Pixel = a location in a grid of locations superimposed on a graphic image. • At 300 pixels to the inch in each dimension, an 8” by 5” picture has 2,400 pixels (locations/spots/dots, etc.) down and 1,500 pixels across, or 3,600,000 pixels in total, each of which is coded for a color in an 8 bit, 16 bit, 24 bit code, etc. • The file formats used to store the pixels are known by such names as TIFF, JPEG, GIF, etc.
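The pixel arithmetic can be reproduced directly. The sketch below also estimates the uncompressed size of the image at each color depth; that size estimate is an added illustration, not a figure from the slide, and real TIFF, JPEG, or GIF files will usually differ because of headers and compression. Function names are hypothetical:

```python
def pixel_count(width_in, height_in, pixels_per_inch):
    """Total pixels in an image of the given size and resolution."""
    return (width_in * pixels_per_inch) * (height_in * pixels_per_inch)

def raw_size_bytes(pixels, bits_per_pixel):
    """Uncompressed storage needed, at 8 bits per byte."""
    return pixels * bits_per_pixel // 8

pixels = pixel_count(5, 8, 300)  # 1,500 across by 2,400 down
print(f"{pixels:,} pixels")
for depth in (8, 16, 24):
    print(f"{depth}-bit: {raw_size_bytes(pixels, depth):,} bytes uncompressed")
```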

  8. Text & Control Characters as Codes • Lower case letters (26 total) • Upper case letters (26 total) • Numerals (10 total) • Special signs . , ; : “ ” ? / < > [ ] { } \ | - _ = + ` ~ @ # $ % ^ & * ( ) (31 total) [93 to here] • Blank space & other special symbols • Special codes for computer operation • Foreign language special signs

  9. Character Codes • ASCII, EBCDIC • See “A Brief History of Character Codes” • <http://tronweb.super-nova.co.jp/characcodehist.html>
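That the meaning of a bit pattern depends on an agreed code can be seen by encoding the same letter in two character codes. This sketch uses Python's built-in `ascii` codec and `cp037`, which is one EBCDIC code page among several; the choice of `cp037` is an assumption made for illustration:

```python
letter = "A"

ascii_bytes = letter.encode("ascii")   # ASCII code for "A"
ebcdic_bytes = letter.encode("cp037")  # one EBCDIC code page's code for "A"

print("ASCII :", ascii_bytes.hex())
print("EBCDIC:", ebcdic_bytes.hex())

# The same byte read under the wrong code yields a different character or none at all,
# so sender and receiver must agree on which code is in use.
print(ebcdic_bytes.decode("cp037"))
```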

  10. ASCII Code-I

  11. ASCII Code-II

  12. Sequencing Codes in a Computer Space--example 1

  13. Sequencing Codes in a Computer Space--example 2

  14. Databases • Flat File Databases • Relational Databases • Data Modeling • Entity-relationship data models • Object oriented data models

  15. Flat File Database • From the geekgirls reading, “Databases from Scratch--III”

  16. Relatable Tables within the Database • From the geekgirls reading, “Databases from Scratch--III”
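The contrast between a flat file and related tables can be sketched with Python's standard `sqlite3` module. The table names, column names, and sample rows below are invented for illustration and are not taken from the geekgirls reading:

```python
import sqlite3

# Flat file: every row repeats the publisher's details.
flat_file = [
    {"title": "First Title", "year": 2001, "publisher": "Example Press", "place": "Austin"},
    {"title": "Second Title", "year": 2003, "publisher": "Example Press", "place": "Austin"},
]

# Relational: publisher details are stored once and linked by an id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT, place TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        year INTEGER,
        publisher_id INTEGER REFERENCES publisher(id)
    );
    INSERT INTO publisher VALUES (1, 'Example Press', 'Austin');
    INSERT INTO book VALUES (1, 'First Title', 2001, 1);
    INSERT INTO book VALUES (2, 'Second Title', 2003, 1);
""")

# A join relates the two tables and rebuilds the combined view on demand.
rows = conn.execute("""
    SELECT book.title, publisher.name
    FROM book JOIN publisher ON book.publisher_id = publisher.id
    ORDER BY book.id
""").fetchall()
print(rows)
```

With related tables, a change to the publisher's place is made in one row instead of in every book record.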

  17. What Kinds of IE data might be useful? • Names (Persons, Corporate bodies) • Titles • Dates, Publishers, Places • Other physical details of packaging • Statements of editions, issues, etc. • Topics, genre, audiences, uses • Relationships

  18. Two Forms of Data • Data that represent IE attributes and are simply recorded in some sequence • Among the foregoing, those data that are used specifically for searching (called access points, index terms, etc.)
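One way to picture the second form of data is as an index built from a chosen attribute. In this sketch the records, field names, and values are hypothetical; `build_index` simply maps each value of one field to the records that carry it:

```python
from collections import defaultdict

# Hypothetical IE records; field names and values are made up for illustration.
records = {
    1: {"title": "Cataloging Basics", "topic": "cataloging"},
    2: {"title": "Indexing Basics", "topic": "indexing"},
    3: {"title": "Advanced Cataloging", "topic": "cataloging"},
}

def build_index(records, field):
    """Map each value of one field (an access point) to the ids of records that carry it."""
    index = defaultdict(list)
    for record_id, attributes in records.items():
        index[attributes[field]].append(record_id)
    return dict(index)

topic_index = build_index(records, "topic")
print(topic_index["cataloging"])  # [1, 3]
```

All attributes are recorded, but only the fields chosen for indexing become access points for searching.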

  19. Metadata & Metadata Formats • Metadata consists of strings of data within computers that record the attributes of informational objects (IEs). • Metadata formats are organized arrangements of categories of metadata

  20. Original Use of the Term Metadata • Object = students; Data = attributes of students; Metadata = data about data. (Diagram legend: D = Data; M = Metadata)

  21. Use of the term Metadata in Information Organization • When object became an IE, it represented data in and of itself. • Therefore, what would the phrase “Metadata = data about data” mean? • Metadata came to mean, all data inside the computer about

  22. Metadata Formats The purpose of metadata formats is to “code” metadata in terms of categories. • The Categories have a wide variety of uses (e.g., content categories, computer instructions, formatting of content as text, etc.) • Some codes are used within databases only and are not generally seen by the information user (e.g., the codes in the MARC format) • Some codes are attached to metadata and text through “markup” in HTML or XML (though they are not usually seen by a user in a browser unless a special switch is clicked).

  23. Mark-up Languages • A text-processing language which embeds commands into the text that is to be processed. These commands then instruct a display device or a printer to carry out some formatting. From “Markup language"  A Dictionary of the Internet. Darrel Ince. Oxford University Press, 2001. Oxford Reference Online. Oxford University Press.   23 September 2003 <http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t12.002053>

  24. From “A Gentle Introduction to SGML” <http://etext.virginia.edu/bin/tei-tocs?div=DIV1&id=SG> • Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. • Generalizing from that sense, we define markup, or (synonymously) encoding, as any means of making explicit an interpretation of a text. • By markup language we mean a set of markup conventions used together for encoding texts.

  25. From “A Gentle Introduction to SGML” (cont’d) • A markup language must specify • what markup is allowed, • what markup is required, • how markup is to be distinguished from text, and • what the markup means. SGML provides the means for doing the first three; documentation such as these Guidelines is required for the last.

  26. Specific Markup “Languages” • SGML--Standard Generalized Markup Language • For texts • DTD--Document Type Definition • Header • HTML--Hypertext Markup Language • An application of SGML for marking up text for browsers that is platform independent

  27. Specific Markup Languages (cont’d) • XML--Extensible Markup Language • Based on SGML, but adds the capacity to define or otherwise insert special categories. • XHTML--Extensible Hypertext Markup Language • Other markup languages for almost every special purpose imaginable--e.g., Geography ML, Chemical ML, Gene Expression ML (GEML), Wireless ML, Rule ML (for XML), Theological ML, Bean ML (for JavaBeans), etc.
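How markup attaches category codes to metadata can be illustrated with a small XML record read by Python's standard `xml.etree.ElementTree` module. The element names (`record`, `title`, `creator`, `date`) and the values are invented for this sketch and do not follow any particular metadata format:

```python
import xml.etree.ElementTree as ET

# A hypothetical record: each tag names a category, each tag's text is the metadata.
record_xml = """<record>
  <title>Sample Title</title>
  <creator>Doe, Jane</creator>
  <date>2003</date>
</record>"""

root = ET.fromstring(record_xml)

# Because the categories are explicit, a program can pull out any one of them by name.
fields = {child.tag: child.text for child in root}
print(fields["creator"])
```

The tags separate markup from text and name the categories; what each category means still has to be documented outside the markup itself.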

  28. Why is a Knowledge of Markup Languages Important for Information Organization? • MLs contain Document Description capabilities. • MLs contain categories that can be used in databases. • At some point, an information organizer must use markup language for displaying information organization data.

  29. Metadata Category Codes • No metadata category codes will be useful unless they are consciously deployed in a computer program. • Metadata codes become especially useful for information organization when they are deployed in an IE organization system—i.e., in an IE database.

  30. IE Databases • An IE database is an organized structure of metadata that is used for organizing and retrieving IEs in computers. • Organizing and retrieving IEs by means of a database is possible because the database allows us to manipulate the metadata in terms of the categories represented by the metadata.

  31. A General Maxim • A professional information entity organizer must understand the place of data, metadata, metadata formats, and databases in his or her work: • Their general roles • The particular details of the specific systems used.
