1 / 46

lis512 lecture 6

lis512 lecture 6. identifiers, dublin core and RDF. identifiers. When you are maintaining your information on your own, you don't exchange it, you don't need to worry about identifiers. Identifiers come into play when you try to communicated to another party that

mio
Download Presentation

lis512 lecture 6

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. lis512 lecture 6 identifiers, dublin core and RDF

  2. identifiers • When you are maintaining your information on your own, you don't exchange it, you don't need to worry about identifiers. • Identifiers come into play when you try to communicated to another party that • you are talking about the some thing as before or • the other party and you mean the same thing.

  3. keys • We have already seen keys in a database. • A key is simply a column in a table that is unique in for each line in in the table. • A key is simply a field in a record that is unique for each record in a table.

  4. identifier • An identifier is something a bit different. • When two records have the same identifier, they are about the same thing. • Being about the same thing calls the philosophers into the room. • Things become pretty messy. • Discussion on identifiers is often very confused, even among experts.

  5. example • Record 1 • Name: Thomas Krichel • Homepage: http://wotan.liu.edu/home/krichel • Record 2 • Name: Thomas Krichel • Homepage: http://openlib.org/home/krichel • Are they about the same person?

  6. a person homepage finder • Imagine a home page finder. • Ask it what the homepage of Thomas Krichel is. • It may say that there are two Thomas Krichels with two different homepages. • If there is only one, the service is wrong.

  7. work to do • If we have two records that look like we describe the same person, we try to unite them. • In a practical scenario, we most of the time have a first Thomas Krichel. • When the other Thomas Krichel comes along, we check, make a decision if it is the same person, then update the homepage setting.

  8. human labour • To find out if a person is the same as another person is something that no computer can do. • We need a person to verify this. • Identity can only be controlled by a person. • Worst: two people may differ on what “same person” means.

  9. identifier • The identifier is a special field that we keep constant over time as long as the record is about the some thing. • When we get our first record about Thomas Krichel, we have • Name: Thomas Krichel • Homepage: http://openlib.org/home/krichel • Identifier: tk1

  10. changes to the record • Thomas gets married • Name: Thomas Lechirk • Homepage: http://openlib.org/home/krichel • Identifier: tk1 • Later, he decides to change his hompage • Name: Thomas Lechirk • Homepage: http://openlib.org/home/lechirk • Identifier: tk1

  11. problem • Since he is now know as Thomas Lechirk, should he not be given an identifier tl1? • Well in general, the resultion of this question depends on the enviroment that we are operating in. • Changing the identifier would destroy its value.

  12. value of the identifier • If you change the identifier every time you change the record, it is useless. • The value of the identifier is that is long-lived. • With a good idenifier, users will still be able to find the same thing even when they query the same dataset in hundreds of years.

  13. dumb vs intelligent identifiers • Identifiers can be dumb or intelligent. • Choosing a scheme for identifiers is a key step at the start of a project. • Once we are running our project, changing identifiers can only be accomplished at great expense. • Dump identifiers are strings that have no meaning or structure “na$19h-ab9a19]”

  14. dumb indentifiers • The advantages are • If they are long and complicated enough they may be found by web searches. • They are not at risk from lobbying for change. • The problem is • It is not easy to resolve the identifier into data about.

  15. intelligent identifiers • Most identifiers have some form of intelligence. • That intelligence says something about the identifier to a person who looks at it. • The intelligence may also help a computer program to make decisions.

  16. ISBN • An ISBN is assigned to each edition of a book. • After 2007, it has 13 numbers, the first 3 are 978 or 978. Before 2007, it had only the remaining 10. • Digits 1 to 5 are the group identifier for country or country group. • Then follows a publisher identifier. • Then follows an item number by the publisher. • The parts are often split by hyphens but the hyphen is not part of the actual number.

  17. Uniform Resource Locators URL • We all know URLs. • Speaking in simplified terms, a URL typically has three components. • a mechanism • a host • a local part, local to the host

  18. URL mechanism • The URL mechanism essentially says what to do with the thing when you actually find it. • Typical URL mechanisms include • http • ftp • telnet • mailto

  19. host • The host is the domain name of a machine. • Domain names are a hierarchical scheme of names that can be resolved into IP addresses. • Example • openlib.org • fafner.openlib.org • www.fafner.openlib.org • I lease openlib.org.

  20. local part • Once the resolution reaches the host it needs to know what to send out. • This is some sort of local identifier that the host knows about.

  21. URIs • The world wide web consortium, in its quest to make the web a more universal information medium, generalized URLs to URIs. • URI are Uniform Resource Identifiers. • The don't only allow for locating resources but also for naming them. • The structure is similar, but there is a wide variety of mechanisms, called schemes.

  22. example scheme info • The info: URI scheme was created by representatives of the library community and the publishing industry as a light-weight way to make existing identifiers available as URIs. • http://info-uri.info/ has the list of these schemes.

  23. URL examples • http://openlib.org/home/krichel/ • mechanism: http • host: openlib.org • local part: /home/krichel/ • mailto:krichel@openlib.org • mechanism: mailto • host: openlib.org • local part: krichel

  24. DC contributor • URI http://purl.org/dc/elements/1.1/contributor • Definition: An entity responsible for making contributions to the resource. • Comment: Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.

  25. Dublin Core • Dublin Core was initially created at a invitational meeting at OCLC in Dublin, OH. • The aim was to create a metadata format that non-trained specialists could use the annotate resources on the Web. • Nowadays, it serves as a basic common denominator metadata in a wide variety of applications.

  26. DC coverage • URI http://purl.org/dc/elements/1.1/coverage • Definition: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant. • Comment: Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. Temporal topic may be a named period, date, or date range.

  27. DC creator • URI http://purl.org/dc/elements/1.1/creator • Definition: An entity primarily responsible for making the resource. • Comment: Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.

  28. DC date • URI: http://purl.org/dc/elements/1.1/date • Definition: A point or period of time associated with an event in the lifecycle of the resource. • Comment: Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme.

  29. DC description • http://purl.org/dc/elements/1.1/description • Definition: An account of the resource. • Comment: Description may include but is not limited to: • an abstract • a table of contents • a graphical representation, or • a free-text account of the resource.

  30. DC format • http://purl.org/dc/elements/1.1/format • Definition: The file format, physical medium, or dimensions of the resource. • Comment: Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary. • The MIME types by the IANA are such a vocabulary for files.

  31. DC identifier • URI: http://purl.org/dc/elements/1.1/identifier • Definition: An unambiguous reference to the resource within a given context. • Comment: Recommended best practice is to identify the resource by means of a string conforming to a formal identification system.

  32. DC language • URI: http://purl.org/dc/elements/1.1/language • Definition: A language of the resource. • Comment: Recommended best practice is to use a controlled vocabulary such as RFC 4646

  33. DC publisher • URI: http://purl.org/dc/elements/1.1/publisher • Definition: An entity responsible for making the resource available. • Comment: Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.

  34. DC relation • URI: http://purl.org/dc/elements/1.1/relation • Definition: A related resource. • Comment: Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.

  35. DC source • URI: http://purl.org/dc/elements/1.1/source • Definition: A related resource from which the described resource is derived. • Comment: The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. • Apparently only kept for historical purposes.

  36. DC subject • URI: http://purl.org/dc/elements/1.1/subject Definition: The topic of the resource. Comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the DC coverage element.

  37. DC title • URI: http://purl.org/dc/elements/1.1/title Definition: A name given to the resource. Comment: Typically, a Title will be a name by which the resource is formally known.

  38. DC type • URI: http://purl.org/dc/elements/1.1/type • Definition: The nature or genre of the resource. • Comment: Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary • To describe the file format, physical medium, or dimensions of the resource, use the DC Format element.

  39. DC abstract model • Basically, this is something like the FRBR for Dublin Core. • It maybe quite useless in practice, but it makes for beautiful theory.

  40. the resource • A resource is anything that can be given an identifier. • Since a URI can be given to almost everything, this includes a vast array of "stuff". • But the important thing is that the resource is actually identified by some identifier, rather than being named. • It's essentially same thing as an entity in FRBR.

  41. property / value pair • A description occurs when a bunch of attribute / value pairs are associated with a resource. • This is like the attributes in FRBR, just that it counts them together with the values.

  42. property values • Property values can be of two forms. • They can be literals, i.e. human understandable strings. • Or they can be resources, which again can be described. When a resource is used as a value, we quote an identifier.

  43. example • Resource: http://openlib.org/home/krichel • DC.Creator: Thomas Krichel • DC.Format: text/html • Note: properties themselves are identified by URIs.

  44. similar: the resource description framework • The resource description framework has been developed by the W3C is a way generically represent information, such that it can be processed via the Web. • It again is a conceptual model. • RDF can be expressed in a variety of syntaxes. • The XML syntax of RDF is very ugly.

  45. general idea • The general idea is that of a semantic web. • Such a semantic web you could ask real questions such as what we will tomorrows weather here, or what athlete won the most gold medals in the Olympic games of 1936. • Such a system is many years away. • Computer scientists don't make much progress with this.

  46. key problem • The key problem is identity. • You and I may understand what a gold medal and what the Olympic games are. • But teaching agreeing on what it precisely means appears to be a matter of great difficulty. • Recently, some research, the dbpedia, used wikipedia headings as concept names.

More Related