An introduction to the Semantic Web for Museums. Presented at Museums and the Web 2006, Albuquerque. Dr Mike Lowndes, Interactive Media Manager, Natural History Museum, London mikel@nhm.ac.uk. The semantic web: Contents. Web futures: context What is it?
An introduction to the Semantic Web for Museums Presented at Museums and the Web 2006, Albuquerque Dr Mike Lowndes, Interactive Media Manager, Natural History Museum, London mikel@nhm.ac.uk
The semantic web: Contents • Web futures: context • What is it? • Web problems: Digital objects and other issues • Building blocks • Steps along the way • Current applications • Other advances – Web2.0 • Activity in the cultural sector • Is it actually going to happen? • Conclusions for Museums
Web Futures We can be assured that • Whatever we propose today, the future will be different. • Technology progresses and conceptual thought keeps playing catch-up. New ideas supplant old. • The future ‘web’ will be as messy and tricky to predict as the past. So… • For Museum web users, we should strive toward a greater signal to noise ratio.
Web futures: Other Developments • Web 2.0: web as application platform • Convergence • The web becomes more TV-like, but remains interactive and always available. • More layers of information on more channels. • It will become optionally immersive: degrees of immersion depending on how you interact with it. • Internet2 (www.Internet2.org) • Infrastructure, for massive bandwidth. • Grid computing (www.gridcomputing.com) • Shared processing: increasing available power when connected. • Computing power becoming a utility like electricity. • Towards instant processing of everyday tasks (in the human timeframe).
Web Futures: Internet Ubiquity. • All technological devices connected. • The intelligent fridge, RFID, mobiles with GSM, GPRS, 3G. • Future mobile device: operate your bank account, hi-fi and front door lock, turn the car heater on before you get to it – these things are not that far away. • The ‘web’ is already old-school. • We don’t yet have a simple word for the continuum between digital radio, TV, the web, mobile internet, SMS and multimedia kiosk interactions, though internet technology underpins it all. • We can no longer limit our thinking to the needs of the desktop browser-based ‘web’.
Problems: A Digital Object • Digital object. • Named Anomalocaris. • Did that help? • If we need help to make sense of many digital objects, Google needs even more. • So: A digital object should include or connect to the supporting data that allows both humans and machines to understand it. • Answer: The semantic web: • Provides a framework, standards and tools for attaching, extending, making available and understanding the ‘meaning’ of digital objects. • Makes the digital medium self-explaining.
Problems: the worst things about today’s web? • It’s ‘manual’. • Google is currently the most popular way to begin exploring a topic. • It relies on humans to link sensibly to interesting and relevant content. This only works when a LOT of humans are making the links. • Hyperlinks. They are dumb. • They do not explain themselves. • Can you trust them? • When you create them, you need to keep validating them. • Searching for new links to make requires a search engine. • Metadata can improve this, but metadata is poorly used. • Answer: The semantic web
Problems: how many logins do you have? Bank1 Bank2 Sharedealing Work VPN Basecamp Amazon eBay eBuyer Picstop Flickr Email 1 Email 2 Etc………………. • What about an infrastructure that allows you to ‘log on’ to the internet just once? • Answer: The semantic web.
Some Web Issues For Museums • People perhaps trust museums’ online content, and the links museums make, more than those of other sources. • But our knowledge and collections are not easily available to the public as a single ‘collection’ relevant to their needs. • This requires breaking down the digital walls between institutions: digital access, interoperability, flexible context. • Interoperability is a difficult thing. • Our metadata is easy to publish, but nothing ‘out there’ uses it to improve searching. • Attempts are being made (e.g. OAI-PMH) • GBIF • Other portals • Answer: The semantic web.
The Semantic Web • Tim Berners-Lee – Web Visionary and head of W3C. • Formally set off in 1998: Goal is the solution to information overload and the personalisation of the web. • ‘Adding logic to the web’ • If you’re 38 and some available content is aimed at six-year-olds, then it’s not appropriate to prioritise its display (unless you’re searching for your kids). This kind of logic is built into the semantic web. • ‘Turning the web into a global database’ • Semantic web software should be able to find, sort, classify, interpret, and present relevant content in context. • Achieved via global use of metadata, leading to vastly improved ‘browsing’, and agents which may seem intelligent because they can process a web that describes itself.
W3C Definition • Tim Berners-Lee: • ‘The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.’ • For the Web to become a truly machine-readable resource, the information it contains must be structured in a logical, comprehensible and transparent fashion. • This is the primary work required to ‘enable’ the semantic web.
The Building Blocks How do we get there?
What the Semantic Web Will Require • Adoption of metadata standards. • Usable tools for automatic and semiautomatic multilingual knowledge mark-up. • Modelling relationships, e.g. between types of metadata. • Construction of ontologies (and mappings between them). • Stay awake! • Plus, intelligent agents to ‘mine’ the above for a particular person’s needs. • Defining a particular person requires a user profile.
Boxes And Arrows – No Clouds! [Diagram. Boxes: Context; A digital object; Associated metadata; User profile; Identified ontologies; Other ontologies; User query, or query generated by user behaviour; Semantic Web Agent. Arrow labels: ‘Maps to’; ‘Maps to and is constrained by’. Output: accurate, meaningful answers, actions and views of information.]
W3C: Current Semantic Web Work (2006) • A roadmap. Two ‘formal’ XML technologies are now part of the first generation semantic web: • RDF for holding and communicating the metadata. • OWL for describing relationships and inferring meaning.
1. XML • XML underpins the next step. • It can describe the 'data' on the web by wrapping that data in tags that explain it. • E.g. <product><fruit>orange</fruit><price>20</price><currency>gbp</currency></product> • XML is a framework. • Ad-hoc files can be created in it for specific uses, using any tags you like. • There is no need to formally describe them unless you want them to be understood outside your particular use.
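To make the "tags that explain the data" point concrete, here is a minimal sketch that reads the fruit example with Python's standard library. The variable names are illustrative only; nothing here is part of the original talk.

```python
import xml.etree.ElementTree as ET

# The ad-hoc XML fragment from the slide.
xml_text = (
    "<product><fruit>orange</fruit>"
    "<price>20</price><currency>gbp</currency></product>"
)

product = ET.fromstring(xml_text)

# Turn each child tag into a dictionary key. The tag name is the only
# "explanation" of the value that a program has to go on.
record = {child.tag: child.text for child in product}
print(record)
```

Note that the program still has no idea what "fruit" or "gbp" mean; the tags only become meaningful to outsiders once the tag set is formally described, which is where schemas and the later layers come in.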
XML Languages for Describing Content • You can formalise a tag set written in XML by creating a ‘config file’ for it, known as a Document Type Definition or, more recently, a Schema. • e.g. • Summary Metadata: Dublin Core and its derivatives. • Data Markup: Encoded Archival Description, RSS. • XML can also format and transform itself with XML ‘stylesheets’: XSL/XSLT. • Formal XML ‘languages’ underpin the semantic web. • XML over the internet: enables machine-to-machine communication.
2. RDF: Resource Description Framework • W3C supports the development of the “Resource Description Framework”. • RDF is the ‘official’ current encoding format for semantic web data. • Can contain data, metadata and relationships. • E.g. Dublin Core, RSS. • Makes web resources self-describing. • RDF-S (a more recent development) • ‘Schema’ provides some ontology support to RDF. • E.g. Simple DC file
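As a rough idea of what a simple Dublin Core file in RDF/XML might look like, the sketch below builds one with Python's standard library. The two namespace URIs are the standard RDF and Dublin Core element namespaces; the resource URL (`http://example.org/evolution`), the field values and the helper name `dc_record` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_record(resource_url, fields):
    """Build an rdf:RDF element describing one resource with DC fields."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_url},
    )
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return root


# Hypothetical resource and values, for illustration only.
record = dc_record(
    "http://example.org/evolution",
    {
        "creator": "Natural History Museum, London",
        "date": "2005-07",
        "description": "A website exploring evolution by natural selection.",
        "language": "en",
    },
)
rdf_xml = ET.tostring(record, encoding="unicode")
print(rdf_xml)
```

The point of the sketch is only that the metadata travels with a pointer to the resource it describes (`rdf:about`), using element names that any Dublin Core aware software can recognise.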
3. Ontologies - OWL • W3C supports the development of the Web Ontology Language, usually abbreviated as OWL. • What is an ‘ontology’? • A dictionary defines the meaning of words. • A taxonomy or classification system describes hierarchical relationships between things but not usually other kinds of relationships. • A thesaurus deals with wider relationships between words but meaning by inference only. • Ontologies join taxonomies and thesauri together and can derive logic and inference – relationships of meaning. • OWL is the latest iteration of this idea as applied to the web. • It is a ‘vocabulary extension’ of RDF – not something ‘different’.
Brainbreak - FenFire Ouch, my brain hurts.
Definitions and Properties of an Ontology • James Hendler: • “a set of knowledge terms, including the vocabulary, the interconnections in meaning, and some simple rules of inference and logic for some particular topic.” • DigiCULT: • “The most typical kind of ontology for the Web has a taxonomy and a set of ‘inference rules’.” • What does it do? Describes relationships between data. • TBL: • An ontology may express the rule “If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code.” = the functionality of a database (query) and a thesaurus (meaning by context).
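The city code / state code rule can be sketched in a few lines of plain Python. This is only an illustration of the idea of an inference rule, not how OWL or any real reasoner is implemented; the predicate names (`inState`, `usesCityCode`, `hasStateCode`) and the identifiers are hypothetical.

```python
# Facts as (subject, predicate, object) triples. All values are made up.
facts = {
    ("city:ABQ", "inState", "state:NM"),
    ("address:1", "usesCityCode", "city:ABQ"),
}


def infer_state_codes(known):
    """Rule: if a city is in state S and an address uses that city code,
    then the address has state code S."""
    city_to_state = {s: o for s, p, o in known if p == "inState"}
    inferred = {
        (s, "hasStateCode", city_to_state[o])
        for s, p, o in known
        if p == "usesCityCode" and o in city_to_state
    }
    return known | inferred


all_facts = infer_state_codes(facts)
for triple in sorted(all_facts):
    print(triple)
```

Nobody stated that the address is in a particular state; the new triple was derived from the rule, which is the "database plus thesaurus" behaviour the slide describes.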
How Will Ontologies Be Used In The Semantic Web? • Ontologies can be domain-oriented, task-oriented, application-oriented or general purpose. Also called ‘class taxonomies’. • ‘Upper Ontologies’ are more general and can tie more specific ones together by ‘mapping’ them. • e.g. How can we make a machine understand that ‘watercolours’ are linked to ‘jewellery’ semantically? • Concept of ‘watercolour’ links to a definition URI (url). • Local ontology: ‘watercolour is a type of painting.’ • Local ontology: ‘necklace is a type of jewellery’. • Upper ontology: ‘painting’ and ‘jewellery’ are both types of ‘art’. • Someone needs to build these mappings. • Now, do it all again in multiple languages…
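A toy version of the watercolour / jewellery example, assuming each ontology can be reduced to simple "is a type of" statements. Real ontologies carry far richer relationships; the dictionary and function names below are illustrative only.

```python
# "is a type of" statements, keyed by the narrower concept.
IS_A = {
    "watercolour": "painting",  # local ontology
    "necklace": "jewellery",    # local ontology
    "painting": "art",          # upper ontology
    "jewellery": "art",         # upper ontology
}


def ancestors(concept):
    """Walk up the 'is a type of' chain and return the broader concepts."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def common_ancestor(a, b):
    """Return the nearest concept that both a and b belong to, if any."""
    b_line = set([b] + ancestors(b))
    for candidate in [a] + ancestors(a):
        if candidate in b_line:
            return candidate
    return None


print(common_ancestor("watercolour", "jewellery"))
```

Without the two upper-ontology entries the function returns `None`: the local ontologies alone cannot connect the two concepts, which is why someone has to build the mappings.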
A lot of talk… • Foundational ontologies - shared understanding, providing intended meaning of a vocabulary. • Completeness, precision and overlap between ontologies: agreement on all three is needed for 'establishing consensus'. • Gets philosophical very quickly: • Is a hole different from the region of space it occupies? I.e. Are there holes, or only holed objects? • Is a statue different from the stuff it is constituted by? I.e. Are there statues or only statue-shaped stuffs? • Is a person different from their body? Ontologies - There will be a lot of them…
Higher layers of the Roadmap • Rules layer - early stage work • Initial proposals to provide standardised languages for the querying of RDF: SPARQL - Joseki query engine. • Experiments with rule languages: RuleML • Proof • Authority, encryption • Trust • (PICS) • Profiles • FOAF – friend of a friend. EARL
Long Term: Agents • DigiCULT: • Agents are the final ‘product’ of the semantic web – automatic, even artificially intelligent software that does all your searching for you (the process of narrowing down) and much more. However, this is a very long term goal and there are many steps on the way, each of which can help. • Examples: • The agent attached to your diary automatically organises travel etc, and can change your travel tickets when you alter your diary. • The agent attached to your house automatically organises food purchasing, bill payment, lighting, heating, alarms etc.
Visual navigation of ontology (Sculpteur) • Visualising RDF metadata: An aid for Museum professionals, not the public. • Addis, M., et al., New Ways to Search, Navigate and Use Multimedia Museum Collections over the Web, Figure 3, in J. Trant and D. Bearman (eds.). Museums and the Web 2005: Proceedings, [CD-ROM: ISBN 1-885626-31-2] Toronto: Archives & Museum Informatics, March 31, 2005 (right click/click-hold (Mac) for notes)
Boxes, arrows – and Acronyms [Diagram: the earlier boxes-and-arrows figure, annotated with technologies. Labels in slide order: Context; A digital object; OWL; User profile; FOAF/EARL; Other ontologies; Maps to; User query, or query generated by user behaviour; RDF-S/OWL (CIDOC-CRM, SKOS); SPARQL, RuleML; Semantic Web Agent; Maps to and is constrained by; Identified ontology; RDF (DC, RSS); Associated metadata. Output: accurate, meaningful answers, actions and views of information.]
Q. Why isn’t the semantic web here? • A. It’s hard to do.
Short Term: some current applications Making digital resources self-describing… • RSS – in RDF • Was ‘rich site summary’, now ‘really simple syndication’ - making simple summary information self-describing. • Mobile devices: CC/PP. • Composite Capability/Preference Profiles (CC/PP). • Will let cell phones and other non-standard Web clients describe their characteristics to other software and agents. • Business: XBRL. • Describes/classifies content of financial statements. • Makes report generation easier. • FOAF • Friend of a Friend. • Describes people and their interests, plus network of peers. • www.foaf-project.org/ • Topic Maps. • A framework for creating and browsing relationships. • Works within and between systems and disciplines. • Works with RDF. • Human friendly; relatively easy to grasp how it works. Browsers are in development.
Medium Term: e.g. 'smart links' As ‘semantic’ content appears, browsers can be modified to use it. On mouseover… • Metadata of target. • [More information on evolution]. • Multiple targets. • [More information on evolution]. • These do not even need to be defined as ‘links’ – simply highlighting words could initiate the ‘semantic web browser’. • It’s ‘automatic for the people’. • As well as ‘smart links’, more and more ‘local domains’ of knowledge will be related by their linking ontologies. More semantic portals will appear. Example metadata of target: Author: the Natural History Museum, London. Date published: July 2005. Description: A website exploring evolution by natural selection. Audience: 12 years plus. Language: English (international). Example multiple targets: Link: definition of evolution. Link: evolution at the Natural History Museum. Link: evolution at the American Museum of Natural History. Link: Evolution on god.com. Link: evolution at New Scientist magazine. Definition: Evolution: part of natural history. Browse evolution.
So nothing practical even yet? (Semagix – can you afford it?) Semantic web portals?
Web 2.0: the Web as Application platform - first uses: social networking, content authoring and sharing, real-time GIS, feedback • Flickr • Google Maps / Earth • Del.icio.us (bookmarking) • Technorati (blog-tracking) • Wikipedia • Basecamp, ACEproject • Blogger • Open source frameworks (e.g. Drupal) • Amazon, Yahoo
AJAX • Advanced JavaScript to send / receive content and update parts of pages, using XML over the web (it can use other messaging formats as well), thus getting around another ‘issue’ the web has had from the start: pages being static. • ‘Real-time’ response to user input i.e. approaching true desktop applications on the web. Beyond the original Berners-Lee vision? Examples: • Google Maps (map data) • Basecamp (saving changes/state without reloading pages) • Writely (word processor for online collaboration) • Shell Wildlife Photographer of the Year
Social Tagging (folksonomies) • The old Yahoo / Google (DMOZ) directories method of classifying sites is hardly used as a search aid • also ungainly, complex and impossible to maintain • 'tagging' is communities of web users freely keywording their content • These sites then use popularity and associations of keywords to infer relevance/closeness of meaning • http://www.flickr.com/photos/tags/family/clusters/ • provides a simple way to group content • What about specialist knowledge? • Specialist knowledge = fewer people = worse tagging? • Go visit the steve project
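How tagging sites might use "popularity and associations of keywords" can be sketched with simple counting. This is an assumed, deliberately naive approach (co-occurrence counts), not a description of what Flickr or any other site actually does; the photo data is invented.

```python
from collections import Counter
from itertools import combinations

# Each set is the free-form tags one user gave one photo (invented data).
tagged_photos = [
    {"family", "wedding", "party"},
    {"family", "kids", "party"},
    {"family", "wedding"},
    {"fossil", "museum"},
]

# Popularity: how often each tag is used at all.
popularity = Counter(tag for tags in tagged_photos for tag in tags)

# Association: how often two tags appear on the same photo.
pair_counts = Counter()
for tags in tagged_photos:
    for a, b in combinations(sorted(tags), 2):
        pair_counts[(a, b)] += 1


def related_tags(tag):
    """Rank other tags by how often they co-occur with the given tag."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == tag:
            scores[b] += count
        elif b == tag:
            scores[a] += count
    return scores.most_common()


print(related_tags("family"))
```

The specialist tags ("fossil", "museum") each occur only once here, so there is very little to infer from; this is the "fewer people = worse tagging?" worry in miniature.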
RSS • Newsfeed reading via Really Simple Syndication is now huge • A simple but structured way to syndicate information or ‘broadcast’ change • A subscription model: people get the information they want delivered to them as it is generated • Gets around an original web turnoff – having to revisit favourite sites regularly. • Content from many sources can be aggregated into themed ‘feeds’ • Use of truly semantic ideas is at an early stage • RDF is extensible • Will improve as the sheer number of newsfeeds requires new layers of interpretation. • Will be embedded in next generation operating systems. Example: 24 Hour Museum
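Aggregating several newsfeeds into one themed feed can be sketched as below. For brevity the sample feeds use plain RSS 2.0-style markup rather than the RDF-based RSS 1.0 format, and the feed contents, URLs and function names are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two tiny, invented feeds standing in for real museum newsfeeds.
FEED_A = """<rss version="2.0"><channel><title>Museum A news</title>
<item><title>New dinosaur gallery opens</title><link>http://example.org/a/1</link></item>
<item><title>Cafe opening hours</title><link>http://example.org/a/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Museum B news</title>
<item><title>Dinosaur footprints found</title><link>http://example.org/b/1</link></item>
<item><title>Watercolour exhibition</title><link>http://example.org/b/2</link></item>
</channel></rss>"""


def items(feed_xml):
    """Yield each item in a feed, remembering which feed it came from."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    for item in channel.findall("item"):
        yield {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
        }


def themed_feed(feeds, keyword):
    """Merge several feeds and keep only items whose title mentions keyword."""
    keyword = keyword.lower()
    return [
        entry
        for feed in feeds
        for entry in items(feed)
        if keyword in entry["title"].lower()
    ]


for entry in themed_feed([FEED_A, FEED_B], "dinosaur"):
    print(entry["source"], "|", entry["title"], "|", entry["link"])
```

Matching on words in the title is exactly the kind of crude interpretation that richer, RDF-based metadata is meant to replace.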
Web2.0 and the Semantic Web • Joshua Allen, 2001 (Making a Semantic Web): Until anyone can create metadata about any page and share it with everyone, there will not be a semantic web. • Web 2.0?! • Web 2.0 is NOT a new infrastructure for the web. It won’t do the job of providing the ‘global database’. It does take steps in the right direction.
What has the cultural sector done? Done? – this is mostly ‘old’ stuff.
We Have A Role. • We are the holders of knowledge and authority, and can help to define the semantic web. • Thesauri owned and created by Museums could become ontologies and act as part of the backbone. • Museums are behind and will remain behind as other areas see competitive advantage: business, commerce and research. • DigiCULT Thematic issue 3, 2003 – museums need to take a lead. We need to do a big project together: standardise thesauri, develop ontologies.
Infrastructure : The CIDOC Conceptual Reference Model • A common language and extensible semantic framework to which any cultural heritage information can be mapped. The ‘interoperability glue’. • Provides the ‘words’ and ‘relationships’ we can use to map our stuff together. • I.e. an agreed framework for our ontologies • An international standard. • Exposed in RDF already – RDF-S/OWL to follow? • http://cidoc.ics.forth.gr/ • For an introduction, download: http://www.rlg.org/en/downloads/2002metadata/gill/gill.PPT
Portal example: Sculpteur • Several collections brought together into one place, one meta-database or portal. • Content from the V&A among others. • Visual display of relationships. • A published ontology in RDF. • Concept-based searching based on a semantic network. • Content-based searching of images and 3D models. • http://www.sculpteurweb.org/ (Browser needs downloading)
Richard Light: Museum thesauri in Topic Maps • Ontology framework written to thesaurus standards. • Museum thesauri turned into ontologies in Topic Map format. • Topic Map browser (Omnigator): a visual environment. • Aims to provide ‘meaning’ – an authoritative reference that software can use when searching the web. • Could become part of the future semantic web ‘backbone’. • Topic Map / RDF interoperability now a focus at W3C. Museums Computer Group Newsletter, April 2004.
VICODI – ‘Visual Contextualization of Digital Content’ • ‘semi-automatic creation of contextual semantic metadata for digital historical resources, by users’. • ‘Visualisation of richly structured, contextualised content’. • Interface uses historical maps and colour-coded links. • In hindsight, felt by its developers to be not generally usable, but still under some development. • http://www.vicodi.org/