1 / 57

On Community Web Portals and the Semantic Web: A Database Perspective

On Community Web Portals and the Semantic Web: A Database Perspective. Vassilis Christophides Computer Science Department, University of Crete Institute for Computer Science - FORTH Heraklion, Crete. Portalmania!. Elements of Comparison.

onella
Download Presentation

On Community Web Portals and the Semantic Web: A Database Perspective

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On Community Web Portals and the Semantic Web: A Database Perspective Vassilis ChristophidesComputer Science Department, University of CreteInstitute for Computer Science - FORTHHeraklion, Crete Christophides Vassilis

  2. Portalmania! Christophides Vassilis

  3. Elements of Comparison • Gateways to WWW resources with the aim of making information/service research simpler and more effective Christophides Vassilis

  4. Common Objectives/Goals • Community Knowledge Management • Ranging from simple vocabularies to formal ontologies • Aggregation/Integration of Community Content • Ranging from unstructured (documents) to semi-structured (web sites) and structured information (data) • Collaboration & Messaging • Ranging from simple to advanced task management (synchronous/ asynchronous) • System Integration & Security • Front end to application servers/ workflow systems • User Personalization (“pull”) & Syndicated Content Subscription (“push”) • role-based access control • information filtering (contexts/viewpoints) • customizable information rendering • location/time specific information Christophides Vassilis

  5. Advanced Knowledge Schemas (ontologies, thesauri) Heterogeneous resource descriptions Complexity and diversity of information resources <tag1> <tag2> <tag3> </tag1> Community Web Portals: Knowledge Management Christophides Vassilis

  6. Community Web Portals: A Broader Functional View Presentation Services Multiple Style Sheets Virtual Documents Access and Integration Services Content Syndication System Management Task Management Classification Metadata Security Network Application Integration Information Services Personalization Services Collaboration Services Description, Search Docs Repositories Annotations, Recommendations Messaging Workflow Christophides Vassilis

  7. On the Semantic Web • Main infrastructure for supporting Community Webs • groups of people sharing a domain of discourse and a set of information resources (e.g.,data, documents, services) and having some common interests/objectives • Higher Quality Web Information Services • having data and programs described in a way that facilitates their reuse and integration by machines across applications Workplace Education Semantic Web Commerce Health Christophides Vassilis

  8. Metadata exists for Almost Anything/Everywhere • Physical Objects, Places, People, • Devices, Networks, Infrastructure, • Digital Documents, Data, Programs, • User Profiles, Preferences, <tag1> <tag2> <tag3> </tag1> Christophides Vassilis

  9. RDF Objectives • Enables communities to define their own semantics of resource descriptions • we can disagree about semantics, but share the same infrastructure (syntax, editors, query languages, databases, etc.) • Imposes structural constraints on the expression of metadata in various application contexts • for consistent encoding, exchange and processing of metadata on the Web • Facilitates development of metadata vocabularies without central coordination • mechanisms for reusing descriptions of resources, concepts, etc. • Focus on DBMS technology for RDF metadata • Related W3C efforts on XML data management Christophides Vassilis

  10. Looking at existing RDF Applications • Audio-visual: • Internet Movie DataBase (IMDB) • Ubiquitous/Mobile/Grid Computing • Composite Capability/Preference Profile (CC/PP) • RDF Calendar Task Force • Scheduler Allocation Ontology(SAO) • E-commerce • Basic Semantic Registry (BSR) • Real Estate Data Consortium • Universal Standard Products and Services Classification (UNSPSC) • Geospatial/ Environmental: • Geography Markup Language(GML) • Costal Zone Management Ontology • Biology/Medecine • Gene Ontology • Cross-domain • Publishing/News: • Biblink • Scholarly Link Specification (Slinks) • Rich Site Summary (RSS) • Education/ Academic: • Common European Research Information Format (CERIF) • Mathematics International • Universal • IMS Global Learning Consortium • Cultural Heritage/ Archives/ Libraries: • Inter. Committee for Documentation Reference Schema (CIDOC) • Research Support Libraries – Colle ction Level Description (RSLP-CLD) • EUropean Libraries & Electronic Resources in Mathematical Sciences (Euler) Christophides Vassilis

  11. Semantic Depth of Resource Descriptions • Dictionaries and Vocabularies: • the schemas developed at this level define simple lists of concepts and their definitions • Taxonomies: • their characteristic is that the main relation they define between concepts is that of specialization • Thesauri: • besides defining relations among broader/narrower terms through the definition of hierarchies, a thesaurus also declares relations of equivalence, association and synonymy • Reference Models: • comprise a representation vocabulary for referring to the concepts in the subject area and the logical statements that describe the nature of the terms, the relations among the terms and the way the terms can or cannot be related to each other Christophides Vassilis

  12. A First Classification of RDF Schemas Christophides Vassilis

  13. Outline • Database issues for RDF metadata management • The Data Independence Issue • The Query Language Issue • The Model Issue • RDF Query Language: RQL • Querying Large RDF Schemas • Filtering/Navigating Complex RDF descriptions • Storing Voluminous RDF descriptions • Alternative DB representations • Performance Figures • The ICS-FORTH RDFSuite • Conclusions and remaining issues Christophides Vassilis

  14. The Data Independence Issue • Conceptual Level: Describing resources using one or several RDF schemas • Logical Level: How RDF descriptions and schemas are physically stored • Logical-schema: Data organization using tables, objects, etc. • Physical-schema: Data organization using files, records, indices, etc. • RDF data independence is crucial for ensuring scalability of real-scale Semantic Web applications Christophides Vassilis

  15. Find resources classified under … whose property value is …. Description Graphs Find statements whose subject is … and object is … Triple Database Find description elements whose attribute value contains …. XML Repository The Query Language Issue Querying the Semantics (RQL) Querying the Structure (Squish) Querying the Syntax (XQuery) Christophides Vassilis

  16. Why a Data Model for RDF ? • As support for physical/logical independence • RDF can be stored in files, a native repository, a relational database • RDF can be virtual, as a view of a repository, integrated sources • RDF can be in memory, using data structures in C, C++, Java, etc • RDF can be streamed between processes • To describe information content of RDF Statements • to agree and reason about information content, preservation • To define semantics of a data manipulation language: • A query language describes in a declarative fashion, the mapping between an inputinstance of thedata model to an outputinstance of thedata model Christophides Vassilis

  17. Painting Painter rdf:type rdf:type created paints 1937 fname “Pablo” &r2 r2: museoreinasofia.mcu.es/guernica.jpg created lname &r6 “Picasso” paints r3:www.artchive.com/woman.jpg 1904 &r3 r6: picasso132 But RDF has specifics: Serialization syntax • XML attributes vs elements for RDF properties • fname, lname • XML flat vs nested structures of RDF statements • Description vs. Painter elements • RDF properties are unordered, optional, and multivalued • 2 paints and 0 creates • One more motivation for a data model : • isolate the user from syntactic aspects of RDF/XML <rdf:Descriptionrdf:ID=“picasso132" fname=Pablo lname=Picasso> <paints rdf:resource="http://museoreinasofia.mcu.es/guernica.gif"/> <paints rdf:resource="http://www.artchive.com/woman.jpg”/> <rdf:type>Painter</rdf:type> </rdf:Description> <rdf:Descriptionrdf:about ="http://museoreinasofia.mcu.es/guernica.gif"> <rdf:type>Painting</rdf:type> <created>1937</created> </rdf:Description> <rdf:Descriptionrdf:about =" http://www.artchive.com/woman.jpg"> <rdf:type>Painting</rdf:type> <created>1904</created> </rdf:Description> <Painterrdf:ID=“picasso132"> <fname>Pablo</fname> <lname>Picasso</lname> <paints> <Paintingrdf:about="http://www.artchive.com/woman.jpg”/> <created>1904</created> </paints> <paints> <Paintingrdf:about="http://museoreinasofia.mcu.es/guernica.gif"> <created>1937</created> </Painting> </paints> </Painter> Christophides Vassilis

  18. fname creates String Artifact Artist String lname paints created Painting Date Painter But RDF has specifics: Schema Semantics <rdfs:Classrdf:ID="Artist"/> <rdfs:Classrdf:ID="Artifact"/> <rdfs:subClassOfrdf:resource="#Artist"/> </rdfs:Class> <rdfs:Classrdf:ID="Painter"> <rdfs:subClassOfrdf:resource="#Artist"/> </rdfs:Class> <rdfs:Classrdf:ID="Painting"> <rdfs:subClassOfrdf:resource="#Artifact"/> </rdfs:Class> <rdf:Propertyrdf:ID="fname"> <rdfs:domain rdf:resource="#Painting"/> <rdfs:rangerdf:resource=“ http://www.w3.org/rdf- datatypes.xsd#String"/> </rdf:Property> • Distinguish between labels of nodes and edges • Painter vs. paints • Class and properties are organized in subsumption hierarchies • Painter <= Artist • Properties are inherited • &r6 may also have a creates property • References are typed • &r2 should be of class <= Painting • Literal values are typed • 1937 is not a string but a date value ! <rdf:Propertyrdf:ID="creates"> <rdfs:domainrdf:resource="#Artist"/> <rdfs:rangerdf:resource="#Artifact"/> </rdf:Property> <rdf:Propertyrdf:ID="paints"> <rdfs:domainrdf:resource="#Painter"/> <rdfs:rangerdf:resource="#Painting"/> <rdfs:subPropertyOf rdf:resource="#creates"/> </rdf:Property> <rdf:Propertyrdf:ID="created"> <rdfs:domain rdf:resource="#Painting"/> <rdfs:rangerdf:resource=“ http://www.w3.org/rdf- datatypes.xsd#Date"/> </rdf:Property> Christophides Vassilis

  19. fname creates title String String Artifact Artist ExtResource String lname file_size Int paints created Painting Date Painter rdf:type title “Guernica” paints fname “Pablo” &r2 created 1937 file_size lname &r6 “Picasso” paints 4 &r3 created 1904 But RDF has specifics: Superimposed Descriptions • Resources may belong to multiple (unrelated though isa) classes • &r2 is both a Painting and an ExtResource • Heterogeneous descriptions reminiscent of SGML exceptions • What is the structure of Painting resources? rdf:type rdf:type Christophides Vassilis

  20. RDF/S vs. Well-Known Formalisms • Relational or Object Database Models (ODMG, SQL) • Classes don’t define table or object types • Instances may have associated quite different properties • Collections with heterogeneous members • Semistructured or XML Data Models (OEM, UnQL, YAT, XML Schema) • Labels only on nodes or edges • Class and property subsumption is not captured • Heterogeneous structures reminiscent to SGML exceptions • Knowledge Representation Languages (Telos, DL, F-Logic) • Absence of complex values and n-ary relationships (bags, sequences) Christophides Vassilis

  21. Painter Painter &r6 &r10 fname lname paints paints friends fname lname Painting Extresource 1 2 Painting Extresource Seq String String String String &r3 &seq1 &r2 “Pablo” “Picasso” “XXXX” “YYYY” created file_size title created Date Int String Date 1904 4 “Guernica” 1937 A Semistructured Data Model for RDF • Graph based, unordered, edge/node-labeled (in the style of OEM) • But what aboutsequences (ordered)? Christophides Vassilis

  22. Towards a Formal Data Model for RDF • An RDF schema is a 5-tuple: RS = (VS, ES, H, , ) • VS a set of nodes • ES a set of edges • Η = (Ν,<) a well-formed hierarchy of names •  an incidence function: Es VsVs •  a labeling function: VS ES Ν Τ • An RDF description base, instance of a schema RS, is a 5-tuple: RD = (VD, ED, , , ) • VD a set of nodes • ED a set of edges •  an incidence function: ED VDVD •  a valuation function: VDV •  a labeling function: VD ED 2ΝΤ : •  u  VD,  n  CT: (u) [[n]] •  e  ED [u,u’],  p Christophides Vassilis

  23. Why a Type System for RDF ? • For error detection & safety: • to verify that statements comply to what the application expects • to make sure that the application accesses valid statements • to enforce safe operations (e.g., don’t do float arithmetic on classes!) • to check that compositions of operations make sense • For performance: • to design storage (saving space, improving clustering, etc.) • to process queries (algebraic laws, rewriting path expressions, etc.) • We need a full-fledged Data Definition Language for RDF ! • RDF Schema is viewed more as an ontology & modeling tool Christophides Vassilis

  24. Towards a Type System for RDF • Type System:  = L | U | {} | [] | (1: + 2: + … + n:) • Interpretation Function: • Literal types, [[ L]] = dom(L) • Bag types, [[ {} ]] = {1,2,…,n}, 1,2,…,n  Vare values of type • Seq types, [[ [] ]] = {1,2,…,n}, 1,2,…,n  Vare values of type • Alt types, [[ (1:1 + 2:2 +…+ n:n ) ]] = I, i  V, 1<i<nis a value of typei • c  C, [[c]] = { |   (c)}{(c’) | c’ < c} • p  P, [[p]] = {[1,2] | 1  [[domain(p)]], 2  [[range(p)]]}{(p’) | p’ < p} Christophides Vassilis

  25. Querying RDF Descriptions: An Introduction to RQL Christophides Vassilis

  26. The RDF Query Language (RQL) • Declarative query language for RDF description bases • relies on a typed data model (literal & container types + union types) • follows a functional approach (basic queries and filters) • adapts the functionality of XML query languages to RDF, but also: • treats properties as self-existent individuals • exploits taxonomies of node and edge labels • allows querying of schemas as semistructured data • Relational interpretation of schemas & resource descriptions • Classes (unary relations) • Properties (binary relations) • Containers (n-ary relations) Christophides Vassilis

  27. fname creates exhibited String Artifact Museum Artist String lname Date String sculpts Sculpture Sculptor last_modified title paints technique ExtResource Painting String Painter lname creates “Rodin” &r1 &r5 last_modified exhibited 2000/06/09 &r4 title “Reina Sofia Museum” paints fname “Pablo” &r2 technique “oil on canvas” &r6 lname paints “Picasso” 2000/01/02 last_modified &r3 A Cultural Community Resource Description Example Portal Schema Portal Resource Descriptions r2: museoreinasofia.mcu.es/ guernica.jpg r1:www.rodin.fr/ thinker.gif r3:www.artchive.com/ woman.jpg r4:museoreinasofia.mcu.es Web Resources Christophides Vassilis

  28. Querying Large RDF Schemas with RQL • Basic Class Queries • topclass • subclassof(Artist) • subclassof^(Artist) • superclassof(Painter) • superclassof^(Painter) • Basic Property Queries • topproperty • subpropertyof(creates) • subpropertyof^(creates) • superpropertyof(paints) • superpropertyof^(paints) • domain(creates) • range(creates) • Querying the RDF/S meta-schema • Class • Property • Literal Christophides Vassilis

  29. Class & Property Querying • Find the domain and range of the property creates seq(domain(creates), range(creates) ) while thanks to functional composition we can express subclassof( seq (domain(creates), range(creates) ) [0] ) or selectXfrom subclassof(seq(domain(creates), range(creates))[0]) {X} • Which classes can appear as domain and range of property creates • select$X, $Yfrom{:$X}creates{:$Y} or • selectX, Yfrom Class{X}, Class{Y}, {:X}creates{:Y} • Find all properties defined on class Painting and its superclasses • select@P,range(@P) from{:Painting}@P or • selectP,range(P) fromProperty{P}wheredomain(P)>=Painting Christophides Vassilis

  30. Schema Navigation using RQL • Iterate over the subclasses of class Artist select$Xfrom Artist{:$X} or selectXfromsubclassof(Artist){X} • Find the ranges of the property exhibited which can be reached from a class in the range of property creates select$Y, $Zfromcreates{:$Y}.exhibited{:$Z} • Find the properties that can be reached from a range class of property creates, as well as, their respective ranges select *fromcreates{:$Y}.@P{:$$Z} or from Class{Y}, (Classunion Literal){Z}, creates{:Y}.@P{:Z} Christophides Vassilis

  31. Exporting Schemas using RQL Queries • Find Leaf Classes (i.e., classes without subclasses) selectC1 fromClass{C1} wherenot(C1in(selectC1 fromClass{C2} whereC2 < C1) ) • Find all schema information (i.e., group related superclasses and properties for each class) selectC, superclassof^(C), (selectP, range(P) fromProperty{P} where domain(P)=C) fromClass{C} Christophides Vassilis

  32. Querying Complex RDF Descriptions with RQL • Find all resources Resource • Find the resources in the extent of the property creates creates or select*from{X}creates{Y} • Find the resources of type ExtResource and Sculpture ExtResourceintersectSculpture ExtResourceminusSculpture ExtResourceunionSculpture Christophides Vassilis

  33. Navigating in Description Graphs using RQL • Find the Museum resources that have been modified after year 2000 (i.e., data path with node and edge labels) selectX from Museum{X}.last_modified{Y} whereY>=2000-01-01 • Find the resources that have been created and their respective titles (i.e., data path using only edge labels) selectX, Zfromcreates{Y}.title{Z} • Find the titles of exhibited resources that have been created bya Sculptor(i.e., multiple data paths) selectZ,W fromSculptor.creates{Y}.exhibited{Z},{Z}title{W} Christophides Vassilis

  34. Using Schema to Filter Resource Descriptions • Find the Painting resources that have been exhibited as well as the related target resources of type ExtResource (i.e., restrict multiply classified property target values using node labels) selectX, Y from {X:Painting}exhibited{Y}.ExtResource Note the difference with the following path exression selectX, Y from{X:Painting}exhibited{Y:ExtResource} • Find modified resources which can be reached by a propertyapplied to the class Painting and its subclasses(i.e., restrict property source values using edge labels) select@P, Y, Z from{:$X}@P.{Y}last_modified{Z} where$X <=Painting Christophides Vassilis

  35. Discover the Schema of RDF Descriptions • Find the description of a resource with URI “http://www.museum.es” select$X, (select@P,Y from{Z:$Z}@P{Y} whereX =Zand$X=$Z) from$X {X} whereX =&http://www.museum.es • Find the descriptions of resources whose URI match “www.museum.es” selectX, (select$W, (select@P,Y from{Z:$Z}@P{Y} whereW=Zand$W=$Z) from$W{W} whereW=X) fromResource{X} whereXlike"*www.museum.es*" Christophides Vassilis

  36. And if you still like triples … • Find the description of resources which are not of type ExtResource ( (selectX, @P,Yfrom{X}@P{Y}) union (selectX, type,$Xfrom$X{X}) ) minus ( (selectX, @P, Yfrom{X:ExtResource}@P{Y}) union (selectX, type,ExtResourcefromExtResource{X}) ) Christophides Vassilis

  37. Storing RDF Descriptions: RSSDB Preliminary Performance Results Christophides Vassilis

  38. typeOf(instance) subClassOf(isA) attribution ns2: www.oclc.org/dublincore.rdfs string title description Ext.Resource file_size string last_modified date title Disneyland description title title title description Officialsiteof DisneylandParis Danube Orsay SunScale Notre-Dame Hotel Siteofficielde DisneylandParis Modeling the ODP Catalog with RDF/S rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# rdfs: http://www.w3.org/2000/01/rdf-schema# related Class ns1: http://www.dmoz.org/topic.rdf Recreation Regional Paris Lodging Travel Vacation- Rentals Hotel Ile-de-France related Hotel Directories &r2 &r4 &r3 &r1 &r1: http://www.sunscale.com/france/paris/index.htm Christophides Vassilis

  39. ODP Statistics • ODP Version: 16-01-2001 • 170 Mbytes of class hierarchies • 700 Mbytes of resource descriptions • 337,085 topics • 16 hierarchies with • max depth: 13 ( 6.86 on average) • max # subclasses: 314 ( 4.02 on average) • 2,342,978 URIs Christophides Vassilis

  40. Generic Representation Resources Triples uri: text id: int predid: int subid: int objid: int objvalue: text http://www.dmoz.org/topics.rdfs#Hotel 1 6 2 1 2 http://www.dmoz.org/topics.rdfs#Hotel Directories 5 3 7 3 http://www.oclc.org/dublincore.rdfs#title 5 1 8 4 http://www.dmoz.org/schema.rdf#Ext.Resource 5 9 2 5 http://www.w3.org/1999/02/22-rdf-syntax-ns#type 3 9 SunScale http://www.w3.org/2000/01/rdf-schema#subClassOf 6 7 http://www.w3.org/1999/02/22-rdf-syntax-ns#Property 8 http://www.w3.org/2000/01/rdf-schema#Class r1 9 Christophides Vassilis

  41. Instances t1 uri: text classid: int URI: text r1 11 r2 11 t11 t12 t14 r1 13 URI: text URI: text source: text target: text r2 12 t15 r1 r2 r1 SunScale r2 t13 source: text target: text r2 Pulitzer Opera t16 URI: text source: text target: text r1 subtable Specific Representation Namespace Type id:int uri: text id: int nsid: int lpart: text 1 http://www.w3.org/2000/01/rdf-schema# 1 1 Resource 2 http://www.w3.org/1999/02/22-rdf-syntax-ns# 2 2 Bag 3 http://www.oclc.org/dublincore.rdfs# 3 2 Seq 4 http://www.dmoz.org/topics.rdfs# 4 String Property Class id: int nsid: int lpart: text id: int nsid: int lpart: text domainid: int rangeid: int 11 5 Ext.Resource 14 3 title 1 4 15 12 4 Hotel 3 description 1 4 13 4 Hotel Directories 16 5 title 11 4 SubClass SubProperty subid: int superid: int subid: int superid: int 11 1 16 14 12 1 13 12 Christophides Vassilis

  42. DBMS Size vs. Schema Triples • DBMS size scales linearly with the number of schema triples Christophides Vassilis

  43. DBMS Size vs. Data Triples • DBMS size scales linearly with the number of data triples Christophides Vassilis

  44. Query Templates for RDF description bases Christophides Vassilis

  45. Execution Time of RDF Benchmark Queries Christophides Vassilis

  46. Comparison • Specific Representation permits the customization of the database representation of RDF metadata • Specific Representation outperforms the Generic Representation for all types of queries • Q1, Q2, Q5, Q7, Q10, Q11: by a factor up to 3.73 • Q3, Q4, Q6: by a factor up to 2.8 • Q8, Q9: by a factor up to 95,538 • Generic representation pays severe penalty for maintaining large tables (Triples, Resources) • e.g., queries Q8, Q9 require (self-) joins of Triples, Resources Christophides Vassilis

  47. Other Issues • RDF Metadata Generation from Legacy Repositories: • need to capture schemas from heterogeneous resources • RDF Schema Evolution and Metadata Revision: • to support the dynamics of resource descriptions • RDF Repositories Distribution: • for integration with WebDAV or LDAP-like architectures • RDF Query Languages Optimization: • for real-scale Semantic Web applications Christophides Vassilis

  48. The ICS-FORTH R&D Activities on the Semantic Web Christophides Vassilis

  49. The C-Web Project • EC IST Project (13479) 1999-2000 • Overall Aim: Set-up methodologies and infrastructure for fast deployment and easy management of Web Portals forcommunitiesrequiring • effective knowledge assimilation,elicitation • efficient query answering • Partners: INRIA(FR), FORTH(GR), EDW(IT) • Running Application Scenario: Learning Portals for intranets or the Internet • Corporate Knowledge Servers (e.g., automobile, telecommunications) • Memory Organizations (e.g., museums, libraries, archives) Christophides Vassilis

  50. Programme:(IST) KAIII.1.4. Multimedia Content and Tools (Access to digital collections of cultural and scientific content) Contract: IST-2000-26074 (02/2001 – 07/2003) Partners: INRIA (France), FINSIEL - Multimedia Services (Italy), ICS-FORTH (Hellas), ENSTB - Ecole Nationale Supérieure des Télécommunications - Bretagne (France), VALORIS - Group, Paris (France) IMSS - Istituto e Museo di Storia della Scienza, Firenze (Italy) CSI - Cité des Sciences et de l'Industrie, Paris (France) EDW International, Milano (Italy) DET-UNIFI - University of Florence (Italy) Project “MESMUSES” Christophides Vassilis

More Related