Semantic Network Analysis 11.07.05


  1. Semantic Network Analysis 11.07.05 Analyzing Semantic Interoperability in Bioinformatic Database Networks Philippe Cudré-Mauroux, EPFL Joint work with: Julien Gaugaz, Adriana Budura and Karl Aberer

  2. Overview • Peer Data Management Systems (PDMS) • Semantic Interoperability in the Large • Generatingfunctionologic framework • The Sequence Retrieval System • Degree distribution • Analysis of giant component • Weighted analysis • Conclusions

  3. Beyond Keyword Search • Searching semantically richer objects in large-scale heterogeneous networks • Query: date? • Candidate matches across sources: <xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate> • <xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate> • <es:DofCreation>05/08/2004</es:DofCreation> • <myRDF:Date>Jan 1, 2005</myRDF:Date>

  4. Decentralized Data Integration
     • Distributed Databases: Number of sources < 100 • Consistent data • Coordination • Structured data (e.g., relational data model) • Integrity constraints • Transactions • Powerful queries (e.g., SQL, aggregation) • Schemas created by administrators • Relatively fixed topology
     VS
     • Large Scale Information Systems (e.g., WWW): Number of sources > 100 • Unreliable data • Autonomy • Semi-structured data (e.g., XML/RDF) • No integrity constraints • No transactions • Simple SP queries (e.g., triple patterns, ranking) • Schemas created by end users • Network churn

  5. Data Integration: LAV/GAV • Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources • Not applicable to our context: scale (upper ontologies?), churn, autonomy • How can we foster semantic interoperability in decentralized settings? • Figure: a central Date concept mapped to two local attributes, m(Date) = myDate and m(Date) = yourDate

  6. Semantic Interoperability • Extending semantic interoperability techniques to decentralized settings
     Photoshop (own schema): <Photoshop_Image> <GUID>178A8CD8865</GUID> <Creator>Robinson</Creator> <Subject> <Bag> <Item>Tunbridge Wells</Item> <Item>Royal Council</Item> </Bag> </Subject> … </Photoshop_Image>
     WinFS (known schema): <WinFSImage> <GUID>178A8CD8866</GUID> <Author> <DisplayName>Henry Peach Robinson</DisplayName> <Role>Photographer</Role> </Author> <Keyword>Tunbridge</Keyword> <Keyword>Council</Keyword> … </WinFSImage>
     Q1 = <GUID>$p/GUID</GUID> FOR $p IN /Photoshop_Image WHERE $p/Creator LIKE "%Robi%"
     T12 = <Photoshop_Image> <GUID>$fs/GUID</GUID> <Creator>$fs/Author/DisplayName</Creator> </Photoshop_Image> FOR $fs IN /WinFSImage
     Q2 = <GUID>$p/GUID</GUID> FOR $p IN T12 WHERE $p/Creator LIKE "%Robi%"

  7. 1. Peer Data Management Systems • Pairwise mappings • Peer Data Management Systems (PDMS) • Local mappings overcome global heterogeneity • Iterative query rewriting • Figure: the query "date?" issued against peers storing <xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>, <xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>, <es:cDate>05/08/2004</es:cDate> and <myRDF:Date>Jan 1, 2005</myRDF:Date>; attribute labels myRDF:Date and xap:ModifyDate; pairwise mappings es:cDate → myRDF:Date and es:cDate → xap:CreateDate; labels: weather, article
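
Iterative query rewriting can be sketched as repeated attribute renaming, one pairwise mapping per hop. The snippet below is a minimal illustration only: the dictionary representation of queries and mappings, the function names, and the third schema's `dc:creator` attribute are assumptions made for the example, while the `Creator` to `Author/DisplayName` correspondence is taken from the T12 mapping on slide 6.

```python
def rewrite_query(query, attribute_mapping):
    """Rename each predicate attribute of `query` using one pairwise mapping.

    `query` maps attribute names to match patterns (hypothetical format).
    Predicates whose attribute has no counterpart in the mapping are dropped,
    since the target schema cannot evaluate them.
    """
    return {
        attribute_mapping[attribute]: pattern
        for attribute, pattern in query.items()
        if attribute in attribute_mapping
    }


def rewrite_along_path(query, path_mappings):
    """Apply the pairwise mappings of a path in order, one rewrite per hop."""
    for attribute_mapping in path_mappings:
        query = rewrite_query(query, attribute_mapping)
    return query


# Correspondence implied by T12 on slide 6 (Photoshop schema -> WinFS schema).
photoshop_to_winfs = {"GUID": "GUID", "Creator": "Author/DisplayName"}
# Hypothetical second hop to a third schema, for illustration only.
winfs_to_third = {"Author/DisplayName": "dc:creator"}

q1 = {"Creator": "%Robi%"}
q_two_hops = rewrite_along_path(q1, [photoshop_to_winfs, winfs_to_third])
```

No peer needs a global schema here: each hop only uses the mapping to its direct neighbour, which is the sense in which local mappings overcome global heterogeneity.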

  8. Semantic Mediation Layer • Layer stack, top to bottom: Semantic Mediation Layer • Overlay Layer • “Physical” layer • Adjacent layers are either correlated or uncorrelated

  9. Schema-to-Schema Graph • Inter-organization of the different schemas used by the peers • Logical model • Directed • Weighted • Redundant

  10. The Semantic Connectivity Graph • Definition (Semantic Interoperability): Two peers are said to be semantically interoperable if they can forward queries to each other in the Schema-to-Schema graph, potentially through a series of semantic translation links • Idea: As for physical network analyses, create a connectivity layer to account for semantic interoperability • The semantic connectivity graph S: unweighted, irreflexive and non-redundant version of the Schema-to-Schema graph

  11. Observations • Theorem: Peers in a set $P_s$ are semantically interoperable iff $S_s$ is strongly connected, with $S_s \equiv \{s \mid \exists p \in P_s,\ p \leftrightarrow s\}$ • Observation 1: A set of peers $P_s$ cannot be semantically interoperable if $|E_s| < |V_s|$ • Observation 2: A set of peers $P_s$ is semantically interoperable if $|E_s| > |V_s|(|V_s|-1) - (|V_s|-1)$
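
The theorem and the two edge-count observations can be checked mechanically on a small graph. The following is a minimal sketch, assuming mappings are given as (source schema, target schema, weight) triples; the function names and the input format are illustrative and not taken from the talk.

```python
from collections import defaultdict


def semantic_connectivity_graph(mappings):
    """Collapse Schema-to-Schema links into the edge set of S.

    Weights are ignored (unweighted), self-loops are dropped (irreflexive)
    and duplicate links collapse in the set (non-redundant).
    """
    return {(src, dst) for src, dst, _weight in mappings if src != dst}


def _reachable(start, adjacency):
    """Return every vertex reachable from `start` by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def is_strongly_connected(vertices, edges):
    """True if every vertex reaches every other along directed edges."""
    vertices = set(vertices)
    if not vertices:
        return True
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
    root = next(iter(vertices))
    # Strongly connected iff the root reaches all vertices in the graph
    # and in the reversed graph.
    return _reachable(root, forward) >= vertices and _reachable(root, backward) >= vertices


def edge_count_verdict(n_vertices, n_edges):
    """Apply Observations 1 and 2 (meaningful for at least two schemas)."""
    if n_edges < n_vertices:
        return "cannot be interoperable"
    if n_edges > n_vertices * (n_vertices - 1) - (n_vertices - 1):
        return "interoperable"
    return "undecided"
```

The edge counts alone settle only the extreme cases; anything in between needs the explicit strong-connectivity test, which is what motivates a statistical treatment for large graphs.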

  12. 2. Semantic Interoperability in the Large • Question • How can we analyze semantic interoperability in large-scale PDMS? • Idea: use percolation theory to detect the emergence of a strongly connected component in S • Necessary condition for vertex-strong connectivity • Necessary condition for semantic interoperability

  13. The Model • Adaptation of a recent graph-theoretic framework • Newman, Strogatz, Watts 2001 • Large-scale semantic graphs as random graphs with arbitrary degree distribution • Exponentially distributed, small-world, scale-free… graphs • Specificities of our model • Strong clustering (clustering coefficient cc) • Bidirectionality (bidirectionality coefficient bc) (for directed networks) • Based on generatingfunctionology • Percolation: ci > 0
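
For orientation, in the plain undirected case without clustering or bidirectionality corrections, the random-graph framework yields the well-known Molloy-Reed condition: a giant component emerges when the average of k(k - 2) over the degree distribution is positive. The sketch below computes only that baseline quantity from a list of observed degrees. It is not the connectivity indicator ci of the talk, whose exact form (including cc and bc) is not given in this transcript, and the function name is an assumption.

```python
def molloy_reed_indicator(degrees):
    """Average of k * (k - 2) over the observed vertex degrees.

    A positive value signals a giant component in an unclustered,
    undirected random graph with this degree distribution.
    """
    if not degrees:
        raise ValueError("at least one vertex degree is required")
    return sum(k * (k - 2) for k in degrees) / len(degrees)


# Isolated pairs stay sub-critical; a 3-regular graph is super-critical.
pairs_only = molloy_reed_indicator([1, 1, 1, 1])
three_regular = molloy_reed_indicator([3, 3, 3, 3])
```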

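  14. Size of the giant component • [equation not preserved in the transcript] • With u the smallest non-negative solution of [equation not preserved in the transcript] • And G1 the distribution of edges from first- to second-order neighbors: [equation not preserved in the transcript]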
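
The slide's formulas did not survive in the transcript. As a point of reference only, the standard Newman-Strogatz-Watts result for undirected, unclustered random graphs is S = 1 - G0(u), where u is the smallest non-negative solution of u = G1(u) and G1(x) = G0'(x) / G0'(1). The sketch below evaluates that baseline numerically; it assumes this standard form rather than the clustering-adjusted variant used in the talk, and the function name and input format are illustrative.

```python
def giant_component_fraction(degree_probs, max_iterations=10_000, tol=1e-12):
    """Fraction of vertices in the giant component (standard NSW form).

    `degree_probs[k]` is the probability that a random vertex has degree k.
    """
    mean_degree = sum(k * p for k, p in enumerate(degree_probs))
    if mean_degree == 0:
        return 0.0

    def g0(x):
        # Generating function of the degree distribution.
        return sum(p * x**k for k, p in enumerate(degree_probs))

    def g1(x):
        # Generating function of the excess degree: G0'(x) / G0'(1).
        total = sum(k * p * x ** (k - 1) for k, p in enumerate(degree_probs) if k > 0)
        return total / mean_degree

    # G1 is non-decreasing on [0, 1], so iterating from 0 climbs to the
    # smallest non-negative fixed point u = G1(u).
    u = 0.0
    for _ in range(max_iterations):
        nxt = g1(u)
        converged = abs(nxt - u) < tol
        u = nxt
        if converged:
            break
    return 1.0 - g0(u)


# Half the vertices have degree 1, half have degree 3.
mixed = giant_component_fraction([0.0, 0.5, 0.0, 0.5])
```

For the mixed example the fixed point is u = 1/3, so the baseline predicts that 22/27 of the vertices belong to the giant component.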

  15. 3. The Sequence Retrieval System (SRS) • Commercial information indexing and retrieval system • Bioinformatic libraries • EMBL • SwissProt • Prosite • Etc. • Schemas described in a custom language (Icarus) • Mappings (links) from one database to others

  16. Why is SRS interesting? • Applying our heuristics on a real large-scale corpus of interconnected databases • More than 380 databanks • More than 500 (undirected) links • Data used by professionals on a daily basis

  17. Crawling the SRS schema-to-schema graph • Custom crawler • As of May 2005 (EBI repository) • 388 nodes • 518 edges • Giant connected component: 187 nodes • Power-law distribution of node degrees • Clustering coefficient = 0.32 • Diameter = 9

  18. Results • Connectivity indicator ci = 25.4 • Super-critical state • Size of the giant component • 0.47 (derived) • 0.48 (observed)
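
As a quick check against the crawl figures on slide 17, assuming the observed value is simply the giant component's share of all crawled nodes and treating the 518 links as undirected:

```latex
\[
S_{\text{observed}} = \frac{187}{388} \approx 0.48,
\qquad
\langle k \rangle = \frac{2 \times 518}{388} \approx 2.67
\]
```

The first ratio agrees with the observed size reported on the slide; the mean degree is shown only to give a sense of how sparse the crawled graph is.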

  19. Graphs with the same power-law degree distribution • Varying number of edges

  20. 10x Bigger Graph

  21. Analyzing weighted networks • Do we have a sufficient number of good mappings? • Introducing quality measures from the mappings • Weights • Attribute / schema level • Cf. Chatty Web (WWW03) • Semantic query forwarding • Per-hop forwarding behaviors • Only forward if $w_i \geq \tau$ • $\tau = 0$: flooding • $\tau = 1$: exact answers
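
The per-hop forwarding rule can be sketched as a breadth-first traversal that only follows mapping links whose weight meets the threshold. This is a minimal illustration: the adjacency format, the function name and the parameter name `tau` for the threshold are assumptions, not part of the talk.

```python
from collections import deque


def reachable_schemas(links, origin, tau):
    """Schemas a query reaches from `origin` when forwarding requires weight >= tau.

    `links` maps each schema to a list of (neighbour, weight) pairs,
    with weights in [0, 1].
    """
    visited = {origin}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for neighbour, weight in links.get(current, []):
            # Per-hop decision: forward only over sufficiently good mappings.
            if weight >= tau and neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited


example_links = {
    "A": [("B", 0.9), ("C", 0.3)],
    "B": [("D", 1.0)],
    "C": [("D", 0.2)],
}
flooding = reachable_schemas(example_links, "A", tau=0.0)
selective = reachable_schemas(example_links, "A", tau=0.5)
```

Raising the threshold prunes links from the graph, so the weighted analysis amounts to asking whether a giant component survives in the pruned graph.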

  22. Weighted Results • Same degree distribution (388 nodes) • Uniformly distributed weights between 0 and 1

  23. 4. Conclusions • Analyzing a real network of bioinformatic databases • Accurate results (even for relatively small networks) • Weighted / unweighted • Current work • Compositions of weights along a path • Semantic random walkers • Public domain simulator • Future work • Analyzing other forwarding behaviors • Implementation in a real PDMS (self-organizing mappings) • GridVine

  24. References • A Necessary Condition for Semantic Interoperability in the Large, Philippe Cudré-Mauroux and Karl Aberer, ODBASE 2004 • GridVine: Building Internet-Scale Semantic Overlay Networks, Karl Aberer, Philippe Cudré-Mauroux and Tim van Pelt, ISWC 2004 • Semantic Overlay Networks (Tutorial), Karl Aberer and Philippe Cudré-Mauroux, VLDB 2005 • … complete reference list at http://lsirpeople.epfl.ch/pcudre/

  25. Thank you for your attention Questions ?
