Peer-to-Peer Discovery of Semantic Associations. Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Budak Arpinar, Amit Sheth. 2nd International Workshop on Peer-to-Peer Knowledge Management, San Diego, California, July 17, 2005.


Peer-to-Peer Discovery of Semantic Associations

Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Budak Arpinar, Amit Sheth

2nd International Workshop on Peer-to-Peer Knowledge Management,

San Diego, California, July 17, 2005

Semantic Discovery (1)

  • From: Finding things
  • To: Finding out about things
  • Relationships!

  1. http://lsdis.cs.uga.edu/semdis

Semantic Associations
  • Relationship-centric nature of Semantic Web data models
  • We can ask questions about the relationships between objects
  • How is entity A related to entity B?
  • Applications
    • National Security – Insider Threat [1]
    • Improved Searching – BioPatentMiner [2]
  1. B. Aleman-Meza, P. Burns, M. Eavenson, D. Palaniswami, A. Sheth, An Ontological Approach to the Document Access Problem of Insider Threat, Proceedings of the IEEE Intl. Conference on Intelligence and Security Informatics (ISI-2005), May 19-20, 2005.
  2. Sougata Mukherjea, Bhuvan Bamba, BioPatentMiner: An Information Retrieval System for BioMedical Patents, VLDB 2004.
Semantic Associations
  • Define a set of operators ρ for querying complex relationships between entities (Semantic Associations) [1]

[Figure: RDF graph with resources &r1, &r5, and &r6; properties fname, lname, name, worksFor, and associatedWith; literals “Matt”, “Perry”, “LSDIS Lab”, and “The University of Georgia”. A ρ-path through the graph is marked as a Semantic Association.]

  1. Adapted from: Kemafor Anyanwu and Amit Sheth, ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web, The Twelfth International World Wide Web Conference, Budapest, Hungary, pp. 690-699.
Uniqueness of Semantic Association Queries
  • Simple query specification (only the two endpoints)
  • Doesn’t require extensive knowledge of schema

ρ-path (A, B)

Difficult to express with existing Query Languages

RDQL: Find paths of length at most 2 from startURI to endURI

SELECT ?startURI, ?property_1, ?endURI
FROM (?startURI ?property_1 ?endURI)

SELECT ?startURI, ?property_1, ?endURI
FROM (?endURI ?property_1 ?startURI)

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?startURI ?property_1 ?x)(?x ?property_2 ?endURI)
WHERE ?startURI ne ?x && ?endURI ne ?x

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?startURI ?property_1 ?x)(?endURI ?property_2 ?x)
WHERE ?startURI ne ?x && ?endURI ne ?x

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?x ?property_1 ?startURI)(?x ?property_2 ?endURI)
WHERE ?startURI ne ?x && ?endURI ne ?x

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?x ?property_1 ?startURI)(?endURI ?property_2 ?x)
WHERE ?startURI ne ?x && ?endURI ne ?x

Why Semantic Associations in P2P?
  • Data on the web by its nature is distributed
  • Knowledge will be stored in multiple stores and multiple ontologies
  • Search for semantic paths will have to include many knowledge sources
  • In the spirit of the Semantic Web (collaborative knowledge discovery)
Contributions
  • Super-Peer Architecture for Querying Semantic Associations
  • Knowledgebase Borders and Distances between Borders
  • Query Planning Algorithm based on Knowledgebase Borders and Distances
Assumptions
  • Pair-wise mapping of resources between peers (solution to Entity Disambiguation / Reference Reconciliation problem)
  • We use URIs to solve Entity Disambiguation problem
  • Main focus is Query Planning over P2P network
  • Not concerned with fault tolerance, details of network formation, etc. at this point
[Figure: RDF schema and RDF Instance Graph for a ticketing example. Schema classes include Passenger, Ticket, Flight, Customer, Payment, CCard, Cash, Client, and FFlyer; properties include fname, lname, purchased, for, forflight, number, paidby, creditedto, amount, holder, ffid, and fflierno; edge types are typeOf (instance), subClassOf (isA), and subPropertyOf. The instance graph contains resources &r1 through &r9 and &r11 with literals such as “John”, “Smith”, “Bill”, “Jones”, “Jeff”, “Brown”, and “XYZ123”.]

ρ-path Problem (k-hop limited)
  • Given:
    • An RDF instance graph G, vertices a and b in G, an integer k
  • Find:
    • All simple, undirected paths p, with length less than or equal to k, which connect a and b
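
The problem statement above can be sketched as a depth-first search that never revisits a node and stops at k hops. This is only an illustration: the function name `rho_paths`, the triple layout, and the sample data are assumptions, not the authors' implementation.

```python
from collections import defaultdict


def rho_paths(triples, a, b, k):
    """Return every simple, undirected path of length <= k between a and b."""
    # Undirected adjacency list; remember the property and the edge direction.
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((o, p, "->"))
        adj[o].append((s, p, "<-"))

    paths = []

    def dfs(node, visited, path):
        if node == b and path:
            paths.append(list(path))
            return
        if len(path) == k:  # hop limit reached
            return
        for nxt, prop, direction in adj[node]:
            if nxt in visited:  # keep the path simple
                continue
            visited.add(nxt)
            path.append((node, prop, direction, nxt))
            dfs(nxt, visited, path)
            path.pop()
            visited.remove(nxt)

    dfs(a, {a}, [])
    return paths


# Hypothetical instance data, not taken from the slides.
triples = [("a", "p1", "x"), ("b", "p2", "x"), ("a", "p3", "b")]
for path in rho_paths(triples, "a", "b", 2):
    print(path)
```

Edge direction is recorded but ignored during traversal, which is what makes the paths undirected.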
Distributed ρ-path problem: Find all paths from a start node to an end node over the distributed RDF graphs

[Figure: several distributed knowledge bases (ontologies).]

What do we need?
  • Efficiently explore node neighborhoods
  • When to stop a search in one peer and continue it in another
  • Determine the search distance in each peer
  • Determine which peers to include in the search
Approach
  • Peer: RDF data store (Sesame, BRAHMS); answers ρ-path (a, b, k) and returns a subgraph
  • Super-Peer: no data store; responsible for Query Planning

[Figure: three Super-Peers, each with Peers and their KBs; arrows are labeled ρ-path, ρ-plan, ρ-sub-plan, and subgraph.]

Knowledgebase Borders

[Figure: the knowledge bases of Peer 1 and Peer 2 overlap; the overlap is the Peer_1:Peer_2 Border, and a node inside it is a Border Node.]

Distance Between Borders

[Figure: Peer 1, Peer 2, and Peer 3 with borders P1:P2, P1:P3, and P2:P3; border nodes and the query end points Start and End are marked.]

  • dist (P1:P2, P1:P3) = 3
  • dist (P1:P2, P2:P3) = 1
  • dist (P1:P3, P2:P3) = 1
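
One way a peer could compute the distance between two of its borders is a multi-source breadth-first search that starts from every node of one border and stops at the first node of the other. The sketch below assumes an unweighted, undirected adjacency map; `border_distance` and the node names are hypothetical.

```python
from collections import deque


def border_distance(adj, border_x, border_y):
    """Minimum hop count between any node of border_x and any node of border_y."""
    dist = {node: 0 for node in border_x}
    queue = deque(border_x)
    while queue:
        node = queue.popleft()
        if node in border_y:  # BFS order guarantees this is the minimum
            return dist[node]
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # the two borders are not connected inside this peer


# Hypothetical peer graph: a chain n1 - n2 - n3 - n4.
adj = {"n1": ["n2"], "n2": ["n1", "n3"], "n3": ["n2", "n4"], "n4": ["n3"]}
print(border_distance(adj, {"n1"}, {"n4"}))
```

Each peer would report one such minimum per pair of its borders to the super-peer.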

Query Planning Graph
  • Directed Graph
  • Node for each distinct border
  • For each pair of connected borders, create 2 edges (one in each direction)
  • Weight is the minimum of the minimum distances (reported by peers)
    • For example you can get from A:B to A:B:C through either A or B
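
The construction rules above can be sketched as follows, assuming each peer reports a map from border pairs to its own minimum distance. The report format and the name `build_qpg` are assumptions made for illustration.

```python
def build_qpg(reports):
    """Build a directed QPG: {border: {neighbor_border: weight}}."""
    qpg = {}
    for peer, distances in reports.items():
        for (x, y), d in distances.items():
            # One edge in each direction for every connected pair of borders.
            for src, dst in ((x, y), (y, x)):
                edges = qpg.setdefault(src, {})
                # Keep the minimum over the minimum distances reported by peers.
                edges[dst] = min(d, edges.get(dst, d))
    return qpg


# Hypothetical reports: A:B and A:B:C are connected through both A and B.
reports = {
    "A": {("A:B", "A:B:C"): 4},
    "B": {("A:B", "A:B:C"): 3},
}
print(build_qpg(reports))
```

Here the edge between A:B and A:B:C gets the smaller of the two reported distances.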
Query Planning Graph

[Figure: peers A, B, and C; the QPG has nodes AB, AC, BC, and ABC joined by directed edges with weights between 2 and 5.]

Using the Query Planning Graph

Example Query: ρ-path (start, end, 10)

1) Find Start and End Points

2) Compute Distances to Borders

[Figure: peers A, B, and C with the start and end nodes and their distances (2 to 4) to borders.]

3) Add this Information to QPG

4) Find all paths from start to end (including cycles) <= k (10)

In this case 22 paths

[Figure: the QPG (nodes AB, AC, BC, ABC) with start and end added as extra nodes and weighted edges to the borders.]
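
Step 4 can be sketched as a recursive enumeration of walks whose total weight stays within k; nodes may repeat, so cycles are allowed. The graph below contains only the edges of two example paths from the slides, so it is not the full QPG and does not reproduce the 22 paths. Positive edge weights are assumed so that the search terminates, and `bounded_walks` is a hypothetical name.

```python
def bounded_walks(qpg, start, end, k):
    """List every walk start -> end with total weight <= k (nodes may repeat)."""
    found = []

    def extend(node, nodes, weights):
        if node == end:  # assumption: a walk stops once it reaches end
            found.append((tuple(nodes), tuple(weights)))
            return
        for nxt, w in qpg.get(node, {}).items():
            if w <= 0:
                raise ValueError("edge weights must be positive")
            if sum(weights) + w <= k:
                extend(nxt, nodes + [nxt], weights + [w])

    extend(start, [start], [])
    return found


# Partial, illustrative QPG; B:C has an edge to itself.
qpg = {
    "start": {"B:C": 2, "A:B": 2},
    "B:C": {"B:C": 2, "end": 2},
    "A:B": {"A:C": 3},
    "A:C": {"end": 3},
}
for nodes, weights in bounded_walks(qpg, "start", "end", 10):
    print(nodes, weights)
```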

5) Convert Set of Paths to Set of Queries

start –2→ Peer_B:Peer_C –2→ Peer_B:Peer_C –2→ end

start –2→ Peer_A:Peer_B –3→ Peer_A:Peer_C –3→ end

[Figure: the two example paths drawn across peers A, B, and C.]

Converting Paths to Queries

start –2→ A:B –3→ A:C –3→ end

  • Each edge (pair of endpoints) represents a query
  • For example, ρ-path (start, Peer_A:Peer_B, 2)

What is the correct hop-limit?

hop-limit = edge weight + (k – path weight)

k = 10

ρ-path (start, Peer_A:Peer_B, 4)

ρ-path (Peer_A:Peer_B, Peer_A:Peer_C, 5)

ρ-path (Peer_A:Peer_C, end, 5)
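
The hop-limit rule can be checked with a few lines of code. For the example path the weights are 2, 3, and 3, so the path weight is 8 and every edge gets k – 8 = 2 extra hops. The helper name `path_to_queries` is hypothetical.

```python
def path_to_queries(nodes, weights, k):
    """Turn one QPG path into (from, to, hop_limit) queries."""
    slack = k - sum(weights)  # k minus the path weight
    return [
        (src, dst, w + slack)  # hop-limit = edge weight + (k - path weight)
        for src, dst, w in zip(nodes, nodes[1:], weights)
    ]


nodes = ["start", "Peer_A:Peer_B", "Peer_A:Peer_C", "end"]
weights = [2, 3, 3]
for query in path_to_queries(nodes, weights, 10):
    print(query)
```

The result is the same three queries listed above, with hop limits 4, 5, and 5.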

Which Peer gets each query?
  • ρ-path (Peer_B:Peer_A, Peer_A:Peer_C, 5): Peer_A
  • ρ-path (Peer_B:Peer_C, Peer_B:Peer_C, 5): Peer_B and Peer_C
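
Both examples are consistent with a simple rule: a query goes to every peer that belongs to both of its endpoints. The sketch below assumes that border names list their member peers separated by ":" and that the home peers of start and end are known; both are assumptions made for illustration.

```python
def peers_for_query(src, dst, endpoint_peers=None):
    """Return the peers that hold both endpoints of a query."""
    endpoint_peers = endpoint_peers or {}

    def members(name):
        if name in endpoint_peers:  # start or end lives in a single peer
            return {endpoint_peers[name]}
        return set(name.split(":"))  # border name lists its peers

    return sorted(members(src) & members(dst))


print(peers_for_query("Peer_B:Peer_A", "Peer_A:Peer_C"))
print(peers_for_query("Peer_B:Peer_C", "Peer_B:Peer_C"))
# Hypothetical placement of the query end points.
print(peers_for_query("Peer_A:Peer_B", "start", {"start": "Peer_B", "end": "Peer_C"}))
```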

Final Query Plan

Queries for Peer_A

FROM: Peer_A:Peer_B:Peer_C TO: Peer_A:Peer_C Hop Limit: 3

FROM: Peer_A:Peer_B TO: Peer_A:Peer_C Hop Limit: 5

FROM: Peer_A:Peer_B TO: Peer_A:Peer_B:Peer_C Hop Limit: 5

FROM: Peer_A:Peer_B TO: Peer_A:Peer_B Hop Limit: 3

Queries for Peer_B

FROM: Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 6

FROM: Peer_B:Peer_C TO: start Hop Limit: 8

FROM: Peer_A:Peer_B TO: start Hop Limit: 5

FROM: Peer_A:Peer_B TO: Peer_A:Peer_B:Peer_C Hop Limit: 5

FROM: Peer_A:Peer_B TO: Peer_A:Peer_B Hop Limit: 3

FROM: Peer_A:Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 5

FROM: Peer_A:Peer_B TO: Peer_B:Peer_C Hop Limit: 6

FROM: Peer_A:Peer_B:Peer_C TO: start Hop Limit: 7

Queries for Peer_C

FROM: Peer_B:Peer_C TO: end Hop Limit: 8

FROM: Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 6

FROM: Peer_A:Peer_C TO: Peer_B:Peer_C Hop Limit: 5

FROM: Peer_A:Peer_B:Peer_C TO: end Hop Limit: 6

FROM: Peer_A:Peer_B:Peer_C TO: Peer_A:Peer_C Hop Limit: 3

FROM: Peer_A:Peer_C TO: end Hop Limit: 5

FROM: Peer_A:Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 5

Query Execution at Peer
  • Input:
    • Set of Queries: { ρ-path ({uri, …}, {uri, …}, k), … }
  • Algorithm:
    • Graph Traversal of Main Memory representation
    • Bi-directional BFS
    • Results in a set of statements
  • Output:
    • Union of each set of statements
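
A minimal sketch of this step, assuming an in-memory list of triples. It runs one BFS from the source set and one from the target set, then keeps each triple that lies on some connection of at most k hops. This two-sided variant is a stand-in for the bi-directional BFS named above, not the authors' code, and it may keep a few triples that lie only on non-simple walks.

```python
from collections import defaultdict, deque


def bfs_distances(adj, sources, limit):
    """Hop distance from the nearest source, explored up to `limit` hops."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def rho_subgraph(triples, sources, targets, k):
    """Triples lying on some source-target connection of at most k hops."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    from_src = bfs_distances(adj, sources, k)
    from_dst = bfs_distances(adj, targets, k)
    keep = set()
    for s, p, o in triples:
        for u, v in ((s, o), (o, s)):  # the edge may be crossed either way
            if u in from_src and v in from_dst and from_src[u] + 1 + from_dst[v] <= k:
                keep.add((s, p, o))
    return keep


def execute(triples, queries):
    """Union of the statement sets of all queries; no paths are enumerated."""
    result = set()
    for sources, targets, k in queries:
        result |= rho_subgraph(triples, sources, targets, k)
    return result


# Hypothetical data: only a - x - b connects a and b within 2 hops.
triples = [("a", "p", "x"), ("x", "q", "b"), ("a", "r", "y"), ("y", "s", "z")]
print(execute(triples, [({"a"}, {"b"}, 2)]))
```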

Query Execution at Peer
  • Peer does not enumerate paths
  • Returns a subgraph (set of triples)
  • Benefits
    • Eliminates redundant data transfer
    • Saves computation time
Scalability: Multiple Super-Peers
  • Super-Peer/Super-Peer Borders
    • Super-Peer_1:Super-Peer_2
    • Super-Peer_1:Super-Peer_3
    • Super-Peer_2:Super-Peer_3
  • Super-Peer/Peer Borders
    • Peer_B:Super-Peer_2
    • Peer_A:Super-Peer_3
    • Peer_C:Super-Peer_3

[Figure: Super-Peer_1, Super-Peer_2, and Super-Peer_3 together with Peer_A, Peer_B, and Peer_C.]

Integration of SP graph and Peer Graph

Super-Peer_1’s new Peer-Level QPG

[Figure: QPG with nodes A:B, A:C, B:C, A:B:C, B:SP2, A:SP3, C:SP3, SP1:SP2, and SP1:SP3; edge weights range from 0 to 5.]

Query Planning Algorithm

[Figure: super-peers SP1, SP2, and SP3 and peers A, B, C, D, and E, with the start and end nodes marked.]

1) Find start and end points

2) Compute distances to borders

3) Add temporary information for endpoints (both peer and super-peer QPG)

4) Find all directed paths <= k connecting start to end in the Super-Peer QPG

k = 10

[Figure: Super-Peer QPG with nodes SP1:SP2, SP1:SP3, SP2:SP3, start, and end, joined by weighted edges.]

start –6→ SP1:SP3 –2→ SP1:SP3 –2→ end

start –6→ SP1:SP3 –2→ end

start –3→ SP1:SP2 –6→ end

start –10→ end


5) Form a list of sub-query-plan requests for each super-peer

Super-Peer_1

FROM: start TO: end Hop-Limit: 10

FROM: start TO: Super-Peer_1:Super-Peer_2 Hop-Limit: 4

FROM: Super-Peer_1:Super-Peer_2 TO: end Hop-Limit: 7

FROM: start TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 8

FROM: Super-Peer_1:Super-Peer_3 TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 2

FROM: Super-Peer_1:Super-Peer_3 TO: end Hop-Limit: 4

Super-Peer_3

FROM: Super-Peer_1:Super-Peer_3 TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 2
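
The hop limits in this list can be reproduced from the four Super-Peer QPG paths for k = 10 if, for each pair of endpoints, the largest hop limit over all paths is kept. Taking the maximum is an inference from the numbers on the slides rather than something they state, and `plan_queries` is a hypothetical name.

```python
def plan_queries(paths, k):
    """Merge paths into {(from, to): hop_limit}, keeping the largest limit."""
    limits = {}
    for nodes, weights in paths:
        slack = k - sum(weights)  # unused hops on this path
        for src, dst, w in zip(nodes, nodes[1:], weights):
            key = (src, dst)
            limits[key] = max(limits.get(key, 0), w + slack)
    return limits


# The four start-to-end paths of the Super-Peer QPG example.
paths = [
    (["start", "SP1:SP3", "SP1:SP3", "end"], [6, 2, 2]),
    (["start", "SP1:SP3", "end"], [6, 2]),
    (["start", "SP1:SP2", "end"], [3, 6]),
    (["start", "end"], [10]),
]
for (src, dst), limit in plan_queries(paths, 10).items():
    print(f"FROM: {src} TO: {dst} Hop-Limit: {limit}")
```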


7) Each super-peer goes through the previous process on its peer QPG to form a list of ρ-path queries for its peers

Queries for Peer B:

FROM: A:B TO: A:B Hop Limit: 3

FROM: A:B TO: B:C Hop Limit: 6

FROM: A:B:C TO: B:SP2 Hop Limit: 4

FROM: A:B TO: B:SP2 Hop Limit: 2

FROM: A:SP2 TO: start Hop Limit: 4

FROM: B:C TO: B:SP2 Hop Limit: 5

FROM: B:C TO: start Hop Limit: 8

FROM: B:C TO: B:C Hop Limit: 6

FROM: A:B TO: start Hop Limit: 5

FROM: A:B TO: A:B:C Hop Limit: 5

FROM: A:B:C TO: start Hop Limit: 7

FROM: A:B:C TO: B:C Hop Limit: 5

Queries for Peer C:

FROM: A:B TO: B:C Hop Limit: 5

FROM: A:B TO: end Hop Limit: 5

FROM: A:B:C TO: end Hop Limit: 6

FROM: B:C TO: end Hop Limit: 8

FROM: B:C TO: B:C Hop Limit: 6

FROM: B:C TO: C:SP3 Hop Limit: 6

FROM: A:C TO: C:SP3 Hop Limit: 3

FROM: A:B:C TO: A:C Hop Limit: 3

FROM: A:B:C TO: B:C Hop Limit: 5

FROM: A:B:C TO: C:SP3 Hop Limit: 4

FROM: C:SP3 TO: end Hop Limit: 4

Queries for Peer E:

FROM: E:SP1 TO: E:SP1 Hop Limit: 2

Queries for Peer A:

FROM: A:B TO: A:B Hop Limit: 3

FROM: A:B:C TO: A:SP3 Hop Limit: 4

FROM: A:B TO: A:SP3 Hop Limit: 6

FROM: A:B TO: A:C Hop Limit: 5

FROM: A:B TO: A:B:C Hop Limit: 5

FROM: A:B:C TO: A:C Hop Limit: 3

FROM: A:C TO: A:SP3 Hop Limit: 3

8) Querying peer now communicates directly with other peers to execute the ρ-path queries

Conclusions and Future Work
  • Presented a Query-Planning Algorithm for ρ-path queries over a distributed data set
  • Problems
    • Efficiently compute node neighborhoods
    • How to continue searches across KBs
    • How to check for the many possible cases
    • How to determine search length in each KB
Conclusions and Future Work
  • Future Work
    • Performance Testing
    • Effect of relative border size
    • Different criteria for group formation
    • How to accommodate other types of queries
Computing Borders

Super-Peer maintains Sorted Map of URIs

  • Peer Border
    • Traverse new list and update Sorted Map
  • Super Peer Border
    • Don’t care about other URIs not in this group
    • Keep total data transferred at a minimum
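
A rough sketch of the peer-border bookkeeping, assuming the super-peer keeps an index from each URI to the set of peers that hold it. Python's standard library has no sorted map, so a plain dict is used here and keys are sorted when borders are read out. The function names and URIs are hypothetical.

```python
def add_peer(index, peer, uris):
    """Traverse the new peer's URI list and update the index (URI -> peers)."""
    for uri in uris:
        index.setdefault(uri, set()).add(peer)


def borders(index):
    """Group shared URIs by the exact set of peers that hold them."""
    result = {}
    for uri in sorted(index):
        peers = index[uri]
        if len(peers) > 1:  # a URI held by one peer is not a border node
            name = ":".join(sorted(peers))
            result.setdefault(name, []).append(uri)
    return result


index = {}
add_peer(index, "A", ["u1", "u2", "u3"])
add_peer(index, "B", ["u2", "u3", "u4"])
add_peer(index, "C", ["u3", "u5"])
print(borders(index))
```

Grouping by the exact peer set yields one entry per distinct border, so A:B and A:B:C stay separate.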
Forming the Network

1) Broadcast: “I want to join the network”

2) I am a super-peer

3) List of URIs

[Figure: new peer P New, super-peers SP1, SP2, and SP3, and peers P1 and P2.]

Forming the Network

4) SPs compute overlap, O(n log k) (maintain border information)

5) Send overlap count to new peer

6) New peer picks one super-peer (one accept, two rejects)

[Figure: new peer P New, super-peers SP1, SP2, and SP3, and peers P1 and P2.]
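
The O(n log k) bound is consistent with looking up each of the new peer's n URIs by binary search in a sorted index of k URIs. The sketch below assumes that reading, and it also assumes the new peer picks the super-peer with the largest overlap; the slides only say that it picks one. All names and URIs are hypothetical.

```python
from bisect import bisect_left


def overlap_count(sorted_index, new_uris):
    """Count new URIs already present in a sorted URI list (binary search)."""
    count = 0
    for uri in set(new_uris):
        i = bisect_left(sorted_index, uri)
        if i < len(sorted_index) and sorted_index[i] == uri:
            count += 1
    return count


def pick_super_peer(indexes, new_uris):
    """Assumed policy: join the super-peer reporting the largest overlap."""
    counts = {sp: overlap_count(idx, new_uris) for sp, idx in indexes.items()}
    return max(counts, key=counts.get), counts


# Each super-peer's URI index, kept sorted.
indexes = {"SP1": ["u1", "u2", "u3"], "SP2": ["u3", "u9"], "SP3": []}
print(pick_super_peer(indexes, ["u2", "u3", "u7"]))
```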

Forming the Network

7) SP1 updates permanent URI index

8) SP1 recomputes SP borders

9) Here are your borders

10) Peers send minimum distances

[Figure: new peer P New, super-peers SP1, SP2, and SP3, and peers P1 and P2.]

Computing Super-Peer Borders

[Figure: message exchange between SP1 and SP2 over URIs H, K, and R.
Tuples from SP1: (SP1, C, false), (SP1, U, true), (SP1, H, false), (SP1, K, false).
Tuples from SP2: (SP2, G, false), (SP2, null, null), (SP2, J, true), (SP2, R, true).]

Super-Peer Level QPG

[Figure: Super-Peer 1 with peers A, B, and C, together with Super-Peer 2 and Super-Peer 3.]