MAPPING DATA IN PEER-TO-PEER SYSTEMS: SEMANTICS AND ALGORITHMIC ISSUES

Department of Computer Science, University of Toronto
Anastasios Kementsietsidis, Marcelo Arenas & Renee J. Miller
Presented by Ahmet OLGUN & Suzan BAYHAN




OUTLINE

1-ABSTRACT

2-INTRODUCTION

3-MOTIVATING EXAMPLE

4-MAPPING TABLES

5-MAPPING AS CONSTRAINTS

6-CONSISTENCY AND INFERENCE

7-THE ALGORITHM

8-EXPERIMENTAL RESULTS

9-CONCLUSIONS


ABSTRACT

  • PROBLEM OF MAPPING DATA IN PEER-TO-PEER DATA SHARING SYSTEMS (PPDSS)

  • MAPPING TABLES LISTING CORRESPONDING VALUES IN A PPDSS

  • WHY TABLES ARE APPROPRIATE

  • A LANGUAGE TO SPECIFY MAPPING TABLES UNDER DIFFERENT SEMANTICS

  • COMPLEXITY OF THE PROBLEM

  • AN EFFICIENT ALGORITHM FOR ITS SOLUTION

  • IMPLEMENTATION WITH EXPERIMENTAL RESULTS

  • HYPERION PROJECT


INTRODUCTION

  • Traditionally, data integration and exchange between heterogeneous data sources has been provided mainly through views (i.e., queries)

  • Sources share their schemas and cooperate

  • BUT IN OUR WORK SUCH CLOSE COOPERATION IS

  • Not desirable (PRIVACY)

  • Not feasible (maybe due to resource limitations)


SIMILARITY WITH FILE-SHARING SYSTEMS

  • TO FIND DATA WHEN THERE IS NO AGREEMENT ON THE LOGICAL DESIGN OF DATA,

    FOCUS ON VALUES AND HOW THEY CORRESPOND

  • IN FILE-SHARING SYSTEMS LIKE NAPSTER AND GNUTELLA, QUERYING IS DONE BY SIMPLE VALUE SEARCH ON FILE NAMES

  • QUERIES ARE OF THE FORM:

    “RETRIEVE ALL FILES NAMED X”

    EASY BECAUSE THERE IS A CONSENSUS ON NAMES


WHAT IF NO ACCEPTED NAMING STANDARD???

  • Each peer has to develop its own naming standard

  • Conforming to external standards is time-consuming and expensive

    So, to search data in such environments, we use MAPPING TABLES that store the correspondence between values.

  • In the simplest case, mapping tables are binary tables relating identifiers from two different sources

  • Mapping Tables represent EXPERT KNOWLEDGE


MOTIVATING EXAMPLE

  • DOMAIN: BIOLOGICAL DATABASES

    * GENE DATABASE: GDB

    * PROTEIN DATABASE: SwissProt

    * GENETIC DISORDERS AND RELATED GENES DATABASE: MIM


EXAMPLE (CONTD)

  • Integration of these resources is extremely desirable so that scientists have uniform access, BUT IT SEEMS UNATTAINABLE for political, financial and technical reasons.

  • Among the technical reasons is the heterogeneity of the sources: formatted files, spreadsheets, relational databases


MAIN CHARACTERISTICS AND USE OF MAPPING TABLES

  • Associations within and Across Domains

  • Peer Autonomy

  • Semantics

  • Automated discovery of mappings


Association within and Across Domains

  • A mapping table is not necessarily a function

  • With mapping tables we can associate seemingly unconnected databases

  • Disjoint worlds can be associated since the corresponding worlds are semantically close to each other
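
As a minimal sketch of the point that a mapping table need not be a function, the snippet below stores a binary table as a set of pairs and looks up every value associated with a given identifier. The identifiers and the helper names `build_index` and `translate` are invented for illustration; they are not taken from GDB, SwissProt or the presentation.

```python
from collections import defaultdict

# Hypothetical identifiers, invented for illustration only.
GDB_TO_SWISSPROT = {
    ("gdb:G1", "sp:P1"),
    ("gdb:G1", "sp:P2"),  # one gene id is associated with two protein ids
    ("gdb:G2", "sp:P2"),  # and one protein id is shared by two gene ids
}


def build_index(table):
    """Group the Y-values of a binary mapping table by X-value."""
    index = defaultdict(set)
    for x, y in table:
        index[x].add(y)
    return index


def translate(index, x):
    """Return every Y-value associated with x (empty set if x is not listed)."""
    return set(index.get(x, set()))


index = build_index(GDB_TO_SWISSPROT)
print(sorted(translate(index, "gdb:G1")))  # ['sp:P1', 'sp:P2']
```

Returning a set rather than a single value is what lets the table express many-to-many associations.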


Peer Autonomy

  • Autonomy has high importance in peer-to-peer systems.

  • Mapping tables do not restrict the operation of peers in any way beyond the agreement on values expressed in the tables.


Mapping Table 1

Figure 1


Semantics

  • Experts have varying degrees of expertise, so mapping tables should indicate how much confidence their contents deserve

    Consider a tuple (X, Y):

  • If an X value appearing in a mapping table follows open-world semantics, it can be associated with any Y value: the table holds only partial information about X


Closed World

  • If X follows closed-world semantics, then the X values in the table can only be associated with the specified Y values.

  • 4 alternatives

    1-OO (No specific information, no practical interest)

    2-OC (Partial knowledge)

    3-CO (Partial knowledge)

    4-CC (Complete knowledge)
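
The four alternatives can be illustrated with a small membership test. This is only a sketch under stated assumptions: the first letter is taken to govern X values present in the table and the second letter X values missing from it, and a missing X value under closed-world semantics is taken to have no allowed association. The slides do not spell out either point, and the function name `allowed` is hypothetical.

```python
def allowed(table, semantics, x, y):
    """Decide whether x may be associated with y under a two-letter semantics.

    Assumption: the first letter governs X-values present in the table,
    the second letter governs X-values missing from it.
    """
    present_mode, missing_mode = semantics  # e.g. "CO" -> ("C", "O")
    listed = {ty for tx, ty in table if tx == x}
    if listed:
        # x appears in the table: open allows any y, closed only listed ones.
        return present_mode == "O" or y in listed
    # x is missing: open allows any y, closed (assumed) allows none.
    return missing_mode == "O"


table = {("a", 1), ("a", 2), ("b", 3)}
print(allowed(table, "CC", "a", 3))  # False
print(allowed(table, "CO", "z", 9))  # True
```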


Open/Closed World

Table 1: Alternative open/closed world semantics


Automated Discovery

  • Given a semantics for mapping tables, to reason about them, treat mapping tables as constraints on the exchange of information.

  • The simplest way to combine tables: CONJUNCTION


Example Mapping Tables


MAPPING TABLES

  • A, B, C, D: individual attributes

  • dom(A): domain of A, such as integers or characters

  • U, X, Y: sets of attributes

  • R: a relational schema

  • R[U]: schema R with set of attributes U

  • r: a relation instance

  • t: a tuple


MAPPING TABLES (contd)

t[X]: the values of tuple t on the attributes of X

X = {A1, A2, ..., Ak}

dom(X) = dom(A1) × dom(A2) × ... × dom(Ak)

To represent the different semantics of mapping tables, it is necessary to introduce variables

V: a set of variables, where V ∩ dom(A) = ∅ for each attribute A


DEFINITION 1

  • Given a set of attributes U, t is a mapping over U if for each A ∈ U, t[A] is either a constant in dom(A), a variable in V, or an expression of the form v-S, where v ∈ V and S is a finite subset of dom(A)


DEFINITION 2

  • Let X and Y be nonempty disjoint sets of attributes. A mapping table m from X to Y is a finite set of mappings over X ∪ Y such that each variable appears in at most one mapping


DEFINITION 2 (contd)

  • Set of mappings → "mapping table"

  • Tables → relations containing variables

  • RESTRICTION: Each variable appears in at most one mapping

  • TWO DIFFERENT MAPPINGS ARE COMPLETELY INDEPENDENT


DEFINITION 3

  • A valuation ρ over a mapping table m is a function that maps each constant value in m to itself and each variable v of m to a value in the intersection of the domains of the attributes where v appears. Furthermore, if v appears in an expression of the form v-S, then ρ(v) is not an element of S.
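
Definitions 1 to 3 can be sketched in code as follows. This is an illustrative encoding, not the presenters' implementation: the class `Var` and the functions `is_mapping_table`, `matches` and `in_extension` are hypothetical names, constants are plain Python values, and the domain-intersection requirement of Definition 3 is not checked (all attributes are assumed to share one domain).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A cell 'v' or 'v - S'; `excluded` plays the role of S (empty for a plain v)."""
    name: str
    excluded: frozenset = frozenset()


def is_mapping_table(rows):
    """Definition 2: no variable name may occur in two different mappings."""
    owner = {}  # variable name -> index of the mapping that uses it
    for i, row in enumerate(rows):
        for cell in row:
            if isinstance(cell, Var) and owner.setdefault(cell.name, i) != i:
                return False
    return True


def matches(mapping, values):
    """Definition 3: True if values == rho(mapping) for some valuation rho."""
    if len(mapping) != len(values):
        return False
    binding = {}  # variable name -> value chosen by rho
    for cell, value in zip(mapping, values):
        if isinstance(cell, Var):
            if value in cell.excluded:
                return False  # rho(v) must not be an element of S
            if binding.setdefault(cell.name, value) != value:
                return False  # a variable takes one value within a mapping
        elif cell != value:
            return False  # constants are mapped to themselves
    return True


def in_extension(rows, values):
    """True if some mapping in the table can be valuated to `values`."""
    return any(matches(row, values) for row in rows)


# "a" maps to "b"; every value other than "a" maps to itself.
not_a = frozenset({"a"})
table = [("a", "b"), (Var("v", not_a), Var("v", not_a))]
print(is_mapping_table(table))          # True
print(in_extension(table, ("c", "c")))  # True
print(in_extension(table, ("a", "a")))  # False
```

The second row shows why expressions of the form v-S are useful: a single mapping covers every value outside S without listing them.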


MAPPING AS CONSTRAINTS

  • View mapping tables as constraints on the exchange of information between sources

  • Given a set of mapping constraints, we are able to infer new mapping constraints and check the consistency of the constraints


CONSISTENCY & INFERENCE

  • Infer new mapping tables: Combine the knowledge from mapping tables available in a network of peers

  • Determine consistency of mapping tables: Automated inference and consistency checks will help a curator to see whether semantics are valid


Problem Definition

  • Given a mapping constraint formula (MCF) Φ over a set of attributes U, Φ is consistent if there exists a nonempty relation r over U satisfying Φ.

  • The inference problem is the problem of verifying whether a set of MCFs implies another MCF


Theorems

  • Theorem: The consistency problem for conjunctions of mapping constraints is NP-complete.

  • Theorem: Even if the length of the paths or the number of mapping constraints is fixed, the consistency problem for conjunctions of mapping constraints remains NP-complete.


Assumptions

Assumptions to solve the consistency problem:

  • Number of mapping constraints per peer is small

  • The length of paths is small

    For example, in Gnutella, paths have a maximum length of 7


THE ALGORITHM

θ = P1, P2, ..., Pn: a path of peers

Ui: the set of attributes at peer Pi

Σ: a set of constraints over path θ

μ: X → Y, a mapping constraint with mapping table m

ext(μ) = {ρ(t) | t ∈ m and ρ is a valuation over m}


THE ALGORITHM (contd)

1- Σ is consistent iff there exists t ∈ ext(μ)

2- For μ′: X → Y, Σ ⊨ μ′ iff ext(μ) ⊆ ext(μ′)

For inference: use 2 to check whether Σ ⊨ μ′

For consistency: check 1.
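
The two checks can be sketched for the simplest case, where the tables contain no variables and each extension is therefore just a finite set of tuples. That restriction is an assumption of this sketch; with variables, extensions can be infinite and need symbolic handling. The function names and sample tuples are hypothetical.

```python
def is_consistent(cover_ext):
    """Check 1: some tuple t exists in ext(mu)."""
    return len(cover_ext) > 0


def implies(cover_ext, candidate_ext):
    """Check 2: ext(mu) is a subset of ext(mu')."""
    return cover_ext <= candidate_ext


# Extensions of variable-free tables, with invented values.
cover = {("a1", "d1"), ("a2", "d2")}
candidate = {("a1", "d1"), ("a2", "d2"), ("a3", "d3")}

print(is_consistent(cover))       # True
print(implies(cover, candidate))  # True
print(implies(candidate, cover))  # False
```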


Design Decisions: P1, P2, P3, P4 path


Algorithm for computing the cover

  • P1 sends all mapping constraints to P2

  • P2 uses those constraints with its own to create a cover between P1 and P3

  • P2 forwards the cover to P3

  • P3 does the same thing to create a cover between P1 and P4

  • P3 sends the computed cover back to P1
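
The forwarding scheme above can be sketched as a left-to-right fold over the path. This is a simplification under stated assumptions: tables are binary and variable-free, and the cover of two adjacent tables is taken to be their join on the shared attribute with that attribute projected out. The helper names `compose` and `naive_cover` and the sample values are hypothetical.

```python
from functools import reduce


def compose(left, right):
    """Join two binary tables on their shared middle attribute and project it out."""
    by_key = {}
    for b, c in right:
        by_key.setdefault(b, set()).add(c)
    return {(a, c) for a, b in left for c in by_key.get(b, ())}


def naive_cover(tables):
    """tables[i] relates peer P(i+1) to peer P(i+2); fold from P1 towards the last peer."""
    return reduce(compose, tables)


m12 = {("a1", "b1"), ("a2", "b2")}
m23 = {("b1", "c1"), ("b2", "c2"), ("b3", "c3")}
m34 = {("c1", "d1")}

cover_14 = naive_cover([m12, m23, m34])
print(cover_14)  # {('a1', 'd1')}
```

Because the fold runs from P1 towards P4, the intermediate cover between P1 and P3 is materialised in full even though only one of its tuples survives the last step.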


Problems

  • Unnecessary computation

    Cover involving A6 can be done locally

  • Does not work in streaming fashion

    P1 has to wait for the whole computation to finish to get the cover between itself and P4

    So ?...


Partitions

Figure: partitions π1 to π9 distributed over Peer P1, Peer P2 and Peer P3


Description of the Algorithm

Two phases:

  • Information gathering

  • Computation


Information Gathering

  • P1 sends to P2 the set of attributes in each partition BUT NO MAPPINGS

  • P2 computes inferred partitions

  • Inferred partitions reveal interdependencies, or the lack thereof, between partitions

  • Then the computation phase begins
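
One plausible reading of the inferred-partition step is that partitions sharing an attribute are interdependent and get merged, while the rest stay independent. The sketch below implements only that reading; the slides do not give the exact procedure, and the function name `infer_partitions` and the attribute names are hypothetical. Note that only attribute sets are exchanged, never the mappings themselves.

```python
def infer_partitions(attribute_sets):
    """Merge attribute sets that overlap, directly or through a chain of overlaps."""
    merged = []
    for attrs in attribute_sets:
        attrs = set(attrs)
        keep = []
        for group in merged:
            if group & attrs:
                attrs |= group  # absorb every group that shares an attribute
            else:
                keep.append(group)
        keep.append(attrs)
        merged = keep
    return merged


# Attribute sets of the partitions at P1 and at P2 (invented names).
p1_partitions = [{"A1", "B1"}, {"A2", "B2"}]
p2_partitions = [{"B1", "C1"}, {"B3", "C2"}]

inferred = infer_partitions(p1_partitions + p2_partitions)
print(len(inferred))  # 3
```

In this toy input only the partitions sharing B1 are merged, so the other two can be processed independently.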


Inferred Partitions

Figure: inferred partitions between Peer P1 and Peer P2


Computation Phase

  • The computation starts at the penultimate peer

  • The cover between P3 and P4 is computed and sent to P2

  • The cover between P2 and P4 is computed and streamed to P1

  • The cover between P1 and P4 is computed
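
The backward, streaming order of the computation phase can be sketched with Python generators. As before, this assumes binary, variable-free tables whose cover is a join on the shared attribute; the function name `stream_compose` and the sample values are hypothetical, and a real deployment would stream over the network rather than between in-process generators.

```python
def stream_compose(left, right_stream):
    """Yield cover tuples as soon as each tuple from the downstream peer arrives."""
    by_key = {}
    for a, b in left:
        by_key.setdefault(b, set()).add(a)
    seen = set()
    for b, d in right_stream:
        for a in by_key.get(b, ()):
            if (a, d) not in seen:
                seen.add((a, d))
                yield (a, d)


m12 = {("a1", "b1"), ("a2", "b2")}
m23 = {("b1", "c1"), ("b2", "c2"), ("b3", "c3")}
m34 = {("c1", "d1")}

cover_34 = iter(m34)                      # available at P3, the penultimate peer
cover_24 = stream_compose(m23, cover_34)  # produced at P2 while cover_34 arrives
cover_14 = stream_compose(m12, cover_24)  # produced at P1 while cover_24 arrives

result = set(cover_14)
print(result)  # {('a1', 'd1')}
```

Each peer emits a tuple of its cover as soon as a matching tuple arrives from downstream, so P1 does not have to wait for the whole computation to finish.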


EXPERIMENTAL RESULTS

  • Do our solutions provide added value for communities that already use mapping tables extensively?

  • Are the characteristics of our algorithm appropriate and effective in a peer-to-peer environment?


Implementation

  • Geographically distributed machines with one peer per machine

  • Each peer has 2 modules:

  • First module interacts with the storage manager to retrieve mappings and perform cover

  • Second is peer-to-peer networking protocol


Implementation (contd)

  • Each peer decides how much cache to use

  • Biology domain: 6 biological databases used

    GDB, MIM, SwissProt, Hugo, Locus, Unigene

  • Table sizes range from 7000 to 28000 mappings with an average of 13000.

  • B2B domain: business-to-business setting


Results

  • Cache sizes from 64 to 128 mappings result in the best running times for those data characteristics

  • B2B

    Complex semantics for tables, but new mappings are still computed efficiently

    Total execution time scales linearly with the number of computed mappings


CONCLUSION

  • Problem of managing collections of mapping tables

  • Alternative semantics for tables

  • A language that allows specification of mapping tables under different semantics

  • Complexity of Inference and consistency

  • An algorithm to solve the problem


ANY QUESTIONS?

THANK YOU...

