
MAPPING DATA IN PEER-TO-PEER SYSTEMS: SEMANTICS AND ALGORITHMIC ISSUES

Anastasios Kementsietsidis & Marcelo Arenas & Renee J. Miller

Department of Computer Science, University of Toronto

Presented by Ahmet OLGUN & Suzan BAYHAN

OUTLINE

1-ABSTRACT

2-INTRODUCTION

3-MOTIVATING EXAMPLE

4-MAPPING TABLES

5-MAPPING AS CONSTRAINTS

6-CONSISTENCY AND INFERENCE

7-THE ALGORITHM

8-EXPERIMENTAL RESULTS

9-CONCLUSIONS

ABSTRACT

- PROBLEM OF MAPPING DATA IN PEER-TO-PEER DATA SHARING SYSTEMS (PPDSS)
- MAPPING TABLES LISTING CORRESPONDING VALUES IN A PPDSS
- WHY TABLES ARE APPROPRIATE
- A LANGUAGE TO SPECIFY MAPPING TABLES UNDER DIFFERENT SEMANTICS
- COMPLEXITY OF THE PROBLEM
- AN EFFICIENT ALGORITHM FOR ITS SOLUTION
- IMPLEMENTATION WITH EXPERIMENTAL RESULTS
- HYPERION PROJECT

INTRODUCTION

- Traditionally, data integration and exchange between heterogeneous data sources is provided mainly through the use of views, i.e., queries
- Sources share their schemas and cooperate
- BUT IN OUR WORK SUCH CLOSE COOPERATION IS:
- Not desirable (PRIVACY)
- Not feasible (possibly due to resource limitations)

SIMILARITY WITH FILE-SHARING SYSTEMS

- TO FIND DATA WHEN THERE IS NO AGREEMENT ON THE LOGICAL DESIGN OF DATA, FOCUS ON VALUES AND HOW THEY CORRESPOND
- IN FILE-SHARING SYSTEMS LIKE NAPSTER AND GNUTELLA, QUERYING IS DONE BY SIMPLE VALUE SEARCH ON FILE NAMES
- QUERIES ARE OF THE FORM:
“RETRIEVE ALL FILES NAMED X”

EASY BECAUSE THERE IS A CONSENSUS ON NAMES

WHAT IF THERE IS NO ACCEPTED NAMING STANDARD?

- Each peer has to develop its own naming standard
- Conforming to external standards is time-consuming and expensive
So to search data in such environments, we use MAPPING TABLES that store correspondences between values.

- At their simplest, mapping tables are binary tables relating identifiers from two different sources
- Mapping tables represent EXPERT KNOWLEDGE

MOTIVATING EXAMPLE

- DOMAIN: BIOLOGICAL DATABASES
* GENE DATABASE: GDB

* PROTEIN DATABASE: SwissProt

* GENETIC DISORDERS AND RELATED GENES DATABASE: MIM

EXAMPLE (CONTD)

- Integrating these resources is extremely desirable, since it would give scientists uniform access, BUT IT SEEMS UNATTAINABLE for political, financial and technical reasons.
- Among the technical reasons is the heterogeneity of the sources: formatted files, spreadsheets, relational databases

MAIN CHARACTERISTICS AND USE OF MAPPING TABLES

- Associations within and Across Domains
- Peer Autonomy
- Semantics
- Automated discovery of mappings

Association within and Across Domains

- A mapping table is not necessarily a function
- With mapping tables we can associate seemingly unconnected databases
- Disjoint worlds can be associated since the corresponding worlds are semantically close to each other

Peer Autonomy

- Autonomy has high importance in peer-to-peer systems.
- Mapping tables do not restrict the operation of peers in any way beyond the agreement on values expressed in the tables.

Mapping Table 1

Figure 1

Semantics

- Experts have varying degrees of expertise, so we should indicate the confidence level of mapping tables
A tuple: (X, Y)

- If an X value appearing in a mapping table follows open-world semantics, then it can be associated with any Y value: the table holds only partial information about X

Closed World

- If X follows closed-world semantics, then the X values in the table can only be associated with the specified Y values.
- 4 alternatives
1-OO (no specific information, no practical interest)

2-OC (partial knowledge)

3-CO (partial knowledge)

4-CC (complete knowledge)

Open/Closed World

Table 1: Alternative open/closed world semantics
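A minimal sketch of how the open/closed distinction changes what a binary mapping table permits. It assumes, as a reading of the slides rather than something they state, that the four alternatives come from choosing open or closed behaviour separately for X values listed in the table and for X values missing from it. The flag names and the identifiers `g1`, `p1`, etc. are hypothetical.

```python
def association_allowed(table, x, y, closed_for_listed, closed_for_missing):
    """Decide whether the pair (x, y) is permitted by a binary mapping table.

    table              -- set of (x, y) pairs of constants
    closed_for_listed  -- assumption: a listed x may only map to its listed y values
    closed_for_missing -- assumption: an x absent from the table may not map to anything
    """
    listed_ys = {ty for (tx, ty) in table if tx == x}
    if listed_ys:
        # x appears in the table.
        if closed_for_listed:
            return y in listed_ys   # closed world: only the specified y values
        return True                 # open world: any y value is acceptable
    # x does not appear in the table.
    return not closed_for_missing


# Hypothetical gene-to-protein table.
table = {("g1", "p1"), ("g1", "p2"), ("g2", "p3")}
```

With both flags off the table restricts nothing, which matches the remark that the fully open alternative is of no practical interest.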

Automated Discovery

- Given a semantics for mapping tables, we can reason about them by treating mapping tables as constraints on the exchange of information.
- Simplest way to combine tables: CONJUNCTION

Example Mapping Tables

MAPPING TABLES

- A, B, C, D: individual attributes
- dom(A): domain of A, such as integers or characters
- U, X, Y: sets of attributes
- R: a relational schema
- R[U]: attributes of a schema
- r: relation instance
- t: tuples

MAPPING TABLES (contd)

t[X]: values of tuple t in the attributes of X

X = {A1, A2, ..., Ak}

dom(X) = dom(A1) × dom(A2) × ... × dom(Ak)

To represent the different semantics of mapping tables, it is necessary to introduce variables

V: a set of variables, where V ∩ dom(A) = ∅ for each attribute A

DEFINITION 1

- Given a set of attributes U, t is a mapping over U if for each A ∈ U, t[A] is either a constant in dom(A), a variable in V, or an expression of the form v-S, where v ∈ V and S is a finite subset of dom(A)

DEFINITION 2

- Let X and Y be nonempty, disjoint sets of attributes. A mapping table m from X to Y is a finite set of mappings over X ∪ Y such that each variable appears in at most one mapping

DEFINITION 2 (contd)

- Set of mappings → “mapping table”
- Table → relation containing variables
- RESTRICTION: Each variable appears in at most one mapping
- TWO DIFFERENT MAPPINGS ARE COMPLETELY INDEPENDENT

DEFINITION 3

- A valuation ρ over a mapping table m is a function that maps each constant value in m to itself and each variable v of m to a value in the intersection of the domains of the attributes where v appears. Furthermore, if v appears in an expression of the form v-S, then ρ(v) is not an element of S.
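Definitions 1 and 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: a mapping is modelled as a dict from attribute name to cell, a cell is either a plain constant or a `Var` (whose `excluded` set plays the role of S in v-S), and the attribute names and values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A variable cell; `excluded` models the finite set S in an expression v-S."""
    name: str
    excluded: frozenset = frozenset()


def apply_valuation(mapping, domains, rho):
    """Return rho(t) for a mapping t, or raise ValueError if rho is not a valuation.

    mapping -- dict: attribute -> constant or Var
    domains -- dict: attribute -> set of allowed values, i.e. dom(A)
    rho     -- dict: variable name -> chosen value
    """
    result = {}
    for attr, cell in mapping.items():
        if isinstance(cell, Var):
            value = rho[cell.name]
            # Checking every attribute where the variable occurs enforces
            # membership in the intersection of those domains.
            if value not in domains[attr]:
                raise ValueError(f"{value!r} is outside dom({attr})")
            if value in cell.excluded:
                raise ValueError(f"{value!r} is excluded for {cell.name}")
            result[attr] = value
        else:
            result[attr] = cell  # constants map to themselves
    return result


# Hypothetical attributes and values.
domains = {"GDB_id": {"g1", "g2", "g3"}, "SwissProt_id": {"p1", "p2"}}
mapping = {"GDB_id": Var("v", frozenset({"g1"})), "SwissProt_id": "p1"}
```

Here `mapping` reads as "every GDB identifier except g1 corresponds to p1", so assigning `g2` to `v` is a valid valuation while assigning `g1` is not.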

MAPPING AS CONSTRAINTS

- View mapping tables as constraints on the exchange of information between sources
- Given a set of mapping constraints, we are able to infer new mapping constraints and check the consistency of the constraints

CONSISTENCY & INFERENCE

- Infer new mapping tables: combine the knowledge from the mapping tables available in a network of peers
- Determine consistency of mapping tables: automated inference and consistency checks help a curator see whether the semantics are valid

Problem Definition

- Given a mapping constraint formula (MCF) Φ over a set of attributes U, Φ is consistent if there exists a nonempty relation r over U satisfying Φ.
- The inference problem is the problem of verifying whether a set of MCFs implies another MCF

Theorems

- Theorem: The consistency problem for conjunctions of mapping constraints is NP-complete.
- Theorem: If the length of the paths or the number of mapping constraints is fixed, then the consistency problem for conjunctions of mapping constraints remains NP-complete.

Assumptions

Assumptions to solve the consistency problem:

- The number of mapping constraints per peer is small
- The length of paths is small
For example, in Gnutella paths have a maximum length of 7

THE ALGORITHM

θ = P1, P2, ..., Pn: a path of peers

Ui: the set of attributes at peer Pi

Σ: a set of constraints over the path θ

μ: X → Y, a mapping constraint with mapping table m

ext(μ) = {ρ(t) | t ∈ m and ρ is a valuation over m}

THE ALGORITHM (contd)

1- Σ is consistent iff there exists t ∈ ext(μ)

2- For μ': X → Y, Σ ⊨ μ' iff ext(μ) ⊆ ext(μ')

For inference: use 2 to check whether Σ ⊨ μ'

For consistency: check 1.
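The two checks can be sketched over small finite domains by enumerating ext(μ) explicitly. This is an illustration only: it assumes μ is a mapping table that already summarises Σ, it models rows as tuples aligned with an attribute list, and the `Var` class, function names and sample values are hypothetical. Real domains may be too large to enumerate this way.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Var:
    """A variable cell; `excluded` models the set S in an expression v-S."""
    name: str
    excluded: frozenset = frozenset()


def ext(table, attrs, domains):
    """Enumerate {rho(t) | t in table, rho a valuation} over finite domains."""
    result = set()
    for t in table:
        # For each variable, intersect the allowed values of every cell it occupies.
        candidates = {}
        for attr, cell in zip(attrs, t):
            if isinstance(cell, Var):
                allowed = domains[attr] - cell.excluded
                candidates[cell.name] = candidates.get(cell.name, allowed) & allowed
        names = list(candidates)
        # Variables are local to one row, so rows are expanded independently.
        for values in product(*(candidates[n] for n in names)):
            rho = dict(zip(names, values))
            result.add(tuple(rho[c.name] if isinstance(c, Var) else c for c in t))
    return result


def is_consistent(mu, attrs, domains):
    """Check 1: consistent iff ext(mu) contains at least one tuple."""
    return bool(ext(mu, attrs, domains))


def implies(mu, mu_prime, attrs, domains):
    """Check 2: mu_prime is implied iff ext(mu) is a subset of ext(mu_prime)."""
    return ext(mu, attrs, domains) <= ext(mu_prime, attrs, domains)


# Hypothetical data.
attrs = ("A", "B")
domains = {"A": {"a1", "a2"}, "B": {"b1", "b2"}}
mu = [("a1", "b1")]
mu_prime = [("a1", Var("v"))]
```

In this example `mu` implies `mu_prime` but not the other way round, because `mu_prime` also admits the tuple `("a1", "b2")`.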

Design Decisions: P1, P2, P3, P4 path

Algorithm for computing the cover

- P1 sends all mapping constraints to P2
- P2 uses those constraints together with its own to create a cover between P1 and P3
- P2 forwards the cover to P3
- P3 does the same thing to create a cover between P1 and P4
- P3 sends the computed cover back to P1
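The hop-by-hop idea above can be sketched for a deliberately simplified case: each peer holds one binary table of constants (no variables, a single attribute per peer), and the "cover" is taken to be the relational composition of the tables along the path. That simplification and the function names are assumptions made for illustration, not the cover computation from the paper.

```python
from functools import reduce


def compose(left, right):
    """Join two binary tables on their shared middle value and project it away."""
    by_key = {}
    for b, c in right:
        by_key.setdefault(b, set()).add(c)
    return {(a, c) for a, b in left for c in by_key.get(b, ())}


def cover_along_path(tables):
    """Fold the tables left to right, as each peer extends the cover it received."""
    return reduce(compose, tables)


# Hypothetical tables held along the path P1, P2, P3, P4.
t12 = {("g1", "p1"), ("g2", "p2")}
t23 = {("p1", "d1"), ("p1", "d2"), ("p3", "d3")}
t34 = {("d1", "x1")}
```

Because this composition is associative, folding from the far end of the path yields the same result, which is one reason the order of computation is a design choice rather than a correctness issue in this simplified setting.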

Problems

- Unnecessary computation
The cover involving A6 can be computed locally

- Does not work in a streaming fashion
P1 has to wait for the whole computation to finish to get the cover between itself and P4

So?...

Information Gathering

- P1 sends to P2 the set of attributes in each partition BUT NO MAPPINGS
- P2 computes inferred partitions
- Inferred partitions are used to discover interdependencies, or the lack thereof, between partitions
- Then the computation phase begins

Inferred Partitions

Peer P1 Peer P2

Computation Phase

- The computation starts at the penultimate peer
- Cover between P3 and P4 computed and sent to P2
- Cover between P2 and P4 computed and streamed to P1
- Cover between P1 and P4 computed

EXPERIMENTAL RESULTS

- Do our solutions provide added value for communities that already use mapping tables extensively?
- Are the characteristics of our algorithm appropriate and effective in a peer-to-peer environment?

Implementation

- Geographically distributed machines with one peer per machine
- Each peer has 2 modules:
- The first module interacts with the storage manager to retrieve mappings and compute covers
- The second module implements the peer-to-peer networking protocol

Implementation (contd)

- Each peer decides how much cache to use
- Biology domain: 6 biological databases used
GDB, MIM, SwissProt, Hugo, Locus, Unigene

- Table sizes range from 7000 to 28000 mappings, with an average of 13000.
- B2B domain: business-to-business setting

Results

- Cache sizes from 64 to 128 mappings result in the best running times for those data characteristics

- B2B
Complex semantics for tables, but new mappings are still computed efficiently

Total execution time scales linearly with the number of computed mappings

CONCLUSION

- Problem of managing collections of mapping tables
- Alternative semantics for tables
- A language that allows specification of mapping tables under different semantics
- Complexity of Inference and consistency
- An algorithm to solve the problem

ANY QUESTIONS?

THANK YOU...
