GEORGIOS FAKAS Department of Computing and Mathematics, Manchester Metropolitan University

Automated Generation of Object Summaries from Relational Databases: A Novel Keyword Searching Paradigm
GEORGIOS FAKAS, Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK. g.fakas@mmu.ac.uk

Related Work: Keyword Search in Relational DBs
• Full-text search (e.g. Oracle 9i Text)
• Keyword searching in relational DBs (DISCOVER, BANKS)
Kw Search: Leverling, Peacock
Result: e3-o2-c2, e4-o6-c2

Related Work: Web Search Engines: Keyword Search
Kw Search: Peacock
Result: A ranked set of web pages

A Novel Keyword Searching Paradigm: Object Summaries (OSs)
Kw Search: Peacock
Result: A ranked set of OSs
• Problems and challenges: how can we automatically (1) generate and (2) rank OSs, liberating users from knowledge of (1) the schema and (2) the query language?

OS Generation - Methodology
• tDS: a central tuple containing the Kw; the tuples around tDS contain additional information about the Data Subject (DS).
• RDS: the corresponding central relation; similarly, the relations around RDS contain additional information.
• Our solutions are based on the assumption that each database has central relations (denoted as RDS) that represent the DSs, e.g. Northwind RDS = { REmployees, RCustomers }.
• Relations linked around the RDSs include additional information about the DS.

OS Generation - Methodology: GDS
Problem: Not all relations in GDS are relevant. How do we decide (1) which relations to select and (2) when to stop traversing?
Solution: Investigate relational semantics (schema connectivity, cardinality, relative cardinality, etc.) and quantify the affinity of relations.

Affinity of Relations to RDS in GDS: Distance
• Physical distance (fd) and logical distance (ld): ld = fd - |M:N|
• E.g. Orders is closer to Employees than Customer and CustomerDemo are
• Hubs: spurious shortcuts
• They provide rather irrelevant or lateral information
RC(R1, R2)
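The distance definitions above can be illustrated with a small sketch. It assumes that |M:N| counts the junction relations crossed on the shortest path from RDS before reaching a relation; the schema fragment, the `JUNCTIONS` set and the function name `distances` are illustrative choices, not taken from the slides.

```python
from collections import deque

# Hypothetical fragment of the Northwind schema as an undirected graph.
SCHEMA = {
    "Employees": ["Orders", "EmployeeTerritories"],
    "Orders": ["Employees", "Customers"],
    "Customers": ["Orders", "CustomerCustomerDemo"],
    "CustomerCustomerDemo": ["Customers", "CustomerDemographics"],
    "CustomerDemographics": ["CustomerCustomerDemo"],
    "EmployeeTerritories": ["Employees", "Territories"],
    "Territories": ["EmployeeTerritories"],
}
# Assumption: these relations only implement M:N relationships.
JUNCTIONS = {"CustomerCustomerDemo", "EmployeeTerritories"}


def distances(schema, junctions, r_ds):
    """Return {relation: (fd, ld)} using a breadth-first search from r_ds."""
    result = {r_ds: (0, 0)}
    crossed = {r_ds: 0}  # junction relations passed before reaching a relation
    queue = deque([r_ds])
    while queue:
        current = queue.popleft()
        fd = result[current][0]
        # Leaving a junction relation counts as one crossed M:N relationship.
        passed = crossed[current] + (1 if current in junctions else 0)
        for neighbour in schema[current]:
            if neighbour not in result:
                crossed[neighbour] = passed
                result[neighbour] = (fd + 1, fd + 1 - passed)  # ld = fd - |M:N|
                queue.append(neighbour)
    return result


d = distances(SCHEMA, JUNCTIONS, "Employees")
for relation, (fd, ld) in sorted(d.items(), key=lambda item: item[1]):
    print(f"{relation}: fd={fd}, ld={ld}")
```

Under these assumptions Orders ends up closer to Employees than Customers and CustomerDemographics do, which matches the example on the slide.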

Affinity of Relations to RDS in GDS: Connectivity
• Schema Connectivity (Coi)
• Data-graph Connectivity:
• Relative Cardinality (RCi→j), i.e. the average number of tuples of Ri that are connected with each tuple from Rj
• for 1:M, RCi→j = |Ri|/|Rj|
• for M:1, RCi→j = 1
• Reverse Relative Cardinality (RRCi→j) is the reverse of RCi→j, i.e. RRCi→j = RCj→i.
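A minimal sketch of the two cardinality measures follows. It assumes that the reverse relative cardinality of Ri towards Rj equals the relative cardinality of Rj towards Ri; the function names and the table sizes are made up for illustration.

```python
def relative_cardinality(kind, size_i, size_j):
    """RC(i->j): average number of Ri tuples connected to each Rj tuple.

    kind is the relationship read from Rj to Ri:
    "1:M" means one Rj tuple is linked to many Ri tuples.
    """
    if kind == "1:M":
        return size_i / size_j
    if kind == "M:1":
        return 1.0
    raise ValueError(f"unsupported relationship kind: {kind}")


def reverse_relative_cardinality(kind, size_i, size_j):
    """RRC(i->j), taken here as RC(j->i): swap the roles of the relations."""
    flipped = {"1:M": "M:1", "M:1": "1:M"}[kind]
    return relative_cardinality(flipped, size_j, size_i)


# Illustrative sizes only: 900 order tuples and 9 employee tuples.
rc = relative_cardinality("1:M", 900, 9)
rrc = reverse_relative_cardinality("1:M", 900, 9)
print(rc, rrc)
```

With these made-up sizes each employee is linked to 100 orders on average, while each order is linked to exactly one employee.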

Affinity of Relations to RDS in GDS
• DAf(Ri) = {(m1, w1), (m2, w2), ..., (mn, wn)}
• m1 = f1(ldi), m2 = f1(log(10*RCi)), m3 = f1(log(10*RRCi)), m4 = f1(log(10*Coi))
• f1(α) = (11 - α)/10
• For a hub-child, m1 = f1(ldi*hi) and m2 = f1(RCi)
• Formula 1 (Semantic Affinity): The affinity of Ri to RDS, with respect to a schema and a database conforming to the schema, can be calculated with the following formula: [formula not preserved in the transcript], where the affinity of Ri's parent to RDS is used, or 1 if RParent ≡ RDS. □
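The metric definitions above can be turned into a short sketch. Since Formula 1 itself is not shown, the code assumes one plausible reading: the affinity is the weighted sum of the metrics multiplied by the parent's affinity. The base-10 logarithm, the weights and all input values are likewise assumptions made for illustration.

```python
import math


def f1(alpha):
    """Normalising function from the slide: f1(a) = (11 - a) / 10."""
    return (11 - alpha) / 10


def metrics(ld, rc, rrc, co):
    """Return [m1, m2, m3, m4]; the base-10 logarithm is an assumption."""
    return [
        f1(ld),
        f1(math.log10(10 * rc)),
        f1(math.log10(10 * rrc)),
        f1(math.log10(10 * co)),
    ]


def affinity(ms, weights, parent_affinity=1.0):
    """Assumed reading of Formula 1: weighted metric sum times parent affinity."""
    return sum(m * w for m, w in zip(ms, weights)) * parent_affinity


WEIGHTS = [0.5, 0.2, 0.2, 0.1]  # hypothetical weights w1..w4

# A relation directly linked to RDS (parent affinity is 1).
af_child = affinity(metrics(ld=1, rc=100, rrc=1, co=1), WEIGHTS)
# A relation one step further away, whose parent is the relation above.
af_grandchild = affinity(metrics(ld=2, rc=100, rrc=1, co=1), WEIGHTS, af_child)
print(af_child, af_grandchild)
```

Because each factor is at most 1 for these inputs, the affinity shrinks as the traversal moves away from RDS, which is the behaviour a stopping criterion needs.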

Affinity of Relations to RDS in GDS: GDS(θ)

OS Ranking
A ranked set of partial OSs - a complete OS

OS Ranking - Problems and Challenges
In existing keyword-search ranking semantics, the smaller the result, the higher its ranking. In contrast, in the proposed paradigm an OS containing many well-connected tuples should have greater importance than an OS with fewer tuples. For instance, a Customer or Employee OS involved in many Orders, or an Author who has authored many important papers and books.

OS Ranking - Importance
Im(OS) = [formula not preserved in the transcript], where:
• ti is a tuple of the OS
• Im(ti) is the importance of ti (i.e. PageRank)
• |OS| is the number of tuples in the OS
• AfR(ti) is the affinity of the relation R that ti belongs to
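Because the Im(OS) formula is not shown, the following is only a hypothetical combine function built from the three listed ingredients: each tuple's importance is weighted by its relation's affinity, and the sum is damped by the logarithm of the OS size. The damping term, the function name and all scores are assumptions, not the author's formula.

```python
import math


def os_importance(tuples):
    """Hypothetical Im(OS) for a list of (Im(ti), AfR(ti)) pairs.

    Sums importance weighted by affinity and damps the total by
    log(|OS| + 1) so that size alone does not dominate the score.
    """
    if not tuples:
        return 0.0
    weighted = sum(im * af for im, af in tuples)
    return weighted / math.log(len(tuples) + 1)


# Made-up scores: a one-tuple OS versus an OS with several connected tuples.
small_os = [(0.5, 1.0)]
large_os = [(0.5, 1.0), (0.4, 0.9), (0.4, 0.9), (0.3, 0.8)]
print(os_importance(small_os), os_importance(large_os))
```

With these made-up scores the larger, well-connected OS ranks above the single-tuple OS, in line with the ranking semantics described for this paradigm.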

Experimental Evaluation
• MS Northwind and TPC-H DBs
• Precision, Recall, F-score
• Compare the GDSs and OSs produced by 12 GDS(θ) vs. GDS(h)
• GDS(h) was proposed by 10 participants
• GDS: average F-score 86.77; OS: average F-score 83
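As a sketch of how such a comparison could be scored, the code below treats GDS(θ) and GDS(h) as sets of relation names and applies the standard precision, recall and F-score definitions. The two example sets are invented and do not come from the experiments.

```python
def precision_recall_f(generated, reference):
    """Standard precision, recall and F-score between two sets."""
    overlap = len(generated & reference)
    precision = overlap / len(generated) if generated else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Invented example: relations kept automatically vs. chosen by a participant.
gds_theta = {"Employees", "Orders", "Customers", "Territories"}
gds_h = {"Employees", "Orders", "Customers", "OrderDetails"}

p, r, f = precision_recall_f(gds_theta, gds_h)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```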

Conclusions - Future Work
• Top-k OS results
• Top-k size of an OS
• Challenge: the weights of new tuples are not monotonic (since a tuple's PageRank may increase while its affinity decreases).
• Alternatives to PageRank as a weighting system are currently being investigated, e.g. ObjectRank

Conclusions - Novel Contributions
• The formal definition of the novel searching paradigm, which automatically produces a ranked set of OSs for a Data Subject.
• Minimum contribution from the user (i.e. only a Kw)
• No prior knowledge of the DB schema or query language needed.
• Excellent Precision, Recall and F-score results
• The formal definition and quantification of a relation's affinity in the context of GDS
• Considers both schema design and data distributions
• A novel ranking paradigm to calculate Im(OS).
• The quantification of tuples' and OSs' importance.
• A Combine Function that considers:
• the weight (e.g. PageRank) of tuples,
• affinity and
• the size of the OS

Affinity Ranking Correctness (Average)
