No schema sql querying relational databases independent of schema
This presentation is the property of its rightful owner.
Sponsored Links
1 / 24

No-Schema SQL Querying Relational Databases Independent of Schema PowerPoint PPT Presentation


  • 65 Views
  • Uploaded on
  • Presentation posted in: General

No-Schema SQL Querying Relational Databases Independent of Schema. Adam BootMichael Maur Keith KellyJing Yu Maryam Karimzadehgan. Agenda. Motivation Approach Method Challenges Demonstration Conclusion Future Work. Motivation (Schema Dependence).

Download Presentation

No-Schema SQL Querying Relational Databases Independent of Schema

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


No schema sql querying relational databases independent of schema

No-Schema SQLQuerying Relational Databases Independent of Schema

Adam BootMichael Maur

Keith KellyJing Yu

Maryam Karimzadehgan

CS511


Agenda

Agenda

  • Motivation

  • Approach

  • Method

  • Challenges

  • Demonstration

  • Conclusion

  • Future Work

CS511


Motivation schema dependence

Motivation(Schema Dependence)

  • Relational Model achieved certain forms of independence:

    • Independent of ordering

    • Independent of indexing

    • Independent of access path

    • Not independent of schema

  • Schema dependence is undesirable

    • Memorizing schema is a burden on the user

    • When schema changes, queries must change

CS511


Motivation the problem with sql

Motivation(The Problem with SQL)

  • SQL is difficult for the causal user

  • Relational schema is double-edged

    • Schema should exist to help in finding relevant data

    • User should not need to understand structure of underlying schema (transparency)

  • Implementation of a full natural-language interface is very hard

    • NaLIX [1]

    • CQL/NL [2]

CS511


Approach

Approach

  • What No-Schema SQL (NS-SQL) is

    • Relaxed form of SQL’s syntactical structure

      • Frees users from having to know schema

      • Allows schema-specific elements to be omitted

    • Source-to-source translator

      • Infers schema from fkpk mappings

      • Qualifies column references

      • Specifies table references and joins (FROM clause)

  • What NS-SQL is not

    • Natural-language interface

      • User must still know basic SQL syntax

    • Mind reader (ambiguities become possible)

      • Column name ambiguity

      • Join path ambiguity

CS511


Approach a simple example

id

name

salary

Employee

id

title

description

id

city

state

Job

Facility

Approach(A Simple Example)

Find the name, city, and state of all employees whose salary is

greater than $70,000 and job title is “Database Designer”

  • SELECT Employee.name, Facility.city, Facility.state FROM Employee JOIN Job ON (Employee.job_id = Job.id) JOIN Facility ON (Employee.facility_id = Facility.id) WHERE (Employee.salary > 70000)AND (Job.title = "Database Designer");

SQL

NS-SQL

  • SELECT name, city, state WHERE (salary > 70000) AND (title = "Database Designer");

CS511


Method system architecture

DBMS(via JDBC)

DB-URL

sql

url

schema

console,

GUI

Hub

NS-SQL

ddl

ambiguity error

Schemafier

schema

dml

ambig

disambig

command-line,

script file,

GUI

SQL

nssql

Statement

Parser

parse error

Method(System Architecture)

Get Schema

Schemafy

Parse NS-SQL

CS511


Method statement processing

Employee.

name

Facility.

city

Facility.

state

Job.

title

Employee.

salary

key mapping for join

Exployee.job_id = Job.id

Employee.facility_id = Facility.id

Legend

SELECT

Column reference

Structure

name

city

state

Image node

WHERE

Table reference

Expression

salary

title

(Salary > 70000) and ( title = “Database Designer” )

Method(Statement Processing)

SELECT name, city, state WHERE (salary > 70000) AND (title = "Database Designer");

Schemafier

CS511


Method statement processing1

Employee.

name

Facility.

city

Facility.

state

Job.

title

Employee.

salary

Exployee.job_id = Job.id

Employee.facility_id = Facility.id

Method(Statement Processing)

SELECT

key mapping for join

name

state

city

WHERE

Expression

salary

title

(Salary > 70000) and ( title = “Database Designer” )

CS511


Method statement processing2

Employee.

name

Facility.

city

Facility.

state

Job.

title

Employee.

salary

Exployee.job_id = Job.id

Employee.facility_id = Facility.id

Employee.

name

Facility.

state

Facility.

city

FROM

Joined

Table

Employee.

facility_id

Facility.

id

Facility

Joined

Table

Job.

id

Job

Employee.

job_id

Employee

Employee.

salary

Job.

title

Method(Statement Processing)

SELECT

key mapping for join

name

state

city

WHERE

Expression

salary

title

(

> 70000) and (

= “Database Designer” )

CS511


Method statement processing3

Method(Statement Processing)

SELECT Empoyee.name, Facility.city, Facility.state FROM Employee JOIN Job ON (Employee.job_id = Job.id) JOIN Facility ON (Employee.facility_id = Facility.id) WHERE (Employee.salary > 70000)AND (Job.title = "Database Designer");

SELECT

Employee.

name

Facility.

state

Facility.

city

FROM

Joined

Table

Employee.

facility_id

Facility.

id

Facility

Joined

Table

Job.

id

Job

Employee.

job_id

WHERE

Employee

Expression

Employee.

salary

Job.

title

(Salary > 70000) and ( title = “Database Designer” )

CS511


Challenges parser

Challenges: Parser

  • Input Requirement

    • Support standard SQL (SQL92) inputs and relaxed SQL (NS-SQL) inputs

    • Extensible (tolerate dialects)

  • AST Requirement

    • Easy to extract column/table references

    • Able to modify column reference and FROM clause given feedback from Schemafier

  • Solution

    • Build our own NS-SQL parser with JavaCC

    • Design a sufficient internal representation

CS511


Challenges schemafier

Challenges: Schemafier

  • Determine how referenced columns are relatable

    • From among referenced column names, determine a unique set of related physical columns

    • Determine a unique join path

  • Types of possible ambiguity

    • Column name ambiguity

    • Join path ambiguity

CS511


Challenges schemafier column name ambiguity

Challenges: Schemafier(Column Name Ambiguity)

  • Column names might be similar between tables

    • eg, column name such as “id” or “name”

  • Assume all referenced columns are related via fkpk (no Cartesian products)

  • No other assumptions about which physical columns are intended

  • If multiple interpretations of column name  physical column, notify user of ambiguity

  • Graph tagging algorithm (“reachability”)

CS511


Challenges schemafier name ambiguity graph tagging

Enroll

Student

Course

Survey

Challenges: Schemafier(Name Ambiguity: Graph Tagging)

tag 1

  • Interpret a schema as a graph

    • Tables are nodes

    • Key mappings are edges

  • Mark all roots (nodes without parents) and their children with a common tag

  • A node may have multiple tags

  • Nodes that share a common tag are reachable amongst each other (valid solution)

tag 2

two tags:

  • 1 = {Enroll, Student, Course}

  • 2 = {Survey, Student, Course}

CS511


Challenges schemafier name ambiguity example

Enroll

Student

Course

Survey

Challenges: Schemafier(Name Ambiguity: Example)

tag 1

SELECT name, title, grade;

Unambiguous: tag 1

SELECT grade, rating;

Unrelatable: no common tag

SELECT name, designator;

Unambiguous: tag 1 and tag 2

SELECT name, credithours;

Ambiguous: two physical interpretations of reference “credithours” and each is a valid solution

SELECT rating, credithours;

Unambiguous: two versions of “credithours”, but only one valid solution (tag 2)

grade

credithours

id

designator

title

credithours

id

name

email

ta

rating

tag 2

CS511


Challenges schemafier join path ambiguity

Challenges: Schemafier(Join Path Ambiguity)

  • Set of columns returned from Column Disambiguation algorithm as input (physical columns)

  • Finds join paths that exist between columns in tables according to mapping between Foreign/Primary key

  • If there is more than one path  unambiguous solution otherwise, ambiguous solution

CS511


Challenges schemafier path ambiguity finding possible paths

Enrollment

Student

Course

Professor

R

R

College

FK

PK

R = referenced in query

Challenges: Schemafier(Path Ambiguity: Finding Possible Paths)

  • Get set of common predecessors for all referenced tables

  • Get set of possible paths from root to each referenced table

  • Generate all possible path combinations reaching all referenced tables

  • Remove useless path combinations

  • Extract joins from path combinations

common predecessors:

Enrollment

Course

Professor

paths from root ‘Course‘:

to College:

- Course, Student, College (C1)

- Course, Professor, College (C2)

to Professor:

- Course, Professor (P1)

possible combinations:

- C1, P1

- C2, P1

possible combinations:

- C1, P1

- C2, P1 (deleted)

CS511


Challenges schemafier path ambiguity example

Enrollment

StudentID,

CourseID,

Grade,

CreditHours

Course

ID,

Designator,

Title,

TA,

Professor,

CreditHours

Student

ID,

Name,

CollegeID,

Email

Professor

ID,

Name,

College,

Email

College

ID,

Name,

PhoneNr

Challenges: Schemafier(Path Ambiguity: Example)

SELECT Title, Grade;

Unambiguous

SELECT College, PhoneNr;

Ambiguous: 3 joins exist

SELECT Grade, CollegeID;

Ambiguous: 2 joins exist

SELECT Designator, PhoneNr;

Ambiguous: 3 joins exist

SELECT Title, CollegeID;

Ambiguous: 2 joins exist

SELECT Course.ID, Professor.Name;

Unambiguous

CS511


Conclusion

Conclusion

  • Current NS-SQL transformer

    • Independent of the schema

    • Able to qualify referenced columns and infer join paths for non-nested SELECT statements

    • Return detailed error message if ambiguity exists

    • Highly extensible

CS511


Future work parser

Future Work(Parser)

  • Support dialects without affecting standard queries

    • Virtually every DBMS extends SQL

  • Example extensions:

    • Arguments:

      • SELECT TOP 10 … (SQL Server)

      • SELECT … LIMIT 0, 10 (MySQL)

      • SELECT … SAMPLE 10 (Oracle)

    • Functions:

      • CUBE and ROLLUP (SQL Server and Oracle)

    • Statements:

      • RENAME (MySQL)

      • PARTITION (Oracle)

  • Parser and internal representation are flexible

CS511


Future work schemafier

Future Work(Schemafier)

  • Support more SQL statements

    • Extends Schemafier to support UPDATE, DELETE, INSERT

    • Support nested queries

    • Support aliasing

  • Usability

    • Given an ambiguity error, provide a more intuitive interface to resolve the ambiguity

    • Synonym support

CS511


References

References

[1]Yunyao Li, Huahai Yang, H. V. Jagadish; “NaLIX: an Interactive Natural Language Interface for Querying XML”; SIGMOD 2005, June 14-16, 2005.

[2]Vesper Owei, Hyeun-Suk (Sue) Rhee, Shamkant Navathe; “Natural Language Query Filtration in the Conceptual Query Language”; IEEE 1997, 1060-3425197.

CS511


Questions

Questions?

CS511


  • Login