Probabilistic ranking of database query results

Surajit Chaudhuri, Microsoft Research

Gautam Das, Microsoft Research

Vagelis Hristidis, Florida International University

Gerhard Weikum, MPI Informatik

Presented by:

Ranjan Alankar Raju

Sindhu Satyanarayana


AGENDA

  • Introduction & Motivation

  • Problem Definition & Architecture

  • Definition of Ranking Function

  • Implementation

  • Experiments

  • Conclusions & Limitations


Introduction and Motivation

REALTOR_DB


PROBLEM DEFINITION: MANY ANSWERS

  • SELECT * FROM REALTOR_DB

    WHERE CITY='SEATTLE';

    RESULT OF THIS QUERY: Too Many Answers


PROPOSED SOLUTIONS

  • QUERY REFORMULATION TECHNIQUES:

    -BY PROMPTING THE USER

  • AUTOMATIC RANKING:

    -USING GLOBAL AND CONDITIONAL SCORE


DEFINITIONS AND SYMBOLS

  • Specified attributes (denoted 'X'): the attributes constrained in the query

    - City

  • Unspecified attributes (denoted 'Y'): the attributes the query does not mention

    - View

    - Price

    - SchoolDistrict

    - BoatDock


PROPOSED RANKING FUNCTION

  • Global Score: global importance of the unspecified attribute values

    E.g.: VIEW='WATERFRONT'

  • Conditional Score: correlations between specified and unspecified attribute values

    E.g.: given CITY='SEATTLE' and VIEW='WATERFRONT',

    will BOATDOCK='YES' interest the user?


ARCHITECTURE


RANKING FUNCTIONS: Rules & Theorems for Probabilistic Information Retrieval (PIR)

  • Bayes' Rule:

    p(a|b) = [ p(b|a) p(a) ] / p(b)

  • Product Rule:

    p(a,b|c) = p(a|c) * p(b|a,c)


BAYES' THEOREM EXAMPLE

  • 1% of the population has disease X. A screening test detects the disease in 90% of the people who have it. The test also indicates the disease in 15% of the people without it (the false positives). Suppose a person screened for the disease tests positive. What is the probability that they have it?


BAYES' THEOREM cont…

  • Notation:

    D - event that the person has the disease (D’ - no disease)

    T - event that the test is positive (T’ - test is negative)

  • Given:

    p(D) = 1%

    p(T|D) = 90%

    p(T|D’) = 15%

  • To find:

    p(D|T) = ?


Tree Structure Interpretation

Four cases:

1. (D ∩ T) - Has disease and test +ve.

2. (D ∩ T’) - Has disease and test -ve.

3. (D’ ∩ T) - No disease and test +ve.

4. (D’ ∩ T’) - No disease and test -ve.

[Figure: probability tree. The root (probability 1) branches to D and D’; each of these branches to T and T’.]
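The example can be worked through numerically. The sketch below applies Bayes' rule to the figures given in the example (1% prevalence, 90% detection rate, 15% false-positive rate); the function and variable names are chosen here for illustration and do not come from the slides.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return p(D|T) from p(D), p(T|D) and p(T|D') using Bayes' rule."""
    # p(T) is the total probability of a positive test:
    # the (D and T) branch plus the (D' and T) branch of the tree.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Figures from the example: p(D) = 1%, p(T|D) = 90%, p(T|D') = 15%.
p_disease_given_positive = posterior(0.01, 0.90, 0.15)
print(f"{p_disease_given_positive:.4f}")  # prints 0.0571
```

So a positive test raises the probability of disease from 1% to only about 5.7%, because the false positives among the 99% of healthy people far outnumber the true positives.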


Rules & Theorems for PIR cont…

t - Tuple (document)

R - Set of relevant documents

R̄ - Set of irrelevant documents (the complement of R)
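In probabilistic information retrieval, a document is typically scored by the odds that it is relevant. As a sketch of how Bayes' rule from the earlier slide is usually applied here (writing \bar{R} for the irrelevant set; the name Score(t) is introduced for illustration):

```latex
\mathrm{Score}(t) \;\propto\; \frac{p(R \mid t)}{p(\bar{R} \mid t)}
= \frac{p(t \mid R)\, p(R) / p(t)}{p(t \mid \bar{R})\, p(\bar{R}) / p(t)}
\;\propto\; \frac{p(t \mid R)}{p(t \mid \bar{R})}
```

The last step drops the factor p(R)/p(\bar{R}), which is the same for every tuple and therefore does not change the ordering.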


Adaptation of PIR

  • Partition tuple ‘t’ into two parts t(X) and t(Y)

  • Replacing t with ‘X’ & ‘Y’


Adaptation of PIR cont…

  • QUERY SPECIFIED BY USER:

    Select * From Realtor_db

    where City='Seattle' and Price='High';

  • FINAL RANKING:

    1. Waterfront Views

    2. Greenbelt Views

    3. Street Views


Limited Independence Assumption

  • The X values are assumed to be independent of one another, and likewise the Y values.

  • Dependencies between the X values and the Y values are allowed.


Eliminating R

Incoming Query:

Select * from Realtor_db where City='Seattle';


Workload-Based Estimation

FINAL RANKING FORMULA

Where:

p(y|W) = relative frequency of the unspecified attribute value y in the workload W

p(y|D) = relative frequency of the unspecified attribute value y in the database D

p(x|y,W) = relative frequency of x among the queries in W that specify y

p(x|y,D) = relative frequency of x among the tuples in D that contain y
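Taken together, the four quantities suggest a score with a global part and a conditional part. The following is a reconstruction, under the assumption that the score multiplies workload-to-database ratios over the unspecified values y of a tuple t and the specified values x of the query; it should be read as a sketch of the formula's shape rather than a verbatim copy of the slide.

```latex
\mathrm{Score}(t) \;\propto\;
\underbrace{\prod_{y \in Y} \frac{p(y \mid W)}{p(y \mid D)}}_{\text{global part}}
\;\times\;
\underbrace{\prod_{y \in Y} \prod_{x \in X} \frac{p(x \mid y, W)}{p(x \mid y, D)}}_{\text{conditional part}}
```

Under this reading, a value y scores highly when it is requested in the workload more often than it occurs in the data, and more so when it co-occurs with the query's x values more often in the workload than in the data.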


Detailed Process


IMPLEMENTATION

  • Preprocessing:

    1. The atomic probabilities module computes p(y | W), p(y | D), p(x | y, W), and p(x | y, D) for all distinct values of x and y.

    2. These atomic probabilities are stored as database tables in the intermediate knowledge representation layer, with appropriate indexes.

    3. The index module then builds the conditional and global list tables.
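Step 1 of the preprocessing can be sketched as simple counting. The code below is a minimal illustration, assuming that both database tuples and workload queries are represented as dictionaries from attribute name to value; the function name `atomic_probabilities` and the toy data are hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import permutations


def atomic_probabilities(rows):
    """Estimate p(y) and p(x|y) by relative frequency.

    rows: list of dicts (attribute -> value), either database tuples
    or workload queries. A "value" here is an (attribute, value) pair.
    """
    total = len(rows)
    single = Counter()  # number of rows containing a value
    pair = Counter()    # number of rows containing both x and y
    for row in rows:
        items = list(row.items())
        single.update(items)
        pair.update(permutations(items, 2))  # ordered (x, y) pairs
    p_y = {y: count / total for y, count in single.items()}
    p_x_given_y = {(x, y): count / single[y] for (x, y), count in pair.items()}
    return p_y, p_x_given_y


# Toy database (assumed data, for illustration only).
database = [
    {"City": "Seattle", "View": "Waterfront"},
    {"City": "Seattle", "View": "Street"},
    {"City": "Bellevue", "View": "Street"},
    {"City": "Seattle", "View": "Waterfront"},
]
p_y_D, p_x_given_y_D = atomic_probabilities(database)
```

Running the same function over a list of workload queries would give the corresponding p(y | W) and p(x | y, W) estimates. A real system would also need to handle values that never appear in the workload, which this sketch does not attempt.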


IMPLEMENTATION cont…

CONDITIONAL LISTS Cx:

Contain <TID, CondScore> pairs in descending order of CondScore

GLOBAL LISTS Gx:

Contain <TID, GlobScore> pairs in descending order of GlobScore


IMPLEMENTATION cont…


Conditional and Global Scores


Conditional and Global List tables


IMPLEMENTATION cont…

  • Query Processing Component.


List Merge Algorithm contd...
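One standard way to merge score-sorted lists such as Cx and Gx into a top-k result is a threshold-style merge in the spirit of Fagin's Threshold Algorithm. The sketch below assumes that a tuple's final score is the product of its scores in every list and that only tuples present in all lists qualify; the function name `threshold_merge` and the toy lists are hypothetical, and the details may differ from the algorithm shown on the slides.

```python
import heapq
from math import prod


def threshold_merge(lists, k):
    """Return the top-k (tid, score) pairs, best first.

    lists: each a list of (tid, score) sorted by score descending,
    with positive scores. Final score = product over all lists.
    """
    if not lists or k <= 0:
        return []
    lookup = [dict(lst) for lst in lists]  # random access by TID
    seen = set()
    top = []  # min-heap of (score, tid) holding the best k so far
    # Once the shortest list is exhausted, every qualifying TID has been seen.
    for depth in range(min(len(lst) for lst in lists)):
        for lst in lists:
            tid = lst[depth][0]  # sorted access
            if tid in seen:
                continue
            seen.add(tid)
            if all(tid in table for table in lookup):
                score = prod(table[tid] for table in lookup)
                if len(top) < k:
                    heapq.heappush(top, (score, tid))
                elif score > top[0][0]:
                    heapq.heapreplace(top, (score, tid))
        # No unseen tuple can score above the product of the current row.
        threshold = prod(lst[depth][1] for lst in lists)
        if len(top) == k and top[0][0] >= threshold:
            break
    return [(tid, score) for score, tid in sorted(top, reverse=True)]


# Toy lists (assumed data, for illustration only).
cond_list = [("t1", 4.0), ("t2", 2.0), ("t3", 1.0)]
glob_list = [("t2", 3.0), ("t1", 1.0), ("t3", 0.5)]
best = threshold_merge([cond_list, glob_list], k=1)
```

Because the product of positive scores is monotone in each list's score, the merge can stop early without reading the lists to the end.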


EXPERIMENTS

  • Datasets:

    - MSN HomeAdvisor database

    - Internet Movie Database (IMDB)


Quality Experiments

  • Examples of Ranking Results:

    Query:

    select * from SeattleHomes where City='Seattle' and Bedroom=1;

    - The Conditional ranking placed condos with garages highest.

    - The Global ranking failed to recognize the importance of the unspecified attribute value Garage='Y'.


Quality Experiments

  • User Preference of Rankings:

    - Users were shown the top 5 results of each ranking for 5 queries.

    - The ranking preferred by users is indicated below:


CONCLUSION & LIMITATION

CONCLUSION:

The automated ranking approach leverages data and workload statistics and correlations.

LIMITATION:

Correlations between text and non-text data are not addressed.

