Probabilistic Ranking of Database Query Results. Surajit Chaudhuri, Microsoft Research; Gautam Das, Microsoft Research; Vagelis Hristidis, Florida International University; Gerhard Weikum, MPI Informatik.
Presented by: Ranjan alankar raju Sindhu satyanarayana.
AGENDA
Probabilistic Ranking of Database Query Results
Surajit Chaudhuri, Microsoft Research
Gautam Das, Microsoft Research
Vagelis Hristidis, Florida International University
Gerhard Weikum, MPI Informatik
WHERE CITY='SEATTLE';
RESULT OF THIS QUERY: Too Many Answers
-BY PROMPTING THE USER
-USING GLOBAL AND CONDITIONAL SCORE
E.g.: if the user specifies CITY='SEATTLE' and VIEW='WATERFRONT',
will BOATDOCK='YES' interest them?
p(a|b) = [ p(b|a) p(a) ] / p(b)
p(a,b|c) = p(a|c) * p(b|a,c)
D - Event that the person has the disease
T - Event that the test is positive
p(D) = 1%
p(T|D) = 90%
p(D|T) = ?
1. (D ∩ T) - Has disease and test +ve.
2. (D ∩ T') - Has disease and test -ve.
3. (D' ∩ T) - No disease and test +ve.
4. (D' ∩ T') - No disease and test -ve.
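The question p(D|T) = ? can be worked through with Bayes' theorem. The slides give p(D) and p(T|D) but not the false-positive rate p(T|D'), so the sketch below assumes a hypothetical value of 5% purely for illustration; the function and variable names are also illustrative.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return p(D|T) via Bayes' theorem."""
    # p(T) by total probability: (has disease and test +ve)
    # plus (no disease and test +ve).
    p_t = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_t


P_D = 0.01              # p(D), from the slide
P_T_GIVEN_D = 0.90      # p(T|D), from the slide
P_T_GIVEN_NOT_D = 0.05  # p(T|D'), ASSUMED for illustration; not given in the slides

p_d_given_t = posterior(P_D, P_T_GIVEN_D, P_T_GIVEN_NOT_D)
print(f"p(D|T) = {p_d_given_t:.3f}")
```

Under that assumed false-positive rate the posterior stays well below p(T|D), because the disease is rare to begin with.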
R' - Irrelevant documents (the complement of the relevant set R)
Select * From Realtor_db
where City='Seattle' and Price='High';
Select * from Realtor_db where City='Seattle';
FINAL RANKING FORMULA
p(y|W) = Relative frequency of unspecified attribute value 'y' in workload 'W'
p(y|D) = Relative frequency of unspecified attribute value 'y' in database 'D'
p(x|y,W) = Relative frequency of specified value 'x' among the entries of W that contain 'y'
p(x|y,D) = Relative frequency of specified value 'x' among the entries of D that contain 'y'
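The formula itself did not survive in these slides, so the following is only a sketch. It assumes, in line with the published paper, that the score of a tuple is a product of workload-to-database ratios: p(y|W)/p(y|D) for every unspecified value y, times p(x|y,W)/p(x|y,D) for every pair of a specified value x and an unspecified value y. The function name and all probability values are hypothetical.

```python
from math import prod


def rank_score(specified, unspecified, p_w, p_d, p_xy_w, p_xy_d):
    """Score a tuple as a product of workload-to-database frequency ratios."""
    # Global part: one ratio p(y|W) / p(y|D) per unspecified value y.
    global_part = prod(p_w[y] / p_d[y] for y in unspecified)
    # Conditional part: one ratio p(x|y,W) / p(x|y,D) per (x, y) pair.
    conditional_part = prod(
        p_xy_w[(x, y)] / p_xy_d[(x, y)]
        for y in unspecified
        for x in specified
    )
    return global_part * conditional_part


# Hypothetical probabilities, invented purely for illustration.
x = "City=Seattle"
y = "View=Waterfront"
score = rank_score(
    specified=[x],
    unspecified=[y],
    p_w={y: 0.4},
    p_d={y: 0.1},
    p_xy_w={(x, y): 0.5},
    p_xy_d={(x, y): 0.25},
)
print(score)
```

A value that is requested often in the workload but is rare in the database pushes the ratio above 1, which moves tuples carrying that value up the ranking.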
1. Computation of the atomic probabilities:
p(y | W), p(y | D), p(x | y, W), and p(x | y, D) for all distinct values of x and y.
2. Storing these atomic probabilities as database tables in an intermediate knowledge representation layer, with appropriate indexes.
3. Running the index module, which produces the conditional and global list tables.
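Step 1 can be sketched as plain relative-frequency counting. The code below is a minimal illustration, not the authors' implementation: it assumes each database tuple (or workload query) is represented as a set of (attribute, value) pairs, applies no smoothing, and uses a made-up toy table. The name `atomic_probabilities` is hypothetical.

```python
from collections import Counter
from itertools import permutations


def atomic_probabilities(records):
    """Estimate p(y) and p(x|y) from a list of sets of (attribute, value) pairs."""
    n = len(records)
    single = Counter()  # number of records containing each value
    pair = Counter()    # number of records containing both x and y
    for rec in records:
        single.update(rec)
        pair.update(permutations(rec, 2))  # ordered pairs (x, y) with x != y
    p_y = {y: count / n for y, count in single.items()}
    p_x_given_y = {(x, y): count / single[y] for (x, y), count in pair.items()}
    return p_y, p_x_given_y


# Toy data, invented for illustration only.
homes = [
    {("City", "Seattle"), ("View", "Waterfront")},
    {("City", "Seattle"), ("View", "Street")},
    {("City", "Kirkland"), ("View", "Waterfront")},
    {("City", "Seattle"), ("View", "Waterfront")},
]
p_y, p_x_given_y = atomic_probabilities(homes)
print(p_y[("View", "Waterfront")])
print(p_x_given_y[(("City", "Seattle"), ("View", "Waterfront"))])
```

Running the same function over the workload instead of the table would give the W-side estimates, under the assumption that each query is reduced to the set of values it specifies.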
CONDITIONAL LISTS Cx:
Contains <TID, CondScore> pairs in descending order of CondScore
GLOBAL LISTS Gx:
Contains <TID, GlobScore> pairs in descending order of GlobScore
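The shape of these lists can be sketched as follows. This assumes the per-tuple scores have already been computed and only shows the ordering step; the input layout, the function name, and the score values are hypothetical.

```python
def build_sorted_lists(scores_by_value):
    """Map each value x to its list of (TID, score) pairs, highest score first."""
    return {
        x: sorted(tid_scores.items(), key=lambda pair: pair[1], reverse=True)
        for x, tid_scores in scores_by_value.items()
    }


# Hypothetical precomputed scores: value x -> {TID: score}.
cond_scores = {"City=Seattle": {1: 0.8, 2: 2.5, 3: 1.2}}
glob_scores = {"City=Seattle": {1: 3.0, 2: 0.5, 3: 1.5}}

conditional_lists = build_sorted_lists(cond_scores)  # the Cx lists
global_lists = build_sorted_lists(glob_scores)       # the Gx lists
print(conditional_lists["City=Seattle"])
print(global_lists["City=Seattle"])
```

Keeping each list sorted by score lets a query read the most promising tuple IDs first instead of scanning every match.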
select * from SeattleHomes where City='Seattle' and Bedroom=1;
Automated approach leverages data and workload statistics and correlations.
Existence of correlations between text and non-text data.