
The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS

Vuk Ercegovac, David J. DeWitt, Raghu Ramakrishnan

Applications Combining Text and Relational Data

Query:

SELECT SCORE, P.id
FROM Products P
WHERE P.type = 'PDA' and
      CONTAINS(P.complaint, 'short battery life', SCORE)
ORDER BY SCORE DESC

ProductComplaints

How should such an application be expected to perform?

Possibilities for Benchmarking

  • [1] http://es.csiro.au/TRECWeb/vlc2info.html
  • [2] http://trec.nist.gov
  • [3] http://www.tpc.org
  • [4] S. DeFazio. Full-text Document Retrieval Benchmark, chapter 8. Morgan Kaufmann, 2nd edition, 1993.
  • [8] P. O’Neil. The Set Query Benchmark. The Benchmark Handbook, 1991.
  • [10] C. Turbyfill, C. Orji, and D. Bitton. AS3AP: A Comparative Relational Database Benchmark. IEEE Compcon, 1989.

Contributions of TEXTURE
  • Design a micro-benchmark to compare response times on a mixed relational + text query workload
  • Develop TextGen to synthetically grow a text collection, given a real text collection
  • Evaluate TEXTURE on three commercial systems
Why a Micro-benchmark Design?
  • A fine level of control for experiments is needed to differentiate effects due to:
    • How text data is stored
    • How documents are assigned a score
    • Optimizer decisions
Why use Synthetic Text?
  • Allows for systematic scale-up
    • User’s current data set may be too small
  • Users may be more willing to share synthetic data

We show empirically that measurements on synthetic data are close to the same measurements on real data.

A Note on Quality
  • Measuring quality is important!
    • Easy to quickly return poor results
  • We assume that the three commercial systems strive for high quality results
    • Some have participated in TREC
    • Large overlap between result sets
Outline
  • TEXTURE Components
  • Evaluation
  • Synthetic Text Generation
TEXTURE Components

[Architecture diagram: QueryGen instantiates Query 1 ... Query n from query templates; DBGen populates the relational attributes and TextGen populates the text attributes; the generated workload runs against System A and System B to produce Response Time A and Response Time B.]

Overview of Data
  • Schema based on Wisconsin Benchmark [5]
    • Used to control relational predicate selectivity
  • Relational attributes populated by DBGen [6]
  • Text attributes populated by TextGen (new; a relation sketch follows the references below)
    • Input:
      • D: document collection, m: scale-up factor
    • Output:
      • D’: document collection with |D| x m documents
      • Goal: Same response times for workloads on D’ and corresponding real collection

5. D. DeWitt. The Wisconsin Benchmark: Past, Present, and Future. The Benchmark Handbook, 1991.

6. J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly Generating Billion-Record Synthetic Databases. ACM SIGMOD, 1994.
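
A minimal sketch of the benchmark relation implied by the bullets above, written as a Python dataclass rather than DDL: Wisconsin-style integer attributes control relational predicate selectivity, and two text attributes hold TextGen output. The attribute names num_id, NUM_5, txt_short, and txt_long come from the example queries later in the talk; the domains and everything else are assumptions for illustration, not the benchmark's actual schema.

from dataclasses import dataclass

@dataclass
class TextureTuple:
    """One tuple of a TEXTURE-style relation (illustrative only)."""
    num_id: int     # unique identifier, DBGen-style
    num_5: int      # small-domain integer (assumed 0..4) used for predicate selectivity
    txt_short: str  # short synthetic document produced by TextGen
    txt_long: str   # long synthetic document produced by TextGen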

Overview of Queries
  • Query workloads are derived from query templates with the following parameters (a template-instantiation sketch follows this list):
  • Text expressions:
    • Vary the number of keywords, keyword selectivity, and type of expression (e.g., phrase, Boolean)
    • Keywords chosen from text collection
  • Relational expression:
    • Vary predicate selectivity, join condition selectivity
  • Sort order:
    • Choose between relational attribute or score
  • Retrieve ALL or TOP-K results
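
To make the template idea concrete, here is a minimal Python sketch of QueryGen-style template instantiation. It is not QueryGen's actual code or interface: the query shape and CONTAINS syntax follow the example queries on the next slide, while the TOP clause, parameter names, and sample vocabulary are assumptions for illustration.

import random

# Hypothetical sketch of instantiating one query template (not the real QueryGen).
TEMPLATE = (
    "SELECT {top}SCORE, num_id, txt_short "
    "FROM R "
    "WHERE NUM_5 = {num_val} and "
    "CONTAINS(R.txt_long, '{keywords}', SCORE) "
    "ORDER BY {sort_key} DESC"
)

def make_query(vocabulary, num_keywords=2, top_k=None, sort_by_score=True):
    """Draw keywords from the collection's vocabulary and fill the template.
    Keyword selectivity would be controlled by which words are eligible."""
    keywords = " ".join(random.sample(vocabulary, num_keywords))
    return TEMPLATE.format(
        top=f"TOP {top_k} " if top_k is not None else "",
        num_val=random.randrange(5),                 # relational predicate value
        keywords=keywords,
        sort_key="SCORE" if sort_by_score else "num_id",
    )

# A workload is a set of such queries, e.g. 100 per workload as in the evaluation.
workload = [make_query(["battery", "screen", "stylus", "charger"], top_k=10)
            for _ in range(100)]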
Example Queries
  • Example of a single-relation, mixed relational and text query that sorts according to a relevance score:

SELECT SCORE, num_id, txt_short
FROM R
WHERE NUM_5 = 3 and
      CONTAINS(R.txt_long, 'foo bar', SCORE)
ORDER BY SCORE DESC

  • Example of a join query, sorting according to a relevance score on S.txt_long:

SELECT S.SCORE, S.num_id, S.txt_short
FROM R, S
WHERE R.num_id = S.num_id and S.NUM_05 = 2
      and CONTAINS(S.txt_long, 'foo bar', S.SCORE)
ORDER BY S.SCORE DESC

Outline
  • TEXTURE Components
  • Evaluation
  • Synthetic Text Generation
Overview of Experiments
  • How is response time affected as the database grows in size?
  • How is response time affected by sort order and top-k optimizations?
  • How do the results change when input collection to TextGen differs?
Data and Query Workloads
  • TextGen input is TREC AP Vol. 1 [2] and VLC2 [1]
    • Output: relations w/ {1, 2.5, 5, 7.5, 10} x 84,678 tuples
    • Corresponds to ~250 MB to 2.5 GB of text data
  • Text-only queries:
    • Low (< 0.03%) vs. high selectivity (< 3%)
    • Phrases, OR, AND
  • Mixed, single relation queries:
    • Low (<0.01%) vs. high selectivity (5%)
    • Pair with all text-only queries
  • Mixed, multi relation queries:
    • 2, 3 relations, vary text attribute used, vary selectivity
  • Each query workload consists of 100 queries

1. http://es.csiro.au/TRECWeb/vlc2info.html

2. http://trec.nist.gov

Methodology for Evaluation
  • Set up the database and query workloads (a sketch of the measurement loop follows this list)
  • Run workload per system multiple times to obtain warm numbers
    • Discard first run, report average of remaining
  • Repeat for all systems (A, B, C)
  • Platform: Microsoft Windows 2003 Server, dual-processor 1.8 GHz AMD, 2 GB of memory, eight 120 GB IDE drives
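
A minimal sketch of the warm-run measurement described above, assuming a hypothetical run_query(query) helper that executes one query against the system under test and returns its elapsed time in seconds; the helper and connection handling are not part of the benchmark description.

import statistics

def measure_workload(run_query, queries, runs=3):
    """Run the whole workload several times, discard the first (cold) run,
    and report the average of the remaining warm runs."""
    totals = [sum(run_query(q) for q in queries) for _ in range(runs)]
    return statistics.mean(totals[1:])  # drop the cold run, average the rest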
Scaling: Text-Only Workloads
  • How does response time vary per system as the data set scales up?
    • Query workload: low text selectivity (0.03%)
    • Text data: synthetic based on TREC AP Vol. 1
Mixed Text/Relational Workloads
  • Drill down on scale factor 5 (~450K tuples)
    • Query workload Low: text selectivity (0.03%)
    • Query workload High: text selectivity (3%)
  • Do the systems take advantage of relational predicate for mixed workload queries?
    • Query workload Mix: High text, low relational selectivity (0.01%)

[Chart: seconds per system and workload (synthetic TREC)]

Top-k vs. All Results
  • Compare retrieving all vs. top-k results
  • Query workload is Mix from before
    • High selectivity text expression (3%)
    • Low selectivity relational predicate (0.01%)

[Chart: seconds per system and workload (450K tuples, synthetic TREC)]

Varying Sort Order
  • Compare sorting by score vs. sorting by relational attribute
    • When retrieving all, results similar to previous
    • Results for retrieving top-k shown below

[Chart: seconds per system and workload (450K tuples, synthetic TREC)]

Varying the Input Collection
  • What is the effect of different input text collections on response time?
    • Query workload: low text selectivity (0.03%)
      • All results retrieved
    • Text Data: synthetic TREC and VLC2

[Chart: seconds per system and collection (450K tuples)]

Outline
  • Benchmark Components
  • Evaluation
  • Synthetic Text Generation
Synthetic Text Generation
  • TextGen:
    • Input: document collection D, scale-up factor m
    • Output: document collection D’ with |D| x m documents
  • Problem: Given documents D, how do we add documents to obtain D’ ?
    • Goal: Same response times for workloads on D’ and corresponding real collection C, |C|=|D’|
  • Approach: Extract “features” from D and draw |D’| samples according to features
Document Collection Features
  • Features considered
    • W(w,c) : word distribution
    • G(n, v) : vocabulary growth
    • U,L : number of unique, total words per document
    • C(w1, w2, …, wn, c) : co-occurrence of word groups
  • Each feature is estimated by a model (an estimation sketch follows the references below)
    • e.g., a Zipf [11] or empirical distribution for W
    • e.g., Heaps' Law for G [7]

7. H. S. Heaps, Information Retrieval, Computational and Theoretical Aspects. Academic Press, 1978.

11. G. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Hafner Publications, 1949.
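
As an illustration of how the first three features might be estimated from a tokenized collection, the Python sketch below computes an empirical word distribution for W, per-document unique and total word counts for U and L, and a log-space least-squares fit of Heaps' Law V(n) = k * n^beta for G. It is not the TextGen implementation, and the co-occurrence feature C is omitted.

import math
from collections import Counter

def estimate_features(docs):
    """docs: list of documents, each a non-empty list of word tokens."""
    word_counts = Counter()       # W: empirical word distribution
    uniques, lengths = [], []     # U, L: unique / total words per document
    growth = []                   # (tokens seen, vocabulary size) pairs for G
    vocab, tokens = set(), 0
    for doc in docs:
        word_counts.update(doc)
        uniques.append(len(set(doc)))
        lengths.append(len(doc))
        vocab.update(doc)
        tokens += len(doc)
        growth.append((tokens, len(vocab)))
    # Fit Heaps' Law V(n) = k * n**beta by linear regression in log space.
    xs = [math.log(n) for n, _ in growth]
    ys = [math.log(v) for _, v in growth]
    m = len(xs)
    beta = ((m * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys))
            / (m * sum(x * x for x in xs) - sum(xs) ** 2))
    k = math.exp((sum(ys) - beta * sum(xs)) / m)
    return word_counts, uniques, lengths, (k, beta)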

Process to Generate D’
  • Pre-process: estimate features
    • Depends on model used for feature
  • Generate |D’| documents
    • Generate each document by sampling W according to U and L
    • Grow vocabulary according to G
  • Post-process: swap words between documents to satisfy the co-occurrence feature C (the sketch below omits this step)
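
A simplified Python sketch of the generation step, reusing the features estimated above: each synthetic document samples its length from the observed L, draws words from the empirical distribution W, and adds new random-letter words whenever the vocabulary falls below the Heaps' Law target for G. The per-document unique-word feature U and the co-occurrence post-processing for C are glossed over, and all names and the splicing strategy are illustrative assumptions, not TextGen's actual interface.

import random
import string

def random_word(length=8):
    """New vocabulary entries are random strings of letters."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def generate_collection(word_counts, lengths, heaps, num_docs):
    """word_counts: Counter over the input vocabulary (W);
    lengths: per-document totals from the input (L);
    heaps: (k, beta) fit of Heaps' Law (G)."""
    words, weights = list(word_counts), list(word_counts.values())
    k, beta = heaps
    vocab, tokens_seen, docs = set(words), 0, []
    for _ in range(num_docs):
        doc_len = random.choice(lengths)                  # sample a length from L
        doc = random.choices(words, weights, k=doc_len)   # sample words from W
        tokens_seen += doc_len
        # Grow the vocabulary toward the Heaps' Law target V(n) = k * n**beta.
        while len(vocab) < int(k * tokens_seen ** beta):
            new_word = random_word()
            vocab.add(new_word)
            doc[random.randrange(doc_len)] = new_word     # splice the new word in
        docs.append(doc)
    return docs  # a post-processing pass would then enforce co-occurrence (C)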
Feature-Model Combinations
  • Considered 3 instances of TextGen, each a combination of features/models
Which TextGen is a Good Generator?
  • Goal: response time measured on synthetic (S) and real (D) should be similar across systems
  • Does the use of randomized words in D’ affect response time accuracy?
  • How does the choice of features and models affect response time accuracy as the data set scales?
Use of Random Words
  • Words are strings composed of a random permutation of letters (a minimal sketch follows this list)
  • Random words are useful for:
    • Vocabulary growth
    • Sharing text collections
  • Do randomized words affect measured response times?
    • What is the effect on stemming, compression, and other text processing components?
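
As a one-function illustration of the kind of randomization described here, the sketch below replaces a word with a random permutation of its letters; the exact scheme used to build the randomized database on the next slide may differ.

import random

def randomize_word(word):
    """Return a random permutation of the word's letters (length preserved)."""
    letters = list(word)
    random.shuffle(letters)
    return "".join(letters)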
Effect of Randomized Words
  • Experiment: create two TEXTURE databases and compare across systems
    • Database AP based on TREC AP Vol. 1
    • Database R-AP: randomize each word in AP
    • Query workload: low & high selectivity keywords
  • Result: response times differ on average by < 1%, not exceeding 4.4%
  • Conclusion: using random words is reasonable for measuring response time
Effect of Features and Models
  • Experiment: compare response times over same-sized synthetic (S) and real (D) collections
    • Sample s documents of D
    • Use TextGen to produce S at several scale factors
      • |S| = 10, 25, 50, 75, and 100% of |D|
    • Compare response time across systems
    • Must repeat for each type of text-only query workload
  • Used as a framework for picking features/models (a sketch of the comparison loop follows)
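
A sketch of the comparison loop this slide describes, with all four helpers (sample, generate, load, measure) assumed rather than specified by the benchmark; for each scale factor it reports the relative response-time error of the synthetic collection against the same-sized real sample.

def compare_scale_factors(sample, generate, load, measure, docs, queries,
                          fractions=(0.10, 0.25, 0.50, 0.75, 1.00)):
    """sample(docs, f): draw f * |D| real documents; generate(docs, f): TextGen
    output of the same size; load(collection): build a database; measure(db,
    queries): warm response time, e.g. as in the earlier methodology sketch."""
    errors = {}
    for f in fractions:
        t_real = measure(load(sample(docs, f)), queries)
        t_synthetic = measure(load(generate(docs, f)), queries)
        errors[f] = abs(t_synthetic - t_real) / t_real
    return errors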
TextGen Evaluation Results
  • How does response time measured on real data compare to the synthetic TextGen collections?
    • Query workload: low selectivity text only query (0.03%)
  • Graph is for System A
    • Similar results obtained for other systems
Future Work
  • How should quality measurements be incorporated?
  • Extend the workload to include updates
  • Allow correlations between attributes when generating database
Conclusion
  • We propose TEXTURE to fill the benchmarking gap for applications that use mixed relational and text queries
  • We can scale-up a text collection through synthetic text generation in such a way that response time is accurately reflected
  • Results of evaluation illustrate significant differences between current commercial relational systems
References
  • [1] http://es.csiro.au/TRECWeb/vlc2info.html
  • [2] http://trec.nist.gov
  • [3] http://www.tpc.org
  • [4] S. DeFazio. Full-text Document Retrieval Benchmark, chapter 8. Morgan Kaufmann, 2nd edition, 1993.
  • [5] D. DeWitt. The Wisconsin Benchmark: Past, Present, and Future. The Benchmark Handbook, 1991.
  • [6] J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly Generating Billion-Record Synthetic Databases. ACM SIGMOD, 1994.
  • [7] H. S. Heaps. Information Retrieval: Computational and Theoretical Aspects. Academic Press, 1978.
  • [8] P. O’Neil. The Set Query Benchmark. The Benchmark Handbook, 1991.
  • [9] K. A. Shoens, A. Tomasic, and H. Garcia-Molina. Synthetic Workload Performance Analysis of Incremental Updates. In Research and Development in Information Retrieval, 1994.
  • [10] C. Turbyfill, C. Orji, and D. Bitton. AS3AP: A Comparative Relational Database Benchmark. IEEE Compcon, 1989.
  • [11] G. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Hafner Publications, 1949.