
Indexing Data Relationships

Indexing Data Relationships. Michael J. Franklin, University of California, Berkeley & RightOrder Inc. Overview. Data relationships can be complex. Hierarchical views: XML, LDAP, … Semistructure & dynamic schema. Approach: Encode paths as tagged strings. “Raw” paths encode structure.


Presentation Transcript


  1. Indexing Data Relationships. Michael J. Franklin, University of California, Berkeley & RightOrder Inc.

  2. Overview
     • Data relationships can be complex.
     • Hierarchical views: XML, LDAP, …
     • Semistructure & dynamic schema
     • Approach: Encode paths as tagged strings
     • “Raw” paths encode structure
     • “Refined” paths accelerate lookups
     • Index strings in a highly compact structure.
     • Live on top of, next to, or inside the DBMS.
     • Benefits
     • Performance, Scalability + Adaptivity
     • Leverages mature DBMS technology

  3. Raw paths w/Designators
     [Figure: Invoice as a tree. Invoice (a) has the children Buyer (b), Seller (p) and Itemlist (c). Buyer and Seller each have a Name (d) and an Address (g); Itemlist has three Item (e) children.]
     Designated strings for the invoice:
     • abdABC Corp.
     • abg123 ABC Way
     • apdGoods Inc.
     • apg17 Main St.
     • acewidget
     • acethingy
     • acejobber
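The encoding on this slide can be sketched in a few lines of Python. This is a minimal illustration, not RightOrder's implementation: the designator letters and values are read off the slide, while the nested-list representation of the invoice and the function name `raw_paths` are assumptions made for the example.

```python
# Designator letters as read from the slide's figure.
DESIGNATORS = {
    "Invoice": "a", "Buyer": "b", "Seller": "p", "Itemlist": "c",
    "Name": "d", "Address": "g", "Item": "e",
}

def raw_paths(tag, node, prefix=""):
    """Yield one designated string per leaf value under `node`.

    `node` is either a string (a leaf value) or a list of
    (tag, child) pairs. This representation is hypothetical.
    """
    prefix += DESIGNATORS[tag]
    if isinstance(node, str):
        yield prefix + node          # path designators followed by the value
    else:
        for child_tag, child in node:
            yield from raw_paths(child_tag, child, prefix)

invoice = [
    ("Buyer", [("Name", "ABC Corp."), ("Address", "123 ABC Way")]),
    ("Seller", [("Name", "Goods Inc."), ("Address", "17 Main St.")]),
    ("Itemlist", [("Item", "widget"), ("Item", "thingy"), ("Item", "jobber")]),
]

keys = list(raw_paths("Invoice", invoice))
for key in keys:
    print(key)
```

Each printed key is the root-to-leaf path spelled as designators, followed by the leaf value, so a prefix such as `abd` selects every buyer name.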

  4. Refined paths
     • Optimize specific access paths
     • “Find invoices where X sold to Y”: keys of the form t X Y, e.g. tABC Corp. Goods Inc., tXYZ Corp. Acme Inc.
     • “Find invoices where X bought Y and Z”: keys of the form f X Y Z, e.g. fABC Corp. jobber widget, fXYZ Corp. drill hammer
     • “Find invoices where a buyer bought X, Y and Z”: keys of the form h X Y Z, e.g. hjobber thingy widget, hdrill hammer nail
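A refined path can be thought of as a precomputed key for one query shape. The sketch below assumes details the slide does not show: the field separator `SEP`, the sorting of item names, the helper names, and the plain `dict` standing in for the index are all hypothetical. Only the designators `t`, `f` and `h` come from the slide.

```python
SEP = "\x1f"  # hypothetical field separator; the slide does not show one

def refined_key(designator, *fields):
    """Build a refined-path key: one designator, then the query values."""
    return designator + SEP.join(fields)

def sold_to_key(x, y):
    # "Find invoices where X sold to Y"
    return refined_key("t", x, y)

def company_bought_key(buyer, items):
    # "Find invoices where X bought Y and Z"; items are sorted (an assumption)
    # so that the same set of items always produces the same key.
    return refined_key("f", buyer, *sorted(items))

def any_buyer_bought_key(items):
    # "Find invoices where a buyer bought X, Y and Z"
    return refined_key("h", *sorted(items))

# A dict stands in for the index; the invoice id is made up.
index = {
    company_bought_key("ABC Corp.", ["widget", "jobber"]): ["invoice-1"],
    any_buyer_bought_key(["widget", "thingy", "jobber"]): ["invoice-1"],
}

# The whole query is answered by a single key lookup.
hits = index.get(company_bought_key("ABC Corp.", ["jobber", "widget"]), [])
print(hits)
```

The point of the design is that a multi-condition query becomes one string lookup instead of a join over several raw paths.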

  5. Index Fabric
     • An index structure for long strings.
     • Provides fast lookups
     • Handles long strings
     • Ideal substrate for designated keys
     • Based on Patricia tries
     • Highly compressed string representation
     • Index cost is independent of string length
     • But the trie needs to be balanced.

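  6. Patricia tries
     Indexes first point of difference between keys
     [Figure: Patricia trie over the keys grass, corn, cow, greenbeans and greentea. The root tests position 0 (g or c). Under c, position 2 separates corn (r) from cow (w). Under g, position 2 separates grass (a) from the keys beginning with green (e), and position 5 separates greenbeans (b) from greentea (t).]
     D. R. Morrison. “PATRICIA – Practical algorithm to retrieve information coded in alphanumeric.” J. ACM, 15 (1968) pp. 514-534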
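The trie on this slide can be reproduced with a small Python sketch. It is a simplified, character-level illustration of the idea (branch only where keys first differ), not Morrison's bit-level algorithm and not the Index Fabric itself; the tuple-based node layout and the names `build` and `lookup` are assumptions made for the example.

```python
import os

def build(keys):
    """Build a character-level Patricia-style trie.

    A leaf is the key string itself. An inner node is a tuple
    (position, {character: child}) where `position` is the first
    index at which the keys below the node differ.
    """
    keys = sorted(set(keys))
    if len(keys) == 1:
        return keys[0]
    pos = len(os.path.commonprefix(keys))   # first point of difference
    groups = {}
    for k in keys:
        # k[pos:pos + 1] is "" when the key ends exactly at `pos`
        groups.setdefault(k[pos:pos + 1], []).append(k)
    return (pos, {ch: build(group) for ch, group in groups.items()})

def lookup(node, key):
    """Follow one character per inner node, then verify the full key."""
    while isinstance(node, tuple):
        pos, children = node
        node = children.get(key[pos:pos + 1])
        if node is None:
            return None
    # Skipped characters were never compared, so check the stored key.
    return node if node == key else None

trie = build(["grass", "corn", "cow", "greenbeans", "greentea"])
print(trie)
print(lookup(trie, "greentea"))
print(lookup(trie, "cash"))
```

Because inner nodes store only a position and one character per branch, the number of comparisons depends on the number of branch points rather than on key length; the final comparison against the stored key catches inputs such as `cxrn` that match every tested character but are not in the trie.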

  7. Multiple Hierarchical Views
     [Figure: two views of the same data, each using the designators a and b with the values cow and corn.]
     • Can store multiple permutations of relationships
     • Find animals and the plants they eat
     • Find plants and the animals that eat them
     • Represent as a new set of keys
     • Store data once using “permutation records”

  8. Example [Figure: example trie with the designators a and b and the values cow, cat, corn and wheat.]

  9. Example [Figure: the example trie continued, with the designators a and b and the values cow, cat, wheat and corn.]

  10. Balancing Patricia tries [Figure: the Patricia trie over grass, corn, cow, greenbeans and greentea.]

  11. Balancing Patricia tries. Step 1: divide trie into blocks. [Figure: the trie over grass, corn, cow, greenbeans and greentea, partitioned into blocks.]

  12. Balancing Patricia tries. Step 2: build another layer. [Figure: the blocked trie as Layer 0, with a smaller trie (branches g and e) above it as Layer 1.]

  13. Balancing Patricia tries. Search for “cash”. [Figure: search across Layer 1 and Layer 0.]

  14. Balancing Patricia tries. Search for “cash”. [Figure: search across Layer 1 and Layer 0, continued.]

  15. Balancing Patricia tries. Search for “cash”. [Figure: search across Layer 1 and Layer 0, continued.]

  16. Balancing Patricia tries [Figure: a search passing through the layers (labels: Layer 3, Layer 2, Layer 1, Layer 0) down to the data.]

  17. Performance
     • Number of layers is small
     • Fixed (small) space per key
     • High branching factor per block
     • Bushy, shallow tree
     • Example:
     • 8 KB blocks
     • 32-bit pointers + 2 bytes for keys/structure
     • = 1000+ pointers per block
     • = 3 layers for 1 billion pointers to data (1000³)
     • Upper layers are tiny (10 megabytes), in RAM
     • Only layer 0 on disk
     • Usually one index I/O per key lookup
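The arithmetic behind this example can be checked with a short Python sketch. The block size, pointer size and per-key overhead are the slide's figures; the function names and the assumption that every block is completely full are simplifications made for the illustration.

```python
BLOCK_BYTES = 8 * 1024      # 8 KB blocks (from the slide)
POINTER_BYTES = 4           # 32-bit pointers (from the slide)
KEY_BYTES = 2               # 2 bytes for keys/structure (from the slide)

def layers_needed(num_pointers, fanout):
    """Smallest number of layers whose combined fan-out covers num_pointers."""
    layers, capacity = 1, fanout
    while capacity < num_pointers:
        capacity *= fanout
        layers += 1
    return layers

def upper_layer_bytes(num_pointers, fanout, block_bytes):
    """Total size of all layers above layer 0, assuming full blocks."""
    blocks = -(-num_pointers // fanout)      # ceiling division: layer 0 blocks
    total = 0
    while blocks > 1:
        blocks = -(-blocks // fanout)        # blocks in the next layer up
        total += blocks * block_bytes
    return total

max_fanout = BLOCK_BYTES // (POINTER_BYTES + KEY_BYTES)
print(max_fanout)                                        # pointers that fit in one block
print(layers_needed(10**9, 1000))                        # layers for 1 billion pointers
print(upper_layer_bytes(10**9, 1000, BLOCK_BYTES) / 1e6) # upper layers, in MB
```

With the slide's rounded fan-out of 1000, the layers above layer 0 come to roughly 8 MB, which is consistent with the "10 megabytes" figure and explains why only layer 0 needs to live on disk.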

  18. Find publications by co-authors [Chart, 10,000 queries. Series: Index Fabric Refined Paths, Index Fabric Raw Paths, RDBMS STORED, RDBMS Edge mapping. Ratios shown: 2.5 : 1, 5 : 1, 25 : 1.]

  19. Find publications by co-authors [Chart, 10,000 queries. Series: Index Fabric Refined Paths, Index Fabric Raw Paths, RDBMS STORED, RDBMS Edge mapping. Ratios shown: 2.1 : 1, 4 : 1, 20 : 1.]

  20. Conclusion
     • Index arbitrary relationships
     • Encode as designated strings
     • Relationships and structures can be complex
     • Index many data access paths
     • No need for DTD or pre-defined schema
     • Index Fabric
     • Special data structure for long keys
     • High performance key lookups
     • Supports designator encoding

  21. For more information • technology@rightorder.com • www.rightorder.com
