Optimizing Query Processing on SSDs: Flash Group Methodologies

Query Processing and Optimizing on SSDs Flash Group Qingling Cao qingling1220@sina.com

Outline Introduction Page Layout on SSD Scan Approaches Join Algorithms Conclusion

Introduction • Page layout and data structure • Leverage fast random read to speed up selection、projectionand join operation • Database query processing engines traditionally emphasize on sequential I/O

Page Layout on SSD Row Layout slot Column Layout -Attributes of one column stored in continuous pages

Page Layout on SSD PAX Layout PAX Layout is efficient for SSD but not for disk. Why?

Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.

Scan Approaches • NSMScan – Always read the whole relation. • FlashScan – Read only the related columns. e.g. select S from R where J

Scan Approaches • FlashScanOPT(U) – read only the mini-pages consist the tuples needed. e.g. select S from R where J • FlashScanOPT(S) – Attributes are sorted, so the mini-pages are read at most once.

Scan Approaches Table: 70m tuples, 11columns, 10GB System: Intel Core 2 Duo at 2.33GHz, 4GB of RAM Mtron 32GB SSD

Join Algorithms – past lessons Block Nested Loops Join Sort-Merge Join Grace Hash Join Hybrid Hash Join

Join Algorithms – past lessons Customer: 450w tuples, 730MB Order: 4500w tuples, 5GB HDD: 5400RPM, 320GB SSD: OCZ Core series 60GB SATA II ☆Algorithms that stress random reads , and avoid random writes as much as possible see bigger improvements on flash

Join Algorithms – RARE-join Select Name, Team from Player, Game where Player.Team=Game.Geam J1 J2 Player Game

Join Algorithms – RARE-join Join Index： Join Result： Total I/O cost: |J1|+ σ1|V1|+|J2|+ σ2|V2|

Join Algorithms – FlashJoin id1,id2,id3 hashG, id1,id2 hashK, id3 id1,id2 hashA, id1 hashD, id2 Read(A) Read(D)

Join Algorithms – Fetch Kernel Join Index： Join Index： Each page is read no more than once.

Join Algorithms – Fetch Kernel Join Index： Join Index：

Join Algorithms – FlashJoin R: 70m tuples, 10GB S: 7m tuples, 1GB System: Intel Core 2 Duo at 2.33GHz, 4GB of RAM Mtron 32GB SSD

Join Algorithms – DigestJoin • Row-based • {JI, idx, idy} • Minimize the IO to fetch the join result

Join Algorithms – Page Fetching(1) • Sort-merge join • Join results are clustered • Memory is enough • Fetch the pages of the tuples as soon as they are produced

Join Algorithms – Page Fetching(2) ft1={A:1, A:2, B:1, B:2} ft2={C:1, C:2, D:1, D:2} jct1={x1,x2,x3,x4} jct2={y1,y2,y3,y4} Fetching instruction table Join candidate table Join Index: (x1,A:1,C:1) (x2,B:1,D:1) (x3,A:2,C:2) (x4,B:2,D:2) ft1={A:1, B:1, A:2, B:2} ft2={C:1, D:1, C:2, D:2}

Join Algorithms – Page Fetching(3) • Join Graph G=(V1 ∪ V2, E) E  V1 V2 • Segment e.g. {1, a, b, c}, {a, 1, 2}

Join Algorithms – Page Fetching(3) • Required storage size(RSS) • Required cache size(RCS) • <join_atrr,tid1,tid2>

Conclusion PAX: • Scan algorithm has little room for improvement. • RARE-Join、FlashJoin. • No write. • Join index will be sorted many times. • The size of minipage is not fixed.

Conclusion Row: • DigestJoin. • IO is much more than other join algorithms. Column: • None • Storage is more flexible. • Utilize the technology of tuple reconstruction.

Thank you~

Optimizing Query Processing on SSDs: Flash Group Methodologies

Optimizing Query Processing on SSDs: Flash Group Methodologies

Presentation Transcript

Query Processing

Query Processing

Query Processing

Query Processing

Query Processing

Query Processing

Query Processing

QUERY OPTIMIZATION AND QUERY PROCESSING

Query Processing

Query Processing

Query Processing

Query Processing

Query Processing and Query Optmization

Query Processing

Optimizing Query Processing In Sensor Networks

Query processing

Query Processing and Query Optimization

Query Processing

Query Processing

Query Processing