380 likes | 484 Views
Explore the benefits of SSDs for query processing, from page layout to join algorithms and more. Understand the impact of flash storage on database query performance and learn about innovative approaches for optimization.
E N D
Query Processing and Optimizing on SSDs Flash Group Qingling Cao qingling1220@sina.com
Outline Introduction Page Layout on SSD Scan Approaches Join Algorithms Conclusion
Outline Introduction Page Layout on SSD Scan Approaches Join Algorithms Conclusion
Introduction • Page layout and data structure • Leverage fast random read to speed up selection、projectionand join operation • Database query processing engines traditionally emphasize on sequential I/O
Outline Introduction Page Layout on SSD Scan Approaches Join Algorithms Conclusion
Page Layout on SSD Row Layout slot Column Layout -Attributes of one column stored in continuous pages
Page Layout on SSD PAX Layout PAX Layout is efficient for SSD but not for disk. Why?
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Page Layout on SSD • Disk, the sequential read speed is 100MB/s. A skip takes 3-4ms. So a mini-page should be 300-400KB. Then full page size will be MB. • IDE flash drive, the sequential read bandwidth is 28MB/s. Seek time is 0.25ms, so mini-page should be 7KB. Then full page size can be 32-128KB.
Outline Introduction Page Layout on SSD Scan Approaches Join Algorithms Conclusion
Scan Approaches • NSMScan – Always read the whole relation. • FlashScan – Read only the related columns. e.g. select S from R where J
Scan Approaches • FlashScanOPT(U) – read only the mini-pages consist the tuples needed. e.g. select S from R where J • FlashScanOPT(S) – Attributes are sorted, so the mini-pages are read at most once.
Scan Approaches Table: 70m tuples, 11columns, 10GB System: Intel Core 2 Duo at 2.33GHz, 4GB of RAM Mtron 32GB SSD
Outline Introduction Page Layout on SSD Scan Approaches Join Algorithms Conclusion
Join Algorithms – past lessons Block Nested Loops Join Sort-Merge Join Grace Hash Join Hybrid Hash Join
Join Algorithms – past lessons Customer: 450w tuples, 730MB Order: 4500w tuples, 5GB HDD: 5400RPM, 320GB SSD: OCZ Core series 60GB SATA II ☆Algorithms that stress random reads , and avoid random writes as much as possible see bigger improvements on flash
Join Algorithms – RARE-join Select Name, Team from Player, Game where Player.Team=Game.Geam J1 J2 Player Game
Join Algorithms – RARE-join Join Index: Join Result: Total I/O cost: |J1|+ σ1|V1|+|J2|+ σ2|V2|
Join Algorithms – FlashJoin id1,id2,id3 hashG, id1,id2 hashK, id3 id1,id2 hashA, id1 hashD, id2 Read(A) Read(D)
Join Algorithms – Fetch Kernel Join Index: Join Index: Each page is read no more than once.
Join Algorithms – Fetch Kernel Join Index: Join Index:
Join Algorithms – FlashJoin R: 70m tuples, 10GB S: 7m tuples, 1GB System: Intel Core 2 Duo at 2.33GHz, 4GB of RAM Mtron 32GB SSD
Join Algorithms – DigestJoin • Row-based • {JI, idx, idy} • Minimize the IO to fetch the join result
Join Algorithms – Page Fetching(1) • Sort-merge join • Join results are clustered • Memory is enough • Fetch the pages of the tuples as soon as they are produced
Join Algorithms – Page Fetching(2) ft1={A:1, A:2, B:1, B:2} ft2={C:1, C:2, D:1, D:2} jct1={x1,x2,x3,x4} jct2={y1,y2,y3,y4} Fetching instruction table Join candidate table Join Index: (x1,A:1,C:1) (x2,B:1,D:1) (x3,A:2,C:2) (x4,B:2,D:2) ft1={A:1, B:1, A:2, B:2} ft2={C:1, D:1, C:2, D:2}
Join Algorithms – Page Fetching(3) • Join Graph G=(V1 ∪ V2, E) E V1 V2 • Segment e.g. {1, a, b, c}, {a, 1, 2}
Join Algorithms – Page Fetching(3) • Required storage size(RSS) • Required cache size(RCS) • <join_atrr,tid1,tid2>
Outline Introduction Page Layout on SSD Scan Approaches Join Algorithms Conclusion
Conclusion PAX: • Scan algorithm has little room for improvement. • RARE-Join、FlashJoin. • No write. • Join index will be sorted many times. • The size of minipage is not fixed.
Conclusion Row: • DigestJoin. • IO is much more than other join algorithms. Column: • None • Storage is more flexible. • Utilize the technology of tuple reconstruction.