
Applying Thread Level Speculation to Database Transactions


Presentation Transcript


  1. Applying Thread Level Speculation to Database Transactions Chris Colohan (Adapted from his Thesis Defense talk)

  2. Chip Multiprocessors are Here! • 2 cores now, soon will have 4, 8, 16, or 32 • Multiple threads per core • How do we best use them? [Pictured: AMD Opteron, IBM Power 5, Intel Yonah]

  3. Multi-Core Enhances Throughput [Diagram: users issue transactions to a DBMS and its database running on a multi-core database server] Cores can run concurrent transactions and improve throughput.

  4. Multi-Core Enhances Throughput [Same diagram as slide 3] Can multiple cores improve transaction latency?

  5. Parallelizing transactions
     SELECT cust_info FROM customer;
     UPDATE district WITH order_id;
     INSERT order_id INTO new_order;
     foreach (item) {
       GET quantity FROM stock;
       quantity--;
       UPDATE stock WITH quantity;
       INSERT item INTO order_line;
     }
     • Intra-query parallelism
     • Used for long-running queries (decision support)
     • Does not work for short queries
     • Short queries dominate in commercial workloads

  6. Parallelizing transactions (same New Order pseudocode as slide 5)
     • Intra-transaction parallelism
     • Each thread spans multiple queries
     • Hard to add to existing systems!
     • Need to change the interface, add latches and locks, and worry about correctness of parallel execution… (a hand-parallelized sketch follows below)
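
To make the "hard to add" point concrete, here is a minimal sketch (mine, not the talk's) of hand-parallelizing the per-item loop with ordinary threads; stock_t, insert_order_line(), and the latches are hypothetical stand-ins for DBMS structures. Every shared structure needs its own latch, and the programmer has to convince themselves every interleaving is correct.

    /* Hypothetical hand-parallelized item loop (illustration only). */
    #include <pthread.h>
    #include <stddef.h>

    #define N_ITEMS 10

    typedef struct { int quantity; pthread_mutex_t latch; } stock_t;

    static stock_t stock[N_ITEMS];           /* stand-in for the stock table  */
    static pthread_mutex_t order_line_latch = PTHREAD_MUTEX_INITIALIZER;

    static void insert_order_line(int item) { (void)item; } /* stand-in INSERT */

    static void *process_item(void *arg)
    {
        int item = *(int *)arg;

        pthread_mutex_lock(&stock[item].latch);   /* latch protects quantity; */
        stock[item].quantity--;                   /* other transactions may   */
        pthread_mutex_unlock(&stock[item].latch); /* touch the same row       */

        pthread_mutex_lock(&order_line_latch);    /* order_line is shared too */
        insert_order_line(item);
        pthread_mutex_unlock(&order_line_latch);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[N_ITEMS];
        int ids[N_ITEMS];

        for (int i = 0; i < N_ITEMS; i++) {
            stock[i].quantity = 100;
            pthread_mutex_init(&stock[i].latch, NULL);
            ids[i] = i;
            pthread_create(&tid[i], NULL, process_item, &ids[i]);
        }
        for (int i = 0; i < N_ITEMS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }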

  7. Parallelizing transactions (same New Order pseudocode as slide 5)
     • Intra-transaction parallelism
     • Breaks the transaction into threads
     • Hard to add to existing systems!
     • Need to change the interface, add latches and locks, and worry about correctness of parallel execution…
     Thread Level Speculation (TLS) makes parallelization easier.

  8. Thread Level Speculation (TLS) [Diagram: the same code run sequentially vs. in parallel as Epoch 1 and Epoch 2; each epoch loads and stores *p and *q]

  9. Thread Level Speculation (TLS) [Diagram: in the parallel run, Epoch 2 loads *p before Epoch 1 stores it (a violation), so Epoch 2 restarts]
     • Use epochs
     • Detect violations
     • Restart to recover
     • Buffer state
     • Worst case: sequential
     • Best case: fully parallel
     Data dependences limit performance. (A toy model of violation detection follows below.)
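
As a rough software analogy for the mechanism on this slide (the real detection is done by the hardware described later), the toy model below records a later epoch's speculative reads and reports a violation when a logically earlier epoch then writes one of those addresses.

    /* Toy model of TLS dependence tracking: two epochs, addresses 0..N-1.  */
    /* A violation occurs if epoch 1 speculatively reads an address before  */
    /* epoch 0 (the logically earlier epoch) writes it.                     */
    #include <stdbool.h>
    #include <stdio.h>

    #define N_ADDRS 16

    static bool epoch1_read[N_ADDRS];   /* speculative read set of epoch 1 */

    static int epoch1_load(int mem[], int addr)
    {
        epoch1_read[addr] = true;       /* remember the speculative read   */
        return mem[addr];
    }

    static void epoch0_store(int mem[], int addr, int value)
    {
        mem[addr] = value;
        if (epoch1_read[addr])          /* later epoch read it too early   */
            printf("Violation! epoch 1 must restart (addr %d)\n", addr);
    }

    int main(void)
    {
        int mem[N_ADDRS] = {0};

        int q = epoch1_load(mem, 3);    /* epoch 1 runs ahead:  =*q        */
        epoch0_store(mem, 3, q + 1);    /* epoch 0 then writes: *q=        */
        return 0;
    }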

  10. A Coordinated Effort [Stack diagram: Transactions = TPC-C, DBMS = BerkeleyDB, Hardware = simulated machine]

  11. A Coordinated Effort
     • Transaction programmer: choose epoch boundaries
     • DBMS programmer: remove performance bottlenecks
     • Hardware developer: add TLS support to the architecture

  12. Outline • Introduction • Related work • Dividing transactions into epochs • Removing bottlenecks in the DBMS • Hardware support • Results • Conclusions

  13. Case Study: New Order (TPC-C), 78% of transaction execution time
     GET cust_info FROM customer;
     UPDATE district WITH order_id;
     INSERT order_id INTO new_order;
     foreach (item) {
       GET quantity FROM stock WHERE i_id=item;
       UPDATE stock WITH quantity-1 WHERE i_id=item;
       INSERT item INTO order_line;
     }
     • The only cross-iteration dependence is on the quantity field
     • Very unlikely to occur (1/100,000)

  14. Case Study: New Order (TPC-C)
     GET cust_info FROM customer;
     UPDATE district WITH order_id;
     INSERT order_id INTO new_order;
     TLS_foreach (item) {
       GET quantity FROM stock WHERE i_id=item;
       UPDATE stock WITH quantity-1 WHERE i_id=item;
       INSERT item INTO order_line;
     }
     The only change from slide 13: the item loop becomes TLS_foreach, so each iteration runs as a speculative epoch. (A sketch of what TLS_foreach might expand to follows below.)
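
TLS_foreach is not defined on the slide; the sketch below is one guess at its shape, with the runtime calls (tls_spawn_epoch, tls_wait_epochs) invented and stubbed to run sequentially. On real TLS hardware each iteration would run as a speculative epoch and restart if it violated a dependence.

    /* Sketch: TLS_foreach as epoch-per-iteration.  The runtime calls are  */
    /* stubs that execute in order; real TLS hardware would run the bodies */
    /* speculatively in parallel and restart any epoch that violates.      */
    #include <stdio.h>

    typedef void (*epoch_fn)(int item);

    static void tls_spawn_epoch(epoch_fn fn, int item) { fn(item); } /* stub */
    static void tls_wait_epochs(void) { }                            /* stub */

    #define TLS_FOREACH(i, n, body_fn)            \
        do {                                      \
            for (int i = 0; i < (n); i++)         \
                tls_spawn_epoch(body_fn, i);      \
            tls_wait_epochs();                    \
        } while (0)

    static void order_line_body(int item)
    {
        /* GET quantity FROM stock; UPDATE stock; INSERT INTO order_line */
        printf("epoch for item %d\n", item);
    }

    int main(void)
    {
        TLS_FOREACH(item, 5, order_line_body);
        return 0;
    }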

  15. Outline • Introduction • Related work • Dividing transactions into epochs • Removing bottlenecks in the DBMS • Hardware support • Results • Conclusions

  16. Dependences in DBMS [Timeline diagram: epochs running in parallel with data dependences between them]

  17. Dependences in DBMS [Same timeline diagram as slide 16] Dependences serialize execution! Performance tuning: • Profile execution • Remove the bottleneck dependence • Repeat

  18. Buffer Pool Management [Diagram: the CPU calls get_page(5) and later put_page(5); the buffer pool tracks a reference count for page 5 (ref: 1 while held, ref: 0 after release)] (A minimal reference-counting sketch follows below.)
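
For readers who have not seen a buffer pool, here is a minimal reference-counting sketch (an illustration, not BerkeleyDB's code); the shared ref field is the state that the next slides show creating dependences between epochs.

    /* Minimal buffer-pool reference counting (illustration only). */
    #include <stdio.h>

    #define N_PAGES 8

    typedef struct {
        int  ref;            /* how many callers currently hold the page */
        char data[4096];
    } page_t;

    static page_t pool[N_PAGES];

    static page_t *get_page(int id)
    {
        pool[id].ref++;      /* pin the page: a shared read-modify-write */
        return &pool[id];
    }

    static void put_page(int id)
    {
        pool[id].ref--;      /* unpin; page may be evicted when ref == 0 */
    }

    int main(void)
    {
        page_t *p = get_page(5);
        printf("page 5 ref count: %d\n", pool[5].ref);   /* 1 */
        (void)p;
        put_page(5);
        printf("page 5 ref count: %d\n", pool[5].ref);   /* 0 */
        return 0;
    }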

  19. Buffer Pool Management [Diagram: two epochs each call get_page(5) and put_page(5); the shared reference count makes the calls depend on each other] TLS ensures the first epoch gets the page first. Who cares?

  20. Buffer Pool Management [Diagram: the get_page(5)/put_page(5) calls are marked as escaping speculation]
     Escape speculation:
     • Invoke the operation
     • Store an undo function
     • Resume speculation

  21. get_page() wrapper (wraps get_page())
     page_t *get_page_wrapper(pageid_t id) {
       static tls_mutex mut;
       page_t *ret;
       tls_escape_speculation();
       check_get_arguments(id);
       tls_acquire_mutex(&mut);
       ret = get_page(id);
       tls_release_mutex(&mut);
       tls_on_violation(put, ret);
       tls_resume_speculation();
       return ret;
     }

  22. get_page() wrapper (same code as slide 21) → tls_escape_speculation() means there are no violations while calling get_page()

  23. get_page() wrapper (same code as slide 21) → check_get_arguments(id): the wrapper may get bad input data from a speculative thread!

  24. get_page() wrapper (same code as slide 21) → the tls_acquire_mutex()/tls_release_mutex() pair allows only one epoch per transaction inside get_page() at a time

  25. get_page() wrapper (same code as slide 21) → tls_on_violation(put, ret) records how to undo get_page()

  26. get_page() wrapper (same code as slide 21)
     • Isolated: undoing this operation does not cause cascading aborts
     • Undoable: an easy way to return the system to its initial state
     • Can also be used for: cursor management, malloc() (a malloc() sketch follows below)
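
The slide says the same wrapper pattern applies to malloc(); below is a hedged sketch of what that might look like, reusing the talk's primitives (tls_escape_speculation, tls_on_violation, tls_resume_speculation) but stubbing them as no-ops so the sketch compiles. The structure is an assumption, not the thesis code.

    /* Sketch: escape-speculation wrapper for malloc(), mirroring the      */
    /* get_page() wrapper.  free() is registered as the undo action so a   */
    /* violated epoch does not leak the allocation.                        */
    #include <stdlib.h>

    static void tls_escape_speculation(void) { }                      /* stub */
    static void tls_resume_speculation(void) { }                      /* stub */
    static void tls_on_violation(void (*undo)(void *), void *arg)     /* stub */
    { (void)undo; (void)arg; }

    void *malloc_wrapper(size_t size)
    {
        void *ret;

        tls_escape_speculation();        /* run malloc() non-speculatively */
        ret = malloc(size);
        if (ret != NULL)
            tls_on_violation(free, ret); /* undo: free the block if the    */
                                         /* epoch restarts                 */
        tls_resume_speculation();
        return ret;
    }

    int main(void)
    {
        char *buf = malloc_wrapper(64);
        free(buf);                       /* normal path: epoch committed   */
        return 0;
    }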

  27. Buffer Pool Management [Diagram: the epochs also escape speculation to call put_page(5)] put_page() is not undoable!

  28. Buffer Pool Management [Diagram: the put_page(5) calls are deferred to the end of each epoch] • Delay put_page() until the end of the epoch • Avoid the dependence (a deferred-call sketch follows below)
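
One plausible implementation of "delay put_page() until the end of the epoch" is a per-epoch queue of deferred releases that is flushed when the epoch commits. The queue and tls_at_commit() below are illustrative names, not the DBMS's actual mechanism.

    /* Sketch: defer put_page() calls until the epoch commits, so the      */
    /* shared reference count is only written non-speculatively.           */
    #include <stdio.h>

    #define MAX_DEFERRED 32

    static int deferred_pages[MAX_DEFERRED];
    static int n_deferred;

    static void put_page(int id) { printf("put_page(%d)\n", id); }

    /* Called instead of put_page() while the epoch is speculative. */
    static void put_page_deferred(int id)
    {
        deferred_pages[n_deferred++] = id;
    }

    /* Called by the runtime once the epoch is non-speculative (commit). */
    static void tls_at_commit(void)
    {
        for (int i = 0; i < n_deferred; i++)
            put_page(deferred_pages[i]);
        n_deferred = 0;     /* a violated epoch would just discard the list */
    }

    int main(void)
    {
        put_page_deferred(5);    /* inside the epoch: nothing happens yet   */
        put_page_deferred(7);
        tls_at_commit();         /* epoch commits: releases happen in order */
        return 0;
    }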

  29. Removing Bottleneck Dependences. We introduce three techniques:
     • Delay operations until non-speculative: mutex and lock acquire and release; buffer pool, memory, and cursor release; log sequence number assignment
     • Escape speculation: buffer pool, memory, and cursor allocation
     • Traditional parallelization: memory allocation, cursor pool, error checks, false sharing

  30. Outline • Introduction • Related work • Dividing transactions into epochs • Removing bottlenecks in the DBMS • Hardware support • Results • Conclusions

  31. TLS in Database Systems [Diagram: concurrent transactions over time; database epochs are much larger than in non-database TLS]
     • Large epochs mean:
     • More dependences, which must be tolerated
     • More state, which needs bigger buffers

  32. Feedback Loop [Cartoon: the programmer thinks "I know this is parallel!", turns for() { do_work(); } into par_for() { do_work(); }, and then iterates ("Must…Make…Faster") on the feedback the system gives about what actually ran in parallel]

  33. Violations == Feedback [Diagram: the sequential vs. parallel timelines from slide 9, with each violation annotated with the memory address involved (0x0FD8, 0xFD20, 0x0FC0, 0xFC18)]

  34. Eliminating Violations [Diagram: eliminating the *p dependence exposes a later violation on *q] Optimization may make things slower?

  35. Tolerating Violations: Sub-epochs [Diagram: with sub-epochs, the violation on *q restarts only the affected sub-epoch instead of the whole epoch]

  36. Sub-epochs
     • Started periodically by hardware
     • How many? When to start?
     • Hardware implementation:
     • Just like epochs
     • Use more epoch contexts
     • No need to check violations between sub-epochs within an epoch
     (A toy rollback-to-checkpoint model follows below.)
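
As a software caricature of sub-epochs (the real mechanism lives in the TLS hardware), the sketch below checkpoints a loop every few iterations so a simulated violation rolls back only to the most recent checkpoint rather than to the start of the epoch. The interval and the "violation" are made up for illustration.

    /* Toy model: an epoch of 12 iterations with a sub-epoch checkpoint    */
    /* every 4.  A violation at iteration 9 re-runs work from iteration 8, */
    /* not from iteration 0.                                               */
    #include <stdio.h>

    #define EPOCH_LEN     12
    #define SUB_EPOCH_LEN  4

    int main(void)
    {
        int checkpoint = 0;                 /* start of current sub-epoch  */
        int violated_once = 0;

        for (int i = 0; i < EPOCH_LEN; i++) {
            if (i % SUB_EPOCH_LEN == 0)
                checkpoint = i;             /* new sub-epoch begins here   */

            if (i == 9 && !violated_once) { /* pretend a violation occurs  */
                violated_once = 1;
                printf("violation at %d: restart from %d\n", i, checkpoint);
                i = checkpoint - 1;         /* roll back to the checkpoint */
                continue;
            }
            printf("iteration %d done\n", i);
        }
        return 0;
    }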

  37. Old TLS Design [Diagram: four CPUs, each with a private L1 cache, sharing an L2 cache; the rest of the memory system only sees committed data]
     • Buffer speculative state in the write-back L1 cache
     • Restart by invalidating speculative lines
     • Detect violations through invalidations
     • Problems:
     • L1 cache not large enough
     • Later epochs only get values on commit

  38. New Cache Design [Diagram: CPUs and L1 caches above two L2 caches, with invalidation-based coherence between the L2 caches]
     • Speculative writes are immediately visible to the L2 (and to later epochs)
     • Restart by invalidating speculative lines
     • Buffer speculative and non-speculative state for all epochs in the L2
     • Detect violations at lookup time (a toy model follows below)
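
A very rough model of "detect violations at lookup time" (far simpler than the real two-level design): each shared line carries a speculatively-read bit per epoch, and a store checks the bits of logically later epochs when it looks the line up. All field and function names here are invented for the sketch.

    /* Toy model of lookup-time violation detection on a shared L2 line.   */
    /* Each line keeps a speculatively-read (SR) bit per epoch; a store by */
    /* epoch e checks the SR bits of all logically later epochs.           */
    #include <stdbool.h>
    #include <stdio.h>

    #define N_EPOCHS 4

    typedef struct {
        int  value;
        bool sr[N_EPOCHS];        /* SR bit: epoch i speculatively read it */
    } l2_line_t;

    static int spec_load(l2_line_t *line, int epoch)
    {
        line->sr[epoch] = true;   /* record the speculative read           */
        return line->value;
    }

    static void spec_store(l2_line_t *line, int epoch, int value)
    {
        line->value = value;
        for (int later = epoch + 1; later < N_EPOCHS; later++)
            if (line->sr[later])  /* found at lookup time, no invalidation */
                printf("violation: restart epoch %d\n", later);
    }

    int main(void)
    {
        l2_line_t line = { .value = 10 };

        int q = spec_load(&line, 2);      /* epoch 2 reads the line early  */
        spec_store(&line, 1, q - 1);      /* epoch 1 then writes it        */
        return 0;
    }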

  39. New Features
     • Speculative state in the L1 and L2 caches
     • Cache line replication (versions)
     • Data dependence tracking within the cache
     • Speculative victim cache

  40. Outline • Introduction • Related work • Dividing transactions into epochs • Removing bottlenecks in the DBMS • Hardware support • Results • Conclusions

  41. Experimental Setup [Simulated machine: four CPUs, each with a 32KB 4-way L1 cache, sharing a 2MB 4-way L2 cache]
     • Detailed simulation: superscalar, out-of-order, 128-entry reorder buffer; memory hierarchy modeled in detail
     • TPC-C transactions on BerkeleyDB
     • In-core database, single user, single warehouse
     • Measure an interval of 100 transactions
     • Measuring latency, not throughput

  42. Optimizing the DBMS: New Order [Bar chart: execution time normalized to the sequential run as bottlenecks are removed one by one (locks, B-tree, logging, latches, buffer pool, malloc/free, error checks, false sharing, cursor queue), each bar broken down into idle CPU, violated, cache miss, and busy time. Callouts: 26% improvement; other CPUs not helping; can't optimize much more; cache misses increase.]

  43. Optimizing the DBMS: New Order [Same chart as slide 42] This process took me 30 days and <1200 lines of code.

  44. Other TPC-C Transactions [Bar chart: normalized time for New Order, Delivery, Stock Level, Payment, and Order Status, broken down into idle CPU, violated, cache miss, and busy time] 3 of the 5 transactions speed up by 46-66%.

  45. Scaling [Bar chart: normalized time for sequential, 2-CPU, 4-CPU, and 8-CPU runs, broken down into idle CPU, violated, latch stall, cache miss, and busy time]

  46. Scaling [Two bar charts with the same breakdown as slide 45: New Order and New Order 150, each at sequential, 2, 4, and 8 CPUs]

  47. Conclusions • A new form of parallelism for databases • Tool for attacking transaction latency • Intra-transaction parallelism • Without major changes to DBMS • With feasible new hardware • TLS can be applied to more than transactions • Halve transaction latency by using 4 CPUs
