1 / 49

Solving Difficult HTM Problems Without Difficult Hardware

Solving Difficult HTM Problems Without Difficult Hardware. Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University of Texas at Austin. Intro. Processors now scaling via cores, not clock rate 8 cores now, 64 tomorrow, 4096 next week?

Download Presentation

Solving Difficult HTM Problems Without Difficult Hardware

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Solving Difficult HTM Problems Without Difficult Hardware • Owen Hofmann, Donald Porter, Hany Ramadan, • Christopher Rossbach, and Emmett Witchel • University of Texas at Austin

  2. Intro • Processors now scaling via cores, not clock rate • 8 cores now, 64 tomorrow, 4096 next week? • Parallel programming increasingly important • Software must keep up with hardware advances • But parallel programming is really hard • Deadlock, priority inversion, data races, etc. • STM is here • We would like HTM

  3. Difficult HTM Problems • Enforcing atomicity and isolation requires conflict detection and rollback • TM Hardware only applies to memory and processor state • I/O, System calls may have effects that are not isolated, cannot be rolled back mov $norad, %eax mov $canada, %ebx launchmissiles %eax, %ebx

  4. Outline • Handling kernel I/O with minimal hardware • User-level system call rollback • Conclusions

  5. Outline • Handling kernel I/O with minimal hardware • User-level system call rollback • Conclusions

  6. cxspinlocks • Cooperative transactional spinlocks allow kernel to take advantage of limited TM hardware • Optimistically execute with transactions • Fall back on locking when hardware TM is not enough • I/O, page table operations • overflow? • Correctness provided by isolation of lock variable • Transactional threads read lock • Non-transactional threads write lock

  7. cxspinlock guarantees • Multiple transactional threads in critical region • Non-transactional thread excludes all others

  8. read read write write cxspinlocks in action lockA: unlocked Thread 1 cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2 cx_optimistic(lockA); modify_data(); cx_end(lock);

  9. read read write write lockA lockA cxspinlocks in action • Transactional threads read unlocked lock variable lockA: unlocked Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  10. read read write write lockA lockA cxspinlocks in action • Transactional threads read unlocked lock variable lockA: unlocked Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  11. read read write write lockA lockA cxspinlocks in action • Transactional threads read unlocked lock variable lockA: unlocked Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  12. read read write write cxspinlocks in action lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  13. read read write write cxspinlocks in action lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  14. read read write write lockA lockA cxspinlocks in action lockA: unlocked Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  15. read read write write lockA lockA cxspinlocks in action lockA: unlocked Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  16. read read write write lockA lockA cxspinlocks in action lockA: unlocked Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  17. read read write write lockA lockA cxspinlocks in action • Hardware restarts transactions for I/O lockA: unlocked Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  18. read read write write lockA cxspinlocks in action lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  19. read read write write lockA cxspinlocks in action • contention managed CAS lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  20. read read write write lockA cxspinlocks in action • contention managed CAS lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  21. read read write write cxspinlocks in action • contention managed CAS lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  22. read read write write cxspinlocks in action lockA: locked Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  23. read read write write cxspinlocks in action lockA: locked Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  24. read read write write cxspinlocks in action lockA: locked Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  25. read read write write cxspinlocks in action • conditionally add variable to read set lockA: locked Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  26. read read write write cxspinlocks in action • conditionally add variable to read set lockA: locked Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  27. read read write write cxspinlocks in action • conditionally add variable to read set lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  28. read read write write lockA cxspinlocks in action lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  29. read read write write lockA cxspinlocks in action lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock);

  30. read read write write cxspinlocks in action lockA: unlocked Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock);

  31. Implementing cxspinlocks • Return codes: Correctness • Hardware returns status code from xbegin to indicate when hardware has failed (I/O) • xtest: Performance • Conditionally add a memory cell (e.g. lock variable) to the read set based on its value • xcas: Fairness • Contention managed CAS • Non-transactional threads can wait for transactional threads • Simple hardware primitives support complicated behaviors without implementing them

  32. Outline • Handling kernel I/O with minimal hardware • User-level system call rollback • Conclusions

  33. User-level system call rollback • Open nesting requires user-level syscall rollback • Many calls have clear inverses • mmap, munmap • Even simple calls have many side effects • e.g. file write • Even simple calls might be irreversible

  34. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  35. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  36. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  37. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  38. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  39. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  40. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  41. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  42. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  43. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  44. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  45. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  46. Users can’t always roll back • fd_A = open(“file_A”); • unlink(“file_A”); • void *map_A = • mmap(fd=fd_A, • size=4096); • close(fd_A); • xbegin; • modify_data(); • fd_B = open(“file_B”); • xbegin_open; • void *map_B = • mmap(fd=fd_B, • start=map_A, • size=4096); • xend_open( • abort_action=munmap); • xrestart;

  47. Kernel participation required • Users can’t track all syscall side effects • Must also track all non-tx syscalls • Kernel must manage transactional syscalls • Kernel enhances user-level programming model • Seamless transactional system calls • Strong isolation for system calls?

  48. Related Work • Other I/O solutions • Hardware open nesting [Moravan 06] • Unrestricted transactions [Blundell 07] • Transactional Locks • Speculative lock elision [Rajwar 01] • Transparently Reconciling Transactions & Locks [Welc 06] • Simple hardware models • Hybrid TM [Damron 06, Shriraman 07] • Hardware-accelerated STM [Saha 2006]

  49. Conclusions • Kernel can use even minimal hardware TM designs • May not get full programming model • Simple primitives support complex behavior • User-level programs can’t roll back system calls • Kernel must participate

More Related