Overview of a Google Tool Thread Sanitizer v2

craig

Presentation Transcript


  1. Overview of a Google Tool: ThreadSanitizer v2

  2. Introduction • Race detector based on shadow memory • Faster than: Valgrind, Intel Parallel Inspector (PIN) • Fully parallel • No expensive synchronization (atomics/locks) on the fast path • Scales to huge apps • Predictable memory footprint • Informative reports

  3. Data Race Example

     script.sh:

         #!/bin/bash
         for i in {0..10..1}
         do
           ./tsan_example
           sleep 1
         done

     C/C++ program:

         #include <stdio.h>
         #include <pthread.h>

         int Global[4];

         void *Thread1(void *x) {
           Global[0] = -1;  // unsynchronized write
           return NULL;
         }

         void *Thread2(void *x) {
           printf("Global[2] = %d\n", Global[2]);
           printf("Global[3] = %d\n", Global[3]);
           return NULL;
         }

         void *Thread3(void *x) {
           printf("Global[0] = %d\n", Global[0]);  // unsynchronized read
           printf("Global[1] = %d\n", Global[1]);
           return NULL;
         }

         int main() {
           for (int i = 0; i < 4; i++)
             Global[i] = i;
           pthread_t t[3];
           pthread_create(&t[0], NULL, Thread1, NULL);
           pthread_create(&t[1], NULL, Thread2, NULL);
           pthread_create(&t[2], NULL, Thread3, NULL);
           pthread_join(t[0], NULL);
           pthread_join(t[1], NULL);
           pthread_join(t[2], NULL);
           return 0;
         }

     There is a data race on the global array: depending on the thread scheduling, T1 can write Global[0] before T3 reads it or vice versa, so different runs print different values.

  4. Data Race Example

         Global[2] = 2
         Global[3] = 3
         Global[0] = -1
         Global[1] = 1
         ----------
         Global[0] = -1
         Global[1] = 1
         Global[2] = 2
         Global[3] = 3
         ----------
         Global[0] = 0
         Global[1] = 1
         Global[2] = 2
         Global[3] = 3
         ----------
         Global[2] = 2
         Global[3] = 3
         Global[0] = -1
         Global[1] = 1
         ----------
         Global[2] = 2
         Global[3] = 3
         Global[0] = -1
         Global[1] = 1
         ----------
         Global[2] = 2
         Global[3] = 3
         Global[0] = -1
         Global[1] = 1
         ----------
         Global[2] = 2
         Global[3] = 3
         Global[0] = -1
         Global[1] = 1
         ----------
         Global[0] = 0
         Global[1] = 1
         Global[2] = 2
         Global[3] = 3
         ----------
         Global[2] = 2
         Global[3] = 3
         Global[0] = -1
         Global[1] = 1
         ----------
         Global[2] = 2
         Global[3] = 3
         Global[0] = -1
         Global[1] = 1
         ----------

  5. TSan: Data Race Example

         Global[2] = 2
         Global[3] = 3
         ==================
         WARNING: ThreadSanitizer: data race (pid=25893)
           Read of size 4 at 0x7f5ab4c16cc0 by thread T3:
             #0 Thread3(void*) /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:18 (exe+0x0000000657b9)

           Previous write of size 4 at 0x7f5ab4c16cc0 by thread T1:
             #0 Thread1(void*) /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:7 (exe+0x000000065739)

           Thread T3 (tid=25901, running) created by main thread at:
             #0 pthread_create /home/simone/works/projects/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:820 (exe+0x0000000248e3)
             #1 main /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:29 (exe+0x00000006586e)

           Thread T1 (tid=25899, finished) created by main thread at:
             #0 pthread_create /home/simone/works/projects/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:820 (exe+0x0000000248e3)
             #1 main /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:27 (exe+0x00000006583e)

         SUMMARY: ThreadSanitizer: data race /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:18 Thread3(void*)
         ==================
         Global[0] = -1
         Global[1] = 1
         ----------
         ThreadSanitizer: reported 1 warnings

  6. How ThreadSanitizer Works: Compile Time • Instrument every memory access in the program by prepending a function call: __tsan_read4(addr) • Atomic memory accesses: __tsan_atomic* callbacks • Vtable pointer update: __tsan_vptr_update • Function entry and exit: __tsan_func_entry(caller_pc), __tsan_func_exit • Initialization: __tsan_init
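
A toy sketch of what this instrumentation amounts to, written in Python to match the pseudocode used later in the slides. The names `tsan_read4`, `tsan_write4`, `instrumented_load` and `instrumented_store` are hypothetical stand-ins rather than the runtime's entry points; in the real tool the compiler inserts the callbacks, nobody writes them by hand.

```python
# Hypothetical stand-ins for the runtime callbacks named on the slide.
access_log = []


def tsan_read4(addr):
    """Record a 4-byte read; a real runtime would update shadow memory here."""
    access_log.append(("read", 4, addr))


def tsan_write4(addr):
    """Record a 4-byte write."""
    access_log.append(("write", 4, addr))


memory = {}  # toy address space: address -> value


def instrumented_store(addr, value):
    tsan_write4(addr)      # the callback is prepended to the access
    memory[addr] = value   # the original store


def instrumented_load(addr):
    tsan_read4(addr)       # the callback is prepended to the access
    return memory.get(addr, 0)


instrumented_store(0x1000, -1)
value = instrumented_load(0x1000)
```

Every load and store pays for one extra call, which is why the slides stress keeping that call path free of locks and atomics.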

  7. ThreadSanitizer: Algorithm • Direct Shadow Mapping (64-bit Linux) • Shadow = 4 * (Addr & kMask);

         Region        Top              Bottom
         Application   0x7fffffffffff   0x7f0000000000
         Protected     0x7effffffffff   0x200000000000
         Shadow        0x1fffffffffff   0x180000000000
         Protected     0x17ffffffffff   0x000000000000
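
The direct mapping can be sketched in a few lines. The slide does not give the value of `kMask`, so `K_MASK` below is an assumed, illustrative value chosen only so that the application range lands inside the shadow range from the slide; it should not be read as the constant used by the real runtime. The function name `app_to_shadow` is also hypothetical.

```python
# Assumption: illustrative mask, NOT the real runtime constant.
K_MASK = 0x07FF_FFFF_FFFF
SHADOW_MULTIPLIER = 4  # from the slide: Shadow = 4 * (Addr & kMask)

# Region bounds as listed on the slide.
APP_BEGIN, APP_END = 0x7F00_0000_0000, 0x7FFF_FFFF_FFFF
SHADOW_BEGIN, SHADOW_END = 0x1800_0000_0000, 0x1FFF_FFFF_FFFF


def app_to_shadow(addr):
    """Map an application address to its shadow address with pure arithmetic."""
    if not APP_BEGIN <= addr <= APP_END:
        raise ValueError("address is outside the application region")
    return SHADOW_MULTIPLIER * (addr & K_MASK)


first = app_to_shadow(APP_BEGIN)
second = app_to_shadow(APP_BEGIN + 8)
```

Because the mapping is a mask and a multiply, no lookup table or lock is needed, and two consecutive 8-byte application words end up 32 bytes apart in shadow memory, enough room for four 64-bit shadow words each.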

  8. How ThreadSanitizer Works: Run-Time Library • Shadow Cell: a 64-bit word that represents a single memory access that happened to a subset of the bytes within an 8-byte word of application memory • Shadow State: N Shadow Words (N = 2, 4, or 8), the number of accesses by the threads that are remembered for the corresponding application memory region
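
A minimal sketch of how one access could be packed into a single 64-bit word. The fields (TID, Epoch, position, is-write) are the ones shown in the later example slides; the bit widths, the split of the position into an offset and a size, and the names `ShadowWord`, `pack` and `unpack` are assumptions made for illustration.

```python
from collections import namedtuple

# size_log2 encodes access sizes 1, 2, 4, 8 as 0, 1, 2, 3.
ShadowWord = namedtuple("ShadowWord", "tid epoch offset size_log2 is_write")

# Assumed field widths; they add up to 64 bits.
TID_BITS, EPOCH_BITS, OFFSET_BITS, SIZE_BITS, WRITE_BITS = 16, 42, 3, 2, 1


def pack(word):
    """Pack the fields into one integer that fits in 64 bits."""
    value = word.tid
    value = (value << EPOCH_BITS) | word.epoch
    value = (value << OFFSET_BITS) | word.offset
    value = (value << SIZE_BITS) | word.size_log2
    value = (value << WRITE_BITS) | int(word.is_write)
    return value


def unpack(value):
    """Inverse of pack()."""
    is_write = bool(value & 1)
    value >>= WRITE_BITS
    size_log2 = value & ((1 << SIZE_BITS) - 1)
    value >>= SIZE_BITS
    offset = value & ((1 << OFFSET_BITS) - 1)
    value >>= OFFSET_BITS
    epoch = value & ((1 << EPOCH_BITS) - 1)
    value >>= EPOCH_BITS
    return ShadowWord(value, epoch, offset, size_log2, is_write)


# T1 writes 2 bytes at offset 0 during its first epoch.
first_access = ShadowWord(tid=1, epoch=1, offset=0, size_log2=1, is_write=True)
packed = pack(first_access)
```

Keeping the whole record in one machine word is what lets a shadow slot be read or replaced with a single load or store.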

  9. ThreadSanitizer: Algorithm • State Machine: the core of the algorithm, which updates the Shadow State on every memory access • Steps: (1) The thread's clock is incremented and a new Shadow Word (corresponding to the current memory access) is created. (2) The State Machine iterates over all Shadow Words in the Shadow State: if one of the Shadow Words constitutes a race with the new Shadow Word, a warning is reported. (3) The new Shadow Word is inserted in place of an empty Shadow Word or in place of a Shadow Word that happened-before the new one (if there is no space, a random Shadow Word is evicted).

  10. ThreadSanitizer Algorithm: Example • 4 Shadow Cells per 8 application bytes (the Shadow State traces 4 memory accesses) • Program with 3 threads • Each cell holds: [TID, Epoch, Pos, IsW] • Shadow State: [empty] [empty] [empty] [empty]

  11. ThreadSanitizer Algorithm: Example • 4 Shadow Cells per 8 application bytes (the Shadow State traces 4 memory accesses) • Program with 3 threads • First access, write in thread T1: T1 writes 2 bytes to a memory location • Shadow State: [T1, E1, 0:2, W] [empty] [empty] [empty]

  12. ThreadSanitizer Algorithm: Example • 4 Shadow Cells per 8 application bytes (the Shadow State traces 4 memory accesses) • Program with 3 threads • Second access, read in thread T2: T2 reads 4 bytes from another memory location • Shadow State: [T1, E1, 0:2, W] [T2, E2, 4:8, R] [empty] [empty]

  13. ThreadSanitizer Algorithm: Example • 4 Shadow Cells per 8 application bytes (the Shadow State traces 4 memory accesses) • Program with 3 threads • Third access, read in thread T3: T3 reads 4 bytes from a memory location, part of which (2 bytes) was previously written by T1 • Shadow State: [T1, E1, 0:2, W] [T2, E2, 4:8, R] [T3, E3, 0:4, R] [empty]

  14. ThreadSanitizer Algorithm: Example • 4 Shadow Cells per 8 application bytes (the Shadow State traces 4 memory accesses) • Program with 3 threads • Shadow State: [T1, E1, 0:2, W] [T2, E2, 4:8, R] [T3, E3, 0:4, R] [empty] • There is a RACE because there is no happens-before relation between E1 and E3 (E1 || E3)

  15. ThreadSanitizer: Algorithm (pseudocode)

         def HandleMemoryAccess(addr, tid, is_write, size, pc):
           shadow_address = MapApplicationToShadow(addr)
           IncrementThreadClock(tid)
           LogEvent(tid, pc)
           new_shadow_word = {tid, CurrentClock(tid), is_write, size, addr & 7}
           store_word = new_shadow_word
           for i in 1..N:
             UpdateOneShadowState(shadow_address, i, new_shadow_word, store_word)
           if store_word:
             # Evict a random Shadow Word
             shadow_address[Random(N)] = store_word  # Atomic

  16. ThreadSanitizer: Algorithm (pseudocode)

         def UpdateOneShadowState(shadow_address, i, new_shadow_word, store_word):
           idx = (i + new_shadow_word.offset) % N
           old_shadow_word = shadow_address[idx]  # Atomic
           if old_shadow_word == 0:
             # The old state is empty
             if store_word:
               StoreIfNotYetStored(shadow_address[idx], store_word)
             return
           if AccessedSameRegion(old_shadow_word, new_shadow_word):
             if SameThreads(old_shadow_word, new_shadow_word):
               StoreIfNotYetStored(shadow_address[idx], store_word)
               return
             else:  # Different threads
               if not HappensBefore(old_shadow_word, new_shadow_word):
                 ReportRace(old_shadow_word, new_shadow_word)
           elif AccessedIntersectingRegions(old_shadow_word, new_shadow_word):
             if not SameThreads(old_shadow_word, new_shadow_word):
               if not HappensBefore(old_shadow_word, new_shadow_word):
                 ReportRace(old_shadow_word, new_shadow_word)
           else:  # regions did not intersect
             pass  # do nothing
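
The two routines on slides 15 and 16 can be condensed into a small single-threaded simulation, which is enough to replay the three-access example from slides 11 to 14. This is a sketch under stated assumptions, not the runtime's implementation: the `Detector` class, its vector-clock bookkeeping and the `sync` method are hypothetical, shadow memory is a plain dictionary, and the "at least one access is a write" test is an addition that the slide pseudocode does not spell out.

```python
import random
from dataclasses import dataclass

N = 4  # Shadow Words kept per 8-byte application word


@dataclass(frozen=True)
class ShadowWord:
    tid: int
    epoch: int
    offset: int  # first byte touched inside the 8-byte word
    size: int
    is_write: bool


class Detector:
    def __init__(self, num_threads, seed=0):
        # clocks[t][u] is the latest epoch of thread u that thread t has observed.
        self.clocks = [[0] * num_threads for _ in range(num_threads)]
        self.shadow = {}  # 8-byte-aligned address -> list of N slots
        self.races = []
        self.rng = random.Random(seed)

    def sync(self, src, dst):
        """dst acquires everything src has done so far (e.g. a join)."""
        self.clocks[dst] = [max(a, b) for a, b in zip(self.clocks[dst], self.clocks[src])]

    def access(self, tid, addr, size, is_write):
        self.clocks[tid][tid] += 1
        new = ShadowWord(tid, self.clocks[tid][tid], addr & 7, size, is_write)
        slots = self.shadow.setdefault(addr & ~7, [None] * N)
        stored = False
        for i in range(N):
            idx = (i + new.offset) % N
            old = slots[idx]
            if old is None:  # empty slot: store the new word once
                if not stored:
                    slots[idx], stored = new, True
                continue
            same_region = (old.offset, old.size) == (new.offset, new.size)
            overlap = old.offset < new.offset + new.size and new.offset < old.offset + old.size
            if same_region and old.tid == tid:  # same thread, same bytes: replace
                if not stored:
                    slots[idx], stored = new, True
                continue
            ordered = old.epoch <= self.clocks[tid][old.tid]  # happens-before
            if overlap and old.tid != tid and (old.is_write or is_write) and not ordered:
                self.races.append((old, new))
        if not stored:  # no free slot: evict a random Shadow Word
            slots[self.rng.randrange(N)] = new


# Replay slides 11 to 14 (thread ids 1..3; index 0 is unused).
racy = Detector(num_threads=4)
racy.access(1, 0x1000, 2, True)    # T1 writes bytes 0:2
racy.access(2, 0x1004, 4, False)   # T2 reads bytes 4:8
racy.access(3, 0x1000, 4, False)   # T3 reads bytes 0:4

# Same accesses by T1 and T3, but T3 first synchronizes with T1.
clean = Detector(num_threads=4)
clean.access(1, 0x1000, 2, True)
clean.sync(src=1, dst=3)
clean.access(3, 0x1000, 4, False)
```

In the first replay the detector records one race, between T1's write and T3's read; in the second, the synchronization edge orders the two accesses and nothing is reported.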

  17. ThreadSanitizer: Algorithm • The happens-before check is a constant-time operation: • Get TID and Epoch from the Shadow Cell • 1 load from thread-local storage • 1 comparison • Similar idea to FastTrack
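
A minimal sketch of why this check is cheap, assuming each thread keeps a vector clock indexed by thread id. The function names and the example clocks are hypothetical. The full vector-clock comparison is shown only for contrast; the epoch check needs just the TID and Epoch stored in the Shadow Cell.

```python
def vector_clock_leq(a, b):
    """Full comparison of two vector clocks: linear in the number of threads."""
    return all(x <= y for x, y in zip(a, b))


def epoch_happens_before(old_tid, old_epoch, current_clock):
    """Epoch check: one indexed load from the current thread's clock, one comparison."""
    return old_epoch <= current_clock[old_tid]


# Clocks are indexed by thread id; slot 0 is unused so that T1..T3 map to 1..3.
t1_clock_at_write = [0, 1, 0, 0]    # T1 performed its write at epoch 1
t3_clock_unsynced = [0, 0, 0, 1]    # T3 has never synchronized with T1
t3_clock_after_join = [0, 1, 0, 2]  # T3 after acquiring T1's clock

racy = not epoch_happens_before(1, 1, t3_clock_unsynced)
ordered = epoch_happens_before(1, 1, t3_clock_after_join)
```

In this example the epoch check gives the same answer as the full comparison, while touching a single clock entry.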

  18. ThreadSanitizer: Algorithm • Stack trace for the previous access: • Per-thread cyclic buffer of events • 64 bits per event (type + PC) • Events: memory access (read/write), function entry/exit • Information will be lost after some time (cyclic buffer) • Buffer size is configurable • Function interceptors: • malloc, free, … • pthread_mutex, lock, … • strlen, memcmp, … • read, write, …
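
The per-thread cyclic buffer can be sketched with a bounded deque. The `TraceBuffer` class, the placement of the event type in the top three bits and most of the program counters below are assumptions for illustration; only `0x65739` and `0x657b9` are taken from the report on slide 5.

```python
from collections import deque

EVENT_TYPES = ("read", "write", "func_entry", "func_exit")
TYPE_SHIFT = 61  # assumption: type in the top 3 bits, PC in the low 61 bits
PC_MASK = (1 << TYPE_SHIFT) - 1


class TraceBuffer:
    """Fixed-capacity event log for one thread; the oldest events are overwritten."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)

    def log(self, event_type, pc):
        # One event fits in a single 64-bit value.
        self.events.append((EVENT_TYPES.index(event_type) << TYPE_SHIFT) | pc)

    def decode(self):
        return [(EVENT_TYPES[e >> TYPE_SHIFT], e & PC_MASK) for e in self.events]


trace = TraceBuffer(capacity=3)
trace.log("func_entry", 0x65700)  # hypothetical PC
trace.log("write", 0x65739)
trace.log("func_exit", 0x65740)   # hypothetical PC
trace.log("read", 0x657b9)
history = trace.decode()
```

With a capacity of three, the fourth event pushes the function entry out of the buffer, which is the information loss the slide mentions: a stack for an old access can only be rebuilt while its events are still in the buffer.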

  19. Pros and Cons • Pros: • Speed: more than 10x faster than other tools • Native support for atomic operations • Numbers: 200+ races in Google server-side apps, 80+ in Go programs and libraries, several races in SSL • Cons: • Only 64-bit Linux • Hard to port to 32-bit platforms (small address spaces)
