Notary: Hardware Techniques to Enhance Signatures

Luke Yen. Collaborator: Prof. Stark C. Draper. Advisor: Prof. Mark D. Hill. University of Wisconsin, Madison. MICRO-41, November 11, 2008. www.cs.wisc.edu/multifacet/papers/micro08_notary.pdf

Presentation Transcript


  1. Notary: Hardware Techniques to Enhance Signatures • Luke Yen • Collaborator: Prof. Stark C. Draper • Advisor: Prof. Mark D. Hill • University of Wisconsin, Madison • MICRO-41 - November 11, 2008 • www.cs.wisc.edu/multifacet/papers/micro08_notary.pdf

  2. Executive Summary
     Tackle two problems with hardware signatures:
     • Problem 1: The best signature hashing (i.e., H3) has high area and power overheads
     • Solution 1: Use entropy analysis to guide lower-cost hashing (Page-Block-XOR, PBX) that performs similarly to H3
        • Ex: 160 gates for H3 vs. 20 gates for PBX
     • Problem 2: Spurious signature conflicts caused by signature bits set by private memory addresses
     • Solution 2: Avoid inserting private stack addresses, and propose a privatization interface for higher performance

  3. Outline
     • Signature background
     • Entropy
     • Entropy results & PBX
     • Privatization
     • Methodology & workloads
     • Results
     • Conclusions & Future Work

  4. Signature background
     • Signatures (hardware Bloom filters) are used to summarize a transaction’s read- and write-sets and to detect conflicts with them
        • Inspired by the Bulk system [Ceze, ISCA’06]
        • Implemented in LogTM-SE [Yen, HPCA’07]
        • Can have false positives, but never false negatives
     • Also proposed for non-TM purposes (e.g., SC violation detection, atomicity violation detection, race recording)
     • Ex: Use k Bloom filters of size m/k, with independent hash functions
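As a rough illustration of the "k Bloom filters of size m/k" organization, the Python sketch below models a signature as k bit vectors, each indexed by its own hash function. The class name and the two bit-selection hashes in the usage lines are hypothetical, chosen only for illustration. Inserting an address sets one bit per filter, and a test reports a hit only if the address's bit is set in all k filters, so an inserted address is always found (no false negatives), while an address that was never inserted can still hit through aliasing (a false positive).

```python
from typing import Callable, List

HashFn = Callable[[int], int]


class ParallelBloomSignature:
    """Sketch: k Bloom filters of m/k bits each, one hash function per filter."""

    def __init__(self, m_bits: int, hash_fns: List[HashFn]):
        self.k = len(hash_fns)
        if m_bits % self.k:
            raise ValueError("m must be divisible by k")
        self.filter_bits = m_bits // self.k
        self.hash_fns = hash_fns
        self.filters = [0] * self.k  # each filter is a Python int used as a bit vector

    def insert(self, block_addr: int) -> None:
        # Set exactly one bit in each of the k filters.
        for i, h in enumerate(self.hash_fns):
            self.filters[i] |= 1 << (h(block_addr) % self.filter_bits)

    def test(self, block_addr: int) -> bool:
        # "Maybe present" only if the address's bit is set in every filter.
        return all(
            (self.filters[i] >> (h(block_addr) % self.filter_bits)) & 1
            for i, h in enumerate(self.hash_fns)
        )

    def clear(self) -> None:
        self.filters = [0] * self.k


if __name__ == "__main__":
    # 2kb signature with k = 2: two 1,024-bit filters indexed by hypothetical
    # bit-selection hashes over two disjoint 10-bit fields of the block address.
    sig = ParallelBloomSignature(2048, [lambda a: a & 0x3FF, lambda a: (a >> 10) & 0x3FF])
    sig.insert(0x12345)
    print(sig.test(0x12345))  # True: inserted addresses are always found
```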

  5. Signature hash functions
     • Which hash function is best? [Sanchez, MICRO’07]
        • Bit-selection? The hash simply decodes some number of input bits
        • H3? Each bit of a hash value is an XOR of (on avg.) half of the input address bits
     [Figure: LogTM-SE w/ 2kb signatures]
     • Result: H3 is better with >=2 hash functions
     • However, H3 uses many multi-level XOR trees
     • Can we improve this?
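To make the contrast between the two hash families concrete, here is a small Python sketch, assuming the address has already been reduced to a block address. Bit-selection just extracts c consecutive bits. For H3, each of the c output bits is the parity (XOR) of the input bits selected by a random mask; because each mask bit is set with probability 1/2, each output bit XORs about half of the input bits, which is what forces a multi-level XOR tree per output bit in hardware. The function names and the 26-bit block-address width in the usage lines are hypothetical.

```python
import random
from typing import List


def bit_select_hash(block_addr: int, low_bit: int, c: int) -> int:
    """Bit-selection: use c consecutive address bits directly as the filter index."""
    return (block_addr >> low_bit) & ((1 << c) - 1)


def make_h3_matrix(addr_bits: int, c: int, seed: int) -> List[int]:
    """One random addr_bits-wide mask per output bit (one row of the H3 matrix)."""
    rng = random.Random(seed)
    return [rng.getrandbits(addr_bits) for _ in range(c)]


def h3_hash(block_addr: int, rows: List[int]) -> int:
    """H3: output bit j is the XOR (parity) of the address bits selected by rows[j]."""
    out = 0
    for j, row in enumerate(rows):
        parity = bin(block_addr & row).count("1") & 1
        out |= parity << j
    return out


if __name__ == "__main__":
    rows = make_h3_matrix(addr_bits=26, c=10, seed=42)  # hypothetical 26-bit block address
    addr = 0x1234567
    print(bit_select_hash(addr, 0, 10), h3_hash(addr, rows))
```

Note that H3 is linear over XOR (h(a ^ b) = h(a) ^ h(b)), which is what makes it implementable purely with XOR gates.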

  6. H3 implementation
     [Figure: H3 XOR-tree implementation and number of XOR gates]
     • Ex: 2kb signatures, k=2, c=10, 32-bit addr = 160 XOR gates per signature
     • Can we reduce the total gate count?

  8. Entropy overview
     • Not all address bits have equal randomness
        • Ex: High-order address bits are unlikely to change if the working set size is small
     • Key insight: If random input bits are used as inputs to hash functions, the resulting hash values are random
     • Use entropy to measure bit randomness
        • Entropy: a measure of the uncertainty of a random variable x

  9. Entropy formally defined
     • Entropy = $H(x) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$
        • p(x_i) = the probability of the occurrence of value x_i
        • N = number of sample values random variable x can take on
     • Entropy = amount of information (in bits) required on average to describe the outcome of variable x
        • Ex: What is the best possible lossless compression?
     • Entropy value of an n-bit field:
        • max = n bits: all bit patterns in the n-bit field are equally likely
        • min = 0 bits: the n-bit field has a constant value
        • Other cases fall between these two
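The definition above can be applied directly to a trace of addresses. The Python sketch below estimates the entropy of one n-bit field from observed samples by counting how often each field value occurs; the function name and its parameters are hypothetical. It reaches the two bounds listed above: n bits when every pattern is equally frequent and 0 bits when the field is constant.

```python
import math
from collections import Counter
from typing import Iterable


def field_entropy(samples: Iterable[int], low_bit: int, width: int) -> float:
    """Empirical entropy (in bits) of the width-bit field starting at low_bit.

    H = -sum p(x_i) * log2 p(x_i), where p(x_i) is the observed frequency of
    each distinct field value x_i in the sample trace.
    """
    mask = (1 << width) - 1
    counts = Counter((s >> low_bit) & mask for s in samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sample")
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    trace = [0x1000, 0x1040, 0x1080, 0x10C0]  # hypothetical address trace
    print(field_entropy(trace, low_bit=6, width=8))
```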

  10. Our measures of entropy
     • For our workloads, we care about:
        • Q1: What is the best achievable entropy?
           • Global entropy: upper bound on the entropy of the address
        • Q2: How does entropy change within an address?
           • Local entropy: entropy of a bit-field within the address
     [Figure: address bits 6 to 31; global entropy is computed over the whole field, local entropy over a bit window shifted by NSkip]
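A minimal sketch of the two measures, assuming (from the figure labels and the 16-bit window with NSkip 0 to 10 on the next results slide) that the analyzed address bits are 6 through 31 and that the local window starts NSkip bits above bit 6. All function names and default parameters are hypothetical. Because any bit window is a function of the whole field, its entropy can never exceed the global entropy, which is why global entropy acts as the upper bound.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def field_entropy(samples: Sequence[int], low_bit: int, width: int) -> float:
    """Empirical entropy (bits) of the width-bit field starting at low_bit."""
    mask = (1 << width) - 1
    counts = Counter((s >> low_bit) & mask for s in samples)
    total = len(samples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def global_entropy(samples: Sequence[int], base_bit: int = 6, top_bit: int = 31) -> float:
    """Entropy of the whole analyzed field (assumed bits base_bit..top_bit)."""
    return field_entropy(samples, base_bit, top_bit - base_bit + 1)


def local_entropy_sweep(samples: Sequence[int], base_bit: int = 6, window: int = 16,
                        max_nskip: int = 10) -> Dict[int, float]:
    """Local entropy of a window-bit field starting at base_bit + NSkip, per NSkip."""
    return {nskip: field_entropy(samples, base_bit + nskip, window)
            for nskip in range(max_nskip + 1)}


if __name__ == "__main__":
    trace = [i << 6 for i in range(1024)]  # hypothetical: 1,024 consecutive blocks
    print(global_entropy(trace), local_entropy_sweep(trace))
```

With a small, contiguous working set like the one in the usage lines, local entropy falls as NSkip moves the window toward high-order bits, the same trend the entropy results report.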

  12. Entropy results
     • Workloads to be described later
     • Global entropy is at most 16 bits
     • Bit-window for local entropy is 16 bits wide (NSkip from 0 to 10)
        • Smaller windows (<16b) may not reach the global entropy value
        • Larger windows (>16b) hide some fine-grain info

  13. Entropy results summary
     • More entropy results in our MICRO paper
     • In summary, for our workloads, entropy monotonically decreases when moving toward high-order bits
     • We calculate the average entropy across the entire workload’s execution
        • May miss entropy changes due to program phase behavior
     • Our Page-Block-XOR (PBX) hash takes advantage of this overall trend

  14. Page-Block-XOR (PBX)
     • Motivated by 3 findings:
        • (1) Lower-order bits have the most entropy
           • Follows from our entropy results
        • (2) XORing two bit-fields produces random hash values
           • From prior work on XOR hashing (e.g., data placement in caches, DRAM)
        • (3) Bit-field overlaps can lead to higher false positives
           • Correlation between the two bit-fields can reduce the range of hash values produced (worse for larger signatures)

  15. PBX implementation
     • PPN and Cache-index fields are not tied to system parameters:
        • Use entropy to find two non-overlapping bit-fields with high randomness
     • For 2kb signatures with 2 hash functions:
        • 20 XOR gates for PBX vs. 160 XOR gates for H3!
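The gate count follows from the structure of PBX: each of the c index bits is the XOR of one bit from each of two non-overlapping fields, so a hash function needs exactly c two-input XOR gates (2 × 10 = 20 for two 10-bit hash functions). The Python sketch below is a hedged illustration only; the field positions are hypothetical stand-ins for the "cache-index" and "PPN" fields that entropy measurements would select, and the distinct field pairs used for the second hash function in the usage lines are likewise an assumption, not the paper's choice.

```python
def pbx_hash(block_addr: int, low_field_start: int, high_field_start: int, c: int) -> int:
    """Page-Block-XOR sketch: XOR two non-overlapping c-bit fields of the block address.

    Each of the c output bits costs one 2-input XOR gate in hardware.
    """
    if abs(high_field_start - low_field_start) < c:
        raise ValueError("bit-fields overlap")
    mask = (1 << c) - 1
    low = (block_addr >> low_field_start) & mask    # hypothetical "cache-index" field
    high = (block_addr >> high_field_start) & mask  # hypothetical "PPN" field
    return low ^ high


def pbx_xor_gates(k: int, c: int) -> int:
    """One XOR gate per output bit per hash function."""
    return k * c


if __name__ == "__main__":
    addr = 0xABCDE
    # Two hash functions for a 2kb signature (k = 2, c = 10) using hypothetical field pairs.
    h1 = pbx_hash(addr, 0, 10, 10)
    h2 = pbx_hash(addr, 5, 15, 10)
    print(h1, h2, pbx_xor_gates(k=2, c=10))
```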

  16. Summary thus far
     • Problem 1: H3 has high area and power overheads
     • Solution 1: Use entropy analysis to guide lower-cost PBX
        • Ex: 160 gates for H3 vs. 20 gates for PBX
     • Problem 2: Spurious signature conflicts caused by signature bits set by private memory addresses
     • Solution 2: To be described

  18. Motivation
     • False conflicts are caused by thread-private addresses
     • Such conflicts are avoided if these addresses are not inserted into the thread’s signatures

  19. Privatization solutions
     • Two solutions proposed:
        • (1) Remove private stack references from signatures
           • Very little work for the programmer/compiler
           • Benefits depend on the fraction of stack addresses among all transactional references
        • (2) Language-level interface (e.g., private_malloc(), shared_malloc())
           • Even higher performance boost
           • For skilled programmers
           • WARNING: Incorrectly marking shared objects as private can lead to program errors!

  20. Page-based implementation
     • Each page is assigned a status: private or shared
        • Invariant: A page is shared if any object on it is shared
     • If the stack is private, the library marks stack pages as private
     • If the privatization heap functions are used, heap pages are marked accordingly
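One way to picture the invariant is a per-page status table that only ever upgrades a page from private to shared when an object is recorded on it. The Python sketch below is hypothetical throughout: the class, its methods, the 8 KB default page size, and the choice to treat pages with no recorded status as shared (the conservative option, given the warning about mismarking shared data) are all assumptions for illustration.

```python
from typing import Dict


class PageStatusTable:
    """Sketch of per-page private/shared status.

    Invariant: a page is shared if any object on it is shared.
    """

    def __init__(self, page_size: int = 8192):  # hypothetical 8 KB pages
        self.page_size = page_size
        self.status: Dict[int, bool] = {}  # page number -> True if shared

    def _pages(self, addr: int, size: int) -> range:
        first = addr // self.page_size
        last = (addr + size - 1) // self.page_size
        return range(first, last + 1)

    def record_object(self, addr: int, size: int, shared: bool) -> None:
        for page in self._pages(addr, size):
            # A shared object upgrades every page it touches; a private object
            # never downgrades a page, since other objects on it may be shared.
            self.status[page] = self.status.get(page, False) or shared

    def is_shared(self, addr: int) -> bool:
        # Pages with no recorded status are conservatively treated as shared.
        return self.status.get(addr // self.page_size, True)


if __name__ == "__main__":
    table = PageStatusTable()
    table.record_object(0, 100, shared=False)   # e.g., a private_malloc() object
    table.record_object(200, 100, shared=True)  # e.g., a shared_malloc() object on the same page
    print(table.is_shared(50))  # True: the page now holds a shared object
```

Packing objects with the same status onto the same pages (as the OS-support slide describes) keeps this upgrade rule from turning most private pages shared.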

  21. OS support
     • The OS allocates different physical page frames for shared and private pages
        • Sets a per-frame bit in the translation entry if shared
        • Reduces the number of page frames used by packing objects with the same status together
     • Signatures insert only the memory addresses of transactional references to shared pages
        • Query the page-sharing bit in the HW TLB and the current transactional status
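The insertion decision described above combines two pieces of state that are available at access time: the sharing bit carried in the TLB entry and whether the thread is currently inside a transaction. A minimal Python sketch of that filter follows; the `TlbEntry` layout, the dictionary used as a TLB, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TlbEntry:
    """Hypothetical TLB entry: translation plus the per-frame sharing bit."""
    vpn: int
    pfn: int
    shared: bool


def should_insert(entry: TlbEntry, in_transaction: bool) -> bool:
    """Insert into the signature only for transactional references to shared pages."""
    return in_transaction and entry.shared


def refs_to_insert(refs: List[int], tlb: Dict[int, TlbEntry], page_size: int,
                   in_transaction: bool) -> List[int]:
    """Filter a stream of virtual addresses down to those the signature would record."""
    return [a for a in refs if should_insert(tlb[a // page_size], in_transaction)]


if __name__ == "__main__":
    tlb = {1: TlbEntry(1, 10, shared=True), 2: TlbEntry(2, 11, shared=False)}
    print(refs_to_insert([8192 + 8, 16384 + 8], tlb, 8192, in_transaction=True))
```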

  23. Methodology
     • Full-system simulation using Simics and Wisconsin GEMS timing modules
     • Transistor-level design for area & power of XOR gates
     • CACTI for Bloom filter bit-array area & power
     • Simulated system:
        • Single-chip CMP
        • 16 single-threaded, in-order cores
        • 32kB, 4-way private L1 I & D, write-back
        • 8MB, 8-way shared L2 cache
        • MESI directory protocol
        • Signatures from 64b to 64kb (8B to 8kB) & “Perfect”

  24. Workloads
     • Micro-benchmarks
        • BTree: read and write ops on a shared tree
        • Sparse Matrix: algorithm from a dense column vector multiplication kernel
     • SPLASH-2 apps
        • Barnes & Raytrace: exert the most signature pressure
     • Stanford STAMP apps
        • Vacation, Genome, Delaunay, Bayes, Labyrinth
     • DNS server
        • BIND

  26. PBX vs H3 area & power
     [Figure: area & power overheads (2kb, k=4)]

  27. PBX vs H3 execution time
     • PBX performs similarly to H3
     • Additional workload results in the paper

  28. Privatization results summary
     • Removing private stack references from signatures did not help much
        • Most address references are not to the stack
        • Most likely because we run with the SPARC ISA; other ISAs (e.g., x86) likely benefit more
     • The privatization interface helps four workloads
        • The remaining workloads either have no private heap structures or do not have a high transactional duty cycle

  29. Privatization interface results

  31. Conclusions
     • Tackle 2 problems with signature designs:
        • (1) Area and power overheads of H3 hashing
           • E.g., 160 XOR gates for H3, 20 for PBX
        • (2) False conflicts due to signature bits set by private memory references
     • Our solutions:
        • (1) Use entropy analysis to guide the hash function (PBX), a low-cost alternative that performs similarly to H3
        • (2) Prevent private stack references from entering signatures, and propose a privatization interface for heap allocations
     • Notary can be applied to non-TM uses:
        • PBX hashing can transfer directly
        • Privatization may transfer if address filtering applies

  32. Future Work
     • Dynamic entropy calculation:
        • How can PBX hashing adapt to entropy changes over time?
     • Dynamic privatization characteristics:
        • How common is it for objects to change sharing status (i.e., from private to shared, and vice versa)?

  33. BACKUP SLIDES

  34. Privatization interface

  35. Dynamic privatization
     • Dynamically switch from private to shared, and vice versa
     • If transitioning from private -> shared, it is safe to mark the page as shared (at a cost in performance)
     • If transitioning from shared -> private, the default policy is to disallow the switch if other shared objects exist on the same page
        • Otherwise, trap to user software and let the programmer call shared_free(), followed by private_malloc(), on the object
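The default policy can be summarized as a small decision function. In this Python sketch, the enum and function names are hypothetical, and the case where a shared object becomes private with no other shared objects on its page is assumed to be allowed, since the slide only states when the switch is disallowed. The software alternative (shared_free() followed by private_malloc()) is left to the programmer and not modeled here.

```python
from enum import Enum


class Action(Enum):
    MARK_PAGE_SHARED = "mark page shared"
    MARK_PAGE_PRIVATE = "mark page private"
    DISALLOW = "disallow"


def sharing_transition(to_shared: bool, other_shared_objects_on_page: bool) -> Action:
    """Hypothetical sketch of the default dynamic-privatization policy."""
    if to_shared:
        # private -> shared is always safe: addresses on the page simply start
        # entering signatures again (at some cost in false conflicts).
        return Action.MARK_PAGE_SHARED
    if other_shared_objects_on_page:
        # The page must stay shared to preserve "shared if any object is shared".
        return Action.DISALLOW
    # Assumption: with no other shared objects on the page, the page may become private.
    return Action.MARK_PAGE_PRIVATE


if __name__ == "__main__":
    print(sharing_transition(to_shared=False, other_shared_objects_on_page=True))
```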

  36. Bit-field overlaps harmful for PBX

  37. Removing stack refs doesn’t help significantly

  38. Entropy of commercial workloads

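  39. Signature Operation Example
     Program: xbegin; LD A; ST B; LD C; LD D; ST C; …
     • Read addresses (A, C, D) are hashed into the read signature (R); written addresses (B, C) into the write signature (W)
     • External stores (ST E, ST F) are checked against the signatures: one finds no conflict, while the other aliases with bits set by the transaction’s own addresses and is reported as a conflict (a false positive)
     [Figure: hash function(s) mapping addresses to 8-bit R and W signature bit vectors]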
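The example can be replayed with a toy model. In this Python sketch, the 8-bit signatures, the `addr % 8` hash, and the numeric values chosen for A through F are all hypothetical, picked so that one external store misses every set bit while the other aliases with A. The conflict rule used (an external store conflicts with the read or write signature, an external load only with the write signature) is the usual read/write conflict rule for transactional memory.

```python
class Signature:
    """Toy 8-bit signature with a single hypothetical hash: addr % 8."""

    def __init__(self, bits: int = 8):
        self.bits = bits
        self.vec = 0

    def _h(self, addr: int) -> int:
        return addr % self.bits

    def insert(self, addr: int) -> None:
        self.vec |= 1 << self._h(addr)

    def member(self, addr: int) -> bool:
        return bool((self.vec >> self._h(addr)) & 1)


class Transaction:
    def __init__(self):
        self.R = Signature()  # read signature
        self.W = Signature()  # write signature

    def load(self, addr: int) -> None:
        self.R.insert(addr)

    def store(self, addr: int) -> None:
        self.W.insert(addr)

    def conflicts_with_external(self, op: str, addr: int) -> bool:
        # An external store conflicts with our reads or writes; an external load
        # conflicts only with our writes.
        if op == "ST":
            return self.R.member(addr) or self.W.member(addr)
        return self.W.member(addr)


if __name__ == "__main__":
    A, B, C, D, E, F = 2, 3, 5, 6, 8, 10  # hypothetical addresses
    tx = Transaction()  # xbegin
    tx.load(A); tx.store(B); tx.load(C); tx.load(D); tx.store(C)
    print(tx.conflicts_with_external("ST", E))  # False: bit 0 is clear in R and W
    print(tx.conflicts_with_external("ST", F))  # True: F aliases with A (false positive)
```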

  40. Type of Hash Functions
     • Hash functions considered:
        • Bit-selection (inexpensive, low quality)
        • H3 [Carter, CSS79] (moderate cost, higher quality)
     • In real programs, addresses are neither independent nor uniformly distributed (key assumptions used to derive P_FP(n))
     • But good (universal/almost universal) hash functions can generate hash values that are almost uniformly distributed and uncorrelated
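For reference, under exactly those idealized assumptions (independent, uniformly distributed hash values), the false-positive probability of k parallel Bloom filters of m/k bits after n distinct insertions has the standard form P_FP(n) = (1 - (1 - k/m)^n)^k: each insertion leaves a given bit of a filter clear with probability 1 - k/m, and a false positive needs the probed bit to be set in all k filters. The Python helper below simply evaluates that expression; the function name is hypothetical, and real address streams can deviate from it for the reasons listed above.

```python
def false_positive_probability(n: int, m: int, k: int) -> float:
    """P_FP(n) for k parallel Bloom filters of m/k bits after n distinct insertions,
    assuming independent, uniformly distributed hash values."""
    p_bit_set = 1.0 - (1.0 - k / m) ** n  # chance a given filter bit has been set
    return p_bit_set ** k                  # the probed bit must be set in all k filters


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, false_positive_probability(n, m=2048, k=2))
```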
