an integrated hardware software approach to transactional memory
Download
Skip this Video
Download Presentation
An Integrated Hardware-Software Approach to Transactional Memory

Loading in 2 Seconds...

play fullscreen
1 / 16

An Integrated Hardware-Software Approach to Transactional Memory - PowerPoint PPT Presentation


  • 70 Views
  • Uploaded on

An Integrated Hardware-Software Approach to Transactional Memory. Sean Lie 6.895 Theory of Parallel Systems Monday December 8 th , 2003. if (i<j) { a = i; b = j; } else { a = j; b = i; } Lock(L[a]); Lock(L[b]); Flow[i] = Flow[i] – X; Flow[j] = Flow[j] + X;

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' An Integrated Hardware-Software Approach to Transactional Memory' - unity-palmer


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
an integrated hardware software approach to transactional memory

An Integrated Hardware-Software Approach to Transactional Memory

Sean Lie

6.895 Theory of Parallel Systems

Monday December 8th, 2003

transactional memory
if (i<j) {

a = i; b = j;

} else {

a = j; b = i; }

Lock(L[a]); Lock(L[b]);

Flow[i] = Flow[i] – X;

Flow[j] = Flow[j] + X;

Unlock(L[b]); Unlock(L[a]);

StartTransaction;

Flow[i] = Flow[i] – X;

Flow[j] = Flow[j] + X;

EndTransaction;

Transactional Memory
  • Transactional memory provides atomicity without the problems associated with locks.

Locks

Transactional Memory

  • I propose an integrated hardware-software approach to transactional memory.
hardware transactional memory
HTM: Transactional memory can be implemented in hardware using the cache and cache coherency mechanism. [Herlihy & Moss]

Uncommitted transactional data is stored in the cache.

Transactional data is marked in the cache using an additional bit per cache line.

HTM has very low overhead but has size and length limitations.

Trans Tag Status

Data

1

0

1

0

Hardware Transactional Memory
software transactional memory
STM: Transactional memory can be implemented in software using compiler and library support.

Uncommitted transactional data is stored in a copy of the object.

Transactional data is marked by flagging the object field.

STM does not have size or length limitations but has high overhead.

Original Object

Trans Version

Readers List

NULL

248

NULL

NULL

‘a’

FLAG

48

77

NULL

NULL

“Mass. Ave.”

Software Transactional Memory

FLEX Software

Transaction System

results
Results
  • An integrated approach gives the best of both worlds.
    • Common case:
      • HTM mode - Small/short transactions run fast.
    • Uncommon case:
      • STM mode - Large/long transactions are slower but possible.
  • An integrated hardware-software transactional memory system was implemented and evaluated.
    • HTM was implemented in the UVSIM software simulator.
    • A subset of STM functionality was implemented for the benchmark applications.
    • HTM was modified to be software-compatible.
hardware vs software
Hardware vs. Software
  • HTM has much lower overhead than STM.
  • A network flow µbenchmark (node-push) was implemented for evaluating overhead.
  • However, HTM has 2 serious limitations.

1 processor overheads:

“worst” case:

Back-to-back small transactions

“more realistic” case:

Some processing between small transactions

hardware limitation cache capacity
HTM uses the cache to hold all transactional data.

Therefore, HTM aborts transactions larger than the cache.

Restricting transaction size is awkward and not modular.

Size will depend on associativity, block size, etc. in addition to cache size.

Cache configuration change from processor to processor.

Trans Tag Status

Data

1

1

1

1

Hardware Limitation:Cache Capacity
hardware limitation context switches
The cache is the only transactional buffer for all threads.

Therefore, HTM aborts transactions on context switches.

Restricting context switches is awkward and not modular.

Context switches occur regularly in modern systems (e.g.. TLB exceptions).

Cache

Processor

Trans data for thread a

Thread a

Thread b

Hardware Limitation:Context Switches
hstm an integrated approach
Transactions are switched from HTM to STM when necessary.

When a transaction aborts in HTM, it is restart in STM.

HTM is modified to be software-compatible.

Trans Tag Status

Data

1

0

Readers List

248

‘a’

FLAG

1

0

77

“Mass. Ave.”

HSTM: An Integrated Approach
software compatible htm
1 new instruction: xAbort

Additional checks are performed in software:

On loads, check if the memory location is set to FLAG.

On stores, check if there is a readers list.

Software checks are slow.

Original Object

Store Check

null ?

Normal Memory

Operation

Y

Readers List

Load Check

248

flag ?

N

‘a’

FLAG

N

Y

Abort

77

“Mass. Ave.”

Software-Compatible HTM
  • Performing checks adds a 2.2x performance overhead over pure HTM in “worst” case (1.1x in “more realistic” case).
overcoming size limitations
Overcoming Size Limitations
  • The node-push benchmark was modified to touch more nodes to evaluate size limitations.
  • HSTM uses HTM when possible and STM when necessary.

HTM Transactions stop fitting after this point

overcoming size limitations1
Overcoming Size Limitations
  • The node-push benchmark was modified to touch more nodes to evaluate size limitations.
  • HSTM uses HTM when possible and STM when necessary.

HTM Transactions stop fitting after this point

overcoming context switching limitations
Overcoming Context Switching Limitations
  • Context switches occur on TLB exceptions.
  • The node-push benchmark was modified to choose from a larger set of nodes.
    • More nodes  higher probability of TLB miss (Pabort).
  • HSTM behaves like HTM when Pabort is low and like STM when Pabortis high.
overcoming context switching limitations1
Overcoming Context Switching Limitations
  • Context switches occur on TLB exceptions.
  • The node-push benchmark was modified to choose from a larger set of nodes.
    • More nodes  higher probability of TLB miss (Pabort).
  • HSTM behaves like HTM when Pabort is low and like STM when Pabortis high.
conclusions
Conclusions
  • An integrated approach gives the best of both worlds.
    • Common case:
      • HTM mode - Small/short transactions run fast.
    • Uncommon case:
      • STM mode - Large/long transactions are slower but possible.
  • Trade-offs:
    • “STM mode” is not has fast as pure STM.
      • This is acceptable since it is uncommon.
    • “HTM mode” is not has fast as pure HTM.
      • Is this acceptable?
future work
Future Work
  • Full implementation of STM in UVSIM
  • Integration of software-compatible HTM into the FLEX compiler
  • Evaluate how software-compatible HTM performs for parallel applications
  • Should software-compatible modifications be moved into hardware?
  • Can a transaction be transferred from hardware to software during execution?
ad