
SCC Development Experiences

SCC Development Experiences. Alexey Pakhunov /XCG, Microsoft Research/, alexeypa@microsoft.com. March 30th, 2011. Overview: Black Cloud OS is a fork of Singularity OS and our playground for experimenting with message passing in a non-cache-coherent environment.



Presentation Transcript


  1. SCC Development Experiences
     Alexey Pakhunov /XCG, Microsoft Research/
     alexeypa@microsoft.com
     March 30th, 2011

  2. Overview
     • Black Cloud OS:
       • A fork of Singularity OS
       • Our playground for experimenting with message passing in a non-cache-coherent environment
     • This presentation covers only our development experiences on the SCC
     • Submission of the paper is on its way

  3. What is Singularity?
     • A quote from the Singularity home page: “A research operating system prototype, extending programming languages, and developing new techniques and tools for specifying and verifying program behavior”
     • Written in managed code
       • Some Assembler and C++ in the boot loader and kernel
     • IPC and inter-component communications are based on passing messages

  4. Our setup
     [Diagram: the SCC drawn as a 4 × 6 grid of tiles, each with a router (R), plus four DDR3 memory controllers (MC), the VRC, and the System Interface. The System Interface is attached over PCI-E to the Management Console (Linux), which runs sccTcpServer/mceGui. The Desktop PC (Windows), running RcLoader.Net, KdProxy, WinDbg, etc., talks to the Management Console over TCP/IP.]

  5. RcLoader.Net
     • Configuration
       • Generates the system memory map
       • Configures the SCC registers
       • Uploads the boot loader and OS images
       • Supports manual editing of the SCC configuration
     • Debugging
       • Allows inspecting the memory and configuration registers

  6. The memory map (top of the address space first)
     • 0xFC000000 – 0xFFFFFFFF: Shared memory (OS image, the initial jmp)
     • Unused
     • 0xC0000000 – 0xC3FFFFFF: Shared memory buffers (256 KB per core)
     • 0xA0000000 – 0xB7FFFFFF: Configuration space
     • 0x80000000 – 0x97FFFFFF: MPB (16 KB per tile)
     • Unused
     • 0x00000000 – up to 0x54FFFFFF: Private memory (336 MB – 1360 MB)
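The memory map on this slide can be written down as a small lookup table, which makes it easy to check which region an address falls into and how large each window is. This is only a sketch: the names `REGIONS`, `region_of`, and `size_mb` are illustrative and are not taken from RcLoader.Net, the private-memory entry uses the stated maximum upper bound, and every address outside the listed ranges is reported as "unused" even where the slide does not label the gap.

```python
# Address ranges copied from the memory-map slide; bounds are inclusive.
# REGIONS, region_of and size_mb are hypothetical names used for illustration.
REGIONS = [
    (0x00000000, 0x54FFFFFF, "private memory"),  # upper bound is the stated maximum
    (0x80000000, 0x97FFFFFF, "MPB"),
    (0xA0000000, 0xB7FFFFFF, "configuration space"),
    (0xC0000000, 0xC3FFFFFF, "shared memory buffers"),
    (0xFC000000, 0xFFFFFFFF, "shared memory (OS image)"),
]


def region_of(addr):
    """Return the name of the region containing addr, or "unused"."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return "unused"


def size_mb(start, end):
    """Size of an inclusive address range in whole megabytes."""
    return (end - start + 1) // (1024 * 1024)


if __name__ == "__main__":
    # Print each window with its size, lowest address first.
    for start, end, name in REGIONS:
        print(f"{start:#010x} - {end:#010x}  {size_mb(start, end):5d} MB  {name}")
```

For example, `region_of(0xFFFFFFF0)` lands in the top shared-memory window. That address is the x86 reset vector, which fits the slide's note that this window holds "the initial jmp".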

  7. Debugging challenges
     • No serial port or console
       • Memory at 0xb8000 is the console buffer
       • I/O redirection doesn’t work as expected
       • Executing an IN or OUT instruction effectively halts the core and sccTcpServer
     • Serial KD transport is emulated
       • A couple of ring buffers on the SCC side
       • KdProxy.exe exposes a named pipe interface for the debugger
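The slide does not describe how the ring buffers behind the emulated serial transport are laid out, so the following is a generic single-producer, single-consumer byte ring rather than the actual Black Cloud OS code. The class name `RingBuffer` and its methods are assumptions made for illustration. One such ring per direction (core to host, host to core) would give the "couple of ring buffers" the slide mentions.

```python
class RingBuffer:
    """Single-producer, single-consumer byte ring over a fixed-size buffer.

    Hypothetical sketch: one slot is always left empty so that
    head == tail unambiguously means "empty".
    """

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.head = 0  # next index to read; advanced only by the consumer
        self.tail = 0  # next index to write; advanced only by the producer

    def write(self, data):
        """Copy as many bytes as fit and return how many were written."""
        written = 0
        for byte in data:
            nxt = (self.tail + 1) % len(self.buf)
            if nxt == self.head:  # ring is full, drop the rest
                break
            self.buf[self.tail] = byte
            self.tail = nxt
            written += 1
        return written

    def read(self, max_bytes):
        """Remove and return up to max_bytes bytes from the ring."""
        out = bytearray()
        while self.head != self.tail and len(out) < max_bytes:
            out.append(self.buf[self.head])
            self.head = (self.head + 1) % len(self.buf)
        return bytes(out)
```

Because each index is advanced by only one side, a design like this needs no lock between the producer and the consumer. On a non-coherent machine the indices and data would additionally have to live in uncached or explicitly flushed memory, which this sketch does not model.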

  8. Porting challenges
     • No BIOS
       • The system memory map is patched directly in the boot loader
     • No standard devices
       • Local APIC is used instead of the i8254 timer and PIC
       • No RTC clock
     • No modern instructions supported
       • Context handling code was updated due to the lack of MMX
       • The 32-bit flavor of Singularity uses only x87 for floating point calculations
       • The Bartok compiler was patched due to the lack of CMOV instructions

  9. Experimental hardware
     • Turning on the MPB bypass bit causes a race that leads to memory corruption
       • Minus three days of debugging :-)
       • We couldn’t take advantage of fast MPB access
     • Large pages cannot be used together with MPB
       • Singularity uses large pages to create the identity mapping spanning 4 GB

  10. Interface
     • A telnet connection to each core
     • The same serial transport emulation via KdProxy.exe was used

  11. Cache coherency matters
     • A read-only OS image is shared among all cores
     • Message passing code uses MPB-mapped buffers and CL1FLUSH-aware memcpy()
     • Large shared memory storage is accessible via dynamically remapped LUTs
       • R/W access is possible with proper cache flushing and/or caching settings in PTEs
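Why the copy routine has to be cache-aware can be shown with a toy model of two cores that share memory but have private, non-coherent caches. This is not a model of the SCC's real cache hierarchy: the classes `SharedMemory` and `Core`, the write-through policy, and the per-address `invalidate` call are all simplifying assumptions made for illustration.

```python
class SharedMemory:
    """Backing store visible to every core."""

    def __init__(self):
        self.data = {}


class Core:
    """A core with a private write-through cache and no hardware coherency."""

    def __init__(self, mem):
        self.mem = mem
        self.cache = {}

    def write(self, addr, value):
        # Write-through: update the local cache and the shared memory.
        self.cache[addr] = value
        self.mem.data[addr] = value

    def read(self, addr):
        # A hit returns the cached copy, even if memory has changed since.
        if addr not in self.cache:
            self.cache[addr] = self.mem.data.get(addr, 0)
        return self.cache[addr]

    def invalidate(self, addr):
        # Software-managed coherency: drop the local copy before re-reading.
        self.cache.pop(addr, None)


if __name__ == "__main__":
    mem = SharedMemory()
    sender, receiver = Core(mem), Core(mem)
    receiver.read(0x10)                # receiver caches the old value
    sender.write(0x10, 42)             # sender publishes a message word
    print(receiver.read(0x10))         # stale copy from the receiver's cache
    receiver.invalidate(0x10)
    print(receiver.read(0x10))         # fresh value after invalidation
```

In this model the receiver keeps seeing its cached value until it explicitly invalidates the line, which is the role the flush step plays in a cache-aware `memcpy()` on hardware without coherency.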

  12. Performance
     • Core’s memory interface bandwidth is limited
       • One outstanding memory operation

  13. Performance • Memory controller bandwidth is limited

  14. Conclusions
     • The SCC is an experimental platform tailored for message passing
       • Lack of cache coherency makes us think hard about message passing
       • The chip has enough cores to play with scalability
     • Compare apples to apples
       • The cache and memory subsystems are significantly different
       • The SCC is super parallel, not super fast

  15. Q&A
