
RAMP Retreat Summer 2006

RAMP Retreat Summer 2006. Breakout Session Leaders & Questions: Greg Gibeling, Derek Chiou, James Hoe, John Wawrzynek & Christos Kozyrakis, 6/21/2006. Breakout Topics: RDL & Design Infrastructure; RAMP White; Caches, Network & IO (Uncore); RAMP2 Hardware; BEE3; OS, VM and Compiler; Software Stack.


Presentation Transcript


  1. RAMP Retreat Summer 2006 Breakout Session Leaders & Questions Greg Gibeling, Derek Chiou, James Hoe, John Wawrzynek & Christos Kozyrakis 6/21/2006

  2. Breakout Topics • RDL & Design Infrastructure • RAMP White • Caches, Network & IO (Uncore) • RAMP2 Hardware • BEE3 • OS, VM and Compiler • Software Stack

  3. RDL & Design Infrastructure • Leader/Reporter: Greg Gibeling • Topics • Features & Schedule • Proposals • Multi-platform migration • Languages • Which languages, priorities • Assignments for support • Debugging – Models & Requirements • Retargeting to ASICs (Platform Optimization)

  4. RDL & DI Notes (1) • Languages • Hardware • Verilog • BlueSpec • IBM uses VHDL • Software? • Multi-Platform • Integration of hardware simulations • Control of multiplexing • Needed for efficiency! • Possible through channel & link parameters • Features • Meta-types • Component (and unit) libraries

  5. RDL & DI Notes (2) • Debugging • Split target model • RDL Target Design • Exposed to a second level of RDL • Allows statistics aggregation • Modeling of noisy channels • Integration with unit internals • Event & State Extraction • Connection to processor debugging tools • People clearly want this ASAP

  6. RDL & DI Notes (3) • Debugging (Integrated) • Message tracing • Causality Diagrams • Framework to debug through units • Checkpoints • Injection • Single stepping • May not be widely used • But cheap to implement • Watch/Breakpoints
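The message tracing and watch/breakpoint ideas above can be sketched as a few lines of host-side tooling. This is a hypothetical illustration, not the actual RDL framework; `TracedChannel` and all of its fields are invented names:

```python
class TracedChannel:
    """Hypothetical sketch of RDL-level message tracing with watchpoints.
    A real framework would pause or single-step the design on a hit."""
    def __init__(self, watch=None):
        self.trace = []      # every message, in causal order
        self.watch = watch   # predicate over payloads (a "watchpoint")
        self.hit = False

    def send(self, src, dst, payload):
        self.trace.append((src, dst, payload))
        if self.watch is not None and self.watch(payload):
            self.hit = True  # breakpoint condition fired

# Break when an invalidation message crosses the channel:
ch = TracedChannel(watch=lambda p: p["type"] == "inval")
ch.send("l1cache", "memctl", {"type": "read", "addr": 0x40})
ch.send("memctl", "l1cache", {"type": "inval", "addr": 0x40})
# ch.hit is now True; ch.trace holds both messages for causality diagrams
```

Because the trace is just a list of (source, destination, payload) tuples, the same log can drive message tracing, causality diagrams, and replay-style injection.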

  7. RDL & DI Notes (4) • Why Java? • Runs on various platforms • Recompilation is generally pretty painful • Decent type system in Java 1.5 • Perfect for plugin infrastructure (e.g. OSGi) • When to use RDL • Detailed timing model • Great at abstracting inter-chip comm • Perfect platform for partitioning designs • Concise, logical specification • Support for the debugging framework • With standard interfaces, good for sharing

  8. RDL & DI Notes (5) • Basic Infrastructure • First system bringup • Interfaces with workstations • Initial board support • Standard interfaces (RDL and otherwise) • Processor Replacements • Board Support • Currently a heroic effort • Solutions • Standardized components? • Generators?

  9. RDL & DI Notes (6) • Timelines • Greg’s Goals • 10/2006 should see RCF/RDLC3 • 11/2006 should see documentation • Debugging (Integrated) should be ASAP • Manpower • Board support • First board bring up • RDL & RDLC users • Standard interfaces • Features & Documentation

  10. RAMP White • Leader/Reporter: Derek Chiou • Topics • Two day break-out • First day should be pro/con • Overall • Preliminary Plan Evaluation • Who is doing exactly what? • ISA for RAMP White • OpenSPARC • 32bit Leon • PowerPC 405 • Processor agnosticism • Implementation • Reimplementation will be required • Test suites from companies are very useful

  11. RAMP White Notes (1) • Use embedded PowerPC core first • Available • Debugged • Can run full OS today • FPGA chip space is already committed • PowerPC and Sparc are both candidates • PowerPC pros • Embedded processor is PowerPC • Sparc pros • 64b available today • Wait and see on soft-core for RAMP White

  12. RAMP White Notes (2) • >= 256 processors • Can buy 64 processors today • Reasonable speed • 10’s of MHz • With 280K LUTs in Virtex 5, assume 50% for processor but 80% for ease of place-and-route • 100K LUTs for processors • Need 4 per FPGA (16 per board, 16 boards) • 25K LUTs per processor
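The LUT budget on this slide is easy to check mechanically. A quick sketch of the arithmetic, using the figures from the slide (`processors_per_fpga` is just an illustrative helper, not a RAMP tool):

```python
def processors_per_fpga(total_luts, logic_fraction, route_fraction, luts_per_cpu):
    """LUTs left for processors after the utilization and routability margins."""
    usable = total_luts * logic_fraction * route_fraction
    return int(usable // luts_per_cpu)

# Slide figures: 280K LUTs (Virtex-5), 50% of the chip for processors,
# 80% usable for ease of place-and-route, 25K LUTs per processor.
cpus = processors_per_fpga(280_000, 0.50, 0.80, 25_000)  # 112K usable -> 4 per FPGA
total = cpus * 16 * 16  # 16 FPGAs/board, 16 boards -> 1024 processors
```

With 4 processors per FPGA the 16-FPGA-board, 16-board system comfortably clears the >= 256 processor goal.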

  13. RAMP White Notes (3) • Embedded PowerPC core (it’s there and better performance than any soft-core) • Soft L1 data cache (no L2) • Hard L1 instruction cache • Emulation???? • Ring coherence (a la IBM) • Linux on top of embedded PowerPC core • NSF mount for disk access • Mark’s port of Peh’s and Dally’s router • To do: • Ring coherence + L1 data cache + memory interface • RDL for modules • Software port • Timing models for memory, ring, cache, processor? • integration
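Ring coherence (a la IBM) has a simple first-order cost model: on a unidirectional ring, a message visits every node between source and destination. A toy sketch of the hop count (illustrative only; the real protocol also carries snoop responses around the ring):

```python
def ring_hops(src, dst, n_nodes):
    """Hops a message travels on a unidirectional ring of n_nodes."""
    return (dst - src) % n_nodes

# On a 16-node ring: a near neighbor is cheap, the node "behind" you
# costs almost a full loop.
near = ring_hops(2, 5, 16)   # 3 hops
far = ring_hops(5, 2, 16)    # 13 hops
worst = max(ring_hops(0, d, 16) for d in range(16))  # 15 hops
```

This is the kind of latency asymmetry the timing models for memory, ring, and cache would need to capture.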

  14. RAMP White Notes (4) • RAMP-White Greek • Beta • More general fabric using same router • Still use ring coherence • Gamma • James Hoe’s coherence engine • Delta • Soft core integration

  15. Caches, Networks & IO (Uncore) • Leader/Reporter: James Hoe • Topics • CPU, Cache and Memories • Hybrid FPGA Cosimulation • Network Storage • Interfaces • Especially with respect to interfaces • Components, not sub-frameworks • Phase uncore abilities

  16. Uncore Notes (1) • A full system has more than just CPUs and memory • I/O is very important • Getting RAMP to “work” • Just like the real thing (from SW and the OS’s perspective) • Software porting/development • Performance studies • Someone has to build the “uncore”? • Co-simulation • Direct HW support for paravirtualization / VM

  17. Uncore Notes (2) • Why make RAMP white generic? • What is a more interesting target system? • What is a more relevant target system? • Building a system without an application in mind? • Would anyone care about RAMP-“vanilla”?

  18. Uncore Notes (3) • Why insist on directory-based CC for 1000 nodes? • Today’s large SMPs (at 100+ ways) are actually snoopy-based • Plug in 8-core CMPs, and that is a 1000-node snoopy system (which the industry may be more interested in)

  19. Uncore Notes (4) • Let’s pin down a reference system architecture (including the uncore) • minimum modules required? • optional modules supported? • fix standard interfaces between modules • RDL script for RAMP White?? • Need more than a block diagram for RAMP White

  20. Uncore Notes (5) • Requests and Ideas for RDL • Compensate for skewed raw performance of components (for timing measurements) • Large I/O bandwidth relative to CPU throughput • Need knobs to dial-in different rates for experiments • Some form of HW/SW co-simulation • Built-in performance monitoring
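The “knobs to dial-in different rates” request amounts to decoupling target-time message rates from raw host speed. A hypothetical sketch of such a knob on a channel (invented names; real RDL expresses this through channel and link parameters):

```python
class ScaledChannel:
    """Hypothetical rate knob: deliver at most one message every
    `target_cycles_per_msg` target cycles, regardless of host speed."""
    def __init__(self, target_cycles_per_msg):
        self.rate = target_cycles_per_msg
        self.queue = []
        self.cycle = 0
        self.delivered = []

    def send(self, msg):
        self.queue.append(msg)

    def tick(self):
        """Advance one target cycle; deliver only when the knob allows."""
        self.cycle += 1
        if self.queue and self.cycle % self.rate == 0:
            self.delivered.append(self.queue.pop(0))

# Dial the I/O channel to 1 message per 4 target cycles:
io = ScaledChannel(target_cycles_per_msg=4)
for pkt in ["rd", "wr", "rd"]:
    io.send(pkt)
for _ in range(8):
    io.tick()
# delivered at cycles 4 and 8 -> ["rd", "wr"], the third request still queued
```

Sweeping `target_cycles_per_msg` per component is one way to compensate for skewed raw performance and to run the rate experiments the slide asks for.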

  21. Uncore Notes (6) • Sanity Check • 1000 processing nodes: no problem • I/O: we can fake it somehow • DRAM for 1000 processing nodes • Not easy to cheat on this one
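The DRAM point can be quantified against the 1 GByte/processor target and the 8GB DDR2 DIMMs mentioned in the RAMP2 hardware notes; a quick check (illustrative arithmetic only):

```python
def dimms_needed(nodes, gb_per_node, gb_per_dimm):
    """Ceiling of total capacity over DIMM size."""
    total_gb = nodes * gb_per_node
    return -(-total_gb // gb_per_dimm)  # ceiling division

# 1000 nodes at 1 GByte/processor, packaged as 8GB DIMMs:
dimms = dimms_needed(1000, 1, 8)  # -> 125 DIMMs system-wide
```

A terabyte of DRAM spread over the system is why this is the one resource that cannot be faked the way I/O can.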

  22. RAMP2 Hardware (BEE3) • Leader/Reporter: Dan Burke & John Wawrzynek • Topics • Follow up to XUP • Should RAMP embrace XUP at low end? • Inexpensive small systems • Size & scaling of new platform • More than 40 FPGAs? • Technical Questions • Reconsider use of SRAM • DRAM Capacity • Presence of on-board hard CPUs • On-board interfaces (PCI-Express) • Project Questions • Timelines • Definitely need one • Packaging • Pricing (Especially FPGAs) • Design for largest FPGA, change part at solder time? • Evaluation of Chen Chang’s Design

  23. RAMP2 HW Notes (1) • Follow-up to XUP • XUP has been useful to the project, particularly for early development efforts. • Xilinx will continue to design and support new XUP boards • No V4 version planned. • A V5 version will be out Q2 next year. • For BEE3 we can’t really count on V5 FX in Q2 next year. • Perhaps use a separate (AMCC) PowerPC processor chip.

  24. RAMP2 HW Notes (2) • Size and scaling of new platform: • Given the potential processor core density issue, we will need to plan on a system that can scale past 40 FPGAs. • Better compatibility with the new XUP is important: • e.g. DRAM standard (better sharing of memory controllers) • USB: use the Cypress CY7300 for compatibility with the Xilinx core. • Our design and production of BEE3 is timed to the production of V5 parts. We need to better understand the RAMP team schedule for RAMP White. • Hope to be able to choose the package and have flexibility in part sizes and ideally part feature set. • How about a daughterboard for the FPGA (DRC approach)?

  25. RAMP2 HW Notes (3) • Technical Questions • Reconsider use of SRAM: the group thought SRAM is a bad idea. It is faster, smaller in capacity, and simpler to interface to, and newer parts will make interfacing simpler still; speed is not a big concern for RAMP, but the smaller capacity is. • 8GB DDR2 DIMM modules are on the horizon. • A target will be 1 GByte/processor. • Presence of on-board hard CPUs • Are hard cores in FPGAs useful (e.g. PPC405 in V2Pro)? • Would commodity chips on the PCB be useful (e.g. for management)?

  26. RAMP2 HW Notes (4) • Enclosures: • Using a standard form factor will help with module packaging. • Need to look carefully at the IBM blade center (adopted by IBM and Intel) • ATCA is gaining momentum. • Power may be a problem • Can we accommodate custom ASIC integration (perhaps through a slight generalization of the DRAM interface)? • What does Google do for packaging in their data centers? • Is it racks of 1U modules?

  27. RAMP2 HW Notes (5) • Interesting Idea from Chuck Thacker: "Design new board based on need of RAMP White"! • Previously suggested by others • Can we estimate the logic capacity, memory BW, network BW, etc.?

  28. OS, VM & Compiler • Leader/Reporter: Christos Kozyrakis • Topics • Debugging HW and SW (RDL) • Phased approach • Proxy, full kernel, VMMs, Hypervisor • HW/SW schedule and dependencies • High level applications

  29. Software Notes (1) • RAMP milestones • Pick ISA • Deploy basic VMM • Deploy OS

  30. Software Notes (2) • VMM approach: use a split VMM system (a la VMware/Xen) • Run a full VMM on an x86 host that allows access to devices • Run a simple VMM on RAMP that communicates with the host for device accesses through some network • A timing model may be used if I/O performance is important • Should talk with Sun & IBM about their VMM systems for Sparc and PowerPC. • May be able to port a very basic Xen system on our own • Questions • Accurate I/O timing with para-virtualization (you also need repeatability) • SW/system-level/IO issues for a large-scale machine may be more important than coherence • Related issue: Do we want global cache coherence in White? • Benefit vs. complexity (schedule etc.)
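The split-VMM device path described above can be sketched as a simple request/response exchange between the RAMP-side VMM and the host-side VMM. Everything here is hypothetical: `encode_device_request`, `host_handle`, and the JSON wire format are invented for illustration, not anything from VMware, Xen, or RAMP.

```python
import json

def encode_device_request(device, op, addr, value=None):
    """RAMP side: serialize a device access to send to the host VMM."""
    return json.dumps(
        {"device": device, "op": op, "addr": addr, "value": value}
    ).encode()

def host_handle(request_bytes, backing_store):
    """Host side: service reads/writes against an in-memory device model."""
    req = json.loads(request_bytes)
    if req["op"] == "write":
        backing_store[req["addr"]] = req["value"]
        return {"ok": True}
    return {"ok": True, "value": backing_store.get(req["addr"], 0)}

# A disk write followed by a read, proxied through the host:
disk = {}
host_handle(encode_device_request("disk0", "write", 0x10, 0xAB), disk)
resp = host_handle(encode_device_request("disk0", "read", 0x10), disk)
# resp["value"] == 0xAB
```

A timing model, if I/O timing matters, would sit on the RAMP side and charge target cycles per request independently of how fast the host actually answers.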

  31. Software Notes (3) • Separate infrastructure from RAMP • Example: RDL should not be tied to RAMP White • Note: This is in progress with some current RDL applications • Same with BEE3 design work • Most of our tools are applicable to others

  32. Software Notes (4) • Debugging support: RDL-scope • Arbitrary conditions on RDL-level events to trigger debugging • Get traces of messages • Track lineage of messages • Traceability, accountability, relating events to program constructs • Infinite checkpoints for instructions & data • Checkpoint support • Swappable & observable designs • Single step • Instruction, RDL, or cycle level • Note: not a commonly used feature • Such features may attract people to RDL more than retiming • Note: This is already the case with current RDL applications

  33. Software Notes (5) • What is our schedule? • What can we have up and running within 1 year? • Does it have to be RAMP White? • Do we need to migrate RDL maintenance from Greg? • Note: The work should be spread out at least. • Do we have enough manpower for this SW work? • Compiler, VMMs, Applications, etc…

  34. Software Notes (6) • Application Domains • Enterprise/desktop • Full featured OS on all nodes • Running a JVM is a big plus here • Should be able to run webservers, middleware and DBs. • Embedded • While eventually an app may directly control a number of nodes, it is easier to start with all nodes running the OS. • The base design should allow all nodes to run the OS. • Easiest starting point for SW. • Various researchers may decide to run the OS in a subset of nodes, managing the rest of them directly • A simple runtime with app-specific policies • Common in embedded systems

  35. Software Notes (7) • A simple kernel for embedded systems should support • Fast remapping of computation • Protection across processes • Emulation of attached disk • iSCSI + a timing model for disks • RAMP VMM uses: • (a) Attract VMM researchers (might require x86) • (b) Our own convenience • Get an OS running, access to devices etc. • We may achieve (b) without (a) • Some researchers will want to turn cache coherence off anyway!
