Multicore Strategy

Pete Wilson • Kiva Design • pete@kivadesigngroupe.com

Overview • Multicore Opportunities and Problems - Overview • Customer Issues • Vendor Business Issues • Vendor Technology Issues • Strategy Approach • How much will it cost?

Background: Multicore Opportunities and Risks

Key Customer Issues • Standards • Acceptability of new technologies • Converting sequential/existing code • Learning, Tools & Vendor Support • Tuning

Customer Issues: Software Standards • Customer Resistance • Customers will resist changing tools, approaches, languages, or designs in the absence of software standards supported by toolchain vendors • By definition, standards lag technology • Since widespread creation of highly-concurrent software has yet to happen, current deployable technology is at best immature • Thus there is a challenge to the creation of software standards • Good software standards should: • First, do no harm • Secondly, be open to evolution • Thirdly, be expressible as standard APIs, even if they would be clearer, safer and more performant as language extensions • Define a “Concurrency Abstraction Layer”? • Evolve - and be replaced - over time • Standards Activities • Investigation of standards bodies active in this area is needed, followed by embracing a select few, with active involvement in driving standards and tracking directions.

Customer Issues: New Technologies • Multicore implies widespread and possibly deep changes to software design and implementation • Customer buy-in will be a plus • Positioning the vendor as a multicore leader through its commitment to tools, vendors and standards will likely be beneficial • Technology discussions with customers can lay a foundation • Explain what the company is trying to do • Describe investment areas • Explain software approaches under investigation and our toolset for measuring, experimenting, simulating… • Explain architectural possibilities • System level, processor level, language, compiler… • … and provide useful feedback on “hot buttons” • The good things as well as the bad things • … and provide an opportunity for obtaining useful application information • What are they trying to get multicore machinery to do? • What are the barriers to their success? • What tools, technologies, approaches, heuristics… would help them do what they want to do?

Customer Issues: Legacy Conversion • Customers tend to have a lot of working, qualified code tuned to run on (mostly) a single core or processor • Future cost-effective high-performance machinery will at best run this code at more or less current performance levels • The customer will want a tool which takes this legacy code and refactors it to leverage increasing numbers of cores • Initially, such a tool will only need to scale from one to two or perhaps four cores, but over time the number of cores will increase at roughly Moore’s Law rates • For appropriate nested loops, automatic compiler parallelisation is a good fix • Offered by several vendors • Unfortunately, an automatic conversion tool seems totally impractical for control-oriented code • It may be feasible, however, to identify a number of “concurrency templates” which provide heuristic guidance on how to repartition a given shape of app into a form suitable for multicore • Conversion will need to be done manually • And it may be feasible to develop a tool which can provide some confidence to the customer that the meaning of the new software is the same as that of the original version • Investment is needed in these areas, both internally and in partnership with key toolchain vendors

Customer Issues: Vendors, Tools & Learning • Customers will rely on tools vendors to provide appropriate tools. But they will also need training in the implications of the new hardware, design approaches, debug, tuning… • Appropriate training materials can only be built once the underlying basic knowledge has been generated • Not just material generated by the silicon vendor, but books, papers etc. written by others • Standards will play an important role • Partnerships between the silicon vendor and outsiders will play an important role • Tools vendors will be slow to adopt any approaches/technologies not supported by the industry, their own customers and standards • Pump-priming by the silicon vendor is likely to be necessary, through both funding and technology transfer

Customer Issues: SW Design, Partition & Tuning • Customers do not have years of experience designing and developing multicore software • How to partition tasks across cores: • Made more interesting in the presence of heterogeneous cores and of accelerators • Dynamic or static load-balancing? • Fault tolerance? • Static or dynamic system configurations? • Tuning the software? • New knowledge is needed, along with carefully-targeted modelling/simulation tools
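As a rough illustration of the static load-balancing question raised above, the sketch below assigns tasks to cores with a greedy "most expensive task first, least-loaded core next" heuristic. The task names and cost figures are invented for illustration, and the heuristic is only one of many possible static partitioning schemes; it ignores heterogeneous cores, accelerators and communication cost.

```python
import heapq

def partition_static(task_costs, n_cores):
    """Greedy static partition: place the costliest tasks first,
    always onto the core with the smallest load so far."""
    # Min-heap of (current load, core index); the least-loaded core is on top.
    heap = [(0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(n_cores)}
    ordered = sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True)
    for name, cost in ordered:
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: sum(task_costs[t] for t in names)
             for core, names in assignment.items()}
    return assignment, loads

# Hypothetical tasks with estimated costs in arbitrary units (assumed values).
tasks = {"decode": 8, "filter": 5, "mix": 4, "encode": 7, "log": 1}
assignment, loads = partition_static(tasks, 2)
print(assignment)
print(loads)
```

A dynamic scheme would instead let idle cores pull work at run time, trading this up-front estimate for run-time overhead; which is preferable depends on how predictable the task costs are.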

Key Silicon Vendor Business Issues • Adopting wrong/inappropriate standards • Being perceived as missing the bandwagon • Pressure to revamp product lines • New roadmaps needed • New vendor/3rd party toolchains needed • New vendor/3rd party modelling/simulation tools needed

Silicon Vendor Business Issues: Standards • Standards can be a nuisance • Standards wars - backing the wrong side can cause customer perception issues, as well as wasting time, money and resources • Avoiding action until standards become established can indicate a lack of commitment to customers, in the absence of visible pro-active measures being taken • Standards in such an IP-target rich environment as multicore can be a double-edged sword - valuable created IP may need to be “given away” just to get silicon to be usable • New standards imply new tools, new training, new app notes, new problems - someone has to pay for all this, and someone has to put the effort into driving third-party toolchain vendors into action.

Silicon Vendor Business Issues: Inaction • It’s well-known that it’s a multicore world - what are you doing to demonstrate that your company is on top of the implications? • It’s easy to generate a belief among customers and in the industry that a vendor doesn’t “get” multicore and all its implications • A technology roadmap indicating how a customer will be able to leverage new multicore technologies without abandoning legacy software/IP seems necessary - or at least highly desirable • The roadmap needs to cover both silicon and software • Inaction allows competitors to create a monopoly in valuable new IP

Silicon Vendor Business Issues: Roadmaps • Multicore is new and exciting and scary - where’s the roadmap? • Interplay of hardware, software, technology, positioning - need to choose a direction through the minefield • Need to strike a reasonable balance between short, mid-term, long-term investments • Need to be able to tell the story behind the plans • Need to inspire third-party vendors to track and support roadmap • Need to strike partnerships with competitors and partners in the industry

Silicon Vendor Business Issues: System Simulation • Customers need to be confident that they can partition software, allocate tasks to cores appropriately and choose the right-sized platform • This implies system-level, multicore-oriented simulation/modeling tools which can model software at multiple levels of abstraction as well as modeling hardware. • Where’s the technology for this coming from? Where’s the prototype? The products? The funding? The support?

Key Silicon Vendor Technology Issues • Compiler technology and language extensions • Modelling tools and technology • Architectural challenges/changes • Architectural definition tools • Runtime support - RTOS/OS/bare metal • Interconnect • Architectural extensions

Delving Deeper - Architecture • Pressures on Established Architectures • Multicore implies multithreaded software and interthread communication • Current architectures have appalling context switch times, lack any architected message-passing/communications capability, and bear the area and power burden of exquisitely complex architectures • Current architectures are tuned for computers and do not match the needs of data-movement-intensive, multi-engine systems • Resources wasted in massive SIMD subsystems for which no language support can be made available; I/O is run naked without any MMU support; data movement is not even a decent afterthought • Current architectures assume control is the king - that there is one CPU - while the future probably needs data movement to be the king • Architected, power-efficient, language-accessible, asynchronous, low-latency data-movers seem desirable

Delving Deeper - Language • Pressures on Current Programming Languages • Multicore implies multithreaded software and interthread communication • Current languages are purely sequential, and rely on libraries or system calls to effect concurrency and communication. This means that there is a huge number of tricky concurrency problems that cannot be found at compile time - language extensions which support concurrency and communication are needed to fix this • The introduction of language extensions such as these will be a fairly long process, with lots of customer and industry involvement and the eventual driving of the extensions into the appropriate language standards • And the underlying concurrency architecture has to be available to current compilers/languages/tools and users as a well-supported library or perhaps Concurrency Abstraction Layer • Language features should allow message-passing to be about as cheap as passing arguments to a function; spawning threads should be about as cheap (in code space and path length) as calling a function • Although to do this properly needs something different from (and simpler than) a vanilla RISC architecture • While message-passing is almost certainly the best technological solution, shared-store cache-coherent systems will continue to flourish (perhaps as small SMP nodes in a larger SoC) - and so locking needs to be safe and efficient. New semantics such as transactional memory may vastly ease this problem - and improve performance - and also call for new language capabilities
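To make the "libraries rather than language" point concrete, here is a minimal sketch of message passing between two threads using only Python's standard `threading` and `queue` modules. The `worker` function and the `None` sentinel are illustrative choices, not anything proposed in the slides. Note that nothing in the language itself checks that the sender and receiver agree on the protocol, which is the kind of problem the slide argues language extensions could catch at compile time.

```python
import queue
import threading

def worker(inbox, outbox):
    # Receive messages until the sentinel None arrives. The thread shares
    # no mutable state with its caller except the two queues.
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)

inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

for n in range(5):
    inbox.put(n)      # send a message
inbox.put(None)       # tell the worker to stop
thread.join()

results = [outbox.get() for _ in range(5)]
print(results)
```

Each send here goes through a library object with its own internal locking, which is far from "about as cheap as passing arguments to a function"; the slide's argument is that architecture and language support would be needed to close that gap.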

Delving Deeper - Tools • Pressures on Design Tools • Multicore implies multithreaded software and interthread communication • There are no effective tools to perform what-if design analysis on partitioning functions across engines and across cores of various capabilities; so vendors cannot choose appropriate silicon partitioning when designing SoCs, nor can they do evidence-driven design of appropriate architectures and microarchitectures for their engines and cores; nor can customers partition their systems and choose resource-management strategies to share SoC or system resources effectively • A legacy of concentrating on overly-complex “clock-accurate” models of complex cores is a poor foundation for systems modeling • In designing architecture, it’s important that the proposal be shown to work across a range of microarchitectures, not just on a single microarchitecture • A systems modeling tool needs to be able to model software at various levels of abstraction, not just clocked hardware

Delving Deeper - Power • Pressures on Power Management • For some applications, large, complex cores will still be needed • As the use of concurrency matures, many of these “sequential” problems may transmute into efficient concurrent solutions • But meanwhile the availability of programmable swarms of engines will enable new applications and markets • For other apps, a swarm of simpler cores can be much better than one large one • The small cores can contain little but register resources and the computational blocks needed, with little or no supporting infrastructure • No register renaming, completion buffers, complex memory queues, vast branch-prediction structures, large shared global buses, sprawling SIMD computational units… • Instead, performance will be obtained by having most of the cores doing something useful most of the time • Multithreading is a possible extra, although the simplicity of implementing architectures with little context is more attractive • These cores will dissipate less power per instruction executed than larger cores, and will allow a new dimension of power management • Voltage and frequency scaling will still work, as will varying the number of cores being used for the work. This will likely need to be supported mainly through software, which will need appropriate “introspection” capabilities to understand what’s going on in the silicon • This will drive (at least some of) the onchip interconnect to support asynchronous intercore communication, and communications density will play a part in choosing how to dynamically reconfigure the swarm to manage power effectively
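The trade-off between voltage/frequency scaling and the number of active cores can be sketched with the standard first-order model of dynamic CMOS power, P = C · V² · f per core. The numbers below (capacitance, and the assumption that halving the clock lets the supply drop from 1.0 V to 0.7 V) are illustrative assumptions, not figures from the slides, and the model ignores static leakage and interconnect power.

```python
def dynamic_power(capacitance, voltage, freq_hz, n_cores=1):
    """First-order dynamic power in watts: n_cores * C * V^2 * f."""
    return n_cores * capacitance * voltage ** 2 * freq_hz

C = 1e-9  # assumed switched capacitance per core, in farads

# One core at full speed versus two cores at half speed and reduced voltage.
# Aggregate instruction throughput is the same only if the work
# parallelises perfectly across the two cores (an assumption).
one_fast = dynamic_power(C, voltage=1.0, freq_hz=1.0e9)
two_slow = dynamic_power(C, voltage=0.7, freq_hz=0.5e9, n_cores=2)

ratio = two_slow / one_fast
print(f"one core:  {one_fast:.2f} W")
print(f"two cores: {two_slow:.2f} W")
print(f"ratio:     {ratio:.2f}")
```

Under these assumptions the two slower cores use roughly half the dynamic power for the same nominal throughput, which is why software control over both the operating point and the number of active cores adds a useful dimension to power management.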

Silicon Vendor Technology Issues: Summary • Making efficient, competitive, attractive products which leverage multicore will require many (probably inconvenient) changes, and careful attention to their interplay and how they are presented to customers and the industry at large • Processor architecture changes • Architecture extensions • Embracing heterogeneous cores - within-family and multiple families • Embracing intelligences which are not mainstream processors but still need software tools • Power management has new dimensions, for both good and ill • Providing consistent system APIs from bare metal/1000-engine chips to rich Unix-like OS running on a few processors, perhaps all in the same SoC • SoC interconnect suitable for multiscale network-on-a-chip-like systems • Toolchains which handle heterogeneity in all its glory • Simulation/modeling technologies and tools to allow investigations • Cross-organisational efforts probably needed • … and more • Which of these need to be addressed first and why, and what the metrics for success might be, cannot be specified at this juncture. Decisions need to be driven by: • Planning horizon • Business imperatives • Funding • Resources • … and commitment from the company to embark on a program and see it through

Strategy: How Do We Get There from Here? • First, identify the scope of the need and the financial resources available • Key inputs are business needs within a chosen, defined time horizon. To be useful there should be 1, 3 and 5 year horizons and these should incorporate good competitive/industry trend projections as a backdrop • Also desirable is sufficient information to support what-if analyses estimating the likely effect on revenues/profit/markets/customers of some reasonable number of possible investment scenarios • With that as background, propose minimal, preferred and maximal technology investment projects and partnership plans (quantified with time, money, people etc.); choose one plan; and obtain commitment, funding and resources through the 5 year horizon for the chosen plan • Real plans will need evidence-based refinement over time. Changes to the initial plan do not in themselves represent failures of planning or execution.

How Much Will It Cost? • It’s not practical to cost any plan right now, but a sketch may prove helpful • Assume that what is needed technically is several new architectures and associated toolchains from established vendors, along with one new programming language. To do this, a reasonable guess at what investment is needed might be: • An architecture description tool able to drive efficient compiler back-end generation and simulation engines, along with the creation of those compilers and simulations: • 5 people over 3 years: 15 py • Three new microarchitectures ranging in scale from an ARM7 class machine to a MIPS 24K class machine, all completely synthesisable: • 20 people over 2 years: 40 py • Four new application-specific accelerators, all needing software toolchains: • 20 people over 2 years: 40 py • New onchip interconnect family with “interconnect compilers” which select the right variants for a given SoC: • 5 people over 2 years: 10 py • Business unit support for customers and partners, including app notes, specs, boards… • ramping from 5 to 20 people over 5 years: 50 py • Payments to compiler, OS etc. vendors to support new technology: • $10M over 5 years • Total: about 150 py plus $10M; at roughly $150K per person-year that is $22.5M + $10M. Call it $35M • This is significantly cheaper than the cost of a traditional “next-generation processor” project and provides IP for product use much more quickly • And the whole exercise provides a remarkably rich environment for the creation of valuable, unique IP suitable for patenting and licensing
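The arithmetic behind the total can be checked with a few lines of Python. The person-year figures are those listed on the slide; the per-person-year rate is not stated directly and is inferred here from the slide's own conversion of about 150 py into $22.5M.

```python
# Person-year (py) estimates as listed on the slide.
person_years = {
    "architecture description tool": 5 * 3,   # 5 people over 3 years
    "three new microarchitectures": 20 * 2,   # 20 people over 2 years
    "four new accelerators": 20 * 2,          # 20 people over 2 years
    "onchip interconnect family": 5 * 2,      # 5 people over 2 years
    "business unit support": 50,              # figure as stated on the slide
}
vendor_payments = 10e6                        # $10M over 5 years

total_py = sum(person_years.values())

# Rate implied by the slide's "150 py ... or $22.5M" (an inference).
cost_per_py = 22.5e6 / 150

staff_cost = total_py * cost_per_py
total_cost = staff_cost + vendor_payments

print(f"total effort: {total_py} py")
print(f"implied rate: ${cost_per_py:,.0f} per py")
print(f"total cost:   ${total_cost / 1e6:.2f}M")
```

The itemised figures sum to 155 py, which the slide rounds to about 150 py; either way the total lands a little under the rounded $35M headline figure.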
