
AAA which application form?


Presentation Transcript


  1. AAA which application form? R. de Simone, INRIA EPI Aoste. Semi-technical considerations.

  2. Adequation Algorithme-Architecture (Allocation Application / Architecture)
  Mapping comprises:
  • spatial allocation: distribution, placement (of tasks on resources)
  • temporal allocation: scheduling
  Original formulation: Yves Sorel (1988); since then rephrased as Platform-based design, Y-chart approach, …
  [Figure: Application and Architecture related by mapping / adequation]

  3. Adequation Algorithme-Architecture (Allocation Application / Architecture)
  Mapping comprises:
  • spatial allocation: distribution, placement (of tasks on resources)
  • temporal allocation: scheduling
  Architecture trends:
  • networks of processors (multicore, GPGPU/MIC, many-buzz)
  • communication bandwidth is the issue, on-chip networks… → spatial routing, temporal arbitration, what else?
  [Figure: Application and Architecture related by mapping]

  4. Adequation Algorithme-Architecture (Allocation Application / Architecture)
  What kind of application description models?
  • where concurrency and timing could be extracted, made explicit
  • under favorable conditions, mapping could be performed at compile time (static, time-predictable)
  • dimensioning should be feasible to optimally fit the architecture (cache sizes, PRET notions, …)
  • existing theories developed in neighboring domains could be combined and adapted
  GOAL: refine the application so that it reflects the architecture
  [Figure: Application and Architecture related by mapping]

  5. Draft example: Mapping
  [Figure: two processors Proc1 and Proc2 connected by a Bus]

  6. Mapping = spatial distribution/routing + temporal scheduling
  [Figure: tasks mapped onto Proc1 and Proc2, communications onto the Bus]

  7. Applications: the scope (MoCCs)
  [Figure: Affine bounds nested loops → (explicit distribution, concurrency extraction) → Process Networks → (explicit scheduling, time constraints extraction) → Syn/Poly-chronous]

  8. Applications: the scope (MoCCs)
  • Exercise: count the communities (do not forget Model-Driven Engineering for model transformations, and Classical or Real-Time Scheduling)
  [Figure: same diagram as slide 7: Affine bounds nested loops → Process Networks → Syn/Poly-chronous]

  9. Motivations for the joint use of these models
  • Similar restrictions!!
  • little (in fact no) concern for data values
  • control largely data-independent (uninterpreted functions)
  • finite-state control property
  • role of conflict-freeness as functional determinism
  • Formal models and mathematical analysis
  • Issues tackled at compile time, but…
  • largely developed independently (even if in neighboring groups)
  • lack of common vocabulary, of common framework, of common drive (risk of losing the AAA grade?)
  … Or is it just me who needs to go back to school?

  10. Process Networks
  • Data-flow: computations triggered by data availability
  • Pure Data-Flow: Marked Graphs, SDF extension
  • w/ data route switches: Boolean DF, CycloStatic DF, Kahn PNs
  • Conflict-freeness: all computations amount to the same partial order, only different scheduling/time assignments
  • ASAP schedule: provides best throughput (see the sketch below)
  • Correctness issues: safety and liveness
  • Optimization issues: optimize buffer sizes while preserving throughput
  Second-level semantics: computations according to schedules (activation conditions) → representation as polychronous descriptions
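
  As a concrete illustration of the pure data-flow case, here is a minimal Python sketch (mine, not from the slides) of ASAP execution of a marked graph: every actor fires as soon as each of its input edges holds a token, each firing takes one step, and the run exposes both the throughput and the peak buffer occupancy mentioned above. The three-actor cycle used as input is a made-up example.

    from collections import defaultdict

    def asap_simulate(edges, steps):
        """edges: list of (src, dst, initial_tokens) in a marked graph.
        Returns firing counts per actor and the peak token count per edge."""
        tokens = {(s, d): m for s, d, m in edges}
        actors = {a for s, d, _ in edges for a in (s, d)}
        inputs = defaultdict(list)
        for s, d, _ in edges:
            inputs[d].append((s, d))
        firings = defaultdict(int)
        peak = dict(tokens)
        for _ in range(steps):
            # ASAP, synchronous step: decide who can fire from the current marking.
            ready = [a for a in actors if all(tokens[e] >= 1 for e in inputs[a])]
            for a in ready:                      # consume one token per input edge
                for e in inputs[a]:
                    tokens[e] -= 1
            for a in ready:                      # produce one token per output edge
                for s, d, _ in edges:
                    if s == a:
                        tokens[(s, d)] += 1
                firings[a] += 1
            for e, v in tokens.items():
                peak[e] = max(peak[e], v)
        return firings, peak

    # A cycle f -> g -> h -> f carrying two initial tokens: ASAP throughput is
    # bounded by tokens / cycle length, i.e. each actor fires 2 out of every 3
    # steps, and no edge ever holds more than one token (peak occupancy 1).
    firings, peak = asap_simulate([("f", "g", 1), ("g", "h", 0), ("h", "f", 1)], steps=30)
    print(dict(firings), peak)

  Optimizing buffer sizes while preserving throughput then amounts to checking that capacity bounds at least as large as these observed peaks leave the firing counts unchanged.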

  11. Explicit timing for Process Networks
  • Classical scheduling of Marked Graphs, Synchronous Data Flow graphs
  • Latency-Insensitive Design
  • Ultimately k-periodic schedules
  Represented with:
  • N-synchronous formalisms
  • Signal and affine clock calculus
  [Figure: Process Networks → Syn/Poly-chronous]

  12. Ultimately k-periodic scheduling
  [Figure: a small data-flow graph (nodes f, g, e, b) whose edges carry rates such as (1,1), (2,1), (3) and ultimately periodic activation words such as [00100](01011), (11010), (10110), (10101), (01101)]
  One such periodic clock, written as an Esterel-style loop:
    loop
      pause; emit clk;
      pause; pause; emit clk;
      pause; emit clk;
      pause;
    end loop
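
  The bracket/parenthesis notation on this slide, [prefix](period), denotes an infinite binary word that repeats its period after a finite prefix. The small Python sketch below (my own illustration, not part of the slides) only shows how such a word is indexed and what its long-run activation rate is, which is the quantity that k-periodic scheduling arguments rely on.

    class UltimatelyPeriodicWord:
        """An infinite binary word written [prefix](period): after the finite
        prefix, the period repeats forever."""
        def __init__(self, prefix, period):
            assert period and set(prefix + period) <= {"0", "1"}
            self.prefix, self.period = prefix, period

        def __getitem__(self, n):
            # Letter at position n (0-based): prefix first, then the period, cyclically.
            if n < len(self.prefix):
                return self.prefix[n]
            return self.period[(n - len(self.prefix)) % len(self.period)]

        def take(self, n):
            return "".join(self[i] for i in range(n))

        def rate(self):
            # Asymptotic proportion of 1s, e.g. the activation rate of a schedule.
            return self.period.count("1") / len(self.period)

    # The word [00100](01011) appearing in the figure:
    w = UltimatelyPeriodicWord("00100", "01011")
    print(w.take(15))   # 001000101101011
    print(w.rate())     # 0.6, i.e. three activations every five instants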

  13. Explicit routing
  • Regular switching patterns in BDF, CSDF, KPNs
  • Extending scheduling with similar expressivity (ultimately periodic infinite binary words)
  • KRGs: K-periodically Routed Graphs
  • Axiomatics for an equational theory of communication re-wiring
  • Allows considering the sharing of communications (on common interconnect structures)
  • Interplay of scheduling and routing (arbitration between communications using the same channels)
  [Figure: Process Networks → Syn/Poly-chronous]

  14. Interconnect modeling and optimization
  • On-Chip Networks
  • Switching conditions of Select/Merge nodes can configure different communication paths, possibly overlapping in time (see the sketch below)
  • Predictable routing schemes (ultimately k-periodic) will match the temporal schedules obtained in classical Process Network scheduling theory
  [Figure: channels C1–C4 connected through a Select node and a Merge node]
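
  To make the Select/Merge switching concrete, here is a small illustrative Python sketch (mine, with made-up data and switching words) in which each node is driven by a periodic binary pattern, in the spirit of KRGs: Select sends a token to one of its two outputs depending on the current letter, Merge reads from one of its two inputs the same way, and using the matching word on both sides restores the original stream.

    from itertools import cycle, islice

    def select(stream, switch_word):
        """Split one stream into two, sending token i to output 1 on letter '1'."""
        out0, out1 = [], []
        for item, bit in zip(stream, cycle(switch_word)):
            (out1 if bit == "1" else out0).append(item)
        return out0, out1

    def merge(in0, in1, switch_word):
        """Interleave two streams into one, reading in1 on '1' and in0 on '0'."""
        it0, it1 = iter(in0), iter(in1)
        return [next(it1) if bit == "1" else next(it0)
                for bit in islice(cycle(switch_word), len(in0) + len(in1))]

    data = list(range(8))
    lo, hi = select(data, "10")           # alternate tokens between two paths
    print(lo, hi)                         # [1, 3, 5, 7] [0, 2, 4, 6]
    print(merge(lo, hi, "10") == data)    # the matching Merge word restores the order: True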

  15. One possible routing configuration
  [Figure: the same Select/Merge network, with alternating switching words (1010, 0101) on channels C1–C4]

  16. Another possible configuration
  [Figure: the same network with constant switching words (all 1s, all 0s) on channels C1–C4]

  17. Nested loops with affine bounds
  • Parallel compilation models
  • static control (regular indices, not while (data_cond) loops)
  • Iteration space (multidimensional arrays, polyhedra)
  • Source-to-source transformations (rewrite as a program, only with DOSEQ and DOPAR at various levels)
  • Improving data locality
  • Variants:
  • Systems of Uniform Recurrence Equations (SUREs, MMAlpha)
  • Geometrical description formalisms (Array-OL)

  18. Example + Reduced Data Graphs
  Original nest:
    loop i = 1 to N
      loop j = 1 to N
        a(i,j) := a(i-1,j-1) + a(i,j-1)
      end
    end
  From the dependence levels and direction vectors, the nest can be rewritten with an explicit parallel inner loop (checked in the sketch below):
    DOSEQ j = 1 to N
      DOPAR i = 1 to N
        a(i,j) := a(i-1,j-1) + a(i,j-1)
      end
    end
  [Figure: reduced data graph relating "Affine bounds nested loops" to "Process Network", annotated with the dependence levels and direction vectors of the two references]
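
  The DOSEQ/DOPAR rewriting above can be justified from the dependence distance vectors alone: a loop level may be declared DOPAR iff no dependence is carried at that level, i.e. no distance vector has its first nonzero component there. The Python helper below is only an illustrative check written for this example, not the analysis tool behind the slide.

    def parallel_levels(distance_vectors, depth):
        """Return, for each loop level, True if that level could be a DOPAR loop."""
        carried = set()
        for vec in distance_vectors:
            for level, d in enumerate(vec):
                if d > 0:          # first nonzero entry: this level carries the dependence
                    carried.add(level)
                    break
                assert d == 0, "lexicographically negative dependence"
        return [level not in carried for level in range(depth)]

    # a(i,j) := a(i-1,j-1) + a(i,j-1): distance vectors (1,1) and (0,1) in (i,j) loop order.
    print(parallel_levels([(1, 1), (0, 1)], depth=2))   # [False, False]: neither loop parallel as written
    # After interchanging to (j,i) order the distances become (1,1) and (1,0):
    print(parallel_levels([(1, 1), (1, 0)], depth=2))   # [False, True]: DOSEQ j, DOPAR i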

  19. From Nested Loops to Process Networks
  • Existing efforts: the basic idea is to assign one computing node to each assignment (a sketch follows below)
  • COMPAAN, Pico Express
  • Tiling, chunking
  • Stefanov et al., Polyhedral Process Networks
  • Multidimensional SDF for Array-OL/Gaspard
  [Figure: Affine bounds nested loops → Process Network]
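
  As a toy illustration of "one computing node per assignment" (my own sketch, not the COMPAAN or Polyhedral Process Network construction itself), the statement of slide 18 becomes a single dataflow node whose dependences on the previous row travel through a self-loop FIFO pre-loaded with boundary tokens; the boundary value 1 and the row-by-row firing order are assumptions of this sketch.

    from collections import deque

    def assignment_node(n, boundary=1):
        """One node for a(i,j) := a(i-1,j-1) + a(i,j-1), 1 <= i,j <= n.
        The previous row circulates on a self-loop FIFO of n+1 tokens."""
        fifo = deque([boundary] * (n + 1))       # a(0..n, 0): the j = 0 boundary row
        for j in range(1, n + 1):
            fifo.append(boundary)                # a(0, j): left boundary of the new row
            for i in range(1, n + 1):
                diag = fifo.popleft()            # consume a(i-1, j-1)
                up = fifo[0]                     # peek a(i, j-1) (a stream fork in a real PN)
                fifo.append(diag + up)           # produce a(i, j), kept for row j+1
                yield i, j, diag + up
            fifo.popleft()                       # a(n, j-1) is no longer needed

    print(list(assignment_node(2)))   # [(1, 1, 2), (2, 1, 2), (1, 2, 3), (2, 2, 4)]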

  20. Dream (or nightmare?)
  • How can one produce Process Network descriptions from nested loops (or, better said, SUREs or RDGs)?
  • Of course limited, but very often already polyhedra (for bounds) separated from dependency graphs (for computations)
  • Needs to split clearly between (sequential) iterations and parallel ones (currently expanded)
  • Issue of potential (application) vs real (architecture) concurrency/parallelism
  [Figure: an 8 x 4 iteration space over (i, j), showing the dependences of a(i,j) on a(i-1,j-1) and a(i,j-1)]

  21. Dream (or nightmare?)
  • Greenish nodes provide constants (in parallel at the bottom, sequentially on the left)
  • Blue nodes each compute 4 values in parallel, shifting the last one right across the border
  Lack of generality, certainly, when values to be sent across boundaries are not ordered as expected at the target
  [Figure: the same 8 x 4 iteration space decomposed into blocks, with boundary arrays a[1..4,0], a[1,1..4], a[1,5..8] and schedule annotations such as 0.(1)]

  22. Challenges
  • Not entirely clear (to me at least) how data locality and transfer (if any) are dealt with in different works
  • Typical case: a data block is transferred in some order (row) and consumed in a different order (column); see the sketch below
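
  A tiny model of that mismatch (an assumption-laden sketch of mine, not taken from any of the works mentioned): a block arrives row-major, a greedy consumer reads it column-major, and everything in between sits in a reorder buffer whose peak size the simulation measures.

    def reorder_buffer_peak(n):
        """Peak buffer occupancy when an n x n block is produced row by row
        but consumed column by column."""
        produce = [(i, j) for i in range(n) for j in range(n)]   # row-major arrival order
        consume = [(i, j) for j in range(n) for i in range(n)]   # column-major consumption order
        buffer, peak, next_needed = set(), 0, 0
        for element in produce:
            buffer.add(element)
            peak = max(peak, len(buffer))
            # The consumer only advances in its own order, as far as the data allows.
            while next_needed < len(consume) and consume[next_needed] in buffer:
                buffer.remove(consume[next_needed])
                next_needed += 1
        return peak

    for n in (2, 4, 8):
        print(n, reorder_buffer_peak(n))   # 2 2, 4 10, 8 50: grows as (n-1)**2 + 1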

  23. SoC/NoC intelligent routers?
  To match (and be programmed from) the application, routers:
  • should be able to fork the data
  • should only care about directions (not a single target)
  • should let communications cross one another if not using the same lines
  • should be able to buffer values (at least in the size of the processor array)
  (A small sketch of these capabilities follows below.)
  [Figure: a grid of processor cores, each tile labelled P / Core]
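
  A very small Python sketch of the capabilities listed above (the names, capacity and API are invented for illustration; this is not an existing NoC interface): one bounded FIFO per output direction, a push that may fork a value towards several directions, and independent outputs so that transfers on different lines do not interfere.

    from collections import deque

    class Router:
        DIRECTIONS = ("north", "south", "east", "west", "local")

        def __init__(self, capacity=16):
            # One bounded FIFO per output direction; independent outputs let
            # communications on different lines cross without interfering.
            self.capacity = capacity
            self.out = {d: deque() for d in self.DIRECTIONS}

        def push(self, value, directions):
            """Forward (and possibly fork) a value towards one or more directions."""
            for d in directions:
                if len(self.out[d]) >= self.capacity:
                    raise BufferError(f"output {d} is full")
                self.out[d].append(value)

        def pop(self, direction):
            """The downstream link pulls the next buffered value for that direction."""
            return self.out[direction].popleft() if self.out[direction] else None

    r = Router(capacity=4)                  # capacity ~ size of the processor array
    r.push("a[1,1]", ["east", "south"])     # fork the same data in two directions
    print(r.pop("east"), r.pop("south"))    # a[1,1] a[1,1]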

  24. ThAAAnk You. Questions, anybody?
