
Research at the Computer Engineering Laboratory of Delft University of Technology

Ben Juurlink


Presentation Transcript


  1. Research at the Computer Engineering Laboratory of Delft University of Technology. Ben Juurlink

  2. Outline • General Information • Group Location • Group Formation • Group Funding • Group Interests • Group Projects • Molen • D-Iliad • MOVE • Pamela • PUB library • Concluding Remarks

  3. Group Location
     • Delft University of Technology: 7 faculties, 13,000 students, 2,100 researchers
     • Faculties: Aerospace Engineering; Applied Sciences; Architecture; Civil Engineering and Geosciences; Design, Engineering and Production; Information Technology and Systems; Technology, Policy and Management
     • Departments and sections shown in the chart: Computer Science; Electrical Engineering; Mathematics; Telecommunication; Software Engineering; Microelectronics; Energy; Mediamatica; Mathematical Analysis; Control, Risk, Optimization, Stochastics, and Systems

  4. Group Formation

  5. Group Funding 94-98 (in Kfl) Total financing: 6000 Kfl

  6. Group Output (‘94-’98)
     • Degrees:
       • PhD theses: 9
       • Eng. degrees: 5
       • MSc: 87
     • Publications:
       • Books/chapters: 7
       • Journal articles: 47
       • Conference papers: 165
       • Patents: 50
     • Five start-ups

  7. Computer Engineering • Computer Engineering: Analysis of data processing requirements for electronic data processing units and systems and the design (synthesis) of their architecture, implementation, and realization • Architecture: Determine the function to perform • Implementation: Establish a method to achieve the function • Realization: Use available means to materialize the method

  8. Computer Engineering Interests

  9. Group Projects • MOLEN: Embedded system architecture, multimedia, Java. • MOVE: Embedded system synthesis, compilers, hardware/software co-design. • PAMELA: Performance analysis and languages. • D-ILIAD: Computer architecture, implementation, computer arithmetic, switches.

  10. MOLEN: Embedded System Design • Topics: • Embedded Processor Architectures • Multimedia • Java • Embedded System Tools • Embedded Agents • Current Contributions: • Java Processor • Multimedia Instructions • Specialized Units • FPGA Units • Future Directions: • Reconfigurable embedded processors

  11. Molen Multimedia Instruction and Functional Unit
      • Published at EUROMICRO’98
      • Motion estimation, sum of absolute differences:

            s = 0;
            for (j = 0; j < h; j++) {
                if ((v = p1[0] - p2[0]) < 0) v = -v; s += v;
                if ((v = p1[1] - p2[1]) < 0) v = -v; s += v;
                ...
                if ((v = p1[15] - p2[15]) < 0) v = -v; s += v;
                if (s >= distlim) break;
                p1 += lx; p2 += lx;
            }

      • Formula: $\mathrm{SAD} = \sum_i |A_i - B_i|$, with $A_i$ and $B_i$ the pixels of the two blocks being compared

  12. Efficient Implementation of the SAD Operation
      • Straightforward approach:
        • Compute Ai-Bi for all pairs of pixels
        • Take absolute values
        • Accumulate absolute values
        • Cost: 4 cycles
      • MOLEN solution:
        • Observation: |Ai-Bi| = max{Ai,Bi}-min{Ai,Bi}
        • Problem: determining and negating min(Ai,Bi) takes > 1 cycle
        • Solution: pass min(Ai,Bi) to the accumulate stage and correct
        • Cost: 3 cycles
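
The identity on this slide can be checked in software. The following is a minimal sketch, assuming 8-bit pixels; the function names `sad_abs` and `sad_maxmin` are hypothetical and not from the presentation. It only illustrates that both formulations give the same sum and does not model the hardware pipeline or its cycle counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward SAD: subtract, take the absolute value, accumulate. */
unsigned sad_abs(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned s = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        int v = (int)a[i] - (int)b[i];
        if (v < 0)
            v = -v;
        s += (unsigned)v;
    }
    return s;
}

/* Same sum via the identity |a - b| = max(a, b) - min(a, b). */
unsigned sad_maxmin(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned s = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned hi = a[i] > b[i] ? a[i] : b[i];
        unsigned lo = a[i] > b[i] ? b[i] : a[i];
        s += hi - lo;   /* never negative, so no sign correction is needed */
    }
    return s;
}
```

Calling both functions on the same pair of 16-pixel rows should return identical values, which is the property the max/min formulation relies on.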

  13. MOVE: Semi-automatic generation of application-specific processors [Figure: MOVE design flow. Labels: user interaction; optimizer; solution space (axes: exec. time, cost); architecture parameters; parametric compiler; hardware generator; feedback; parallel object code]

  14. MOVE • Current Contributions: • Transport triggered architecture • Operational design framework (add any unit you like, no restrictions) • Several cheap designs (data logger, video-enhancer, MPEG-decoder, wireless communications) • Future Directions: • Tune your application to suit your processor • System design • Multiprocessor TTA • Low-power processors

  15. Transport Triggered Architecture
      • Published in e.g. Journal of Systems Architecture ‘99
      • Transport triggered architecture:
        • Only one instruction: MOVE!
        • FU operations are triggered by moving data to their input ports
      • Example:

            add r1,r2,r3
            sub r4,r2,r6
            st  r4,r1

      • TTA code:

            r2->O1add.alu1; r3->O2add.alu1; r2->O1sub.alu2; r6->O2sub.alu2
            Radd.alu1->r1; Rsub.alu2->r4
            r1->O1st.ls; r4->O2st.ls

      • After bypassing:

            r2->O1add.alu1; r3->O2add.alu1; r2->O1sub.alu2; r6->O2sub.alu2
            Radd.alu1->r1; Rsub.alu2->r4; Radd.alu1->O1st.ls; Rsub.alu2->O2st.ls
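
To make the move sequences above concrete, here is a small sequential simulation in C. It is a sketch under stated assumptions, not the MOVE framework's model: the `Tta` struct and all function names are hypothetical, O2 is assumed to be the trigger port of each unit, the store unit is assumed to take the address on O1 and the data on O2, and moves that a real TTA would schedule in parallel on its buses are executed one after another.

```c
#include <assert.h>

enum { NREGS = 8, MEMSIZE = 16 };

/* Hypothetical machine state: registers, memory, and the port latches of
   the three function units named on the slide (alu1, alu2, ls). */
typedef struct {
    int r[NREGS];
    int mem[MEMSIZE];
    int add_o1, add_r;   /* alu1: operand latch O1add, result Radd */
    int sub_o1, sub_r;   /* alu2: operand latch O1sub, result Rsub */
    int st_o1;           /* ls: address latch O1st (assumed) */
} Tta;

/* A move to an operand port only latches the value. */
void move_to_add_o1(Tta *m, int v) { m->add_o1 = v; }
void move_to_sub_o1(Tta *m, int v) { m->sub_o1 = v; }
void move_to_st_o1(Tta *m, int v)  { m->st_o1 = v; }

/* A move to a trigger port (assumed to be O2) starts the operation. */
void move_to_add_o2(Tta *m, int v) { m->add_r = m->add_o1 + v; }
void move_to_sub_o2(Tta *m, int v) { m->sub_r = m->sub_o1 - v; }
void move_to_st_o2(Tta *m, int v)
{
    assert(m->st_o1 >= 0 && m->st_o1 < MEMSIZE);
    m->mem[m->st_o1] = v;
}

/* TTA code from the slide, before bypassing. */
void run_plain(Tta *m)
{
    move_to_add_o1(m, m->r[2]); move_to_add_o2(m, m->r[3]);
    move_to_sub_o1(m, m->r[2]); move_to_sub_o2(m, m->r[6]);
    m->r[1] = m->add_r;         m->r[4] = m->sub_r;
    move_to_st_o1(m, m->r[1]);  move_to_st_o2(m, m->r[4]);
}

/* After bypassing: the store unit is fed from the result ports directly,
   without waiting for the register file to be written and read back. */
void run_bypassed(Tta *m)
{
    move_to_add_o1(m, m->r[2]); move_to_add_o2(m, m->r[3]);
    move_to_sub_o1(m, m->r[2]); move_to_sub_o2(m, m->r[6]);
    m->r[1] = m->add_r;         m->r[4] = m->sub_r;
    move_to_st_o1(m, m->add_r); move_to_st_o2(m, m->sub_r);
}
```

Both routines leave the registers and memory in the same state; the bypassed version simply removes the register-file round trip from the path into the store unit.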

  16. PAMELA: Performance Analysis of Computer Systems [Figure labels: analytic evaluation; architecture; implementation; simulation; evaluation] • Current Contributions: • Specialized Languages • Simulation Tools & Methodology • Parallel Algorithms • Delft Architecture Workbench • Future Directions: • Complete the Delft Architecture Workbench

  17. Static Branch Prediction
      • Data dependent branches:

            for (i = 0; i < n-1; i++) {
                minIndex = i;
                for (j = i+1; j < n; j++)
                    if (a[j] < a[minIndex])   /* branch B */
                        minIndex = j;
                swap(&a[i], &a[minIndex]);
            }

      • Oblivious static branch predictor: B will be taken 50% of the time
      • Bernoulli model with truth probability p (profiling): large variance of the prediction error
      • New model based on alternating renewal processes reduces the variance of the prediction error by an order of magnitude
      • Let D (U) = number of consecutive 0’s (1’s)
      • Then
        $E[PA] = \frac{E[U]}{E[D] + E[U]}$
        $\operatorname{Var}[PA] = \frac{E[D]^2 \operatorname{Var}[U] + E[U]^2 \operatorname{Var}[D]}{(E[D] + E[U])^2}$
      • Example: 110011001100
      • Then E[PA] = 0.5, Var[PA] = 0
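
The run-length quantities on this slide can be computed from a recorded branch trace. The sketch below assumes the trace is given as a string of '0' and '1' characters and uses population variances of the run lengths; `RunStats`, `run_stats`, `expected_taken`, and `variance_taken` are hypothetical names. The variance expression is evaluated exactly as transcribed from the slide, including the exponent of 2 in the denominator, which this sketch does not attempt to verify.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double mean_u, var_u;   /* lengths of runs of 1's (U) */
    double mean_d, var_d;   /* lengths of runs of 0's (D) */
} RunStats;

/* Split the trace into maximal runs and collect mean and variance of the
   run lengths, separately for runs of 0's and runs of 1's. */
RunStats run_stats(const char *trace)
{
    double sum[2] = {0.0, 0.0}, sumsq[2] = {0.0, 0.0};
    size_t count[2] = {0, 0};
    double mean[2], var[2];
    size_t i = 0;

    while (trace[i] != '\0') {
        char c = trace[i];
        size_t len = 0;
        assert(c == '0' || c == '1');
        while (trace[i] == c) { len++; i++; }
        sum[c - '0'] += (double)len;
        sumsq[c - '0'] += (double)len * (double)len;
        count[c - '0']++;
    }
    for (int k = 0; k < 2; k++) {
        mean[k] = count[k] ? sum[k] / (double)count[k] : 0.0;
        var[k] = count[k] ? sumsq[k] / (double)count[k] - mean[k] * mean[k] : 0.0;
    }
    RunStats s = { mean[1], var[1], mean[0], var[0] };
    return s;
}

/* E[PA] = E[U] / (E[D] + E[U]) */
double expected_taken(const RunStats *s)
{
    assert(s->mean_d + s->mean_u > 0.0);
    return s->mean_u / (s->mean_d + s->mean_u);
}

/* Var[PA] as written on the slide. */
double variance_taken(const RunStats *s)
{
    double t = s->mean_d + s->mean_u;
    assert(t > 0.0);
    return (s->mean_d * s->mean_d * s->var_u +
            s->mean_u * s->mean_u * s->var_d) / (t * t);
}
```

For the slide's trace 110011001100, every run has length 2, so both run-length variances are zero and the two functions reproduce the values stated above.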

  18. D-Iliad: High Performance General Purpose Computers • Topics: • Uni & Multiprocessors • Internet Processing • Computer Design • High Speed Switches • Current Contributions: • Instruction level parallel machines (Superscalar, SCISM) • New “Complex” Instructions • New Designs of Arithmetic Processing • New Switch Design • Future Directions: • New Architectural paradigm

  19. Complex Streamed Instructions
      • See PACT’01, EuroPar’01
      • Drawbacks of MMX-like extensions:
        • Multimedia (MM) register size architecturally visible and fixed. Ways out:
          • add MM FUs and increase issue width
            • expensive
          • increase MM register size
            • existing codes have to be recompiled/rewritten
            • not beneficial due to small sub-matrices
        • overhead for converting between packed data types and alignment
      • Proposed solution: Complex Streamed Instructions (CSI)
        • two-dimensional vector (stream) architecture, streams of arbitrary length
        • stream is specified by set of stream control registers
        • conversion between data types in hardware
        • no loop control and address generation overhead

  20. The Need for a Parallel Computation Model • Parallel computing has not been very successful • One reason: lack of a standard parallel computation model • Properties that a suitable parallel computation model should possess: • Scalability • Portability • Predictability • Model proposed by Valiant (1990) • Bulk-Synchronous Parallel (BSP) model

  21. BSP Model
      • BSP architectural model
        • set of p processors communicating by sending point-to-point messages
      • BSP programming model
        • computations proceed in phases (supersteps), separated by barrier synchronizations
      • BSP cost model
        • superstep takes time $w + g \cdot h + L$, where
          w: max. work
          h: max. messages (h-relation)
          g: bandwidth reciprocal
          L: latency/synchronization cost
      [Figure: processors (P) with memories (M) connected by a communication network; supersteps separated by barrier syncs]
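
As a worked illustration of the cost model, the sketch below evaluates w + g · h + L per superstep and sums it over a program. The struct and function names (`BspMachine`, `superstep_cost`, `program_cost`) are hypothetical, and the parameter values used in the usage note are made up for the arithmetic; they are not measurements from the presentation.

```c
#include <assert.h>

typedef struct {
    double g;   /* bandwidth reciprocal: time per message of an h-relation */
    double L;   /* latency / barrier synchronization cost */
} BspMachine;

/* Cost of one superstep: w + g * h + L. */
double superstep_cost(const BspMachine *m, double w, double h)
{
    assert(w >= 0.0 && h >= 0.0);
    return w + m->g * h + m->L;
}

/* Cost of a program: the sum of its superstep costs, where w[i] and h[i]
   are the maximum work and maximum messages of superstep i. */
double program_cost(const BspMachine *m, const double *w, const double *h,
                    int nsteps)
{
    double total = 0.0;
    for (int i = 0; i < nsteps; i++)
        total += superstep_cost(m, w[i], h[i]);
    return total;
}
```

With g = 4 and L = 100, a superstep with w = 1000 and h = 50 costs 1000 + 4 · 50 + 100 = 1300 time units. The barrier cost L is paid once per superstep even when h = 0, which is why algorithms in this model try to keep the number of supersteps small.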

  22. PUB Library • Paderborn University BSP (PUB) library (IPDPS’99) basics: • SPMD • no receive operation; barrier synchronization signifies end of all communication operations • only non-blocking communication primitives • buffered and unbuffered communication • message is placed in buffer associated with destination processor from which it can be retrieved after the next barrier sync • Additional features: • (non-blocking) collective communication primitives • ability to partition the processors • running different BSP computations on the same system (in different threads)

  23. PUB Example: Parallel Binary Multisearch • Search butterfly: [Figure: search butterfly over Proc 0, Proc 1, Proc 2, Proc 3, with a local search tree; the key values shown in the figure are not recoverable from the transcript]

  24. Parallel Binary Multisearch Using PUB

          void bin_search(int d, int m)
          {
              /* Forward queries that belong to the other half; keep the rest. */
              for (i = new_m = 0; i < m; i++)
                  if ((query[i] <= gkey[d] && inRight(d, me)) ||
                      (query[i] > gkey[d] && inLeft(d, me)))
                      bsp_send(&bsp, Opposite(d, me), &query[i], sizeof(int));
                  else
                      query[new_m++] = query[i];
              bsp_sync(&bsp);

              /* Messages sent before the barrier can be retrieved now. */
              for (i = 0; i < bsp_nmsgs(&bsp); i++) {
                  msg = bsp_getmsg(&bsp, i);
                  query[new_m++] = *(int *)bspmsg_data(msg);
              }
              if (d == 0)
                  local_search(new_m, query, n, key);
              else
                  bin_search(d - 1, new_m);
          }

  25. Concluding Remarks • Not discussed: • testing • ISA extensions for sparse matrix computations • computer arithmetic using single-electron technology • reconfigurable processors • network processors • low power • ... • For further information, please contact me (benj@ce.et.tudelft.nl) or see • ce.et.tudelft.nl • ce.et.tudelft.nl/~benj • www.upb.de/~pub Thank You
