Alexandru V Staicu 1 , Jacek R. Radzikowski 1


  1. Alexandru V Staicu (1), Jacek R. Radzikowski (1), Kris Gaj (1), Nikitas Alexandridis (2), Tarek El-Ghazawi (2)
     (1) George Mason University
     (2) George Washington University
     Effective Use of Networked Reconfigurable Resources
     http://ece.gmu.edu/lucite

  2. Problem:
     • Reconfigurable resources: expensive and underutilized
     • Many of these resources available over the network
     • It is desirable to leverage networked reconfigurable resources to help other users within the same organization

  3. Approach:
     • Select the most suitable existing Job Management System (JMS)
       - identify and define functional requirements
       - rank known systems according to these requirements
       - identify which JMS is the easiest to extend
     • Extend this JMS to recognize and utilize reconfigurable resources
       - add new dynamic resources
       - configure scheduling to be based on these new resources

  4. An existing Job Management System
     [Diagram: Submission Host → Master Host (Tasks 1, 2, 3) → Execution Host 1 (Task 1), Execution Host 2 (Task 2), Execution Host 3 (Task 3)]

  5. Networked Reconfigurable Resource Management System
     [Diagram: Submission Host → Master Host (Tasks 1, 2, 3) → Execution Host 1 (Task 1), Execution Host 2 (Task 2), Execution Host 3 (Task 3); FPGA boards]

  6. Heterogeneous network with FPGA-based accelerators
     [Diagram: SLAAC Research Reference Platform. Hosts: Sparc 10, Sparc 20 Gateway, HP, and several Dell machines. Accelerator boards: WILDFORCE, WILDSTAR, SLAAC. Interconnect: two Ethernet Intelligent Hubs (100 Mbps) and a Myrinet SAN/LAN Switch.]

  7. Functional units of a typical Job Management System
     [Diagram: units: User Server, Job Scheduler, Resource Manager, Resource Monitor, Job Dispatcher. Labeled flows: jobs & their requirements; scheduling policies; resource requirements; available resources; resource allocation and job execution.]
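
The interplay of the units on slide 7 can be illustrated with a toy model. This is only a minimal sketch, not the design of any system in the deck: the `Host` and `Job` classes, the `first_fit` policy, and the resource names `cpus` and `fpga_boards` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    resources: dict  # hypothetical resource names, e.g. {"cpus": 1, "fpga_boards": 1}


@dataclass
class Job:
    name: str
    requirements: dict  # minimum amount needed of each resource


def first_fit(jobs, hosts):
    """Toy Job Scheduler: place each job on the first host that satisfies it.

    Returns {job name: host name or None}. Resources granted to a job are
    subtracted so that later jobs only see the remaining capacity.
    """
    # Snapshot of available resources, standing in for the Resource Monitor.
    free = {h.name: dict(h.resources) for h in hosts}
    placement = {}
    for job in jobs:
        placement[job.name] = None
        for host_name, avail in free.items():
            if all(avail.get(r, 0) >= n for r, n in job.requirements.items()):
                # Allocation step, standing in for the Job Dispatcher.
                for r, n in job.requirements.items():
                    avail[r] -= n
                placement[job.name] = host_name
                break
    return placement


hosts = [
    Host("exec1", {"cpus": 1}),
    Host("exec2", {"cpus": 1, "fpga_boards": 1}),
    Host("exec3", {"cpus": 1, "fpga_boards": 1}),
]
jobs = [
    Job("task1", {"cpus": 1}),
    Job("task2", {"cpus": 1, "fpga_boards": 1}),
    Job("task3", {"cpus": 1, "fpga_boards": 1}),
]
placement = first_fit(jobs, hosts)
print(placement)
```

Treating an FPGA board as just another counted resource is what lets an unmodified matching loop place FPGA jobs; a job asking for more boards than any host offers simply stays unplaced (`None`).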

  8. Classification of Investigated Systems (1)
     Distributed Operating System
       • MOSIX
     Distributed JMS w/o a Central Scheduler
       • Globus
       • Legion
       • NetSolve
     Centralized JMS
       • LSF
       • CODINE
       • PBS
       • Condor
       • RES

  9. Classification of Investigated Systems (2)
     Parameter Study Scheduler
       • AppLES
     Distributed Computing Interface
       • Compaq DCE
     Resource Monitor and Forecaster
       • NWS

  10. Operating system, flexibility, user interface
      Systems compared: RES, CONDOR, PBS, Codine, LSF
      • Distribution: pub, com, pub/com, pub, gov
      • Source code
      • OS Support: Solaris, Linux, Tru64, NT
      • User Interface: GUI & CLI, GUI & CLI, CLI, GUI & CLI, GUI & CLI

  11. Scheduling and Resource Management
      Systems compared: RES, CONDOR, PBS, Codine, LSF
      • Batch jobs
      • Interactive jobs
      • Parallel jobs
      • Accounting

  12. Efficiency and Utilization
      Systems compared: RES, CONDOR, PBS, Codine, LSF
      • Stage-in and stage-out
      • Timesharing
      • Process migration
      • Dynamic load balancing
      • Scalability

  13. Fault Tolerance and Security
      Systems compared: RES, CONDOR, PBS, Codine, LSF
      • Checkpointing
      • Daemon fault recovery
      • Authentication
      • Authorization

  14. Documentation and Technical Support
      Systems compared: RES, CONDOR, PBS, Codine, LSF
      • Documentation
      • Technical support

  15. JMS features supporting extension to reconfigurable hardware
      • capability to define new dynamic resources
      • strong support for stage-in and stage-out
        - configuration bitstreams
        - executable code
        - input/output data
      • strong support for checkpointing, job migration, and dynamic load balancing

  16. Ranking of Centralized Job Management Systems (1)
      Capability to define new dynamic resources:
        - Excellent: LSF, PBS, CODINE
        - More difficult: CONDOR, RES
      Stage-in and stage-out:
        - Excellent: LSF, PBS
        - Limited: CONDOR
        - No: CODINE, RES

  17. Ranking of Centralized Job Management Systems (2)
      Checkpointing:
        - Excellent: LSF, CONDOR
        - External mechanisms: CODINE
        - No: PBS, RES
      Job Migration:
        - Yes: LSF, CODINE, CONDOR, PBS
        - No: RES
      Dynamic Load Balancing:
        - Yes: LSF, CODINE
        - No: CONDOR, PBS, RES

  18. Ranking of Centralized Job Management Systems (3)
      Overall suitability to extend to reconfigurable hardware:
      Without changing the JMS source code:
        • LSF
        • CODINE
        • PBS
      Requires changes to the JMS source code:
        • CONDOR
        • RES
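
The per-feature ratings on slides 16 and 17 can be tallied to see how they relate to the overall order on slide 18. The slides do not say how the authors combined the ratings, so the numeric scores and equal feature weights below are purely assumptions for illustration.

```python
# Assumed numeric value for each rating label used on slides 16 and 17.
SCORE = {
    "excellent": 2, "yes": 2,
    "limited": 1, "external": 1, "more difficult": 1,
    "no": 0,
}

# Ratings transcribed from slides 16 and 17.
RATINGS = {
    "dynamic resources": {
        "LSF": "excellent", "PBS": "excellent", "CODINE": "excellent",
        "CONDOR": "more difficult", "RES": "more difficult",
    },
    "stage-in/stage-out": {
        "LSF": "excellent", "PBS": "excellent", "CONDOR": "limited",
        "CODINE": "no", "RES": "no",
    },
    "checkpointing": {
        "LSF": "excellent", "CONDOR": "excellent", "CODINE": "external",
        "PBS": "no", "RES": "no",
    },
    "job migration": {
        "LSF": "yes", "CODINE": "yes", "CONDOR": "yes", "PBS": "yes",
        "RES": "no",
    },
    "dynamic load balancing": {
        "LSF": "yes", "CODINE": "yes",
        "CONDOR": "no", "PBS": "no", "RES": "no",
    },
}


def tally(ratings, score):
    """Sum the assumed score of every rating, per system, with equal weights."""
    totals = {}
    for per_system in ratings.values():
        for system, rating in per_system.items():
            totals[system] = totals.get(system, 0) + score[rating]
    return totals


totals = tally(RATINGS, SCORE)
for system, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{system:7s} {total}")
```

Under these assumed scores LSF comes first, CODINE second, and RES last, while PBS and CONDOR tie. The tally does not capture whether an extension needs changes to the JMS source code, which slide 18 lists as a separate consideration.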

  19. Operation of LSF
      [Diagram: Submission host, Master host, Execution host, and other hosts. Components shown: bsub, app, Batch API, LIM, MLIM, MBD, queue, SBD, Child SBD, RES, User job. Load information is exchanged between the LIMs; the steps are numbered 1 to 13.]
      LIM – Load Information Manager
      MLIM – Master LIM
      MBD – Master Batch Daemon
      SBD – Slave Batch Daemon
      RES – Remote Execution Server

  20. Extension of LSF to reconfigurable hardware
      [Diagram: Submission host, Master host, Execution host, and other hosts. Components shown: bsub, app, Batch API, LIM, ELIM, MLIM, MBD, queue, SBD, Child SBD, RES, User job, FPGA board, ACS API. Load information and the status of the board are reported; the steps are numbered 1 to 14.]
      ELIM – External Load Information Manager
      ACS API – Adaptive Computing Systems API
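
The ELIM on slide 20 is the piece that reports the status of the board to LSF. In LSF an ELIM is an external program that periodically writes a line to standard output giving the number of indices followed by name/value pairs. The sketch below follows that convention under stated assumptions: the index name `fpga_free`, the reporting interval, and `read_board_status` (a stub standing in for a query through the ACS API, whose calls are not shown in the slides) are all hypothetical.

```python
import sys
import time


def read_board_status():
    """Hypothetical stand-in for a board query made through the ACS API.

    Returns 1 if the FPGA board is free and 0 if it is busy. This stub
    always reports a free board.
    """
    return 1


def format_report(indices):
    """Format indices as '<count> <name1> <value1> <name2> <value2> ...'."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts += [name, str(value)]
    return " ".join(parts)


def run_elim(read_status, interval_s=5.0, max_reports=None, out=sys.stdout):
    """Write one report per interval; max_reports=None loops until killed."""
    sent = 0
    while max_reports is None or sent < max_reports:
        out.write(format_report({"fpga_free": read_status()}) + "\n")
        out.flush()  # the reader on the other end of the pipe should not wait on a buffer
        sent += 1
        if max_reports is None or sent < max_reports:
            time.sleep(interval_s)


# Bounded demonstration: two reports with no delay between them.
run_elim(read_board_status, interval_s=0.0, max_reports=2)
```

A job could then ask for a free board with a resource requirement string such as `bsub -R "select[fpga_free > 0]" ...`, assuming a resource with that (hypothetical) name has also been declared in the cluster configuration.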

  21. Conclusions (1)
      • 12 systems evaluated using 25 functional requirements + the suitability of extension to support reconfigurable hardware
      • LSF, CODINE, PBS, and Condor ranked the highest in the functional requirements
      • LSF, CODINE, and PBSPro found easy to extend without changes to their source code
      • LSF most suitable to support reconfigurable hardware

  22. Conclusions (2)
      • General software architecture of the extended system developed
      • Experimental verification and the measurement of the speed-up of the extended system in progress