Blackbird: Accelerated Course Archives Using Condor with Blackboard Sam Hoover, IT Systems Architect Matt Garrett, System Administrator
Blackboard @ Clemson • End of Semester archives of all online courses in Blackboard since implementation in 2004 • 77 GB Oracle 10.2.0.4 DB tied to a 1.3 TB Content system with over 13 million files • Spring 2010: 4610 active Blackboard courses, 31,372 total courses in Blackboard • Full system backups once a week, nightly incremental backups of entire system
The Archive Problem • Blackboard is a mission critical system • Why is 85.5 hours for archives a problem? • Start of new semester vs. normal operations • Time between semesters is short and getting shorter • Faculty have to wait to set up next semester’s courses • End of semester processes
CRLT archive uses • Grade disputes
The Archive Problem • Blackboard provides a script for executing batch archives given a list of courses as input. • Weekly archive process at Clemson began in Fall 2006 after an accidental deletion of many courses. • Started out splitting the course list into four equal chunks and giving each server ¼ of the total course list. All four servers usually finished within 2 hours of each other, total time for the batch was < 24 hours. • By Fall 2008, archiving the active courses took 85.5 hours, and the servers finished at widely varying times.
The Archive Problem • Who wants to work weekends?
Blackboard archive script • /usr/local/blackboard/apps/content-exchange/bin/batch_ImportExport.sh • Archive/Restore: The Archive Course function creates a record of the Course including User interactions. It is most useful for recalling Student performance or interactions at later time. The archive package is saved as a .ZIP file that can be restored to the Blackboard system at another time. In effect, Archive/Restore acts as a backup tool at the individual course level.
The Archive Problem Potential Solutions?
Potential solutions • Write our own job scheduler? • Could we take advantage of the other 3 (CPUs)? • How do we monitor performance so end user (Blackboard) experience isn’t impacted? • Use a DB to store and manage the queue? • What about security? • Has anyone else out there already done this?
Condor to the rescue? • Job scheduler? Check • Multi-core capable? Check • Manage the queue? Check • Performance monitoring? Check • Security? Check • Has anyone done this before? No
Steps in the weekly archive process • Determine what to archive (active courses, orgs) • Build a course list • Create Blackbird submit files • Submit DAGMan job to Condor • Monitor Condor queue • Receive email notification when all courses have been archived • Look for errors and verify archive integrity
Custom Condor Configuration • DAGMAN_MAX_JOBS_IDLE = 25 • DAGMAN_MAX_JOBS_SUBMITTED = 50 • SLOTS_CONNECTED_TO_CONSOLE = 0 • SLOTS_CONNECTED_TO_KEYBOARD = 0 • ## Force Condor to use Blackboard Private Network • NETWORK_INTERFACE = Private Blackboard Net
DAGMan example JOB UniqueCourseID /path/to/condor/submit/job/file/UniqueCourseID.bbCondor JOB UniqueCourseID2 /path/to/condor/submit/job/file/UniqueCourseID2.bbCondor JOB UniqueCourseID3 /path/to/condor/submit/job/file/UniqueCourseID3.bbCondor JOB UniqueCourseID4 /path/to/condor/submit/job/file/UniqueCourseID4.bbCondor JOB UniqueCourseID5 /path/to/condor/submit/job/file/UniqueCourseID5.bbCondor JOB UniqueCourseID6 /path/to/condor/submit/job/file/UniqueCourseID6.bbCondor SCRIPT POST UniqueCourseID6 /usr/local/CMSIntegration/bin/weeklyArchiveChecker.pl
Condor Submit example universe = vanilla requirements = (OpSys=="LINUX") && ((Arch=="INTEL") || (Arch=="X86_64")) executable = /usr/local/bin/condorSubmitArchive.pl arguments = shoover-S0000BKBRD_401001,/san/weeklyArchives/20091008/ getenv = True log = /usr/local/logs/bbCondorLogs/archive20091008.log notification = Error notify_user = DCIT2803_BB_ON_CALL-L@clemson.edu transfer_executable = False when_to_transfer_output = ON_EXIT queue 1
Blackbird Benefits • Reduced total archive time from > 85 hrs to < 24 hrs • Job scheduling – servers finish about the same time • Zero impact to Blackboard Performance • Automatic suspension/resumption of archives if Load reaches threshold on any core • Email notification upon completion of all archives • Load balancing – archive jobs are distributed as cores become available • Takes advantage of all available CPU cores instead of just one core per server
What did it take to implement? • Have one or more multi-core (CPU) machines • A large amount of shared storage for archives • Choose one machine as your Central Manager • Install and configure Condor on each machine • Automate course list creation (Query DB or Directory) • Automate Condor submit files and Condor DAGMan file creation • Automate the whole thing with cron • Check log files for errors upon archive completion
Where else could I use this? • Any system that does batch processing that can be broken up into many jobs • Recently implemented on our MySQL server to export all of the MySQL databases • Reduced the export time from 10 hours to 3.5 hours on a single, quad core machine
Recent updates • 64 Bit Red Hat 5.4 OS and JVM 1.6 • Maximum (affordable) RAM per machine – 32 GB • Web page to view Blackbird Condor Pool status • Duplicate archives • Error checking logs • Redo any courses with errors or not completed • Major Blackboard upgrade from 7.3 to 9.1 end of June
What’s next? • New machines have 2 x Quad Core CPUs with HyperThreading so Condor sees 16 Cores • Add out of warranty machines to the Blackboard Condor Pool (keep users off of them) • Monitoring of queue (web page) • Use ClassAds to specify architecture and memory requirements for large archive jobs • Write code to query DB and find out what courses have changed, backup any course that has changed on a daily basis • Automate installation and configuration
Please provide feedback for this session by emailing DevConFeedback@blackboard.com. The subject of the email should be title of this session: [Blackbird: Accelerated Course Archives Using Condor with Blackboard]