The Condor DB Group Report Jiansheng Huang, Ameet Kini, Shrinivas Lakshmikant, Erik Paulson, Christine Reilly, Eric Robinson, Srinath Shankar, David DeWitt, Jeff Naughton
Overview • General overview of group projects (Naughton). • Quill (Paulson).
Condor DB Group • Overall task: • Focus on data management aspects of Condor • Deliver prototypes of useful technology • Explore, develop and evaluate technology that may be useful to Condor down the road.
Projects other than Quill • Provenance in a Condor System. • Statistical mining of log data to evaluate system health. • Interaction of user data placement, caching, and workflow job scheduling. • Job-machine matching in DB context. • Condor functionality based on App-Server technology. • Recency and consistency in captured data.
Provenance and Condor • Christine Reilly (email@example.com). • Provenance: information on how data was produced. • Observation: for each user job, Condor can record: • Which version of program(s) was used; • Which version of data was used; • When it was produced; • What system it ran on (hardware, software). • Questions: • How much information should we gather? • How much burden should we place on the system designer, application programmer, or both?
Debugging through log mining • Srinivas Lakshmikant (firstname.lastname@example.org) • Idea: • Record “events,” logically associated with entities. • E.g., job entities start, get scheduled, run, terminate. • Find which entities have infrequent events. • Find which entities lack frequent events. • Can you use this to detect problems? • Early results suggest yes: finds and pinpoints problems that might not be found otherwise. • How can you increase the accuracy and efficiency over naïve approaches?
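The idea on this slide can be sketched in a few lines of Python. This is a naïve baseline under stated assumptions, not the group's method: the event names (including `held`), the `find_anomalies` helper, and the 25%/75% thresholds are all made up for illustration.

```python
from collections import Counter, defaultdict

def find_anomalies(events, rare_below=0.25, common_above=0.75):
    """events: iterable of (entity_id, event_type) pairs.

    Returns (rare, missing):
      rare[entity]    = event types this entity has that few entities have
      missing[entity] = event types this entity lacks that most entities have
    Thresholds are hypothetical fractions of the entity population.
    """
    seen = defaultdict(set)
    for entity, event_type in events:
        seen[entity].add(event_type)

    # Fraction of entities that show each event type at least once.
    counts = Counter(t for types in seen.values() for t in types)
    n = len(seen)
    freq = {t: c / n for t, c in counts.items()}

    infrequent = {t for t, f in freq.items() if f <= rare_below}
    frequent = {t for t, f in freq.items() if f >= common_above}

    rare = {e: ts & infrequent for e, ts in seen.items() if ts & infrequent}
    missing = {e: frequent - ts for e, ts in seen.items() if frequent - ts}
    return rare, missing

# Four healthy jobs and one job that never runs (illustrative data only).
normal = ["start", "scheduled", "run", "terminate"]
log = [(f"job{i}", ev) for i in range(1, 5) for ev in normal]
log += [("job5", "start"), ("job5", "scheduled"), ("job5", "held")]

rare, missing = find_anomalies(log)
print(rare)     # entities with infrequent events
print(missing)  # entities lacking frequent events
```

Here `job5` is flagged on both counts: it has an event almost no other job has, and it lacks events almost every other job has. The accuracy and efficiency question on the slide is about doing better than this kind of full-scan counting.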
Caching, Scheduling, Workflow • Srinath Shankar (email@example.com) • Idea: • Cache input files and intermediate files on disks of pool machines; • Record where these files are cached; • Schedule tasks in a workflow to minimize data fetches/moves. • Result: potentially much greater throughput.
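A minimal sketch of the three steps above, assuming a simple greedy policy: place each task on the machine that already caches the most of its inputs. The `schedule` function, file names, and machine names are hypothetical, and the sketch ignores load balancing, parallelism, and cache eviction.

```python
def schedule(tasks, machines):
    """tasks: list of (name, inputs, outputs), already in dependency order.
    machines: dict of machine name -> set of files cached on that machine.
    Returns (placement, fetches), where fetches counts files that had to move.
    """
    cache = {m: set(files) for m, files in machines.items()}
    placement, fetches = {}, 0
    for name, inputs, outputs in tasks:
        needed = set(inputs)
        # Pick the machine missing the fewest inputs; ties go to the
        # alphabetically first machine name.
        best = min(sorted(cache), key=lambda m: len(needed - cache[m]))
        fetches += len(needed - cache[best])
        # Record where files are now cached: fetched inputs and new outputs.
        cache[best] |= needed | set(outputs)
        placement[name] = best
    return placement, fetches

workflow = [
    ("extract", ["raw.dat"], ["a.tmp"]),
    ("filter", ["a.tmp"], ["b.tmp"]),
    ("merge", ["a.tmp", "b.tmp", "ref.dat"], ["out.dat"]),
]
pool = {"m1": {"ref.dat"}, "m2": {"raw.dat"}}

placement, fetches = schedule(workflow, pool)
print(placement, fetches)
```

With this toy input the whole workflow stays on `m2` and only `ref.dat` has to be fetched, because the intermediate files never leave the machine that produced them.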
Job Matching in a DBMS • Ameet Kini (firstname.lastname@example.org) • Idea: matching looks a lot like a DBMS join. • If machine and job data are already stored in a DBMS, can we or should we use the DBMS to do the matching? • Answer: early results are promising but this is a non-trivial problem.
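To illustrate why matching looks like a join, here is a toy sketch using Python's built-in `sqlite3` module. The `machines` and `jobs` tables, their columns, and the two fixed requirements (architecture and memory) are assumptions made for the example, not Condor's or Quill's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, heavily simplified machine and job tables.
CREATE TABLE machines (name TEXT, arch TEXT, memory_mb INTEGER);
CREATE TABLE jobs (job_id INTEGER, req_arch TEXT, req_memory_mb INTEGER);
INSERT INTO machines VALUES
  ('m1', 'INTEL', 512), ('m2', 'INTEL', 2048), ('m3', 'SUN4u', 1024);
INSERT INTO jobs VALUES
  (1, 'INTEL', 1024), (2, 'SUN4u', 512), (3, 'INTEL', 4096);
""")

# Matching as a join: pair each job with every machine that satisfies it.
matches = conn.execute("""
SELECT j.job_id, m.name
FROM jobs j JOIN machines m
  ON m.arch = j.req_arch AND m.memory_mb >= j.req_memory_mb
ORDER BY j.job_id, m.name
""").fetchall()
print(matches)
```

Job 3 finds no machine and drops out of the result. The sketch deliberately sidesteps the hard parts: it fixes the requirements to two columns and does nothing about ranking candidates or assigning each machine to at most one job.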
Recency of Quill Data • Jiansheng Huang (email@example.com) • Problem: daemons report in at uncontrollable and unpredictable times. • Result: an out-of-date and inconsistent data set. • Can we provide the user with a concise characterization of the recency of the sources relevant to a user query? • Note: surprisingly non-trivial to define what we mean by “relevant” in this setting.
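One possible shape for a "concise characterization" is sketched below. It assumes the hard part, deciding which sources are relevant, has already been done, and simply summarizes how stale those sources are. The `recency_summary` helper, the source names, and the timestamps are hypothetical.

```python
def recency_summary(last_report, relevant, now):
    """last_report: dict of source -> time of its most recent report.
    relevant: sources assumed (not derived here) to matter for the query.
    Returns the age of the freshest and stalest relevant sources.
    """
    ages = {s: now - last_report[s] for s in relevant}
    stalest = max(ages, key=ages.get)
    return {
        "freshest_age": min(ages.values()),
        "stalest_age": ages[stalest],
        "stalest_source": stalest,
    }

# Timestamps in seconds; values are illustrative only.
last_report = {"schedd@a": 995, "schedd@b": 940, "startd@c": 700}
summary = recency_summary(last_report, ["schedd@a", "schedd@b"], now=1000)
print(summary)
```

A query that touches only the two schedds would be told its data is between 5 and 60 seconds old; the much staler `startd@c` is left out because it was not listed as relevant.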
App. Servers and Condor • Eric Robinson (firstname.lastname@example.org) • Idea: application servers provide a lot of technology that appears useful in a Condor setting. • Approach: build a prototype of some Condor functionality using these tools and evaluate the approach.
Moving on… • Further questions on these projects? Best bet is to contact student listed on each slide. • On to Quill portion of talk.
The Condor Quill The Quill Developers “Give me a condor's quill! Give me Vesuvius' crater for an ink stand. Friends, hold my arms! For in the mere act of penning my thoughts of this Leviathan, they weary me. . . To produce a mighty book you must choose a mighty theme.” -Melville, Moby Dick
What is Quill? A non-invasive method of storing a read-only version of the Condor operational data in a relational database.
Quill: In pictures [Diagram: two panels, "Without Quill" and "With Quill". Both show the SchedD and the job queue transaction log (job_queue.log) on disk; the "With Quill" panel adds QuillD and a DBMS.]
Quill: Where we’ve been • First shipped in 6.7.11 (Sept 05) • Now “over the fence” – Condor Team is driving the 6.8 version • Response from users very helpful! • Lessons learned • Passive collection good • DBMSes are full of surprises
Quill: Where we’d like to be • Shared databases • Better job data • Data from non-job sources • More than just PostgreSQL DBMS • Examples of usage
Quill in Condor 6.9.3 • Development effort mostly complete • Previous bullet points addressed • Migration path for historical job data • Out of the box changes for Quill users: • Horizontal and vertical schema for active jobs • Jobs from multiple schedds in one database • By default, no new historical data stored
Example tables [Figure: two example tables, "Horizontal Job Table" and "Vertical Job Table".]
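The two layouts might look roughly like the sketch below, using `sqlite3`. It assumes "horizontal" means one row per job with a fixed column per attribute, and "vertical" means one row per (job, attribute, value) triple; the table names, column names, and attribute values are illustrative and are not Quill's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Horizontal: one row per job, one column per well-known attribute.
CREATE TABLE jobs_horizontal (
  cluster_id INTEGER, proc_id INTEGER, owner TEXT, job_status INTEGER);
-- Vertical: one row per attribute, for attributes without a fixed column.
CREATE TABLE jobs_vertical (
  cluster_id INTEGER, proc_id INTEGER, attr TEXT, val TEXT);

INSERT INTO jobs_horizontal VALUES (10, 0, 'alice', 2);
INSERT INTO jobs_vertical VALUES
  (10, 0, 'MyCustomTag', 'run-7'), (10, 0, 'InputFile', 'a.dat');
""")

# Reassemble one job's full attribute set from both tables.
rows = conn.execute("""
SELECT h.owner, v.attr, v.val
FROM jobs_horizontal h JOIN jobs_vertical v
  ON h.cluster_id = v.cluster_id AND h.proc_id = v.proc_id
ORDER BY v.attr
""").fetchall()
print(rows)
```

The trade-off this illustrates: fixed columns are easy to filter and index, while attribute/value rows can hold attributes that were not known when the schema was designed.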
More job information • The lifecycle of the job would be nice to have • Events like those in the “user log” • But, need more info than what’s in the job queue • Passive data collection works
Quill 6.9.3 diagram [Diagram components: SchedD, QuillD, DBMS, and, on disk, the job queue log and the new event log.] • The schedd writes events to the new “Event” log; the Quill daemon passively picks up the events and inserts them into the database. • For the schedd, the event log contains userlog events and job history events.
Examples • “Show me all the jobs that exited with a segfault that at some point ran on this machine.” • “When my jobs get preempted, how long until they get matched again?” • “What is the average runtime for jobs for each different type of input file?” • SQL “GROUP BY”
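The third question maps directly onto a GROUP BY query. The sketch below runs it against a made-up `job_history` table in `sqlite3`; the table, its columns, and the data are assumptions for illustration, not the schema Quill ships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per completed job.
CREATE TABLE job_history (job_id INTEGER, input_file TEXT, runtime_s REAL);
INSERT INTO job_history VALUES
  (1, 'small.dat', 100), (2, 'small.dat', 140),
  (3, 'large.dat', 900), (4, 'large.dat', 1100);
""")

# Average runtime per input file.
avg_runtime = conn.execute("""
SELECT input_file, AVG(runtime_s)
FROM job_history
GROUP BY input_file
ORDER BY input_file
""").fetchall()
print(avg_runtime)
```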
Collecting non-job information [Diagram components: SchedD, StartD, Negotiator, QuillD, DBMS, and the new event log on disk.]
New information stored • StartD: Machine status • Negotiator: Matches made • Starter/Shadow: Files transferred • Collector: “Submitter” ads • All daemons: Generic Events, daemon ads
The DBMSD • New daemon responsible for database housekeeping • Only one needed per DBMS • Purges old data • Three classes, independent thresholds • Resource: Machine classads • Run: matches, job log events • Job: condor_history information • Estimates size of database • “Soft quota”, warn when exceeded
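The housekeeping described on this slide could be sketched as follows. Everything concrete here is an assumption: the table names, the `recorded_day` column, the retention values, and the use of a row count as a stand-in for database size. It is not the DBMSD's actual implementation or configuration.

```python
import sqlite3

# Hypothetical retention thresholds in days, one per data class on the slide.
THRESHOLDS_DAYS = {"resource": 7, "run": 30, "job": 365}
# Hypothetical table holding each class of data.
TABLES = {"resource": "machine_ads", "run": "run_events", "job": "job_history"}

def purge(conn, now_day):
    """Delete rows older than each class's own threshold; return counts."""
    deleted = {}
    for cls, table in TABLES.items():
        cutoff = now_day - THRESHOLDS_DAYS[cls]
        cur = conn.execute(
            f"DELETE FROM {table} WHERE recorded_day < ?", (cutoff,))
        deleted[cls] = cur.rowcount
    conn.commit()
    return deleted

def over_soft_quota(conn, quota_rows):
    """Soft quota: report (never enforce) when the estimate exceeds the quota."""
    total = sum(
        conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        for t in TABLES.values())
    return total > quota_rows

conn = sqlite3.connect(":memory:")
for table in TABLES.values():
    conn.execute(f"CREATE TABLE {table} (recorded_day INTEGER)")
conn.executemany("INSERT INTO machine_ads VALUES (?)", [(390,), (399,)])
conn.executemany("INSERT INTO run_events VALUES (?)", [(300,), (380,)])
conn.executemany("INSERT INTO job_history VALUES (?)", [(10,), (200,)])

deleted = purge(conn, now_day=400)
print(deleted, over_soft_quota(conn, quota_rows=2))
```

Keeping the thresholds independent means short-lived machine ads can be purged aggressively while job history is retained much longer.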
Multiple DBMS systems • Oracle supported • Appears to need less maintenance • A nearly unified schema • Main difference is large text fields • Same binaries, DBMS type selectable via configuration file
Example Usage • PHP web front end • Good enough for some people • Or, use as the basis for your own system • BoF on Thursday at 11:00am • We’ll use the web front end to explain the information Quill now stores