Maintaining Individual Traceability in Shared Project Accounts with CEDPS/VDT Tools Shreyas Cholia <scholia@lbl.gov> NERSC Division, Lawrence Berkeley Lab Open Science Grid All-Hands Meeting Caltech-LIGO, Livingston, LA March 2009
Overview • Motivation and Requirements • Solution Overview • Grid Infrastructure Description • Process Accounting Information • Log Collection and Parsing • Building the NetLogger Database and Reconciling Information • Questions
Project Accounts for Collaborative Computing • Project (Group) Accounts enable shared access to compute and data resources for collaboration. • Jobs and data owned by common UNIX project user • Files may persist after individual has left project • Jobs may need to be managed by different users • Allow multiple users to share files and manage jobs, … without relying on group UNIX permissions, … while maintaining individual accountability • Built around standard OSG/VDT grid tools • NetLogger • GSISSH • GridFTP/GRAM • MyProxy
Requirements for Project Account Access • Must maintain individual traceability for all actions performed within project environment • DOE / NIST requirements for individual accountability at NERSC • Should allow both shell and grid based access to project accounts • Users should be able to access multiple projects as well as individual accounts • Must include access to data and jobs • Solution should work across all major NERSC platforms • Should support both OSG and non-OSG communities
Overview of Solution • Use grid certificates to track “real” user performing a given operation. Subject DN in certificate provides the user information. • Limit project account access to • Grid Interfaces • GSISSH • GridFTP • WS-GRAM • Custom login interfaces that record Parent PID • Custom HSI client for HPSS • Custom SSH for login nodes • Collect and parse log files; Reconcile all the information with original user DN using NetLogger.
NERSC CA • All NERSC users are assigned a short-lived certificate through the NERSC CA • Create a short-lived certificate:
    # myproxy-logon -s slcs.nersc.gov
    # grid-proxy-info -subject
    /DC=gov/DC=nersc/OU=People/CN=Shreyas Cholia 1234
grid-mapfile • NERSC uses grid-mapfile for GSI access • Could be easily extended to use GUMS (phase 2?) • Specify target project in command line • gsissh -l projuser • globus-url-copy gsiftp://projuser@davinci.nersc.gov/testfile file:///localfile • WS-GRAM <localUserID> tag in job spec file • Sample gridmap entry with project accounts: "/DC=org/DC=doegrids/OU=People/CN=Shreyas Cholia" shreyas,projuser,osg
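The sample gridmap entry above maps one subject DN to several local accounts. A minimal Python sketch of how such an entry could be read and checked follows. The function names `parse_gridmap_line` and `may_use_account` are hypothetical and are not part of any Globus tool; the sketch assumes the DN is wrapped in plain double quotes and the accounts are comma-separated, as in the sample entry.

```python
import shlex


def parse_gridmap_line(line):
    """Return (subject_dn, [local accounts]) for one grid-mapfile line.

    Blank lines and comment lines yield None.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # shlex keeps the quoted DN together even though it contains spaces.
    parts = shlex.split(line)
    if len(parts) != 2:
        raise ValueError(f"unexpected grid-mapfile line: {line!r}")
    dn, accounts = parts
    return dn, accounts.split(",")


def may_use_account(gridmap_lines, dn, account):
    """True if any entry for this DN lists the requested local account."""
    for line in gridmap_lines:
        entry = parse_gridmap_line(line)
        if entry and entry[0] == dn and account in entry[1]:
            return True
    return False


# The sample entry from the slide, with plain double quotes.
SAMPLE = ['"/DC=org/DC=doegrids/OU=People/CN=Shreyas Cholia" shreyas,projuser,osg']
DN = "/DC=org/DC=doegrids/OU=People/CN=Shreyas Cholia"

if __name__ == "__main__":
    print(parse_gridmap_line(SAMPLE[0]))
    print(may_use_account(SAMPLE, DN, "projuser"))
```

Listing the project account next to the personal account is what lets the user pick the target with `-l projuser` while the DN still identifies the individual.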
GSISSH • Only supported for local access: • gsissh -l projuser localhost • GSISSH acts like a "sudo" mechanism • User must first log in to NERSC using ssh • This forces user to go through custom sshd with keystroke logging • Prevents automatic credential forwarding (disabled for NERSC gsissh clients) so that user credentials are not stored in shared accounts
WS-GRAM and GridFTP • User mapped to project account using grid-mapfile • WS-GRAM >= GT 4.0.8 logs user DN information associated with job ID • GridFTP includes DN in session information • Logs can be generated in NetLogger format directly by WS-GRAM, GridFTP • Pre-WS GT2 GRAM not supported • GT2 GRAM does not support multiple target users using gridmap-auth • VOMS/GUMS could address this for OSG use by keying off FQAN • At NERSC, project user groups are not necessarily in OSG and may not have access to VOMS/GUMS
Process ID Logging • Linux - Comprehensive System Accounting (CSA) for Parent PID tracking - needs kernel mods • Log the process tree on the node • http://oss.sgi.com/projects/csa/ • BSDV3 Accounting • AIX auditing • Provides similar process information.
HPSS Access Project account access is only allowed through one of the following: • GridFTP • GSI authentication • Allows access from outside NERSC • Logs user DN for project account access • Custom HSI client with PID logging • NERSC auth (access restricted to within NERSC - login is automatic from NERSC hosts) • Logs Client PID on server side, which can be traced back as follows Client PID -> PPID -> DN
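The Client PID -> PPID -> DN chain can be sketched as a walk up the process tree until a process with a recorded DN is reached. This is only an illustration in Python: the two lookup tables and the function name `trace_dn` are hypothetical stand-ins for the process accounting records and the GSISSH session log, and PID 7000 is an invented login-session PID.

```python
def trace_dn(pid, ppid_of, dn_of_session, max_depth=64):
    """Walk from pid towards its ancestors and return the first recorded DN.

    ppid_of:       {pid: parent pid}, e.g. from process accounting records
    dn_of_session: {pid: subject DN}, e.g. from gsissh login log lines
    Returns None if the chain ends, loops, or exceeds max_depth.
    """
    seen = set()
    current = pid
    for _ in range(max_depth):
        if current in dn_of_session:
            return dn_of_session[current]
        if current in seen or current not in ppid_of:
            return None
        seen.add(current)
        current = ppid_of[current]
    return None


if __name__ == "__main__":
    # 7643 and 7641 are the PID and PPID from the sample accounting record;
    # 7000 is a made-up PID for the gsissh session that owns them.
    ppid_of = {7643: 7641, 7641: 7000}
    dn_of_session = {7000: "/DC=gov/DC=nersc/OU=People/CN=Shreyas Cholia 1234"}
    print(trace_dn(7643, ppid_of, dn_of_session))
```

A PID that cannot be traced returns `None`, which corresponds to a record without a real user that would need review.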
Special Cases and Caveats • SSH to worker nodes • Users can build custom clients to bypass some of this, but these log entries can be flagged by NetLogger • Credential delegation MUST be disabled • Record keeping lifetime? Parent PID not logged until process dies. Currently flag after 24 hours for review.
NetLogger Format • All log lines parsed into key=value pairs • Required Fields for every line: • ts=[timestamp in ISO8601 or secs since epoch] • event=[event identifier in java class notation] • Additionally tag lines with host and client information
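To make the key=value format concrete, here is a small Python sketch that splits one log line into a dictionary and checks for the two required fields. It is not the NetLogger parser itself; the function name `parse_nl_line` is hypothetical, and the sketch assumes that values containing spaces would be wrapped in double quotes.

```python
import shlex

REQUIRED_FIELDS = ("ts", "event")


def parse_nl_line(line):
    """Split a key=value log line into a dict and check required fields."""
    fields = {}
    # shlex.split keeps a quoted value such as msg="two words" in one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value pair: {token!r}")
        fields[key] = value
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return fields


if __name__ == "__main__":
    line = ("ts=2008-07-14T11:05:05-07:00 event=csa.process level=Info "
            "process.pid=7643 process.ppid=7641 local_user=fogal1")
    record = parse_nl_line(line)
    print(record["event"], record["process.ppid"])
```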
Sample log parsing
PROCESS ACCOUNTING LOG:
grep fogal1 11:05:05 11:05:05 0.11 0.00 0 7643 7641 Mon Jul 14 2008 Mon Jul 14 2008
nl_parser output:
ts=2008-07-14T11:05:05-07:00 event=csa.process level=Info month_start=Jul process.ppid=7641 year_start=2008 tod_start=11:05:05 monthday_start=14 tod_stop=11:05:05 process.pid=7643 cmd=grep pid=7643 year_end=2008 cputime=0.000000 local_user=fogal1 ignore=0 dow_end=Mon dow_start=Mon monthday_end=14 month_end=Jul dur=0 ppid=7641 walltime=0.110000
Logs Collected • Syslogs • ssh, gsissh information • GridFTP logs • gridftp.log • gridftp-auth.log • WS-GRAM logs • accounting.log • container-real.log • Process Accounting/Auditing logs • PBS/SGE/LoadLeveler job accounting logs • HPSS HSI Logs • grid-map files
Pre-parsing Logs • Pre-parsing done on local system • Drop unrelated log lines • Store log lines with dependencies in local temporary database, e.g. (job acct -> process acct -> gsissh log) • Fill out missing fields where necessary • Create temporary database to hold process information • Process ID tree is only filled out on process conclusion • e.g. cross-reference GSISSH DN and PID tables • May not be able to process all records since some records span multiple days • Unprocessed records held • Records > 1 day are flagged • Records without real user are flagged
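The two flagging rules at the end of this slide could look roughly like the Python sketch below. The record layout (`start`, `end`, `real_user` keys) and the function name `flag_reasons` are hypothetical. "Records > 1 day" is read here as a record whose span from start to end, or to the current time if it has not ended, exceeds 24 hours; that reading is an assumption.

```python
from datetime import datetime, timedelta

MAX_SPAN = timedelta(days=1)


def flag_reasons(record, now):
    """Return the reasons a pre-parsed record needs manual review."""
    reasons = []
    if not record.get("real_user"):
        reasons.append("no real user")
    # A record with no end time is still open, so measure it up to 'now'.
    end = record.get("end") or now
    if end - record["start"] > MAX_SPAN:
        reasons.append("spans more than 1 day")
    return reasons


if __name__ == "__main__":
    now = datetime(2008, 7, 16, 12, 0, 0)
    resolved = {
        "start": datetime(2008, 7, 14, 11, 5, 5),
        "end": datetime(2008, 7, 14, 11, 5, 5),
        "real_user": "/DC=gov/DC=nersc/OU=People/CN=Shreyas Cholia 1234",
    }
    pending = {"start": datetime(2008, 7, 14, 11, 5, 5), "end": None, "real_user": None}
    print(flag_reasons(resolved, now))
    print(flag_reasons(pending, now))
```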
NetLogger - Parsing and Loading the Database • Stage logs into central collector for parsing • Syslog-NG or rsync • Run NL parsers for all relevant log files • Feed parsed files into NL database • Issue queries against database for useful information select * for real_user where job_id=XXXX • Contributing parsers back to NL source, so other projects can benefit
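The query on this slide is written as pseudocode. A runnable sketch of the same lookup is shown below, using Python's built-in `sqlite3` module as a stand-in database. The `events` table, its columns, and the job ID `job-42` are hypothetical; the actual NetLogger database schema is not described in the slides.

```python
import sqlite3


def real_users_for_job(conn, job_id):
    """Return the distinct real users (DNs) recorded for one job ID."""
    rows = conn.execute(
        "SELECT DISTINCT real_user FROM events "
        "WHERE job_id = ? AND real_user IS NOT NULL "
        "ORDER BY real_user",
        (job_id,),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(ts TEXT, event TEXT, job_id TEXT, local_user TEXT, real_user TEXT)"
    )
    dn = "/DC=org/DC=doegrids/OU=People/CN=Shreyas Cholia"
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        [
            ("2008-07-14T11:05:05-07:00", "job.submit", "job-42", "projuser", dn),
            ("2008-07-14T11:06:00-07:00", "job.end", "job-42", "projuser", dn),
            ("2008-07-14T12:00:00-07:00", "job.submit", "job-43", "projuser", None),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(real_users_for_job(conn, "job-42"))
```

Passing the job ID as a bound parameter rather than formatting it into the SQL string keeps the query safe when job IDs come from parsed log files.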
Current Status and Open Issues • Pilot Project • Being deployed on test clusters • Still requires some level of manual oversight to review flagged entries • Information may not be complete if there is a system crash • Record lifetime
Future Development • Tighter OSG integration (GUMS/VOMS) • Create per user accounting information and integrate with NERSC project accounting
Questions? Thanks to Tina DeClerck, Dan Gunter