Incident Response in EGEE - Creating a Capability/Service for Effective Incident Management

MWSG2, June 16, 2004
JRA3 - Incident Response General Issues
Yuri Demchenko <demch@science.uva.nl>
www.eu-egee.org
EGEE is a project funded by the European Union under contract IST-2003-508833

Outline • Goal and motivation • Incidents and Incident Response – Definitions • Creating Incident Response Capability/Service • Incident Response in EGEE • Possible steps - Discussion

Goal and Motivation The goal of this presentation is to introduce the Incident Response problem area and raise awareness of it • How to create an Incident Response Capability? • What to respond to? • What standards and practices to follow? • What may be the first steps?

Incident Response – Definitions • Incident • Specifics of perceived Grid Incidents • Incident Response • Incident Response vs Intrusion Detection

Incident • A computer/ITC security incident is defined as any real or suspected adverse event in relation to the security of a computer or computer network. Typical security incidents within the ITC area are: a computer intrusion, a denial-of-service attack, information theft or data manipulation, etc. • An incident can be defined as a single attack or a group of attacks that can be distinguished from other attacks by the method of attack, identity of attackers, victims, sites, objectives or timing, etc. • An Incident in general is defined as a security event that involves a security violation. This may be an event that violates a security policy, UAP, laws and jurisdictions, etc. • A security incident may be logical, physical or organisational, for example a computer intrusion, loss of secrecy, information theft, fire or an alarm that doesn't work properly. A security incident may be caused on purpose or by accident. An accidental incident occurs, for example, when somebody forgets to lock a door or to activate an access list in a router.

Incident – any specifics for Grid? • Depends on the scope and range of the Security Policy, ULA, or SLA • Should be based on threat analysis and a vulnerability model • Should be based on Grid processes/workflow analysis • Is there a definite model and clear vision of these processes? • LCG definition of the Grid Job/Task submission • Job submission will normally progress from a User Interface (UI) machine, through a Resource Broker (RB) to a Computing Element (CE) and hence to the compute resource (usually a batch system). In some cases the RB is not used and the UI submits the job directly to the CE. Data access is through a Storage Element (SE) service • Q: Should we distinguish between Incidents affecting Grid applications and processes and those affecting the underlying infrastructure? • Who will handle each of them?
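The LCG job submission path quoted above can be written down as a small data model, which may help when asking which components an incident at a given hop could reach. This is a minimal sketch only: the names `Component`, `submission_path` and `components_downstream` are hypothetical and do not come from any LCG or EGEE software.

```python
from enum import Enum
from typing import List


class Component(Enum):
    """Grid components named in the LCG job submission description."""
    UI = "User Interface"
    RB = "Resource Broker"
    CE = "Computing Element"
    BATCH = "Batch system"
    SE = "Storage Element"


def submission_path(use_broker: bool = True) -> List[Component]:
    """Return the hops a job traverses: UI -> (RB) -> CE -> batch system."""
    path = [Component.UI]
    if use_broker:
        # In some cases the RB is skipped and the UI submits directly to the CE.
        path.append(Component.RB)
    path += [Component.CE, Component.BATCH]
    return path


def components_downstream(of: Component, use_broker: bool = True) -> List[Component]:
    """Hypothetical helper: hops that follow `of` on the submission path.

    The SE is reached for data access and is not a hop on the path,
    so it has no downstream components in this simplified model.
    """
    path = submission_path(use_broker)
    if of not in path:
        return []
    return path[path.index(of) + 1:]
```

For example, `components_downstream(Component.RB)` lists the CE and the batch system, one possible starting point for scoping an incident that begins at the broker.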

Grid risks and threats analysis • LCG Risk Analysis is a good starting point • http://proj-lcg-security.web.cern.ch/proj-lcg-security/RiskAnalysis/risk.html • Classified by Misuse, Confidentiality and Data integrity, Infrastructure disruption and Accidental categories • Known analyses of the nature of Grid Security Incidents mostly focus on AuthN/Z vulnerabilities and Certificate compromise • E.g., Dane Skow’s “A walk through a Grid Security Incident” • However, a question remains: • How to determine at an early stage that a PKC has been compromised?

Incident response Incident response includes three major groups of actions/services • Incident Triage • Assessing and verifying incoming Incident Reports (IR) • Incident Coordination • Categorising Incident information, forwarding IRs and arranging interaction with other CSIRTs, ISPs and sites • Incident Resolution • Helping a local site (victim) to recover from an incident - in most cases offered as an optional service.
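The three groups of actions above can be sketched as three functions operating on one report record. This is an illustration under stated assumptions, not a description of any existing CSIRT tool: the `IncidentReport` fields, the `verify` callback and the returned strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class IncidentReport:
    """Hypothetical record for an incoming Incident Report (IR)."""
    reporter: str
    site: str
    description: str
    verified: bool = False
    category: Optional[str] = None
    forwarded_to: List[str] = field(default_factory=list)


def triage(report: IncidentReport,
           verify: Callable[[IncidentReport], bool]) -> bool:
    """Incident Triage: assess and verify the incoming report."""
    report.verified = verify(report)
    return report.verified


def coordinate(report: IncidentReport, category: str,
               contacts: List[str]) -> None:
    """Incident Coordination: categorise a verified report and forward it."""
    if not report.verified:
        raise ValueError("report must pass triage before coordination")
    report.category = category
    report.forwarded_to.extend(contacts)


def resolve(report: IncidentReport, site_requested_help: bool) -> str:
    """Incident Resolution: optional recovery help for the victim site."""
    if site_requested_help:
        return f"assist {report.site} with recovery"
    return "no resolution service requested"
```

Keeping resolution behind an explicit flag mirrors the note that it is in most cases an optional service.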

Incident Response and Intrusion Detection • Intrusion Detection (ID) is normally a component of the network infrastructure/services • Intrusion Detection Systems (IDS) or Sensors are installed on or close to Firewalls, Routers and Switches, or run as a special program on logfiles • ID produces alerts to prevent suspected activity from escalating into an Incident • ID is a rather proactive service • Incident Response is a complex of designated people, policies and procedures • Incident Response is a reactive function • Q: Do we need to tackle Intrusion Detection in JRA3? • ID/Network protection is the responsibility of the Network Operator or Team • May be outsourced to a network provider or hosting organisation • A CSIRT often has an influence on network security policy and IDS policy/criteria

Incident Response Infrastructure/Components • CSIRTs • Organisational form depends on type of organisation and required level of support to community • Security Policy • Defines what is required/allowed/acceptable • Incident Response Policy • What is provided, who receives it and who provides support • Incident Response Plan • Which incidents will be responded to and how • RFC 2350 defines a template for Incident Response Policy

Types of CSIRTs • Security Group • Not formally a CSIRT but may be a first step to create a CSIRT • Distributed (Internal) CSIRT • Has a well-defined constituency, a central office and (minimum) designated staff • Most staff share responsibility or are on duty • Maintains common Security and Incident Response policy • Publishes Advisories, Warnings, Reports, Recommendations • Coordinating CSIRT • Coordinates a wide range of Incident Response activities • Creates and maintains common Security and Incident Response policy • Publishes Advisories, Warnings, Reports, Recommendations

Incident Response Policy • Types of Incidents and Level of Support • List of Incident categories ordered by severity • Co-operation, Interaction and Disclosure of Information • Based on organisation’s Security Policy • Availability of information, and an ordered list of information considered for release, both personal and vendor information • Communication and Authentication • Information protection during communication • Mutual authentication between communicating parties • Also depending on information category
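A severity-ordered list of Incident categories, as called for above, could be kept as a simple table. The sketch below is illustrative only: the three severity levels and the ranking of each category are assumptions made for the example, and the category names are borrowed from the incident examples given earlier in this presentation plus one hypothetical "policy violation" entry.

```python
from enum import IntEnum
from typing import Dict, List


class Severity(IntEnum):
    """Hypothetical three-level severity scale (not from any EGEE policy)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumed ranking, for illustration only.
CATEGORIES: Dict[str, Severity] = {
    "denial-of-service attack": Severity.MEDIUM,
    "computer intrusion": Severity.HIGH,
    "information theft": Severity.HIGH,
    "policy violation": Severity.LOW,
}


def by_severity(categories: Dict[str, Severity]) -> List[str]:
    """Return category names, most severe first; ties sorted by name."""
    return sorted(categories, key=lambda name: (-categories[name], name))
```

A real policy would attach a level of support to each entry; that mapping is left out here because the slide does not define one.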

Incident Response Procedures Should be documented in full or in critical parts • Initial Incident Reporting and Assessment • Progress Recording • Identification and Analysis • Notification – initial and in progress • Escalation – by Incident type or service level • Containment • Evidence collection • Removal and Recovery
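The procedure list above can be tracked as a checklist per incident. The sketch below treats "Progress Recording" as the log itself and the remaining items as stages to be ticked off. It is a minimal illustration: `ProgressLog` and its methods are hypothetical names, and no particular ordering of stages is enforced because the slide does not prescribe one.

```python
from typing import List, Tuple

# Stage names taken from the procedure list; "Progress Recording"
# is represented by the log rather than by a stage of its own.
STAGES: List[str] = [
    "Initial Incident Reporting and Assessment",
    "Identification and Analysis",
    "Notification",
    "Escalation",
    "Containment",
    "Evidence collection",
    "Removal and Recovery",
]


class ProgressLog:
    """Hypothetical per-incident record of completed procedure stages."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, str]] = []  # (stage, free-text note)

    def record(self, stage: str, note: str) -> None:
        """Append a note for a known stage; unknown stages are rejected."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.entries.append((stage, note))

    def pending(self) -> List[str]:
        """Stages with no entry yet, in the order listed on the slide."""
        done = {stage for stage, _ in self.entries}
        return [s for s in STAGES if s not in done]
```

Calling `pending()` during handling gives a quick view of which documented steps still have no record.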

Incident Response in EGEE • Actual Incident Response will be done at GOC • By Security Groups or Internal/External CSIRTs • Incident Coordination for EGEE • Coordinating Central or Distributed CSIRT servicing EGEE infrastructure • To start this activity • Inventory and Taxonomy • Contacting GOC/sites and building awareness • Training and Education • First CSIRT Training workshop at 2nd EGEE (or even around GGF12?) • Establishing central EGEE coordinating CSIRT • Staffing • Defining policies and procedures, formats and forms • Promoting and building network of contacts

What do we have? LCG documents for sites – good starting point and initial framework • Organisation of security on LCG-1 • To implement the LCG-1 security procedures and to respond to security incidents, each LCG-1 Regional Centre and each LCG-1 site must designate a security officer • Rem: Needs to be structured according to common CSIRT practices • LCG Security Policy specifies (not in detail) • Physical Security • Network Security • Access Control • Rem: Refers to site Policies but are they defined?

Standards and Practices • Incident Response and Incident Handling • Standards and Recommendations on Incident Response procedures and CSIRT operation • IETF, NIST, TI/TF-CSIRT (TERENA), CERT/CC • Formats and Protocols • IDMEF – Intrusion Detection Message Exchange Format • IODEF – Incident Object Description and Exchange Format • Emerging RID – Real-time Internetwork Defense (supported by US AFC) • Trace Security Incidents to the Source • Stop or Mitigate the Effects of an Attack or Security Incident • CSIRT community and CSIRT certification • Important component of creating world-wide Incident Response infrastructure
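To give a feel for what an IODEF-style exchange document looks like, the sketch below builds a very small XML incident report with the Python standard library. It rests on several assumptions: the element and attribute names (`IODEF-Document`, `Incident`, `IncidentID`, `ReportTime`, `Description`) follow the IODEF data model as later published in RFC 5070, the XML namespace and other mandatory elements such as the assessment and contact information are omitted, and the result is not validated against any schema. The function name `build_report` and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_report(incident_id: str, csirt_name: str,
                 report_time: str, description: str) -> str:
    """Build a simplified, non-validated IODEF-like document as a string."""
    doc = ET.Element("IODEF-Document", {"version": "1.00"})
    # One Incident element per reported incident.
    incident = ET.SubElement(doc, "Incident", {"purpose": "reporting"})
    # The identifier is scoped by the name of the CSIRT that assigned it.
    iid = ET.SubElement(incident, "IncidentID", {"name": csirt_name})
    iid.text = incident_id
    ET.SubElement(incident, "ReportTime").text = report_time
    ET.SubElement(incident, "Description").text = description
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(build_report("189493", "csirt.example.org",
                       "2004-06-16T09:00:00+00:00",
                       "Suspected certificate compromise"))
```

A common exchange format of this kind is what lets a coordinating CSIRT forward reports between sites without re-keying them.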

Tools • Intrusion Detection automation • Snort with IDMEF support (by Silicon Defense) • Benefits in simple integration, information exchange and easy outsourcing • Implemented also by CERT/CC in their AirCERT distributed System • Incident Handling • Mostly proprietary systems with growing move to standardisation of exchange format based on IODEF • IODEF Pilot implementation • CERT/CC AirCERT Automated Incident Reporting - http://www.cert.org/kb/aircert/ and http://aircert.sourceforge.net/ • JPCERT/CC: Internet Scan Data Acquisition System (ISDAS) - http://www.jpcert.or.jp/isdas/index-en.html • eCSIRT.net: The European CSIRT Network - http://www.ecsirt.net

Summary – next steps • Inventory and Taxonomy • Contact with GOC/ROC • Decide on organisational structure for EGEE Incident Response Capability/Infrastructure • Prepare 1st CSIRT Workshop
