Email Archiving

Email Archiving. Arvind Srinivasan, Gaurav Baone. Imagine this is what happens to your business records at the end of every month … SEC 17a-4, FDA 21 CFR 11, NASD 3010 and 3110, DoD 5015.2, HIPAA, Sarbanes-Oxley. If this looks absurd … that's exactly what we do to email!


Presentation Transcript


  1. Email Archiving. Arvind Srinivasan, Gaurav Baone

  2. Imagine this is what happens to your business records at the end of every month …

  3. SEC 17a-4, FDA 21 CFR 11, NASD 3010 and 3110, DoD 5015.2, HIPAA, Sarbanes-Oxley. If this looks absurd … that's exactly what we do to email!
     • Practically every major transaction, project, and contract is recorded in email.
     • Regulators now treat email like hard-copy records.
     • And the courts agree (FRCP, Dec 2006).
     • Non-compliance fines and legal liabilities are rising.
     ZipLip, Inc.

  4. Just How Much Scalability Does Archiving Require?
     • Assume: 25,000 employees averaging 70 mails/day, with 7 years' retention.
     • That is 4.47 billion emails for the archive system to index and search,
     • versus 4.28 billion web pages indexed by Google (source: Google press release, Feb 17, 2004).
     • Functionality needs to scale to these volumes.
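The 4.47 billion figure on this slide can be reproduced with simple arithmetic. The sketch below assumes mail arrives on all 365 days of the year, which is the assumption that matches the slide's number; the function and parameter names are illustrative only.

```python
def archived_email_count(employees, mails_per_day, retention_years, days_per_year=365):
    """Total messages held in the archive at steady state.

    days_per_year=365 is an assumption; the slide does not state it,
    but it is the value that reproduces the quoted 4.47 billion.
    """
    return employees * mails_per_day * days_per_year * retention_years


total = archived_email_count(employees=25_000, mails_per_day=70, retention_years=7)
print(f"{total:,} emails")  # 4,471,250,000 emails
```

Counting only business days would give a noticeably smaller total, so the headline figure is best read as an upper-end estimate.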

  5. Outline
     • Email Capture Methods
     • Business Drivers
     • Archive Functionality
     • Retention & Deletion
     • Surveillance & Compliance
     • E-Discovery
     • Conclusion

  6. Email Capture Methods
     • Active Capture Methods – PRO-ACTIVE Archiving
        • Journaling
        • Mailbox crawling
        • SMTP Gateway Capture
     • Historical Capture Methods – REACTIVE Archiving
        • Restore from backup tapes
        • Crawl for PST / NSF files from desktops
        • Forensic captures

  7. Journaling – 100% Capture

  8. Mailbox Crawling – Policy Based

  9. Reactive Archiving

  10. Not Just Email

  11. Primary Business Drivers - Regulations and Laws
     SEC 17a-4; NASD 3010; Gramm-Leach-Bliley Act; HIPAA; Hedge Funds Rule 203(b); Basel II; CA SB1386; Sarbanes-Oxley Act; Mutual Funds Rule 38a-1; NASD 3011; Investment Advisors Act; UK Freedom of Information Act; US Freedom of Information Act; Canada PIPEDA; Florida Sunshine Law; FRCP; Japan Personal Information Protection Act; DoD 5015.2

  12. Functional Requirements
     • Retention
     • Surveillance and Compliance
     • E-Discovery
     • Common Theme - Classification

  13. Retention & Deletion
     Conflicting Requirements:
     • Laws & Regulations => Retain for "x" years
     • vs.
     • Company Liability/Risk and Cost
     Real-time Categorization of Mail, based on:
     • Sender/Recipients
     • Content (subject, body, attachment)
     • User Input (which folder the mail was found in, manual tagging)
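A minimal sketch of how the three categorization inputs on this slide (sender/recipients, content, user input) might feed a retention decision. Every rule, address, folder name, and retention period below is hypothetical and chosen only for illustration; the slides do not specify any of them.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    sender: str
    recipients: list
    subject: str
    body: str
    folder: str = "Inbox"                    # user input: where the mail was filed
    tags: set = field(default_factory=set)   # user input: manual tags


def retention_years(mail):
    """Return a retention period in years (all values are made-up examples)."""
    # 1. User input takes precedence in this sketch.
    if "legal" in mail.tags or mail.folder.lower() == "contracts":
        return 7
    # 2. Sender-based rule: bulk newsletters are kept only briefly.
    if mail.sender.lower().endswith("@newsletter.example.com"):
        return 1
    # 3. Content-based rule: look for business-record vocabulary.
    text = f"{mail.subject} {mail.body}".lower()
    if "invoice" in text or "contract" in text:
        return 7
    # 4. Default period for everything else.
    return 3


digest = Mail("news@newsletter.example.com", ["a@example.com"], "Weekly digest", "Top stories")
deal = Mail("bob@example.com", ["a@example.com"], "Signed contract", "See attached.")
print(retention_years(digest), retention_years(deal))
```

Putting the rules in a fixed order keeps the outcome deterministic, which matters when a retention decision later has to be explained to a regulator or a court.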

  14. Retention & Deletion (cont'd)
     • "A priori" and "a posteriori" retention.
     • Event Driven – Deletion of mail from a user folder, reclassification of mail by the end user.
     • Legal Hold – Court orders to retain evidence relating to certain subject matters.
     • Single Instance Storage
        • Same email in multiple mailboxes
        • Same attachment in multiple emails
        • Significant storage savings
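Single instance storage can be sketched as a content-addressed store: identical bytes are kept once and referenced many times. The slides do not say how duplicates are detected; hashing the content with SHA-256 is an assumption made here, and the class and attribute names are hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Keeps one copy of each distinct payload, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # digest -> stored bytes
        self.refs = {}   # digest -> number of mailboxes/emails referencing it

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = content  # first time we see these bytes
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


store = SingleInstanceStore()
first = store.put(b"Q3 report attachment bytes")
second = store.put(b"Q3 report attachment bytes")  # same attachment, another email
print(first == second, len(store.blobs), store.refs[first])
```

The reference count is what would let a deletion policy remove the stored bytes only after the last referencing mail has expired.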

  15. Surveillance
     Conflicting Requirements:
     • Regulations require review of documents
     • vs.
     • Effort spent reviewing the documents
     Real-time Flagging of Mail:
     • Lexical Based – Keywords, word associations, wildcards
     • Policy Based – e.g. mail from WallStreetJournal.com is a newsletter
     • Custom Code – Detect vacation responses, read receipts, DSNs
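A minimal sketch of lexical flagging with keywords and wildcards. The wildcard semantics are an assumption: a `*` in a term is taken to match any run of word characters. The term list is a made-up example, not a real compliance lexicon.

```python
import re


def compile_term(term):
    """Turn a keyword such as 'guarant*' into a case-insensitive whole-word regex."""
    # Escape regex metacharacters, then turn the escaped '*' back into a wildcard.
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{pattern}\b", re.IGNORECASE)


def flag(text, terms):
    """Return the terms that occur in the text, in the order given."""
    return [term for term in terms if compile_term(term).search(text)]


TERMS = ["guarant*", "insider", "off the record"]  # hypothetical lexicon
hits = flag("We GUARANTEE a 20% return on this fund.", TERMS)
print(hits)  # ['guarant*']
```

Because each rule is a plain, inspectable pattern, a reviewer can see exactly why a message was flagged.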

  16. Surveillance (cont'd)
     • Real-time flagging is a categorization problem.
     • Current systems suffer from many false positives.
     • Transparent and deterministic rules are preferred over black boxes.
     • Disclaimers (internal and external) tend to get flagged because they contain the very terms that we try to flag.
     • Use reviewer feedback to adapt the rules.
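One way to reduce disclaimer-driven false positives is to remove known disclaimer text before applying the flagging rules. The sketch below assumes disclaimers are known in advance and appear verbatim, which is a strong simplification (detecting them in general is listed as an open problem in the conclusion). The disclaimer text and keyword are hypothetical.

```python
KNOWN_DISCLAIMERS = [
    # Hypothetical boilerplate; a real deployment would maintain its own list.
    "This email is confidential and may contain privileged information.",
]


def strip_disclaimers(body, disclaimers=KNOWN_DISCLAIMERS):
    """Remove exact occurrences of known disclaimer text from a message body."""
    for disclaimer in disclaimers:
        body = body.replace(disclaimer, "")
    return body.strip()


def flag(body, keywords):
    """Flag keywords found in the body after disclaimers are removed."""
    text = strip_disclaimers(body).lower()
    return [kw for kw in keywords if kw.lower() in text]


routine = ("Lunch at noon?\n\n"
           "This email is confidential and may contain privileged information.")
print(flag(routine, ["confidential"]))                      # []
print(flag("This deal is confidential.", ["confidential"]))  # ['confidential']
```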

  17. E-Discovery
     Conflicting Requirements:
     • Produce electronic documents to satisfy court orders
     • vs.
     • Providing insufficient, irrelevant, or privileged information
     Discovery Request:
     • Certain number of custodians
     • Date range
     • Pertaining to certain subject matter, usually described by a set of search terms
     Source: Williams v. Taser Int'l, Inc., 2007 WL 1630875 (N.D. Ga. June 4, 2007)
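The three parts of a discovery request (custodians, date range, search terms) map naturally onto a filter over the archive. This is a sketch under stated assumptions: the record fields, the class names, and the plain substring matching are all hypothetical, and a real system would query an index rather than scan messages.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedMail:
    custodian: str  # mailbox owner
    sent: date
    text: str       # subject + body, already extracted


@dataclass
class DiscoveryRequest:
    custodians: set
    start: date
    end: date
    search_terms: list

    def matches(self, mail):
        """True if the mail falls inside all three dimensions of the request."""
        if mail.custodian not in self.custodians:
            return False
        if not (self.start <= mail.sent <= self.end):
            return False
        text = mail.text.lower()
        return any(term.lower() in text for term in self.search_terms)


archive = [
    ArchivedMail("alice", date(2007, 3, 1), "Revised supply contract attached"),
    ArchivedMail("alice", date(2005, 3, 1), "Old supply contract"),
    ArchivedMail("carol", date(2007, 3, 2), "Supply contract question"),
]
request = DiscoveryRequest({"alice", "bob"}, date(2007, 1, 1), date(2007, 12, 31), ["contract"])
responsive = [m for m in archive if request.matches(m)]
print(len(responsive))  # 1
```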

  18. E-Discovery (cont'd)
     • Landmark case: Zubulake v. UBS Warburg (2003).
     • Primarily driven by the 2006 amendments to the Federal Rules of Civil Procedure (FRCP).
     • Litigants are entitled to obtain electronic information from the adverse party.
     • Voluntary initial disclosures need to be made pertaining to each litigant.
     • Today, almost all cases have some sort of electronic documents as evidence.

  19. E-Discovery (cont'd)
     • Parties face sanctions if they do not provide all the relevant documents (numerous precedents, e.g. Metrokane v. Built NY, 2008). Validation occurs when the receiving party can prove the existence of other documents through hard-copy printouts or other means.
     • Lawyers from both parties routinely negotiate keywords to define search concepts.
     • Manual review of documents for relevance and privilege. Numerous products cluster similar documents (near-deduplication) and present them together to reviewers to improve efficiency.
     • Chain of Custody – To prove that the document has not been tampered with or altered.
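Near-deduplication for review can be sketched by comparing documents on overlapping word sequences (shingles) and grouping those whose Jaccard similarity exceeds a threshold. The slides do not name a technique; shingling with Jaccard similarity is one common choice assumed here, and the shingle size, threshold, and function names are illustrative.

```python
def shingles(text, k=3):
    """Set of k-word sequences in the text (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(docs, threshold=0.5):
    """Greedy grouping: each document joins the first group whose
    representative (first member) is similar enough, else starts a new group."""
    signatures = [shingles(d) for d in docs]
    groups = []  # each group is a list of document indices
    for i, sig in enumerate(signatures):
        for group in groups:
            if jaccard(sig, signatures[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


docs = [
    "please review the attached contract before friday",
    "please review the attached contract before monday",
    "lunch menu for the cafeteria this week",
]
groups = group_similar(docs)
print(groups)  # [[0, 1], [2]]
```

Pairwise comparison like this is quadratic in the worst case; at the archive sizes discussed earlier, a production system would need an approximate scheme such as MinHash instead.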

  20. Palin's e-mail at $15m per request
     • NBC's price quote for e-mails sent to Todd Palin: $15 million.
     • AP's price quote for e-mails between state employees and the campaign headquarters of Sen. John McCain: $15 million.
     • AP's price quote for e-mails between state employees and the National Park Service: $15 million.

  21. Conclusion
     • Most challenges in archiving can be reduced to a classification problem.
     • Segmentation problems: detect internal and external disclaimers.
     • Detect changes in email behavior through email profile analysis.
     • Understanding mails: need to develop analysis techniques to understand the contents.
     • Visualization and grouping of similar mails – control the order in which mails and documents are viewed.
     • Consistent way of defining subject matters – beyond just a set of keywords.
     • Extract more metadata about attachments such as images, audio, and video files.
     • And all of the above are required in multiple languages – English, Japanese, Spanish, Chinese, and others.
