Email Archiving

Arvind Srinivasan

Gaurav Baone

Imagine this is what happens to your business records at the end of every month …

  • SEC 17a-4
  • FDA 21 CFR Part 11
  • NASD 3010, 3110
  • DoD 5015.2
  • HIPAA
  • Sarbanes-Oxley
If this looks absurd … that’s exactly what we do to email!

Practically every major transaction, project, and contract is recorded in email

Regulators now treat email like hard-copy records

And the courts agree (FRCP amendments, Dec 2006)

Non-compliance fines and legal liabilities are rising …

ZipLip, Inc.

Just How Much Scalability Does Archiving Require?


[Figure: employees averaging 70 mails/day, multiplied over the years of retention, yield billions of emails for the archive system to index and search, compared with the billions of web pages indexed by Google (source: Google Press Release, Feb 17, 2004)]

Functionality needs to scale to these volumes
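The arithmetic behind this slide is simple multiplication, sketched below in Python. The transcript lost the slide's employee count and retention period, so the 50,000 employees and 7-year retention used here are hypothetical. Only the 70 mails/day rate comes from the slide, and counting all 365 days per year is an assumption.

```python
# Hypothetical sizing sketch: the slide's employee count and retention
# period were lost, so the inputs below are illustrative assumptions.


def archive_volume(employees: int, mails_per_day: int, years: int,
                   days_per_year: int = 365) -> int:
    """Total messages an archive must index if every mail is retained.

    days_per_year defaults to 365 (an assumption); pass 250 to count
    business days only.
    """
    return employees * mails_per_day * days_per_year * years


if __name__ == "__main__":
    # Assumed: 50,000 employees, 70 mails/day (from the slide), 7-year retention.
    total = archive_volume(50_000, 70, 7)
    print(f"{total:,} messages (~{total / 1e9:.1f} billion)")
```

Under these assumptions the archive must index roughly 8.9 billion messages. Passing `days_per_year=250` (business days only) gives a smaller total, but it is still in the billions.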

  • Email Capture Methods
  • Business Drivers
  • Archive Functionality
  • Retention & Deletion
  • Surveillance & Compliance
  • E-Discovery
  • Conclusion
Email Capture Methods
  • Active Capture Methods – PRO-ACTIVE Archiving
    • Journaling
    • Mailbox crawling
    • SMTP Gateway Capture
  • Historical Capture Methods – REACTIVE Archiving
    • Restore from backup tapes
    • Crawl for PST / NSF files from desktops
    • Forensic captures
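The desktop crawl for PST / NSF files can be sketched as a directory walk. This minimal Python version only locates candidate files by extension (`.pst` for Outlook, `.nsf` for Lotus Notes). The function name `find_mail_stores` is hypothetical, and parsing these proprietary formats would need separate tooling that is not shown here.

```python
import os
from pathlib import Path
from typing import Iterator

# Extensions treated as local mail stores (assumption: Outlook .pst and
# Lotus Notes .nsf, matched case-insensitively).
MAIL_STORE_EXTENSIONS = {".pst", ".nsf"}


def find_mail_stores(root: str) -> Iterator[Path]:
    """Walk a directory tree and yield the paths of PST/NSF files to capture."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if Path(name).suffix.lower() in MAIL_STORE_EXTENSIONS:
                yield Path(dirpath) / name
```

For example, `for path in find_mail_stores("C:/Users"): print(path)` lists every candidate store under a user-profile tree.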
Primary Business Drivers - Regulations and Laws

  • SEC 17a-4
  • NASD 3010
  • Gramm-Leach-Bliley Act
  • Hedge Funds Rule 203(b)
  • Basel II
  • CA SB1386
  • Sarbanes-Oxley Act
  • Mutual Funds Rule 38a-1
  • NASD 3011
  • Investment Advisers Act
  • UK Freedom of Information Act
  • US Freedom of Information Act
  • Florida Sunshine Law
  • Japan Personal Information Protection Act

Functional Requirements
  • Retention
  • Surveillance and Compliance
  • E-Discovery
  • Common Theme: Classification
Retention & Deletion

Conflicting Requirements:

  • Laws & Regulations => Retain for “x” years
  • Vs.
  • Company Liability/Risk and Cost
  • Real-time Categorization of Mail
    • Sender/Recipients
    • Content (Subject, body, attachment)
    • User Input (Folder in which it was found, Manual Tagging)
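Real-time categorization can be sketched as an ordered list of transparent rules over the three inputs the slide names. Everything specific below is a hypothetical assumption: the retention classes and their periods, the `legal.example.com` domain, the keywords, and the folder names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical retention classes and periods in years; real values come
# from the applicable regulations and the company's own retention policy.
RETENTION_YEARS = {"legal": 10, "regulated": 7, "default": 3, "transient": 0}


@dataclass
class Mail:
    sender: str
    recipients: List[str]
    subject: str
    body: str
    attachment_text: str = ""        # text extracted from attachments
    folder: str = "Inbox"            # folder the message was found in
    user_tag: Optional[str] = None   # manual tag applied by the end user


def categorize(mail: Mail) -> str:
    """Assign a retention class; rules are checked in order, first match wins."""
    # 1. User input: an explicit manual tag overrides everything else.
    if mail.user_tag in RETENTION_YEARS:
        return mail.user_tag
    # 2. Sender/recipients: mail involving a (hypothetical) legal domain.
    parties = [mail.sender, *mail.recipients]
    if any(addr.lower().endswith("@legal.example.com") for addr in parties):
        return "legal"
    # 3. Content: subject, body and attachment text (hypothetical keywords).
    text = f"{mail.subject} {mail.body} {mail.attachment_text}".lower()
    if any(kw in text for kw in ("trade confirmation", "order ticket")):
        return "regulated"
    # 4. User input again: the folder the message was filed in.
    if mail.folder.lower() in {"junk", "deleted items"}:
        return "transient"
    return "default"
```

Putting the manual tag first lets end users override automated decisions. The order of the rules is itself a policy choice.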
Retention & Deletion (cont’d)
  • "A priori" and "a posteriori" based Retention
  • Event Driven – Deletion of mail from a user folder, reclassification of mail by the end user
  • Legal Hold – Court orders to retain evidence relating to certain subject matters
  • Single Instance Storage
    • Same email in multiple mailboxes
    • Same attachment in multiple emails
    • Significant storage savings
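Single instance storage is usually implemented as content-addressed storage: hash each payload, store it only once, and keep references from every mailbox that contains it. The sketch below uses SHA-256 and assumes byte-identical payloads. In practice, copies of one email in different mailboxes often differ in per-recipient headers, so a real system must decide which parts (for example, body and attachments) to hash. The class name `SingleInstanceStore` is hypothetical.

```python
import hashlib
from typing import Dict, List


class SingleInstanceStore:
    """Content-addressed store: identical payloads are kept only once.

    Each payload (a whole message or a single attachment) is keyed by its
    SHA-256 digest; mailboxes hold references rather than copies.
    """

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}      # digest -> stored bytes
        self.refs: Dict[str, List[str]] = {}   # digest -> referencing mailboxes

    def put(self, mailbox: str, payload: bytes) -> str:
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self.blobs:           # first copy: store the bytes
            self.blobs[digest] = payload
        self.refs.setdefault(digest, []).append(mailbox)
        return digest

    def stored_bytes(self) -> int:
        """Bytes actually kept on disk."""
        return sum(len(b) for b in self.blobs.values())

    def logical_bytes(self) -> int:
        """Bytes the archive would occupy without single instancing."""
        return sum(len(self.blobs[d]) * len(r) for d, r in self.refs.items())
```

Storing the same 1,000-byte attachment for three mailboxes occupies 1,000 bytes instead of 3,000, which is where the storage savings come from.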

Surveillance

Conflicting Requirements:

  • Regulations require review of documents
  • Vs.
  • Effort spent reviewing the documents
  • Real-time Flagging of Mail
    • Lexical Based – Keywords, word associations, wildcards
    • Policy Based – e.g. mail whose sender is a newsletter
    • Custom Code – Detect vacation responses, read receipts, DSNs (delivery status notifications)
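Two of the three kinds of real-time flagging rules, lexical and custom code, can be sketched in Python as follows. The rule names and patterns in `LEXICAL_RULES` are hypothetical, and wildcards use shell-style matching from the standard `fnmatch` module. The custom-code check relies on standard header conventions: the `Auto-Submitted` header from RFC 3834, and `multipart/report` with `report-type=delivery-status` or `disposition-notification` for DSNs and read receipts. Many real autoresponders do not set these headers, so production rules would need further heuristics.

```python
import email
import re
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical lexical rules. A pattern containing a space is a phrase
# (word association); otherwise it is a keyword that may use * or ? wildcards.
LEXICAL_RULES: Dict[str, List[str]] = {
    "guarantee": ["guarantee*", "risk-free"],
    "secrecy": ["delete this", "off the record"],
}


def lexical_flags(text: str) -> List[str]:
    """Return the names of the lexical rules that fire on the text."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    padded = f" {' '.join(tokens)} "  # pad so phrases match on word boundaries
    fired = []
    for rule, patterns in LEXICAL_RULES.items():
        for pat in patterns:
            if " " in pat:
                hit = f" {pat} " in padded
            else:
                hit = any(fnmatchcase(tok, pat) for tok in tokens)
            if hit:
                fired.append(rule)
                break  # one matching pattern is enough to fire the rule
    return fired


def is_automated(raw: str) -> bool:
    """Custom-code rule: detect vacation replies, DSNs and read receipts."""
    msg = email.message_from_string(raw)
    # RFC 3834: automatic responses carry Auto-Submitted with a value other than "no".
    if str(msg.get("Auto-Submitted", "no")).strip().lower() != "no":
        return True
    # DSNs and read receipts (MDNs) are multipart/report messages.
    if msg.get_content_type() == "multipart/report":
        report_type = str(msg.get_param("report-type", "")).lower()
        return report_type in {"delivery-status", "disposition-notification"}
    return False
```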
Surveillance (cont’d)
  • Real-time flagging is a categorization problem
  • Current systems suffer from many false positives
  • Transparent, deterministic rules are preferred over black boxes
  • Disclaimers (internal and external) tend to get flagged because they contain the very terms we try to flag
  • Use reviewer feedback to adapt the rules
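One transparent way to use reviewer feedback is to measure each rule's precision (the share of its flags that reviewers confirm) and send low-precision rules back for retuning, rather than letting a black box adjust itself. The function names and thresholds in this sketch are hypothetical. A rule that fires on terms appearing in a standard disclaimer would typically show up here with very low precision.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Feedback = Iterable[Tuple[str, bool]]  # (rule name, reviewer confirmed the flag?)


def rule_precision(feedback: Feedback) -> Dict[str, float]:
    """Share of each rule's flags that reviewers confirmed as relevant."""
    hits: Counter = Counter()
    confirmed: Counter = Counter()
    for rule, is_true_positive in feedback:
        hits[rule] += 1
        if is_true_positive:
            confirmed[rule] += 1
    return {rule: confirmed[rule] / n for rule, n in hits.items()}


def rules_to_retune(feedback: Feedback, min_precision: float = 0.2,
                    min_hits: int = 20) -> Set[str]:
    """Rules with enough reviewed flags whose precision falls below the bar."""
    feedback = list(feedback)  # allow single-pass iterables
    hits = Counter(rule for rule, _ in feedback)
    precision = rule_precision(feedback)
    return {rule for rule, n in hits.items()
            if n >= min_hits and precision[rule] < min_precision}
```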
E-Discovery

Conflicting Requirements:

  • Produce electronic documents to satisfy court orders
  • Vs.
  • Providing insufficient, irrelevant, or privileged information
  • Discovery Request
    • A certain number of custodians
    • A date range
    • Pertaining to certain subject matter, usually described by a set of search terms

† Source: Williams v. Taser Int’l, Inc., 2007 WL 1630875 (N.D. Ga. June 4, 2007)
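The three dimensions of a discovery request map naturally onto a filter over the archive. This minimal sketch treats the negotiated search terms as case-insensitive substrings, and any one matching term makes a message responsive. The class names and this matching rule are assumptions; real requests often combine terms with Boolean and proximity operators.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass
class ArchivedMail:
    custodian: str
    sent: date
    subject: str
    body: str


@dataclass
class DiscoveryRequest:
    custodians: Set[str]
    start: date        # inclusive
    end: date          # inclusive
    terms: List[str]   # negotiated search terms; any one may match


def responsive(mails: Iterable[ArchivedMail], req: DiscoveryRequest) -> List[ArchivedMail]:
    """Messages matching the request's custodians, date range and search terms."""
    terms = [t.lower() for t in req.terms]
    result = []
    for m in mails:
        if m.custodian not in req.custodians:
            continue
        if not (req.start <= m.sent <= req.end):
            continue
        text = f"{m.subject}\n{m.body}".lower()
        if any(t in text for t in terms):
            result.append(m)
    return result
```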

E-Discovery (cont’d)
  • Landmark case: Zubulake v. UBS Warburg (2003)
  • Primarily driven by the amendments to the Federal Rules of Civil Procedure (FRCP) that took effect in December 2006
  • Litigants are entitled to obtain electronic information from the adverse party
  • Each litigant must make initial disclosures voluntarily, without awaiting a discovery request
  • Today, almost all cases involve some electronic documents as evidence
E-Discovery (cont’d)
  • Parties face sanctions if they do not produce all relevant documents (numerous precedents, e.g. Metrokane v. Built NY, 2008). Non-compliance is exposed when the receiving party can prove the existence of other documents through a hard-copy printout or other means.
  • Lawyers from both parties routinely negotiate keywords to define search concepts
  • Manual review of documents for relevance and privilege. Numerous products cluster similar documents (near-deduplication) so that reviewers see them together, improving efficiency.
  • Chain of Custody – Proving that a document has not been tampered with or altered
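Near-deduplication is commonly approached by comparing word shingles with Jaccard similarity. The sketch below uses that technique with hypothetical parameters (3-word shingles, a 0.7 threshold) and a simple greedy clustering pass. Because it compares each document with every existing cluster, large review sets would need an approximation such as MinHash.

```python
import re
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Set of overlapping k-word sequences in the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}  # short texts become a single shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(docs: List[str], threshold: float = 0.7) -> List[List[int]]:
    """Greedy pass: each document joins the first cluster whose representative
    (its first member) is similar enough; otherwise it starts a new cluster."""
    sigs = [shingles(d) for d in docs]
    clusters: List[List[int]] = []
    for i, sig in enumerate(sigs):
        for cluster in clusters:
            if jaccard(sig, sigs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```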
Palin’s e-mail at $15m per request
  • NBC's price quote for e-mails sent to Todd Palin: $15 million.
  • AP's price quote for e-mails between state employees and the campaign headquarters of Sen. John McCain: $15 million.
  • AP's price quote for e-mails between state employees and the National Park Service: $15 million.

  • Most challenges in archiving can be reduced to a classification problem
  • Segmentation Problems: Detect internal and external disclaimers
  • Detect changes in email behavior through email profile analysis
  • Understanding Mails: Develop analysis techniques to understand the contents
  • Visualization and Grouping of Similar Mails – Control the order in which mails and documents are viewed
  • A consistent way of defining subject matters – Beyond just a set of keywords
  • Extract more metadata about attachments such as images, audio and video files
  • All of the above are required in multiple languages – English, Japanese, Spanish, Chinese, and others
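As a starting point for profile analysis, a user's recent activity can be compared against that user's own baseline. This sketch assumes a single hypothetical feature, such as messages sent to external addresses per day, and flags days more than three standard deviations from the baseline mean. Real profiles would track many features, such as recipients, times of day, and attachment types. The function name `behavior_change` is hypothetical.

```python
from statistics import mean, stdev
from typing import List


def behavior_change(baseline: List[int], recent: List[int],
                    z_threshold: float = 3.0) -> List[int]:
    """Indices of recent days whose value deviates from the baseline profile.

    baseline needs at least two observations so a standard deviation exists.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Perfectly constant baseline: any change at all is a deviation.
        return [i for i, v in enumerate(recent) if v != mu]
    return [i for i, v in enumerate(recent) if abs(v - mu) / sigma > z_threshold]
```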