A Secure,Publisher-Centric Web Caching Infrastructure

April 19th, 2001

Selcuk Uluagac

Aravind Pavuluri


Outline

  • Dynamic Caching

  • Motivation & Gemini

  • Security Issues

  • Incremental Deployment

  • Design & Implementation

  • Performance

  • Conclusions & Discussion



Outline

  • Not Finished Yet!

  • Active Cache: Caching Dynamic Contents on the Web (Pei Cao et al.)

  • A Publishing System For Efficiently Creating Dynamic Data (Arun Iyengar et al.)


Dynamic Web Caching?

  • Content generated on every request

  • Generated by server-side programs (CGI, Perl, Java, VBScript, etc.)

  • Personalization and E-commerce transactions

  • Presently not cached


Gemini & Motivation

  • Drawbacks of Current Cache Infrastructure

    • Incapable of reporting access statistics

    • Not able to handle dynamic content

    • Loss of publisher control over the content

      • Not publisher-centric

  • Solution: Gemini


Key Elements of Gemini Architecture

  • Node (Cache)

  • Security Architecture

  • Incremental Deployment Strategy


  • Control Plane

    • Consistency control

    • Logging & Reporting

    • QoS

    • Access Control

  • Data Plane

    • Filtering

    • Versioning

    • Sandboxed VM


Security Issues

  • Why a new security approach?

    • Caches are active participants that generate content, so end-to-end security alone is not enough

    • The cache is responsible for reporting access logs

  • Design Goals

    • Protect the publisher as well as the cache

    • Publisher decides who to trust

    • Publishers/clients find out about attacks eventually

    • The system should be incrementally deployable


Security Background

  • RSA (Rivest, Shamir, Adleman)

    • Encryption

    • Public key / private key pair

  • Public Key Infrastructure (X.509)

  • Digital Signature

    • Verification

  • Certificate

  • Certificate Authority


A New Trust Model

  • Cache Authorization

    • Publishers explicitly specify which content a cache can generate

  • Cache Verification

    • Publishers and clients verify that authorized caches are performing correctly


Authorization & Content Generation Steps

  • PKI distributes keys to clients, caches, and publishers

  • Publisher’s certificate binds its web site to its public key K_P

    • Certificate: {P, K_P, Valid, Expires, CA}K_CA^-1

    • Notation: {X}K^-1 means X signed with the private key K^-1

  • Publisher lists the caches authorized to serve an object in an access control list (ACL)

    • ACL: {URL, K_1, K_2, ..., K_n, Valid, Expires, P}K_P^-1

  • Publisher gives the cache: ACL, {Headers, Body}K_P^-1

  • Uses the Pragma header field so that legacy caches are not confused

  • Cache generates the content using the Body

  • Cache sends the client:

    • ACL, {URL, Cache, Client, H(Request), CurrDate, Body}K_cache^-1
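The cache's signed reply can be sketched with the standard `java.security` API. This is a minimal illustration, not Gemini's wire format: SHA-256 for H(Request), SHA256withRSA for the signature, and newline-joined fields as the encoding are all assumptions, and `CacheReplySigner` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;
import java.util.Base64;

// Hypothetical sketch of the cache's reply
// {URL, Cache, Client, H(Request), CurrDate, Body} signed with K_cache^-1.
final class CacheReplySigner {

    // H(Request): assumed here to be SHA-256 over the raw request text.
    static byte[] hashRequest(String rawRequest) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(rawRequest.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed encoding: fields joined by newlines, the request hash in Base64.
    static byte[] encode(String url, String cache, String client,
                         byte[] requestHash, String currDate, String body) {
        String joined = String.join("\n", url, cache, client,
                Base64.getEncoder().encodeToString(requestHash), currDate, body);
        return joined.getBytes(StandardCharsets.UTF_8);
    }

    // Signs the encoded tuple with the cache's private key.
    static byte[] sign(byte[] tuple, PrivateKey cachePrivate) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(cachePrivate);
            s.update(tuple);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checks the signature with the cache's public key; a malformed signature is invalid.
    static boolean verify(byte[] tuple, byte[] signature, PublicKey cachePublic) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(cachePublic);
            s.update(tuple);
            return s.verify(signature);
        } catch (SignatureException e) {
            return false;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            return g.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the client identity and H(Request) sit inside the signed tuple, a reply signed for one client fails verification when presented as a reply for another. A real encoding would need unambiguous (for example length-prefixed) fields, since newline joining breaks if a field itself contains a newline.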


Authorization & Content Generation Steps (cont…)

  • Client is able to check the signature on the ACL and verify that the cache is authorized

    • Client verifies

      • Cache is in the ACL & cache signature is valid

  • Purpose of the cache signature

    • Tamper detection by the client

    • Identifies the cache that generated the content

    • Non-repudiation

  • Cache can perform access control on the content as the publisher requires (e.g., based on cookies)
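The client-side checks above can be sketched with `java.security` as follows. The ACL encoding, checking validity against the local clock, and the class `ClientVerifier` are hypothetical; obtaining the publisher's key from its CA-signed certificate is assumed to have happened already and is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical client-side checks before accepting content from a cache.
final class ClientVerifier {

    // ACL {URL, K_1..K_n, Valid, Expires, P}; the byte encoding below is an assumption.
    static final class Acl {
        final String url;
        final List<PublicKey> cacheKeys;
        final Instant valid, expires;
        final String publisher;

        Acl(String url, List<PublicKey> cacheKeys, Instant valid, Instant expires, String publisher) {
            this.url = url; this.cacheKeys = cacheKeys;
            this.valid = valid; this.expires = expires; this.publisher = publisher;
        }

        byte[] encode() {
            List<String> parts = new ArrayList<>();
            parts.add(url);
            for (PublicKey k : cacheKeys) parts.add(Base64.getEncoder().encodeToString(k.getEncoded()));
            parts.add(valid.toString());
            parts.add(expires.toString());
            parts.add(publisher);
            return String.join("\n", parts).getBytes(StandardCharsets.UTF_8);
        }
    }

    static byte[] sign(byte[] data, PrivateKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(data);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Any verification failure (bad key, malformed or wrong signature) counts as invalid.
    static boolean verify(byte[] data, byte[] sig, PublicKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(data);
            return s.verify(sig);
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            return g.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean accept(Acl acl, byte[] aclSig, PublicKey publisherKey,
                          PublicKey cacheKey, byte[] reply, byte[] replySig, Instant now) {
        if (!verify(acl.encode(), aclSig, publisherKey)) return false;           // ACL is the publisher's
        if (now.isBefore(acl.valid) || now.isAfter(acl.expires)) return false;  // ACL is in force
        boolean listed = acl.cacheKeys.stream()
                .anyMatch(k -> Arrays.equals(k.getEncoded(), cacheKey.getEncoded()));
        if (!listed) return false;                                               // cache is in the ACL
        return verify(reply, replySig, cacheKey);                                // cache signature is valid
    }
}
```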



  • Client sends feedback to the publisher about a misbehaving cache

  • Similarly, inconsistencies in cache log reporting can be detected

  • Publisher removes the cache from the ACL?

  • When to question cache responses?

    • Publisher-initiated (fake clients)

    • Client-initiated
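A publisher-initiated check with fake clients might look like the sketch below: the publisher replays probe requests through each authorized cache and keeps only caches whose replies match what its own server generates. This assumes the probed content is deterministic (no per-client personalization or timestamps in the body); `PublisherAudit` and its parameters are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical publisher-side audit using fake-client probes.
final class PublisherAudit {

    // Returns the caches that stay in the ACL after the audit.
    static Set<String> audit(Set<String> aclCaches,
                             List<String> probeRequests,
                             BiFunction<String, String, String> fetchViaCache, // (cache, request) -> body
                             Function<String, String> generateAtOrigin) {    // request -> body
        Set<String> kept = new LinkedHashSet<>();
        for (String cache : aclCaches) {
            boolean honest = true;
            for (String req : probeRequests) {
                String expected = generateAtOrigin.apply(req);
                String actual = fetchViaCache.apply(cache, req);
                if (!expected.equals(actual)) { // mismatch: treat the cache as misbehaving
                    honest = false;
                    break;
                }
            }
            if (honest) kept.add(cache);
        }
        return kept;
    }
}
```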


Protecting the Cache

  • Publishers may send malicious code to caches

  • To prevent this:

    • Publisher’s code runs inside a sandboxed JVM

    • Limited API exposed to publisher’s code

  • Resource restrictions using OS level controls to counter denial-of-service attacks
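Gemini relies on OS-level controls; a JVM-only approximation of one such control, a wall-clock budget for publisher code, is sketched below with `java.util.concurrent`. It is weaker than an OS limit: Java cannot forcibly stop a thread, so overrunning code is interrupted and abandoned, and only code that checks for interruption actually stops. `BoundedRunner` and the fallback value are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical time budget for publisher code; falls back (e.g. forward to the publisher) on failure.
final class BoundedRunner {

    static String runWithBudget(Callable<String> publisherCode, long millis, String fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor(task -> {
            Thread t = new Thread(task, "publisher-code");
            t.setDaemon(true); // abandoned threads must not keep the cache JVM alive
            return t;
        });
        Future<String> result = pool.submit(publisherCode);
        try {
            return result.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt; only cooperative code will stop
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // publisher code threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            pool.shutdownNow();
        }
    }
}
```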


Incremental Deployment Strategy…


  • Cache and document

  • Transparency to clients

  • Transparency to legacy caches

  • Proximity

(Figure: Leaf Cache)


Discovering Gemini Documents…

  • Publishers explicitly notify Gemini caches about documents that have associated Gemini documents.

  • Each notification contains

    • Server name

    • Pattern to match

    • Transformation

  • Notifications are piggybacked on HTTP responses

  • Caches store notifications as soft state
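A cache's handling of these notifications can be sketched as a small lookup table. Here the pattern is assumed to be a Java regular expression over the request path and the transformation a regex replacement string, with one entry per server; soft state is modeled as a time-to-live that each repeated notification refreshes. None of these representation choices come from the slides, and `GeminiLookupTable` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lookup table built from publisher notifications (soft state).
final class GeminiLookupTable {

    private static final class Entry {
        final Pattern pattern;
        final String transformation; // regex replacement, e.g. "/gemini/news/$1.gem"
        final long expiresAtMillis;

        Entry(Pattern pattern, String transformation, long expiresAtMillis) {
            this.pattern = pattern;
            this.transformation = transformation;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;

    GeminiLookupTable(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Called when a notification arrives on an HTTP response; a repeat refreshes the timer.
    void onNotification(String server, String regex, String transformation, long nowMillis) {
        entries.put(server, new Entry(Pattern.compile(regex), transformation, nowMillis + ttlMillis));
    }

    // Maps a regular document path to its Gemini document path, if an entry applies.
    Optional<String> translate(String server, String path, long nowMillis) {
        Entry e = entries.get(server);
        if (e == null) return Optional.empty();
        if (nowMillis >= e.expiresAtMillis) { // soft state: stale entries are simply forgotten
            entries.remove(server);
            return Optional.empty();
        }
        Matcher m = e.pattern.matcher(path);
        if (!m.matches()) return Optional.empty();
        return Optional.of(m.replaceFirst(e.transformation));
    }
}
```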


Leaf Discovery

  • Leaf Cache Gemini cache which translates a request for a regular document into a request for a Gemini document.

  • With security the leaf cache becomes the first cache that both has the proper lookup table entry and is authorized by the publisher
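Under this definition, leaf selection reduces to a scan of the caches on the request path, nearest to the client first. The sketch assumes each cache is identified by a name that also appears in the publisher's ACL; `LeafDiscovery` and `CacheNode` are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical leaf-cache selection along the request path.
final class LeafDiscovery {

    static final class CacheNode {
        final String name;
        final boolean gemini;            // legacy caches are not Gemini-capable
        final Set<String> knownServers;  // servers with a lookup-table entry

        CacheNode(String name, boolean gemini, Set<String> knownServers) {
            this.name = name;
            this.gemini = gemini;
            this.knownServers = knownServers;
        }
    }

    // First cache that both has the lookup entry and is authorized by the publisher.
    static Optional<String> leafCache(List<CacheNode> pathFromClient, String server,
                                      Set<String> aclForDocument) {
        for (CacheNode c : pathFromClient) {
            boolean hasEntry = c.gemini && c.knownServers.contains(server);
            boolean authorized = aclForDocument.contains(c.name);
            if (hasEntry && authorized) return Optional.of(c.name);
        }
        return Optional.empty(); // no leaf: the request continues to the publisher
    }
}
```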



  • Leverages thousands of legacy caches to help deliver Gemini documents

  • Computational burden is pushed as close to the edge of the network as possible.


Node Design & Implementation (cont…)

  • Platform: built on top of Squid

  • Runtime language: Java

    • Platform independent

    • Allows sandboxing

  • Partitioning of functionality

    • Squid Process

      • Lookup table

      • Fetch Gemini Documents

      • Forwarding Gemini requests

    • Gemini Process

      • JVM

      • Security


Node Operation

  • Squid front end receives the request from the client

  • Hands the request to the Gemini process via IPC

  • Gemini threads (Dispatcher, Checker, Worker) process the request

  • The output is signed by the worker thread and sent to the client

  • Request is logged
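The Dispatcher/Checker/Worker split can be sketched as a queue-connected thread pipeline. This is an illustration under assumptions, not Gemini's code: the calling thread plays the Dispatcher, the check and the generate-and-sign step are supplied as functions, and a sentinel value shuts the stages down. `GeminiNodePipeline` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical Dispatcher -> Checker -> Worker pipeline connected by blocking queues.
final class GeminiNodePipeline {
    private static final String POISON = "__STOP__"; // end-of-stream sentinel

    static List<String> process(List<String> requests,
                                Predicate<String> checker,         // e.g. authorization check
                                Function<String, String> worker) { // e.g. generate and sign
        BlockingQueue<String> toChecker = new LinkedBlockingQueue<>();
        BlockingQueue<String> toWorker = new LinkedBlockingQueue<>();
        List<String> log = Collections.synchronizedList(new ArrayList<>()); // request log

        Thread checkerThread = new Thread(() -> {
            try {
                String req;
                while (!(req = toChecker.take()).equals(POISON)) {
                    if (checker.test(req)) toWorker.put(req);
                    else log.add("rejected " + req);
                }
                toWorker.put(POISON); // pass shutdown downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "checker");

        Thread workerThread = new Thread(() -> {
            try {
                String req;
                while (!(req = toWorker.take()).equals(POISON)) {
                    log.add("served " + worker.apply(req));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker");

        checkerThread.start();
        workerThread.start();
        try {
            // Dispatcher role: hand each request from the front end to the Checker.
            for (String incoming : requests) toChecker.put(incoming);
            toChecker.put(POISON);
            checkerThread.join();
            workerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dispatching", e);
        }
        return new ArrayList<>(log);
    }
}
```

Because the Checker and Worker log concurrently, the relative order of their log entries is not fixed.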


Performance Evaluation

  • Response time is 5 to 15 times worse for non-active Gemini documents

  • Signing the reply accounts for 90% of processing time
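A quick Amdahl's-law bound, derived here from the slide's 90% figure rather than reported in the talk: if a fraction f = 0.9 of processing time goes to signing and signing is made s times faster, per-request processing can speed up by at most a factor of 10, however large s becomes. Network transfer time is ignored.

```latex
\text{speedup}(s) = \frac{1}{(1-f) + f/s} \;\le\; \frac{1}{1-f} = \frac{1}{1-0.9} = 10
```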


Conclusions & Discussion

  • Gemini addresses the security issues in dynamic web caching

  • Provides a node implementation

  • Provides a publisher-centric architecture

  • End-user performance?


A Publishing System For Efficiently Creating Dynamic Data

Arun Iyengar et al.

IBM Research

T.J. Watson Research Center


Problems with Dynamic Caching At A First Glance

  • Several Problems With Dynamic Data Generation

    • Expensive to create

    • Overhead

    • Consistent update (we already know this!)

    • More ???


Little Fragments…

  • Fragments

  • Objects

  • Atomic vs. Complex Object

  • Object Dependence Graph (ODG)

  • Dynamic Pages…

    • Embedded fragments are automatically updated

  • Atomic vs. Incremental Publication

    • Problems?

    • 3 proposed algorithms
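The ODG's role in keeping embedded fragments up to date can be sketched with a breadth-first traversal. Here an edge from u to v is assumed to mean that object v embeds u, so every object reachable from a changed fragment may be stale; `ObjectDependenceGraph` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical ODG: an edge u -> v means object v embeds u.
final class ObjectDependenceGraph {
    private final Map<String, List<String>> dependents = new HashMap<>();

    void addEmbedding(String fragment, String container) {
        dependents.computeIfAbsent(fragment, k -> new ArrayList<>()).add(container);
    }

    // Breadth-first traversal: every object transitively affected by the changed fragments.
    Set<String> affectedBy(Set<String> changed) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(changed);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : dependents.getOrDefault(u, List.of())) {
                if (seen.add(v)) queue.add(v); // visit each object once
            }
        }
        return seen;
    }
}
```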


Publishing Process

  • Immediate fragments

  • Quality-controlled fragments

  • Trigger Monitor is notified

  • Fetches new copies from the source

  • The ODG is updated

  • Graph traversal algorithms are applied

  • Bundles of web pages are written to the sink
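One way to realize the graph-traversal step is to regenerate affected objects in dependency order, so that each page is rebuilt only after the fragments it embeds, and then write the whole batch to the sink as one bundle. The sketch uses Kahn's topological sort; it is an assumption for illustration, not one of the paper's three publication algorithms, and `PublishOrder` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical publication ordering over the affected part of an ODG.
// edges: fragment -> objects that embed it.
final class PublishOrder {

    static List<String> order(Map<String, List<String>> edges, Set<String> affected) {
        // In-degree counted only over edges inside the affected set.
        Map<String, Integer> indegree = new TreeMap<>(); // sorted for a deterministic order
        for (String node : affected) indegree.put(node, 0);
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            if (!affected.contains(e.getKey())) continue;
            for (String v : e.getValue()) {
                if (affected.contains(v)) indegree.merge(v, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        // Kahn's algorithm: emit a node once everything it embeds has been emitted.
        List<String> bundle = new ArrayList<>();
        while (!ready.isEmpty()) {
            String u = ready.poll();
            bundle.add(u);
            for (String v : edges.getOrDefault(u, List.of())) {
                if (affected.contains(v) && indegree.merge(v, -1, Integer::sum) == 0) ready.add(v);
            }
        }
        if (bundle.size() != affected.size()) throw new IllegalArgumentException("cycle in ODG");
        return bundle; // would be written to the sink as one batch
    }
}
```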


Sample Screen



  • Deployed on the 2000 Olympic Games web site



  • Easier to design web sites

    • Users specify and modify relationships among web pages & fragments

  • Performance improvement

  • Incremental publication

    • Faster with 3 algorithms



Active Cache: Caching Dynamic Contents on the Web

April 19th, 2001

Selcuk Uluagac

Aravind Pavuluri

Motivation and Active Cache

  • Dynamic documents constitute an increasing percentage of content on the web

  • Affects the scalability of the web

  • No current approaches to caching dynamic content

  • Solution: Active Cache…..


Brief Overview

  • Migrates parts of server processing on each user request to the caching proxy via “cache applets”

  • A cache applet is server-supplied code attached to a URL

  • On a user request the proxy invokes the cache applet

  • Cache applets allow servers to obtain the benefit of proxy caching without losing the capability to track user accesses and tailor the content presentation


The Active Cache Protocol

  • Web server specifies the association between a cache applet and a URL-named document by sending a new entity header, “CacheApplet”, with the document

    • CacheApplet: code="code.class", archive="code.jar", codebase="codebase.url"

    • For security reasons, the applet’s codebase must be on the same server as the document.
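A proxy's handling of this header might look like the sketch below: parse the attribute list, then enforce the same-server rule on the codebase. Treating "same server" as the same scheme, host, and port, and resolving a relative codebase against the document URL, are assumptions; `CacheAppletHeader` is hypothetical.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical proxy-side handling of the CacheApplet entity header.
final class CacheAppletHeader {
    private static final Pattern ATTR = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    // Parses a header value such as: code="...", archive="...", codebase="..."
    static Map<String, String> parse(String headerValue) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(headerValue);
        while (m.find()) attrs.put(m.group(1), m.group(2));
        return attrs;
    }

    // Same-server rule: codebase must share scheme, host and port with the document.
    static boolean sameServer(String documentUrl, String codebaseUrl) {
        URI doc = URI.create(documentUrl);
        URI code = doc.resolve(codebaseUrl); // a relative codebase resolves against the document
        if (doc.getScheme() == null || doc.getHost() == null
                || code.getScheme() == null || code.getHost() == null) return false;
        return doc.getScheme().equalsIgnoreCase(code.getScheme())
                && doc.getHost().equalsIgnoreCase(code.getHost())
                && port(doc) == port(code);
    }

    private static int port(URI u) {
        if (u.getPort() != -1) return u.getPort();
        return "https".equalsIgnoreCase(u.getScheme()) ? 443 : 80; // default ports
    }
}
```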


The Active Cache Protocol (cont…)

  • Active Cache Obligations

    • If a document is cached, the proxy either invokes the cache applet or sends the request directly to the server

    • If an applet’s execution fails for any reason, the request is sent to the server

    • If the applet’s execution succeeds, the proxy takes the appropriate action based on the return value of the FromCache method

    • Each applet can deposit information in a log object and the proxy will send the log object back to the server.
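These obligations can be expressed as a small dispatch routine. The applet contract here (a boolean meaning "send the new file" versus "go to the server", plus a log list the proxy would later ship to the server) is an assumption for illustration; the slides only say that the proxy acts on FromCache's return value. `ActiveProxyDispatch` is hypothetical.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical proxy-side handling of the Active Cache obligations.
final class ActiveProxyDispatch {

    // Assumed contract: true = send newFile to the client, false = send the request to the server.
    interface Applet {
        boolean fromCache(String request, String cacheFile, StringBuilder newFile, List<String> log);
    }

    static String handle(String request, String cachedCopy, Applet applet,
                         Function<String, String> originServer, List<String> logForServer) {
        // Not cached, or no applet attached: the request goes to the server.
        if (cachedCopy == null || applet == null) return originServer.apply(request);
        StringBuilder newFile = new StringBuilder();
        boolean useNewFile;
        try {
            useNewFile = applet.fromCache(request, cachedCopy, newFile, logForServer);
        } catch (RuntimeException e) {
            // Applet execution failed for any reason: send the request to the server.
            return originServer.apply(request);
        }
        // Execution succeeded: act on the return value.
        return useNewFile ? newFile.toString() : originServer.apply(request);
    }
}
```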


Proxy Decides…

  • Whether to cache a document

  • Whether to invoke the applet

    • Cache applet may not process every request for the document

    • Some requests may go to the original server

  • What document or applet to evict from the cache at any time


Active Cache Interface

  • Cache applet must implement the “ActiveCacheInterface”

  • FromCache(user_http_request, client_ip, client_name, cache_file, new_file)

  • Cache applets can only call the ActiveProxy class to perform their functions

  • ActiveProxy provides methods for file access, cache query, locking and unlocking as well as sending requests to the server
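A Java rendering of the interface is sketched below together with the simplest applet from the examples, one that logs user requests. The parameter types and the boolean return value are assumptions (the slide gives only parameter names), and `LoggingApplet` is hypothetical; the method keeps the slide's capitalized name `FromCache` rather than Java's usual lower camel case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java rendering of ActiveCacheInterface; types and return value are assumed.
interface ActiveCacheInterface {
    // Returns true if the proxy should send newFile to the client,
    // false if it should send the request to the origin server.
    boolean FromCache(String userHttpRequest, String clientIp, String clientName,
                      String cacheFile, StringBuilder newFile);
}

// "Logging user requests": serves the cached copy unchanged and records who asked,
// so the log can later be shipped back to the server.
final class LoggingApplet implements ActiveCacheInterface {
    final List<String> log = new ArrayList<>();

    @Override
    public boolean FromCache(String userHttpRequest, String clientIp, String clientName,
                             String cacheFile, StringBuilder newFile) {
        log.add(clientIp + " " + clientName + " " + userHttpRequest);
        newFile.append(cacheFile); // the cached document, unmodified
        return true;
    }
}
```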


Active Cache Interface (cont…)

Methods in ActiveProxy

  • boolean is_in_cache(String url)

  • public int open(String url, int mode)

  • public int close(int fd)

  • public int create(String url, int mode)

  • public int read(int fd, byte[] buf, int size)

  • public int lock(int fd)

  • public String curtime()
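The restriction that applets work only through ActiveProxy can be illustrated with an in-memory stand-in that implements a few of the listed methods and refuses any URL outside the applet's own server. The method bodies, the file-descriptor numbering, and the host-based scoping rule are all assumptions; `ScopedActiveProxy` is hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory ActiveProxy stand-in scoped to one server.
final class ScopedActiveProxy {
    private final String ownHost;             // server that supplied the applet
    private final Map<String, byte[]> cache;  // url -> cached document bytes
    private final Map<Integer, byte[]> openFiles = new HashMap<>();
    private final Map<Integer, Integer> positions = new HashMap<>();
    private int nextFd = 3;                   // arbitrary first descriptor

    ScopedActiveProxy(String ownHost, Map<String, byte[]> cache) {
        this.ownHost = ownHost;
        this.cache = cache;
    }

    // Language-based protection: refuse URLs belonging to other servers.
    private void checkScope(String url) {
        String host = URI.create(url).getHost();
        if (host == null || !host.equalsIgnoreCase(ownHost)) {
            throw new SecurityException("applet may not access " + url);
        }
    }

    public boolean is_in_cache(String url) {
        checkScope(url);
        return cache.containsKey(url);
    }

    public int open(String url, int mode) { // mode is ignored in this sketch
        checkScope(url);
        byte[] data = cache.get(url);
        if (data == null) return -1;
        int fd = nextFd++;
        openFiles.put(fd, data);
        positions.put(fd, 0);
        return fd;
    }

    // Copies up to size bytes into buf; returns bytes read, 0 at end, -1 on a bad fd.
    public int read(int fd, byte[] buf, int size) {
        byte[] data = openFiles.get(fd);
        if (data == null) return -1;
        int pos = positions.get(fd);
        int n = Math.min(Math.min(size, buf.length), data.length - pos);
        System.arraycopy(data, pos, buf, 0, n);
        positions.put(fd, pos + n);
        return n;
    }

    public int close(int fd) {
        positions.remove(fd);
        return openFiles.remove(fd) == null ? -1 : 0;
    }
}
```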


Cache Applet Examples

  • Logging User Requests

    • Logs eventually sent to the server

  • Advertising Banner Rotation

    • Decides which banner to insert according to the specifications

  • Access Permission Checking

    • Applet verifies whether the server signed the document

  • Client-Specific Information Distribution

    • www.my.yahoo.com
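The banner-rotation example might reduce to logic like the following, which the applet would run at the proxy on each request. Weighted round-robin and a placeholder string in the cached page are assumptions standing in for "the specifications"; `BannerRotation` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical banner rotation: deterministic weighted round-robin over the server's banners.
final class BannerRotation {
    private final Map<String, Integer> weights; // banner -> relative weight, in insertion order
    private final int total;
    private long counter = 0;

    BannerRotation(Map<String, Integer> weights) {
        this.weights = new LinkedHashMap<>(weights);
        int sum = 0;
        for (int w : this.weights.values()) {
            if (w <= 0) throw new IllegalArgumentException("weights must be positive");
            sum += w;
        }
        if (sum == 0) throw new IllegalArgumentException("no banners");
        this.total = sum;
    }

    // Picks the banner for the next request and splices it into the cached page.
    synchronized String render(String cachedPage, String placeholder) {
        long slot = counter++ % total;
        String chosen = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (slot < e.getValue()) {
                chosen = e.getKey();
                break;
            }
            slot -= e.getValue();
        }
        return cachedPage.replace(placeholder, chosen);
    }
}
```

With weights 2 and 1, the first banner is shown on two of every three requests.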


Security Mechanisms

  • Language-based Protection

    • ActiveProxy class implements the constraints

    • Java’s built-in security measures

    • Prevents illegal access to information belonging to other web servers

  • Resource Accounting

    • Proxy keeps track of an applet’s resource consumption: storage size, disk bandwidth, network bandwidth, CPU usage, and virtual memory size

    • Sets upper limits on resources using setrlimit

    • Prevents Denial of Service attacks
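The accounting side can be sketched as a per-applet ledger that refuses a charge once a limit would be exceeded. It only illustrates the bookkeeping; in the implementation described here the hard limits come from setrlimit at the OS level, which plain Java cannot call. The resource names mirror the slide, and `AppletResourceLedger` is hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical per-applet resource ledger with upper limits.
final class AppletResourceLedger {
    enum Resource { STORAGE_BYTES, DISK_BYTES, NETWORK_BYTES, CPU_MILLIS, VM_BYTES }

    private final Map<Resource, Long> limits = new EnumMap<>(Resource.class);
    private final Map<Resource, Long> usage = new EnumMap<>(Resource.class);

    AppletResourceLedger(Map<Resource, Long> limits) {
        this.limits.putAll(limits); // resources without a limit are only tracked
    }

    // Records a charge; returns false (recording nothing) if it would exceed the limit,
    // in which case the proxy would stop the applet and forward the request to the server.
    synchronized boolean charge(Resource r, long amount) {
        long current = usage.getOrDefault(r, 0L);
        Long limit = limits.get(r);
        if (limit != null && current + amount > limit) return false;
        usage.put(r, current + amount);
        return true;
    }

    synchronized long used(Resource r) {
        return usage.getOrDefault(r, 0L);
    }
}
```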



  • Extended the CERN httpd proxy

  • Handles each request in a separate process

  • Makes it easy to set per-request resource limits

  • Implements the Active Cache Protocol and the security mechanisms



  • Degrades proxy performance by at least 50 to 75%

  • Increases client latency by a factor of 1.5 to 4

  • CPU becomes the bottleneck



  • Active Cache trades local CPU resources for network bandwidth savings

    • $6K to $10K per month for a T1 line, vs.

    • $2K for a high-end computer with sufficient CPU

  • Improves object hit ratio and byte hit ratio from 35% and 30% to 55% and 41%, respectively
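A back-of-the-envelope reading of these hit ratios, assuming the same workload before and after: the share of bytes the proxy must fetch from origin servers falls from 70% to 59%, and the share of requests that miss falls from 65% to 45%. Relative to the baseline, that is roughly 16% less origin traffic and about 31% fewer origin requests. These figures are derived here from the slide's numbers, not reported in the talk.

```latex
\frac{(1-0.30)-(1-0.41)}{1-0.30} = \frac{0.11}{0.70} \approx 0.157,
\qquad
\frac{(1-0.35)-(1-0.55)}{1-0.35} = \frac{0.20}{0.65} \approx 0.308
```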