How to live with low/intermittent bandwidth/connectivity

Krithi Ramamritham, IIT Bombay, krithi@cse.iitb.ernet.in. Web sites have traditionally served static content, but dynamic content generation has come into vogue.

Presentation Transcript


  1. How to live with low/intermittent bandwidth/connectivity. Krithi Ramamritham, IIT Bombay, krithi@cse.iitb.ernet.in

  2. Web Content
     • Web sites have traditionally served static content
     • But dynamic content generation has come into vogue
       • generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), Java Server Pages (JSP), Servlets
       • allows generation of different content for the same request

  3. Dynamic Web Pages…
     [Figure: a web page from a news content site, made up of an Ad Component, a Navigation Component, several Headline Components, and a Personalized Component]

  4. Generic Architecture
     [Figure labels: data sources (sensors, servers); network; end-hosts (wired hosts, mobile hosts)]

  5. Coherency of Dynamic Data
     • Strong coherency
       • The client and source are always in sync with each other
       • Strong coherency is expensive!
     • Relax strong coherency: Δ-coherency
     • Time domain: t-coherency
       • The client is never out of sync with the source by more than t time units
       • e.g., traffic data not stale by more than a minute
     • Value domain: v-coherency
       • The difference in the data values at the client and the source is bounded by v at all times
       • e.g., only interested in temperature changes larger than 1 degree
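As a rough illustration of the two relaxed notions on this slide, the Python sketch below checks whether a cached copy still satisfies a time-domain or a value-domain bound. The function names and sample numbers are assumptions made for illustration and do not come from the slides; the time check treats the moment of the last sync as a conservative stand-in for how stale the copy can be.

```python
def within_t_coherency(last_sync_time, now, t):
    """True if the cached copy is at most t time units out of sync."""
    return (now - last_sync_time) <= t


def within_v_coherency(cached_value, source_value, v):
    """True if the cached and source values differ by at most v."""
    return abs(source_value - cached_value) <= v


# Traffic data synced 45 s ago with a bound of 60 s: still acceptable.
traffic_ok = within_t_coherency(last_sync_time=0.0, now=45.0, t=60.0)

# Temperature cached at 21.0, source now at 22.5, bound of 1 degree: violated.
temperature_ok = within_v_coherency(cached_value=21.0, source_value=22.5, v=1.0)
```

In practice the client cannot read the source value directly, so a value-domain check of this kind would have to run at the source or be approximated by the refresh policy.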

  6. Generic Architecture
     [Figure labels: data sources (sensors, servers); network; proxies/caches; network; end-hosts (wired host, mobile host)]

  7. The Push Approach
     • Proxy registers the data item of interest and the coherency requirement with the server
     • Server pushes interesting changes
     + Achieves strong consistency
     + Keeps network overhead to a minimum
     -- Poor scalability (has to maintain state and keep connections open)
     -- Low resiliency
     [Figure: Server pushes to Proxy, Proxy pushes to User]
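The registration-and-push cycle on this slide might be sketched as follows. This is a minimal in-process model under stated assumptions: the class name `PushServer`, the callback interface, and the reading of "interesting changes" as those exceeding a value bound v are all hypothetical, not taken from the slides.

```python
class PushServer:
    """Toy source that keeps per-proxy state (the scalability cost of push)."""

    def __init__(self):
        self.values = {}         # item -> current value at the source
        self.registrations = []  # [item, v bound, last pushed value, callback]

    def register(self, item, v, callback):
        # The proxy registers the item of interest and its coherency requirement.
        self.registrations.append([item, v, self.values.get(item), callback])

    def update(self, item, new_value):
        self.values[item] = new_value
        for reg in self.registrations:
            reg_item, v, last_pushed, callback = reg
            if reg_item != item:
                continue
            # Push only changes that exceed the registered bound v.
            if last_pushed is None or abs(new_value - last_pushed) > v:
                reg[2] = new_value
                callback(item, new_value)


received = []
server = PushServer()
server.update("temperature", 20.0)
server.register("temperature", 1.0, lambda item, value: received.append(value))
for reading in [20.4, 21.2, 21.9, 22.5]:
    server.update("temperature", reading)
# received now holds only the readings that moved more than 1 degree
# away from the last pushed value.
```

The `registrations` list is the state the slide warns about: it grows with every proxy and every item, and a real server would also hold an open connection per proxy.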

  8. The Pull Approach
     • Proxy pulls after the Time to Live (TTL), also called Time To next Refresh (TTR / TNR)
     + Can be implemented using the HTTP protocol
     + Stateless and hence generally scalable with respect to state space and computation
     -- Weak cache consistency
     -- Heavy polling for stringent coherence requirements or highly dynamic data
     -- Network overheads higher than for Push
     [Figure: Server, Proxy, User; arrows labeled Pull and Push]
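The trade-off between polling cost and staleness can be seen in a small simulation. The sketch below is an assumption-laden toy: time advances in integer ticks, `source_at` is a hypothetical source whose value changes every 4 ticks, and each poll stands in for one stateless request such as an HTTP GET.

```python
def simulate_pull(source_at, ttr, end_time):
    """Poll source_at(time) every ttr ticks; return (poll count, cached view per tick)."""
    polls = 0
    history = []
    next_refresh = 0
    cached = None
    for now in range(end_time + 1):
        if now >= next_refresh:
            cached = source_at(now)  # one stateless request to the server
            polls += 1
            next_refresh = now + ttr
        history.append(cached)
    return polls, history


def source_at(time):
    # Hypothetical source whose value changes every 4 ticks.
    return time // 4


loose_polls, loose_view = simulate_pull(source_at, ttr=10, end_time=20)
tight_polls, tight_view = simulate_pull(source_at, ttr=2, end_time=20)
```

With the long TTR the proxy polls rarely but serves a stale value for most of each interval; with the short TTR it tracks the source closely at the price of many more requests, most of which return unchanged data.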

  9. Typical End-to-end Web Site Architecture
     [Figure: Users, Web Server Cluster, Application Server Cluster, Data]

  10. WS vs. AS
     • Web servers
       • Do well-defined and quantifiable local work
       • e.g., processing HTTP headers, serving static content
     • Application servers
       • Run multi-layer programs
       • e.g., scripts involving calls to backends

  11. Inside the Application Layer: 3-tier model
     • PRESENTATION: JSP, ASP
     • BUSINESS LOGIC: Servlets, COM+, EJB
     • ADDT’L SERVICES: Commerce, Content Mgt., Personalization
     • DATA CONNECTOR: JDBC, ODBC
     • Legacy Systems, Databases
     [Figure labels between tiers: HTML, Objects, Row Set]

  12. Inside the Application Layer…
     1. JSP invokes a Servlet
     2. Servlet contacts CMS
     3. CMS requests data
     4. DBMS calls storage system
     [Figure: code blocks in the PRESENTATION and BUSINESS LOGIC tiers; ADDT’L SERVICES (Commerce, Content Mgt., Personalization); DATA CONNECTOR (JDBC, ODBC); Legacy Systems; Databases]

  13. Performance and Scalability Issues
     • Computationally intensive logic executed at multiple tiers
     • Cross-tier communication
     • Object instantiation and cleanup processing
     • External I/O calls
     • Database connection pool latencies
     • Content conversion and formatting

  14. Optimizing the Application Layer: Traditional Means
     • Optimize each tier independently:
       • Presentation-level caches built inside application server processes
       • Main-memory database employed over persistent DBMS
       • Persistent object storage techniques employed inside content management systems
       • … and so on
     [Figure: local cache and optimization code attached to each tier: PRESENTATION (JSP, ASP), BUSINESS LOGIC (Servlets, COM+, EJB), ADDT’L SERVICES, DATA CONNECTOR (JDBC, ODBC)]

  15. Query result caching
     • Many application server products offer this feature
     -- mitigates only local database access latency
     -- only a subset of query results may be reused in page generation
     -- page fragments may not all be from databases

  16. Middle tier database caching
     • Caching database tables in main memory
       • Oracle 9i Cache
       • Main-memory databases, e.g., TimesTen
     -- mitigates only database access latency
     -- caching at table granularity results in poor cache utilization
     -- main-memory databases are difficult to integrate and maintain and can be expensive

  17. Page Level Caching
     • Dynamically generated HTML pages are cached
     + Can completely offload work from web/app server
     • Low reusability for highly personalized web pages
     • URL may not uniquely identify a page, increasing the risk of delivering incorrect pages
     • Often introduces excessive invalidations, e.g., even if a single element on the page changes

  18. Optimizing the Application Layer: Issues
     • Traditional techniques impact specific components within the application, but not the entire application
     • No mitigation of component-to-component interaction latencies
     • Different synchronization and invalidation policies risk data integrity
     • Each optimization scheme consumes programmer time for development and maintenance

  19. Key ideas
     • Re-use program results to eliminate redundant work
     • Facilitate single-point, architecture-wide optimization
     • Apply to both programmatic objects and result fragments

  20. Optimizing the Application Layer
     • A cache enables the results of programs to be re-used.
     [Figure: cache alongside the tiers PRESENTATION (JSP, ASP), BUSINESS LOGIC (Servlets, COM+, EJB), ADDT’L SERVICES (Commerce, Content Mgt., Personalization), DATA CONNECTOR (JDBC, ODBC), Legacy Systems, Databases]

  21. Usually….
     1. JSP invokes a Servlet
     2. Servlet contacts CMS
     3. CMS requests data
     4. DBMS calls storage system
     • Plus, at each step there are communication delays and logic processing delays
     [Figure: code blocks in the PRESENTATION and BUSINESS LOGIC tiers; ADDT’L SERVICES (Commerce, Content Mgt., Personalization); DATA CONNECTOR (JDBC, ODBC); Databases; Legacy Systems]

  22. Novel Solution…
     • Can store any program output, but what is stored is most commonly an HTML fragment or a programmatic object.
     • Tags trigger calls to the storage engine.
     • When the Result of a Function with a specific Parameter set is already known (and up to date), the work normally necessary to produce that Result is bypassed.
     [Figure: code blocks in the PRESENTATION and BUSINESS LOGIC tiers marked with Chutney tags; Appl. Programming Interface; real-time storage engine holding Function, Parameter(s), Result; DATA CONNECTOR (JDBC, ODBC)]
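The idea of keying a stored Result by Function and Parameter set can be sketched as a small result cache. This is not the product's API, which the slides do not show: `ResultCache`, `get_or_compute`, and `render_headlines` are hypothetical names, and freshness is handled only by an explicit `invalidate` call.

```python
class ResultCache:
    """Stores the result of a function call keyed by (function name, parameters)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, function, *params):
        key = (function.__name__, params)
        if key in self._store:
            self.hits += 1              # result already known: work is bypassed
            return self._store[key]
        self.misses += 1
        result = function(*params)      # do the work once and remember it
        self._store[key] = result
        return result

    def invalidate(self, function, *params):
        self._store.pop((function.__name__, params), None)


calls = []


def render_headlines(section):
    # Stand-in for a code block doing database calls and HTML formatting.
    calls.append(section)
    return "<ul><li>Top story in " + section + "</li></ul>"


cache = ResultCache()
first = cache.get_or_compute(render_headlines, "sports")
second = cache.get_or_compute(render_headlines, "sports")
```

The second call returns the stored HTML fragment without running `render_headlines` again, which is the bypass the slide describes. Parameters must be hashable for this keying scheme to work.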

  23. Code Blocks Perform Work
     [Figure: a page generation script made up of code blocks; each code block performs application logic, database calls, and HTML formatting, then writes to Out]

  24. Code Blocks <-> Components
     • Certain components can be cached
     [Figure: code blocks in the page generation script each write to Out, producing the components of the web page (Ad, Headline, Navigation, Personalized). Example: news content site]

  25. DCA: Our Solution
     [Figure: in the page generation script, a code block sits between a start tag and an end tag; a request goes to the Dynamic Content Accelerator, which returns the code block output, so the application logic, database calls, and HTML formatting are bypassed]

  26. DCA in a Typical End-to-end Web Site Architecture
     • A single instance of the DCA serves a rack of application servers
     • Application servers communicate with the DCA through a lightweight API
     [Figure: Users, Web Server Cluster, Application Server Cluster, Dynamic Content Accelerator, Data]

  27. Cache Management
     • A critical aspect of any caching solution
     • DCA supports novel cache management strategies:
       • Prediction-based cache replacement
       • Observation-based cache invalidation

  28. Cache Replacement
     • Prediction-based replacement
       • fragments having the lowest probability of access are replaced
       • Least-Likely-to-be-Used (LLU)
     • Access probabilities based on:
       • Current user navigational patterns over the site graph (in the form of clickstreams)
       • Historical user navigational patterns over the site graph (in the form of association rules)
     • Example rules:
       • (News, Sports, Hockey) → Schedules = 20%
       • (News, Sports, Hockey) → Players = 15%
       • (News, Sports, Hockey) → Teams = 10% (LLU)
       • (News, Sports, Hockey) → Scores = 55%
     [Figure: site graph with News, Sports, Hockey, and under Hockey: Schedules, Scores, Players, Teams]
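Choosing the replacement victim under LLU reduces to picking the cached fragment with the lowest predicted access probability. The sketch below uses the four probabilities from the slide; the function name `choose_llu_victim` is hypothetical, and how the probabilities are derived from clickstreams and association rules is outside its scope.

```python
def choose_llu_victim(cached_fragments, access_probability):
    """Return the cached fragment with the lowest predicted access probability."""
    return min(cached_fragments, key=lambda f: access_probability.get(f, 0.0))


# Probabilities for a user who has followed News, Sports, Hockey.
access_probability = {
    "Schedules": 0.20,
    "Players": 0.15,
    "Teams": 0.10,
    "Scores": 0.55,
}

victim = choose_llu_victim(
    ["Schedules", "Players", "Teams", "Scores"], access_probability
)
```

Fragments with no prediction are treated here as probability 0.0, so they are evicted first; that default is a design assumption, not something the slides specify.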

  29. Cache Invalidation
     • DCA supports common cache invalidation techniques:
       • Time-based: each cache element is assigned a TTL
       • Event-based: updates to the database send an invalidation message to the cache
       • On demand: manual invalidation of selected elements
     • DCA supports additional invalidation techniques….
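A minimal sketch of time-based and explicit invalidation is given below. The class `InvalidatingCache` is hypothetical, and the current time is passed in as a plain number (an assumption made to keep the example deterministic) rather than read from a clock. The same `invalidate` method serves both the event-based case (called when a database update is reported) and the on-demand case (called manually).

```python
class InvalidatingCache:
    """Cache whose entries expire after a TTL or when explicitly invalidated."""

    def __init__(self):
        self._entries = {}  # key -> (value, expiry time)

    def put(self, key, value, ttl, now):
        self._entries[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # time-based invalidation
            return None
        return value

    def invalidate(self, key):
        # Event-based or on-demand invalidation of a single element.
        self._entries.pop(key, None)


cache = InvalidatingCache()
cache.put("price:widget", 9.99, ttl=30, now=0)
fresh = cache.get("price:widget", now=10)      # still within its TTL
expired = cache.get("price:widget", now=30)    # TTL elapsed, entry dropped

cache.put("price:widget", 10.49, ttl=30, now=40)
cache.invalidate("price:widget")               # e.g. after a database update
after_event = cache.get("price:widget", now=41)
```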

  30. Cache Invalidation…
     • Other invalidation techniques supported:
       • Observation-based
         • User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
         • Most appropriate for auction sites, online trading sites
         • Invalidation does not require communication with the databases
       • Keyword-based
         • Elements can be associated with keywords; e.g., a retailer may wish to invalidate all “seasonal” items
       • Regular expression-based
         • Elements can be invalidated based on regular expression matching
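Keyword-based and regular expression-based invalidation could look roughly like the following. The class `TaggedCache`, its method names, and the URL-style keys are illustrative assumptions; the slides do not say what the regular expressions are matched against, so matching on the element key is a guess.

```python
import re


class TaggedCache:
    """Cache whose elements can be dropped by keyword or by key pattern."""

    def __init__(self):
        self._values = {}
        self._keywords = {}

    def put(self, key, value, keywords=()):
        self._values[key] = value
        self._keywords[key] = set(keywords)

    def invalidate_keyword(self, keyword):
        doomed = [k for k, words in self._keywords.items() if keyword in words]
        self._drop(doomed)
        return doomed

    def invalidate_matching(self, pattern):
        regex = re.compile(pattern)
        doomed = [k for k in self._values if regex.search(k)]
        self._drop(doomed)
        return doomed

    def _drop(self, keys):
        for k in keys:
            del self._values[k]
            del self._keywords[k]

    def keys(self):
        return sorted(self._values)


cache = TaggedCache()
cache.put("/catalog/sweaters", "<div>sweaters</div>", keywords=["seasonal"])
cache.put("/catalog/socks", "<div>socks</div>")
cache.put("/auction/item/17", "<div>bid: 40</div>")
cache.put("/auction/item/42", "<div>bid: 12</div>")

removed_seasonal = cache.invalidate_keyword("seasonal")
removed_auctions = cache.invalidate_matching(r"^/auction/item/\d+$")
```

Observation-based invalidation would call one of these methods from the script that handles the user's update (for example, a new bid), which is why no round trip to the database is needed.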

  31. Performance Study…
     • Test Site
       • Fictitious online retail site that allows browsing of a product catalog
       • Pages generated using JSP scripts
       • Site content stored in an Oracle database
       • Database schema based on the Dublin Core Metadata Open Standard
       • Contains 200,000 products and 44,000 categories
       • Each page consists of 3 components, each involving a database call

  32. Performance Study…
     • Test Setup
       • Content Database Server: Oracle 8.1.6
       • Web/Application Server: WebLogic 6.0 running on a cluster of 2 machines
       • Server machines: 1 GB RAM, dual P III 933 MHz processors, running Windows 2K Advanced Server

  33. Testing Methodology...
     • Baseline Parameters:
       • Cache size, i.e., percentage of fragments that fit into the cache: 75%
       • Cache replacement policy: LLU
     • User load is varied by sending requests from client machines running Radview’s WebLoad
     • Simulated users navigate the site according to a Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
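The 80-20 navigation pattern can be approximated with a two-class sampler: with probability 0.8 a simulated user picks one of the "hot" 20% of links, otherwise one of the rest. This is a simplification and not a true Zipf distribution, and it is not how WebLoad is configured; `pick_link`, the link names, and the seed are all assumptions for illustration.

```python
import random


def pick_link(links, rng, hot_fraction=0.2, hot_probability=0.8):
    """Pick a link so that hot_probability of picks land on the first hot_fraction of links."""
    hot_count = max(1, int(len(links) * hot_fraction))
    hot = links[:hot_count]
    cold = links[hot_count:] or links  # fall back if there are no cold links
    if rng.random() < hot_probability:
        return rng.choice(hot)
    return rng.choice(cold)


rng = random.Random(42)  # fixed seed so the run is repeatable
links = ["link%d" % i for i in range(10)]
hot_links = set(links[:2])

picks = [pick_link(links, rng) for _ in range(10000)]
hot_share = sum(p in hot_links for p in picks) / len(picks)
# hot_share should come out close to 0.8
```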

  34. Performance Impact
     • 80% faster response times through existing application infrastructure
     • Source: Fortune 100 client results

  35. Chutney Throughput Impact
     • 250% increase in transaction rates
     • Source: Fortune 100 client results

  36. Alternative: CDNs
     • Content Distribution Networks, e.g., Akamai
     • Push-based core infrastructure
     [Figure: Sources, Repositories, Clients]

  37. Conclusion
     • Increased use of dynamic page generation technologies
       => increases load on application servers
       => serious performance and scalability problems for e-business sites
     • DCA (Dynamic Content Acceleration)
       => significantly reduces the load on the server-side infrastructure, allowing e-business sites to scale
       => significantly outperforms existing middle-tier caching solutions

  38. IIT Bombay’s aAQUA Community Forum (www.aAQUA.org)
     • Farmers get information and get their questions answered
       -- in the local context
       -- in their local language
     • Capitalizes on existing human and infrastructural resources:
       • Agri-extension center: KVK, Baramati
       • NGO: Vigyan Ashram, Pabal
       • Government: MCIT

  39. Access over low bandwidth: Resource Optimization
     • Resource constraints
       • Low/unpredictable bandwidth
       • => disconnected operation/access
     • Exploit
       • caching
       • prefetching (through prediction of future needs)
       • profiling by user type, location
       • => offline aAQUA
     • Data characteristics
       • Static data (text, images; e.g., land records, photos)
         • can be cached/hoarded
       • Dynamic data (weather/price information)
         • cached info needs to be refreshed carefully
       • Continuous media (VoIP, video data)
         • QoS considerations
