Caching for Sustainability


  1. Caching for Sustainability Alex Bunch

  2. Agenda • Intro • Overview • Background • Analysis • Implementation • Future

  3. Intro • Caching is a systems technique to use relatively expensive hardware with special features • On-chip SRAM is fast but costs more than memory • Memory is faster than disk but… • Web caching services (like Akamai) have low network latency to end users but can’t scale like datacenters • How it works: caching relies on evidence that some pieces of data are more likely to be accessed than others

  4. Intro Methods for determining likelihood of access: • Spatial locality: data near data that has just been accessed is likely to be accessed. • Temporal locality: data that has just been accessed is likely to be accessed again.
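One rough way to check an access trace for temporal locality is to count how often an item was also touched in the recent past. The sketch below is only an illustration: the function name, the window size, and the message IDs in the trace are made up, not taken from the talk.

```python
from collections import deque


def temporal_reuse_fraction(accesses, window):
    """Fraction of accesses whose item also appears in the previous `window` accesses."""
    recent = deque(maxlen=window)  # sliding window over the most recent accesses
    reused = 0
    for item in accesses:
        if item in recent:
            reused += 1
        recent.append(item)
    return reused / len(accesses) if accesses else 0.0


# Hypothetical trace of message IDs, in the order a user opened them.
trace = ["m1", "m2", "m1", "m3", "m1", "m4", "m2"]
print(temporal_reuse_fraction(trace, window=3))
```

A higher fraction suggests recently read items tend to be read again, which is the property a cache relies on.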

  5. 10000 ft. view The principal idea behind this research is that green hosts are a new type of hardware with special features. These hosts either offer a service that runs entirely on renewable sources, or they purchase enough renewable energy credits to offset any dirty energy used.

  6. 10000 ft. view The idea behind Greenmail is that it acts as a cache for emails that are likely to be accessed. Because it is a zero-carbon service, the overall carbon footprint of the user goes down.

  7. 10000 ft. view

  8. 10000 ft. view

  9. Background • On green trends • On green hosting • On greenmail locality

  10. Background One of the fundamental ideas that Greenmail is based on is that people want their services to be green. This idea is supported by the fact that the customer base for green hosts increased 60% a year from 2002 to 2008 [1].

  11. Background Beyond simple customer interest, green products need to be competitively priced: 83% of consumers would rather use a green service if it did not cost more than its dirty alternative [2]. Green hosting is becoming significantly more common and, in turn, more competitive with dirty energy prices.

  12. Background • Green hosts are internet hosting companies that perform ‘green’ actions on behalf of their users to offset any carbon emitted by their datacenters, whether through the direct use of renewable energy, planting trees, or buying offsets.

  13. Background Stating that email exhibits temporal and/or spatial locality is a lofty claim, but intuition argues that a user who accesses an important email will eventually reference it again. Our hope is that these claims are validated by the data.

  14. Analysis One of the classic cache equations gives the Average Memory Access Time (AMAT): AMAT = Ht + r*Mt, where Ht is the cache hit time, r is the miss rate, and Mt is the miss penalty.
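As a quick worked example of the AMAT formula, the sketch below plugs in illustrative numbers. The hit time, miss rate, and miss penalty are assumptions chosen for the arithmetic, not measurements from the talk.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time: AMAT = Ht + r * Mt."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Assumed values: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
print(amat(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0))  # 1 + 0.05 * 100 = 6.0
```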

  15. Analysis Beyond serving as a useful high-level analogy, Greenmail has a similar equation for Average Carbon Footprint (ACFP): ACFP = Hc + r*Mc, where Hc is the carbon associated with a cache hit, r is the miss rate, and Mc is the carbon miss penalty.

  16. Analysis • Because Greenmail is carbon neutral, Hc is 0. Because Mc is determined by the original email provider, the miss rate (r) is the only element of this equation that we can attempt to minimize, subject to our constraints.
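With Hc = 0, the carbon equation ACFP = Hc + r*Mc reduces to ACFP = r*Mc, so the footprint scales linearly with the miss rate. A minimal sketch of that relationship follows. The carbon units and the value of Mc are hypothetical placeholders.

```python
def average_carbon_footprint(miss_rate, miss_carbon, hit_carbon=0.0):
    """ACFP = Hc + r * Mc; hit_carbon defaults to 0 for a carbon-neutral cache."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_carbon + miss_rate * miss_carbon


# Hypothetical Mc of 4.0 carbon units per access served by the original provider.
for r in (1.0, 0.5, 0.25):
    print(r, average_carbon_footprint(r, miss_carbon=4.0))
```

With these placeholder numbers, halving the miss rate halves the average footprint, which is why r is the quantity to minimize.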

  17. Constraints As with classic caches, the miss rate depends partly on the size of the cache and on the algorithm used to replace data. While the algorithm can be modified based on experimental data, the cache size has a cap.
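To illustrate how cache size and replacement policy drive the miss rate, here is a small simulation using least-recently-used (LRU) replacement. LRU is only one common policy, chosen here as an assumption; the slides do not say which algorithm Greenmail uses, and the trace is made up.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def access(self, key):
        """Record one access and return True on a hit, False on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self._items[key] = True
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def miss_rate_for(trace, capacity):
    """Replay a trace through a fresh LRU cache and return its miss rate."""
    cache = LRUCache(capacity)
    for key in trace:
        cache.access(key)
    return cache.miss_rate()


# Hypothetical trace of message IDs.
trace = ["m1", "m2", "m1", "m3", "m1", "m4", "m2"]
for capacity in (1, 2, 4):
    print(capacity, miss_rate_for(trace, capacity))
```

Replaying the same trace at different capacities shows the trade-off the cap imposes: a smaller cache can only miss as often or more often under LRU.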

  18. Constraints Our cache size limit is self-imposed to keep Greenmail economically sound: our cost of maintaining the cache should not exceed what the original email provider spends storing all of a single user's data.

  19. Constraints The reason this makes our cache smaller is that email providers have two factors working to reduce their costs: • Dirty energy: costs less than green energy. • Economy of scale: more users translates into spending less per user.

  20. Constraints example Email Host A uses dirty power that costs half as much as green power, and because of the number of users it has, it can purchase hardware at 75% of the price Greenmail pays. To stay within the same per-user cost, Greenmail must hold at most 0.5 × 0.75 = 37.5% of the emails that the host does.
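The 37.5% figure is consistent with multiplying the two cost ratios, 0.5 × 0.75. The sketch below makes that arithmetic explicit. It assumes the per-email storage cost scales as the product of the energy and hardware price ratios, which is an inference from the example rather than a model stated in the slides.

```python
def max_cache_fraction(energy_cost_ratio, hardware_cost_ratio):
    """Largest share of the provider's stored email the cache can hold at equal cost.

    Each ratio is (provider's price) / (cache operator's price), so 0.5 means the
    provider pays half as much. Assumes cost per email is the product of both.
    """
    for ratio in (energy_cost_ratio, hardware_cost_ratio):
        if ratio <= 0:
            raise ValueError("cost ratios must be positive")
    return energy_cost_ratio * hardware_cost_ratio


# Values from the example: power at half the price, hardware at 75% of the price.
print(max_cache_fraction(0.5, 0.75))  # 0.375
```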

  21. Implementation Our implementation of Greenmail is based on a modified version of SquirrelMail, a free, open-source, web-based email application that has access to an IMAP proxy server.

  22. Implementation Cache functionality comes from modifying the SquirrelMail IMAP functions. A single IMAP session consists of many messages sent between the user running SquirrelMail and the original email provider, but only a few of them are worth caching.

  23. Implementation Only two of these messages are ‘worth’ caching, because most of the others are just a few lines long: • ‘Get Headers’: returns a list of all the email subjects in the relevant mailbox/search • ‘Get Body’: returns the body of the requested email

  24. Implementation • ‘Get Body’: an encrypted local copy is made the first time this is called, and any subsequent call retrieves the local copy. • ‘Get Headers’: in theory this should be easy to cache, except that a timestamp baked into the response is used for error checking.
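The ‘Get Body’ behavior described here follows the usual cache-aside pattern: serve the local copy if one exists, otherwise fetch from the provider and store it. The Python sketch below illustrates that pattern only. It is not the SquirrelMail modification itself, every name in it is hypothetical, and the encrypt/decrypt hooks default to no-ops because the slides do not say which cipher is used.

```python
class BodyCache:
    """Cache-aside store for message bodies, keyed by (mailbox, uid)."""

    def __init__(self, fetch_from_provider, encrypt=None, decrypt=None):
        self._fetch = fetch_from_provider                 # hypothetical: (mailbox, uid) -> bytes
        self._encrypt = encrypt or (lambda data: data)    # placeholder, NOT real encryption
        self._decrypt = decrypt or (lambda data: data)
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_body(self, mailbox, uid):
        key = (mailbox, uid)
        if key in self._store:
            self.hits += 1
            return self._decrypt(self._store[key])
        # Miss: go to the original provider, then keep an (encrypted) local copy.
        self.misses += 1
        body = self._fetch(mailbox, uid)
        self._store[key] = self._encrypt(body)
        return body


provider_calls = []


def fake_provider(mailbox, uid):
    """Stand-in for the original email provider; records each request."""
    provider_calls.append((mailbox, uid))
    return f"body of {mailbox}/{uid}".encode()


cache = BodyCache(fake_provider)
first = cache.get_body("INBOX", 42)
second = cache.get_body("INBOX", 42)
print(first == second, len(provider_calls), cache.hits, cache.misses)
```

Only the first request reaches the provider; the second is served locally, which is the case that counts as a carbon-free hit.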

  25. Implementation In addition to the modifications made to SquirrelMail, additional scripts were needed so that users can quickly and easily set up their own cache. Separate directories are made for each user because of how SquirrelMail stores IMAP configurations.

  26. Results We are currently collecting data from real users, as there is no established test suite or benchmark that models users accessing emails. In the future, if a good user ‘profile’ is found, it may be possible to automate this (x% spam, y% accessed frequently, etc.).
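If such a profile were available, a synthetic access trace could be generated from it and replayed against a cache. The sketch below shows one possible shape for that generator. The three-way split into spam, frequently accessed, and rarely accessed mail, along with every parameter name and value, is an assumption for illustration and not a validated user model.

```python
import random


def synthetic_trace(num_emails, num_accesses, spam_fraction,
                    hot_fraction, hot_access_prob, seed=0):
    """Generate email IDs accessed by a hypothetical user profile.

    Spam emails are never opened; 'hot' emails receive a share of accesses
    equal to hot_access_prob; the remaining emails receive the rest.
    """
    rng = random.Random(seed)  # seeded so the trace is reproducible
    num_spam = int(num_emails * spam_fraction)
    num_hot = int(num_emails * hot_fraction)
    ids = list(range(num_emails))
    hot = ids[num_spam:num_spam + num_hot]
    cold = ids[num_spam + num_hot:]
    if not hot or not cold:
        raise ValueError("profile must leave at least one hot and one cold email")
    trace = []
    for _ in range(num_accesses):
        pool = hot if rng.random() < hot_access_prob else cold
        trace.append(rng.choice(pool))
    return trace


# Made-up profile: 30% spam, 10% hot emails that draw 80% of accesses.
trace = synthetic_trace(num_emails=100, num_accesses=20, spam_fraction=0.3,
                        hot_fraction=0.1, hot_access_prob=0.8)
print(trace)
```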

  27. Example Locality Analysis (not from Greenmail)

  28. Future Work • Heavy data analysis • Cache algorithms • Caching headers • Caching searches • Used to limit mailbox refresh rate • Zoolander backend

  29. Questions/References [1] The AMD Opteron Processor Helps AISO. www.vmware.com. [2] N. Holdings. The Nielsen Global Online Environmental Survey, 2011.
