
Caching

Caching. Willem Visser, RW334. Overview: AppEngine Datastore, no caching, naïve caching, cache invalidation, cache updating, Memcached, and caching beyond your code.


Presentation Transcript


  1. Caching • Willem Visser, RW334

  2. Overview • AppEngine Datastore • No Caching • Naïve Caching • Caching invalidation • Cache updating • Memcached • Beyond your code

  3. AppEngine Python Datastore
     • Datastore
       • db
         • Old and will be going away at some point
       • ndb (https://developers.google.com/appengine/docs/python/ndb/)
         • New and supports some cool features

         from google.appengine.ext import ndb

         class Stuff(ndb.Model):
             title = ndb.StringProperty(required=True)
             content = ndb.StringProperty(required=True)
             date = ndb.DateTimeProperty(auto_now_add=True)

  4. NDB
     • A Python class defines the model
     • Each entity has a key, which in turn has a parent, up to the root, which has no parent
     • Entities in this chain are in the same entity group
     • Entities in the same group have consistency guarantees

         stuff_title = self.request.get('stuff_name')
         # title is a required property of Stuff, so it is set here as well
         stuff = Stuff(parent=ndb.Key("Things", stuff_title or "*notitle*"),
                       title=stuff_title or "*notitle*",
                       content=self.request.get('content'))
         stuff.put()

  5. NDB (2)
     • Queries and Indexes
       • There are many ways to query
       • Complex queries might need complex indexes
       • NDB creates simple indexes automatically
       • Complex ones can be defined in index.yaml
     • GQL is similar to SQL
       • A query is only executed when its results are accessed

         # ndb exposes GQL through ndb.gql(); GqlQuery belongs to the older db API
         stuff = ndb.gql("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
         stuff = list(stuff)  # iterating over the query is what actually runs it

  6. No Caching • Every db_read hits the database • Database reads tend to be slow • Without a cache, serving every request this way can be very inefficient

  7. Example: No Caching

         from google.appengine.ext import db

         def top_stuff():
             # Every call runs the query against the datastore
             stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
             return list(stuff)

         class MainPage(Handler):
             def render_front(self, …):
                 stuff = top_stuff()
                 self.render("front.html", stuff=stuff, …)

             def get(self):
                 self.render_front()

             def post(self):
                 title = …
                 data = …
                 a = Stuff(title=title, data=data)
                 a.put()
                 self.redirect("/")

  8. Naïve Caching

         if request not in cache:
             cache[request] = db_read()
         return cache[request]

     • This will do wonders for performance
     • If the cache grows too large, it might start to slow down a bit
     • Here only the db_read is avoided, but the rendered HTML could also be cached if rendering takes a lot of time

  9. Example: No Caching • Same code as slide 7, shown again as the starting point for the cached version on slide 10

  10. Example: Naïve Caching

         import logging
         from google.appengine.ext import db

         CACHE = {}

         def top_stuff():
             key = 'top'
             stuff = CACHE.get(key)  # .get() returns None on a miss; CACHE[key] would raise KeyError
             if stuff is None:
                 logging.error("DB QUERY")  # logged so that every database hit is visible
                 stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
                 stuff = list(stuff)  # run the query now so the results are cached, not the query
                 CACHE[key] = stuff
             return stuff

         class MainPage(Handler):
             def render_front(self, …):
                 stuff = top_stuff()
                 self.render("front.html", stuff=stuff, …)

             def get(self):
                 self.render_front()

             def post(self):
                 title = …
                 data = …
                 a = Stuff(title=title, data=data)
                 a.put()
                 self.redirect("/")

  11. New data? • Will the previous solution work? • What happens if you add new data? • The data is added to the DB and the handler redirects to / • render_front calls top_stuff • However, the cache is hit and we get the old data • The cache must be invalidated when new data arrives

  12. Clear Cache

         CACHE = {}

         def top_stuff():
             …

         class MainPage(Handler):
             def render_front(self, …):
                 stuff = top_stuff()
                 self.render("front.html", stuff=stuff, …)

             def get(self):
                 self.render_front()

             def post(self):
                 title = …
                 data = …
                 a = Stuff(title=title, data=data)
                 a.put()
                 CACHE.clear()  # invalidate: the next read goes back to the DB
                 self.redirect("/")
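
The stale-read problem and the effect of clearing the cache can be reproduced in a few lines. This is a minimal sketch under assumptions: a plain list (`DB`) stands in for the datastore, and `post_without_clear` / `post_with_clear` are hypothetical names for the two versions of the POST handler.

```python
DB = []      # stand-in for the datastore, oldest entry first
CACHE = {}

def top_stuff():
    # Cache the ten newest entries on a miss.
    if 'top' not in CACHE:
        CACHE['top'] = list(reversed(DB))[:10]
    return CACHE['top']

def post_without_clear(title):
    DB.append(title)          # the cache still holds the old list

def post_with_clear(title):
    DB.append(title)
    CACHE.clear()             # invalidate so the next read goes to the DB

post_with_clear("first")
before = top_stuff()          # caches the list containing only "first"
post_without_clear("second")
stale = top_stuff()           # cache hit: "second" is missing
post_with_clear("third")
fresh = top_stuff()           # cache miss: all three entries, newest first
```

Here `stale` still contains only the first entry, while `fresh` reflects every write.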

  13. Cache Stampede • One user writes new data, so the cache gets cleared • Now lots of users all access the site at the same time • All of them do db_reads, since the cache is empty • This hammers the DB and slows everybody down • Depending on settings, the DB might also block or even crash • Without any caching this could also happen
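
A rough way to see the size of the effect is to count database reads. The sketch below is a deliberately simplified, single-threaded model (an assumption, not real concurrency): all readers are taken to check the cache at the same instant, before any of them has stored a result. The function name and parameters are hypothetical.

```python
def db_reads_after_write(n_readers, refresh_on_write):
    """Count DB reads caused by one write followed by n_readers simultaneous reads."""
    cache = {'top': ['old']}
    db_reads = 0

    # The write: either clear the cache, or re-read once and repopulate it.
    if refresh_on_write:
        db_reads += 1
        cache['top'] = ['new', 'old']
    else:
        cache.clear()

    # All readers check the cache at the same moment, before any of them
    # has had time to store a result.
    missed = [reader for reader in range(n_readers) if 'top' not in cache]

    # Every reader that missed now issues its own query.
    for _ in missed:
        db_reads += 1
        cache['top'] = ['new', 'old']
    return db_reads

stampede = db_reads_after_write(100, refresh_on_write=False)
refreshed = db_reads_after_write(100, refresh_on_write=True)
```

In this model, clearing the cache costs one read per waiting reader, whereas repopulating it as part of the write costs a single read in total.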

  14. Cache Refresh

         def top_stuff(update=False):
             key = 'top'
             stuff = CACHE.get(key)  # None on a miss instead of a KeyError
             if stuff is None or update:
                 logging.error("DB QUERY")
                 stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
                 stuff = list(stuff)
                 CACHE[key] = stuff
             return stuff

         class MainPage(Handler):
             def render_front(self, …):
                 stuff = top_stuff()
                 self.render("front.html", stuff=stuff, …)

             def get(self):
                 self.render_front()

             def post(self):
                 title = …
                 data = …
                 a = Stuff(title=title, data=data)
                 a.put()
                 top_stuff(True)  # refresh the cache now, so readers never find it empty
                 self.redirect("/")

  15. Cache Update • Most aggressive solution • No DB reads! • On new data, store it in the DB and also directly in the cache, without reading from the DB • The DB is now just backup storage in case something goes wrong, such as a server going down
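
A minimal sketch of this idea, assuming a list as the stand-in datastore and the hypothetical helpers `read_top_from_db` and `add_stuff`: the write path updates the cached list itself, so the database is read at most once, to warm an empty cache.

```python
DB = []          # stand-in for the datastore, oldest entry first
DB_READS = 0
CACHE = {}

def read_top_from_db():
    global DB_READS
    DB_READS += 1
    return list(reversed(DB))[:10]

def top_stuff():
    if 'top' not in CACHE:
        CACHE['top'] = read_top_from_db()
    return CACHE['top']

def add_stuff(title):
    current = top_stuff()                    # warms the cache once if it is empty
    DB.append(title)                         # durable write to the "database"
    CACHE['top'] = ([title] + current)[:10]  # update the cache directly, newest first

add_stuff("a")
add_stuff("b")
```

After both writes, `top_stuff()` returns the newest entries first and `DB_READS` is still 1.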

  16. Cache Comparisons

  17. Sharing a Cache • What if we have more than one server? • Do we keep a cache for each server, or share one cache amongst the servers? • Per-server caches can behave suboptimally if they are not synchronized • Data might be in the cache on server 1 and not on server 2, for example • A good solution is to use a very fast shared cache
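
The difference between per-server caches and a shared cache can be shown with two toy servers. Everything here is an assumption made for illustration: `Server` and `demo` are hypothetical, and dictionaries stand in for both the database and the caches.

```python
class Server:
    """A toy application server with a cache in front of a shared database."""
    def __init__(self, db, cache):
        self.db = db
        self.cache = cache

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

    def write(self, key, value):
        self.db[key] = value
        self.cache[key] = value   # only this server's cache learns about the write

def demo(shared):
    db = {'top': 'old'}
    shared_cache = {}
    server1 = Server(db, shared_cache if shared else {})
    server2 = Server(db, shared_cache if shared else {})
    server2.read('top')           # server 2 caches 'old'
    server1.write('top', 'new')   # the write arrives at server 1
    return server2.read('top')    # what does server 2 serve now?

separate_result = demo(shared=False)
shared_result = demo(shared=True)
```

With separate caches, server 2 keeps serving the old value; with one shared cache, both servers see the write.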

  18. Memcached
     • See http://memcached.org/
     • Very fast, in-memory, key-value store
     • Caching technology behind very many websites
     • Support for it within AppEngine

         from google.appengine.api import memcache
         from google.appengine.ext import db
         …
         def top_stuff(update=False):
             key = 'top'
             stuff = memcache.get(key)  # returns None on a miss
             if update or stuff is None:
                 stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
                 stuff = list(stuff)
                 memcache.set(key, stuff)
             return stuff

  19. NDB and Caching
     • Two caches, controlled by policies
     • In-context cache (microseconds)
       • Lives only for the current HTTP request
       • Writes go to the datastore and the cache; reads check the cache first
     • Memcache (milliseconds)
       • By default, all non-transactional contexts cache entities here
       • All contexts share the same memcache
       • Within a transaction, memcache is not used
     • Both can be configured by policies
       • Some standard ones are available
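
The lookup order described on this slide can be sketched in plain Python. This is a simplified model, not NDB's implementation: `Context` is a hypothetical class, dictionaries stand in for memcache and the datastore, and policies, writes and transactions are left out.

```python
class Context:
    """One per request: a private in-context cache in front of a shared memcache."""
    def __init__(self, memcache, datastore):
        self.local = {}            # in-context cache, discarded with the request
        self.memcache = memcache   # shared by all contexts
        self.datastore = datastore
        self.trace = []            # records where each read was served from

    def get(self, key):
        if key in self.local:
            self.trace.append('context')
            return self.local[key]
        if key in self.memcache:
            self.trace.append('memcache')
            value = self.memcache[key]
        else:
            self.trace.append('datastore')
            value = self.datastore[key]
            self.memcache[key] = value
        self.local[key] = value
        return value

memcache, datastore = {}, {'k': 'v'}
request1 = Context(memcache, datastore)
request1.get('k')
request1.get('k')
request2 = Context(memcache, datastore)
request2.get('k')
```

The first request goes to the datastore once and then hits its own in-context cache; the second request starts with an empty in-context cache but is served from the shared memcache.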

  20. More Caching
     • Some caches also live outside the developer's immediate control
       • Browser cache
         • Single user
       • Proxy cache
         • Multiple users
       • Gateway cache
         • Distributed by Content Delivery Networks
     • HTTP 1.1 supports the "Cache-Control" header
       • Allows developers to control how things are cached
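
As an illustration of the header in use, here is a minimal WSGI sketch. The `app` function and the `/account` path are hypothetical; `public`, `private`, `max-age` and `no-store` are standard Cache-Control directives.

```python
def app(environ, start_response):
    """Minimal WSGI app that tells downstream caches how to treat each response."""
    if environ.get('PATH_INFO', '/').startswith('/account'):
        # Per-user page: shared caches must not keep it, and nothing should store it.
        cache_control = 'private, no-store'
    else:
        # Public front page: browsers, proxies and CDNs may reuse it for 60 seconds.
        cache_control = 'public, max-age=60'
    headers = [
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Cache-Control', cache_control),
    ]
    start_response('200 OK', headers)
    return [b'<h1>Top stuff</h1>']
```

It can be served locally with `wsgiref.simple_server.make_server('', 8000, app).serve_forever()` and the header inspected in the browser's developer tools.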
