Caching

Caching: Willem Visser, RW334

Overview
• AppEngine Datastore
• No Caching
• Naïve Caching
• Cache invalidation
• Cache updating
• Memcached
• Beyond your code

AppEngine Python Datastore
• Datastore
  • db
    • Old and will be going away at some point
  • ndb (https://developers.google.com/appengine/docs/python/ndb/)
    • New and supports some cool features

    from google.appengine.ext import ndb

    class Stuff(ndb.Model):
        title = ndb.StringProperty(required=True)
        content = ndb.StringProperty(required=True)
        date = ndb.DateTimeProperty(auto_now_add=True)

NDB
• Python class defines the model
• Each entity has a key, which in turn has a parent, up to the root that has no parent
• Entities in this chain are in the same group
• Entities in the same group have consistency guarantees

    stuff_title = self.request.get('stuff_name')
    # All Stuff entities under the same "Things" key share one entity group
    stuff = Stuff(parent=ndb.Key("Things", stuff_title or "*notitle*"),
                  title=stuff_title,  # the model declares title as required
                  content=self.request.get('content'))
    stuff.put()

NDB (2)
• Queries and Indexes
  • There are very many ways to query
  • Complex queries might need complex indexes
  • NDB creates simple indexes automatically
  • Complex ones can be defined in index.yaml
• GQL is similar to SQL
  • A query only gets executed when its results are accessed

    # ndb.gql() only builds the query; nothing is executed yet
    stuff = ndb.gql("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
    # Iterating over the query (here via list()) runs it
    stuff = list(stuff)

No Caching
• Every db_read hits the database
• Database reads tend not to be the fastest thing
• This can therefore be very inefficient

Example: No Caching

    from google.appengine.ext import db

    def top_stuff():
        # Runs a datastore query on every call
        stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
        return list(stuff)

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            self.redirect("/")

Naïve Caching

    if request not in cache:
        cache[request] = db_read()
    return cache[request]

• This will do wonders for performance
• If the cache grows too large, it might start to slow down a bit
• Above, the db_read is avoided, but the rendered HTML could also be cached if rendering takes a lot of time
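
The pattern above can be tried out without a datastore. The following is a minimal sketch in which `db_read`, `cached_read`, and the `STATS` counter are hypothetical stand-ins, not part of AppEngine; it only shows that repeated requests for the same key reach the "database" once.

```python
STATS = {"db_reads": 0}   # counts how often the simulated database is hit
cache = {}                # request -> cached result


def db_read(request):
    """Hypothetical stand-in for a slow datastore query."""
    STATS["db_reads"] += 1
    return "result for %s" % request


def cached_read(request):
    # Only go to the database on a cache miss
    if request not in cache:
        cache[request] = db_read(request)
    return cache[request]


for _ in range(100):
    cached_read("/front")
print("database reads:", STATS["db_reads"])
```

Testing for membership (`request not in cache`) rather than truthiness means that an empty result is cached too, so an empty page does not trigger a query on every request.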


Example

    import logging

    CACHE = {}

    def top_stuff():
        key = 'top'
        # .get() returns None on a miss; CACHE[key] would raise KeyError
        stuff = CACHE.get(key)
        if not stuff:
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            self.redirect("/")

New data?
• Will the previous solution work?
• What happens if you add new data?
  • It is added to the DB and then we redirect to /
  • render_front calls top_stuff
  • However, the cache is hit and we get the old data
• The cache must be invalidated when new data comes in
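
The stale read described above can be reproduced with plain Python. This is a sketch under stated assumptions: `DB` is a list standing in for the datastore, and `post` and `post_and_invalidate` are hypothetical helpers that mimic the handler's `post` method without and with invalidation.

```python
DB = ["first post"]   # hypothetical stand-in for the datastore, oldest first
CACHE = {}


def top_stuff():
    stuff = CACHE.get("top")
    if stuff is None:
        stuff = list(reversed(DB))[:10]   # ten newest items, newest first
        CACHE["top"] = stuff
    return stuff


def post(title):
    # The write reaches the DB, but the cache is left untouched
    DB.append(title)


def post_and_invalidate(title):
    DB.append(title)
    CACHE.clear()   # the next read falls through to the DB


top_stuff()                      # fills the cache
post("second post")
print(top_stuff())               # still the old list: the cache is stale
post_and_invalidate("third post")
print(top_stuff())               # now includes both new posts
```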

Clear Cache

    CACHE = {}

    def top_stuff():
        …

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            # Invalidate: the next top_stuff() call re-reads from the DB
            CACHE.clear()
            self.redirect("/")

Cache Stampede
• One user writes new data
• The cache gets cleared
• Now lots of users all access the site at the same time
• All of them do db_reads since the cache is empty
• This hammers the DB and slows everybody down
• Depending on its settings, the DB might also block or even crash
• Without any caching this could also happen
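
A stampede can be simulated with threads. The sketch below is illustrative only: `db_read` and `simulate_stampede` are hypothetical names, and a `threading.Barrier` is used to force the worst case deliberately, where every user checks the empty cache before any of them has refilled it.

```python
import threading

STATS = {"db_reads": 0}
STATS_LOCK = threading.Lock()
CACHE = {}


def db_read():
    """Hypothetical stand-in for the expensive datastore query."""
    with STATS_LOCK:
        STATS["db_reads"] += 1
    return ["newest", "older"]


def simulate_stampede(num_users):
    """Return how many DB reads happen when num_users hit an empty cache."""
    CACHE.clear()                  # a write has just invalidated the cache
    STATS["db_reads"] = 0
    all_checked = threading.Barrier(num_users)

    def handle_request():
        stuff = CACHE.get("top")   # every user sees a miss
        all_checked.wait()         # nobody refills until everyone has checked
        if stuff is None:
            CACHE["top"] = db_read()   # so every user queries the DB

    threads = [threading.Thread(target=handle_request)
               for _ in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return STATS["db_reads"]


print(simulate_stampede(20))
```

In a real deployment the overlap depends on timing rather than on a barrier, but the effect is the same: one invalidation can turn into many simultaneous queries.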

Cache Refresh

    def top_stuff(update = False):
        key = 'top'
        # .get() returns None on a miss; CACHE[key] would raise KeyError
        stuff = CACHE.get(key)
        if not stuff or update:
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            # Refresh the cache once here instead of clearing it
            top_stuff(True)
            self.redirect("/")

Cache Update
• Most aggressive solution
• No DB reads!
• On new data, store it in the DB and also directly in the cache, without reading from the DB
• The DB is now just backup storage in case something goes wrong, such as a server going down
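
One way to picture the cache update strategy is the sketch below. It is not the course's implementation: `DB`, `STATS`, and `add_stuff` are hypothetical names, and the only DB read left is the one that warms a cold cache.

```python
DB = []                    # hypothetical stand-in for the datastore, oldest first
CACHE = {}
STATS = {"db_reads": 0}


def top_stuff():
    stuff = CACHE.get("top")
    if stuff is None:              # only happens while the cache is cold
        STATS["db_reads"] += 1
        stuff = DB[::-1][:10]      # ten newest items, newest first
        CACHE["top"] = stuff
    return stuff


def add_stuff(item):
    current = top_stuff()          # a cache hit once the cache is warm
    DB.append(item)                # the DB remains the durable copy
    # Put the new item at the front of the cached list and keep ten items
    CACHE["top"] = ([item] + current)[:10]


for i in range(12):
    add_stuff(i)
print(top_stuff(), STATS["db_reads"])
```

The cached list is read before the item is appended to `DB`, so a cold cache cannot end up with the new item twice.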

Cache Comparisons

Sharing a Cache
• What if we have more than one server?
• Do we have a cache for each server, or share a cache amongst servers?
• A cache for each server can behave suboptimally if the caches are not synchronized
  • Data might be in the cache on server 1 and not on server 2, for example
• A good solution is to use a very fast shared cache
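
The difference between per-server caches and a shared cache can be sketched as follows. The `Server` class and `demo` function are hypothetical and use plain dictionaries for both the caches and the database.

```python
class Server:
    """A server that reads through, and writes to, the cache it was given."""

    def __init__(self, name, cache):
        self.name = name
        self.cache = cache

    def read(self, key, db):
        if key not in self.cache:
            self.cache[key] = db[key]
        return self.cache[key]

    def write(self, key, value, db):
        db[key] = value
        self.cache[key] = value   # updates only the cache this server can see


def demo(shared):
    db = {"top": "old"}
    if shared:
        one_cache = {}
        caches = (one_cache, one_cache)   # both servers use the same cache
    else:
        caches = ({}, {})                 # each server has its own cache
    s1 = Server("server1", caches[0])
    s2 = Server("server2", caches[1])
    s2.read("top", db)            # server 2 caches the old value
    s1.write("top", "new", db)    # a write arrives at server 1
    return s1.read("top", db), s2.read("top", db)


print(demo(shared=False))   # server 2 still serves the old value
print(demo(shared=True))    # both servers see the new value
```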

Memcached
• See http://memcached.org/
• Very fast, in-memory, key-value store
• Caching technology behind very many websites
• Support for it within AppEngine

    from google.appengine.api import memcache
    …

    def top_stuff(update = False):
        key = 'top'
        stuff = memcache.get(key)   # None on a miss
        if update or not stuff:
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            memcache.set(key, stuff)
        return stuff

NDB and Caching
• Two caches controlled by policies
• In-context cache (microseconds)
  • Only lives for the current HTTP request
  • Writes go to the datastore and the cache; reads check the cache first
• Memcache (milliseconds)
  • All nontransactional contexts cache here
  • All contexts share the same memcache
  • Within a transaction memcache is not used
• Can be configured by policies
  • Some standard ones are available
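
The lookup order described above can be modelled in a few lines. This is a simplified sketch, not NDB's actual implementation: `RequestContext`, `MEMCACHE`, `DATASTORE`, and `LOOKUPS` are hypothetical names, and transactions and policies are left out.

```python
DATASTORE = {"k1": "entity-1"}   # hypothetical stand-in for the datastore
MEMCACHE = {}                    # shared by all request contexts
LOOKUPS = []                     # records where each read was served from


class RequestContext:
    """Holds the in-context cache, which lives for one request only."""

    def __init__(self):
        self.cache = {}

    def get(self, key):
        if key in self.cache:            # fastest: in-context cache
            LOOKUPS.append("context")
            return self.cache[key]
        if key in MEMCACHE:              # next: shared memcache
            value = MEMCACHE[key]
            LOOKUPS.append("memcache")
        else:                            # slowest: the datastore itself
            value = DATASTORE[key]
            MEMCACHE[key] = value
            LOOKUPS.append("datastore")
        self.cache[key] = value
        return value


first_request = RequestContext()
first_request.get("k1")     # served from the datastore
first_request.get("k1")     # served from the in-context cache
second_request = RequestContext()
second_request.get("k1")    # new context, served from memcache
print(LOOKUPS)
```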

More Caching
• Some caches also live outside the developer's immediate control
  • Browser cache
    • Single user
  • Proxy cache
    • Multiple users
  • Gateway cache
    • Distributed by Content Delivery Networks
• HTTP 1.1 supports the "Cache-Control" header
  • Allows developers to control how things are cached
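
As a rough illustration of the Cache-Control header, the helper below builds a header value from a few standard HTTP/1.1 directives (`public`, `private`, `max-age`, `no-store`). The function name `cache_control` and its parameters are hypothetical; in a request handler the resulting string would typically be assigned to the response's headers.

```python
def cache_control(max_age=None, public=False, no_store=False):
    """Build a Cache-Control header value from a few common directives."""
    if no_store:
        return "no-store"   # no cache may keep a copy of the response
    # "public" allows shared (proxy, gateway) caches; "private" only the browser
    parts = ["public" if public else "private"]
    if max_age is not None:
        parts.append("max-age=%d" % max_age)   # freshness lifetime in seconds
    return ", ".join(parts)


# A page that any cache may reuse for an hour
headers = {"Cache-Control": cache_control(max_age=3600, public=True)}
print(headers)
```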
