
SilverLining: Scaling Infrastructure for Biodiversity Web Apps

SilverLining is a project on scaling infrastructure for biodiversity web apps, with the goal of improving reliability and user experience. In the project budgets it reviews, infrastructure takes a large share of funding (for example, 30.6% of direct costs against 12.6% for core application development). The project explores Platform as a Service (PaaS), specifically Google App Engine, and optimizes for resource efficiency and search performance. Projected annual operating costs for the vertebrate networks fall from USD $195,600 on the current architecture to USD $19,540 on App Engine.



Presentation Transcript


  1. SilverLining

  2. Stuff we're covering • Hardware infrastructure and scaling • Cloud platform as a service  • The SilverLining Project

  3. Some context • We work at a university • Funding based on projects • Biodiversity web apps and APIs • Focus on software (not hardware)

  4. Infrastructure • Applications depend on infrastructure • Infrastructure that "just works" is expensive • More money for infrastructure means less money for application development • Degenerates without long-term funding • Unreliability is bad for applications  • Increasingly bad user experience over time

  5. $1.6M USD total budget to 17 institutions • $245k  USD (30.6% of direct costs) for infrastructure

  6. $1.6M USD total budget to 17 institutions • $245k  USD (30.6% of direct costs) for infrastructure • $100k USD (12.6% of direct costs) for core application development • DiGIR provider, DiGIR portal

  7. MaNIS, ORNIS, HerpNet, FishNet  • $7.6M USD combined budgets, 71 institutions • $196k USD annual operating cost

  8. MaNIS, ORNIS, HerpNet, FishNet  • $7.6M USD combined budgets, 71 institutions • $196k USD annual operating cost • $179k USD (92%) for infrastructure

  9. Infrastructure as a Problem (IaaP)

  10. Infrastructure as a Problem (IaaP) • Unsustainable • Creates a barrier to innovation • And this is all before scaling comes into play!

  11. Scalability "The ability for infrastructure  to reliably handle heavy request loads in a high performance way."

  12. IaaP at scale 

  13. Scaling up • Scale up vertically with a server upgrade  • Scale out horizontally with more servers

  14. Scaling up

  15. Scaling DiGIR networks: MaNIS, ORNIS, HerpNet, FishNet • ~85 million records • ~100 servers

  16. Scaling DiGIR networks: MaNIS, ORNIS, HerpNet, FishNet • ~85 million records • ~100 servers

  17. Query: All records with a point

  18. Response: Error: IO problem

  19. "Scaling is hard." - Alex Payne

  20. "Scaling is hard." - Alex Payne al3x.net/2010/07/27/node.html

  21. Scaling in the small • Handling dozens of requests per second • Scaling up vertically is sufficient • Performance improvements are software related al3x.net/2010/07/27/node.html

  22. Scaling in the large • Billions of requests per week (Google) • Millions of active users (Facebook) • Data centers worldwide with millions of servers al3x.net/2010/07/27/node.html

  23. Are we scaling large or small? • GBIF ~220 million records • eBird ~2 million new records per month • Undigitized collections ~2.5 billion records 

  24. Scaling in the "small-ish" • We're at the brink! • IaaP is in the way, scaling is making it worse • Where's the silver lining in all of this?

  25. Platform as a Service (PaaS) en.wikipedia.org/wiki/Platform_as_a_service Conceptually quite simple: • Computing power over the Internet • No servers to maintain • Pay for use • Scales large (even if your application is small) • Provided by companies such as Amazon, Microsoft, Google

  26. SilverLining silver-lining.googlecode.com • Experiments, metrics, prototypes (not products) • Picked Google App Engine • PaaS with biodiversity data • Simple Darwin Core • Bulk loading, storage • MapReduce - indexes, validation, statistics • Optimize for resource efficiency, search performance

  27. Cost comparison Total annual operating costs of vertebrate networks: • Current architecture: USD $195,600 • Projected App Engine: USD $19,540 

  28. Cost comparison Total annual operating costs of vertebrate networks: • Current architecture: USD $195,600 • Projected App Engine: USD $19,540  Total cost for SilverLining work to date: • 50 cents
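
The size of the projected saving can be worked out directly from the two figures on the slide. The short sketch below only restates that arithmetic; the variable names are illustrative and not from the presentation.

```python
# Annual operating costs of the vertebrate networks, in USD (from the slide).
current_cost = 195_600      # current architecture
projected_cost = 19_540     # projected on App Engine

# Absolute and relative savings implied by those two numbers.
savings = current_cost - projected_cost
reduction_pct = savings / current_cost * 100
cost_ratio = current_cost / projected_cost

print(f"Savings per year: USD ${savings:,}")
print(f"Reduction: {reduction_pct:.1f}%")
print(f"Current cost is about {cost_ratio:.1f}x the projected cost")
```

In other words, the projection is roughly a tenfold reduction in annual operating cost.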

  29. App Engine code.google.com/appengine • Develop scalable web apps on Google's infrastructure • No servers or hardware to maintain and free quotas • Standards based Java and Python SDKs • IDE support for Eclipse, NetBeans, IntelliJ • Local development server • Integrated support for unit testing

  30. App Engine constraints • Practical constraints for performance and scalability • The datastore is not a relational database  • Query can only use inequality filters on 1 property • Fails: year >= 1980 and year <= 1982 and elevation > 10 • Solution: Set membership queries

  31. Set membership queries • Before: year >= 1980 and year <= 1982 and elevation > 10 • After: year "within 1 year" of 1981 and elevation > 10 • List for "within 1 year" of 1980: [1979, 1980, 1981]
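
The set membership idea can be sketched in plain Python, without the App Engine SDK. The sketch assumes that each record stores a precomputed list of the years within one year of its own (so a record from 1980 stores [1979, 1980, 1981]), and that the year range is then expressed as a membership test on that list, leaving elevation as the only inequality filter. The record layout, the `year_within_1` property name, and the helper functions are hypothetical and only illustrate the technique.

```python
def years_within(year, radius=1):
    """Years within `radius` of `year`; stored on the record at load time."""
    return list(range(year - radius, year + radius + 1))


# Hypothetical sample records; only year and elevation matter here.
records = [
    {"id": "a", "year": 1979, "elevation": 50},
    {"id": "b", "year": 1980, "elevation": 5},
    {"id": "c", "year": 1981, "elevation": 120},
    {"id": "d", "year": 1982, "elevation": 30},
    {"id": "e", "year": 1983, "elevation": 200},
]

# Precompute the list property once, when the record is written.
for record in records:
    record["year_within_1"] = years_within(record["year"])


def query(records, target_year, min_elevation):
    """Membership test on the year list plus a single inequality on elevation."""
    return [
        r["id"]
        for r in records
        if target_year in r["year_within_1"] and r["elevation"] > min_elevation
    ]


# Equivalent to: year >= 1980 and year <= 1982 and elevation > 10
matches = query(records, 1981, 10)
print(matches)  # ['c', 'd']
```

The trade-off is extra storage and write-time work per record in exchange for a query that uses an inequality filter on only one property.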

  32. Aggregation and synchronization code.google.com/p/pubsubhubbub code.google.com/apis/feed/push • Fast aggregation via API • Subscribe to changes at the source • Changes pushed automatically
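
As a rough illustration of "subscribe to changes at the source", the sketch below builds the form-encoded body of a PubSubHubbub subscription request, which a subscriber would POST to a hub. It does not send anything over the network. The `hub.mode`, `hub.topic`, and `hub.callback` parameters come from the PubSubHubbub specification; the feed and callback URLs and the function name are made up for the example, and depending on the specification version a hub may require further parameters.

```python
from urllib.parse import parse_qs, urlencode


def build_subscription_body(topic_url, callback_url, mode="subscribe"):
    """Form-encode the core parameters of a PubSubHubbub subscription request."""
    params = {
        "hub.mode": mode,              # "subscribe" or "unsubscribe"
        "hub.topic": topic_url,        # feed at the source to watch for changes
        "hub.callback": callback_url,  # where the hub pushes new content
    }
    return urlencode(params)


# Hypothetical URLs, for illustration only.
body = build_subscription_body(
    "https://example.org/records/feed.atom",
    "https://example.org/aggregator/callback",
)
decoded = parse_qs(body)
print(decoded["hub.mode"], decoded["hub.topic"])
```

Once a subscription like this is confirmed, the hub pushes changes to the callback URL, so the aggregator does not need to poll each source.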

  33. What's the end game? • PaaS instead of IaaP  • SaaS (software as a solution) • BaaS (biodiversity applications at scale) Aaron Steele asteele@berkeley.edu John Wieczorek tuco@berkeley.edu

  34. What's the end game? • PaaS instead of IaaP  • SaaS (software as a solution) • BaaS (biodiversity applications at scale) Any QaaC? (Questions as a challenge) Aaron Steele asteele@berkeley.edu John Wieczorek tuco@berkeley.edu
