This article explores the critical technical challenges encountered by Faz.net between 2000 and 2009. It covers essential topics such as business planning from a programmer's perspective, system design for scalability, and maintaining uptime. It highlights the site's performance fluctuations, notably in response to major events, and critiques various measurement techniques and logging strategies used to monitor system health. Additionally, it discusses implementation languages and evolving strategies for handling increased traffic and system demands, providing insight into the website's architecture.
what i won’t show • business numbers, € • business plans
what to do (from the point of view of a lowly programmer) • invent a business plan • define, plan, and implement the desired functionality • build a robust, secure, scalable system • keep that system up 100% of the time • allow changes to support new business initiatives
a brief history • launched 8 Jan 2001 • “stable” by Spring of 2001 • unstable on 11 Sep 2001 • better able to handle (many classes of) huge load increases by Spring 2002
how big is faz.net? • much smaller than google • smaller than spiegel • comparable to other newspapers
a sketch of our system layout (a couple years old) • load balancer • web servers • DB and file servers • other application servers • client machines, e.g. newsroom • external partners, e.g. freemail provider
an ancient attempt to show how changing an article affects the site • editor publishes a new article (or a new version of an existing article) • a DB trigger fires • cached DB result sets get updated • cached HTML gets updated or deleted • (we do things differently now)
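A rough C# sketch of that publish-then-invalidate chain. The slides only describe the flow, so every name here (OnArticlePublished, the cache keys, the URL pattern) is an illustrative placeholder, not faz.net code:

    using System;
    using System.Collections.Generic;

    // Illustrative sketch of the publish-then-invalidate chain described above.
    class PublishPipeline
    {
        // cached DB result sets, keyed by query; cached rendered HTML, keyed by URL
        static readonly Dictionary<string, object> resultSetCache = new Dictionary<string, object>();
        static readonly Dictionary<string, string> htmlCache = new Dictionary<string, string>();

        // called, e.g. on a notification originating from the DB trigger,
        // when an editor publishes a new article or a new version of one
        static void OnArticlePublished(string articleId)
        {
            // refresh or drop the cached result sets that contain this article
            resultSetCache.Remove("teasers:homepage");
            resultSetCache.Remove("article:" + articleId);

            // drop the cached HTML so the next request re-renders the page
            htmlCache.Remove("/aktuell/" + articleId + ".html");
            Console.WriteLine("invalidated caches for article " + articleId);
        }

        static void Main()
        {
            OnArticlePublished("1234567");
        }
    }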
how to measure what is happening / whether things are ok • measure throughput at / between various points
(partial) network traffic over time • MRTG (http://oss.oetiker.ch/mrtg/) • Standard, free (Gnu) • Useful at a glance info
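For orientation, an MRTG target for one web-server interface is configured roughly like this. This is a hedged fragment with placeholder hostnames and SNMP community; the real faz.net configuration is not part of the slides:

    WorkDir: /var/www/mrtg
    # one target per monitored interface; MRTG polls the counters via SNMP every 5 minutes
    Target[webserver1]: 2:public@webserver1.example.com
    MaxBytes[webserver1]: 12500000
    Title[webserver1]: Traffic on webserver1, interface 2
    PageTop[webserver1]: <h1>Traffic on webserver1, interface 2</h1>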
how to measure what is happening / whether things are ok • measure throughput at / between various points • measure, e.g., cpu load on the web servers (NB: load is pretty low; the maximum would be 12 × 100, i.e. twelve web servers at 100% each)
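A minimal C# sketch of sampling that CPU load on one web server with a standard Windows performance counter (how the numbers were actually collected is not shown in the slides; this assumes .NET on Windows):

    using System;
    using System.Diagnostics;
    using System.Threading;

    // Samples total CPU usage once per second. Summing one such value per
    // web server gives the "number of servers x 100" ceiling mentioned above.
    class CpuSampler
    {
        static void Main()
        {
            PerformanceCounter cpu =
                new PerformanceCounter("Processor", "% Processor Time", "_Total");
            cpu.NextValue();               // the first reading is always 0, so prime the counter
            for (int i = 0; i < 10; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine("cpu: {0:F1} %", cpu.NextValue());
            }
        }
    }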
some errors can be seen by any user with a browser who happens to run into them • server errors • wrong content • broken HTML, images, or …
digested log files can be helpful • web sites tend to have many log files, several kinds of log files; and some kinds can be *huge* • they tend to be straight ascii files in some format you might have little control over • various kinds of statistics might sometimes interest you • in particular error statistics
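A minimal C# sketch of digesting such a log into error statistics: it streams the file line by line (so file size does not matter) and counts HTTP status codes. The field index is a placeholder, since the real layout is whatever format the server happens to write:

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Streams through a (possibly huge) plain-ASCII access log and counts
    // HTTP status codes, e.g. to spot a rise in 500s.
    class LogDigest
    {
        static void Main(string[] args)
        {
            const int statusField = 10;   // placeholder: index of the status-code field in this log format
            Dictionary<string, int> counts = new Dictionary<string, int>();

            using (StreamReader reader = new StreamReader(args[0]))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (line.StartsWith("#")) continue;      // skip W3C-style header lines
                    string[] fields = line.Split(' ');
                    if (fields.Length <= statusField) continue;
                    string status = fields[statusField];
                    int n;
                    counts.TryGetValue(status, out n);
                    counts[status] = n + 1;
                }
            }
            foreach (KeyValuePair<string, int> kv in counts)
                Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
        }
    }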
external, hired monitoring service • hire someone outside your site to watch certain pages on your site (load them periodically) and keep statistics about timing, sizes, errors • ideally get regular reports showing everything is groovy • support “drilling” to get more details when necessary
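The service itself is external, but the core of such a check is simple. A hedged C# sketch that loads one page and records status, size, and elapsed time; the URL is only an example, and a real monitor would run this periodically and also catch WebException for error responses:

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Net;

    // Loads a page the way an external monitor would: note status, size, elapsed time.
    class PageProbe
    {
        static void Main()
        {
            string url = "http://www.faz.net/";              // the monitored page (example)
            Stopwatch watch = Stopwatch.StartNew();
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string body = reader.ReadToEnd();
                watch.Stop();
                Console.WriteLine("{0} status {1}, {2} chars, {3} ms",
                    url, (int)response.StatusCode, body.Length, watch.ElapsedMilliseconds);
            }
        }
    }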
What does this mean? (irregular load distribution, strange peaks)
somewhat subtle: unexpected cpu load spikes on utility machines
a little bit about development methods • requirements gathering (quality varies) • planning • implementation
our implementation languages • browser • html (+ "furniture graphics") • javascript • css (fairly recently) • public javascript libraries (fairly recently): MooTools, jQuery
One view of a web application pattern (from Application Architecture Guide 2.0: Designing Applications on the .NET Platform)
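That guide describes a layered pattern of presentation, business, and data access layers; a minimal C# sketch of that separation (the class names are illustrative, not taken from the slides):

    using System;

    // data access layer: the only place that talks to the database
    class ArticleRepository
    {
        public string LoadTitle(int id) { return "title of article " + id; }   // stand-in for a DB call
    }

    // business layer: rules and orchestration, no UI and no SQL
    class ArticleService
    {
        readonly ArticleRepository repository = new ArticleRepository();
        public string GetHeadline(int id) { return repository.LoadTitle(id).ToUpper(); }
    }

    // presentation layer: the ASP.NET page or template would call into the service
    class DemoPage
    {
        static void Main() { Console.WriteLine(new ArticleService().GetHeadline(42)); }
    }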
our implementation languages • webserver ("front end") • ASP (JScript, mainly 2000 – 2005) • DotNet 2 (2005 – 2007) • DotNet 3 (2007 – )