Add A Billion Row Data Warehouse To Your App

Add A Billion Row Data Warehouse To Your App with Redshift, sql and duct tape James Crisp, Tech Principal @ Getup

Context GetUp! is an independent movement to build a progressive Australia and bring participation back into our democracy. Lots of online campaigns, field actions, social media. Supported by small donations. 600K+ members. Budget, medicare, uni fees, barrier reef, forests etc.

Big Rails App • CMS + Petitions, emailing MPs, donations etc • Email blasting & segmenting • Back office & mini-crm • 2 X [3 app, 2 worker, 1 DB servers], Au and Sg

Data Warehouse… why?? • Reporting & exports • Experimental data science • Stop locking up transactional DB!! • More data sources (logs, CRM, ..) => customer • Different schema & faster queries

Options • Read-only replica of transactional DB • Mongo / Cassandra / .. • Hadoop, Pig, Hive • Elastic search • BIG Sql, eg Redshift

Why BIG Sql? • Team skills: tech & data scientists • Easy integration from SQL DB • Good hosted options • Fast performance, column based • Sets and aggregations • Can do JSON for less structured data

Why Redshift? • Fully hosted & managed multi-node • Fast & Column based, semi-compressed • Relatively cheap and easy to try • Good import options • Massively expandable • (Security & backup options)

What is Redshift… really? • Heavily modified fork of PostgreSQL 8 • Specialised data storage & query engine • Can use normal ODBC/JDBC/Postgres clients to connect

string connString = "Driver={PostgreSQL Unicode};" + String.Format("Server={0};Database={1};" + "UID={2};PWD={3};Port={4};SSL=true;Sslmode=Require", server, DBName, masterUsername, masterUserPassword, port); OdbcConnectionconn = new OdbcConnection(connString); conn.Open(); OdbcDataAdapterda = new OdbcDataAdapter(sql, conn); da.Fill(ds); dt = ds.Tables[0]; foreach (DataRow row in dt.Rows) { // Do something useful }

$DBConnectionString = "Driver={PostgreSQL UNICODE}:Server=$MyServer;Port=$MyPort;Database=$MyDB;Uid=$MyUid;Pwd=$MyPass;" $DBConn = New-Object System.Data.Odbc.OdbcConnection; $DBConn.ConnectionString = $DBConnectionString; $DBConn.Open(); $DBCmd = $DBConn.CreateCommand(); $DBCmd.CommandText = "SELECT * FROM mytable;"; $DBCmd.ExecuteReader(); $DBConn.Close();

psql.exe -h $DBSERVER -U $DBUSER -d $DBName-f script.sql

How much does it cost? • Min cluster size 2, leader is free • 2 X 2TB HDD, 15G RAM: Syd $2.50/h, US $1.70/h (reserved 1yr Syd $13.2K, US $8.8K) • Up to 2.56TB flash or 16TB HDD and 244GB RAM per node. Cluster up to 1.6 PB.

Data Sources so far… Transactional DB Application request logs

Hooking up DB Data DB Server Redshift MySQL S3 Map & Dump CSVs LOAD CSVs Fire Drop/Create tables, Load data

Hooking up Request Logs App Servers Fire data load Upload JSON logs S3 Redshift Map & Load JSON

Demos • Scripts & SQL for hooking up • AWS Redshift console • Connect & query

What have we used it for? • Faster data science / reporting without locking up transactional DB • Combining sharded tables • Request logs (browser agent, params, ..) • Marking/tracking segments + debugging info

Other uses • Data exploration with tools like Tableau • BI tools • Denormalised data • Dump in more data for single view of user – FB info, CRM, etc

Conclusion • Easy to set up and use (for Win/.NET too) • Super fast • Reasonable price • Met our needs well so far We are hiring atm!

Add A Billion Row Data Warehouse To Your App

Add A Billion Row Data Warehouse To Your App

Presentation Transcript

Building a Data Warehouse

Designing a Data Warehouse

Data Warehouse

Introduction to Data Warehouse

Designing a Data Warehouse

Row, row row your boat

Row, Row, Row Your Boat.

Populating a Data Warehouse

Intro. to Data Warehouse

How to design a DATA WAREHOUSE

How to add an app to your iphone

Planning a Data Warehouse

WHY A DATA WAREHOUSE?

Building a Data Warehouse

Row, Row, Row your boat

HOW TO ADD CAPTCHA TO YOUR ANDROID APP?

Add These Trending App Features To Your On-Demand Taxi App

warehouse mobile app

How to Add Plugins to your Photoshop App

UCPath Overview PPS Data Warehouse to UCPath Data Warehouse