The Project: Medium-Sized Demographics Data Mining Analysis

0 to 60 in 3.1 • Tyler Carlton • Cory Sessions

<Insert funny joke here>

The Project • Medium-sized demographics data mining project • 1,700,000+ user base • Hundreds of data points per user

“Legacy” System – Why Upgrade? • Main DB (external users) • Offline backup (internal users) • Weekly manual copy backups • Max of 3 simultaneous data pulls • 8+ hour pull times for complex data pulls • Random index corruption

Notes: Smaller is better. On average, CPU usage with MySQL was 20% lower than with our old database solution.

Why We Chose MySQL Cluster • Scalable • Distributed processing • Five-nines (99.999%) reliability • Instant data availability between internal & external users

What We Built – NDB Data Nodes • 8-node NDB cluster • Dual Core 2 Quad 1.8 GHz • 16 GB RAM (data memory) • 6x RAID 10 SAS 15k RPM drives

What We Built – API & MGMT Nodes • 3 API nodes + 1 management node • Dual Core 2 Quad 1.8 GHz • 8 GB RAM • 300 GB 7200 RPM (RAID 0)

NDB Issues with a Large Data Set • NDB load times • Loading from backup: ~1 hour • Restarting NDB nodes: ~1 hour • Note: Load times differ depending on your data size

NDB Issues with a Large Data Set • Indexing issues • Force index (NDB picks the wrong one) • Index creation/modification order matters (seriously!) • Local checkpoint tuning • TimeBetweenLocalCheckpoints: a value of 20 means 4 MB (4 × 2^20 bytes) of write operations • NoOfFragmentLogFiles: number of sets of 4 × 16 MB redo log files • None are deleted until 3 local checkpoints have completed • On startup: local checkpoint buffers would overrun • RTFM (two, maybe three times)
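The sizing arithmetic behind the two checkpoint settings on this slide can be sketched as follows. This is a minimal illustration, not our tuning script: the function names are made up for the example, and the value 8 passed for NoOfFragmentLogFiles is a hypothetical input, not the setting we ran with.

```python
# Sketch of the sizing arithmetic for two NDB settings named on the slide.
# Function names and the example value 8 are illustrative assumptions.

WORD_BYTES = 4        # TimeBetweenLocalCheckpoints counts 4-byte words
FILES_PER_SET = 4     # each NoOfFragmentLogFiles unit is a set of 4 files
FILE_SIZE_MB = 16     # each file in the set is 16 MB, per the slide


def lcp_write_threshold_bytes(time_between_local_checkpoints: int) -> int:
    """Bytes of write operations between local checkpoints: 4 * 2**value."""
    return WORD_BYTES * 2 ** time_between_local_checkpoints


def redo_log_size_mb(no_of_fragment_log_files: int) -> int:
    """Total redo log space in MB: sets * 4 files * 16 MB."""
    return no_of_fragment_log_files * FILES_PER_SET * FILE_SIZE_MB


# A value of 20 gives 4 * 2**20 bytes, i.e. 4 MB.
print(lcp_write_threshold_bytes(20) // 2**20, "MB")
# A hypothetical setting of 8 sets gives 8 * 4 * 16 MB.
print(redo_log_size_mb(8), "MB")
```

Because no redo log file is released until 3 local checkpoints have completed, the second number has to cover all writes that arrive across that window.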

NDB Network Issues • Network transport packet size • Buffer would fill and overrun • This caused nodes to miss their heartbeats and drop • This would happen when: • A backup was running • A local checkpoint was running at the same time • Solved by: increasing the network packet buffer

Issues – IN Statements • IN statements die with engine_condition_pushdown=ON when the set has approx. 10,000 or more values (in our case, zip codes) • We really need engine_condition_pushdown=ON, but this broke it for us, so… we had to disable it.
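We solved this by disabling engine_condition_pushdown. For comparison only, one alternative that was not what we deployed is to split a very large IN list into smaller batches on the application side. The sketch below assumes a hypothetical table `users` with a column `zip`, a batch size of 1,000 chosen arbitrarily, and the `%s` placeholder style used by common Python MySQL drivers.

```python
# Sketch: split one huge IN (...) list into several smaller queries.
# Table name, column name, and chunk size are hypothetical examples.
from typing import Iterator, List, Sequence, Tuple


def chunked_in_queries(
    table: str,
    column: str,
    values: Sequence[str],
    chunk_size: int = 1000,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (sql, params) pairs, each with at most chunk_size IN values."""
    for start in range(0, len(values), chunk_size):
        chunk = list(values[start:start + chunk_size])
        placeholders = ", ".join(["%s"] * len(chunk))
        sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
        yield sql, chunk


# 10,000 made-up zip codes become 10 queries of 1,000 values each.
zips = [f"{i:05d}" for i in range(10000)]
queries = list(chunked_in_queries("users", "zip", zips))
print(len(queries), "queries,", len(queries[0][1]), "values in the first")
```

The caller has to merge the result sets itself, so this only fits plain row lookups; aggregates or ORDER BY / LIMIT across the whole set would need extra work.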

Structuring Apps: Redundancy • Redundant power supplies + dual power sources • Port trunking w/ redundant Gig-E switches • NDB replicas: 2 (2x4 setup), 64 GB max data size • MySQL (API node) heads: load balanced with automatic failover
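The 64 GB figure follows from the node count and replica count. A minimal sketch of that arithmetic, assuming the full 16 GB per node is available as data memory (as the hardware slide states) and ignoring index memory and other overhead; the function name is illustrative:

```python
# Sketch: usable data size for an NDB cluster with replicated node groups.
# Assumes all per-node RAM listed on the hardware slide is data memory.

def usable_data_memory_gb(data_nodes: int, gb_per_node: int, replicas: int) -> int:
    """Each node group holds one slice of the data, copied to every replica."""
    if data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a multiple of replicas")
    node_groups = data_nodes // replicas
    return node_groups * gb_per_node


# 8 data nodes, 16 GB each, 2 replicas: 4 node groups x 16 GB.
print(usable_data_memory_gb(8, 16, 2), "GB")
```

Adding replicas buys redundancy at the cost of capacity: the same 8 nodes with 4 replicas would hold half as much data.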

Structuring Apps: Internal Apps • Ultimate goal: Offload the data-intensive processing to the MySQL nodes

The Good Stuff: Stats! Queries per Second (over 20 days) • Average 1,100-1,500 queries/sec during our peak times • Average 250 queries/sec overall

Website Traffic Stats for March 2008

Net Usage: NDB Node • All NDB data nodes have nearly identical network bandwidth usage • MySQL (API) nodes use about 9 MB/s max under our current structure • Totaling 75 MB/s during peak (600 Mb/s)

Monitoring & Maintenance • SNMP monitoring: CPU, network, memory, load, disk • Cron scripts: • Node status & node-down notification • Backups • Database maintenance routines • The MySQL Clustering book provided the base for the scripts
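A node-down check of the kind listed above can be sketched as a small parser over the management client's status listing. This is not our production script: the sample text only approximates what `ndb_mgm -e show` prints (the exact layout may differ between versions), and the IP addresses and version string in it are made up.

```python
# Sketch: find data nodes reported as "not connected" in a status listing.
# SAMPLE approximates `ndb_mgm -e show` output; addresses and versions are
# invented, and the real layout may vary by MySQL Cluster version.
import re
from typing import List

SAMPLE = """\
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.0.0.2  (Version: 5.1.23, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 10.0.0.3)
"""

NOT_CONNECTED = re.compile(r"^id=(\d+)\s+\(not connected")


def down_node_ids(show_output: str) -> List[int]:
    """Return the ids of every node whose status line says 'not connected'."""
    down = []
    for line in show_output.splitlines():
        match = NOT_CONNECTED.match(line.strip())
        if match:
            down.append(int(match.group(1)))
    return down


print(down_node_ids(SAMPLE))
```

In a cron job the text would come from running the management client instead of a constant, and a non-empty result would trigger the notification.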

Dolphin NIC Testing • 4-node test cluster • 4x overall performance • Brand-new patch to handle automatic Ethernet failover / Dolphin failover (beta as of March 28) • Net usage: next steps…

Questions?

Contact Information • Tyler Carlton www.qdial.com tcarlton@gmail.com • Cory Sessions CorySessions.com OrangeSoda.com
