
Grid Computing at The Hartford


Presentation Transcript


  1. Grid Computing at The Hartford Condor Week 2008 Robert Nordlund robert.nordlund@hartfordlife.com

  2. About The Hartford… • Headquartered in Hartford, CT • Founded in 1810 • Fortune 100 • 31,000 Employees Worldwide • $26.5 Billion Revenues • $2.9 Billion Core Earnings • $377.6 Billion Assets Under Management

  3. The Hartford’s Businesses • Property & Casualty • Auto, home, marine, workers compensation, etc. • Retail Investment Products • Variable and fixed annuities, mutual funds, 529 college savings plans • Retirement Plans • 401(k), 403(b), 457 • Institutional Financial Solutions • Individual Life Insurance • Group Benefits • International

  4. A Brief History (2003)… • Exponential growth in risk modeling activity exceeded our existing computing capabilities. • Grid technology was identified as a possible solution. • Condor was selected over other commercial solutions. • Mature • Windows Support • Simple, Scalable, and Flexible • Active Community • Free

  5. Our Grid Environment… • In Production Since 2004 • Two Pools (Production, Test) • Dedicated and Non-dedicated Execute Nodes • ~1000 Two-socket, multi-core x86 servers • ~1000 desktops, notebooks • Linux Central Managers • Linux and Windows Job Schedulers • Windows Execute Nodes • Web-based Administration and User Console
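
A rough sketch of how the roles in a pool like this are typically expressed in Condor configuration. The role split follows standard Condor practice; the hostname and the idle thresholds below are illustrative, not The Hartford's actual settings.

    ## Central manager (Linux): runs the collector and negotiator.
    CONDOR_HOST = condor-cm.example.com
    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

    ## Job scheduler (Linux or Windows): runs a schedd that users submit to.
    # DAEMON_LIST = MASTER, SCHEDD

    ## Dedicated execute node (Windows): always willing to run jobs.
    # DAEMON_LIST = MASTER, STARTD
    # START   = True
    # SUSPEND = False
    # PREEMPT = False

    ## Non-dedicated desktop or notebook (cycle scavenging): only run jobs
    ## when the machine has been idle for a while, and back off when the
    ## owner returns.
    # START   = KeyboardIdle > 15 * 60 && LoadAvg < 0.3
    # SUSPEND = KeyboardIdle < 60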

  6. Our Workload… • Hedging • Risk Management • Portfolio Pricing • Product Development • Off-the-shelf Software • In-house Software • Embarrassingly Parallel
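
Because the workload is embarrassingly parallel, the natural Condor pattern is a single submit description file that queues many independent jobs and uses the built-in $(Process) index to split the work. The executable, file names, and job count below are hypothetical, for illustration only.

    # Hypothetical submit file: 1000 independent jobs, each pricing one
    # slice of the portfolio chosen by its $(Process) index (0..999).
    universe    = vanilla
    executable  = price_portfolio.exe
    arguments   = --slice $(Process) --total 1000
    output      = slice_$(Process).out
    error       = slice_$(Process).err
    log         = pricing_run.log
    queue 1000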

  7. Typical Utilization

  8. Technical Challenges • Scaling – Rapid expansion of grid computing puts tremendous strain on operations (power, cooling, networking, floor space, etc.). • DR/BCP – A “cold spare” is not an option when the system is over 1000 servers. • Testing – An isolated, equivalent test environment is not an option (see above). Predictive modeling is necessary to simulate the environment at scale. • Storage – Traditional storage options are limited in both capacity and throughput. • Application Development – Developers need to be educated on writing “grid-friendly”, high-performance applications.
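
On the storage and application-development points, one common mitigation in Condor is to let jobs carry their data with them via the schedd's file transfer rather than hitting shared storage from every execute node. A hedged submit-file fragment; the file names are made up.

    # Ship inputs with the job so thousands of concurrent execute nodes
    # read local copies instead of pounding a central file server.
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = model_config.xml, market_data.bin
    # Files the job creates in its scratch directory are transferred back
    # to the submit machine when the job exits.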

  9. Non-Technical Challenges • Policies – Effective and fair resource management policies need to be developed in cooperation with the users. Transparency is key to maintaining good relationships between user groups and between the users and IT. • Expectation Management – Users need to know what to expect in a shared grid environment. • Variable Capacity • Allocations vs. Named Servers • Procurement – Vendors and internal purchasing departments aren’t typically accustomed to ordering hundreds of servers at a time. • Finance – Traditional charge-back mechanisms ($/Server) don’t translate well to a grid environment.
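
"Allocations vs. Named Servers" maps naturally onto Condor's accounting-group quotas, configured on the central manager. The group names and numbers below are purely illustrative.

    # Carve the pool into per-business-unit allocations instead of
    # assigning named servers to each group.
    GROUP_NAMES = group_hedging, group_pricing, group_productdev
    GROUP_QUOTA_group_hedging    = 500
    GROUP_QUOTA_group_pricing    = 300
    GROUP_QUOTA_group_productdev = 200
    # Let a group borrow idle capacity beyond its quota when others are quiet.
    GROUP_AUTOREGROUP = True

    # Users then tag their jobs in the submit file, e.g.:
    #   +AccountingGroup = "group_hedging.rnordlund"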

  10. Growth Opportunities • Non-HTC (High-Throughput Computing) Workloads – Use grid resources to dynamically provision capacity for web services or other transactional business applications. • Virtualization – Leverage grid resource management capabilities to orchestrate virtualized resources. • More Scavenging – Continue to exploit underutilized resources throughout the enterprise to increase compute capacity. • External Capacity – Incorporate external resources (e.g., cloud computing, utility computing) to handle planned and unplanned peaks.

  11. What’s new with Condor… • De-coupled Job Submission • Users submit jobs to a database • Middleware feeds jobs to schedulers • Dynamic Preemption Policies • Need to prevent long-running jobs from being preempted • Jobs should update their ClassAds to indicate progress (see the sketch below)
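
One way to realize "jobs advertise progress" is to have the running job update its own ClassAd with condor_chirp and to have the startd's preemption policy consult that attribute. The attribute name PercentDone, the thresholds, and the retirement time are hypothetical; condor_chirp generally requires +WantIOProxy = True in the submit file.

    # Inside the running job (the attribute name is hypothetical):
    #   condor_chirp set_job_attr PercentDone 75

    # Execute-node (startd) policy sketch: avoid preempting jobs that
    # report being nearly finished.
    PREEMPT = (KeyboardIdle < 60) && \
              (TARGET.PercentDone =?= UNDEFINED || TARGET.PercentDone < 90)
    # Give preempted jobs a grace period (in seconds) to finish cleanly.
    MAXJOBRETIREMENTTIME = 7200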

  12. What’s new with our infrastructure… • Multiple Data Centers • One or two pools? • If two pools, how do we optimize utilization? • Clustered accountant? • More cores per socket • Increased server counts
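
If the answer turns out to be two pools (one per data center), the standard Condor mechanism for balancing load between them is flocking: a schedd that cannot match its jobs locally sends them to the other pool. A sketch with made-up hostnames; the exact security knobs depend on the pool's existing HOSTALLOW/ALLOW setup.

    # On the schedd (submit) machines in data center A: if jobs can't be
    # matched locally, let them flock to the pool in data center B.
    FLOCK_TO = condor-cm-b.example.com

    # On the machines in data center B: grant write access to the remote
    # schedds so the flocked jobs can run there.
    HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), *.dca.example.com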

  13. Conclusion • Grid has been a transformational technology, giving users access to capabilities they wouldn’t have envisioned and now can’t live without. • Grid computing is an integral part of our business and gives the company a stable, scalable platform to model uncertainty. • Condor has proven to be an invaluable asset and has time and again handled whatever challenge we’ve thrown at it. • Grid isn’t dead – it’s just middle-aged.
