Enhancing Catastrophic Risk Analysis with IBM Puredata for Analytics


## Agenda

• Leveraging IBM Puredata in catastrophic risk analysis
• IBM Puredata success stories in catastrophic risk analysis
• IBM Puredata in-database analytics
• IBM Puredata User-Defined Extensions (UDX)
• Migration of a catastrophic risk application to IBM Puredata

## IBM Big Data Platform

• Hadoop (NoSQL): InfoSphere BigInsights, Hadoop-based low-latency analytics for data variety and volume
• Information integration: InfoSphere Information Server, high-volume data integration and transformation
• Stream computing: InfoSphere Streams, low-latency analytics for streaming data
• MPP data appliances:
  • IBM InfoSphere Warehouse: large-volume structured-data analytics
  • IBM Puredata 2000: BI and ad hoc analytics on structured data
  • IBM Smart Analytics System: operational analytics on structured data

## On-Demand Catastrophe Risk Analysis with IBM Puredata for Analytics

## Who Is Interested in Catastrophe Risk Models?

• Insurers: managing their exposure and filing for rates
• Brokers: assessing risk-management strategies for clients
• Reinsurers: pricing reinsurance
• Capital markets: pricing cat bonds
• Rating agencies: evaluating a company's capital requirements

## Leveraging Catastrophe Risk Modeling

• Reduce the risk that an insurer is unable to meet claims
• Reduce policyholder loss if the firm is unable to fully meet all claims
• Provide an early-warning system if capital falls below a required level
• Promote confidence in financial stability
• Evaluate the company's risk profile and related reinsurance and investment strategies
• Discuss capital management with external parties (e.g. rating agencies)
• Evaluate returns on risk-adjusted capital for strategy development and implementation in individual business segments
• Understand the relative contribution of the major risk categories to the overall risk profile (non-cat losses, catastrophes, reserve, credit, and market)

## Catastrophe Risk Modeling

Inputs and techniques include geospatial peril models,
historical/forecasted and real-time temporal data, standard and scenario-based models, and likelihood/probability, temporal correlation, and simulations. These feed value at risk, underwriting, reinsurance, policy pricing, policyholder loss, loss estimating, sensitivity analysis, and capital management (including treaty conditions), with performance improvement coming from understanding risk.

## Changing the Game in Catastrophe Risk Modeling

The IBM Puredata analytic appliance changes the game along several dimensions:

• Faster: near-real-time data ingestion and shortened analytic cycles, via the Netezza high-speed spatial data loader (AIR and RMS data)
• New methods: embedded customer algorithms (SQL and UDX), a stat & treaty engine, and IBM Netezza in-database analytics
• Comprehensive risk analysis: in-process risk analysis for back-office applications and downstream analytics; ad hoc query, data mining, SPSS, and Cognos against standard and scenario models
• Increased depth: expanded peril models and increased analytic dimensionality (likelihood/probability, temporal correlation, simulations)
• Flexibility and understanding: what-if modeling, high-speed risk analysis, catastrophe-modeling workflow control, and policy-demographics workflow management

The same outputs are supported: value at risk, underwriting, reinsurance, policy pricing, policyholder loss, loss estimates, sensitivity analysis, and capital management (treaty conditions).

## Complementing AIR & RMS with IBM Puredata for Analytics

The upstream RMS & AIR application exports simulation data via SQL. The pipeline then:

• Extracts and groups the pre-cat data
• Sorts on year (yearly total loss and yearly max loss) to drive the stats module and the EP (exceedance probability) definition
• Computes pre-cat stats in the calculation engine
• Applies treaty data and calculates net losses for recovery stats and post-cat stats
• Generates reports and supports ad hoc query

The initial scope covered report generation and ad hoc query; capability expands by moving the analytics in-database.

## Key Points for Migrating to IBM Puredata for Analytics

• Database migration
  • IBM Puredata is a SQL-92-compliant database
  • If you are using SQL Server proprietary extensions, there will be some migration effort
  • Initial review indicates we may not want to use the existing UDFs, but rather optimize the SQL for IBM Puredata
• Analytic applications
  • The Netezza Analytics UDX framework essentially allows a wrapper to be put around typical "file-in, file-out" applications so they run in-database
  • We may want to alter some of the existing application for improved (non-serial) parallelism as well as set-based logic
• Long-term simplicity
  • IBM Puredata essentially eliminates the database tuning and performance issues associated with analytics
  • Consolidating analytics into the database simplifies the entire architecture
  • Only the IBM Puredata analytic performance is proprietary: the database is SQL-92 compliant, and our UDX wrappers are similar to those on every other database platform

## IBM Puredata Advanced Analytics: Improved Analytics for Catastrophe Risk

## Up-to-the-Minute Risk Modeling: Guy Carpenter

• Large reinsurance company whose exposure-management application calculates risk on insured properties
• Risk data changes constantly as a hurricane approaches
• 4 million insured properties, tens of thousands of risk polygons
• Analysis previously took 45 minutes using Oracle Spatial; it now takes 5 seconds using IBM Puredata

## National Fire Station Alignment

• Determine the 5 nearest fire stations to each household
• 41,000 US fire stations
• 114,000,000 ZIP-12 points (parcels) for the entire US
• All scenarios calculated in 30 minutes; this analysis was never possible on Oracle

## Proximity to Coast

• Shortest distance to coast, for Florida
• 14,700 coast segments (each defined by 300 vertices on average)
• 8,500,000 ZIP-12 points
• Cartesian join on Netezza: 3 hours 42 minutes; in-house GIS: 3 weeks
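As a rough illustration of the per-point proximity math behind the fire-station and coast examples, here is a minimal pure-Python sketch of a k-nearest query using great-circle distance. The coordinates are hypothetical sample data; on the appliance this runs as a massively parallel SQL cross join across nodes, not a Python loop.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_stations(parcel, stations, k=5):
    """Return the k stations closest to a parcel, sorted by distance."""
    return sorted(stations, key=lambda s: haversine_km(parcel[0], parcel[1], s[0], s[1]))[:k]

# Hypothetical sample data: (lat, lon) pairs around Miami
parcel = (25.774, -80.193)
stations = [(25.78, -80.20), (25.90, -80.30), (25.76, -80.19),
            (26.10, -80.40), (25.80, -80.12), (25.70, -80.25)]
print(nearest_stations(parcel, stations, k=3))
```

Scaling this brute-force comparison to 41,000 stations against 114 million parcels is exactly the kind of cartesian-join workload the slides describe pushing into the MPP appliance.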
(The Netezza cartesian join was a 100×+ improvement over the in-house GIS.)

## Policy Accumulation: Total Insured Value

• Define a "buffer" around each insured property
• Sum all the insured properties in each buffer

## Calculate Total Insured Value

• Sample data: Miami, Florida (Miami-Dade County), 939,000 properties
• Sum each value within a buffer centered on each point, using a 1 km search radius; on average, 600 properties are summed into each calculation
• Individual calculation: under 1 second
• Bulk calculation: 2 hours

## Determining Portfolio Value-at-Risk In-Database

CHALLENGE: Evaluate massive portfolios as fast as possible to minimize future losses and risk exposure.

"This technology will allow us to revolutionize our risk calculation environment... we will be able to completely change the way that we look at and calculate risk." (Risk quant at a top-3 bank)

SOLUTION: In-database analytics moves the complex calculations next to the data, harnessing up to 920 CPU cores to attack one of the most challenging trading analytic processes: Value-at-Risk, which uses statistical simulations to compute forward-looking portfolio values, running in minutes instead of hours.

BENEFITS: Real-time, high-performance, scalable in-database analytics enables faster risk analysis.

## Calculating Value-at-Risk In-Database

• Determine the Value-at-Risk for an equity options desk
• 200,000 positions across different instruments and maturities
• 1,000 underlying stocks
• Required steps:
  • Calculate daily returns on the underlying stocks from historical prices
  • Calculate the correlation of the daily returns
  • Perform Singular Value Decomposition (SVD)
  • Simulate correlated returns for all underlying stocks using the SVD for the next year
  • Perform 10,000 simulations and calculate the 95th-percentile loss on each day for the entire portfolio

## OpenRisk Uses In-Database Scoring and Spatial Analytics on Netezza

CHALLENGE: Quickly determine, on demand, combined risk across all portfolios of any size (1M+ policies) for all insured catastrophic events.
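The Value-at-Risk steps listed above (daily returns, correlation, matrix factorization, correlated simulation, percentile loss) can be sketched in miniature. This pure-Python sketch uses a hand-rolled Cholesky factor where the deck uses SVD; both serve the same role of turning independent normal draws into correlated returns. The covariance matrix, prices, and positions are hypothetical toy numbers, not the 200,000-position desk.

```python
import math, random

random.seed(7)

def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a small SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

# Hypothetical inputs: daily-return covariance for 3 underlyings,
# current prices, and long/short share positions.
cov = [[4e-4, 1e-4, 5e-5],
       [1e-4, 9e-4, 2e-4],
       [5e-5, 2e-4, 2.5e-4]]
prices = [100.0, 40.0, 250.0]
positions = [1000, -500, 200]

l = cholesky(cov)
exposure = [p * q for p, q in zip(prices, positions)]

# Simulate 10,000 correlated one-day return vectors and portfolio P&L
pnl = []
for _ in range(10_000):
    z = [random.gauss(0, 1) for _ in range(3)]
    r = [sum(l[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
    pnl.append(sum(e * ri for e, ri in zip(exposure, r)))

pnl.sort()
var_95 = -pnl[int(0.05 * len(pnl))]  # 95% one-day Value-at-Risk
print(f"95% one-day VaR: ${var_95:,.0f}")
```

On the appliance, the same logic would run as in-database UDX calls distributed over the full portfolio rather than a single Python loop, which is what moves the run time from hours to minutes.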
"Because of Netezza, we were able to launch a new business model, an on-demand, software-as-a-service, large-scale catastrophic risk modeling, that radically reduces the exposure for insurance companies." (Shajy Mathai, CTO, OpenRisk)

SOLUTION: In-database analytics eliminates data movement and executes 500B+ complex calculations in minutes to determine risk across portfolios.

BENEFITS: Real-time, high-performance, scalable in-database analytics enables broader risk analysis.

## OpenRisk Natural Disaster Portfolio Loss Estimate

• A statistical model with a stochastic set of hurricane events is applied to a portfolio of properties to generate loss estimates over time
• 1M policies assessed for the entire state of Rhode Island
• Required steps:
  • Compute the nearest "surface roughness" coefficient
  • Find the nearest GID for every impacted site (lat/long accuracy of 0.2 minute)
  • Interpolate on continuous distribution functions

## Optimizing Your Own Advanced Analytics: OpenRisk Hurricane Risk Model

## Use Case Summary: Hurricane Risk Assessment

• Catastrophe modelers run models that simulate hazard and vulnerability over extremely long time periods (thousands of years) for portfolios of property risk
• This process generates terabytes of data, which in turn is analyzed to produce loss estimates
• Challenge: develop a framework for implementing a hurricane model that will:
  • Improve performance from days to hours
  • Reduce data movement
  • Increase integration flexibility
  • Reduce the operational footprint by integrating the database with the analysis grid

## Technical Architecture Imperatives

• IBM Netezza Analytics as a SaaS platform
• Facilitate rapid porting of existing hurricane insurance risk models
• Maximum performance and scalability: millions of sites are affected by a disaster event such as a hurricane
• The simplicity of a SQL call to run a sophisticated hurricane model
• Leverage the flexibility of IBM Netezza Analytics to implement a hurricane risk model:
  • UDX (User-Defined Extensions) to incorporate legacy code
  • Geospatial analytics to run risk for sites impacted within the hurricane polygon
• Facilitate rich, high-performance reporting and 3D map rendering
• Accurately forecast damage assessment to property
• Report discrepancies between coverage and damage assessment

## The Existing Solution

• A Fortran program bulk-loads data to the database over ODBC and writes results to files
• For each site in the hurricane: gather building structural characteristics, gather terrain data, apply mathematical modeling to score risk, and compute predicted losses to the site in dollars
• Problems: single-threaded processing (very slow), potent risk-modeling intellectual property locked away in Fortran, difficulty applying parallel processing, lots of infrastructure, and bulk movement of data
• Challenges:
  • How to leverage existing code without a significant rewrite?
  • How to apply parallel processing in a simple way?
  • How to avoid massive data shipping?

## IBM Netezza Solution: Multi-Tenant SaaS for Advanced Analytics In-Database

• Simplicity of SQL, in two steps: (1) run models on demand, (2) execute reports
• Massively parallel speed: optimal distribution of site, building, terrain, and physics data
• In-database analytics:
  • Geospatial analytics applies latitude and longitude appropriately
  • C++/Fortran UDXs implement the model
  • One thread per shared-nothing node
• Elimination of data shipping; emphasis on function shipping
• True multi-tenant SaaS: each client company (1 through n) runs its own proprietary risk model via C++/Fortran UDXs and geospatial analytics

## Running the Model on Demand

• ETL preprocessing: SQL and ETL tools populate the input tables (simple SQL insert statements using a pure C++ UDX)
• Run the model: a simple, elegant single SQL statement driven by one master stored procedure
  • C++/Fortran UDXs execute for each site: determine building characteristics, determine terrain factors, determine the physical forces in effect, apply the proprietary mathematics, and output data in a complex, proprietary data structure
• Process the reporting tables: a SQL stored procedure feeds the reporting layer

## Thanks

Questions?
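Appendix: the per-site UDX pattern described in "Running the Model on Demand" (register the model as a function the database engine calls for each row, then run it with one SQL statement) can be imitated in miniature with SQLite user-defined functions. The table, the `damage_ratio` function, and its formula are hypothetical toy stand-ins, not the OpenRisk model or Netezza's actual UDX API.

```python
import sqlite3

def damage_ratio(wind_mph, roughness):
    """Toy stand-in for the Fortran damage model: fraction of value lost."""
    effective = max(0.0, wind_mph * (1.0 - 0.3 * roughness) - 74.0)
    return min(1.0, (effective / 100.0) ** 2)

con = sqlite3.connect(":memory:")
con.create_function("damage_ratio", 2, damage_ratio)  # UDX-style registration

con.executescript("""
    CREATE TABLE site (id INTEGER, insured_value REAL, wind_mph REAL, roughness REAL);
    INSERT INTO site VALUES (1, 500000, 120, 0.2), (2, 750000, 95, 0.6), (3, 300000, 60, 0.1);
""")

# Step 1: run the model on demand, one SQL statement for the whole portfolio
rows = con.execute("""
    SELECT id, insured_value * damage_ratio(wind_mph, roughness) AS est_loss
    FROM site ORDER BY id
""").fetchall()

# Step 2: report, aggregating the per-site losses
total = sum(loss for _, loss in rows)
for site_id, loss in rows:
    print(f"site {site_id}: estimated loss ${loss:,.0f}")
print(f"portfolio estimated loss ${total:,.0f}")
```

The design point the slides make is that the function ships to the data: each shared-nothing node evaluates the model against its local slice of the site table, so no bulk data movement is needed.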