
### Data-intensive Computing Case Study Area 2: Financial Engineering

B. Ramamurthy & Abhishek Agarwal

Modern Portfolio Theory

- Modern Portfolio Theory (MPT) is a theory of investment that seeks to maximize return and minimize risk by analytically choosing among different (financial) assets.
- MPT was introduced by Harry Markowitz in 1952; he received the Nobel Memorial Prize in Economic Sciences in 1990. He is currently a professor of finance at the Rady School of Management, UC San Diego. (83 years old)
- One of his influences was John von Neumann, a pioneer of the stored-program computer.


The Big Picture

- Stock market portfolio context:
- Given an amount A, a set of stocks {w, x, y, z}, and the historical performance of those stocks, what percentage of the amount should be allocated to each stock in the set so that return is maximized and risk is minimized?
- Example: given $10,000 and stocks {C, F, Q, T}, what is the recommended split among them for the best return at the least risk?
- Some quantitative assumptions are made about return and risk.
- The above is my simple interpretation of a complex problem.


Reference

- Reference: "Application of Hadoop MapReduce to Modern Portfolio Theory" by Abhishek Agarwal and Bina Ramamurthy, paper submitted to ICSA 2011.
- Also work by Ross Goddard (UB grad, at Drexel), and Mohit Vora and Neeraj Mahajan (Yahoo.com, alumni).


Markowitz Model

- How is it data intensive?
- The number of assets in the financial world is quite large.
- Historical data for each of these assets will easily overwhelm traditional databases.
- We would like real data on the scale of this problem. Any guess?
- We are currently working with 500,000 assets.


MPT

- A fundamental assumption of MPT is that the assets in an investment portfolio should not be selected individually.
- We need to consider how a change in the value of every other asset can affect the given asset.
- Thus MPT is a mathematical formulation of diversification in investing, with the objective of selecting a collection of investment assets that collectively has lower risk than any individual asset.
- In theory this is possible because different types of assets often change value in opposite directions.
- For example, stocks vs. bonds, tech vs. commodities.
- Thus the assets in a collection mitigate each other's risks.


Technical Details

- An asset's return is modeled as a normally distributed random variable.
- Risk is defined as the standard deviation of return.
- The return of a portfolio is a weighted combination of the assets' returns.
- By combining assets whose returns are not correlated, MPT reduces the total variance of the portfolio.


Expected Return and Variance

If E(Ri) is the expected return on asset i and wi is the weight of asset i, then the total expected return on the portfolio is

E(Rp) = ∑i wi E(Ri)

The portfolio return variance can be written as

(σp)2 = ∑i ∑j wi wj σi σj ρij

where ρij is the correlation between assets i and j; ρij = 1 for i = j.
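As a concrete illustration of these two formulas, here is a small NumPy sketch. The returns and weights are made-up numbers, not data from the paper; note that w @ Sigma @ w equals the double sum above, because Cov(i, j) = σi σj ρij.

```python
import numpy as np

# Hypothetical monthly returns for three assets (rows = months, cols = assets).
returns = np.array([
    [ 0.02,  0.01, -0.01],
    [ 0.03, -0.02,  0.00],
    [-0.01,  0.04,  0.02],
    [ 0.02,  0.00,  0.01],
])
w = np.array([0.5, 0.3, 0.2])          # portfolio weights w_i, summing to 1

mu = returns.mean(axis=0)              # E(R_i) for each asset
Sigma = np.cov(returns, rowvar=False)  # covariance matrix

exp_return = w @ mu                    # E(R_p) = sum_i w_i E(R_i)
variance = w @ Sigma @ w               # (sigma_p)^2 as a quadratic form in w
risk = np.sqrt(variance)               # portfolio risk = std deviation
```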


MPT explained


Portfolio Combination

- Compute and plot the expected returns and variances on a graph.
- The hyperbola derived from the plot represents the efficient frontier.
- A portfolio on the efficient frontier offers the best possible return for a given risk level.
- Matrices are used to calculate the efficient frontier.


Efficient Frontier

- In matrix form, for a given level of risk tolerance the efficient frontier is found by minimizing the expression

wT Σ w – q RT w

- w is the vector of asset weights in the portfolio (∑ wi = 1)
- Σ is the covariance matrix
- R is the vector of expected returns
- q is the risk tolerance, in [0, ∞)
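Under the budget constraint alone (no short-sale limits), this quadratic program has a closed-form solution via a Lagrange multiplier. A minimal NumPy sketch; the function name and test values are illustrative, not from the paper:

```python
import numpy as np

def frontier_weights(Sigma, R, q):
    """Minimize w^T Sigma w - q R^T w  subject to  sum(w) = 1.

    Stationarity of the Lagrangian gives 2 Sigma w = q R + lam * 1,
    so w = Sigma^-1 (q R + lam * 1) / 2, with lam fixed by the constraint.
    """
    inv = np.linalg.inv(Sigma)
    ones = np.ones(len(R))
    a = inv @ (q * R) / 2.0
    b = inv @ ones / 2.0
    lam = (1.0 - ones @ a) / (ones @ b)
    return a + lam * b
```

With q = 0 this reduces to the global minimum-variance portfolio Σ⁻¹1 / (1ᵀΣ⁻¹1); adding no-short-sale constraints would require a quadratic-programming solver instead.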


What’s new?

- Parallel processing using MapReduce.
- Covariance computation: from O(n²) to O(n).


Co-variance Matrix

- We calculate how an asset varies in response to variations in every other asset.
- We use the means of the monthly returns of both assets for this purpose.
- This operation has to be done in turn for each asset.
- In a traditional environment this is done via nested loops.
- But this calculation is intrinsically parallel in nature. Each mapper i can calculate how asset i varies in response to variations in all the other assets.
- The input to each mapper is the current asset and the list of all assets.
- Its output is a vector containing the covariances of that asset with respect to the other assets.
- The reducer just inserts the result of the map operation into the covariance matrix table.
- As all the mappers execute in parallel, this gives us a parallel run time of O(n).
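The mapper/reducer pair described above can be sketched in plain Python (no Hadoop cluster; the function names are illustrative and each mapper call stands for one parallel map task):

```python
import numpy as np

def covariance_mapper(asset_id, asset_returns, all_returns):
    """One map task per asset: compute row i of the covariance matrix.

    Input: the current asset plus the full set of assets, as described above.
    """
    xi = asset_returns - asset_returns.mean()
    row = {}
    for other_id, other_returns in all_returns.items():
        xj = other_returns - other_returns.mean()
        row[other_id] = float(xi @ xj) / (len(xi) - 1)  # sample covariance
    return asset_id, row

def covariance_reducer(mapper_outputs, n_assets, index):
    """Insert each emitted row into the covariance matrix table."""
    C = np.zeros((n_assets, n_assets))
    for asset_id, row in mapper_outputs:
        for other_id, value in row.items():
            C[index[asset_id], index[other_id]] = value
    return C
```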


Inverse of co-variant matrix

- Using first principles:

A⁻¹ = adjoint(A) × (1 / determinant(A))

- This method requires the calculation of the transpose, determinant, cofactors, adjoint, and upper-triangular form of a matrix.
- Some of these operations, like the transpose, can be easily implemented on Hadoop.
- Others, like the determinant and the upper-triangular reduction, have data dependencies and are therefore not very suitable for a Hadoop-like environment.


Inverse of Co-variant Matrix

- Gauss-Jordan Elimination
- A variant of Gaussian elimination that puts zeroes both above and below each pivot element as it goes from the top row of the given matrix to the bottom, reducing the augmented matrix

[A | I] → [I | A⁻¹]

- Gauss-Jordan elimination has a runtime complexity of O(n³).
- For MapReduce it requires two sets of jobs in tandem, resulting in poor performance.
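For reference, a sequential Gauss-Jordan inverse looks like this in NumPy (a textbook sketch with partial pivoting, not the paper's Hadoop implementation). Each column elimination depends on the previous one, which is exactly the sequential chain that makes it awkward to parallelize:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce [A | I] to [I | A^-1], zeroing above and below each pivot."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
        M[[col, pivot]] = M[[pivot, col]]              # swap pivot row up
        M[col] /= M[col, col]                          # normalize pivot to 1
        for row in range(n):
            if row != col:                             # the "Jordan" step:
                M[row] -= M[row, col] * M[col]         # clear the whole column
    return M[:, n:]
```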


Inverse of a square co-variance matrix

- The third approach, which we use for our implementation, is singular value decomposition (SVD). By the SVD theorem, the matrix A can be written as

A = U Σ Vᵀ

- where U and V are unitary matrices and Σ is a rectangular diagonal matrix of the same size as A.
- Using the Jacobi eigenvalue algorithm, if A is square and invertible then the inverse of A is given by

A⁻¹ = V Σ⁻¹ Uᵀ
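With NumPy the SVD route is a few lines. This is only a sketch of the idea; the paper's version distributes the underlying Jacobi eigenvalue computation via Hama:

```python
import numpy as np

def svd_inverse(A):
    """Invert a square, invertible A via SVD: A = U S V^T  =>  A^-1 = V S^-1 U^T."""
    U, s, Vt = np.linalg.svd(A)           # s holds the singular values
    return Vt.T @ np.diag(1.0 / s) @ U.T  # invert by reciprocating s
```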


Inverse of co-variance matrix using MR

- We need two MapReduce jobs to implement this on Hadoop.
- In the first job, each map task receives a row as a key and a vector of all the other rows as its value.
- This map emits (block id, sub-vector) pairs.
- The reduce task merges block structures based on the block id.
- In the second job, each mapper receives a block id as a key and two sub-matrices A and B as its value.
- The mapper multiplies the two matrices.
- Since A is a symmetric matrix, Aᵀ·A = A·Aᵀ. The reducer computes the sum of all the blocks.
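The two-stage pipeline can be mimicked sequentially: the map stage emits partial block products keyed by the output block's id, and the reduce stage sums the products sharing a key. A toy sketch (block size and function names are illustrative, assuming the block size divides the matrix dimension):

```python
import numpy as np

def block_multiply_map(A, B, bs):
    """Map stage: emit ((i, j), partial product) for every block pair A[i,k] x B[k,j]."""
    n = A.shape[0]
    for i in range(0, n, bs):
        for j in range(0, n, bs):
            for k in range(0, n, bs):
                yield (i, j), A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]

def block_multiply_reduce(pairs, n, bs):
    """Reduce stage: sum the partial products that share a block id."""
    C = np.zeros((n, n))
    for (i, j), partial in pairs:
        C[i:i+bs, j:j+bs] += partial
    return C
```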


Expected Returns Matrix Using MR

- The expected returns matrix can be easily built on the Hadoop platform.
- Each mapper computes the expected return of a particular asset.
- All these mappers can run in parallel, giving a parallel run time of O(1) as opposed to the O(n) run time we would get in a traditional environment.
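Each map task here reduces to a one-line mean. Sketched in Python (names are illustrative; each call models one parallel mapper):

```python
import numpy as np

def expected_return_mapper(asset_id, monthly_returns):
    """One mapper per asset: emit (asset_id, mean monthly return)."""
    return asset_id, float(np.mean(monthly_returns))

def expected_returns_reducer(pairs, asset_order):
    """Assemble the emitted pairs into the expected-returns vector R."""
    by_id = dict(pairs)
    return np.array([by_id[a] for a in asset_order])
```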


Multiply the Inverse Covariance Matrix with the Returns Matrix

- Use a block multiplication algorithm in the MR framework.
- The next step, making each negative entry of a row positive: MR turns an O(n²) algorithm into an O(n) algorithm.
- Sort the entries in the row, once again using MR.


Simulation using Hadoop Framework

- Besides the standard Hadoop package we also used two other packages: HBase [3][10] and Hama [4].
- Hama is a parallel matrix computation package based on Hadoop MapReduce.
- Hama proposes the use of the 3-dimensional Row and Column (Qualifier), Time space, and multi-dimensional Column families of HBase, and utilizes 2D blocked algorithms.
- We used HBase for storing the matrices, and the Hama package for matrix multiplication, matrix transpose, and the Jacobi eigenvalue algorithm.
- Computationally, the simulation showed that run time did not increase linearly with the size of the data.

