
Salvatore Miccichè

Observatory of Complex Systems. http://lagash.dft.unipa.it. Different levels of information in financial data: an overview of some widely investigated databases. Salvatore Miccichè. Dipartimento di Fisica e Tecnologie Relative Università degli Studi di Palermo.



Presentation Transcript


  1. Observatory of Complex Systems (http://lagash.dft.unipa.it). Different levels of information in financial data: an overview of some widely investigated databases. Salvatore Miccichè, Dipartimento di Fisica e Tecnologie Relative, Università degli Studi di Palermo. GIACS Conference “Data in Complex Systems”, Palermo, 7-9 April 2008

  2. Overview of Databases. Observatory of Complex Systems: C. Coronnello, F. Lillo, R. N. Mantegna, S. Miccichè, M. Spanò, M. Tumminello, G. Vaglica. Econophysics, Bioinformatics, Stochastic Processes

  3. Overview of Databases We will present an overview of some widely investigated financial and economic databases. Most financial databases include data on transaction prices, bid and ask quotes, and transaction volumes. Some financial databases also provide the coded identity of the market members acting on the order book. The economic databases we will discuss contain financial and economic information on over ten million public and private companies operating in Europe and the USA. What do we do with them?

  4. Overview of Databases: financial databases Econophysics is a recently established discipline whose main aim is to model some of the stylized facts empirically observed in financial markets. Why physicists are interested in financial markets: financial markets can be considered as model complex systems • Many agents/factors • Interactions are not always clear or known (no equations, Hamiltonians?) G. Parisi, cond-mat/0205297, “Complex Systems: a Physicist's Viewpoint”: “A system is complex if its behaviour crucially depends on the details of the system”

  5. Overview of Databases: financial databases • Methods of Statistical Physics can be applied: • Stochastic processes (Brownian motion, superdiffusivity, power-law tails, long-range correlation, ...) • Scaling • Network theory, clustering techniques, random matrices, ... • Agent-based models, ... • ... Last but not least: there is a huge amount of data! 1995: 1 CD per month; 2003: 12-13 CDs per month

  6. Overview of Databases FINANCIAL databases: TAQ, Euronext, BI, TSE, LSE, BME, MTS

  7. Overview of Databases: financial databases. Size
     Trade and Quote (TAQ), NYSE:
       1995   6.3 GB
       1996   8.1 GB
       1997   13.5 GB
       1998   20.0 GB
       1999   27.1 GB
       2000   63.1 GB
       2001   approx. 110 GB
       2002   approx. 180 GB
       2003   approx. 215 GB
     Rebuild Order Book, LSE: 2002, 19.5 GB (now also 2004, 2005, 2006)
     Open Book, NYSE: 2002, approx. 110 GB
     Milano (BI): 2002 trades, 2.14 GB; 2002 best quotes, 2.43 GB
     Euronext: 2002, 6.7 GB
     Tokyo (TSE): 2002 trades, 1.6 GB
     MTS: 4/2003-3/2004, 4.0 GB
     Slide annotation: 1 TB

  8. Overview of Databases: financial databases Transaction prices Quotes

  9. Overview of Databases: financial databases – transaction prices - synchronized To start with: given a price S(t) at time t, the price return r(t) is: (formula not preserved in this transcript) ARBITRAGE
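
The formula for r(t) did not survive in the transcript. A common choice, assumed here rather than taken from the slide, is the logarithmic return over a time horizon \Delta t (the symbol \Delta t is introduced only for this illustration):

```latex
r(t) = \ln S(t + \Delta t) - \ln S(t)
```

For small price changes this is close to the relative price change [S(t + \Delta t) - S(t)] / S(t).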

  10. Overview of Databases: financial databases – transaction prices - synchronized Multivariate description COMOVEMENTS t=op-cl, 1995-2003

  11. Overview of Databases: financial databases – transaction prices - synchronized Multivariate description We are looking for a possible collective stochastic dynamics and/or links between the price returns or volatilities of different stocks. PRICE RETURNS CLUSTERS. Cross-correlation clustering procedure based on a similarity measure (formula not preserved in this transcript), where r_i are the price return time series. At any t → subdominant ultrametric distance → Hierarchical Tree (HT) and Minimum Spanning Tree (MST).
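
A minimal sketch of how a Minimum Spanning Tree can be obtained from return time series. The similarity measure on the slide is not preserved, so this example assumes the Pearson correlation coefficient ρ_ij and the commonly used distance d_ij = sqrt(2(1 - ρ_ij)). The function names, the toy data and the use of Prim's algorithm are choices made for this illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def distance_matrix(returns):
    """d_ij = sqrt(2 (1 - rho_ij)); an assumed, commonly used choice."""
    n = len(returns)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rho = pearson(returns[i], returns[j])
            # max() guards against tiny negative values from rounding.
            d[i][j] = d[j][i] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    return d

def minimum_spanning_tree(d):
    """Prim's algorithm on a dense distance matrix; returns N-1 edges (i, j, d_ij)."""
    n = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Shortest edge connecting the current tree to a node outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: d[e[0]][e[1]],
        )
        edges.append((i, j, d[i][j]))
        in_tree.add(j)
    return edges

# Toy return series for four hypothetical stocks (made-up numbers).
returns = [
    [0.01, -0.02, 0.015, 0.00, -0.01],
    [0.012, -0.018, 0.02, 0.001, -0.012],
    [-0.01, 0.02, -0.01, 0.005, 0.01],
    [0.00, 0.01, -0.02, 0.015, -0.005],
]
d = distance_matrix(returns)
mst = minimum_spanning_tree(d)
```

With N series the tree has N-1 edges. In this toy example the first edge joins stocks 0 and 1, whose returns move almost together.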

  12. Overview of Databases: financial databases – transaction prices - synchronized Multivariate description • Compare the dynamics of price returns of stocks traded at different exchanges - industry sector identification at different time horizons - sector dynamics - LSE and NYSE - are there common (stylized) facts? Single Linkage Clustering Analysis: at each step, when two elements, or one element and a cluster, or two clusters p and q merge into a wider single cluster t, the distance d_tr between the new cluster t and any other cluster r is recursively given by d_tr = min{d_pr, d_qr}, i.e. the distance between clusters t and r is the shortest distance between any element of t and any element of r. MST construction: N-1 links. Planar Maximally Filtered Graph (PMFG): 3(N-2) links.
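
The single linkage recursion d_tr = min{d_pr, d_qr} can be sketched as follows. This is a naive illustration on a small distance matrix, not the code used for the talk; the function name and the toy matrix are hypothetical.

```python
def single_linkage(d):
    """Naive single linkage clustering on a symmetric distance matrix d.

    Returns the merge history as (members_p, members_q, merge_distance).
    """
    n = len(d)
    clusters = {i: [i] for i in range(n)}
    # dist[(a, b)] with a < b holds the current distance between clusters a and b.
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Closest pair of clusters.
        (p, q), h = min(dist.items(), key=lambda kv: kv[1])
        merges.append((sorted(clusters[p]), sorted(clusters[q]), h))
        t = next_id
        next_id += 1
        new_dist = {}
        for r in clusters:
            if r in (p, q):
                continue
            d_pr = dist[(min(p, r), max(p, r))]
            d_qr = dist[(min(q, r), max(q, r))]
            new_dist[(r, t)] = min(d_pr, d_qr)  # d_tr = min{d_pr, d_qr}
        # Drop every entry involving p or q, then add the distances to t.
        dist = {k: v for k, v in dist.items() if p not in k and q not in k}
        dist.update(new_dist)
        clusters[t] = clusters.pop(p) + clusters.pop(q)
    return merges

# Toy distance matrix for four elements (made-up numbers).
d_toy = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]
merges = single_linkage(d_toy)
heights = [h for _, _, h in merges]
```

The merge distances are the values of the subdominant ultrametric: two elements are at ultrametric distance h if h is the height of the first merge that puts them in the same cluster.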

  13. Overview of Databases: financial databases – transaction prices - synchronized Synchronized data. We consider: NYSE - the 100 most capitalized stocks in 2002. LSE - the 92 most traded stocks in 2002. Data sources: the Trades And Quotes (TAQ) database maintained by NYSE (1995-2003) and the Rebuild Order Book (ROB) database maintained by LSE (2002). We consider high-frequency (intraday) data. Transactions do not occur at the same time for all stocks, so we have to synchronize/homogenize the data: NYSE: 5 min, 15 min, 30 min, 65 min, 195 min, 1 day (trading time 6h30’). LSE: 5 min, 15 min, 51 min, 102 min, 255 min, 1 day (trading time 8h30’).
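
One possible way to homogenize tick data is previous-tick sampling: at each point of a regular time grid, take the last traded price at or before that time. The slide does not say which scheme was used, so this is an assumption, and the function name and toy trades are hypothetical. Note that the listed horizons divide the trading day evenly (390 minutes for NYSE, 510 minutes for LSE).

```python
from bisect import bisect_right

def previous_tick_sample(times, prices, start, end, step):
    """Sample the last traded price at or before each grid time.

    times  : sorted trade times in seconds since midnight
    prices : trade prices, same length as times
    Grid points before the first trade get None.
    """
    grid, sampled = [], []
    t = start
    while t <= end:
        k = bisect_right(times, t)  # number of trades with time <= t
        grid.append(t)
        sampled.append(prices[k - 1] if k > 0 else None)
        t += step
    return grid, sampled

# Toy example: hypothetical trades in the first 15 minutes after a 09:30 open.
open_s = 9 * 3600 + 30 * 60
times = [open_s + 12, open_s + 250, open_s + 310, open_s + 840]
prices = [100.0, 100.5, 100.2, 101.0]
grid, sampled = previous_tick_sample(times, prices, open_s, open_s + 900, 300)
```

Applying the same grid to every stock gives price series that share a common time index, which is what the cross-correlation analysis needs.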

  14. Overview of Databases: financial databases – transaction prices - synchronized The set of investigated stocks
      Sector                       LSE (92 stocks)   NYSE (100 stocks)
      01 Technology                       4                  8
      02 Financial                       20                 24
      03 Energy                           3                  3
      04 Consumer non-Cyclical           12                 11
      05 Consumer Cyclical               10                  2
      06 Healthcare                       6                 12
      07 Basic Materials                  5                  6
      08 Services                        19                 20
      09 Utilities                        6                  2
      10 Capital Goods                    5                  6
      11 Transportation                   2                  2
      12 Conglomerates                    0                  4

  15. Overview of Databases: financial databases – transaction prices - synchronized Daily data: SLCA – hierarchy & topology. Figure panels: LSE day, NYSE day; both annotated “High level of correlation”.

  16. Overview of Databases: financial databases – transaction prices - synchronized Daily data: PMFG LSE day NYSE day

  17. Overview of Databases: financial databases – transaction prices - synchronized 5-min data: SLCA – hierarchy & topology LSE 5-min NYSE 5-min FINANCIAL 04 out of 20 SERVICES 02 out of 19

  18. Overview of Databases: financial databases – transaction prices - synchronized 5-minute data: PMFG LSE 5-min NYSE 5-min

  19. Overview of Databases: financial databases – transaction prices - synchronized Conclusions • The system is more hierarchically/topologically structured at daily time horizons, confirming that the market needs a finite amount of time to assess the correct degree of cross-correlation between pairs of stocks. • Financial and Energy seem to be structured even at short time horizons (LSE more than NYSE). Slide annotation: overnight.

  20. Overview of Databases: financial databases – transaction prices – tick-by-tick A possible use of tick-by-tick data • The “extreme events” we consider are related to the first crossing of either of the two barriers. • The Mean Exit Time (MET) is simply the expected value of the time interval until that first crossing. Financial interest: the MET provides a timescale for market movements. Figure: GE stock, annotation 2L; dashed black = original data, magenta = shuffled returns only.
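
A minimal sketch of an empirical mean exit time estimate. It assumes two symmetric barriers at distance L above and below the value of the series at each starting tick (the "2L" annotation on the slide suggests an interval of width 2L, but that reading is an interpretation). The function name and the toy series are hypothetical.

```python
def mean_exit_time(times, values, L):
    """Average time until the series first moves by at least L, up or down,
    from its value at each starting tick. Starts that never exit are skipped."""
    exits = []
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(values[j] - values[i]) >= L:
                exits.append(times[j] - times[i])
                break
    return sum(exits) / len(exits) if exits else None

# Toy series in arbitrary units (made-up numbers).
times = [0, 1, 2, 3, 4, 5]
values = [0.0, 0.5, 1.5, 1.0, 0.0, -0.5]
met = mean_exit_time(times, values, L=1.0)
```

Repeating the estimate on a series rebuilt from shuffled returns, as in the figure legend, shows how much of the timescale depends on the time ordering of the returns.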

  21. Overview of Databases: financial databases – transaction prices – tick-by-tick A possible use of tick-by-tick data: QUOTES. Time between consecutive quotes

  22. Overview of Databases: financial databases – bonds Another database: MTS. These are data on bonds traded in the European markets, managed by the MTS Group, a firm based in Italy. The bonds we have considered are those continuously traded in Italy over the whole year from April 2003 to March 2004.

  23. Overview of Databases: financial databases – order book data Order book data. Order book data allow one to follow the details of price formation in a financial market. The state of the complete order book can be visualized at any instant by using a schematic representation.

  24. Overview of Databases: financial databases – order book data Order book data: time evolution. The real behavior over a short time for a normal stock. Figure legend: - sell limit orders, - buy limit orders, ○ sell market orders, x buy market orders. Axes: price × 100 versus time (s).

  25. Overview of Databases: financial databases – order book data Order book data: time evolution Representation of the order book focusing on the time dependence of order flow (the plot refers to a stock traded at London Stock Exchange)

  26. Overview of Databases: financial databases – order book data Order book data: time evolution A very special day (20 Sept 2002)

  27. Overview of Databases: financial databases (Coded) Identity

  28. Overview of Databases: financial databases – order book data Tick-by-tick data, volume and identity. In the LSE and BME databases, the coded identity of the market members (brokerages) acting on the order book is also available. For LSE we obtained these data under a special confidentiality agreement: e.g. anyone who uses these data MUST be traceable! For BME the identity is transparent in the market.

  29. Overview of Databases: financial databases – order book data Tick-by-tick data, volume and identity. Inventory variation: the value (i.e. price times volume) of an asset exchanged as a buyer minus the value exchanged as a seller in a given time interval t (formula not preserved in this transcript; its terms are the sign, +1 for buys and -1 for sells, the price and the volume). Firms: i = 1, …, 69 (BBVA), the most active. Stocks: BBVA, TEF, SAN, REP (2001-2004). In this talk, we focus on t = 1 trading day.
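
The verbal definition above (value bought minus value sold in a time interval) can be sketched as a sum of sign × price × volume over a firm's trades in each trading day. The record layout, firm codes and numbers below are hypothetical.

```python
from collections import defaultdict

def inventory_variation(trades):
    """Sum sign * price * volume per (firm, day).

    trades: iterable of (firm, day, sign, price, volume),
            with sign = +1 for a buy and -1 for a sell.
    """
    v = defaultdict(float)
    for firm, day, sign, price, volume in trades:
        v[(firm, day)] += sign * price * volume
    return dict(v)

# Toy trades for two hypothetical firm codes (made-up numbers).
trades = [
    ("F01", "2003-01-02", +1, 10.0, 500),
    ("F01", "2003-01-02", -1, 10.5, 200),
    ("F02", "2003-01-02", -1, 10.0, 500),
    ("F02", "2003-01-03", +1, 11.0, 100),
]
v = inventory_variation(trades)
```

A positive value means the firm was a net buyer of the stock on that day, a negative value a net seller.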

  30. Overview of Databases: financial databases – order book data Tick-by-tick data, volume and identity. BBVA 2003: 69×69 inventory variation correlation matrix, obtained by sorting the firms in the rows and columns according to the correlation of their inventory variation with the price return.

  31. Overview of Databases: financial databases – order book data Tick-by-tick data, volume and identity. A classification of brokerages/firms based on the correlation between each firm's inventory variation and the price return of the traded stock: “noisy” firms; “reversing” firms (contrarian traders); “trending” firms (momentum traders).

  32. Overview of Databases: financial databases – order book data Tick-by-tick data, volume and identity
      Reversing: - Negatively correlated with price return - Large and small institutions - Typically acting on a short time scale, continuously reversing their position in the market - Their trading activity tends to be homogeneous in time
      Trending: - Positively correlated with price return - Large institutions - Acting on long time scales, splitting large orders to build a portfolio position while minimizing price impact - Their trading activity tends to be localized in time
      Noisy: - Poorly correlated with price return - Large and small institutions
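
A minimal sketch of the three-way classification, based on the sign and size of the correlation between a firm's daily inventory variation and the daily price return. The slides do not state the cut-off between "noisy" and the other two classes, so the threshold used here (n_sigma / sqrt(number of days)) is an assumption, as are the function names and the toy data.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def classify_firm(inventory, returns, n_sigma=2.0):
    """Label a firm as 'trending', 'reversing' or 'noisy'.

    The threshold n_sigma / sqrt(N) is an assumed significance level,
    not a value taken from the slides.
    """
    rho = correlation(inventory, returns)
    threshold = n_sigma / math.sqrt(len(returns))
    if rho > threshold:
        return "trending"
    if rho < -threshold:
        return "reversing"
    return "noisy"

# Toy data: eight days of returns and three hypothetical firms.
returns = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
firm_a = [5, -4, 6, -5, 4, -6, 5, -5]     # buys when the price rises
firm_b = [-5, 4, -6, 5, -4, 6, -5, 5]     # buys when the price falls
firm_c = [3, 3, -3, -3, 3, 3, -3, -3]     # unrelated to the return
labels = [classify_firm(f, returns) for f in (firm_a, firm_b, firm_c)]
```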

  33. Overview of Databases: economic databases ECONOMIC databases: Amadeus, Compustat, INPS

  34. Overview of Databases: economic databases AMADEUS is a comprehensive, pan-European database containing financial information on over 10 million public and private companies in 38 European countries. Standardised annual accounts (for up to 10 years), consolidated and unconsolidated, financial ratios, activities and ownership for approximately 9 million companies throughout Europe, including Eastern Europe. A standard company report includes: 24 balance sheet items, 25 profit and loss account items and 26 ratios, descriptive information including trade description and activity codes (NACE 1, NAICS or US SIC can be used across the database), ownership information. A news module contains information from Reuters’, Dow Jones, the FT as well as M&A news and rumours from our own ZEPHYR. AMADEUS also contains security and price information and links to an executive report with integral graphs plus a report comparing the financials of the company’s default peer group.

  35. Overview of Databases: economic databases Logarithmic growth rate. The growth of a firm was first described by Gibrat in 1931. His model concerns the logarithmic growth rate r_i(t) (formula not preserved in this transcript), where S(t) is some proxy for firm size: total assets, employees, sales, revenue, turnover, … • The Gibrat model is based on: • Law of proportionate effect: r_i(t) is independent of the initial size of the firm • r_i(t) and r_j(t) are uncorrelated. By making use (i) of the Central Limit Theorem and (ii) of the additional assumption of independence, one can show that the growth rate should be log-normally distributed (equivalently, the logarithmic growth rate should be Gaussian).
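
The growth rate formula is missing from the transcript. A standard form, assumed here rather than taken from the slide, defines the one-period logarithmic growth rate of firm i and writes the size after T periods as a sum of such rates:

```latex
r_i(t) = \ln \frac{S_i(t+1)}{S_i(t)},
\qquad
\ln S_i(T) = \ln S_i(0) + \sum_{t=0}^{T-1} r_i(t)
```

If the r_i(t) are independent of the size and of each other, the Central Limit Theorem makes the sum approximately Gaussian for large T, so that S_i(T)/S_i(0) is approximately log-normal.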

  36. Overview of Databases: economic databases Log-normal → Laplacian → what else? All data are aggregated. IC fixed. AMADEUS database

  37. Overview of Databases: economic databases Data allow disaggregation in terms of economic sectors of activity Z-transform within sectors
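
A minimal sketch of a Z-transform within sectors, assuming the term means standardizing the growth rates of each sector to zero mean and unit standard deviation. The use of the population standard deviation, the function name and the toy data are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def z_transform_within_sectors(growth_rates, sectors):
    """Standardize each growth rate with the mean and standard deviation
    of its own sector. Sectors with zero spread are not handled here."""
    groups = defaultdict(list)
    for r, s in zip(growth_rates, sectors):
        groups[s].append(r)
    stats = {s: (mean(v), pstdev(v)) for s, v in groups.items()}
    return [(r - stats[s][0]) / stats[s][1] for r, s in zip(growth_rates, sectors)]

# Toy growth rates for two hypothetical sectors (made-up numbers).
rates = [1.0, 3.0, 10.0, 30.0]
sectors = ["A", "A", "B", "B"]
z = z_transform_within_sectors(rates, sectors)
```

After this step the growth rates of different sectors are on a common scale, so their distributions can be compared or pooled.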

  38. Overview of Databases: economic databases Data allow disaggregation year-by-year

  39. Overview of Databases: economic databases Exploring the role of correlation between firms Shuffling experiments

  40. Overview of Databases: economic databases Conclusions The availability of accurate databases allows for the inspection of the role that different variables play in the system.

  41. The End micciche@unipa.it http://lagash.dft.unipa.it
