"Predicting randomness” experiment: new perspectives and results from the test run

Nikita Stepanov, ITEP/CERN/GRC, 19.05.2004

Outlook
• Introduction (what is it? And why? And how?)
• Status (hardware and software)
• Data sources (description, “random” properties)
• Selected results from the test run
• More (more practical results)
• Conclusion (new goals and perspectives)

“Predicting randomness” experiment. Motivations, primary goals, etc.

The “Predicting randomness” experiment was proposed about one year ago to investigate the randomness of bit sequences generated by different “random” sources, and to establish whether such sequences may be locally predictable, in a statistically significant sense, by a human being or by an “intelligent” computer program. The experiment was designed as a large-scale internet initiative (a game) in which thousands of participants around the world could take part.

Why a game? The technical approach we proposed is formally analogous to a game of many players against a “fair” market. Each player observes the past data, represented as a 1D discrete random walk (RW). At any step the player may bet on the direction in which the RW trajectory will continue or, in real market terminology, open an “UP” or a “DOWN” position. The player can close the open position at any later time, obtaining a “profit/loss” equal to the number of steps the RW makes in the predicted direction minus the number of steps in the opposite direction.
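The scoring rule can be illustrated with a minimal sketch. The function name `position_profit` and the indexing convention (a position covers steps `open_at` to `close_at - 1`) are assumptions made for illustration; they are not taken from the experiment's software.

```python
from typing import Sequence


def position_profit(steps: Sequence[int], open_at: int, close_at: int,
                    direction: int) -> int:
    """Profit/loss of one position on a +1/-1 random walk.

    steps[i] is the i-th RW increment (+1 or -1). The position is open
    while steps open_at .. close_at - 1 are made (hypothetical convention).
    direction is +1 for an "UP" position and -1 for a "DOWN" position.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (UP) or -1 (DOWN)")
    if not 0 <= open_at <= close_at <= len(steps):
        raise ValueError("need 0 <= open_at <= close_at <= len(steps)")
    # Steps in the predicted direction count +1, opposite steps count -1.
    return direction * sum(steps[open_at:close_at])


steps = [+1, -1, -1, -1, +1, -1]
print(position_profit(steps, 1, 6, -1))  # DOWN position: 4 steps down, 1 up -> 3
```

With no transaction costs, opening the opposite position over the same steps gives exactly the opposite result.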

“Predicting randomness” experiment. Motivations, primary goals, etc.

Finally, the prediction ability of a given participant can be quantified by analyzing the whole sequence of “profits/losses” that he or she generates. This approach closely resembles the real market game (in the regime of no transaction costs) and allows us to use a commercial trading system, kindly provided by Dukascopy Trading Technologies Corporation (Dukascopy), as the basis for our experiment. The analogy also motivated the nickname adopted for the experiment at the time of the proposal: “Deep Trader”.

For details, see the paper “Predicting randomness”, available at http://grc.dukascopy.org/

Primary goals:
1) To test experimentally the validity of the efficient market hypothesis.
2) To investigate the behavior patterns of the human brain when making trading decisions.
3) (The ultimate goal) To detect and quantify any “anomalous” ability of the human brain to predict the future of “truly” random processes.

Present status: software

[Architecture diagram. Components: data sources, data processors, Java server, remote client (Java applet), central Oracle DB]

Stress tests indicate that a single Java server can serve 700 to 1000 client applications simultaneously. The system is easy to cluster. A dedicated Java server instance is run to serve the needs of the experiment.

Present status: hardware

The central computing facility is a scalable multilevel cluster. At present the system comprises 28 CPUs with a total of 30 GB of RAM and 1 TB of HDD space, and it services approximately 8 million requests daily.

Data sources (bit sequence generators)
• Design constraints: 1) a standard, simple data representation (for analysis); 2) a psychologically pleasant update frequency.
• Each data source generates a +1/-1 signal every 10 sec.
• The resulting bit sequence is represented as an integer-valued RW (next value = old value + new generated bit), which allows standard graphical representation and technical analysis.
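The rule "next value = old value + new generated bit" is a running sum. A minimal sketch follows; the name `bits_to_walk` and the starting value 0 are illustrative assumptions.

```python
from itertools import accumulate
from typing import Iterable, List


def bits_to_walk(bits: Iterable[int], start: int = 0) -> List[int]:
    """Integer-valued RW: each value is the previous value plus the new bit."""
    # accumulate(..., initial=start) needs Python 3.8 or later.
    return list(accumulate(bits, initial=start))


print(bits_to_walk([+1, +1, -1, +1]))  # [0, 1, 2, 1, 2]
```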

EURRAND

Bit source driven by the market EUR/USD quote according to the following simple algorithm:

Candle := 10 sec aggregation of the EUR/USD tick data. It has opening, closing, high and low prices.

    Diff = candle.closingPrice(T) - candle.closingPrice(T - 10 sec);
    if (Diff > 0) {
        newBit = +1;
    } else if (Diff < 0) {
        newBit = -1;
    } else {
        newBit = pseudoRandom.getBit();
    }

Note: the sequence becomes pseudorandom on weekends; otherwise it more or less mimics the behavior of EUR/USD.
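The same rule as a runnable sketch. The function name `eurrand_bit` and the `fallback` parameter are hypothetical; in particular, Python's `random.choice` only stands in for the pseudorandom generator used by the real source.

```python
import random
from typing import Callable


def eurrand_bit(prev_close: float, curr_close: float,
                fallback: Callable[[], int] = lambda: random.choice((1, -1))) -> int:
    """+1 if the 10 sec candle closed higher than the previous one, -1 if lower.

    When the closing price is unchanged (e.g. no new quotes on weekends),
    the bit comes from `fallback`, a stand-in for the pseudorandom source.
    """
    diff = curr_close - prev_close
    if diff > 0:
        return 1
    if diff < 0:
        return -1
    return fallback()


print(eurrand_bit(1.2000, 1.2003))  # 1
print(eurrand_bit(1.2003, 1.2000))  # -1
```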

GEIGER

A Geiger counter (the standard RM-60 device by AWARE Electronics Company [ http://www.aw-el.com ]) registers the natural radioactive background with an average rate of about 8 counts per 10 sec. It is connected via a serial port to a PC, where a simple program:
a) maintains 10 sec time slots;
b) registers the arrival of every counter signal with an accuracy of about 0.1 msec;
c) generates +1/-1 bits according to the following algorithm.

At the end of each 10 sec time slot (at its last, 9999th, msec) the program first checks whether at least one signal was registered during the slot. If so, it takes the millisecond of arrival of the last signal and generates +1 if this millisecond is even and -1 otherwise. In the (rare) case when no signal arrived during the slot, the program generates a pseudorandom bit.
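A minimal sketch of the bit rule for one time slot. It assumes that arrival times are given in milliseconds from the start of the slot and that "the millisecond of arrival" means the integer part of that time; the names `geiger_bit` and `fallback` are hypothetical.

```python
import random
from typing import Callable, Sequence


def geiger_bit(arrivals_ms: Sequence[float],
               fallback: Callable[[], int] = lambda: random.choice((1, -1))) -> int:
    """Bit for one 10 sec slot from the arrival times of the counter signals.

    arrivals_ms: arrival times within the slot, in msec from the slot start
    (0 <= t < 10000). Assumption: the parity is taken from the integer
    millisecond of the last arrival.
    """
    if not arrivals_ms:
        # Rare case: no counts in the slot, use a pseudorandom bit instead.
        return fallback()
    last_ms = int(max(arrivals_ms))
    return 1 if last_ms % 2 == 0 else -1


print(geiger_bit([12.3, 4410.0, 8731.9]))  # last arrival in msec 8731 (odd) -> -1
print(geiger_bit([4200.5]))                # msec 4200 (even) -> 1
```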

PSEUDO RANDOM
• Matsumoto's twisted generalized feedback shift register generator TT800, as described in his article in ACM Transactions on Modeling and Computer Simulation, Vol. 4, No. 3, 1994, pp. 254-266. Our C++ implementation is based on the C code by M. Matsumoto.
• The CA (Rule 30) pseudorandom generator patented by S. Wolfram (used with his kind permission). Despite its very simple and deterministic evolution rule, this CA generator consistently passes ever more sophisticated statistical tests (see, e.g., S. Wolfram, “A New Kind of Science”, Wolfram Media, Inc., 2002).
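To show how simple the Rule 30 evolution rule is, here is a minimal sketch. The cyclic register of 257 cells, the single-cell seed and the choice of reading the centre cell are assumptions made for illustration; the talk does not specify how the experiment's generator is set up.

```python
from itertools import islice
from typing import Iterator


def rule30_bits(width: int = 257) -> Iterator[int]:
    """Yield the centre cell of a cyclic Rule 30 automaton, one bit per step.

    Rule 30: new cell = left XOR (centre OR right).
    """
    if width < 3:
        raise ValueError("width must be at least 3")
    cells = [0] * width
    center = width // 2
    cells[center] = 1  # seed: a single live cell (assumption)
    while True:
        yield cells[center]
        # cells[i - 1] wraps around for i = 0, the modulo wraps the right edge.
        cells = [cells[i - 1] ^ (cells[i] | cells[(i + 1) % width])
                 for i in range(width)]


first = list(islice(rule30_bits(), 8))
print(first)                       # [1, 1, 0, 1, 1, 1, 0, 0]
print([2 * b - 1 for b in first])  # the same bits mapped to +1/-1
```

Because the register is cyclic, the output eventually repeats; a practical generator would use a much wider register.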

Testing “random” source properties: statistical test

In the statistical approach, randomness is treated as a probabilistic property. Any statistical test is formulated to test the hypothesis (H0) that the sequence being tested is random. To construct a new test, one needs to:
1) select a certain property of a “truly” random sequence;
2) estimate the relevant statistics, i.e. the distribution of the possible outcomes of the test under the assumption of randomness;
3) fix a certain significance level α (typically 0.01 or 0.001) to accept/reject H0.

For a given generated sequence the statistical test calculates a certain P-value, which can be interpreted as the probability that a “truly” random sequence will produce a result that is “less” random than that of the sequence being tested. Say, rejecting when P(s) < α = 0.01 means that about 1 “truly” random sequence out of 100 will be rejected by this test; thus the test outcome P(s) > 0.01 allows one to accept a given sequence as random at the 99% confidence level.

Statistical test: simplest common example. Frequency (Monobit) test

The purpose of the test is to determine whether the numbers of +1 and -1 in a sequence are approximately the same, as would be expected for a truly random sequence.

Test variable: S = abs(Σ S_i) / sqrt(N), where S_i is the i-th bit, the sum runs over all bits, and N is the total length.

Under H0 the distribution of S is half-normal, therefore P(S) = erfc(S/sqrt(2)), where erfc is the complementary error function.

P(S) < 0.01 rejects H0 at the 99% confidence level.
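The monobit test fits in a few lines. A minimal sketch, assuming the bits are given as +1/-1 values; the function name `monobit_p_value` is illustrative.

```python
import math
from typing import Sequence


def monobit_p_value(bits: Sequence[int]) -> float:
    """P-value of the frequency (monobit) test for a +1/-1 sequence."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty sequence")
    s = abs(sum(bits)) / math.sqrt(n)    # test variable S
    return math.erfc(s / math.sqrt(2))   # P(S) for a half-normal S


balanced = [+1, -1] * 50
biased = [+1] * 100
print(monobit_p_value(balanced))        # 1.0: perfectly balanced counts
print(monobit_p_value(biased) < 0.01)   # True: H0 is rejected
```

A perfectly balanced sequence such as the alternating one above passes this test even though it is obviously not random, which is one reason a whole battery of tests is needed.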

The battery of statistical tests
• Problem: there is an infinite number of possible tests. No specific finite set of tests is deemed “complete”. Negative example: the binary expansion of π passes all known statistical tests.
• The “practical” recipes: 1) use a “massive” and representative battery of statistical tests to estimate the random properties; 2) keep in mind that the results have to be interpreted with a certain “grain of salt”.

Useful ref.: http://csrc.nist.gov/rng/ (NIST Statistical Test Suite explained)

The battery constructed:
• Frequency (monobit) test
• 4 block frequency tests for the block sizes 4, 8, 16, 32
• Runs test
• 4 longest run tests for the block sizes 8, 128, 512, 1000
• 50 non-overlapping template matching tests for different bit patterns
• 12 serial tests for the template lengths 3-14 (each test in fact includes 2 different tests)
• 12 approximate entropy tests for the template lengths 2-13
• Random excursions variant test (in fact includes 18 tests for the different RW states from -9 to 9)

85 (114) statistical tests in total, aimed at verifying the absence of short-term correlations in the bit sequence.
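As an example of one further entry of the battery, here is a minimal sketch of the runs test, assuming the formulation given in the NIST suite description. The function name `runs_p_value` and the +1/-1 input convention are illustrative, and this is not the code used in the experiment.

```python
import math
from typing import Sequence


def runs_p_value(bits: Sequence[int]) -> float:
    """P-value of the runs test (NIST formulation) for a +1/-1 sequence."""
    n = len(bits)
    if n < 2:
        raise ValueError("need at least 2 bits")
    pi = sum(1 for b in bits if b == 1) / n  # proportion of +1
    # Prerequisite: the frequency test must not fail outright.
    if abs(pi - 0.5) >= 2.0 / math.sqrt(n) or pi * (1.0 - pi) == 0.0:
        return 0.0
    # V = number of runs = 1 + number of positions where the bit changes.
    v = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 2.0 * n * pi * (1.0 - pi)
    return math.erfc(abs(v - expected)
                     / (2.0 * math.sqrt(2.0 * n) * pi * (1.0 - pi)))


sample = [1 if c == "1" else -1 for c in "1001101011"]
print(round(runs_p_value(sample), 3))  # 0.147
```

Too few runs signals sequences that change direction too slowly, too many signals sequences that oscillate too fast; both push the P-value towards 0.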

Battery applied
• EURRAND: passes 33 (0.388), fails 52 (0.612)
• GEIGER: passes 39 (0.459), fails 46 (0.541)
• PSEUDO 1: passes 85 (1.000)
• PSEUDO 2: passes 84.5 (0.994), fails 0.5 (0.006) (fails one of the two tests included in the serial test for the template length 14)

Question: why is GEIGER so regular?

Selected results and lessons from the test run
• Final rules of the game:
1) Each participant initially receives a “starting capital” of 10000 units;
2) A participant may hold only one “1 lot” position on each “stock” at a time; the reward for one step is +/- 1 unit;
3) 3 stocks are available: EURRAND, GEIGER and PSEUDO. The trading results for each stock are analyzed separately.
• Main lesson: the EURRAND and GEIGER stocks are indeed predictable by both human and AI predictors. It is, however, a “tantalizing” exercise for a human being to demonstrate statistically significant performance: many noisy deals, dropped “bad” portfolios, a lengthy learning period, etc. (as in live trading). The main obstacle seems to be human psychology. Every successful run appears to take several weeks of hard, concentrated work.

Selected results and lessons from the test run
• It looks almost meaningless to analyze the statistics of human predictions “as a whole”. The data are too noisy and are spoiled by “lazy” and “frustrated” participants. It is more appropriate to concentrate on analyzing the results of each single participant.
• AI predictors “easily” outperform human ones on the predictable stocks EURRAND and GEIGER. The results for the PSEUDO “stock” are less clear-cut: all tested AI predictors definitely failed, as there are no (almost no) local correlations to exploit. At the same time, some “obstinate” humans “stubbornly” stayed at a significance of about 1 to 1.5 over a long “trading” period. We cannot report any positive results, but... perhaps there is something funny!

Main practical result
• The EURRAND stock seems to be “easily” predictable. Does this mean that the efficient market hypothesis is not valid? Not directly, because EURRAND is just a surrogate stock derived from the real EUR/USD. During the last year we have carried out many other investigations trying to answer this question, and now we can definitely answer “yes, it is not valid”. Indeed, the market data are locally correlated and contain exploitable patterns. We are definitely not alone and not the first: there are plenty of publications on this subject. We have just provided practical evidence in order to convince ourselves, and now we are trying to answer yet another practical question: is there enough predictability to beat the market in live trading?

Performance estimators
• A simple, robust estimator, valid for a large number of predictions: Signif = E(p) / sqrt(D(p)/N), where E(p) is the average “profit”, D(p) is the “profit” dispersion (variance) and N is the number of predictions.
• A more “descriptive” estimator based on subsequence selection (proposed by A.Duka). Let S be a bit sequence. For a given predictor P:
1) Select all segments of S that correspond to the history of P's “positions” (each position, i.e. prediction, has a starting time and a closing time);
2) In each segment multiply all bits by the prediction direction (+1 or -1);
3) Construct a new subsequence SP from the selected segments and represent it as a 1D RW.

The deviation R0 of the RW end point from 0 can then serve as a measure of the predictor's performance. As a quantitative estimate one can use, e.g., the probability that the end point of a “true” RW of the same length N has a deviation R equal to or above R0. For large N the following approximation is valid: P(R >= R0) ~ ½ * erfc(R0/sqrt(2*N))
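Both estimators can be sketched in a few lines. All names (`significance`, `selected_walk_endpoint`, `tail_probability`) and the representation of a position as a `(start, stop, direction)` triple are assumptions made for illustration; the population variance is used for D(p), which makes no practical difference for large N.

```python
import math
from statistics import mean, pvariance
from typing import Iterable, Sequence, Tuple


def significance(profits: Sequence[float]) -> float:
    """Signif = E(p) / sqrt(D(p) / N) over the per-position profits."""
    n = len(profits)
    return mean(profits) / math.sqrt(pvariance(profits) / n)


def selected_walk_endpoint(bits: Sequence[int],
                           positions: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """End point R0 and total length N of the subsequence SP.

    Each position is (start, stop, direction): it covers bits[start:stop],
    and every selected bit is multiplied by direction (+1 or -1).
    """
    r0 = 0
    length = 0
    for start, stop, direction in positions:
        segment = bits[start:stop]
        r0 += direction * sum(segment)
        length += len(segment)
    return r0, length


def tail_probability(r0: float, n: int) -> float:
    """Large-N approximation P(R >= R0) ~ 1/2 * erfc(R0 / sqrt(2 N))."""
    return 0.5 * math.erfc(r0 / math.sqrt(2.0 * n))


bits = [+1, -1, -1, +1, +1]
r0, n = selected_walk_endpoint(bits, [(0, 1, +1), (1, 3, -1)])
print(r0, n)                                # 3 3
print(tail_probability(0, 100))             # 0.5
print(tail_probability(414, 22400) < 0.01)  # True
```

`significance` is undefined when all profits are equal (zero variance); a real implementation would need to handle that case.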

Most “obstinate” human predictor
• Nickname: forecast
• Active period: 7.11.2003 – 16.11.2003 (9 days of hard work!)
• Number of positions (predictions): 1120 (610 “down”, 510 “up”)
• Total “profit”: 414; average: 0.37; significance = 2.85
• Average position length: 20 steps
• Effective length: 22400 steps; P(R >= 414) = 0.00279

General comments: the learning phase takes roughly one half of the total run, and the statistics for the second half look significantly better (1st half <P> ~ 0.23; 2nd half ~ 0.45). The performance is not stable locally; one can see some periods of “frustration”.

AI predictors: recurrent neural network

[Diagram: recurrent NN versus feed-forward NN, each with an input layer and an output layer]

Major difference: RNNs often exhibit very rich dynamics. An FNN just maps the input to the output, whereas an RNN can run forever after being triggered once by a single input signal. RNNs were found especially suitable for time series prediction, because they can develop a kind of long-term memory. The price is high: BackProp is still applicable, but formally it has to be run forever for a single input because of the feedback connections.

RNN: implementation details

The particular RNN architecture used for all prediction tests:
• Single input node, 6 hidden nodes (fully connected), one output node (6 + 6 + 6 + 36 + 1 + 1 = 56 real parameters);
• Activation function: symmetric sigmoid.

The RNN was trained to predict the next bit. A simple “auto” trader was designed to interpret the “recommendations” of the RNN predictor, in order to make the results comparable with those of the other (human) predictors.

Training sequence: 3000 bits
Test sequence: the next 3000 bits
10 different runs on randomly chosen 6000-bit subsequences for each data source (for EURRAND the weekends are excluded)
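To make the parameter count concrete, here is a minimal forward-pass sketch of a network of this size. Several points are assumptions, not taken from the talk: tanh is used as the symmetric sigmoid, the 56 parameters are read as 6 input weights, 6 hidden biases, 6 output weights, 36 recurrent weights, 1 output bias and 1 direct input-to-output weight, and the class name `TinyRNN` is hypothetical. Training is not shown.

```python
import math
from typing import List, Sequence

N_HIDDEN = 6
N_PARAMS = 6 + 6 + 6 + 36 + 1 + 1  # = 56, as on the slide


class TinyRNN:
    """1 input, 6 fully connected hidden nodes, 1 output (forward pass only)."""

    def __init__(self, params: Sequence[float]) -> None:
        if len(params) != N_PARAMS:
            raise ValueError(f"expected {N_PARAMS} parameters")
        p = list(params)
        # Assumed layout of the parameter vector (see the lead-in).
        self.w_in = p[0:6]                                        # input -> hidden
        self.b_hidden = p[6:12]                                   # hidden biases
        self.w_out = p[12:18]                                     # hidden -> output
        self.w_rec = [p[18 + 6 * i:24 + 6 * i] for i in range(6)]  # hidden -> hidden
        self.b_out = p[54]                                        # output bias
        self.w_direct = p[55]                                     # input -> output (assumed)
        self.state: List[float] = [0.0] * N_HIDDEN

    def step(self, x: float) -> float:
        """Feed one bit (+1/-1) and return the output, a value in (-1, 1)."""
        self.state = [
            math.tanh(self.w_in[i] * x + self.b_hidden[i]
                      + sum(w * s for w, s in zip(self.w_rec[i], self.state)))
            for i in range(N_HIDDEN)
        ]
        return math.tanh(self.b_out + self.w_direct * x
                         + sum(w * s for w, s in zip(self.w_out, self.state)))


net = TinyRNN([0.1] * N_PARAMS)
outputs = [net.step(bit) for bit in (+1, -1, -1, +1)]
print([1 if y >= 0 else -1 for y in outputs])  # sign of the output read as the predicted next bit
```

Because the hidden state is carried from step to step, the output for a given bit depends on the whole preceding sequence.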

RNN performance: EURRAND

Generation: 50

                           Training set   Validation set
Number of “deals”          1672           1670
Total profit               580            596
Number of “UP” deals       835            834
Number of “DOWN” deals     837            836
Average profit/deal        0.347          0.357
Std                        1.085          1.032
Significance               13.07          14.14
Total active length        2998           2998
Average position length    1.793          1.795

Validation set: P(R >= 596) < 10**(-27)

RNN performance: GEIGER

Generation: 150

                           Training set   Validation set
Number of “deals”          1097           1129
Total profit               436            358
Number of “UP” deals       548            564
Number of “DOWN” deals     549            565
Average profit/deal        0.397          0.317
Std                        1.940          1.763
Significance               6.79           6.04
Total active length        2996           2998
Average position length    2.731          2.655

Validation set: P(R >= 358) ~ 1.42 * 10**(-9)

RNN performance: PSEUDO RANDOM

Nothing was captured in 1000 generations! Performance on the validation set oscillates around 0. Results for 350 generations:

                           Training set   Validation set
Number of “deals”          1614           1639
Total profit               338            9
Average profit/deal        0.209          0.005
Std                        1.400          1.319
Significance               6.01           0.167
Total active length        2998           2995
Average position length    1.858          1.827

Human being and RNN: similarity and difference
• The RNN definitely outperforms the human predictors on the simple “predictable” sequences. The trading strategies are quite different: the average position length for the RNN predictor is about a factor of 10 shorter, and the profit/loss fluctuations are much higher for a human predictor. It seems that it is psychologically difficult for a human being to change (reverse) his (her) prediction. Does this mean that human psychology is always playing against the trader?
• The asymptotic performance is rather similar: it looks like both types of predictor can learn to exploit almost all the useful regularities in the sequence generated by the GEIGER source. Average “profit”/prediction: RNN ~ 0.31; human ~ 0.25 – 0.37.
• Computational complexity: the RNN complexity is perfectly controllable. The complexity of the human brain is of course much higher; however, this does not mean that the part of the brain “computer” allocated to the particular prediction task has a much higher complexity than the RNN predictor. An intriguing question arises: are the computational complexities of these predictors comparable?

Towards the practice: real market game in the zero transaction cost (ZTC) regime
• Weak efficient market hypothesis (WEMH): technical analysis is useless. The famous J.P. Morgan prediction: “Prices will fluctuate”.
• Technical analysis := any approach using only the information available from the time series itself. It can be quite sophisticated: different data representations (candles, P&F, Fourier transforms, wavelets, etc.); technical indicators; patterns, correlation analysis, etc.; but no fundamental analysis.

According to the WEMH, any predictor that tries to deduce the future behavior from the known past can play only an asymptotically zero-sum game when applied to the real market, even in the ZTC regime. Is it really so hopeless? It can be shown “easily” that the WEMH is in fact wrong.

A.Duka test portfolio
• 100 trading days
• 768 deals (~7.7 daily)
• Average profit: 130.24 $
• Std: 761.2 $
• Sharpe ratio: 0.171
• Significance: 4.74
• Variety of stocks
• Variety of different strategies

For details, see http://www.dukascopy.com

AITraders
• Keeping in mind the evident success of machine predictors applied to “trading” on the artificial stocks, one can think about more practical applications to the real market.
• There are plenty of promising modern AI techniques. The basic postulate (quite questionable indeed): any detected pattern can be exploited for some short period in the future; then it will disappear or even reverse into its negation. My particular favorites are adaptive multi-agent systems.

The basic ingredients of the “cooking”: evolutionary incremental learning; (almost) zero “hardwired” parameters; easy transition from the majority to the minority; easy forgetting of past successes; soft implementation of expert knowledge; an inhomogeneous agent committee.

AITraders: ZTC example 1

AITraders: ZTC example 2

From ZTC to real life: is it possible to beat the real market on a systematic basis?
• In the ZTC regime everything looks perfect.
• Does this mean that the market is beatable in real life, i.e. when realistic transaction costs are taken into account? Many strategies now become losers, or at least demonstrate rather modest performance. This is especially true for aggressive “scalping” strategies that try to exploit short-term correlations.
• But is it completely hopeless?

Dukascopy trading contest (an alternative “hard” Predicting Randomness experiment?)
• Has been running on a monthly basis since October 2003
• Already involves about 1200 participants (~150 monthly)
• The competition rules are simple: each trader receives a (virtual) capital of 50000 $ and tries to beat the real market in CFD trading. The trader with the largest final balance wins the monthly competition cycle.
• The trading conditions are identical to those of live trading (spreads, commissions, margins, etc.)

Dukascopy trading contest: statistics are available now

Even a brief analysis of the contest statistics gives a very strong impression that a certain, quite stable subgroup (say, about 10%) demonstrates amazing performance (i.e. prediction ability).

Dukascopy has recently kindly opened access to the (non-private) part of the competition statistics for scientific analysis.

Dukascopy trading contest: April results

Dukascopy trading contest: March results

About 600 positions!!!

AITraders in real regime: example 1

AITraders in real regime: example 2

Summary -1-
• We are ready to conduct the experiment at full scale. The starting date is “today”. We are planning one year of running.

For those who want to play against the artificial market, the registration procedure is similar to the standard registration of a demo/live account:
0) Go to the Dukascopy web page.
1) Fill in the Dukascopy live trading registration form, typing “experiment” in place of the bank details.
2) You will receive a login/password to access the deal station.
3) To launch the deal station, follow the link “CFD client entry” from the main Dukascopy page, type your login/password in the appropriate fields, then push the button “Enter DDS”.

That's it. You will be allowed to “trade” only the artificial quotes, located in the special quote folder “physics”. Your trades will be commission-free and subject to the rules adopted for the experiment.

Summary -2-
• “Grand prize”: 5000 $ credited to a live Dukascopy trading account (it can be withdrawn without any live trading :)

Future winner: the experiment participant who demonstrates the best performance in predicting the most difficult “market”, PSEUDO.

Necessary conditions:
1) The result has to be demonstrated in the real-time regime
2) At least 1000 predictions (positions)
3) Statistical significance >= 3

Decision date: 1 June 2005

Summary -3-

GRC is proposing a new research initiative. Its main goal is the development of next-generation trading/analysis software, the “Smart trader machine” (STM), which will help a trader to survive in the modern, hostile market environment.

The main design principles: STM will be designed in such a way that any useful pattern, approach, algorithm, etc. can be easily incorporated. Most likely STM will also be able to develop its own new knowledge and algorithms, and to support both advising and auto trading regimes. Another desirable (and, in principle, realizable) feature of this new software may be the ability to learn and adapt to the “style” of any particular human trader.

GRC is opening a research grant program for the next year (2005) to support this new initiative. Any reasonable and motivated approaches are welcome.

Very last comment

For the developers of new smart trading systems, Dukascopy has recently presented two new services:
1) A customized data feed, which allows one to integrate a reliable source of real-time and historical data into an application.
2) An auto execution service, which executes trading orders (in the demo regime) generated by a user application.

Both are implemented using Microsoft Web Service technology, which allows straightforward integration from virtually any programming language. Both services are free of charge for GRC participants.