
Programming Neural Networks and Fuzzy Systems in FOREX Trading



Presentation Transcript


1. Programming Neural Networks and Fuzzy Systems in FOREX Trading
Presentation 2
Balázs Kovács (Terminator 2), PhD Student, Faculty of Economics, University of Pécs, E-mail: kovacs.balazs.ktk@gmail.com
Dr. Gabor Pauler, Associate Professor, Department of Information Technology, Faculty of Science, University of Pécs, E-mail: pauler@t-online.hu

2. Content of the Presentation • Learning in Artificial Neural Networks: • Unsupervised learning • Hebbian associative rule • Necessary topology: Bidirectional Associative Memory, BAM • Activation of BAM • Algorithm of the Hebbian learning rule • Example, Criticism, Testing, Application in time series: TAM • Supervised learning • Delta learning rule • Necessary topology: Perceptron • Activation of the Perceptron • Algorithm of the Delta learning rule • Application, Testing, Criticism • Backpropagation learning rule • Necessary topology: Multi-layer perceptron • Activation of the Multi-layer perceptron • Algorithm of the Backpropagation learning rule • Applications and Testing: • Character recognition and its biological analogy • Head&Shoulders • Advantages, Limitations • Validation of Artificial Neural Networks • Unsupervised learning • Supervised learning: Simple validation and Cross-validation • Sample application: Control Function Simulator • Home Assignment 2: Character design in the Control Function Simulator • References

3. Learning in Artificial Neural Networks: Basic definitions
• Learning (Tanulás): modifying the initially random w_ij synaptic weights of the network to get the desired output from inputs during neural activation.
• Sample database / Teaching Sample / Pattern Set (Tanítóminta): y_j = {y_ij, i=1..n; y_oj, o=1..O}, j=1..m vectors of previously observed occurrences of the i=1..n input and o=1..O output variables.
• g Learning Rate (Tanulási ráta): a [0,1] continuous value expressing the affinity of the network to learn new information, overwriting old information.
• A high learning rate means faster but less stable learning and weaker long-term memory: g=1 means no memory at all, just remembering the last sample (e.g. Alzheimer's disease).
• A low learning rate means more stable memory and learning, but it makes learning unreasonably slow: g=0 means no learning, just memory (e.g. an elephant).
• Learning Epoch (Tanítási ciklus/epoch): to resolve the dilemma above, instead of using a high learning rate we use a smaller one, but repeat learning the sample database in e=1..E epochs (an epoch-loop sketch follows below).
• (–) This results in a much higher computation requirement than activation.
• (+) But it enables more stable learning.
• Unsupervised Learning (Nem felügyelt tanulási módszerek): here we do not have o=1..O desired output variables for the i=1..n inputs.
• Instead, we expect the network to recognize Similarity/Proximity (Hasonlóság) based Groups/Clusters (Klaszter) of input data,
• and to fit a new input sample to the most similar group as the output of the network.
• (E.g. FOREX brokers may form several quite well-separated groups by Risk aversion (Kockázatkerülés) and Aggressivity (Agresszivitás), heavily influencing their trading behavior. We expect the network to recognize these groups during learning. When a new broker comes with his aversion and aggressivity as input, the network should give back the typical values of the most similar group as output.)
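A minimal numpy sketch of the epoch idea above: the learning rate g exponentially blends new information into the old weights, and repeating the sample database over several epochs replaces one pass with a large g. The function and parameter names are illustrative, and `update` stands for whichever rule (Hebbian, Delta, ...) is plugged in later.

```python
import numpy as np

def train(samples, n_neurons, update, g=0.1, n_epochs=50):
    """Generic epoch loop: repeat the m-sample database E times with a
    small learning rate g instead of one pass with a large one."""
    w = np.random.uniform(-0.1, 0.1, (n_neurons, n_neurons))  # initially random weights
    for e in range(n_epochs):                  # e = 1..E epochs
        for y in samples:                      # j = 1..m samples
            target = update(w, y)              # rule-specific "new information" term
            w = g * target + (1.0 - g) * w     # g=1: only the last sample remains; g=0: no learning
    return w
```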

4. Learning in ANN: Unsupervised Learning: Hebbian Rule
[I/O diagram of the BAM network: neurons S_i, S_k with bidirectional weights w_ik, w_ki, self-weights w_ii, w_kk, membrane values x_i^t and signals s_k(x_k^(t+1)j)]
• Hebbian Rule of Associative Learning (Hebb-féle asszociatív tanulási törvény):
• Necessary network topology: Bidirectional Associative Memory, BAM (Kétirányú Asszociatív Memória):
• A single neuron field with i,k=1..n neurons, fully intra-connected with i↔k bidirectional connections.
• Aggregation of neurons is additive, and they have a sigmoid signal function.
• Neurons can be ordered spatially in a 1- or 2-dimensional grid.
• This is the most general type of network; other topologies are special subcases of it.
• Activation of the BAM network (sketched in code below):
• As the BAM is a self-feedback network, its activation initiated by the y_ij inputs is cyclic in t=1..T time periods:
• STEP 1: Load an input sample into the field: x_i^1j = y_ij, i=1..n, j=1..m
• STEP 2: This results in cyclic weighted signal aggregation: x_k^(t+1)j = Σ_i(w_ik × s_i(x_i^tj)) / Σ_i(w_ik),  i,k=1..n, t=1..T, j=1..m  (2.1)
• There can be 2 outcomes: it converges to a stable state, or it does not. (The feedback is denoted by a loop symbol on the I/O diagram of the BAM network.)
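A sketch of the cyclic activation of eq. (2.1), assuming a sigmoid signal function and (for the normalization to behave) predominantly positive weights; the names `bam_activate`, `tol` and `T` are illustrative, not from the course material.

```python
import numpy as np

def sigmoid(x, slope=1.0, threshold=0.0):
    return 1.0 / (1.0 + np.exp(-slope * (x - threshold)))

def bam_activate(w, y, T=50, tol=1e-6):
    """Eq. (2.1): x_k(t+1) = sum_i(w_ik * s_i(x_i(t))) / sum_i(w_ik),
    repeated until the self-feedback converges to a stable state or T runs out."""
    x = y.copy()                                   # STEP 1: load the input sample into the field
    for _ in range(T):                             # STEP 2: cyclic weighted signal aggregation
        x_new = (w.T @ sigmoid(x)) / w.sum(axis=0)
        if np.max(np.abs(x_new - x)) < tol:        # converged to a stable state
            return x_new, True
        x = x_new
    return x, False                                # did not converge within T periods
```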

5. Learning in ANN: Unsupervised: Hebbian: Learning Algorithm
[Diagram: signals s_i(y_ij), s_k(y_kj) of neurons i and k, their products s_ij^e × s_kj^e, and the resulting weight changes Δw_ik^je]
• In e=1..E epochs, the j=1..m samples are cyclically loaded into the membrane values of the neurons, and the network reacts with its activation.
• Then it modifies the w_ik^je weight of the i↔k connection of neurons i and k (which was initially random)
• with the multiplication of the s_ij^e and s_kj^e signals of neurons i and k, proportionally to the g_ik^e learning rate (sketched in code below):
FOR e=1..E DO /Epochs
  FOR j=1..m DO /Samples
    w_ik^(j,e+1) = g_ik^e × s_ij^e × s_kj^e + (1−g_ik^e) × w_ik^je, i,k=1..n  (2.2)
    Where: s_ij^e = s_i(y_ij), s_kj^e = s_k(y_kj) – signals of neurons i and k at sample j
           y_ij, y_kj – sample j loaded into the membranes of neurons i and k
  NEXT j /Next sample
NEXT e /Next epoch
• Multiplication of signals acts as a kind of AND operator between them: the result is only nonzero if BOTH of them are nonzero (e.g. in Pavlov's famous experiment the dog learnt that food AND bell usually come together, so later when the bell rang, it expected food).
• That is why multiplication describes association.
• BAM weights are analogues of Pearson correlation coefficients (Korrelációs együttható) in statistics.
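A numpy sketch of the Hebbian update (2.2): the outer product of the signal vector with itself gives the s_i × s_k "AND" association for every neuron pair at once, blended into the old weights by the learning rate. Function names and default values are illustrative assumptions.

```python
import numpy as np

def hebbian_learn(samples, g=0.05, n_epochs=100, slope=1.0):
    """Train a BAM weight matrix on samples of shape (m, n) with eq. (2.2)."""
    n = samples.shape[1]
    w = np.random.uniform(-0.1, 0.1, (n, n))              # initially random w_ik
    s = lambda x: 1.0 / (1.0 + np.exp(-slope * x))        # sigmoid signal function
    for e in range(n_epochs):                             # e = 1..E epochs
        for y in samples:                                  # j = 1..m samples
            sig = s(y)                                     # s_i(y_ij), i = 1..n
            w = g * np.outer(sig, sig) + (1.0 - g) * w     # eq. (2.2): association = s_i * s_k
    return w
```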

6. Learning in ANN: Unsupervised: Hebbian: Application example
• The Excel file MatrixSim.xls shows a simulation of a simple BAM-based Character Recognizer (Karakterfelismerő).
• Character samples are loaded into the membrane values of a 16-neuron BAM field ordered in a 4×4 pixel matrix (see the blue pixels related to the neurons).
• Signals of neurons are weighted through fully intra-connected synapses shown by a 16×16 quadratic matrix (yellow shades are proportional to the weights).
• Weights are totally random initially. Then the pictures of the characters „A", „B", „C", „D" are trained over hundreds of epochs, changing the weights.
• (+) After training, when the network gets a picture of a blurred or partially missing „A", within a few iterations it converges to the learnt sample „A" and stabilizes there.
• (–) BAM wastes computing resources tremendously: theoretically, with 16 neurons it could store 2^16 images, but in reality it can learn about 4: teaching more images results in confused recognition. Why?

7. Learning in ANN: Unsupervised: Hebbian: Testing measures
• Activation Convergence (Aktiváció konvergenciája): how quickly the BAM converges to a stable response when sample j is loaded gives a quality measure of learning: the y_j samples learnt create deeper and deeper „valleys" on the surface of the Ljapunov Energy Function (Ljapunov-energiafüggvény) of the BAM:
L_j^e = Σ_i Σ_k S_i(x_i) × S_k(x_k) × w_ik  (2.3)
• The better a sample is learnt, the deeper and more distinct the valley it creates, and new x_j input vectors to be recognized „roll down" faster into the nearest valley of the samples learnt.
• Recognition Efficiency (Helyesen felismert minták aránya): the percentage of the y_j samples learnt that are correctly recognized from input vectors. At a Crosstalk (Összetévesztés más mintával) the input rolls down into the valley of a nearby wrong sample.
• Learning Convergence (Tanulás konvergenciája): the distances between the x_j input vectors and the y* recognized sample are summed into the SSE_j^e aggregated error in each e=1..E epoch. As learning converges, it should gradually decrease below a predefined SSE* error threshold over the epochs:
SSE_j^e = Σ_i(y_ji − y_i*)²  (2.4)
(Both measures are sketched in code below.)
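A small sketch of the two measures, assuming the same sigmoid signal function as above and taking eq. (2.3) exactly as written on the slide (the conventional Lyapunov energy is its negative); the function names are illustrative.

```python
import numpy as np

def lyapunov_energy(w, x, s):
    """Eq. (2.3): L = sum_i sum_k s_i(x_i) * s_k(x_k) * w_ik,
    evaluated at state x with signal function s."""
    sig = s(x)
    return float(sig @ w @ sig)

def recognition_sse(y_recognized, y_learnt):
    """Eq. (2.4): squared distance between the recalled vector and the learnt sample."""
    return float(np.sum((y_recognized - y_learnt) ** 2))
```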

8. Learning in ANN: Unsupervised: Hebbian: Application in time series: TAM
[Diagram: a one-dimensional chain of BAM neurons S_k with weights w_ik, w_ki and self-weights w_ii]
• Temporal Associative Memory, TAM (Temporális asszociatív memória): both in stock exchange/FOREX and in technical fields (e.g. in voice recognition) there can be a problem of recognizing complex waveforms resulting from the interaction of different-frequency movements embedded into each other, then Forecasting (Előrejelez) them by learning a historic sample database and making an Extrapolation (Előrevetítés) into the future.
• A TAM is a BAM with a one-dimensional neuron field,
• getting n-element time windows from an m-element historic time series sample,
• with the window sliding forward along it one step at a time in t = 1..T time periods.
• During training, we slide the window along the series in e=1..E epochs.
• In activation, we let the TAM overrun the end of the series and compute extrapolated values by its own iteration (sketched in code below).
• TAM is the equivalent of Autoregression, AR (Autoregressziós) models in statistics, but it is not hindered by Multicollinearity (Multikolinearitás).
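A sketch of how a TAM could be fed and then run past the end of the series: `sliding_windows` cuts the n-element training windows, and `tam_forecast` reuses a recall routine such as the `bam_activate` sketch above to extrapolate. All names and the "last element of the recalled window is the next value" convention are assumptions for illustration.

```python
import numpy as np

def sliding_windows(series, n):
    """Slide an n-element time window over the m-element series one step at a time."""
    return np.array([series[t:t + n] for t in range(len(series) - n + 1)])

def tam_forecast(w, series, n, horizon, activate):
    """Let the trained TAM overrun the end of the series: each recalled window
    supplies the next extrapolated value, which is appended and fed back."""
    history = list(series)
    for _ in range(horizon):
        window = np.array(history[-n:])
        recalled, _ = activate(w, window)          # TAM recall of the current window
        history.append(float(recalled[-1]))        # last element = extrapolated next value
    return history[len(series):]
```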

9. Content of the Presentation • Learning in Artificial Neural Networks: • Unsupervised learning • Hebbian associative rule • Necessary topology: Bidirectional Associative Memory, BAM • Activation of BAM • Algorithm of the Hebbian learning rule • Example, Criticism, Testing, Application in time series: TAM • Supervised learning • Delta learning rule • Necessary topology: Perceptron • Activation of the Perceptron • Algorithm of the Delta learning rule • Application, Testing, Criticism • Backpropagation learning rule • Necessary topology: Multi-layer perceptron • Activation of the Multi-layer perceptron • Algorithm of the Backpropagation learning rule • Applications and Testing: • Character recognition and its biological analogy • Head&Shoulders • Advantages, Limitations • Validation of Artificial Neural Networks • Unsupervised learning • Supervised learning: Simple validation and Cross-validation • Sample application: Control Function Simulator • Home Assignment 2: Character design in the Control Function Simulator • References

10. Learning in ANN: Supervised learning: Delta rule: Topology
[I/O diagram of the perceptron: input neurons S_i with signals s_i(y_ij), full feedforward weights w_io, output neurons S_o with membrane values x_oj and signals s_o(x_oj)]
• As unsupervised learning is an Explorative (Feltáró) tool of the internal structure of the data, there is no guarantee that we can use its results to solve our estimation problem at all.
• Supervised learning (Felügyelt tanulás): always teaches the network a sample database of y_j = {y_ij, i=1..n; y_oj, o=1..O}, j=1..m observed vectors of the i=1..n input and o=1..O output variables, and then estimates the y_o*, o=1..O outputs from the y_i*, i=1..n new inputs. Therefore it is used more frequently in distribution-free estimators than unsupervised learning.
• The simplest supervised learning is the Delta Rule (Delta szabály): it is used alone infrequently, but it will be the basic building block of the heavily used backpropagation networks later.
• Necessary topology: Perceptron (Perceptron): a Dual-Layer (Kétmezős) network of i=1..n linear input neurons and o=1..O sigmoid, additive output neurons, connected Full Feedforward (Teljes előrecsatolás) with i→o inter-field synapses. No feedback or intra-field synapses!
• Activation of the perceptron (sketched in code below): it computes the x_oj output membrane values in one step from the inputs y_ij:
x_oj = Σ_i(w_io × s_i(y_ij)) / Σ_i(w_io),  i=1..n, o=1..O, j=1..m  (2.5)
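A one-line numpy sketch of the one-step activation (2.5), assuming linear input neurons (so s_i(y_i) = y_i) and a sigmoid output signal; the function name and defaults are illustrative.

```python
import numpy as np

def perceptron_activate(w_io, y, slope=1.0):
    """Eq. (2.5): x_o = sum_i(w_io * s_i(y_i)) / sum_i(w_io); the output field
    then applies its sigmoid signal function to the membrane values."""
    x_o = (w_io.T @ y) / w_io.sum(axis=0)          # weighted-average aggregation
    return 1.0 / (1.0 + np.exp(-slope * x_o))      # sigmoid output signal
```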

11. Learning in ANN: Supervised learning: Delta rule: Algorithm
[Diagram: input signals s_i(y_ij), output errors D_oj^e, their products s_i(y_ij) × D_oj^e, and the resulting weight changes Δw_io^je]
• In e=1..E epochs, the y_j, j=1..m samples are cyclically loaded into the membrane values of the i=1..n input neurons, and the network is activated as described before.
• Then we modify the initially random w_io^je weights of the i→o connections between neurons i and o, proportionally to the learning rate g_io^e, with the result of a multiplication:
• the s_i(y_ij) signal at input i is multiplied with
• the positive/negative D_oj^e Delta-error (Hiba) of output neuron o, which is itself a multiplication of:
• the +/− difference between the x_oj^e real and the y_oj desired output value at output neuron o, and
• s'_o(y_oj), the first-order partial derivative of the signal function at output neuron o.
FOR e=1..E DO /Epochs
  FOR j=1..m DO /Samples
    w_io^(j,e+1) = g_io^e × s_i(y_ij) × D_oj^e + (1−g_io^e) × w_io^je, i=1..n, o=1..O  (2.6)
    Where: D_oj^e = s'_o(y_oj) × (y_oj − x_oj^e) error  (2.7)
           s'_o(y_oj) – first-order partial derivative of the signal function of output neuron o
           y_oj – desired output membrane values
           x_oj^e – real output membrane values
  NEXT j /Next sample
NEXT e /Next epoch
• The Delta rule is logically an AND operator between the input signal and the error signal fed back from the output. (A training-loop sketch follows below.)
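A numpy sketch of the full Delta-rule loop, eqs. (2.5)–(2.7), with the sigmoid derivative s' = slope·s·(1−s) and the derivative evaluated at the desired output as written in (2.7). Names and defaults are illustrative.

```python
import numpy as np

def delta_train(inputs, targets, g=0.01, n_epochs=200, slope=1.0):
    """inputs: (m, n) samples, targets: (m, O) desired outputs."""
    n, O = inputs.shape[1], targets.shape[1]
    w = np.random.uniform(-0.1, 0.1, (n, O))                   # initially random w_io
    s  = lambda x: 1.0 / (1.0 + np.exp(-slope * x))
    ds = lambda x: slope * s(x) * (1.0 - s(x))                  # s'_o, sigmoid derivative
    for e in range(n_epochs):                                   # e = 1..E
        for y_in, y_out in zip(inputs, targets):                # j = 1..m
            x_o = (w.T @ y_in) / w.sum(axis=0)                  # activation, eq. (2.5)
            d_o = ds(y_out) * (y_out - x_o)                     # eq. (2.7): D_o = s'_o(y_o)*(y_o - x_o)
            w = g * np.outer(y_in, d_o) + (1.0 - g) * w         # eq. (2.6)
    return w
```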

12. Learning in ANN: Supervised learning: Delta rule: Application, Testing, Criticism
• Application of 2-layer perceptrons:
• We show 2 input and 2 output neurons connected full feedforward on the Control Function Diagram: you can see there the S_o signal function of the output neurons plotted over the (x, y) input values.
• We can conclude that output neurons can learn „directions of growth" or „low/high" Halfspaces (Féltér) in decision space:
• If the b_o slope of their signal function is high, they are Crisp (Éles) halfspaces.
• If the b_o slope is low, they are Fuzzy (Fuzzy) halfspaces with a blurred border.
• The 2-layer perceptron is the analogue of Factor Analysis (Faktoranalízis) in statistics.
• Testing of the 2-layer perceptron:
• Sum of Squared Errors, SSE (Összesített négyzetes hiba): squaring eliminates the +/− signs, which would otherwise cancel each other out: SSE_j^e = Σ_o(D_oj^e)², j=1..m, e=1..E  (2.8)
• Convergence of learning (Tanulás konvergenciája): how SSE_j^e decreases over the e=1..E epochs. We repeat epochs until it goes below a predefined SSE* threshold.
• Evaluation of the 2-layer perceptron:
• (+) It is very fast in activation and learning.
• (–) But it is used alone very seldom, as it cannot learn difficult things at all.

13. Content of the Presentation • Learning in Artificial Neural Networks: • Unsupervised learning • Hebbian associative rule • Necessary topology: Bidirectional Associative Memory, BAM • Activation of BAM • Algorithm of the Hebbian learning rule • Example, Criticism, Testing, Application in time series: TAM • Supervised learning • Delta learning rule • Necessary topology: Perceptron • Activation of the Perceptron • Algorithm of the Delta learning rule • Application, Testing, Criticism • Backpropagation learning rule • Necessary topology: Multi-layer perceptron • Activation of the Multi-layer perceptron • Algorithm of the Backpropagation learning rule • Applications and Testing: • Character recognition and its biological analogy • Head&Shoulders • Advantages, Limitations • Validation of Artificial Neural Networks • Unsupervised learning • Supervised learning: Simple validation and Cross-validation • Sample application: Control Function Simulator • Home Assignment 2: Character design in the Control Function Simulator • References

14. Learning in ANN: Supervised learning: Backpropagation rule
[I/O diagram of the multi-layer perceptron: input neurons S_i, hidden neurons P_h and output neurons connected by w_ih and w_ho weights]
• Backpropagation learning (Visszacsatoló tanulás) is the most heavily used in industrial/financial applications.
• Necessary topology: Multi-Layer Perceptron (Több-mezős perceptron):
• There is an i=1..n linear input layer, an h=1..H additive sigmoid hidden layer and an o=1..O additive or multiplicative sigmoid output layer, connected with i→h and h→o full feedforward connections.
• No intra-field or feedback connections in activation.
• But during learning the feedforward connections turn bidirectional and feed the error signal back from the forward layer to the previous one; that is why it is called backpropagation.
• The number of input neurons equals the number of input variables (usually 10–100).
• The number of output neurons depends on the type of output:
• 1 neuron: estimation of a single scalar value
• 3–5 neurons: estimation of a small number of scalars
• O neurons: classification of inputs into O discrete categories (classified to the highest output signal), e.g. „Buy", „Sell", „Do nothing"
• There is no exact rule for the number of hidden neurons: in general, more hidden neurons mean a more exact approximation, but too many of them eat up machine resources and may decrease the efficiency of estimation. The optimal number of hidden neurons depends on the given problem and sample database and is found by making numerous teaching trials.

15. Learning in ANN: Supervised learning: Backpropagation: Activation
[Diagram: input signals s_i(y_ij) weighted by w_ih into hidden membranes x_hj, hidden signals s_h(x_hj) weighted by w_ho into output membranes x_oj and signals s_o(x_oj)]
• Activation of the multi-layer perceptron (sketched in code below):
• First the x_hj hidden membrane values are computed from the y_ij input samples:
x_hj = Σ_i(w_ih × s_i(y_ij)) / Σ_i(w_ih),  i=1..n, h=1..H, j=1..m  (2.9)
• Then, from the x_hj hidden membrane values, it computes the x_oj output membrane values:
x_oj = Π_h(s_h(x_hj)^w_ho)^(1/Σ_h(w_ho)),  h=1..H, o=1..O, j=1..m  (2.10)
• Finally the x_oj output membrane values are transformed into the s_oj output signal values:
s_oj = S_o(x_oj),  o=1..O, j=1..m  (2.11)
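A numpy sketch of the forward pass (2.9)–(2.11): an additive (weighted-average) hidden field and a multiplicative (weighted geometric mean) output field. Positive h→o weights are assumed so the geometric mean is well defined; names and defaults are illustrative.

```python
import numpy as np

def mlp_activate(w_ih, w_ho, y, slope=1.0):
    """w_ih: (n, H), w_ho: (H, O), y: (n,) input sample."""
    s = lambda x: 1.0 / (1.0 + np.exp(-slope * x))
    # Eq. (2.9): additive hidden field, weighted average of the (linear) input signals
    x_h = (w_ih.T @ y) / w_ih.sum(axis=0)
    # Eq. (2.10): multiplicative output field, weighted geometric mean of hidden signals
    x_o = np.prod(s(x_h)[:, None] ** w_ho, axis=0) ** (1.0 / w_ho.sum(axis=0))
    # Eq. (2.11): output signal values
    return s(x_o)
```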

16. Learning in ANN: Supervised learning: Backpropagation: Algorithm
[Diagram: output errors D_oj^e fed back through the w_ho connections to form hidden errors D_hj^e and the weight changes Δw_ho, Δw_ih]
FOR e=1..E DO /Epochs
  FOR j=1..m DO /Samples
    STEP 1: The h→o hidden→output connection weights are modified with the Delta rule:
      w_ho^(j,e+1) = g_ho^e × s_h(x_hj^e) × D_oj^e + (1−g_ho^e) × w_ho^je, h=1..H, o=1..O  (2.12)
      where: D_oj^e = s'_o(y_oj) × (y_oj − x_oj^e)  (2.13)
    STEP 2: The D_oj^e output error signals are fed back through the h→o connections, and the h=1..H hidden neurons aggregate them weighted:
      s_h^(j+1)(e+1) = s_h^je + D_hj^e, h=1..H  (2.14)
      where: D_hj^e = Σ_o(w_ho^(j,e+1) × D_oj^e) / Σ_o(w_ho^(j,e+1)), o=1..O  (2.15) – if neuron h has additive aggregation
             D_hj^e = Π_o((D_oj^e)^w_ho^(j,e+1))^(1/Σ_o(w_ho^(j,e+1))), o=1..O  (2.16) – if neuron h has multiplicative aggregation
    STEP 3: The i→h input→hidden connections are modified by the Delta rule:
      w_ih^(j,e+1) = g_ih^e × s_ij^e × D_hj^e + (1−g_ih^e) × w_ih^je, i=1..n, h=1..H  (2.17)
      where: s_ij^e = s_i(y_ij)  (2.18)
             y_ij – the value of sample j at neuron i
  NEXT j /Next sample
NEXT e /Next epoch
(A single-sample sketch of the three steps follows below.)
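A numpy sketch of one sample of the loop above, following STEPs 1–3. For simplicity it assumes additive aggregation everywhere (so eq. (2.15) is used for the hidden errors, and an additive output is used in the forward pass instead of the multiplicative form (2.10)); names and defaults are illustrative.

```python
import numpy as np

def backprop_step(w_ih, w_ho, y_in, y_out, g=0.01, slope=1.0):
    """One sample j of one epoch e; returns the updated (w_ih, w_ho)."""
    s  = lambda x: 1.0 / (1.0 + np.exp(-slope * x))
    ds = lambda x: slope * s(x) * (1.0 - s(x))
    x_h = (w_ih.T @ y_in) / w_ih.sum(axis=0)                    # forward pass, eq. (2.9)
    s_h = s(x_h)
    x_o = (w_ho.T @ s_h) / w_ho.sum(axis=0)                     # additive output assumed here
    # STEP 1: hidden->output weights by the Delta rule, eqs. (2.12)-(2.13)
    d_o = ds(y_out) * (y_out - x_o)
    w_ho_new = g * np.outer(s_h, d_o) + (1.0 - g) * w_ho
    # STEP 2: feed the output errors back through the new weights, eq. (2.15)
    d_h = (w_ho_new @ d_o) / w_ho_new.sum(axis=1)
    # STEP 3: input->hidden weights by the Delta rule, eq. (2.17)
    w_ih_new = g * np.outer(y_in, d_h) + (1.0 - g) * w_ih
    return w_ih_new, w_ho_new
```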

17. Learning in ANN: Supervised learning: Backpropagation: Application, Testing
• Applications of the multi-layer perceptron:
• We can see on the Control Function Diagram of a 3-layer (2 input / 2 hidden / 2 output neurons) perceptron that adding a 3rd layer with multiplicative aggregation enables the system to recognize Convex Hyperpolyhedrons (Konvex sokdimenziós sokszögtest) in decision space as weighted combinations of the halfspaces modelled by the 2nd layer. If the halfspaces are fuzzy because of the low slope of the 2nd-layer signal functions, we get Convex Fuzzy Hyperpolyhedrons (Konvex Fuzzy Hiperpoliéder), strongly resembling the fuzzy rules described earlier, except that their blurred borders can lie at any angle to the coordinate axes instead of being parallel/orthogonal. That is why they are called Fuzzy Factor Rules, FFR (Fuzzy Faktor Szabályok) („factor" here means angular).
• The 3-layer perceptron is analogous to Discriminant Analysis, DA (Diszkriminancia Analízis) in statistics: it can estimate the predefined group membership of a new input after learning a sample database of observations classified into several groups. It has 2 advantages over DA:
• Groups do not have to be Normally distributed (Normális eloszlású).
• DA can separate groups in decision space only into Convex Hypercones (Konvex Hiperkúp), like slices of a round pie, while here the groups can take any layout, like fragments of a mosaic.
• Testing of the multi-layer perceptron:
• Sum of Squared Errors, SSE (Össznégyzetes hiba): SSE_j^e = Σ_o(D_oj^e)², j=1..m, e=1..E  (2.8)
• Convergence of learning (Tanulás konvergenciája): how SSE_j^e decreases over the e=1..E epochs. We repeat epochs until it goes below a preset SSE* threshold.

18. Learning in ANN: Supervised: Backprop: Application: Character recognition
• If there is a 4th layer, it can recognize Concave Fuzzy Hyperpolyhedrons (Konkáv fuzzy hiperpoliéder) from weighted combinations of the convex ones at the 3rd layer. To make this understandable, let us give you a character recognition example: we want to recognize scanned characters from series of (x, y) coordinates of ink dots. If – instead of using initially random weights – we take the time to set the connection weights of the 2 hidden layers manually, we can create a „building set" using just 18 neurons there, from which ANY of the 26 Latin characters can be modelled reasonably well by adding just 1 extra neuron in the output layer.
• One can compare this with BAM character recognition, where for n characters we need n² neurons.
• More importantly, BAM could recognize characters only from perfectly positioned pictures and could not handle any sliding or rotation,
• while here the characters have a more generalized spatial layout model given by the concave fuzzy hyperpolyhedrons.
• This was first used in postal zip-code recognition in the 1970s.

19. Learning in ANN: Supervised: Backpropagation: Biological analogy
• But this was just a measly 4 layers! You can imagine what the human brain can do when layer IVC of V1, the Primary Visual Cortex (Elsődleges látókéreg), has hundreds of fields stacked above each other to recognize directions in pictures, built from millions of Pyramidal Cells (Piramis-sejtek): their synapses cover a pyramid-shaped region and they are believed to be the building blocks of feedforward connections.

20. Learning in ANN: Supervised: Backprop: Application: Head & Shoulders
• Now one can ask: „Are we at a biology seminar? How will I make money from this at FOREX?!" You will...
• Because most complex forecasting problems can be reduced to recognizing difficult shapes:
• At the stock exchange there is a famous cap-like pattern of consecutive stock prices in time called „Head & Shoulders" (not the shampoo!), signalling that a bull trend will soon turn to bear.
• But using the original time series as the sample, it is hard to capture it in the right phase; the start is blurred.
• Therefore we slide e.g. a 3-period time window (t1, t2, t3) along the series one step at a time, and each time window forms a coordinate point in a 3-dimensional system (sketched in code below).
• We collect data of a bunch of H&S and reversed H&S patterns into a sample database and teach it to a 4-layer perceptron, which has only 2 output neurons: H&S and reversed H&S.
• During learning, it builds 2 concave fuzzy polyhedrons to identify their regions in the (t1, t2, t3) input space.
• If a new input falls in the middle of one of them, its output neuron signals to warn the broker to step in on time.
[3-D scatter plot of the (t1, t2, t3) price windows]
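A sketch of turning a price series into the (t1, t2, t3) coordinate points described above, assuming the pattern labels (e.g. 0 = H&S, 1 = reversed H&S) are supplied per window position by the analyst; names and the label encoding are illustrative.

```python
import numpy as np

def window_samples(prices, labels, width=3):
    """Slide a `width`-period window over the price series one step at a time;
    each window becomes one point in (t1, t2, t3, ...) input space, paired with
    the pattern label supplied for that window position."""
    X = np.array([prices[t:t + width] for t in range(len(prices) - width + 1)])
    y = np.array(labels[:len(X)])                 # one label per window position
    return X, y

# Usage sketch: feed (X, y) as the sample database of a 4-layer perceptron with
# 2 output neurons; the neuron with the highest signal flags the pattern.
```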

21. Content of the Presentation • Learning in Artificial Neural Networks: • Unsupervised learning • Hebbian associative rule • Necessary topology: Bidirectional Associative Memory, BAM • Activation of BAM • Algorithm of the Hebbian learning rule • Example, Criticism, Testing, Application in time series: TAM • Supervised learning • Delta learning rule • Necessary topology: Perceptron • Activation of the Perceptron • Algorithm of the Delta learning rule • Application, Testing, Criticism • Backpropagation learning rule • Necessary topology: Multi-layer perceptron • Activation of the Multi-layer perceptron • Algorithm of the Backpropagation learning rule • Applications and Testing: • Character recognition and its biological analogy • Head&Shoulders • Advantages, Limitations • Validation of Artificial Neural Networks • Unsupervised learning • Supervised learning: Simple validation and Cross-validation • Sample application: Control Function Simulator • Home Assignment 2: Character design in the Control Function Simulator • References

22. Learning in ANN: Supervised learning: Backpropagation: Advantages
[Diagram: clusters in the (x1, x2) decision space]
• Multi-layer perceptrons are analogues of the following statistical methods:
• 3-layer: K-means Clustering (K-közép klaszterezés): groups observations into a preset number of groups; it can separate them into convex polyhedrons. It can recognize concave-shaped Spurious Clusters (Elnyújtott klaszter) only by assembling them from several convex ones, but the convergence of the algorithm is very uncertain in this case.
• 4-layer: Hierarchic clustering (Hierarchikus): in each step it agglomerates the 2 most similar observations/subgroups into one group, until the information loss caused by agglomeration jumps up. If similarity is based on the Nearest Neighbour (Legközelebbi szomszéd) members of subgroups, it can detect spurious clusters well, but it is extremely sensitive to Outlier (Szélsőséges) observations: it results in very unevenly sized groups, putting a single outlier into a separate group!
• (+) The concave fuzzy factor rules of backpropagation at the 4th layer can recognize spurious clusters without being prone to outlier distortions.
• (+) Although CRT-type algorithms can also model spurious clusters pretty well, the fuzzy factor rules of the multi-layer perceptron can take any shape, so they can be more effective at detecting transversal parts of Hyper-surfaces (Hiperfelületek) (e.g. detecting the character „N").
• (+) It creates fuzzy factor rules automatically from the sample database, avoiding a huge amount of manual work which would make estimation infeasible.

23. Learning in ANN: Supervised learning: Backpropagation: Limitations 1
• (–) Black-box (Fekete doboz) phenomenon: although the multi-layer perceptron creates rules, you can never see or modify them manually (except in very small networks with a limited number of variables), because they are encoded implicitly in the weights of the full feedforward connections set by learning. In other rule-based systems you can add extra IF-THEN rules manually at any time (e.g. to represent a broker's experience from previous situations).
• (–) Computational requirement (Számolásigény): it has the highest among all learning algorithms (e.g. 1000× more than CRT, 100000× more than DA), because the convergence of learning is very bad and it can get stuck in a suboptimal solution if we use a learning rate higher than about 0.002–0.003, so we need 100000s of epochs to learn a real-sized sample database.
• (–) Non-exact method (Nem egzakt módszer): there are a bunch of topology parameters to set (number of fields/hidden neurons, aggregation, signal function type, threshold, slope, initial weights, learning rate) heavily influencing learning convergence, and there is no exact mathematical theory for setting them optimally, just Rules of Thumb (Hüvelykujj-szabályok); you can run teaching experiments (with even more computation...).
• (–) Low convergence of learning (Alacsony tanulási konvergencia): in the character recognition example we set the full feedforward connection weights manually, spending some time on it. In a real-sized problem this is infeasible: initial weights are set randomly, so the initial shapes/borders/directions of the fuzzy factor rules are totally random: backpropagation will slide and rotate them, but there is no guarantee whatsoever that it will find the optimal setting: it can cycle among several suboptimal ones.

24. Learning in ANN: Supervised learning: Backpropagation: Limitations 2
• This is because in backpropagation only the output field learns directly from the sample; hidden fields get more and more indirect error signals, blurred with Cross-effects (Kereszthatások). Thus backpropagation can:
• easily assemble/disassemble concave rules from convex ones to reach the optimum,
• BUT it can correct their borders and directions only with great difficulty – even risking collapsing the whole system – if the initial random layout was very ineffective.
• You can better imagine this problem from the following kindergarten-level simulation:
• EXAMPLE 2-1: You want to build a castle from wooden bricks in a box. Big bricks at the bottom are low-level neuron fields, small bricks at the top are high-level ones. You build the castle automatically: hitting the bricks with a big hammer of the desired shape and expecting them to find their stable place.
• However, you can hit the small bricks at the top directly, while the big ones move only indirectly,
• and with a sudden big move of a big brick, the whole castle can collapse.

25. Learning in ANN: Supervised learning: Validation methods
• Artificial Neural Networks are not exact mathematical methods, just Heuristics (Heurisztika): there is no mathematical proof (Matematikailag nem bizonyítható) that they will find the optimal solution for the estimation, only a chance of it.
• Thus the only difference between the broker going to a Fortuneteller (Jövendőmondó) lady and using an ANN is Validation (Validáció): measuring the efficiency of estimation on a given sample:
• the database of our estimation problem,
• or Benchmark data (Próbaadatok) from the literature. Don't worry, there are no widely accepted benchmark data for FOREX...
• In Unsupervised learning (Nem felügyelt tanulás): the efficiency of learning is measured by Entropy (Entrópia): how similar the y_j observations grouped together are and how dissimilar the y_k Group centroids (Csoportközép) are in e=1..E epochs: E_e = Σ_i(y_ji − y_ki)²  (2.20)
• In Supervised learning (Felügyelt tanulás) there are the following cases:
• By type of the o=1..O output variables:
• Discrete/category variable: % of correct classification (e.g. a „Buy" signal was really a buy)
• Continuous/scalar variable:
• Sum of Squared Errors, SSE (Össz négyzetes hiba): SSE_o^e = Σ_j(D_oj^e)²  (2.21)
• (+) It is always positive.
• (–) It has no upper bound, so it is hard to compare across several samples.
• Multiple determination coefficient / R-square (Többszörös determinációs együttható / R-négyzet): R²_o = (VAR(x_o) − SSE_o/m) / VAR(x_o)  (2.22)
• where VAR(x_o) is the variance of the output variable over the samples j = 1..m
• (+) Comparable across samples: 0 = the input has no effect .. 1 = perfect estimation.
• (–) It can be negative for very bad systems: <0 means the opposite is estimated.
• By type of validation (sketched in code below):
• Simple (Egyszerű): measuring efficiency on the whole sample.
• Cross-validation (Keresztvalidáció): randomly cut the sample into 2 representative halves: the Teaching sample (Tanítóminta) is trained to the network, then its efficiency is tested on both the teaching and the Test sample (Tesztminta) and the 2 results are compared: teaching efficiency is naturally higher than test efficiency, but if the difference is too big, the network has Overlearnt (Túltanul) the sample (e.g. 98% teaching but 56% test efficiency): it has lost the ability to generalize the knowledge learnt to new inputs (just like a Cramming (Magoló) student...). In that case the network should be Pruned (Vág), setting small synaptic weights to 0, and we will probably get better test efficiency.
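A sketch of the R-square measure (2.22) and of cross-validation for a scalar output; `train` and `predict` are placeholders for whichever network is being validated, and the 50/50 random split and names are illustrative assumptions.

```python
import numpy as np

def r_square(y_true, y_pred):
    """Eq. (2.22): R^2 = (VAR(y) - SSE/m) / VAR(y); 1 = perfect, <0 = worse than the mean."""
    sse = np.sum((y_true - y_pred) ** 2)
    var = np.var(y_true)
    return (var - sse / len(y_true)) / var

def crossvalidate(X, Y, train, predict, seed=0):
    """Randomly cut the sample into two representative halves, train on one,
    and compare teaching vs. test efficiency to detect overlearning."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    teach, test = idx[:len(X) // 2], idx[len(X) // 2:]
    model = train(X[teach], Y[teach])
    r2_teach = r_square(Y[teach], predict(model, X[teach]))
    r2_test  = r_square(Y[test],  predict(model, X[test]))
    return r2_teach, r2_test     # a large gap signals overlearning; consider pruning
```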

26. Content of the Presentation • Learning in Artificial Neural Networks: • Unsupervised learning • Hebbian associative rule • Necessary topology: Bidirectional Associative Memory, BAM • Activation of BAM • Algorithm of the Hebbian learning rule • Example, Criticism, Testing, Application in time series: TAM • Supervised learning • Delta learning rule • Necessary topology: Perceptron • Activation of the Perceptron • Algorithm of the Delta learning rule • Application, Testing, Criticism • Backpropagation learning rule • Necessary topology: Multi-layer perceptron • Activation of the Multi-layer perceptron • Algorithm of the Backpropagation learning rule • Applications and Testing: • Character recognition and its biological analogy • Head&Shoulders • Advantages, Limitations • Validation of Artificial Neural Networks • Unsupervised learning • Supervised learning: Simple validation and Cross-validation • Sample application: Control Function Simulator • Home Assignment 2: Character design in the Control Function Simulator • References

27. Learning in ANN: Supervised learning: Backpropagation: Software
• ContrFunctSim.xls is a Control Function Simulator of a 4-layer perceptron with 2 linear input neurons (x, y) and 8 neurons in the hidden and output layers, showing their signal functions in the (x, y) space of the input variables. At each neuron you can set:
• Neu: code of the neuron, which can be referenced by other neurons: (Layer.Neuron)
• Agr: Aggregation, S for additive and P for multiplicative
• Sig: Signal function, S- or Z-curve or Linear
• Trh: Signal function threshold, −9.9..9.9
• Slp: Signal function slope, −9.9..9.9
• Inp. neuron code: write here a simple cell formula referencing the purple cell containing the code of the neuron with the incoming signal (e.g. „=B25", not „2.1")
• Connec. wght: weight of the incoming connection, color-marked: −9.9..0..9.9
• Redraw connect. button: draws the full feedforward synapses
• You can see here the character recognition example (AbCdEFGH).

28. Home Assignment 2: Character design in the Control Function Diagram
• Modify the connection weights, thresholds, slopes, aggregation and signal function types in ContrFunctSim.xls to enable the network to recognize the characters K, L, M, N, O, X, Y, Z simultaneously! (1.5 points)
• Solution: HomeAssign2Solution.xls

29. References
• Introduction to backpropagation: http://www.dlsi.ua.es/~mlf/nnafmc/pbook/node28.html
• Detailed discussion of the theory: http://axon.cs.byu.edu/~martinez/classes/678/Papers/Werbos_BPTT.pdf
