Relational Data Mining in Finance

Relational Data Mining in Finance. Haonan Zhang CFWin03-37 hnzhang@cs.umanitoba.ca 03/04/2003. Organization. Motivation & Introduction Background Problem statement Solution Outcome Conclusion and future work. Motivation & Introduction.

Presentation Transcript


  1. Relational Data Mining in Finance Haonan Zhang CFWin03-37 hnzhang@cs.umanitoba.ca 03/04/2003

  2. Organization • Motivation & Introduction • Background • Problem statement • Solution • Outcome • Conclusion and future work

  3. Motivation & Introduction • Motivation: Analyze financial data to find and extract hidden patterns and relations in the data, so that the results can support financial decision making. • Predicting financial markets is a complex and challenging task for several reasons: • The dimensionality of the problem is high. • The relationships among independent and dependent variables are weak and non-linear. • Data mining is a process that analyzes data from different perspectives and makes the interactions in the data explicit at a given confidence level. It is well suited to predicting financial markets.

  4. Motivation & Introduction cont. • Traditional data mining methods have limitations, such as a lack of knowledge representation and limited forms of background knowledge. • Inductive Logic Programming (ILP) and Relational Data Mining (RDM) can overcome these limitations. • ILP can naturally incorporate background knowledge and relations between objects into the learning process. • RDM can discover hidden relations (general first-order relations) in numerical and symbolic data using background knowledge (domain theory).

  5. Background • A predicate is a Boolean-valued function. A predicate can be defined extensionally, as the set of tuples for which the predicate is true, or intensionally, as a set of (Horn) clauses for computing whether the predicate is true. • A literal is a predicate or its negation. • A Horn clause consists of two parts: a clause head and a clause body. The clause head is a single predicate. The clause body is a conjunction of literals.
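The two ways of defining a predicate can be illustrated with a minimal Python sketch. The `parent` facts, the names, and the `grandparent` clause are hypothetical examples chosen for illustration; they do not come from the presentation.

```python
# Extensional definition: the predicate is the set of tuples for which it is true.
# These facts are made up for illustration.
parent = {("ann", "bob"), ("bob", "carl"), ("bob", "dora")}

def parent_ext(x, y):
    """True exactly when (x, y) is one of the listed tuples."""
    return (x, y) in parent

# Intensional definition: the predicate is computed from a Horn clause.
#   head: grandparent(X, Z)
#   body: parent(X, Y), parent(Y, Z)   (a conjunction of literals)
def grandparent(x, z):
    people = {p for pair in parent for p in pair}
    # The clause holds if some binding of Y makes every body literal true.
    return any(parent_ext(x, y) and parent_ext(y, z) for y in people)

print(grandparent("ann", "carl"))
print(grandparent("bob", "carl"))
```

The extensional form stores the answers directly, while the intensional form derives them from other predicates, which is the kind of definition FOIL-style learners try to induce.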

  6. Background cont. • FOIL (First Order Inductive Learning) algorithm • Input: a target relation to be learned, a set of positive and a set of negative examples of the relation, and a set of background relations. • Learning approach: a separate-and-conquer approach. FOIL learns one clause at a time, removes the training examples covered by that clause, and then begins to learn the subsequent clauses. Each clause should cover as many positive training examples as possible and no negative training examples. The algorithm terminates when all positive training examples are covered. • Output: a set of clauses that describe the target relation.
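The separate-and-conquer loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the presenter's implementation: clauses are conjunctions of attribute tests instead of first-order literals, the scoring rule is a stand-in for FOIL's information gain, and the toy `trend`/`volume` examples are invented.

```python
def covers(clause, example):
    """A clause (list of (name, test) pairs) covers an example if every test holds."""
    return all(test(example) for _, test in clause)

def learn_one_clause(pos, neg, candidates):
    """Grow one clause greedily until it covers no negative example."""
    clause, pos, neg = [], list(pos), list(neg)
    while neg:
        # Keep only tests that retain some positive and exclude some negative.
        useful = [(name, test) for name, test in candidates
                  if any(test(e) for e in pos) and not all(test(e) for e in neg)]
        if not useful:
            return None  # no clause separates the remaining examples
        # Stand-in score: positives kept minus negatives kept.
        best = max(useful, key=lambda c: sum(c[1](e) for e in pos)
                                         - sum(c[1](e) for e in neg))
        clause.append(best)
        pos = [e for e in pos if best[1](e)]
        neg = [e for e in neg if best[1](e)]
    return clause

def separate_and_conquer(pos, neg, candidates):
    """Learn clauses one at a time, removing the positives each clause covers."""
    theory, uncovered = [], list(pos)
    while uncovered:
        clause = learn_one_clause(uncovered, neg, candidates)
        if clause is None:
            break
        theory.append([name for name, _ in clause])
        uncovered = [e for e in uncovered if not covers(clause, e)]
    return theory

# Hypothetical toy data: when does a (made-up) "buy" relation hold?
pos = [{"trend": "up", "volume": "high"},
       {"trend": "up", "volume": "low"},
       {"trend": "down", "volume": "high"}]
neg = [{"trend": "down", "volume": "low"}]
candidates = [("trend_up", lambda e: e["trend"] == "up"),
              ("volume_high", lambda e: e["volume"] == "high"),
              ("trend_down", lambda e: e["trend"] == "down"),
              ("volume_low", lambda e: e["volume"] == "low")]

theory = separate_and_conquer(pos, neg, candidates)
print(theory)
```

Each pass through the outer loop "separates" the positives covered by the new clause and "conquers" the rest, so the loop ends once every positive example is covered or no further clause can be found.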

  7. Background cont. • The FOCL (First Order Combined Learner) algorithm extends the FOIL algorithm in several ways. • FOCL constrains the search by using variable typing and inter-argument constraints. • FOCL uses background knowledge to improve the learning process, such as rules defined by a collection of examples, and a partial, possibly incorrect rule that serves as an initial approximation of the predicate to be learned.

  8. Background cont. • The MMDR (Machine Method for Discovering Regularities) algorithm focuses on generating probabilistic first-order rules and on measurement issues for numerical relational data. • MMDR permits various forms of background knowledge, such as constraints, predefined predicates, and partial (possibly incorrect) rules. • MMDR uses the statistical significance of hypotheses and the strength of data type scales to limit the search space.

  9. Problem statement • FOIL takes as input a target relation to be learned, a set of positive and a set of negative examples of the relation, and a set of background knowledge. • The output is a set of clauses that describe the target relation using the background knowledge.

  10. Solution • separate-and-conquer approach

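  11. Solution cont. • Two kinds of literals can be appended to develop a clause. • Gainful literals: literals that may increase the coverage of positive examples. Gainful literals are evaluated using an information heuristic. If the partial clause has n bindings, of which n+ are positive, the average information provided by the discovery that one of the bindings is positive is I(n+, n) = -log2(n+ / n). When a new literal is added, suppose that some of the bindings are excluded, that k of the n+ positive bindings are not excluded, and that the extended clause has m bindings, of which m+ are positive. The total gain is k × (I(n+, n) - I(m+, m)).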
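As a worked illustration of the information heuristic, the sketch below assumes the formulation given in Quinlan's description of FOIL: a partial clause with n bindings, n+ of them positive, carries information I(n+, n) = -log2(n+ / n), and a literal that keeps k of the positive bindings and yields m bindings with m+ positive has gain k × (I(n+, n) - I(m+, m)). The function names and the numbers are illustrative only.

```python
import math

def info(n_pos, n):
    """Information needed to signal that a binding is positive: -log2(n+ / n)."""
    return -math.log2(n_pos / n)

def foil_gain(n_pos, n, m_pos, m, k):
    """Gain of a literal.

    n_pos, n : positive and total bindings before adding the literal
    m_pos, m : positive and total bindings after adding the literal
    k        : positive bindings of the old clause that are not excluded
    """
    return k * (info(n_pos, n) - info(m_pos, m))

# 8 bindings, 4 positive; the literal removes every negative binding.
print(foil_gain(n_pos=4, n=8, m_pos=4, m=4, k=4))

# The literal halves both positives and negatives: the ratio is unchanged.
print(foil_gain(n_pos=4, n=8, m_pos=2, m=4, k=2))
```

In the first call the literal removes all negative bindings while keeping every positive one, so the gain is 4 bits in total. In the second call the proportion of positive bindings does not change, so the gain is zero.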

  12. Solution cont. • Determinate literals: A determinate literal introduces new variables such that the new partial clause has exactly one binding for each positive binding of the current clause, and at most one binding for each negative binding of the current clause. Therefore, a determinate literal sometimes has zero gain.
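A determinacy check can be sketched as follows, assuming the usual FOIL definition: a literal is determinate when every positive binding extends in exactly one way and every negative binding in at most one way. The `extend` callback, the fact sets, and the names are hypothetical and serve only to illustrate the test.

```python
def is_determinate(pos_bindings, neg_bindings, extend):
    """extend(binding) returns the list of bindings produced by adding the literal."""
    return (all(len(extend(b)) == 1 for b in pos_bindings)
            and all(len(extend(b)) <= 1 for b in neg_bindings))

# Made-up background facts.
mother_of = {("bob", "ann"), ("carl", "eve")}   # (child, mother)
child_of = {("ann", "bob"), ("ann", "dora")}    # (parent, child)

def add_mother(binding):
    """Literal mother(X, Y): introduces the new variable Y for the bound X."""
    return [binding + (m,) for c, m in sorted(mother_of) if c == binding[0]]

def add_child(binding):
    """Literal child(X, Y): a parent may have several children."""
    return [binding + (c,) for p, c in sorted(child_of) if p == binding[0]]

# Every positive X has exactly one mother; the negative X has none.
print(is_determinate([("bob",), ("carl",)], [("zed",)], add_mother))

# "ann" has two children, so the positive binding extends in two ways.
print(is_determinate([("ann",)], [], add_child))
```

Because a determinate literal neither removes nor multiplies positive bindings, it may leave the proportion of positive bindings unchanged, which is why its gain can be zero even when the new variable is useful later.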

  13. Solution cont. • Four forms of literals can appear in a clause:

  14. Solution cont. • To learn recursive theories without falling into infinite recursion, FOIL uses three approaches to ensure that recursive literals are risk-free. • Ordering constants: the algorithm can discover an ordering of the constants of a type. • Ordering pairs of variables: when the constants of a type are ordered, an ordering may also exist between a pair of variables Vi and Vj of that type in a partial clause. • Ordering recursive literals: the ordering among variables can be extended to an ordering of literals involving a predicate and variables.

  15. Flow Diagram

  16. Outcome

  17. Conclusion and future work • The FOIL algorithm can effectively find the hidden relations between the target relation and the background knowledge, and represent the target relation using the background knowledge. • The FOIL algorithm uses a very complex recursive control and backup scheme, which increases the complexity of the algorithm. Further implementation work needs a better understanding of these schemes. • Future work: implement the FOIL algorithm on parallel computers.
