Web Reasoning: Unlocking Bioinformatics Potentials with Ontologies and Semantic Technologies

Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics

Contents • Motivation • Beyond the web: Rules, Reasoning, Semantics, Ontologies • Semantics of Deduction Rules • Argumentation Semantics • Fuzzy Reasoning • Reaction rules • Vivid Agents • Prova • Applications in Bioinformatics

The Web • A great success story, but… • it’s the web for humans, not machines • Many areas, such as biology, have fully embraced the web • The Human Genome Project is only the tip of the iceberg • More than 500 tools and databases online

Example: PubMed • >12.000.000 literature abstracts • Great resource if one knows what one is looking for • “Kox1” has 17 hits • But “diabetes” will produce >200.000 • Often need to automatically process abstracts

Results of PubMed • Lorenz P, Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44. • Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31. ... • [Slide callouts label the parts of each entry: Title, Author, Year, Journal] • However, to a machine things look different!

Results of PubMed • Lorenz P, Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44. • Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31. ... • Solution: tag data (XML)

Results of PubMed • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • ... • However, to a machine things look different!
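The effect of tagging can be illustrated with a short Python sketch. It assumes a single record wrapped in a hypothetical <record> root element (the slide shows only the inner tags) and uses the standard library XML parser. Once tagged, each field can be extracted by name, although the tag names themselves remain uninterpreted labels to the program.

```python
import xml.etree.ElementTree as ET

# One tagged record. The <record> wrapper is an assumption made for this
# sketch; the slide shows only the author/title/journal/year tags.
record_xml = (
    "<record>"
    "<author>Lorenz P</author>"
    "<title>Transcriptional repression mediated by the KRAB domain of the "
    "human C2H2 zinc finger protein Kox1/ZNF10 does not require histone "
    "deacetylation.</title>"
    "<journal>Biol Chem</journal>"
    "<year>2001</year>"
    "</record>"
)

record = ET.fromstring(record_xml)

# Each field is now addressable by tag name instead of by its position
# in a free-text citation string.
fields = {child.tag: child.text for child in record}
print(fields["author"], fields["journal"], fields["year"])
```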

Results of PubMed • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • ... • Solution: use ontologies (Semantic Web)

GeneOntology • Biologists have recognised the problem of semantic inter-operability between disparate information sources • GeneOntology (GO) is an effort to provide a common vocabulary for molecular biology • GO has >10.000 terms in three branches: “function”, “process”, “localisation”

Breadth of GO • GeneOntology has 13 levels • Width broadens to level 6 (3885 terms wide), then shrinks • Number of leaves per level broadens to level 6 (1223 leaves), then shrinks • Average term has 4 words • Longest term has 29 words: Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors

Motivation Summary • Web in the old days • HTML (for humans) • Web these days • HTML • XML, Ontologies (for machines) • Web of the future • HTML • XML, Ontologies • rules, reasoning, semantics • access to computational resources (a la grid-computing)

Open Problems • Part I: Theory of rules and reasoning on the web: • Knowledge representation: Which level of expressiveness? • Semantics: How to guarantee inter-operability • Reasoning: Fuzzy reasoning and unification • Reactivity: Vivid agents • Part II: Applications of rules and reasoning on the web: • Integration and querying of information sources • Integration: transmembrane prediction tools • Integration: protein structure DB and structure classification • Consistency checking • Ontology: If A is B and B is C, then the ontology should not explicitly mention A is C, as it is already implicit • Annotation: Do different tools agree or disagree?
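The ontology consistency check named above (an explicit "A is C" is redundant when "A is B" and "B is C" are already stated) can be sketched as a search for is-a edges that are implied by a longer path. This is a minimal illustration, assuming the ontology is given as a set of (child, parent) pairs forming an acyclic graph; the function name and the toy terms are hypothetical.

```python
from collections import defaultdict


def redundant_is_a(edges):
    """Return the is-a edges (child, parent) that are already implied
    by a longer is-a path. Assumes the is-a graph is acyclic."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def ancestors(term, skip_edge):
        # All terms reachable from `term` via is-a edges, ignoring skip_edge.
        seen, stack = set(), [term]
        while stack:
            node = stack.pop()
            for p in parents[node]:
                if (node, p) != skip_edge and p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    return {(c, p) for c, p in edges if p in ancestors(c, (c, p))}


# Toy ontology (hypothetical terms): the third edge is implied by the first two.
toy = {("kinase", "enzyme"), ("enzyme", "protein"), ("kinase", "protein")}
print(redundant_is_a(toy))
```

An edge is reported only if its parent is still reachable after that edge is ignored, so direct edges that carry new information are never flagged.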

The wider Picture: www.RuleML.org • Goal: develop Web language for rules • using XML markup, • formal semantics, and • efficient implementations. • Rules: derivation rules, transformation rules, and reaction rules. • RuleML can thus specify queries and inferences in Web ontologies, mappings between Web ontologies, and dynamic Web behaviors of workflows, services, and agents. • Currently, some 30 international members and close collaboration with W3C

The wider Picture: REWERSE • Reasoning on the Web with Rules and Semantics • FP6 Network of Excellence with nearly 30 partners • Working groups on Infrastructure and Applications • Composition • Typing • Policies • Querying • Reactivity and evolution • Personalised Web sites • Calendar systems • Bioinformatics

Part I: Theory • Motivation: Expressive Knowledge Representation • Part I.a: Argumentation as LP semantics • Notions of attack and justified arguments • Hierarchy of semantics • Proof procedure • Part I.b: Fuzzy unification and argumentation • Fuzzy negation • Fuzzy argumentation • Fuzzy unification • Part I.c: Vivid Agents

Part I.a: A Hierarchy of Semantics • RuleML caters for different degrees of knowledge representation • A hierarchy of semantics is required to guarantee inter-operation. • Analogy: In HTML, <b>Michael</b> will be interpreted differently in Netscape (Michael, shown in bold) and in the text-based browser Lynx (Michael, shown as plain text). • Problem: How can we guarantee inter-operability between different interpretations of rules?

Knowledge representation • Pete earns 500.000$ p.a. • earns(pete,500000). • Cross the street if there are no cars • cross ← not car • cross ← ¬car • The fridge is quite cheap • cheap(fridge):70% • Does Mike live in Londn? • address(mike,london) = address(mike,londn): 95%

Knowledge System Cube • r: relational • f: fuzzy • d: deductive • DB: database • FB: factbase • [Figure: cube with corners rDB, rFB, dDB, dFB, fDB, fFB, fdDB, fdFB and axes fuzzy, deductive, negation]

Part I.a: Argumentation as semantics for Extended Logic Programs • f: fuzzy • d: deductive • DB: database • FB: factbase • [Figure: knowledge system cube with corners rDB, rFB, dDB, dFB, fDB, fFB, fdDB, fdFB and axes fuzzy, deductive, negation]

Extended Logic Programming • Logic Programming with 2 negations • Default negation: not p: true if all attempts to prove p fail. • Explicit negation: ¬p: falsehood of a literal may be stated explicitly. • Coherence principle: ¬p → not p

Argumentation • Interaction between agents in order to • gain knowledge • revise existing knowledge • convince the opponent • solve conflicts • Elegant way to define semantics for (extended) logic programming • Dung • Kowalski, Toni, Sadri • Prakken & Sartor • Etc.

Arguments • An argument is a partial proof, with implicitly negated literals as assumptions. • Argument = sequence of rules

Attacking arguments • Two fundamental kinds of attack: • A undercuts B = A invalidates premise of B • P: Let’s go to the lake as it is not snowing anymore • O: Hang on, it is snowing • A rebuts B = A contradicts B • P: Let’s go to the lake as it is not snowing • O: Let’s not, as I’ve got to prepare my talk • Derived notions of attack used in the literature: • A attacks B = A u B or A r B • A defeats B = A u B or (A r B and not B u A) • A strongly attacks B = A a B and not B u A • A strongly undercuts B = A u B and not B u A

Proposition: Hierarchy of attacks • Attacks = a = u ∪ r • Defeats = d = u ∪ (r - u⁻¹) • Undercuts = u • Strongly attacks = sa = (u ∪ r) - u⁻¹ • Strongly undercuts = su = u - u⁻¹
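The derived notions of attack are plain set operations on the undercut and rebut relations, which the following Python sketch makes explicit. It assumes each relation is a finite set of (attacker, attacked) pairs; the argument names A, B, C and the function names are hypothetical. The two assertions check the inclusions that follow directly from the definitions.

```python
def inverse(rel):
    """Inverse relation: (B, A) for every (A, B) in rel."""
    return {(b, a) for a, b in rel}


def derived_attacks(undercuts, rebuts):
    """Build the derived notions of attack from undercut (u) and rebut (r),
    each given as a set of (attacker, attacked) pairs."""
    u, r = set(undercuts), set(rebuts)
    u_inv = inverse(u)
    return {
        "a": u | r,              # attacks
        "d": u | (r - u_inv),    # defeats
        "u": u,                  # undercuts
        "sa": (u | r) - u_inv,   # strongly attacks
        "su": u - u_inv,         # strongly undercuts
    }


# Hypothetical example: A and B undercut each other, C undercuts A,
# and rebuts hold in both directions between A/C and B/C.
u = {("A", "B"), ("B", "A"), ("C", "A")}
r = {("A", "C"), ("C", "A"), ("B", "C"), ("C", "B")}
att = derived_attacks(u, r)

# Inclusions that follow from the definitions.
assert att["su"] <= att["u"] <= att["d"] <= att["a"]
assert att["su"] <= att["sa"] <= att["d"]
print(sorted(att["su"]), sorted(att["sa"]))
```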

Fixpoint Semantics • Argumentation: • game between proponent and opponent • argument A is acceptable if opponent’s x-attack is countered by proponent’s y-attack, which proponent already accepted earlier. • Acceptable • Let x,y be notions of attack. • An argument A is x,y-acceptable w.r.t. a set of arguments S iff • for every argument B such that (B,A) ∈ x, there is a C ∈ S such that (C,B) ∈ y • Fixpoint semantics • Fx/y(S) = { A | A is x,y-acceptable w.r.t. S } • x/y-justified arguments = least fixpoint of Fx/y. • x/y-overruled arguments = x-attacked by a justified argument. • x/y-defensible iff neither justified nor overruled
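For a finite set of arguments, the least fixpoint can be computed by iterating the acceptability operator from the empty set. The Python sketch below illustrates this under that finiteness assumption; the function names and the example arguments are hypothetical, and x and y are passed in as sets of (attacker, attacked) pairs.

```python
def acceptable(arg, accepted, x, y, arguments):
    """arg is x,y-acceptable w.r.t. `accepted` iff every x-attacker B of arg
    is y-attacked by some C in `accepted`."""
    return all(
        any((c, b) in y for c in accepted)
        for b in arguments
        if (b, arg) in x
    )


def justified(arguments, x, y):
    """Least fixpoint of the acceptability operator, computed by iterating
    from the empty set. Assumes a finite set of arguments."""
    s = set()
    while True:
        nxt = {a for a in arguments if acceptable(a, s, x, y, arguments)}
        if nxt == s:
            return s
        s = nxt


def classify(arguments, x, y):
    """Split the arguments into justified, overruled, and defensible."""
    just = justified(arguments, x, y)
    over = {a for a in arguments if any((j, a) in x for j in just)}
    return just, over, set(arguments) - just - over


# Hypothetical example: C attacks B, B attacks A, and D and E attack each other.
args = {"A", "B", "C", "D", "E"}
attack = {("C", "B"), ("B", "A"), ("D", "E"), ("E", "D")}
just, over, defensible = classify(args, attack, attack)
print(sorted(just), sorted(over), sorted(defensible))
```

Because the operator is monotone, the iteration only ever adds arguments, so it stops at the least fixpoint.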

Theorem: Relationship of semantics • Weakening opponent or strengthening proponent increases justified arguments • Different notions of acceptability give rise to different argumentation semantics • If opponent is allowed to attack, type of defense does not matter: a/su=a/u=a/a=a/d=a/sa (Dung’s grounded argumentation semantics) • If opponent is allowed to defeat, type of defense does not matter: d/su=d/u=d/a=d/d=d/sa (Prakken and Sartor’s semantics w/o priorities) • If opponent is allowed to undercut, defense with rebut (a,d,sa) or without rebut (su,u) makes a difference: u/a=u/d=u/sa (WFSX) versus u/su=u/u • Remaining classes in the hierarchy: su/a=su/d, su/u, su/sa, su/su, sa/u=sa/d=sa/a, sa/su=sa/sa

Proof procedure • Dialogues: • x/y-dialogue is sequence of moves such that • Proponent and Opponent alternate • Players cannot repeat arguments • Opponent x-attacks Proponent’s last argument • Proponent y-attacks Opponent’s last argument • Player wins dialogue if other player cannot move • Argument A is provably justified if proponent wins all branches of dialogue tree with root A • Concrete implementation SLXA: • Since u/a=u/d=u/sa=WFSX, justified arguments can be computed with the top-down proof procedure SLXA for WFSX [Alferes, Damasio, Pereira] • SLXA can be adapted for other notions
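The dialogue tree can be explored with a short recursive search, sketched below in Python. This is an illustration and not SLXA. It assumes a finite set of arguments, takes x and y as sets of (attacker, attacked) pairs, and bars only the Proponent from repeating an argument within a branch, which is an assumption of the sketch that keeps the search finite. All names are hypothetical.

```python
def provably_justified(arg, x, y, arguments, used=frozenset()):
    """True if the Proponent, having just played `arg`, wins every branch
    of the x/y-dialogue tree below this move.

    `used` holds the arguments the Proponent has already played in the
    current branch (assumption of this sketch: only the Proponent may
    not repeat arguments)."""
    used = used | {arg}
    for b in arguments:
        if (b, arg) not in x:
            continue  # b is not an x-attack the Opponent can play here
        # The Proponent needs a fresh y-attack on b that wins in turn.
        replies = (c for c in arguments if (c, b) in y and c not in used)
        if not any(provably_justified(c, x, y, arguments, used) for c in replies):
            return False
    # Every Opponent move was answered, or the Opponent could not move.
    return True


# Hypothetical example: C attacks B and B attacks A.
chain = {("C", "B"), ("B", "A")}
print(provably_justified("A", chain, chain, {"A", "B", "C"}))
print(provably_justified("B", chain, chain, {"A", "B", "C"}))
```

On small examples the result can be compared with the least fixpoint of the acceptability operator for the same x and y.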

Part I.b: Fuzzy unification and argumentation • r: relational • f: fuzzy • d: deductive • DB: database • FB: factbase • [Figure: knowledge system cube with corners rDB, rFB, dDB, dFB, fDB, fFB, fdDB, fdFB and axes fuzzy, deductive, negation]

Classical Fuzzy Logic • Solution: • Truth values in [0,1] instead of {0,1}. • Assertions: • p:V (p a formula, V a truth value). • Conjunction: • p:V, q:W ⇒ p ∧ q : min(V,W) • Disjunction: • p:V, q:W ⇒ p ∨ q : max(V,W) • Inference: • p ← q1, …, qn ; q1:V1, …, qn:Vn ⇒ p : min(V1, …, Vn)
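The min/max connectives and the inference rule can be sketched in a few lines of Python. The sketch assumes rules with non-empty bodies written as (head, body) pairs and facts as a mapping from atoms to truth values in [0,1]. Taking the maximum when several rules derive the same head is an assumption of the sketch, not something the slide states; the atom names are hypothetical.

```python
def fuzzy_and(*values):
    """Conjunction: minimum of the truth values."""
    return min(values)


def fuzzy_or(*values):
    """Disjunction: maximum of the truth values."""
    return max(values)


def infer(rules, facts):
    """Forward-chain rules `head <- body` over fuzzy facts.

    A derived head gets the minimum of its body values. If several rules
    derive the same head, the largest value is kept (assumption)."""
    values = dict(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(q in values for q in body):
                v = fuzzy_and(*(values[q] for q in body))
                if v > values.get(head, 0.0):
                    values[head] = v
                    changed = True
    return values


# Hypothetical facts and rules.
facts = {"cheap": 0.7, "reliable": 0.9}
rules = [("good_buy", ["cheap", "reliable"]), ("recommend", ["good_buy"])]
print(infer(rules, facts))
```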

Fuzzy Negation • Classical fuzzy negation: • L:V ⇒ ¬L: 1-V (Zadeh) • Our setting (fuzzy adaptation of WFSX): • L:V and ¬L:V’ with V’ ≠ 1-V possible • L and ¬L not directly related.

Fuzzy Coherence Principle • If ¬L:V and V > 0, and not L:V’, then V’ ≥ V. • “If there is some explicit evidence that L is false, then there is at least the same evidence that L is false by default.” • If ¬L:V and V > 0, then not L: 1.

Law of excluded contradiction and law of excluded middle • Law of excluded contradiction • p ∧ ¬p : V, V > 0 possible: contradictory programs! Contradiction removal • not p ∧ p : V, V > 0 possible: by coherence principle! • Law of excluded middle • not p ∨ p : V, V > 0 • p ∨ ¬p : V, V = 0 possible: p is unknown

Strength of an argument • Strength of an argument: • Fact: value is given • Rule: minimum of body literals • Argument: Conclusion • Least fuzzy value of the facts contributing to the argument.

Theorems • Theorem (Soundness and Completeness) There is a justified argument of strength V for L iff There is a successful T-tree of truth value V for L • Theorem (Conservative Extension) Argumentation semantics is a conservative extension of WFSX.

Application: Fuzzy unification • Open systems: • knowledge and ontologies may not match • interaction with humans • “Does Mike live in Londn?” • Approach: • address(mike,london) = address(mike,londn): 95% • adapt unification algorithm (normalised edit distance over trees, net) • embed into argumentation framework

Finding Mismatches: Edit distance • Edit distance between strings A and B: • minimal number of delete, add, replace operations to convert A into B. • efficient implementation with dynamic programming • Example: • e(address,adresse)=2, e(007,aa7)=2 • Normalise: • ne(A,B) = e(A,B) / max{ |A|, |B| } • Trees: • net = sum of all mismatches divided by sum of all max lengths
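The three measures can be sketched in Python as follows. The string edit distance uses the standard dynamic-programming recurrence. The tree version is a simplified reading of "sum of all mismatches divided by sum of all max lengths": it assumes both terms have the same shape (same arity at every level) and represents a term as a tuple of functor and arguments, with atoms as strings. That representation and the function names are assumptions of the sketch, and missing parameters are not handled.

```python
def edit_distance(a, b):
    """Minimal number of delete, add, replace operations turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # add cb
                prev[j - 1] + (ca != cb),  # replace (free if equal)
            ))
        prev = cur
    return prev[-1]


def ne(a, b):
    """Normalised edit distance on strings."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def _mismatches(t1, t2):
    # Returns (sum of mismatches, sum of max lengths) over aligned nodes.
    if isinstance(t1, str) and isinstance(t2, str):
        return edit_distance(t1, t2), max(len(t1), len(t2))
    if isinstance(t1, str) or isinstance(t2, str) or len(t1) != len(t2):
        raise ValueError("this sketch only handles terms of the same shape")
    total, length = 0, 0
    for s1, s2 in zip(t1, t2):
        m, n = _mismatches(s1, s2)
        total, length = total + m, length + n
    return total, length


def net(t1, t2):
    """Normalised edit distance over trees (simplified, same-shape terms)."""
    total, length = _mismatches(t1, t2)
    return total / length if length else 0.0


print(edit_distance("address", "adresse"), edit_distance("007", "aa7"))
print(net(("address", "mike", "london"), ("address", "mike", "londn")))
```

For address(mike,london) against address(mike,londn) this reading gives one mismatch over a total length of 17, so the similarity 1 - net is about 0.94, in line with the roughly 95% quoted in the slides.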

Fuzzy unification and arguments • net is conservative extension of MGU (most general unifier) • net(t,t’)  ne(t,t’) • Adapt definition of argument for fuzzy unification • V-argument: for all L in a body, there is L’ in head such that net(L,L’) ≤ 1-V • A V-undercuts B if A contains not L and B’s head is L’ and net(L,L’) ≤ 1-V • A V-rebuts B if A’s head is L and B’s head is ¬L’ and net(L,L’) ≤ 1-V • Adapt previous definitions accordingly

Comparison: Argumentation • Our framework allows us to relate existing and new argumentation semantics: • Dung = a/su=a/u=a/a=a/d=a/sa • Prakken&Sartor = d/su=d/u=d/a=d/d=d/sa • WFSX = u/a=u/d=u/sa • Dung ⊆ Prakken&Sartor ⊆ WFSX • Proof Theory and Top-down Proof Procedure adapted from Alferes, Damasio, Pereira’s SLXA

Comparison: Fuzzy Argumentation • Wagner: • Scale: -1 to +1 • Unlike WFSX, he relates F and ¬F: ¬F: -V iff F:V • We adopted his interpretation for not: not F:1 if ¬F:V, V>0 • Relates his work to stable models, but there is no top-down proof procedure for stable models [Alferes&Pereira] • Our approach conservatively extends WFSX, hence we can adapt proof procedure SLXA

Comparison: Fuzzy unification • Arcelli, Formato, Gerla • define abstract fuzzy unification/resolution framework • cannot deal with missing parameters (common problem [Fung et al.]) • no conservative extension of classical unification • we use concrete distance: edit distance • Evaluated idea on bioinfo DB

Conclusion • “A database needs two kinds of negation”(Wagner) • Argumentation is an elegant way of defining semantics • Our framework allows classification of various new and existing semantics • Efficient top-down proof procedure for justified arguments • Argumentation as basis for belief revision (REVISE) • We cover the whole knowledge system cube including fuzzy argumentation • Defined fuzzy unification, which is useful in open systems

Part I.c: Vivid Agent • A vivid agent is a software-controlled system, • whose state is represented by a knowledge base and • whose behaviour is represented by • action- and • reaction rules • Actions are planned and executed to achieve a goal • Reactions are triggered by events • Epistemic RR: Effect <- Event, Cond • Physical RR: Action, Effect <- Event, Cond • Interaction RR: Msg, Effect <- Event, Cond

Vivid Agent • [Figure: agent architecture with the components Interface, Perception, Events, Reaction Rules, Reaction Cycle, KB (beliefs/updates), Action Rules, Planner, Goals, Intentions]

Agent State and Transition Semantics • Agent State: • Event queue, Plan queue, Goal queue, Knowledge base • Transition semantics • Perception • Add event to agent’s event queue • Reaction • Pop event from event queue, execute reactions including update of knowledge base • Plan execution • Execute action of plan in plan queue • Replanning • If action fails, replan • Planning • Pop goal from goal queue and generate plan
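The agent state and its transitions can be sketched as a small Python class. The sketch is illustrative only and is not the PVM-Prolog or Prova implementation. It assumes a fixed priority among transitions (reaction, then plan execution or replanning, then planning), models epistemic reaction rules as (event, condition, effect) triples, and treats actions as callables that return True on success. All names are hypothetical.

```python
from collections import deque


class VividAgent:
    def __init__(self, reaction_rules, planner):
        # Agent state: event queue, plan queue, goal queue, knowledge base.
        self.events = deque()
        self.plans = deque()
        self.goals = deque()
        self.kb = set()
        self.reaction_rules = reaction_rules  # (event, condition, effect) triples
        self.planner = planner                # (goal, kb) -> list of actions

    def perceive(self, event):
        """Perception: add the event to the agent's event queue."""
        self.events.append(event)

    def step(self):
        """Perform one transition and return its name.
        The priority order used here is an assumption of the sketch."""
        if self.events:
            # Reaction: pop an event and apply matching rules to the KB.
            event = self.events.popleft()
            for trigger, condition, effect in self.reaction_rules:
                if trigger == event and condition(self.kb):
                    self.kb.add(effect)
            return "reaction"
        if self.plans:
            # Plan execution: run the next action; replan if it fails.
            goal, actions = self.plans.popleft()
            succeeded = actions[0](self.kb)
            rest = actions[1:] if succeeded else self.planner(goal, self.kb)
            if rest:
                self.plans.appendleft((goal, rest))
            return "plan execution" if succeeded else "replanning"
        if self.goals:
            # Planning: pop a goal and generate a plan for it.
            goal = self.goals.popleft()
            plan = self.planner(goal, self.kb)
            if plan:
                self.plans.append((goal, plan))
            return "planning"
        return "idle"


def open_umbrella(kb):
    kb.add("umbrella_open")
    return True


def planner(goal, kb):
    return [open_umbrella] if goal == "stay_dry" else []


agent = VividAgent([("rain", lambda kb: True, "wet_outside")], planner)
agent.perceive("rain")
agent.goals.append("stay_dry")
trace = [agent.step() for _ in range(4)]
print(trace, sorted(agent.kb))
```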

Implementation in Prova • Original implementation in PVM-Prolog • Coarse-grained parallelism (PVM) for each agent and Prolog threads for an agent’s components • Currently: Prova • is a Java-based rule engine • easy integration of all kinds of data sources, e.g. databases, web services, etc.

Part II: Application to Bioinformatics • NSF and EU’s strategic research workshop found that bioinformatics could play the role for the Semantic Web that physics played for the Web. • Why? • Masses of information • Masses of publicly accessible online information • (e.g. 8000 abstracts per month and over 500 tools) • Data (more and more often) published in XML • Data standards are accepted and actively developed • Much valuable information scattered (as production is cheap and hence not centralised) • Systems integration and interoperation are a prime concern (e.g. GeneOntology)

Example: Information Agents for… • … Protein interactions • PDB, SCOP • … Protein annotation • TOPPred, HMMTOP,… • Information source • Wrapper • Mediator • Facilitator • [Figure: agent architecture with Facilitator, Mediator, Wrappers, and Sources]

Example 1: Protein Interaction: • PDB: Protein structures • SCOP: Structure classification

Example 1: PSIMAP: Structural Interactions

Example 1: Protein Interaction: How it is currently done • PDB: 15 Gigabyte in flat files • SCOP: 3 flat files • How? • Download PDB, SCOP files • Think up DB schema and populate MySQL DB • Run some Perl scripts on various machines, that grind through the data and analyse it • Run some Java to visualise results • Problem: “Business logic” not separated
