Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics
Contents • Motivation • Beyond the web: Rules, Reasoning, Semantics, Ontologies • Semantics of Deduction Rules • Argumentation Semantics • Fuzzy Reasoning • Reaction rules • Vivid Agents • Prova • Applications in Bioinformatics
The Web • A great success story, but… • it’s the web for humans, not machines • Many areas, such as biology, have fully embraced the web • The Human Genome Project is only the tip of the iceberg • More than 500 tools and databases online
Example: PubMed • More than 12,000,000 literature abstracts • A great resource if one knows what one is looking for: “Kox1” has 17 hits, but “diabetes” produces more than 200,000 • Abstracts often need to be processed automatically
Results of PubMed (fields: title, author, year, journal) • Lorenz P. Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44. • Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31. • … However, to a machine things look different!
Solution: tag the data (XML)
Results of PubMed • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • <author>Fredericks WJ</author><title>An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene.</title><journal>Mol Cell Biol</journal><year>2000</year> • … However, to a machine things look different!
Solution: use ontologies (Semantic Web)
GeneOntology • Biologists have recognised the problem of semantic interoperability between disparate information sources • GeneOntology (GO) is an effort to provide a common vocabulary for molecular biology • GO has more than 10,000 terms in three branches: “function”, “process”, “localisation”
GeneOntology: Breadth of GO • GO has 13 levels • Width broadens up to level 6 (3885 terms wide), then shrinks • Number of leaves per level broadens up to level 6 (1223 leaves), then shrinks • The average term has 4 words • The longest term has 29 words: Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors
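Level statistics such as width and leaves per level can be computed from the is_a structure of an ontology. The sketch below is a minimal illustration, assuming the ontology is given as a hypothetical parent-to-children dictionary and that a term's level is its shortest distance from the root (GO is a DAG, so a term may sit at several depths and other conventions are possible). The function name `level_stats` and the toy term names are made up.

```python
from collections import defaultdict, deque


def level_stats(children: dict[str, list[str]], root: str) -> tuple[dict[int, int], dict[int, int]]:
    """Terms per level (width) and leaves per level of an is_a hierarchy.

    Assumption: a term's level is its shortest distance from the root,
    with the root at level 1.
    """
    level = {root: 1}
    queue = deque([root])
    while queue:                          # breadth-first: first visit = shortest path
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in level:
                level[child] = level[term] + 1
                queue.append(child)

    width: dict[int, int] = defaultdict(int)
    leaves: dict[int, int] = defaultdict(int)
    for term, lvl in level.items():
        width[lvl] += 1
        if not children.get(term):        # no children: a leaf term
            leaves[lvl] += 1
    return dict(width), dict(leaves)


# Hypothetical toy hierarchy; "d" has two parents, as GO terms often do
toy = {"root": ["a", "b"], "a": ["c", "d"], "b": ["d"], "c": []}
width, leaves = level_stats(toy, "root")
```

Breadth-first search is used so that a term with several parents is assigned the level of its shortest path from the root exactly once.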
Motivation Summary • Web in the old days • HTML (for humans) • Web these days • HTML • XML, Ontologies (for machines) • Web of the future • HTML • XML, Ontologies • rules, reasoning, semantics • access to computational resources (à la grid computing)
Open Problems • Part I: Theory of rules and reasoning on the web: • Knowledge representation: Which level of expressiveness? • Semantics: How can interoperability be guaranteed? • Reasoning: Fuzzy reasoning and unification • Reactivity: Vivid agents • Part II: Applications of rules and reasoning on the web: • Integration and querying of information sources • Integration: transmembrane prediction tools • Integration: protein structure DB and structure classification • Consistency checking • Ontology: If A is B and B is C, then the ontology should not explicitly state that A is C, as this is already implicit • Annotation: Do different tools agree or disagree?
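The ontology consistency check can be sketched as a search for redundant is_a edges: an edge A→C is redundant if C is already reachable from A through another parent. This is a minimal sketch, assuming an acyclic ontology given as a set of (child, parent) pairs; the function name `redundant_is_a` is hypothetical.

```python
def redundant_is_a(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return is_a edges (A, C) that are already implied transitively.

    An edge A -> C is redundant if C is an ancestor of some other parent B
    of A (A -> B -> ... -> C). Assumes the is_a graph is acyclic.
    """
    parents: dict[str, set[str]] = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    def ancestors(start: str) -> set[str]:
        # depth-first walk up the hierarchy, visiting shared ancestors once
        seen: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for p in parents.get(node, ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    redundant = set()
    for a, c in edges:
        for b in parents[a] - {c}:         # try every other parent of A
            if c in ancestors(b):
                redundant.add((a, c))
                break
    return redundant


# The example from the slide: A is B, B is C, and A is C stated explicitly
found = redundant_is_a({("A", "B"), ("B", "C"), ("A", "C")})
```

Only the explicit A→C edge is reported; the edges A→B and B→C are needed to derive it.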
The wider Picture: www.RuleML.org • Goal: develop a Web language for rules • using XML markup, • formal semantics, and • efficient implementations. • Rules: derivation rules, transformation rules, and reaction rules. • RuleML can thus specify queries and inferences in Web ontologies, mappings between Web ontologies, and dynamic Web behaviors of workflows, services, and agents. • Currently some 30 international members and close collaboration with the W3C
The wider Picture: REWERSE • Reasoning on the Web with Rules and Semantics • FP6 Network of Excellence with nearly 30 partners • Working groups on Infrastructure and Applications • Composition • Typing • Policies • Querying • Reactivity and evolution • Personalised Web sites • Calendar systems • Bioinformatics
Part I: Theory • Motivation: Expressive Knowledge Representation • Part I.a: Argumentation as LP semantics • Notions of attack and justified arguments • Hierarchy of semantics • Proof procedure • Part I.b: Fuzzy unification and argumentation • Fuzzy negation • Fuzzy argumentation • Fuzzy unification • Part I.c: Vivid Agents
Part I.a: A Hierarchy of Semantics • RuleML caters for different degrees of knowledge representation • A hierarchy of semantics is required to guarantee interoperation. • Analogy: in HTML, <b>Michael</b> is rendered differently by Netscape (in a bold font) and by the text-based browser Lynx (which cannot switch fonts). • Problem: How can we guarantee interoperability between different interpretations of rules?
Knowledge representation • Pete earns $500,000 p.a. • earns(pete,500000). • Cross the street if there are no cars • cross ← not car (default negation) • cross ← ¬car (explicit negation) • The fridge is quite cheap • cheap(fridge):70% • Does Mike live in Londn? • address(mike,london) = address(mike,londn): 95%
Knowledge System Cube (figure: the eight systems rFB, rDB, fFB, fDB, dFB, dDB, fdFB, fdDB arranged along the axes fuzzy, deductive, negation) • r: relational • f: fuzzy • d: deductive • DB: database • FB: factbase
Part I.a: Argumentation as semantics for Extended Logic Programs (figure: the knowledge system cube)
Extended Logic Programming • Logic programming with two negations • Default negation: not p is true if all attempts to prove p fail. • Explicit negation: ¬p states the falsity of a literal explicitly. • Coherence principle: ¬p implies not p
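The difference between the two negations can be illustrated with the street-crossing example. This sketch is restricted to a fact base without rules, so “all attempts to prove p fail” reduces to “p is not a stored fact”; it uses a hypothetical encoding of literals as (kind, atom) pairs and assumes the fact base is consistent (no atom stored both positively and explicitly negated). It is not a full WFSX evaluator.

```python
Literal = tuple[str, str]   # ("pos" | "neg" | "not", atom)


def holds(lit: Literal, facts: set[str], neg_facts: set[str]) -> bool:
    kind, atom = lit
    if kind == "pos":
        return atom in facts
    if kind == "neg":        # explicit negation: the falsity of atom must be stated
        return atom in neg_facts
    if kind == "not":        # default negation: atom cannot be proved
        return atom not in facts
    raise ValueError(f"unknown literal kind: {kind}")


def body_holds(body: list[Literal], facts: set[str], neg_facts: set[str]) -> bool:
    return all(holds(lit, facts, neg_facts) for lit in body)


RECKLESS = [("not", "car")]   # cross <- not car
CAUTIOUS = [("neg", "car")]   # cross <- ¬car
```

With nothing known about cars, only the default-negation rule fires; once ¬car is stated, both rules fire, which is the coherence principle in this simple setting.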
Argumentation • Interaction between agents in order to • gain knowledge • revise existing knowledge • convince the opponent • solve conflicts • Elegant way to define semantics for (extended) logic programming • Dung • Kowalski, Toni, Sadri • Prakken & Sartor • Etc.
Arguments • An argument is a partial proof, with implicitly negated literals as assumptions. • Argument = sequence of rules
Attacking arguments • Two fundamental kinds of attack: • A undercuts B = A invalidates a premise of B • P: Let’s go to the lake, as it is not snowing anymore • O: Hang on, it is snowing • A rebuts B = A contradicts B • P: Let’s go to the lake, as it is not snowing • O: Let’s not, as I’ve got to prepare my talk • Derived notions of attack used in the literature: • A attacks B = A u B or A r B • A defeats B = A u B or (A r B and not B u A) • A strongly attacks B = A a B and not B u A • A strongly undercuts B = A u B and not B u A
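Proposition: Hierarchy of attacks • Attacks: $a = u \cup r$ • Defeats: $d = u \cup (r - u^{-1})$ • Undercuts: $u$ • Strongly attacks: $\mathit{sa} = (u \cup r) - u^{-1}$ • Strongly undercuts: $\mathit{su} = u - u^{-1}$ • Hierarchy: $\mathit{su} \subseteq u \subseteq d \subseteq a$ and $\mathit{su} \subseteq \mathit{sa} \subseteq d$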
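The derived notions of attack can be computed directly from the undercut and rebut relations with set algebra, representing each relation as a set of (attacker, attacked) pairs. A minimal sketch; the function names and the argument names in the example are hypothetical.

```python
Rel = set[tuple[str, str]]   # (attacker, attacked) pairs


def inverse(rel: Rel) -> Rel:
    return {(b, a) for (a, b) in rel}


def derived_attacks(u: Rel, r: Rel) -> dict[str, Rel]:
    """Derive a, d, sa, su from undercut u and rebut r."""
    a = u | r                                  # attacks
    return {
        "u": u,                                # undercuts
        "a": a,
        "d": u | (r - inverse(u)),             # defeats
        "sa": a - inverse(u),                  # strongly attacks
        "su": u - inverse(u),                  # strongly undercuts
    }


# Hypothetical arguments: A and B undercut each other, C undercuts A,
# and A and C rebut each other
u = {("A", "B"), ("B", "A"), ("C", "A")}
r = {("A", "C"), ("C", "A")}
rels = derived_attacks(u, r)
```

In this example the mutual undercut between A and B survives in u and d but disappears from su and sa, while C's undercut of A, which A cannot undercut back, is present in all five relations. A's rebuttal of C is an attack but not a defeat, because C undercuts A.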
Fixpoint Semantics • Argumentation: • a game between proponent and opponent • argument A is acceptable if every x-attack of the opponent is countered by a y-attack of the proponent, using an argument the proponent has already accepted • Acceptability • Let x, y be notions of attack. • An argument A is x/y-acceptable w.r.t. a set of arguments S iff for every argument B such that $(B,A) \in x$, there is a $C \in S$ such that $(C,B) \in y$ • Fixpoint semantics • $F_{x/y}(S) = \{ A \mid A \text{ is } x/y\text{-acceptable w.r.t. } S \}$ • x/y-justified arguments = least fixpoint of $F_{x/y}$ • x/y-overruled arguments = arguments x-attacked by a justified argument • x/y-defensible: neither justified nor overruled
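The least fixpoint can be computed by iterating F_{x/y} from the empty set: F_{x/y} is monotone (anything acceptable w.r.t. S stays acceptable w.r.t. a larger set), so the iterates grow until they stabilise on a finite set of arguments. This sketch works on an abstract, finite attack graph (arguments as strings, attack relations as sets of pairs) rather than on arguments built from a logic program; the function names are hypothetical.

```python
Rel = set[tuple[str, str]]   # (attacker, attacked) pairs


def acceptable(arg: str, S: set[str], x: Rel, y: Rel) -> bool:
    """arg is x/y-acceptable w.r.t. S: each x-attacker B is y-attacked by some C in S."""
    attackers = {b for (b, a) in x if a == arg}
    return all(any((c, b) in y for c in S) for b in attackers)


def justified(args: set[str], x: Rel, y: Rel) -> set[str]:
    """Least fixpoint of F_{x/y}, reached by iterating from the empty set."""
    S: set[str] = set()
    while True:
        nxt = {a for a in args if acceptable(a, S, x, y)}
        if nxt == S:
            return S
        S = nxt


def classify(args: set[str], x: Rel, y: Rel) -> tuple[set[str], set[str], set[str]]:
    just = justified(args, x, y)
    over = {a for a in args if any((j, a) in x for j in just)}   # x-attacked by a justified argument
    return just, over, args - just - over                         # the rest are defensible


# Hypothetical graph: B undercuts A, C undercuts B, D and E undercut each other
args = {"A", "B", "C", "D", "E"}
u = {("B", "A"), ("C", "B"), ("D", "E"), ("E", "D")}
just, over, defensible = classify(args, x=u, y=u)
```

Here C is unattacked and defends A against B, so A and C are justified and B is overruled; the mutual attack between D and E is never resolved, so both are defensible.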
Theorem: Relationship of semantics • Weakening the opponent or strengthening the proponent increases the set of justified arguments • Different notions of acceptability give rise to different argumentation semantics (figure: lattice of the resulting semantics, with equal semantics grouped) • If the opponent is allowed to attack, the type of defense does not matter: a/su = a/u = a/a = a/d = a/sa (Dung’s grounded argumentation semantics) • If the opponent is allowed to defeat, the type of defense does not matter: d/su = d/u = d/a = d/d = d/sa (Prakken and Sartor’s semantics without priorities) • If the opponent is allowed to undercut, defense with rebuttal (a, d, sa) or without it (su, u) makes a difference: u/a = u/d = u/sa (WFSX) versus u/su = u/u • Further classes: sa/u = sa/d = sa/a • sa/su = sa/sa • su/a = su/d • su/sa • su/u • su/su
Proof procedure • Dialogues: • an x/y-dialogue is a sequence of moves such that • Proponent and Opponent alternate • Players cannot repeat arguments • Opponent x-attacks Proponent’s last argument • Proponent y-attacks Opponent’s last argument • A player wins the dialogue if the other player cannot move • Argument A is provably justified if the proponent wins all branches of the dialogue tree with root A • Concrete implementation SLXA: • Since u/a = u/d = u/sa = WFSX, justified arguments can be computed with the top-down proof procedure SLXA for WFSX [Alferes, Damasio, Pereira] • SLXA can be adapted for other notions of attack
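The dialogue game can be sketched as a recursive search over the dialogue tree: the opponent may try every x-attack, and the proponent must find some y-counterattack from which it wins again. This is a brute-force sketch on an abstract attack graph, assuming the no-repetition rule applies to both players within a branch; it is not the SLXA procedure, which works top-down on the logic program itself. The function names are hypothetical.

```python
Rel = set[tuple[str, str]]   # (attacker, attacked) pairs


def attackers(arg: str, rel: Rel) -> set[str]:
    return {b for (b, a) in rel if a == arg}


def proponent_wins(arg: str, x: Rel, y: Rel, used: frozenset = frozenset()) -> bool:
    """True if the proponent, having just played arg, wins every branch.

    Opponent: any x-attack on arg; proponent: some y-attack on that move.
    No argument may be repeated within a branch. A player who cannot
    move loses.
    """
    used = used | {arg}
    for b in attackers(arg, x) - used:                     # every opponent move
        replies = attackers(b, y) - used - {b}             # proponent's options
        if not any(proponent_wins(c, x, y, used | {b}) for c in replies):
            return False                                   # this branch is lost
    return True                                            # all opponent moves countered


# Hypothetical graph: B undercuts A, C undercuts B, D and E undercut each other
u = {("B", "A"), ("C", "B"), ("D", "E"), ("E", "D")}
results = {arg: proponent_wins(arg, u, u) for arg in "ABCDE"}
```

On this graph the dialogue game agrees with the least fixpoint computation: A and C are provably justified, while D loses because its only defence against E would be to repeat D.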
Part I.b: Fuzzy unification and argumentation (figure: the knowledge system cube)
Classical Fuzzy Logic • Solution: • Truth values in [0,1] instead of {0,1}. • Assertions: • p:V (p a formula, V a truth value). • Conjunction: • from p:V and q:W infer $p \wedge q : \min(V,W)$ • Disjunction: • from p:V and q:W infer $p \vee q : \max(V,W)$ • Inference: • from $p \leftarrow q_1, \dots, q_n$ and $q_1:V_1, \dots, q_n:V_n$ infer $p : \min(V_1, \dots, V_n)$
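The fuzzy connectives and rule inference can be sketched as forward chaining over fuzzy facts. This minimal sketch assumes ground atoms written as strings and rules with non-empty bodies; it also assumes, which the rules above do not state, that an atom derivable in several ways keeps the maximum of its values, matching the max semantics of disjunction. The atoms `needed(fridge)` and `buy(fridge)` are hypothetical.

```python
def conj(*values: float) -> float:
    return min(values)              # p:V, q:W  =>  p AND q : min(V, W)


def disj(*values: float) -> float:
    return max(values)              # p:V, q:W  =>  p OR q : max(V, W)


def infer(facts: dict[str, float], rules: list[tuple[str, list[str]]]) -> dict[str, float]:
    """Forward chaining: p <- q1, ..., qn with qi:Vi yields p : min(V1, ..., Vn)."""
    values = dict(facts)
    changed = True
    while changed:                  # values only grow, so this terminates
        changed = False
        for head, body in rules:
            if all(q in values for q in body):
                v = conj(*(values[q] for q in body))
                # assumption: alternative derivations of head are combined with max
                new = v if head not in values else disj(values[head], v)
                if new != values.get(head):
                    values[head] = new
                    changed = True
    return values


facts = {"cheap(fridge)": 0.7, "needed(fridge)": 0.9}
rules = [("buy(fridge)", ["cheap(fridge)", "needed(fridge)"])]
derived = infer(facts, rules)
```

The derived value of buy(fridge) is 0.7, the minimum of its two body literals.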
Fuzzy Negation • Classical fuzzy negation (Zadeh): from L:V infer ¬L : 1−V • Our setting (fuzzy adaptation of WFSX): • L:V and ¬L:V′ with V′ ≠ 1−V are possible • L and ¬L are not directly related.
Fuzzy Coherence Principle • If ¬L:V with V > 0, and not L:V′, then V′ ≥ V. • “If there is some explicit evidence that L is false, then there is at least the same evidence that L is false by default.” • If ¬L:V and V > 0, then not L : 1.
Fuzzy versions of the classical laws (figure) • Law of excluded contradiction: p ∧ ¬p : V with V > 0 is possible, so programs can be contradictory! Remedy: contradiction removal • not p ∧ ¬p : V with V > 0 is possible, by the coherence principle! • Law of excluded middle: p ∨ ¬p : V with V = 0 is possible, i.e. p is unknown
Strength of an argument • Fact: the value is given • Rule: the minimum of the values of the body literals • Argument: the value of its conclusion, i.e. the least fuzzy value of the facts contributing to the argument.
Theorems • Theorem (Soundness and Completeness) There is a justified argument of strength V for L iff There is a successful T-tree of truth value V for L • Theorem (Conservative Extension) Argumentation semantics is a conservative extension of WFSX.
Application: Fuzzy unification • Open systems: • knowledge and ontologies may not match • interaction with humans • “Does Mike live in Londn?” • Approach: • address(mike,london) = address(mike,londn): 95% • adapt the unification algorithm (normalised edit distance over trees, net) • embed it into the argumentation framework
Finding Mismatches: Edit distance • Edit distance between strings A and B: • minimal number of delete, add and replace operations needed to convert A into B • efficient implementation with dynamic programming • Example: • e(address,adresse) = 2, e(007,aa7) = 2 • Normalise: • ne(A,B) = e(A,B) / max{|A|, |B|} • Trees: • net = sum of all mismatches divided by the sum of all maximal lengths
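The edit distance and its normalised variants can be sketched as follows. The string distance is the standard Levenshtein dynamic program. For trees, this sketch assumes terms are nested tuples (functor, arg1, …) compared position by position, and that a subterm with no counterpart (a missing parameter) counts as entirely mismatched; these tree conventions are assumptions, not taken from the slides.

```python
from itertools import zip_longest


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of delete/add/replace operations."""
    prev = list(range(len(b) + 1))            # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                             # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # add cb
                           prev[j - 1] + (ca != cb))) # replace (free if equal)
        prev = cur
    return prev[-1]


def ne(a: str, b: str) -> float:
    """Normalised edit distance e(A,B) / max{|A|, |B|}."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def _size(t) -> int:
    # total number of characters in a term (used for unmatched subterms)
    return len(t) if isinstance(t, str) else sum(_size(s) for s in t)


def _diff(t, u) -> tuple[int, int]:
    """(sum of mismatches, sum of max lengths) over aligned subterms."""
    if isinstance(t, str) and isinstance(u, str):
        return edit_distance(t, u), max(len(t), len(u))
    t = (t,) if isinstance(t, str) else t
    u = (u,) if isinstance(u, str) else u
    mismatches = total = 0
    for s, v in zip_longest(t, u):
        if s is None or v is None:            # missing parameter: fully mismatched
            n = _size(v if s is None else s)
            mismatches, total = mismatches + n, total + n
        else:
            m, n = _diff(s, v)
            mismatches, total = mismatches + m, total + n
    return mismatches, total


def net(t, u) -> float:
    """Normalised edit distance over trees: terms are str or (functor, args...)."""
    mismatches, total = _diff(t, u)
    return mismatches / total if total else 0.0


t1 = ("address", "mike", "london")
t2 = ("address", "mike", "londn")
```

Under these conventions net(address(mike,london), address(mike,londn)) = 1/17, i.e. a similarity of about 0.94, and identical terms have distance 0.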
Fuzzy unification and arguments • net is a conservative extension of the MGU (most general unifier) • net(t,t′) ne(t,t′) • Adapt the definition of argument to fuzzy unification • V-argument: for every L in a body, there is an L′ in a head such that net(L,L′) ≤ 1−V • A V-undercuts B if A contains not L, B’s head is L′ and net(L,L′) ≤ 1−V • A V-rebuts B if A’s head is L, B’s head is ¬L′ and net(L,L′) ≤ 1−V • Adapt the previous definitions accordingly
Comparison: Argumentation • Our framework allows us to relate existing and new argumentation semantics: • Dung = a/su = a/u = a/a = a/d = a/sa • Prakken&Sartor = d/su = d/u = d/a = d/d = d/sa • WFSX = u/a = u/d = u/sa • Justified arguments: Dung ⊆ Prakken&Sartor ⊆ WFSX • Proof theory and top-down proof procedure adapted from Alferes, Damasio and Pereira’s SLXA
Comparison: Fuzzy Argumentation • Wagner: • Scale: −1 to +1 • Unlike WFSX, he relates F and ¬F: ¬F : −V iff F : V • We adopted his interpretation of not: not F : 1 if ¬F : V with V > 0 • He relates his work to stable models, but there is no top-down proof procedure for stable models [Alferes & Pereira] • Our approach conservatively extends WFSX, hence we can adapt the proof procedure SLXA
Comparison: Fuzzy unification • Arcelli, Formato, Gerla • define an abstract fuzzy unification/resolution framework • cannot deal with missing parameters (a common problem [Fung et al.]) • no conservative extension of classical unification • we use a concrete distance: edit distance • Evaluated the idea on a bioinformatics DB
Conclusion • “A database needs two kinds of negation”(Wagner) • Argumentation is an elegant way of defining semantics • Our framework allows classification of various new and existing semantics • Efficient top-down proof procedure for justified arguments • Argumentation as basis for belief revision (REVISE) • We cover the whole knowledge system cube including fuzzy argumentation • Defined fuzzy unification, which is useful in open systems
Part I.c: Vivid Agent • A vivid agent is a software-controlled system, • whose state is represented by a knowledge base and • whose behaviour is represented by • action- and • reaction rules • Actions are planned and executed to achieve a goal • Reactions are triggered by events • Epistemic RR: Effect <- Event, Cond • Physical RR: Action, Effect <- Event, Cond • Interaction RR: Msg, Effect <- Event, Cond
Vivid Agent architecture (figure: events arrive through the interface (perception) and trigger reaction rules in the reaction cycle, which update the beliefs in the KB; the planner uses action rules and beliefs to turn goals into intentions)
Agent State and Transition Semantics • Agent State: • Event queue, Plan queue, Goal queue, Knowledge base • Transition semantics • Perception • Add event to agent’s event queue • Reaction • Pop event from event queue, execute reactions including update of knowledge base • Plan execution • Execute action of plan in plan queue • Replanning • If action fails, replan • Planning • Pop goal from goal queue and generate plan
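The transition semantics can be sketched as a single-threaded loop over the three queues. This minimal sketch assumes that pending events take priority over plan execution and plan execution over planning, models only epistemic reaction rules (Effect <- Event, Cond), and represents actions as hypothetical callables that return False on failure; it is not the PVM-Prolog or Prova implementation, and all names are made up.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

KB = set[str]
Action = Callable[[KB], bool]          # hypothetical: returns False if the action fails


@dataclass
class ReactionRule:                    # epistemic reaction rule: Effect <- Event, Cond
    event: str
    cond: Callable[[KB], bool]
    effect: set[str]


@dataclass
class VividAgent:
    reaction_rules: list[ReactionRule]
    planner: Callable[[str, KB], list[Action]]     # hypothetical planner
    kb: KB = field(default_factory=set)
    events: deque = field(default_factory=deque)   # event queue
    plans: deque = field(default_factory=deque)    # plan queue: (goal, remaining actions)
    goals: deque = field(default_factory=deque)    # goal queue

    def perceive(self, event: str) -> None:
        self.events.append(event)                  # Perception

    def step(self) -> None:
        if self.events:                            # Reaction: pop event, update KB
            event = self.events.popleft()
            for rule in self.reaction_rules:
                if rule.event == event and rule.cond(self.kb):
                    self.kb |= rule.effect
        elif self.plans:                           # Plan execution
            goal, actions = self.plans.popleft()
            if not actions[0](self.kb):
                self.goals.appendleft(goal)        # Replanning: action failed
            elif actions[1:]:
                self.plans.appendleft((goal, actions[1:]))
        elif self.goals:                           # Planning
            goal = self.goals.popleft()
            plan = self.planner(goal, self.kb)
            if plan:
                self.plans.append((goal, plan))


# Hypothetical scenario: cross the street only if no car is believed to be there
agent = VividAgent(
    reaction_rules=[ReactionRule("car_seen", lambda kb: True, {"car"})],
    planner=lambda goal, kb: [lambda state: "car" not in state],
)
agent.perceive("car_seen")
agent.step()                       # reaction: the KB now believes there is a car
agent.goals.append("cross")
agent.step()                       # planning
agent.step()                       # execution fails, goal is re-queued
replanned = list(agent.goals)      # ["cross"]
agent.kb.discard("car")            # the car has gone
agent.step()                       # planning again
agent.step()                       # execution succeeds
done = not agent.plans and not agent.goals
```

Keeping the three queues explicit makes each transition (perception, reaction, plan execution, replanning, planning) a separate branch of `step`, which mirrors the list of transitions above.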
Implementation in Prova • Original implementation in PVM-Prolog • Coarse-grained parallelism (PVM) for each agent and Prolog threads for an agent’s components • Currently: Prova • a Java-based rule engine • easy integration of all kinds of data sources, e.g. databases, web services, etc.
Part II: Application to Bioinformatics • NSF and the EU’s strategic research workshop found that bioinformatics could play for the Semantic Web the role that physics played for the Web. • Why? • Masses of information • Masses of publicly accessible online information • (e.g. 8000 abstracts per month and over 500 tools) • Data (more and more often) published in XML • Data standards are accepted and actively developed • Much valuable information is scattered (as production is cheap and hence not centralised) • Systems integration and interoperation are a prime concern (e.g. GeneOntology)
Example: Information Agents for… • … Protein interactions • PDB, SCOP • … Protein annotation • TOPPred, HMMTOP, … • Architecture: • Information source • Wrapper • Mediator • Facilitator (figure: each source is accessed through a wrapper, mediators combine several wrappers, and a facilitator coordinates the mediators)
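The wrapper/mediator/facilitator layering can be sketched with three small classes: a wrapper hides one source behind uniform records, a mediator combines wrappers on one topic, and a facilitator routes requests to the right mediator. All class names, topics and stub sources here are hypothetical, and the stub records are made-up values rather than real TOPPred or HMMTOP output.

```python
from typing import Callable, Iterable

Record = dict[str, str]


class Wrapper:
    """Hides one information source behind a uniform record interface."""

    def __init__(self, name: str, fetch: Callable[[str], Iterable[Record]]):
        self.name = name
        self.fetch = fetch            # source-specific access (file, DB, web form, ...)

    def query(self, key: str) -> list[Record]:
        # tag every record with its origin so mediators can compare sources
        return [dict(record, source=self.name) for record in self.fetch(key)]


class Mediator:
    """Combines the answers of several wrappers on one topic."""

    def __init__(self, topic: str, wrappers: list[Wrapper]):
        self.topic = topic
        self.wrappers = wrappers

    def query(self, key: str) -> list[Record]:
        results: list[Record] = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(key))
        return results


class Facilitator:
    """Routes each request to the mediator registered for its topic."""

    def __init__(self) -> None:
        self.mediators: dict[str, Mediator] = {}

    def register(self, mediator: Mediator) -> None:
        self.mediators[mediator.topic] = mediator

    def ask(self, topic: str, key: str) -> list[Record]:
        return self.mediators[topic].query(key)


# Usage with stub sources (made-up data, for illustration only)
stub_a = Wrapper("predictor_a", lambda key: [{"protein": key, "helices": "7"}])
stub_b = Wrapper("predictor_b", lambda key: [{"protein": key, "helices": "6"}])
facilitator = Facilitator()
facilitator.register(Mediator("annotation", [stub_a, stub_b]))
answers = facilitator.ask("annotation", "P1")
agree = len({a["helices"] for a in answers}) == 1   # do the tools agree?
```

Tagging each record with its source lets a mediator answer the annotation question from the open problems (do different tools agree or disagree?) without knowing how each source is accessed.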
Example 1: Protein Interaction: • PDB: Protein structures • SCOP: Structure classification
Example 1: Protein Interaction: How it is currently done • PDB: 15 gigabytes in flat files • SCOP: 3 flat files • How? • Download PDB, SCOP files • Think up a DB schema and populate a MySQL DB • Run some Perl scripts on various machines that grind through the data and analyse it • Run some Java to visualise results • Problem: “business logic” is not separated