Biological Network Querying Techniques: Analysis and Comparison

Biological Network Querying Techniques: Analysis and Comparison Valeria Fionda, Luigi Palopoli SHAY MICHAELI, ANNA ROMANOV

Table of contents • Introduction • Problem Statement • Biological Network modeling • Node Similarity Computation • Approximation Handling • Tools Comparison • PPI Networks • Metabolic Networks • Multiple Kinds of Networks • General Graphs • Summary

Introduction Network querying tools search a whole biological network to identify conserved occurrences of a given query module, so that biological knowledge can be transferred from the query to the matched region. In the last few years, several techniques have been developed to query biological networks. The aim of the paper is to analyze and compare the tools devised for this task.

Introduction The following specific aspects were considered: • adopted network model • biological information exploited (e.g., sequence similarity, interaction reliabilities, etc.) • delivery of exact versus approximate results • types of approximation supported (e.g., node insertions and deletions) • handling of general versus specific types of network • supported query structures • adoption of exact versus heuristic algorithms • computational complexity • software availability and modifiability • user interface • data used for the evaluation • biological results obtained

Problem Statement

Biological Network modeling • (a) Transcriptional Regulatory Networks: • Nodes: • transcription factors • mRNAs • Directed Edges: • transcriptional regulations • translations

Biological Network modeling • (b) Signal Transduction Networks: • Nodes: • proteins • Directed Edges • This type of network stores information about the processes through which a cell converts one kind of signal or stimulus into another

Biological Network modeling • (c) Metabolic Networks: • Nodes: • metabolites • reactions • enzymes • Directed Edges: • mass flow • catalytic regulation

Biological Network modeling • (d) Protein-Protein Interaction Networks: • Nodes: • proteins • Undirected and possibly weighted Edges: • two proteins are connected by an edge if they bind

Node Similarity Computation • Network querying algorithms exploit a similarity score, which is computed as follows: • In Protein-Protein Interaction Networks: • By aligning the proteins' amino acid sequences using: • BLAST (Basic Local Alignment Search Tool) • the PRSS routine of the FASTA package (used only by Qian) The output is accompanied by an E-value. The analyzed techniques differ from one another in the E-value threshold used to decide whether two proteins are similar (for example, 10^-2 for PATHBLAST and 10^-7 for Torque). • By exploiting databases like COG (Clusters of Orthologous Groups) or KEGG (Kyoto Encyclopedia of Genes and Genomes) that organize proteins into orthologous groups, so that two proteins are deemed similar if they belong to the same group.
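The two similarity criteria on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the `groups` mapping, and the choice of "less than or equal" for the cutoff are assumptions, and only the two threshold values come from the slide.

```python
# E-value cutoffs quoted on the slide; the dictionary itself is a hypothetical helper.
EVALUE_THRESHOLDS = {"PATHBLAST": 1e-2, "Torque": 1e-7}


def similar_by_evalue(evalue: float, tool: str) -> bool:
    """Treat two proteins as similar when their alignment E-value
    does not exceed the cutoff used by the given tool (assumed <=)."""
    return evalue <= EVALUE_THRESHOLDS[tool]


def similar_by_group(protein_a: str, protein_b: str, groups: dict) -> bool:
    """Treat two proteins as similar when they share at least one
    orthologous group. `groups` maps a protein id to a set of group ids
    (e.g. COG or KEGG identifiers); the mapping is made up by the caller."""
    return bool(groups.get(protein_a, set()) & groups.get(protein_b, set()))
```

With these definitions, an alignment with E-value 1e-5 would count as similar under the PATHBLAST cutoff but not under the stricter Torque cutoff.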

Node Similarity Computation • Network querying algorithms exploit a similarity score, which is computed as follows: • In Metabolic Networks: • Using the EC (Enzyme Commission) classification: a numbering system, consisting of four sets of numbers, that categorizes the type of chemical reaction catalyzed. • EC numbers give a functional classification that does not necessarily reflect sequence similarity
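One simple way to turn the four-level EC hierarchy into a number is to count how many leading levels two EC numbers share. The sketch below does exactly that. It is an assumed scoring scheme for illustration, not the scoring function of any of the tools discussed here, and the function name is hypothetical.

```python
def ec_shared_levels(ec_a: str, ec_b: str) -> int:
    """Count the leading EC levels (0 to 4) on which two EC numbers agree.
    A '-' placeholder (unspecified level) stops the comparison."""
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        if level_a != level_b or level_a == "-":
            break
        shared += 1
    return shared
```

For example, `ec_shared_levels("2.7.1.12", "2.7.1.15")` compares two enzymes that agree on the first three levels and differ only on the last one.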

Approximation Handling Approximation handling is needed to deal with possible evolutionary events that modify a network's structure: • Node insertions, corresponding to the addition of nodes in the target network • Node deletions, corresponding to the addition of nodes in the query network • Node mismatches, corresponding to pairs of nodes characterized by a low similarity, but sharing similar biological characteristics (e.g., proteins performing the same function). figure (a) – the query network figure (b) – the target network figure (c) – a potential solution
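The three kinds of approximation can be made concrete by classifying the positions of an alignment between a query and a target. The sketch below assumes a hypothetical representation in which an alignment is a list of `(query_node, target_node)` pairs, with `None` marking a missing partner, and in which the caller supplies the similarity test.

```python
def classify_alignment(pairs, similar):
    """Count matches, mismatches, insertions and deletions in an alignment.

    pairs   -- list of (query_node, target_node); None marks a gap
    similar -- function (query_node, target_node) -> bool
    """
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for query_node, target_node in pairs:
        if query_node is None and target_node is None:
            raise ValueError("an alignment position cannot be empty on both sides")
        if query_node is None:
            counts["insertion"] += 1   # extra node in the target
        elif target_node is None:
            counts["deletion"] += 1    # query node with no counterpart
        elif similar(query_node, target_node):
            counts["match"] += 1
        else:
            counts["mismatch"] += 1    # aligned, but with low similarity
    return counts
```

Tools that bound the number of insertions and deletions can be read as putting a limit on the corresponding counters.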

Comparison and Biological results • PATHBLAST (Kelley BP, 2004) • Qpath (Shlomi, 2006) • Qnet (Banu Dost, 2007) • Qian (2009) • Fionda (2008) • Torque (Bruckner, 2009) • MetaPathwayHunter (Pinter, 2005) • MetaPAT (Wernicke, 2007) • SAGA (Tian, 2007) • GraphMatch (Yang, 2007) • PathMatch (Yang, 2007) • NetMatch (Ferro, 2007) • GenoLink (Durand, 2006) *All the experiments were executed on an Intel Core2 Duo running at 2.4GHz with 4 GB RAM

Protein-Protein Interaction Networks PATHBLAST • Query Structures: pathways • Starts by building a global alignment graph • Each node represents a pair of similar proteins • Each edge represents interactions, gaps or mismatches

Protein-Protein Interaction Networks PATHBLAST – Biological Results • Query: mating-pheromone response pathway. Target: yeast network

Protein-Protein Interaction Networks Qpath • Query Structures: pathways • Based on the color coding technique • Total insertions and deletions are bounded
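The color coding technique mentioned on this slide can be illustrated on its simplest use: deciding whether a graph contains a simple path on k vertices. Vertices are colored at random with k colors, and a dynamic program looks for a "colorful" path, one whose vertices all have distinct colors and which is therefore simple. The sketch below shows only this core idea; it does not reproduce Qpath's scoring, its handling of insertions and deletions, or its similarity constraints, and the function name and parameters are hypothetical.

```python
import random


def has_simple_path(adj, k, trials=300, seed=0):
    """Randomized test for a simple path on k vertices (color coding).

    adj -- dict mapping every node to a list of its neighbors
    A True answer is always correct; a False answer can be wrong with a
    probability that shrinks as `trials` grows.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in nodes}
        # paths[v]: color sets (bitmasks) of colorful paths that end at v
        paths = {v: {1 << color[v]} for v in nodes}
        for _ in range(k - 1):
            longer = {v: set() for v in nodes}
            for u in nodes:
                for mask in paths[u]:
                    for v in adj[u]:
                        bit = 1 << color[v]
                        if not mask & bit:      # color of v not used yet
                            longer[v].add(mask | bit)
            paths = longer
        if any(paths[v] for v in nodes):
            return True
    return False
```

The dynamic program tracks color sets rather than vertex sets, which is what keeps the state space at 2^k per node instead of growing with the size of the network.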

Protein-Protein Interaction Networks Qpath – Biological Results: • Query: mating-pheromone response pathway. Target: yeast network

Protein-Protein Interaction Networks Qian • Query Structures: pathways • Bound only on node insertions • Based on computing hidden Markov models (HMMs) • PPI networks are modelled using the HMM formalism that embeds protein similarities into its probabilistic framework

Protein-Protein Interaction Networks Qian – Biological Results: • Query: mating-pheromone response pathway. Target: yeast network

Protein-Protein Interaction Networks Qnet • Query Structures: Trees or graphs with bounded tree width • Based on the color coding technique • Total insertions and deletions are bounded

Protein-Protein Interaction Networks Qnet – Biological Results: • Query: yeast actin-related proteins module. Target: human network

Protein-Protein Interaction Networks Torque • Query Structures: topology-free • Bounded number of insertions and deletions • Based on dynamic programming

Protein-Protein Interaction Networks Torque – Biological Results: • Query: yeast actin-related proteins module. Target: human network

Protein-Protein Interaction Networks Fionda • Query Structures: general graphs • Associates a distance score with each candidate solution and stops when a solution with a distance score lower than the threshold is found

Protein-Protein Interaction Networks Fionda – Biological Results: • Query: yeast actin-related proteins module. • Target: human network

Metabolic Networks • Important to note: comparing the biological results of tools for metabolic networks is not as informative as comparing those of tools for PPI networks. • Some of the approaches proposed to query metabolic networks (Pinter et al., 2005; Tian et al., 2007) use a collection of distinct metabolic pathways as the target network, while others like GraphMatch (Yang and Sze, 2007) require a single network as input. • PathMatch and GraphMatch also take into account the similarity between the compounds taking part in the reactions, while the other systems exploit only information about enzymes. • The data used for the evaluation of biological results was taken from the KEGG database (Kanehisa and Goto, 2000).

Metabolic Networks MetaPathwayHunter • Query Structure: multi-source tree • Target: forest of multi-source trees • Handles both node deletions from the query module and node insertions in the retrieved target submodules (at most one consecutive insertion) • Exhaustively computes all optimal solutions and several suboptimal ones, which are ranked by their statistical significance • Exploits a bottom-up dynamic programming approach based on a subtree homeomorphism computation • Exact algorithm

Metabolic Networks MetaPathwayHunter – Biological results • Query: Homoserine and methionine biosynthesis of E. coli • Compound information was deleted from the original pathway since MetaPathwayHunter considers only enzyme information • Target: S. cerevisiae (a species of yeast) figure (a) – the query pathway figure (b) – the retrieved pathway

Metabolic Networks MetaPAT • Query Structure: General graph • Nodes represent compounds • Edges are labeled using the EC numbers of enzymes catalyzing the reactions, such that the source metabolite is the reactant and the sink metabolite is the product • The approach exhaustively examines all the sub-graphs of the target network that are homeomorphic to the query subgraph. • Heuristic algorithm

Metabolic Networks MetaPAT – Biological results • Query: Pentose phosphate pathway of the yeast • Target: Pentose phosphate pathway of the human figure (a) – the query graph

Metabolic Networks MetaPAT – Biological results • Query: Pentose phosphate pathway of the yeast • Target: Pentose phosphate pathway of the human figure (b) – the target graph

Metabolic Networks MetaPAT – Biological results • Query: Pentose phosphate pathway of the yeast • Target: Pentose phosphate pathway of the human • MetaPAT was not able to retrieve any matching sub-graph • Therefore, D-gluconate and the enzyme 2.7.1.12 were deleted figure (a) – the query graph

Metabolic Networks MetaPAT – Biological results • Query: Pentose phosphate pathway of the yeast • Target: Pentose phosphate pathway of the human • After the deletion, the following result was retrieved • Running time: on the order of a few seconds figure (c) – the result retrieved by MetaPAT

Metabolic Networks SAGA • Query Structure: General graph • Exploits only information about enzymes • A label is associated with each node of the query and each node of the graphs in the database, with the aim of identifying node mismatches. • Only one consecutive node insertion and only one node deletion are allowed • Heuristic algorithm

Metabolic Networks SAGA – Biological Results • Query: Pentose phosphate pathway of the yeast • Obtained by running a script that translates the KEGG KGML format to the SAGA format (needed because SAGA does not take into account the compounds involved in the reactions) • Target: Pentose phosphate pathway of the human figure (a) – the query graph

Metabolic Networks SAGA – Biological Results • Query: Pentose phosphate pathway of the yeast • Obtained by running a script that translates the KEGG KGML format to the SAGA format • Target: Pentose phosphate pathway of the human • In the result, 15 out of 17 enzymes were correctly matched. • Running time: on the order of a few seconds figure (b) – the result retrieved by SAGA

Multiple Kinds of Networks PathMatch • Query Structure: Pathway • Nodes are enzymes or compounds • Each node of the target network may correspond to more than one node of the query subnetwork • A threshold value fixes the maximum number of node insertions allowed for each directed edge in the query graph • Heuristic algorithm that reduces the query problem to that of finding the longest weighted path in a directed acyclic graph
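The final step named on this slide, finding the longest weighted path in a directed acyclic graph, can be solved by a dynamic program over a topological order. The sketch below covers only that step. How PathMatch builds the DAG from the query and target and how it weights the edges is not described here, so the input format (a dictionary from edges to weights) and the function name are assumptions, and the input is assumed to be non-empty and acyclic.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_weighted_path(edges):
    """Return (score, path) for the highest-scoring path in a DAG.

    edges -- non-empty dict mapping (source, sink) to a numeric weight
    """
    preds = {}
    for (u, v) in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)

    best = {}  # node -> (best score of a path ending here, previous node)
    for v in TopologicalSorter(preds).static_order():
        best[v] = (0.0, None)               # a path may start at any node
        for u in preds[v]:
            candidate = best[u][0] + edges[(u, v)]
            if candidate > best[v][0]:
                best[v] = (candidate, u)

    end = max(best, key=lambda n: best[n][0])
    score = best[end][0]
    path = []
    node = end
    while node is not None:                 # walk the back-pointers
        path.append(node)
        node = best[node][1]
    return score, path[::-1]
```

Because every node is processed after all of its predecessors, each edge is examined once, so the running time is linear in the size of the DAG.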

Multiple Kinds of Networks PathMatch – Biological results • Query: Homoserine and methionine biosynthesis of E. coli • Target: S. cerevisiae (a species of yeast) • Running time: on the order of a few seconds figure (a) – the query pathway figure (b) – the retrieved pathway

Multiple Kinds of Networks GraphMatch • Query Structure: General graph • Nodes are enzymes or compounds • Each node of the target network may correspond to more than one node of the query subnetwork • Each enzyme node has an incoming edge from each of its substrate nodes and an outgoing edge to each of its product nodes • A threshold value fixes the maximum number of node insertions allowed for each directed edge in the query graph • Exact algorithm

Multiple Kinds of Networks GraphMatch – Biological Results • Query: Pentose phosphate pathway of the yeast • Target: Pentose phosphate pathway of the human • The similarity score between compounds was computed using the SimComp tool • All pairs with a SimComp score higher than a fixed threshold were given the maximum similarity value • Therefore, some pairs were incorrectly identified as interchangeable • Running time: on the order of a few minutes figure (a) – the query graph

Multiple Kinds of Networks GraphMatch – Biological Results • Query: Pentose phosphate pathway of the yeast • Target: Pentose phosphate pathway of the human • The similarity score between compounds was computed using the SimComp tool • All pairs with a SimComp score higher than a fixed threshold were given the maximum similarity value • Therefore, some pairs were incorrectly identified as interchangeable • Running time: on the order of a few minutes figure (b) – the result retrieved by GraphMatch

General Biological Graphs GenoLink • Query Structure: General graph • Nodes and edges are constrained. The nodes of the graphs may represent biological objects, with the edges modeling the relationships holding among them • Retrieves occurrences of the query graph in the target graph, each of which must satisfy the following: • it has exactly the same topology as the query graph • all its nodes and edges have the same data types (or subtypes) as in the query • all the query constraints on attributes are satisfied. • Exact algorithm

General Biological Graphs NetMatch • Query Structure: General graph • Queries may be structurally approximated: some of their parts may be left unspecified • Each node and edge may have an associated list of attributes specifying query constraints • The resulting sub-graphs are connected according to the same structure as the query graph • Able to handle query and target graphs with more than one edge between a pair of nodes • Was built as a Cytoscape plugin • Exact algorithm

Summary • All the tools are, on average, able to find biologically significant results, even if some tools outperform the others. • According to the paper's authors, PathMatch is the most promising tool among those that deal with pathway queries in PPI networks. Among the tools that deal with general graph queries in PPI networks, Fionda and GraphMatch seem to outperform the others. • As for the tools that deal with metabolic networks, the comparison is more difficult to carry out since the tools use different representations of input data. PathMatch and GraphMatch appear to be interesting and useful tools. • For pathway queries, the best choices are PathMatch and Qian, which are able to deal with node mismatches, insertions and deletions and have the lowest time complexity among the considered systems (linear). • For queries shaped as general graphs, the best tools appear to be Fionda and GraphMatch. • Torque proves to be quite innovative, as it opens an appealing view on the topology-free querying issue.

Summary • Possible improvements: • Using additional biological information (e.g., GO terms or interaction reliability). • Taking into account all possible biological diversities (e.g., approximations in resulting sub-graphs). • The results obtained so far are promising, and the efforts within this research area have been steadily increasing in the last few years. However, computational techniques for network querying are still at an early stage, which leaves this research area open and worth investigating.
