Uwe Reyle, Institute of Computational Linguistics, University of Stuttgart

Processing Natural Language Comments in Biological Databases: Molecular Assemblies and Their Catalytic Functions. A Case Study. Uwe Reyle, Institute of Computational Linguistics, University of Stuttgart. EML European Media Laboratory, Heidelberg. INRIA Institut National de Recherche en Informatique et Automatique, Grenoble.


Presentation Transcript


  1. Processing Natural Language Comments in Biological Databases: Molecular Assemblies and Their Catalytic Functions. A Case Study. Uwe Reyle, Institute of Computational Linguistics, University of Stuttgart

  2. EML (European Media Laboratory), Heidelberg • INRIA (Institut National de Recherche en Informatique et Automatique), Grenoble

  3. Biological Databases: Enzymes, Compounds, Pathways, Proteins • flat files, no relational/deductive databases • made for biologists, not for machines

  4. Biological Databases: Enzymes, Compounds, Pathways, Proteins • Data-Model • Ontology • Efficient Querying

  5. Overview • Genes, Proteins and Enzymes • Swissprot Protein Database • Two Examples • Semantic Processing • Parsing Protein Names • Merits for Coreference Resolution and for Extraction/Detection of Molecular Assemblies

  6. Diagram. Entities: gene, chromosome, polypeptide, enzyme (EC), molecular assembly, compounds (e.g. sugar...), biochemical reactions, pathways. Processes: Transcription, Translation, Posttranslational Modifications, Molecular Assembly, Catalytic Activity.

  7. Swissprot Entries: the same entities (compounds (e.g. sugar...), gene, enzyme (EC), molecular assembly, polypeptide, biochemical reactions, chromosome) labeled with the entry fields that describe them: DE (INCLUDES, CONTAINS), EC, POS, SUBUNIT, CATALYTIC ACTIVITY, FUNCTION, PATHWAY

  8. Reference Database • Enzymes, Compounds, Pathways, Proteins • Swissprot

  9. Swissprot vs. Medline • Medline abstracts (IE papers): fact + organism + experimental context; enormous vocabulary; coreference = intra-document coreference • Swissprot: fact; much smaller vocabulary; coreference = intra-document coreference + coreference to database

  10. Coreference to DE-line of database entry • SYNONYMS: peptidase, dipeptidyl, IV; Pep X; leukocyte antigen CD26; glycylprolyl dipeptidylaminopeptidase; glycylproline-dipeptidyl-aminopeptidase; glycylproline aminopeptidase; Xaa-Pro-dipeptidyl-aminopeptidase; dipeptidyl-peptide hydrolase; lymphocyte, antigen CD26; postproline dipeptidyl aminopeptidase IV; glycylprolyl aminopeptidase; dipeptidyl-aminopeptidase IV; Gly-Pro-naphthylamidase; DPP IV/CD26; glycoprotein GP110; amino acyl-prolyl dipeptidyl aminopeptidase; dipeptidyl aminopeptidase IV; T cell triggering molecule Tp103; dipeptidyl-peptidase IV (CD26); X-prolyl dipeptidyl aminopeptidase; X-PDAP; aminopeptidase, glycylproline • RECOMMENDED NAME: dipeptidyl-peptidase IV
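Coreference to the DE line can be approximated, as a first step, by mapping every synonym to a normalized key. The sketch below is a minimal illustration in Python, not the system described in the talk: the normalization rules (dropping parenthesized material such as "(CD26)", undoing comma inversion as in "peptidase, dipeptidyl, IV", ignoring case, hyphens and spaces) and the names `normalize` and `resolve` are assumptions made for the example, and only a few of the synonyms above are included.

```python
import re
from typing import Optional

# A few synonyms from the DE line above (abridged).
SYNONYMS = [
    "peptidase, dipeptidyl, IV",
    "Pep X",
    "glycylproline aminopeptidase",
    "aminopeptidase, glycylproline",
    "dipeptidyl-peptidase IV (CD26)",
    "X-PDAP",
]
RECOMMENDED = "dipeptidyl-peptidase IV"

def normalize(name: str) -> str:
    """Map a name to a crude matching key (assumed rules, not a Swissprot convention)."""
    name = re.sub(r"\([^)]*\)", " ", name)        # drop parentheticals such as '(CD26)'
    parts = [p.strip() for p in name.split(",")]
    if len(parts) > 1:                             # undo index-style inversion:
        parts = [parts[1], parts[0]] + parts[2:]   # 'peptidase, dipeptidyl, IV' -> 'dipeptidyl peptidase IV'
    return re.sub(r"[\s\-]+", "", " ".join(parts).lower())  # ignore case, spaces, hyphens

# Every synonym (and the recommended name itself) points to the recommended name.
INDEX = {normalize(s): RECOMMENDED for s in SYNONYMS + [RECOMMENDED]}

def resolve(mention: str) -> Optional[str]:
    """Return the recommended name a mention corefers with, or None."""
    return INDEX.get(normalize(mention))
```

Normalization only covers surface variants of listed synonyms; a mention that is not a variant of some listed name, such as a new abbreviation, still falls through to `None`.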

  11. Structure of Swissprot Entries • Quality of information is marked: experiment, similarity, ... • Each entry refers to a polypeptide in one single organism • The different line types: ID, AC, DT, DE, GN, OS, OG, OC, OX, the reference lines (RN, RP, RC, RX, RA, RT, RL), CC, DR, KW, FT, SQ, the sequence data line, the // line
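The line-type structure makes a first processing step straightforward: group the lines of an entry by their two-letter code and re-join wrapped lines. A minimal Python sketch, assuming the classic flat-file layout (line code in columns 1-2, data from column 6) and an end-of-entry `//` line; the sample reuses lines of the ACCD_ECOLI example with illustrative line wrapping, and `parse_entry` and `joined` are hypothetical helper names.

```python
from collections import defaultdict

def parse_entry(text: str) -> dict:
    """Group the lines of one Swissprot flat-file entry by their two-letter line code.

    Assumes the classic layout: line code in columns 1-2, data from column 6 on.
    Sequence data lines have a blank code and are collected under 'sequence'.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):                 # '//' terminates the entry
            break
        code = line[:2].strip() or "sequence"
        fields[code].append(line[5:].strip())     # data starts at column 6 (index 5)
    return dict(fields)

def joined(fields: dict, code: str) -> str:
    """Concatenate the continuation lines of one line type, e.g. a wrapped DE line."""
    return " ".join(fields.get(code, []))

# Lines of the ACCD_ECOLI example; the wrapping is illustrative.
ENTRY = "\n".join([
    "ID   ACCD_ECOLI     STANDARD;      PRT;   304 AA.",
    "AC   P08193; P78251; P76937;",
    "DE   ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT BETA",
    "DE   (EC 6.4.1.2) (ACCASE BETA CHAIN).",
    "CC   -!- SUBUNIT: ACETYL-COA CARBOXYLASE IS AN HETEROHEXAMER OF BIOTIN",
    "CC       CARBOXYL CARRIER PROTEIN, BIOTIN CARBOXYLASE AND THE TWO SUBUNITS",
    "CC       OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX.",
    "//",
])

fields = parse_entry(ENTRY)
```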

  12. An Example ID ACCD_ECOLI STANDARD; PRT; 304 AA. AC P08193; P78251; P76937; DE ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT BETA (EC 6.4.1.2) (ACCASE BETA CHAIN). CC -!- FUNCTION: THIS PROTEIN IS A COMPONENT OF THE ACETYL COENZYME A CARBOXYLASE COMPLEX; FIRST, BIOTIN CARBOXYLASE CATALYZES THE CARBOXYLATION OF THE CARRIER PROTEIN AND THEN THE TRANSCARBOXYLASE TRANSFERS THE CARBOXYL GROUP TO FORM MALONYL-COA. CC -!- SUBUNIT: ACETYL-COA CARBOXYLASE IS AN HETEROHEXAMER OF BIOTIN CARBOXYL CARRIER PROTEIN, BIOTIN CARBOXYLASE AND THE TWO SUBUNITS OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX. -!- CATALYTIC ACTIVITY, PATHWAY, SIMILARITY, FEATURES, ...
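Within the CC lines, each comment starts with `-!-` followed by a topic such as FUNCTION or SUBUNIT. A sketch of splitting a CC block into topics with Python's `re` module, assuming the `CC` line codes have already been stripped and continuation lines joined; the regular expression and `cc_topics` are illustrative, not part of any Swissprot tool.

```python
import re

# CC text of the ACCD_ECOLI example with the 'CC' line codes already removed.
CC_TEXT = (
    "-!- FUNCTION: THIS PROTEIN IS A COMPONENT OF THE ACETYL COENZYME A CARBOXYLASE "
    "COMPLEX; FIRST, BIOTIN CARBOXYLASE CATALYZES THE CARBOXYLATION OF THE CARRIER "
    "PROTEIN AND THEN THE TRANSCARBOXYLASE TRANSFERS THE CARBOXYL GROUP TO FORM MALONYL-COA. "
    "-!- SUBUNIT: ACETYL-COA CARBOXYLASE IS AN HETEROHEXAMER OF BIOTIN CARBOXYL CARRIER "
    "PROTEIN, BIOTIN CARBOXYLASE AND THE TWO SUBUNITS OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX."
)

# Topic = capital words before the first colon; text runs up to the next '-!-' or the end.
TOPIC = re.compile(r"-!-\s*([A-Z][A-Z ]*?):\s*(.*?)(?=\s*-!-|\Z)", re.DOTALL)

def cc_topics(cc_text: str) -> dict:
    """Split a CC block into {topic: free text}, e.g. 'FUNCTION', 'SUBUNIT'."""
    return {topic: text.strip() for topic, text in TOPIC.findall(cc_text)}
```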

  13. Variety of SUBUNIT-lines • HETERODIMER ... • PP2 CONSISTS OF A COMMON HETERODIMERIC CORE ENZYME, COMPOSED OF A 36 KDA CATALYTIC SUBUNIT (SUBUNIT C) AND A 65 KDA CONSTANT REGULATORY SUBUNIT (PR65 OR SUBUNIT A), THAT ASSOCIATES WITH A VARIETY OF REGULATORY SUBUNITS. PROTEINS THAT ASSOCIATE WITH THE CORE DIMER INCLUDE THREE FAMILIES OF ...

  14. Subunit-lines of type NP <DE> A-kinase anchor protein 5 <SUBUNIT> BINDING PROTEIN FOR DIMER OF PKA AND ALSO FOR PKC AND PP2B. EACH ENZYME IS INHIBITED WHEN BOUND TO THE ANCHOR PROTEIN. <DE> Potassium-transporting ATPase alpha chain <SUBUNIT> HETERODIMER COMPOSED OF TWO SUBUNITS, ALPHA AND BETA.

  15. Subunit-lines of type NP <DE> A-kinase anchor protein 5 <SUBUNIT> BINDING PROTEIN FOR DIMER OF PKA AND ALSO FOR PKC AND PP2B. EACH ENZYME IS INHIBITED WHEN BOUND TO THE ANCHOR PROTEIN. → (AKAP5 : (PKA : PKA)), where PKA is inhibited; (PKC : AKAP5) and (PP2B : AKAP5), where PKC and PP2B are inhibited <DE> Potassium-transporting ATPase alpha chain <SUBUNIT> HETERODIMER COMPOSED OF TWO SUBUNITS, ALPHA AND BETA.

  16. Subunit-lines of type NP <DE> A-kinase anchor protein 5 <SUBUNIT> BINDING PROTEIN FOR DIMER OF PKA AND ALSO FOR PKC AND PP2B. EACH ENZYME IS INHIBITED WHEN BOUND TO THE ANCHOR PROTEIN. <DE> Potassium-transporting ATPase alpha chain <SUBUNIT> HETERODIMER COMPOSED OF TWO SUBUNITS, ALPHA AND BETA. → Potassium-transporting ATPase: (alpha : beta) vs. Potassium-transporting ATPase alpha chain: (alpha : beta)

  17. Subunit-lines of type NP <DE> A-kinase anchor protein 5 <SUBUNIT> BINDING PROTEIN FOR DIMER OF PKA AND ALSO FOR PKC AND PP2B. EACH ENZYME IS INHIBITED WHEN BOUND TO THE ANCHOR PROTEIN. <DE> Potassium-transporting ATPase alpha chain <SUBUNIT> HETERODIMER COMPOSED OF TWO SUBUNITS, ALPHA AND BETA. → Potassium-transporting ATPase: (alpha : beta) vs. Potassium-transporting ATPase alpha chain: (alpha : beta) • Task: parse recommended name
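The step from a SUBUNIT line such as "HETERODIMER COMPOSED OF TWO SUBUNITS, ALPHA AND BETA" to the notation (alpha : beta) can be sketched with a regular expression. This is a minimal Python illustration covering only this one sentence pattern; `subunit_notation` and the small table of oligomer sizes are hypothetical. It deliberately does not decide which name the notation belongs to, which is exactly the task of parsing the recommended name.

```python
import re

# Oligomer sizes for the terms handled here (an assumed, incomplete table).
SIZES = {"DIMER": 2, "TRIMER": 3, "TETRAMER": 4, "HEXAMER": 6}

PATTERN = re.compile(
    r"(?:HETERO|HOMO)?(\w+?MER)\s+COMPOSED\s+OF\s+\w+\s+SUBUNITS?,\s*(.+?)\.?$",
    re.IGNORECASE,
)

def subunit_notation(line: str) -> str:
    """Turn 'HETERODIMER COMPOSED OF TWO SUBUNITS, ALPHA AND BETA.' into '(alpha : beta)'."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported SUBUNIT line: {line!r}")
    # Split the enumeration 'ALPHA AND BETA' / 'A, B AND C' into subunit names.
    names = [n.strip().lower()
             for n in re.split(r",|\s+AND\s+", match.group(2), flags=re.IGNORECASE)
             if n.strip()]
    # Sanity check: the oligomer term must agree with the number of listed subunits.
    if SIZES.get(match.group(1).upper()) != len(names):
        raise ValueError("oligomer size and number of listed subunits disagree")
    return "(" + " : ".join(names) + ")"
```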

  18. Structure of Polypeptide Names that Refer to Subunits of Proteins • Components: AssemblyName (Protein Name or Enzyme Name), SubunitRef, optionally "homolog", "precursor", "; phrase(s)", and modifiers such as vacuolar, soluble, anaerobic • SubunitRef, e.g.: {beta 1, ASHI, lacH, ...} subunit; 30 kDa subunit; {small, major, second largest, ...} subunit; subunit type B; catalytic subunit; subunit {alpha 3, 2 type B, ...}; iron-sulfur subunit alpha-2; {alpha, light, catalytic, ...} chain; cytochrome B-558
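A crude heuristic for splitting a polypeptide name into AssemblyName and SubunitRef is to look for the rightmost head noun of the SubunitRef ("subunit" or "chain") and absorb known descriptors to its left. The Python sketch below assumes a tiny descriptor vocabulary taken from the examples on this slide; `split_name`, `DESCRIPTORS` and `TRAILERS` are hypothetical names, and the heuristic ignores the ambiguities listed under Problems.

```python
# Hypothetical descriptor vocabulary, taken from examples on the slide; real names need far more.
DESCRIPTORS = {"alpha", "beta", "gamma", "delta", "small", "large", "major", "minor",
               "light", "heavy", "catalytic", "regulatory", "iron-sulfur"}
TRAILERS = {"precursor", "homolog"}

def split_name(name: str) -> tuple:
    """Heuristically split a polypeptide name into (AssemblyName, SubunitRef)."""
    name = name.split(";")[0]                        # drop a trailing '; phrase(s)'
    tokens = name.split()
    while tokens and tokens[-1].lower() in TRAILERS:  # drop 'precursor', 'homolog'
        tokens.pop()
    lowered = [t.lower() for t in tokens]
    for i in range(len(tokens) - 1, -1, -1):         # rightmost head noun of the SubunitRef
        if lowered[i] in ("subunit", "chain"):
            start = i
            while start > 0 and lowered[start - 1] in DESCRIPTORS:
                start -= 1                           # absorb 'catalytic', 'beta', ... on the left
            return " ".join(tokens[:start]), " ".join(tokens[start:])
    return " ".join(tokens), ""                      # no SubunitRef found
```

Names such as "30 kDa subunit" show the limit of a fixed vocabulary: "30 kDa" is left in the AssemblyName.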

  19. Problems • We cannot assume a dictionary of assembly names • Assembly names very often end with a highly ambiguous symbol that may also start the SubunitRef expression: F, A1, I, II, i, ..., gene name, ... • No nomenclature for subunits exists • Contextual knowledge is needed to disambiguate, e.g., XYase A1 large chain
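The ambiguity at the boundary can be made explicit by generating every reading instead of committing to one. A minimal Python sketch, assuming the ambiguous symbols are single capital letters with optional digits (F, A1) or Roman numerals (I, II, i); `candidate_splits` is a hypothetical helper that starts from some initial split and shifts such symbols across the boundary one at a time. Choosing among the readings is left to contextual knowledge.

```python
import re

# Assumed shape of the ambiguous boundary symbols listed on the slide: F, A1, I, II, i, ...
AMBIGUOUS = re.compile(r"[A-Z][0-9]*|[IVX]+|[ivx]+")

def candidate_splits(assembly_name: str, subunit_ref: str) -> list:
    """List alternative (AssemblyName, SubunitRef) readings for an initial split.

    Ambiguous symbols at the end of the assembly name are shifted, one by one,
    to the front of the SubunitRef; each shift yields one more reading.
    """
    left, right = assembly_name.split(), subunit_ref.split()
    readings = [(assembly_name, subunit_ref)]
    while len(left) > 1 and AMBIGUOUS.fullmatch(left[-1]):
        right.insert(0, left.pop())
        readings.append((" ".join(left), " ".join(right)))
    return readings
```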

  20. Assembly Names • Mitogen-activated protein kinase kinase kinase → a kinase acting on a kinase that acts on a protein kinase; one of these kinases, not the protein, is mitogen-activated • "kinase" has 1 semantic argument, namely the molecule X that it phosphorylates: [Acceptor/Donor: [Acceptor/Donor: ...; Group: phosphoryl; Function: transfer]; Group: phosphoryl; Function: transfer]
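The recursive reading of stacked "kinase" heads can be sketched as nested frames, one per "kinase", each taking everything to its left as its single semantic argument. This Python sketch is an illustration under stated assumptions: the `Frame` class and `parse_kinase` are hypothetical, only the kinase case (group phosphoryl, function transfer) is covered, and the "-activated" modifier is attached to the innermost kinase, following the usual MAP kinase reading (mitogen-activated protein kinase) rather than anything stated on the slide.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Frame:
    """Semantics of one 'kinase': transfer of a phosphoryl group to an acceptor."""
    acceptor: Union[str, "Frame"]      # the molecule X that the kinase phosphorylates
    group: str = "phosphoryl"
    function: str = "transfer"
    modifier: Optional[str] = None     # e.g. 'mitogen-activated'

def parse_kinase(name: str) -> Frame:
    """Parse '<modifier> <substrate> kinase kinase ...' into nested frames.

    Each 'kinase' takes everything to its left as its single semantic argument.
    Attaching a '-activated' modifier to the innermost kinase is an assumption.
    """
    tokens = name.split()
    modifier = tokens.pop(0) if tokens and tokens[0].lower().endswith("-activated") else None
    lowered = [t.lower() for t in tokens]
    first = lowered.index("kinase")    # raises ValueError if there is no kinase
    if any(t != "kinase" for t in lowered[first:]):
        raise ValueError("expected only 'kinase' after the substrate")
    frame: Union[str, Frame] = " ".join(tokens[:first])   # innermost argument, e.g. 'protein'
    for depth in range(len(tokens) - first):               # wrap once per 'kinase'
        frame = Frame(acceptor=frame, modifier=modifier if depth == 0 else None)
    return frame
```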

  21. Assembly Names • CoA Carboxylase Carboxyl Transferase, two readings: [Acceptor/Donor: CoA Carboxylase; Group: carboxyl; Function: transfer] or [Acceptor/Donor: X; Group: carboxyl; Function: transfer] ADJ-Rel CoA Carboxylase, with ADJ-Rel ∈ {,is_expressed_by, ...}
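The two readings of "CoA Carboxylase Carboxyl Transferase" can be produced mechanically once the analytic form "<group> transferase" has been recognized. A minimal Python sketch; `readings` and the dictionary keys are illustrative, and the ADJ-Rel is kept as an unresolved placeholder because the slide lists several candidate relations.

```python
import re

def readings(name: str) -> list:
    """Two readings of '<modifier> <group> transferase', following the slide.

    1. the modifier fills the Acceptor/Donor slot;
    2. the Acceptor/Donor is some unknown X, and X stands in an unresolved
       ADJ-Rel (e.g. is_expressed_by) to the modifier.
    """
    match = re.fullmatch(r"(.+?)\s+(\w+)\s+transferase", name.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a '<group> transferase' name: {name!r}")
    modifier, group = match.group(1), match.group(2).lower()
    base = {"group": group, "function": "transfer"}
    return [
        {**base, "acceptor_donor": modifier},
        {**base, "acceptor_donor": "X", "adj_rel": ("ADJ-Rel", modifier)},
    ]
```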

  22. Semantic Relations projected from the Lexicon • carboxyl transferase / transcarboxylase (IUPAC) • transcarboxylation / carboxylation • transcarboxylate / carboxylate • phosphorylate, biotinylate, adenylylate, ... • transphosphorylate, ... • crossphosphorylate, ...
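Part of this lexicon can be projected from morphology: strip a suffix (-ylation, -ylase, -ylate) to obtain the group, and read a prefix (trans-, cross-) as the kind of function. The Python sketch below is a hypothetical illustration of that idea, not the IMS lexicon: the function labels "attach" and "cross-transfer" and the helper `analyze` are assumptions. It derives the synonymy of "carboxyl transferase" and "transcarboxylase" as identical analyses.

```python
SUFFIXES = (("ylation", "process"), ("ylase", "enzyme"), ("ylate", "verb"))
PREFIXES = (("trans", "transfer"), ("cross", "cross-transfer"))   # labels are assumptions

def analyze(term: str) -> dict:
    """Decompose a group-transfer term into group, function and category."""
    words = term.lower().split()
    if len(words) == 2 and words[1] == "transferase":   # analytic form: 'carboxyl transferase'
        return {"group": words[0], "function": "transfer", "category": "enzyme"}
    if len(words) != 1:
        raise ValueError(f"unsupported term: {term!r}")
    stem = words[0]
    for suffix, category in SUFFIXES:                    # '-ylation', '-ylase', '-ylate'
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    else:
        raise ValueError(f"no known suffix: {term!r}")
    function = "attach"                                  # assumed label for plain 'carboxylate' etc.
    for prefix, label in PREFIXES:
        if stem.startswith(prefix):
            stem, function = stem[len(prefix):], label
            break
    return {"group": stem + "yl", "function": function, "category": category}
```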

  23. Coreference (local) ID ACCD_ECOLI STANDARD; PRT; 304 AA. AC P08193; P78251; P76937; DE ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT BETA (EC 6.4.1.2) (ACCASE BETA CHAIN). CC -!- FUNCTION: THIS PROTEIN IS A COMPONENT OF THE ACETYL COENZYME A CARBOXYLASE COMPLEX; FIRST, BIOTIN CARBOXYLASE CATALYZES THE CARBOXYLATION OF THE CARRIER PROTEIN AND THEN THE TRANSCARBOXYLASE TRANSFERS THE CARBOXYL GROUP TO FORM MALONYL-COA. CC -!- SUBUNIT: ACETYL-COA CARBOXYLASE IS AN HETEROHEXAMER OF BIOTIN CARBOXYL CARRIER PROTEIN, BIOTIN CARBOXYLASE AND THE TWO SUBUNITS OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX.

  24. Coreference (local) [entry text identical to slide 23]

  25. Coreference (local) [entry text identical to slide 23] • PP-attachment: semantics of Heterohexamer
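One way to use the semantics of "heterohexamer" for PP-attachment is to count chains under each attachment of "IN A 2:2 COMPLEX" and keep the readings that add up to six. A Python sketch under assumptions: the two readings and their copy numbers are spelled out by hand rather than produced by a parser, and `oligomer_size` and `consistent_readings` are hypothetical helpers.

```python
# Component copy numbers under two assumed readings of
# '... HETEROHEXAMER OF BIOTIN CARBOXYL CARRIER PROTEIN, BIOTIN CARBOXYLASE
#  AND THE TWO SUBUNITS OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX'.
READINGS = {
    # 'IN A 2:2 COMPLEX' attaches low, to 'THE TWO SUBUNITS OF CARBOXYL TRANSFERASE'
    "2:2 gives the stoichiometry of the transferase subunits": {
        "carrier protein": 1, "biotin carboxylase": 1,
        "transferase alpha": 2, "transferase beta": 2,
    },
    # 'IN A 2:2 COMPLEX' attaches higher and contributes no copy numbers
    "each listed component occurs once": {
        "carrier protein": 1, "biotin carboxylase": 1,
        "transferase alpha": 1, "transferase beta": 1,
    },
}

PREFIXES = {"mono": 1, "di": 2, "tri": 3, "tetra": 4, "penta": 5, "hexa": 6}

def oligomer_size(term: str) -> int:
    """'HETEROHEXAMER' -> 6, 'HOMODIMER' -> 2."""
    t = term.lower()
    for p in ("hetero", "homo"):
        if t.startswith(p):
            t = t[len(p):]
    for prefix, n in PREFIXES.items():
        if t == prefix + "mer":
            return n
    raise ValueError(f"unknown oligomer term: {term!r}")

def consistent_readings(head: str, readings: dict) -> list:
    """Keep the readings whose total chain count matches the oligomer term."""
    size = oligomer_size(head)
    return [name for name, parts in readings.items() if sum(parts.values()) == size]
```

Only the low attachment (two copies each of the alpha and beta transferase subunits) yields six chains, which matches the six-subunit complex extracted later in the talk.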

  26. Coreference (non-local) [entry text identical to slide 23] → ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT XYZ / XYZ SUBUNIT OF ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE ...
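Non-local coreference of this kind requires recognizing paraphrases such as "... CARBOXYL TRANSFERASE SUBUNIT XYZ" and "XYZ SUBUNIT OF ... CARBOXYL TRANSFERASE". A minimal Python sketch of a canonical form, assuming that case, hyphens and whitespace are irrelevant, that "COENZYME A" may be abbreviated to "COA", and that "X SUBUNIT OF Y" can be rewritten as "Y SUBUNIT X"; `canonical` is a hypothetical name.

```python
import re

def canonical(name: str) -> str:
    """Normalize a subunit description so that simple paraphrases compare equal.

    Assumed rules: case, hyphens and whitespace are ignored, 'COENZYME A' is
    abbreviated to 'COA', and 'X SUBUNIT OF Y' is rewritten as 'Y SUBUNIT X'.
    """
    name = name.upper().replace("COENZYME A", "COA").replace("-", " ")
    name = re.sub(r"\s+", " ", name).strip()
    match = re.fullmatch(r"(.+?) (SUBUNIT|CHAIN) OF (.+)", name)
    if match:                                   # 'ALPHA SUBUNIT OF Y' -> 'Y SUBUNIT ALPHA'
        name = f"{match.group(3)} {match.group(2)} {match.group(1)}"
    return name
```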

  27. Coreference (non-local) ID ACCA_ECOLI STANDARD; PRT; 318 AA. AC P30867; DE ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT ALPHA (EC 6.4.1.2). CC -!- FUNCTION: THIS PROTEIN IS A COMPONENT OF THE ACETYL COENZYME A CARBOXYLASE COMPLEX; FIRST, BIOTIN CARBOXYLASE CATALYZES THE CARBOXYLATION OF THE CARRIER PROTEIN AND THEN THE TRANSCARBOXYLASE TRANSFERS THE CARBOXYL GROUP TO FORM MALONYL-COA. CC -!- CATALYTIC ACTIVITY: CARBOXYBIOTIN CARBOXYL CARRIER PROTEIN + ACETYL-COA = BIOTIN CARBOXYL CARRIER PROTEIN + MALONYL-COA. CC -!- PATHWAY: FIRST STEP IN LONG-CHAIN FATTY ACID SYNTHESIS. CC -!- SUBUNIT: ACETYL-COA CARBOXYLASE IS AN HETEROHEXAMER OF BIOTIN CARBOXYL CARRIER PROTEIN, BIOTIN CARBOXYLASE AND THE TWO SUBUNITS OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX. CC -!- SIMILARITY: TO THE C-TERMINUS OF MAMMALIAN PROPIONYL-COA CARBOXYLASE BETA CHAIN.

  28. Coreference (non-local) ID BCCP_ECOLI STANDARD; PRT; 156 AA. AC P02905; DE BIOTIN CARBOXYL CARRIER PROTEIN OF ACETYL-COA CARBOXYLASE (BCCP). CC -!- FUNCTION: THIS PROTEIN IS A COMPONENT OF THE ACETYL COENZYME A CARBOXYLASE COMPLEX; FIRST, BIOTIN CARBOXYLASE CATALYZES THE CARBOXYLATION OF THE CARRIER PROTEIN AND THEN THE TRANSCARBOXYLASE TRANSFERS THE CARBOXYL GROUP TO FORM MALONYL-COA. CC -!- PATHWAY: FIRST STEP IN LONG-CHAIN FATTY ACID SYNTHESIS. CC -!- SUBUNIT: HOMODIMER.

  29. Coreference (non-local) ID ACCC_ECOLI STANDARD; PRT; 449 AA. AC P24182; DE BIOTIN CARBOXYLASE (EC 6.3.4.14) (A SUBUNIT OF ACETYL-COA CARBOXYLASE) (EC 6.4.1.2) (ACC). CC -!- FUNCTION: THIS PROTEIN IS A COMPONENT OF THE ACETYL COENZYME A CARBOXYLASE COMPLEX; FIRST, BIOTIN CARBOXYLASE CATALYZES THE CARBOXYLATION OF THE CARRIER PROTEIN AND THEN THE TRANSCARBOXYLASE TRANSFERS THE CARBOXYL GROUP TO FORM MALONYL-COA. CC -!- CATALYTIC ACTIVITY: ATP + BIOTIN-CARBOXYL-CARRIER PROTEIN + CO(2) = ADP + ORTHOPHOSPHATE + CARBOXYBIOTIN-CARBOXYL-CARRIER PROTEIN. CC -!- PATHWAY: FIRST STEP IN LONG-CHAIN FATTY ACID SYNTHESIS. CC -!- SUBUNIT: ACETYL-COA CARBOXYLASE IS AN HETEROHEXAMER OF BIOTIN CARBOXYL CARRIER PROTEIN, BIOTIN CARBOXYLASE AND THE TWO SUBUNITS OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX. CC -!- SIMILARITY: TO OTHER BIOTIN-DEPENDENT ENZYMES AND CARBAMOYL-PHOSPHATE SYNTHETASES.
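Across entries, the members of one complex can be collected by checking each DE line for the complex name (after normalization) or for its EC number. A Python sketch with the DE lines of the ACCD, ACCA, BCCP and ACCC entries; `members` and `normalize` are hypothetical helpers, and the normalization rules are assumptions.

```python
import re
from typing import Optional

# DE lines of the four E. coli entries used in the coreference examples.
DE_LINES = {
    "ACCD_ECOLI": "ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT BETA (EC 6.4.1.2) (ACCASE BETA CHAIN).",
    "ACCA_ECOLI": "ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT ALPHA (EC 6.4.1.2).",
    "BCCP_ECOLI": "BIOTIN CARBOXYL CARRIER PROTEIN OF ACETYL-COA CARBOXYLASE (BCCP).",
    "ACCC_ECOLI": "BIOTIN CARBOXYLASE (EC 6.3.4.14) (A SUBUNIT OF ACETYL-COA CARBOXYLASE) (EC 6.4.1.2) (ACC).",
}

def normalize(text: str) -> str:
    """Upper-case, abbreviate 'COENZYME A' to 'COA', treat hyphens as spaces."""
    text = text.upper().replace("COENZYME A", "COA").replace("-", " ")
    return re.sub(r"\s+", " ", text).strip()

def members(de_lines: dict, complex_name: str, ec: Optional[str] = None) -> list:
    """Entries whose DE line names the complex or carries its EC number."""
    key = normalize(complex_name)
    hits = []
    for entry_id, de in de_lines.items():
        by_name = key in normalize(de)
        by_ec = ec is not None and f"(EC {ec})" in de
        if by_name or by_ec:
            hits.append(entry_id)
    return hits
```

Matching on EC 6.4.1.2 alone would miss BCCP_ECOLI, whose DE line names the complex but carries no EC number.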

  30. Extraction ID ACCD_ECOLI STANDARD; PRT; 304 AA. AC P08193; P78251; P76937; DE ACETYL-COENZYME A CARBOXYLASE CARBOXYL TRANSFERASE SUBUNIT BETA (EC 6.4.1.2) (ACCASE BETA CHAIN). CC -!- FUNCTION: THIS PROTEIN IS A COMPONENT OF THE ACETYL COENZYME A CARBOXYLASE COMPLEX; FIRST, BIOTIN CARBOXYLASE CATALYZES THE CARBOXYLATION OF THE CARRIER PROTEIN AND THEN THE TRANSCARBOXYLASE TRANSFERS THE CARBOXYL GROUP TO FORM MALONYL-COA. CC -!- SUBUNIT: ACETYL-COA CARBOXYLASE IS AN HETEROHEXAMER OF BIOTIN CARBOXYL CARRIER PROTEIN, BIOTIN CARBOXYLASE AND THE TWO SUBUNITS OF CARBOXYL TRANSFERASE IN A 2:2 COMPLEX. CC -!- SIMILARITY: BELONGS TO THE ACCD / PCCB FAMILY. Complex consisting of 6 subunits

  31. Extraction [entry text identical to slide 30] → Acetyl-CoA Carboxylase: Carrier Protein, Biotin Carboxylase, Carboxyl Transferase (Alpha, Alpha, Beta, Beta)
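The extracted assembly is naturally a tree whose leaves are polypeptide chains. A small Python sketch representing this result as nested (name, parts) pairs; `ASSEMBLY`, `chain_count` and `render` are hypothetical, and the tree is entered by hand rather than extracted. Counting the leaves reproduces the six subunits of the heterohexamer.

```python
# Assembly from the slide as nested (name, [parts]) pairs; leaves are single chains.
ASSEMBLY = ("Acetyl-CoA Carboxylase", [
    ("Carrier Protein", []),
    ("Biotin Carboxylase", []),
    ("Carboxyl Transferase", [("Alpha", []), ("Alpha", []), ("Beta", []), ("Beta", [])]),
])

def chain_count(node) -> int:
    """Number of polypeptide chains (leaves) in an assembly tree."""
    name, parts = node
    return 1 if not parts else sum(chain_count(p) for p in parts)

def render(node, depth: int = 0) -> str:
    """Indented text rendering, two spaces per level."""
    name, parts = node
    lines = ["  " * depth + name]
    for part in parts:
        lines.append(render(part, depth + 1))
    return "\n".join(lines)
```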

  32. Completing the Picture ID BIRA_ECOLI STANDARD; PRT; 321 AA. AC P06709; CC -!- FUNCTION: BIRA ACTS BOTH AS A BIOTIN-OPERON REPRESSOR AND AS THE ENZYME THAT SYNTHESIZES THE COREPRESSOR, ACETYL COA:CARBON-DIOXIDE LIGASE. THIS PROTEIN ALSO ACTIVATES BIOTIN TO FORM BIOTINYL-5'-ADENYLATE AND TRANSFERS THE BIOTIN MOIETY TO BIOTIN-ACCEPTING PROTEINS. CC -!- CATALYTIC ACTIVITY: ATP + BIOTIN + APO-[ACETYL-COA:CARBON-DIOXIDE LIGASE (ADP FORMING)] = AMP + PYROPHOSPHATE + [ACETYL-COA:CARBON-DIOXIDE LIGASE (ADP FORMING)]. CC -!- SUBUNIT: MONOMER. CC -!- SIMILARITY: WITH OTHER BACTERIAL BIRA AND WITH EUKARYOTIC BIOTIN APO-PROTEIN LIGASE. → ACETYL-COA:CARBON-DIOXIDE LIGASE = Acetyl CoA Carboxylase
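The CATALYTIC ACTIVITY equation is what links BIRA to the complex: an APO-[X] substrate that reappears as [X] among the products marks X as a protein to which BIRA transfers biotin. A minimal Python sketch, assuming equations of the form "A + B = C + D." with " + " and " = " as separators; `parse_reaction`, `holo_targets` and the one-entry `COMMON_NAMES` table (which encodes the slide's identification of the systematic name with acetyl CoA carboxylase) are hypothetical.

```python
# Hypothetical name table: the slide equates the systematic name with acetyl-CoA carboxylase.
COMMON_NAMES = {"ACETYL-COA:CARBON-DIOXIDE LIGASE (ADP FORMING)": "ACETYL-COA CARBOXYLASE"}

BIRA_ACTIVITY = ("ATP + BIOTIN + APO-[ACETYL-COA:CARBON-DIOXIDE LIGASE (ADP FORMING)] = "
                 "AMP + PYROPHOSPHATE + [ACETYL-COA:CARBON-DIOXIDE LIGASE (ADP FORMING)].")

def parse_reaction(equation: str) -> tuple:
    """Split 'A + B = C + D.' into ([A, B], [C, D])."""
    left, right = equation.strip().rstrip(".").split(" = ")
    return ([s.strip() for s in left.split(" + ")],
            [s.strip() for s in right.split(" + ")])

def holo_targets(equation: str) -> list:
    """Proteins that appear as APO-[X] among substrates and as [X] among products."""
    substrates, products = parse_reaction(equation)
    targets = []
    for s in substrates:
        if s.startswith("APO-[") and s.endswith("]"):
            inner = s[len("APO-["):-1]
            if "[" + inner + "]" in products:
                targets.append(COMMON_NAMES.get(inner, inner))
    return targets
```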

  33. Conclusion • Sophisticated IE must incorporate: Domain Ontology (EML, INRIA, IMS); Lexical Semantics (IMS); Morphological Analysis + Compositional Semantics (IMS); Discourse Semantics (IMS) • Work on the Lexicon of Cell Biology: Organic Chemical Compounds, "What does UREYLEN mean?" (C. Gerstenberger, IMS); Semantic/ontological classification of 100 chemical verbs (Phillip Cimiano Lavin, IMS); Enzyme and Protein Names (work in progress)
