Chemical Entity extraction using the chemicalize.org-technology

Josef Scheiber

Novartis Pharma AG – NITAS/TMS


Where the story of this project started ...

A day in October 2008

Some time around 7:45

in the morning ...

Novartis Campus

Dreirosenbrücke


Vision for text mining: Integration of chemical and biological knowledge


Mining for Chemical Knowledge - Rationale

  • Make text corpora searchable for chemistry

  • Generate chemistry databases for use in research based on Scientific Papers or Patents

  • Link Chemical Information with further annotation in an automated way for e.g. Chemogenomics applications

  • Patent analysis for MedChem projects

Connection table


Mining for Chemical Knowledge - Rationale

Information on compounds targeting GPCRs

HELP

Information explosion

Source: Banville, Debra L. “Mining chemical structural information from the drug literature.” Drug Discovery Today, Number 1/2 Jan. 2006, p.35-42


Example: Project Prospect – Royal Society of Chemistry

  • Enhancing Journal Articles with Chemical Features

This helps you identify other articles that discuss the same molecule


Mining for Chemical Knowledge – Focus for today

  • Make text corpora searchable for chemistry

  • Generate chemistry databases for use in research based on Scientific Papers or Patents

  • Link Chemical Information with further annotation in an automated way for e.g. Chemogenomics applications

  • Patent analysis for MedChem projects

Connection table


A use case for successful patent mining (molecules you sometimes find in your inbox ;-) )

Vardenafil (2003, Bayer) – € 1.24 billion (USD 1.6 billion)

Sildenafil (1998, Pfizer) – € 11.7 billion (USD 15.1 billion)

Slide inspired by an example from Steve Boyer/IBM; Sales data from Prous Integrity database


Conventional Database Building


Facts – current standard

... (ACS) owes most of its wealth to its two 'information services' divisions — the publications arm and the Chemical Abstracts Service (CAS), a rich database of chemical information and literature. Together, in 2004, these divisions made about $340 million — 82% of the society's revenue — and accounted for $300 million (74%) of its expenditure. Over the past five years, the society has seen its revenue and expenditure grow steadily ...

Source: ACS homepage


Facts

Established application

Straightforward use

De-facto Gold standard

Unique data source

Very costly

No structure export for reasonable price

Very limited in large-scale follow-up analysis

Most recent patents not available


Not data (search), but integration, analysis and insight, leading to decisions and discovery


Now – What would be the perfect solution?

All patent offices require that all claimed structures be provided in a machine-readable version, available for one-click download


Text extraction

Definition: Extract all molecules that are mentioned in a patent text of interest, convert them to structures and make them available in machine-readable format
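The definition above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical name-to-SMILES dictionary (NAME_TO_SMILES) in place of a real name-to-structure converter, and it is not the chemicalize.org implementation; it only shows the three steps of finding names, converting them to structures, and emitting a machine-readable result.

```python
import re

# Hypothetical mini-dictionary mapping lower-case compound names to SMILES.
# A real system would call a name-to-structure converter instead.
NAME_TO_SMILES = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "ethanol": "CCO",
}


def extract_structures(text):
    """Return (name, SMILES) pairs for dictionary names found in text.

    Each name is reported once, in order of first appearance.
    """
    found = []
    seen = set()
    # Split the text into word-like tokens (letters, digits, hyphens).
    for token in re.findall(r"[A-Za-z][A-Za-z0-9\-]*", text):
        key = token.lower()
        if key in NAME_TO_SMILES and key not in seen:
            seen.add(key)
            found.append((key, NAME_TO_SMILES[key]))
    return found


sample = "The tablet contains Aspirin and caffeine; aspirin is the main component."
for name, smiles in extract_structures(sample):
    print(name, smiles)
```

A plain dictionary lookup only finds names it already knows; systematic names in patents would need a real name-to-structure step, which this sketch deliberately leaves out.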


Mining for Chemical Knowledge – Technologies from providers


The objective

To provide a tool that offers sophisticated text analysis methods to NIBR scientists and thereby leverages the methods of TMS


Mining for Chemical Knowledge – Novartis Tools – the chemicalize-technology is working under the hood!

Clipboard Analysis

Identified structures

Patent text

View structure onMouseOver

Export to other applications


Mining for Knowledge – Novartis Tools – Input example: J Med Chem Paper


Mining for Chemical Knowledge – Use Case

Medicinal Chemist wants to synthesize competitor compound as tool compound for own project

This enables the identification of compounds most representative for a competitor patent

Identification of core scaffold

Analysis of substitution patterns


Example – A text-based patent

A patent example

Automated text extraction: 452 compounds

Reference: 636 compounds

Ratio: 71% (452 of 636)


Example – An image-based patent

  • Text extraction is not suitable for this case: it finds only a meager 40 molecules, versus 1129 in the reference. Why?

An entirely image-based patent example
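The compound counts quoted for the two patent examples can be compared with a short calculation. This is a sketch that assumes the quoted percentage is simply the extracted count divided by the reference count; the slides do not say whether every extracted compound also appears in the reference set.

```python
def coverage(extracted, reference):
    """Extracted compound count as a percentage of the reference count."""
    return 100.0 * extracted / reference


# Counts as quoted on the slides: (extracted, reference).
cases = {
    "text-based patent": (452, 636),
    "image-based patent": (40, 1129),
}

for label, (extracted, reference) in cases.items():
    print(f"{label}: {coverage(extracted, reference):.1f}%")
# text-based patent: 71.1%
# image-based patent: 3.5%
```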


Language issues – e.g. Japanese patents


Encountered problems

  • OCR (Optical Character Recognition)!!

  • USPTO and WIPO patents are now available as full text in most cases

  • Typos!

  • Name2Struct problems (less of an issue here)
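One common way to soften OCR errors and typos is approximate string matching against a list of known names. The sketch below uses Python's standard difflib module; the name list (KNOWN_NAMES), the helper correct_name and the cutoff value are illustrative assumptions, not part of the tools described in this talk.

```python
import difflib

# Hypothetical list of known compound names to match against.
KNOWN_NAMES = ["sildenafil", "vardenafil"]


def correct_name(token, cutoff=0.85):
    """Return the closest known name for a possibly damaged token, or None."""
    matches = difflib.get_close_matches(
        token.lower(), KNOWN_NAMES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


# "sildenafi1" mimics an OCR confusion of "l" and "1";
# "vardenafll" mimics a typo; "benzene" is not in the list.
for raw in ["sildenafi1", "vardenafll", "benzene"]:
    print(raw, "->", correct_name(raw))
```

A high cutoff keeps false corrections rare, at the cost of missing heavily damaged names; the right value would have to be tuned on real patent text.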


IBM initiative Patent Mining / ChemVerse database (Steve Boyer)

  • The objective is to automatically extract all molecules from all patents available and make them searchable in a database

  • They leverage cloud computing and have access to all full-text patents

  • This is going in absolutely the right direction

  • They annotate the molecules with information from freely available databases


Future ideas: Patent Analysis

  • Markush translation, Image+Target

  • Ranking capabilities of outcome for User

  • "Blurred" dictionaries for translating generic terms such as aryl, cycloalkyl, etc.

  • Select → annotate as entity → on-the-fly error correction

  • Result goes into a database → crowdsourcing efforts to improve and store results

  • Suggest functionality
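The dictionary idea for generic terms in the list above could look roughly like the following sketch. The mapping (GENERIC_TERMS) and the helper expand_generic are hypothetical, and the example substituents are illustrative and far from exhaustive; the talk does not describe an actual implementation.

```python
# Hypothetical dictionary: each generic term maps to a few concrete
# example substituents that belong to that class.
GENERIC_TERMS = {
    "aryl": ["phenyl", "naphthyl"],
    "cycloalkyl": ["cyclopropyl", "cyclobutyl", "cyclopentyl", "cyclohexyl"],
}


def expand_generic(term):
    """Return example substituents for a generic term.

    Terms that are not in the dictionary are returned unchanged.
    """
    return GENERIC_TERMS.get(term.lower(), [term])


for term in ["Aryl", "cycloalkyl", "methyl"]:
    print(term, "->", expand_generic(term))
```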


To enable true Patinformatics analyses ...

Definition by Tony Trippe:


Acknowledgements

NITAS/TMS

  • Therese Vachon

  • Daniel Cronenberger

  • Pierre Parisot

  • Martin Romacker

  • Nicolas Grandjean

  • Clayton Springer

  • Naeem Yusuff

  • Bharat Lagu

  • Alex Fromm

  • Katia Vella

  • Olivier Kreim

And many other people in different divisions of NIBR for their support

