Chemical Entity extraction using the chemicalize.org-technology

Josef Scheiber

Novartis Pharma AG – NITAS/TMS


Where the story of this project started ...

A day in October 2008

Some time around 7:45

in the morning ...

Novartis Campus

Dreirosenbrücke


Vision for text mining: integration of chemical and biological knowledge


Mining for Chemical Knowledge - Rationale

  • Make text corpora searchable for chemistry

  • Generate chemistry databases for use in research based on Scientific Papers or Patents

  • Link Chemical Information with further annotation in an automated way for e.g. Chemogenomics applications

  • Patent analysis for MedChem projects

Connection table


Mining for Chemical Knowledge - Rationale

Information on compounds targeting GPCRs

HELP

Information explosion

Source: Banville, Debra L. “Mining chemical structural information from the drug literature.” Drug Discovery Today, Number 1/2 Jan. 2006, p.35-42


Example: Project Prospect – Royal Society of Chemistry

  • Enhancing Journal Articles with Chemical Features

This helps you identify other articles that discuss the same molecule


Mining for Chemical Knowledge – Focus for today

  • Make text corpora searchable for chemistry

  • Generate chemistry databases for use in research based on Scientific Papers or Patents

  • Link Chemical Information with further annotation in an automated way for e.g. Chemogenomics applications

  • Patent analysis for MedChem projects

Connection table


A use case for successful patent mining (molecules you sometimes find in your inbox ;-) )

Vardenafil (2003, Bayer) – € 1.24 billion (USD 1.6 billion)

Sildenafil (1998, Pfizer) – € 11.7 billion (USD 15.1 billion)

Slide inspired by an example from Steve Boyer/IBM; sales data from the Prous Integrity database



Facts – current standard

... (ACS) owes most of its wealth to its two 'information services' divisions — the publications arm and the Chemical Abstracts Service (CAS), a rich database of chemical information and literature. Together, in 2004, these divisions made about $340 million — 82% of the society's revenue — and accounted for $300 million (74%) of its expenditure. Over the past five years, the society has seen its revenue and expenditure grow steadily ...

Source: ACS homepage


Facts

Established application

Straightforward use

De facto gold standard

Unique data source

Very costly

No structure export at a reasonable price

Very limited in large-scale follow-up analysis

Most recent patents not available


Not data (search), but integration, analysis and insight, leading to decisions and discovery


Now – What would be the perfect solution?

All patent offices require that all claimed structures be provided in a machine-readable version, available for one-click download


Text extraction

Definition: Extract all molecules that are mentioned in a patent text of interest, convert them to structures and make them available in machine-readable format
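The definition above can be illustrated with a minimal sketch. It assumes a small, hypothetical name-to-SMILES lookup table (`NAME_TO_SMILES`) in place of a real name-to-structure converter, and it is not the chemicalize implementation; it only shows the three steps of finding names, converting them to structures, and returning them in a machine-readable form.

```python
import re

# Hypothetical lookup table (an assumption for illustration); a real system
# would use a name-to-structure converter rather than a fixed dictionary.
NAME_TO_SMILES = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "acetic acid": "CC(=O)O",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

def extract_molecules(text):
    """Return (name, SMILES) pairs for dictionary names found in the text,
    in order of first appearance and without duplicates."""
    # Longest names first so multi-word names win in the alternation.
    names = sorted(NAME_TO_SMILES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    seen = []
    for match in pattern.finditer(text):
        name = match.group(1).lower()
        if name not in seen:
            seen.append(name)
    return [(name, NAME_TO_SMILES[name]) for name in seen]

sample = ("The residue was dissolved in ethanol, and acetic acid was added. "
          "Ethanol was removed under vacuum.")
for name, smiles in extract_molecules(sample):
    print(name, smiles)
```

A fixed dictionary only covers names it already knows; systematic names in patents would need an actual name-to-structure step.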


Mining for Chemical Knowledge – Technologies from providers


The objective

To give NIBR scientists a tool that offers sophisticated text analysis methods and thereby leverages the methods of TMS


Mining for Chemical Knowledge – Novartis Tools – the chemicalize-technology is working under the hood!

Clipboard Analysis

Identified structures

Patent text

View structure onMouseOver

Export to other applications


Mining for Knowledge – Novartis Tools – Input example: J Med Chem Paper


Mining for Chemical Knowledge – Use Case

Medicinal Chemist wants to synthesize competitor compound as tool compound for own project

This enables the identification of compounds most representative for a competitor patent

Identification of core scaffold

Analysis of substitution patterns


Example – A text-based patent

A patent example

Automated Text extraction

452 compounds

Reference

636 compounds

71%
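The 71% figure appears to be the ratio of automatically extracted compounds to reference compounds. A minimal sketch of that arithmetic, assuming every extracted compound also occurs in the reference set (the slide does not state this explicitly):

```python
def recall_percent(n_extracted, n_reference):
    """Share of reference compounds recovered, as a percentage.

    Assumes all extracted compounds are also in the reference set.
    """
    return 100.0 * n_extracted / n_reference

# Numbers from the text-based patent example above.
text_based = recall_percent(452, 636)
print(round(text_based))
```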


Example – An image-based patent

  • Text extraction is not suitable for this case: it finds only a meager 40 molecules, versus 1129 in the reference. Why?

An entirely image-based patent example


Language issues – e.g. Japanese patents


Encountered problems

  • OCR (Optical Character Recognition)!!

  • USPTO and WIPO are now available full text in most cases

  • Typos!

  • Name2Struct problems (less of an issue here)
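One common way to soften OCR errors and typos is approximate matching against a list of known names. The sketch below uses Python's standard `difflib` module; the name list `KNOWN_NAMES`, the helper `correct_name`, and the cutoff of 0.8 are assumptions for illustration, not the method used in the tool described here.

```python
import difflib

# Hypothetical list of known compound names (assumption for illustration).
KNOWN_NAMES = ["sildenafil", "vardenafil"]

def correct_name(token, cutoff=0.8):
    """Return the closest known name for a possibly misspelled token,
    or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        token.lower(), KNOWN_NAMES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

print(correct_name("sildenafi1"))   # OCR confusion of "l" and "1"
print(correct_name("vardenafll"))   # typo
print(correct_name("toluene"))      # unrelated name, no match expected
```

A lower cutoff catches more errors but also raises the risk of mapping a genuinely different compound name onto a known one.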


IBM initiative – Patent Mining / ChemVerse database (Steve Boyer)

  • The objective is to automatically extract all molecules from all patents available and make them searchable in a database

  • They leverage cloud computing and have access to all full-text patents

  • This is going in absolutely the right direction

  • They annotate the molecules with information from freely available databases
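To make the idea of a searchable database of extracted molecules concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table layout, the placeholder patent identifiers, and the helper `patents_mentioning` are all hypothetical and say nothing about how the IBM database is actually built.

```python
import sqlite3

# In-memory database with a hypothetical schema (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule (patent_id TEXT, name TEXT, smiles TEXT)")

# Placeholder rows; the patent identifiers are not real.
rows = [
    ("PATENT-A", "ethanol", "CCO"),
    ("PATENT-A", "benzene", "c1ccccc1"),
    ("PATENT-B", "ethanol", "CCO"),
]
conn.executemany("INSERT INTO molecule VALUES (?, ?, ?)", rows)

def patents_mentioning(smiles):
    """Return the sorted patent identifiers whose extracted molecules
    include the given SMILES string (exact string match)."""
    cur = conn.execute(
        "SELECT DISTINCT patent_id FROM molecule "
        "WHERE smiles = ? ORDER BY patent_id",
        (smiles,),
    )
    return [row[0] for row in cur]

print(patents_mentioning("CCO"))
```

Exact string matching only works if all SMILES are written in one canonical form; a production system would canonicalize structures or use a chemistry-aware index.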


Future ideas: Patent Analysis

  • Markush translation, Image+Target

  • Ranking capabilities of outcome for User

  • “Blurred” dictionaries for translating generic terms such as aryl, cycloalkyl, etc.

  • Select → annotate as entity → on-the-fly error correction

  • Result goes into a database → crowdsourcing efforts to improve and store results

  • Suggest functionality


To enable true Patinformatics analyses ...

Definition by Tony Trippe:


Acknowledgements

NITAS/TMS

  • Therese Vachon

  • Daniel Cronenberger

  • Pierre Parisot

  • Martin Romacker

  • Nicolas Grandjean

  • Clayton Springer

  • Naeem Yusuff

  • Bharat Lagu

  • Alex Fromm

  • Katia Vella

  • Olivier Kreim

And many other people in different divisions of NIBR for their support

