The Medical Information System - MedISys eHealth 2009 Second International ICST Conference on Electronic Healthcare for the 21st century

The Medical Information System - MedISys eHealth 2009 Second International ICST Conference on Electronic Healthcare for the 21st century September 23-25, 2009 - Istanbul, Turkey Erik van der Goot & the OPTIMA team ( OPensource Text Information Mining and Analysis )

The Medical Information System - MedISys

eHealth 2009

Second International ICST Conference


Electronic Healthcare for the 21st century

September 23-25, 2009 - Istanbul, Turkey

Erik van der Goot & the OPTIMA team (OPensource Text Information Mining and Analysis )

European Commission – Joint Research Centre (JRC), Institute for the Protection and Security of the Citizen (IPSC)

JRC - who

JRC - where

MedISys - Overview


Provide open source data collection and analysis for surveillance and epidemiology

Replace manual scanning of multiple newspapers and web portals

Support national and international Public Health (PH) organisations to monitor issues of Public Health concern (e.g. CBRN)


Gather, filter, classify, extract and aggregate health-related information

Monitor trends, detect breaking news

Visualise analysis results

Alert users

Allows customised views

In combination with RNS tool, allows manual moderation.

Background - History

Based on JRC’s Europe Media Monitor (EMM) technology (EMM live since 2002).

On request / initiative of the EC’s Directorate General for Health and Consumer Protection (DG SANCO).

Password-protected service for Public Health bodies since 2005.

Public service since early 2007 (restricted functionality).

Background - Media Monitoring

  • EU Commission Media Monitoring (until 2001/2002)

    • Traditional cut and paste for printed press only

    • Monitoring of incoming news wires (e.g. Reuters, AFP)

    • Simple keyword based filtering of wires

    • Manual selection of printed press items

    • Human classification of items

  • Potential problems

    • Not ‘real-time’ for mainstream media: printed press typically once a day

    • Limited coverage: not all media is printed

    • Inaccurate and incomplete classification: subjective and limited number of categories

    • Labour intensive and expensive: limited number of articles per reviewer per day, requires topical knowledge and requires language knowledge

EMM History

  • New Challenges (as seen in 2002)

    • Enlargement (+10 countries): more media, more languages

    • More use of electronic publishing (media)

    • Electronic distribution of results (web+mobile)

    • Automatic alerting functions

  • New approach: EMM - a one stop shop for Media Monitoring

    • Facilitate (not replace) human Media Monitoring activities

    • Extend monitoring beyond the traditional news wires (Internet).

    • Improve coverage, number of languages, analysis.

    • Apply automatic categorization and analysis to all sources

    • Provide new services like automatic e-mail, sms, mobile editions etc.

    • Provide editorial system to manage the information and produce newsletters etc.

    • Important: EMM is not Yet Another Internet Search Engine

EMM System Features

  • Automatic language recognition

    • Based on continuously updated language specific frequency tables

  • Automated information/entity extraction

    • 400,000 persons and organizations based on continuously updated list of entities, many language specific synonyms.

  • Geotagging

    • Based on homegrown harmonised multilingual geo-data set, about 600,000 place name variants in most languages covered by EMM, mostly national capitals, regional capitals and provincial capitals.

  • Improved Categorization Engine

    • Boolean combinations, proximity, wildcards

    • Support for Arabic and similar (automatic noun-prefix processing)

    • Support for Chinese and similar (no whitespace)

  • Tonality/Sentiment

    • Simple bag of words approach, range from very negative to very positive, corrected for long term source bias, interesting for following reporting trends per category
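
The slide does not say how the language specific frequency tables are used. The following is a minimal sketch only, assuming word-level tables and a log-probability score; the table contents, the `UNSEEN` floor value and the function name `guess_language` are hypothetical and not taken from EMM.

```python
import math

# Toy word-frequency tables (hypothetical values). A real system would
# learn these from large corpora and keep them updated.
FREQ_TABLES = {
    "en": {"the": 0.06, "of": 0.03, "and": 0.03, "in": 0.02, "is": 0.01},
    "fr": {"le": 0.04, "de": 0.05, "et": 0.03, "la": 0.04, "est": 0.01},
    "de": {"der": 0.04, "die": 0.04, "und": 0.03, "das": 0.02, "ist": 0.01},
}
UNSEEN = 1e-6  # assumed floor probability for words missing from a table


def guess_language(text: str) -> str:
    """Return the language whose table gives the text the highest score."""
    words = text.lower().split()
    scores = {}
    for lang, table in FREQ_TABLES.items():
        # Sum of log probabilities; unknown words get the floor value.
        scores[lang] = sum(math.log(table.get(w, UNSEEN)) for w in words)
    return max(scores, key=scores.get)
```

For example, `guess_language("le chat est dans la maison")` picks `"fr"` because three of its words appear in the French table and none in the others.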

… more features

  • Duplicate detection

  • Metadata categorization

    • Allows selection of articles based on any previously assigned meta-data.

  • Automated information linking

    • Incremental topic based clustering and storytracking, geolocation.

    • 10 minute interval incremental clustering on last 4 hours worth of news (Top Stories on front page).

  • Automatic detection of breaking news

    • Cluster growth rate

    • Flux of articles per category

  • Indexing

    • Index full text and most metadata.

  • Statistics/Trend analysis

    • Quantitative analysis of reporting. Maintain simple count statistics.
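
The slides list duplicate detection without describing the method. One common technique, shown here purely as an illustration and not as EMM's actual approach, compares the sets of overlapping word sequences (shingles) of two articles using Jaccard similarity. The shingle size and the 0.8 threshold are assumptions.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word tuples of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Treat two texts as duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

Two wire copies of the same story that differ by a word or two score close to 1.0, while unrelated articles share almost no shingles and score near 0.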

…and more features

  • Event extraction

    • Language independent event grammars used to parse clusters using language dependent resources to fill the grammar slots.

    • Currently for 5 languages (en, fr, it, pt, ru), violent events, humanitarian events

Development time line

[Timeline figure. Labels: Domain specific application; Continuous development; New features; First version 2005; EMM System redesign; Redesign based on EMM; RNS redesign]

MedISys System Overview

MedISys NewsBrief

NewsDesk Service (a.k.a. RNS)

Editorial Interface

EMM Open Source Monitoring Engine

Problems to solve

  • Find relevant information

    • Millions of new articles/blogs/items/tweets published on Internet each day

  • Deliver the information to the right user

    • Allow for many (possibly overlapping) categories to meet specific needs

  • Timely

    • Right now if possible

  • In short: Deliver targeted information to the right user in a timely manner

Approach


  • Wide coverage

    • Many sources

      • Local, Regional, National and International coverage

    • Many languages

      • Multilinguality & cross-lingual information access

  • Fast coverage

    • High frequency monitoring of sites, some sites every 5 minutes

  • Overcome the information overflow

    • Categorization, aggregation, duplicate identification, clustering

    • Customisability of MedISys NewsBrief

    • Search functions

    • RNS tool for manual moderation and targeted dissemination

Input data

~ 2200 Sources (world-wide, but primary focus on Europe)

~ 4,000 HTML web pages+RSS feeds

~ 100 specialist medical sites

~ 20 commercial newswires

Specialist pay-for sources (LexisMed)

24/7, near continuous monitoring

~80,000 new articles/items per day

Converts dirty html with adverts, menus, html tags, ‘related stories’, etc. into clean and standardised Unicode-encoded RSS format

Use RSS when available

Perform full content analysis
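
The clean-up step that strips menus, scripts and tags from a page can be sketched with Python's standard `html.parser` module. This is an illustration only: the `SKIP_TAGS` set and the names `TextExtractor` and `clean_html` are assumptions, and the sketch stops at plain text rather than producing the RSS format mentioned above.

```python
from html.parser import HTMLParser

# Elements whose content is assumed to be page furniture, not article text.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer"}


class TextExtractor(HTMLParser):
    """Collect text that is not nested inside any of the SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_html(page: str) -> str:
    """Return the visible article text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(page)
    parser.close()
    return " ".join(parser.chunks)
```

Real news pages need far more than a tag blacklist (for instance, site-specific rules to find the article body), so this only shows the general shape of the step.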

MedISys Screenshots

MedISys – Current subscribers and users include …

Supranational organisations

Directorate General Health and Consumer Protection (SANCO)

European Centre for Disease Prevention and Control, Stockholm (ECDC)

European Food Safety Authority (EFSA)

World Health Organisation (WHO)

National Public Health organisations

Swiss Federal Office of Public Health

Icelandic Ministry of Health

Spanish Ministry of Sanitation & Ministry of Health and Consumer Protection

Institut de Veille Sanitaire (France)

Global Public Health Intelligence Network (Canada)

Danish Emergency Management Agency

Italian Ministry of Health and Ministry of Defence

Dutch Institute of Public Health & Food and Consumer Product Safety Authority

The (general?) public

Currently ~ 1000 visitors, ~ 37000 hits per day on public system

Locations mentioned in MedISys medical articles across languages

English - French

Spanish - Portuguese

Importance of multilingual information gathering

Italian - German

influenzavirus tipo A

swine-origin influenza

sjevernoameričk gripe

pandemia influenzale

mexicaanse griep

мексиканск грипп

североамериканск грипп

pandemija svinjske

sjevernoameričke gripe

grippe nouvelle

gripă porcină

svinjski grip




Porzine Influenza


influenza porcina

prasečí chřipka

Multilingual and cross lingual analysis (1)

Barack Obama (eu,yo)

Barak Obama (az,wo)

Барак Обама (ba,uk)

باراك أوباما (ar)

باراك اوباما (ar,fa)

Барак Хуссейн Обама (ru)

Baraque Obama (pt)

バラク・オバマ (ja)

บารัค โอบามา (th)

Բարաք Օբամա (hy)

ބަރަކް އޮބާމާ (dv)

באראק אבאמא (yi)

ברק אובאמה (he)

贝拉克·奥巴马 (zh)

ބަރާކް އޮބާމާ (dv)

بارک اوبامہ (ur)

  • Data processing layer:

    • Detect ‘known entities’ across languages using large multilingual set of name variants (updated daily)

    • Geo-locate the articles using large multilingual geo-database

    • Apply content based categorization using multilingual category definitions
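
Detecting a known entity across languages can be pictured as a lookup from every spelling variant to one shared identifier. The sketch below uses a few of the name variants listed above. The identifier `barack_obama`, the dictionary layout and the plain substring matching are assumptions made for illustration.

```python
# Map each known spelling variant to one language-independent entity id.
# The id "barack_obama" is a hypothetical identifier.
NAME_VARIANTS = {
    "Barack Obama": "barack_obama",
    "Barak Obama": "barack_obama",
    "Baraque Obama": "barack_obama",
    "Барак Обама": "barack_obama",
    "バラク・オバマ": "barack_obama",
    "贝拉克·奥巴马": "barack_obama",
}


def find_entities(text: str) -> set:
    """Return the ids of all known entities whose variant occurs in text."""
    return {
        entity
        for variant, entity in NAME_VARIANTS.items()
        if variant in text
    }
```

Because every variant resolves to the same id, an article in Russian and one in Portuguese end up tagged with the same entity.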

Multilingual and cross lingual analysis (2)

  • Data presentation layer:

    • ‘Convenience’ links to external Machine Translation programs, where available.

    • Display of other MedISys categories, of persons and organisations found in text.

    • Display on-line English translation of Chinese and Arabic

Aggregation of multilingual information

Documents from all languages get classified according to the same countries and categories.

An increase in the number of media reports on any country-category combination is detected independently of the reporting language.

Graphs and alerts may show events not yet reported in your own language.
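
As a rough sketch of this aggregation, articles can be counted per country-category pair while the language field is simply ignored. The article structure (keys `lang`, `country`, `categories`) is hypothetical.

```python
from collections import Counter


def aggregate(articles):
    """Count articles per (country, category), whatever their language."""
    counts = Counter()
    for article in articles:
        for category in article["categories"]:
            counts[(article["country"], category)] += 1
    return counts


# Hypothetical sample: three reports on the same topic in three languages.
sample = [
    {"lang": "en", "country": "MX", "categories": ["influenza"]},
    {"lang": "es", "country": "MX", "categories": ["influenza"]},
    {"lang": "de", "country": "MX", "categories": ["influenza", "travel"]},
]
totals = aggregate(sample)
```

Here `totals[("MX", "influenza")]` is 3 even though only one of the reports is in English.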

Detection using statistics

  • Detect abnormal flux of reporting for a particular country/category combination
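
The slide does not state which statistic is used. One simple possibility, shown here as an assumption rather than as the MedISys method, is to compare today's article count for a country/category pair with the mean and standard deviation of recent days (a z-score test). The threshold of 3.0 is arbitrary.

```python
from statistics import mean, stdev


def is_abnormal(history, today, z_threshold=3.0):
    """Flag today's count if it lies far above the recent daily average.

    history: daily article counts for one country/category combination.
    """
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu  # flat history: any increase stands out
    return (today - mu) / sigma > z_threshold
```

With a history of around five articles a day, a day with 40 articles is flagged, while a day with six is not.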

Recent case

News Clusters mostly about Category - Sat. 02-05-2009, Influenza A

Categorized and Clustered News - Sat. 02-05-2009, Influenza A

PULS Event detection

Results from Helsinki University

Category definitions – Example: haemorrhagic fever

  • Terms (single or multi-word)

    • Cumulative weights with threshold

  • Case forcing

    • Upper case characters in pattern only match uppercase in text (useful for acronyms etc.)

  • Wild cards

    • Single letters (_)

    • Zero, one or more letters (%)

    • Adjacent words (+)

  • Boolean combinations of term lists

    • And, or, not

    • Using proximity operator (within X words)
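
The term syntax above can be sketched by translating each pattern into a regular expression and summing the weights of the terms that match. This is a minimal illustration: the example terms, weights and threshold are invented, counting every hit (rather than each term once) is an assumption, and Boolean combinations and the proximity operator are left out.

```python
import re

LETTER = r"[^\W\d_]"  # any single Unicode letter


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a category term into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(LETTER)            # exactly one letter
        elif ch == "%":
            parts.append(LETTER + "*")      # zero, one or more letters
        elif ch == "+":
            parts.append(r"\s+")            # adjacent words
        elif ch.isupper():
            parts.append(re.escape(ch))     # case forcing: upper only
        elif ch.isalpha():
            parts.append("[" + ch + ch.upper() + "]")  # either case
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b")


def score(text: str, weighted_terms: dict) -> int:
    """Sum weight * number of hits over all terms."""
    total = 0
    for pattern, weight in weighted_terms.items():
        total += weight * len(pattern_to_regex(pattern).findall(text))
    return total


def matches_category(text: str, weighted_terms: dict, threshold: int) -> bool:
    """An article belongs to the category once the score reaches the threshold."""
    return score(text, weighted_terms) >= threshold


# Hypothetical definition; terms and weights are not from MedISys.
HAEMORRHAGIC_FEVER = {
    "h%morrhagic+fever": 30,
    "ebola": 20,
    "CCHF": 30,
    "outbreak%": 5,
}
```

The pattern `h%morrhagic+fever` covers both the "haemorrhagic" and "hemorrhagic" spellings, and `CCHF` matches only the upper-case acronym, while the lower-case term `ebola` also matches "Ebola".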

Customisability of MedISys

Add more news sources or new categories, e.g.

Events: Cricket World Cup, Rugby World Cup, UEFA Euro 2008

New diseases

Other classes, e.g. deliberate release of chemicals

(on request of recognised users/partners)

Output formats: web pages, email alerts, or RSS feed to integrate into your environment.

Email alerts:

daily vs. breaking news only

for daily notification: specify hour

for breaking news: level-dependent

User-selected languages only

Customisability: Filter by language/news source/category

Rapid News Service - RNS (restricted to subscribed users)

Allows MedISys users to further customise their view of the news

Selection of specific languages and feeds

Allows human moderation

Manual selection of news items

Drag and drop compilation of newsletters

Allows moderators to forward news items to user groups

Via SMS alerts, emails or newsletters

Allows user management

Shows overview of relative activity of each category over time

RNS moderation: Editing interface for newsletter

Manual selection of news items, drag and drop compilation of newsletters.

RNS moderation: Alert overview page

Time line shows overview of relative activity of each category over time.

MedISys - Summary

High coverage: helps monitor a large number of multilingual media reports.

Includes tools to help beat the information overflow:

via clustering, duplicate detection;

categorization; information aggregation; visualisation; mapping

further means are being implemented: e.g. multilingual medical event extraction

Special features of MedISys:

Fully automatic (moderation possible)

Real time (10-minute updates), 24/7

High multilinguality (43 languages)

Multilingual information aggregation

Part of EMM family of applications, active team: much new functionality to come.
