The Medical Information System - MedISys eHealth 2009 Second International ICST Conference on Electronic Healthcare for the 21st century


The Medical Information System - MedISys eHealth 2009 Second International ICST Conference on Electronic Healthcare for the 21st century September 23-25, 2009 - Istanbul, Turkey Erik van der Goot & the OPTIMA team ( OPensource Text Information Mining and Analysis )


The Medical Information System - MedISys

eHealth 2009

Second International ICST Conference

on

Electronic Healthcare for the 21st century

September 23-25, 2009 - Istanbul, Turkey

Erik van der Goot & the OPTIMA team (OPensource Text Information Mining and Analysis)

European Commission – Joint Research Centre (JRC), Institute for the Protection and Security of the Citizen (IPSC) [email protected]

MedISys - Overview

Objective:

Provide open source data collection and analysis for surveillance and epidemiology

Replace manual scanning of multiple newspapers and web portals

Support national and international Public Health (PH) organisations to monitor issues of Public Health concern (e.g. CBRN)

Functionality:

Gather, filter, classify, extract and aggregate health-related information

Monitor trends, detect breaking news

Visualise analysis results

Alert users

Allows customised views

In combination with RNS tool, allows manual moderation.

Background - History

Based on JRC’s Europe Media Monitor (EMM) technology (EMM live since 2002; http://emm.newsbrief.eu).

On request / initiative of the EC’s Directorate General for Health and Consumer Protection (DG SANCO).

Password-protected service for Public Health bodies since 2005.

Public service since early 2007 (http://medusa.jrc.it/, restricted functionality).

Background - Media Monitoring
  • EU Commission Media Monitoring (until 2001/2002)
    • Traditional cut and paste for printed press only
    • Monitoring of incoming news wires (e.g. Reuters, AFP)
    • Simple keyword based filtering of wires
    • Manual selection of printed press items
    • Human classification of items
  • Potential problems
    • Not ‘real-time’ for mainstream media: printed press typically once a day
    • Limited coverage: not all media is printed
    • Inaccurate and incomplete classification: subjective and limited number of categories
    • Labour intensive and expensive: limited number of articles per reviewer per day, requires topical knowledge and requires language knowledge
EMM History
  • New Challenges (as seen in 2002)
    • Enlargement (+10 countries): more media, more languages
    • More use of electronic publishing (media)
    • Electronic distribution of results (web+mobile)
    • Automatic alerting functions
  • New approach: EMM - a one stop shop for Media Monitoring
    • Facilitate (not replace) human Media Monitoring activities
    • Extend monitoring beyond the traditional news wires (Internet).
    • Improve coverage, number of languages, analysis.
    • Apply automatic categorization and analysis to all sources
    • Provide new services like automatic e-mail, sms, mobile editions etc.
    • Provide editorial system to manage the information and produce newsletters etc.
    • Important: EMM is not Yet Another Internet Search Engine
EMM System Features
  • Automatic language recognition
    • Based on continuously updated language-specific frequency tables
  • Automated information/entity extraction
    • 400,000 persons and organizations, based on a continuously updated list of entities with many language-specific synonyms
  • Geotagging
    • Based on a home-grown harmonised multilingual geo-data set of about 600,000 place name variants in most languages covered by EMM, mostly national, regional and provincial capitals
  • Improved Categorization Engine
    • Boolean combinations, proximity, wildcards
    • Support for Arabic and similar (automatic noun-prefix processing)
    • Support for Chinese and similar (no whitespace)
  • Tonality/Sentiment
    • Simple bag-of-words approach, ranging from very negative to very positive, corrected for long-term source bias; useful for following reporting trends per category
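
The tonality bullet above could be sketched roughly as follows. This is a minimal illustration, not the EMM implementation: the lexicon, its weights, and the use of a running per-source mean as the "long-term source bias" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy lexicon; the real word lists are not given in the slides.
LEXICON = {"outbreak": -2, "death": -2, "crisis": -1, "recovery": 1, "cure": 2}


def raw_tonality(text):
    """Sum lexicon weights over lower-cased, whitespace-separated tokens."""
    tokens = (tok.strip(".,;:!?") for tok in text.lower().split())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


class SourceBias:
    """Running mean of raw scores per source (assumed bias estimate)."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def corrected(self, source, text):
        """Return the raw score minus the mean of earlier scores from this source."""
        score = raw_tonality(text)
        n = self.count[source]
        bias = self.total[source] / n if n else 0.0
        self.total[source] += score
        self.count[source] += 1
        return score - bias
```

A source that is always negative ends up with corrected scores near zero, so only deviations from its usual tone stand out.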
… more features
  • Duplicate detection
  • Metadata categorization
    • Allows selection of articles based on any previously assigned metadata
  • Automated information linking
    • Incremental topic-based clustering and story tracking, geolocation
    • Incremental clustering every 10 minutes on the last 4 hours' worth of news (Top Stories on front page)
  • Automatic detection of breaking news
    • Cluster growth rate
    • Flux of articles per category
  • Indexing
    • Index full text and most metadata
  • Statistics/Trend analysis
    • Quantitative analysis of reporting; maintains simple count statistics
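
One way to read "cluster growth rate" as a breaking-news signal is sketched below. The function names, the comparison of two consecutive 10-minute intervals, and the threshold of five articles are assumptions for illustration; the slides do not state the actual rule.

```python
def growth_rate(timestamps, now, interval=600):
    """Articles added in the latest interval minus those added in the one before.

    timestamps: arrival times (seconds) of the articles in one cluster.
    interval:   600 s mirrors the 10-minute clustering cycle.
    """
    recent = sum(1 for t in timestamps if now - interval < t <= now)
    previous = sum(
        1 for t in timestamps if now - 2 * interval < t <= now - interval
    )
    return recent - previous


def is_breaking(timestamps, now, min_growth=5):
    """Flag a cluster whose growth meets the (assumed) threshold."""
    return growth_rate(timestamps, now) >= min_growth
```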
…and more features
  • Event extraction
    • Language-independent event grammars are used to parse clusters, with language-dependent resources filling the grammar slots
    • Currently for 5 languages (en, fr, it, pt, ru): violent events, humanitarian events
Development time line

[Timeline figure, axis 2002 / 2004 / 2006 / 2009. Labels: EMM/RNS; Domain specific application; MedISys; Continuous development; New features; NewsExplorer; First version 2005; EMM System redesign; Redesign based on EMM; RNS redesign]

MedISys System Overview

MedISys Newsbrief

NewsDesk Service (a.k.a. RNS)

Editorial Interface

EMM Open Source Monitoring Engine

Problems to solve
  • Find relevant information
    • Millions of new articles/blogs/items/tweets published on Internet each day
  • Deliver the information to the right user
    • Allow for many (possibly overlapping) categories to meet specific needs
  • Timely
    • Right now if possible
  • In short: Deliver targeted information to the right user in a timely manner
Approach
  • Wide coverage
    • Many sources
      • Local, Regional, National and International coverage
    • Many languages
      • Multilinguality & cross-lingual information access
  • Fast coverage
    • High frequency monitoring of sites, some sites every 5 minutes
  • Overcome the information overflow
    • Categorization, aggregation, duplicate identification, clustering
    • Customisability of MedISys NewsBrief
    • Search functions
    • RNS tool for manual moderation and targeted dissemination
Input data

~ 2200 Sources (world-wide, but primary focus on Europe)

~ 4,000 HTML web pages+RSS feeds

~ 100 specialist medical sites

~ 20 commercial newswires

Specialist pay-for sources (LexisMed)

24/7, near continuous monitoring

~80,000 new articles/items per day

Converts dirty html with adverts, menus, html tags, ‘related stories’, etc. into clean and standardised Unicode-encoded RSS format

Use RSS when available

Perform full content analysis
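
The HTML clean-up step above can be illustrated with a small sketch using only the Python standard library. It covers only the stripping of markup and non-content elements, not the conversion to RSS; the set of skipped tags is an assumption, and real pages would need far more robust boilerplate removal.

```python
from html.parser import HTMLParser

# Assumed set of elements that carry no article text.
SKIP = {"script", "style", "nav", "aside", "footer"}


class TextExtractor(HTMLParser):
    """Collect text that is not nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside skipped elements
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.parts.append(data.strip())


def clean_html(html):
    """Return the visible article text as a single space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```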

MedISys – Current subscribers and users include …

Supranational organisations

Directorate General Health and Consumer Protection (SANCO)

European Centre for Disease Prevention and Control, Stockholm (ECDC)

European Food Safety Authority (EFSA)

World Health Organisation (WHO)

National Public Health organisations

Swiss Federal Office of Public Health

Icelandic Ministry of Health

Spanish Ministry of Sanitation & Ministry of Health and Consumer Protection

Institut de Veille Sanitaire (France)

Global Public Health Intelligence Network (Canada)

Danish Emergency Management Agency

Italian Ministry of Health and Ministry of Defence

Dutch Institute of Public Health & Food and Consumer Product Safety Authority

The (general?) public

Currently ~ 1000 visitors, ~ 37000 hits per day on public system

Locations mentioned in MedISys medical articles across languages

English - French

Spanish - Portuguese

Importance of multilingual information gathering

Italian - German

Multilingual and cross lingual analysis (1)

Influenza-A-Virus

influenzavirus tipo A

swine-origin influenza

sjevernoameričk gripe

pandemia influenzale

mexicaanse griep

мексиканск грипп

североамериканск грипп

pandemija svinjske

sjevernoameričke gripe

grippe nouvelle

gripă porcină

svinjski grip

sikainfluenssa

svininfluensa

Schweineinfluenza

Porzine Influenza

Schweinegrippe

influenza porcina

prasečí chřipka

Barack Obama (eu,yo)

Barak Obama (az,wo)

Барак Обама (ba,uk)

باراك أوباما (ar)

باراك اوباما (ar,fa)

Барак Хуссейн Обама (ru)

Baraque Obama (pt)

バラク・オバマ (ja)

บารัค โอบามา (th)

Բարաք Օբամա (hy)

ބަރަކް އޮބާމާ (dv)

באראק אבאמא (yi)

ברק אובאמה (he)

贝拉克·奥巴马 (zh)

ބަރާކް އޮބާމާ (dv)

بارک اوبامہ (ur)

  • Data processing layer:
    • Detect ‘known entities’ across languages using large multilingual set of name variants (updated daily)
    • Geo-locate the articles using large multilingual geo-database
    • Apply content based categorization using multilingual category definitions
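
The first step of the data processing layer, detecting known entities through a list of name variants, might look like this in its simplest form. The variant table holds a few spellings from the slide above; the plain substring match is an assumption made to keep the sketch short, and a production system would need proper tokenisation and a much larger, daily-updated list.

```python
# A few name variants taken from the slide; the canonical key is an assumption.
VARIANTS = {
    "Barack Obama": [
        "Barack Obama",
        "Barak Obama",
        "Барак Обама",
        "Baraque Obama",
        "バラク・オバマ",
    ],
}

# Map each case-folded variant to its canonical entity name.
LOOKUP = {
    variant.casefold(): canonical
    for canonical, variants in VARIANTS.items()
    for variant in variants
}


def find_entities(text):
    """Return the sorted canonical names whose variants occur in the text."""
    folded = text.casefold()
    return sorted({name for variant, name in LOOKUP.items() if variant in folded})
```

Because every variant maps to the same canonical name, articles in different languages and scripts end up tagged with one entity.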
Multilingual and cross lingual analysis (2)
  • Data presentation layer:
    • ‘Convenience’ links to external Machine Translation programs, where available.
    • Display of other MedISys categories, of persons and organisations found in text.
    • Display on-line English translation of Chinese and Arabic
Aggregation of multilingual information

Documents from all languages get classified according to the same countries and categories.

An increase in the number of media reports on any country-category combination is detected, independently of the reporting language.

Graphs and alerts may show events not yet reported in your own language.
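
The language-independent aggregation described above amounts to counting articles per country-category pair while ignoring the language field. A minimal sketch, assuming each article is a dictionary with hypothetical "lang", "countries" and "categories" keys:

```python
from collections import Counter


def aggregate(articles):
    """Count articles per (country, category) pair, ignoring language."""
    counts = Counter()
    for article in articles:
        for country in article["countries"]:
            for category in article["categories"]:
                counts[(country, category)] += 1
    return counts
```

For example, one English and one Spanish article that both mention Mexico and are both classified as influenza contribute to the same counter, so a rise is visible even to a user who reads neither language.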

Detection using statistics
  • Detect abnormal flux of reporting for a particular country/category combination
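
The slides do not say which statistic is used to decide that a flux is abnormal. As one plausible illustration, the sketch below compares today's article count for a country/category combination with the mean and standard deviation of recent daily counts; the z-score test and the threshold of 3 are assumptions.

```python
from statistics import mean, stdev


def is_abnormal(history, today, threshold=3.0):
    """Flag today's count if it lies far above the recent daily average.

    history: recent daily article counts for one country/category pair.
    """
    if len(history) < 2:
        return False          # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu     # flat history: any increase stands out
    return (today - mu) / sigma > threshold
```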
PULS Event detection

Results from Helsinki University

Category definitions – Example: haemorrhagic fever
  • Terms (single or multi-word)
    • Cumulative weights with threshold
  • Case forcing
    • Upper case characters in pattern only match uppercase in text (useful for acronyms etc.)
  • Wild cards
    • Single letters (_)
    • Zero, one or more letters (%)
    • Adjacent words (+)
  • Boolean combinations of term lists
    • And, or, not
    • Using proximity operator (within X words)
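
The pattern rules listed above can be sketched as a translation into regular expressions. This is an illustration under assumptions, not the MedISys matcher: the example terms and weights are invented, each term is counted at most once per text, "letters" are approximated by regex word characters, and the Boolean and proximity operators are left out.

```python
import re


def compile_term(pattern):
    """Translate one term pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(r"\w")             # exactly one character
        elif ch == "%":
            parts.append(r"\w*")            # zero, one or more characters
        elif ch == "+":
            parts.append(r"\s+")            # adjacent words
        elif ch.isupper():
            parts.append(re.escape(ch))     # case forcing: upper matches upper only
        elif ch.isalpha():
            parts.append("[" + ch + ch.upper() + "]")  # lower matches either case
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b")


def score(text, weighted_terms):
    """Add up the weights of all terms that occur in the text."""
    return sum(w for pat, w in weighted_terms if compile_term(pat).search(text))


def in_category(text, weighted_terms, threshold):
    """A text belongs to the category once the cumulative weight reaches the threshold."""
    return score(text, weighted_terms) >= threshold


# Hypothetical definition for the haemorrhagic fever category.
TERMS = [("haemorrhagic+fever", 30), ("ebola%", 20), ("CCHF", 25)]
```

With these invented weights and a threshold of 40, a text mentioning both "Ebola" and "haemorrhagic fever" qualifies, while a lone lower-case "cchf" does not match at all because of case forcing.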
Customisability of MedISys

Add more news sources or new categories, e.g.

Events: Cricket World Cup, Rugby World Cup, UEFA Euro 2008

New diseases

Other classes, e.g. deliberate release of chemicals

(on request of recognised users/partners)

Output formats: web pages, email alerts, or RSS feed to integrate into your environment.

Email alerts:

daily vs. breaking news only

for daily notification: specify hour

for breaking news: level-dependent

User-selected languages only

Rapid News Service - RNS (restricted to subscribed users)

Allows MedISys users to further customise their view of the news

Selection of specific languages and feeds

Allows human moderation

Manual selection of news items

Drag and drop compilation of newsletters

Allows moderators to forward news items to user groups

Via SMS alerts, emails or newsletters

Allows user management

Shows overview of relative activity of each category over time

RNS moderation: Editing interface for newsletter

Manual selection of news items, drag and drop compilation of newsletters.

RNS moderation: Alert overview page

Time line shows overview of relative activity of each category over time.

MedISys - Summary

High coverage: helps monitor a large number of multilingual media reports.

Includes tools to help beat the information overflow:

via clustering, duplicate detection;

categorization; information aggregation; visualisation; mapping

further means are being implemented, e.g. multilingual medical event extraction

Special features of MedISys:

Fully automatic (moderation possible)

Real time (10-minute updates), 24/7

High multilinguality (43 languages)

Multilingual information aggregation

Part of EMM family of applications, active team: much new functionality to come.
