
Semi-Automated Extension of a Specialized Medical Lexicon for French





Bruno Cartoni & Pierre Zweigenbaum

LIMSI-CNRS, France

Semi-Automated Extension of a Specialized Medical Lexicon for French


Outline

  • Context: UMLF for French

    • The desired coverage

    • The target lexical information

    • The organisation of a specialised lexicon

  • Acquiring lexical information

    • Initial coverage

    • Obtaining lexical entries from general lexicon

    • Guessing technique

  • Results

    • Consensus guessing

    • Acquisition of the full paradigm

    • General improvement

  • Conclusion and further work


Context : the InterSTIS project

InterSTIS: development of a terminology server for French medical terminologies

Sub-project: improving the lexical coverage of a French medical lexicon (UMLF: Unified Medical Lexicon for French)

Use: support the indexing of medical texts

Issues:

  • What is the desired lexical knowledge?

  • How to acquire it?


The desired coverage

Reference: “Term-Union”

Union of 10 terminologies (CIM-10, SNOMED, MeSH, CISMeF, …) of French medical domains, organised around concept identifiers (CUI) of the UMLS

311,518 terms

203,300 unique concepts (CUI)

94,964 word-forms


Term-Union: example

C0000936 MSHFRE … Accommodation de l'oeil

C0000936 MSHFRE … Accommodation des yeux

C0000936 MSHFRE … Accommodation oculaire

C0000936 SNMIGIPFRE … accommodation visuelle

...

C00001558 MSHF … Voie cutanée

C00001558 MSHF … Voie intradermique

C00001558 MSHF … Voie percutanée

C00001558 MSHF … Voie transcutanée

Observation: the terms attached to the same concept identifier vary in form
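
A minimal sketch of how such variation could be observed: group the terms by concept identifier and collect the distinct word forms they contain. The tuple layout, the function names and the tokenisation (lower-casing, splitting on spaces and apostrophes) are assumptions made for illustration; the slides do not specify how the 94,964 word forms were extracted.

```python
from collections import defaultdict

# Hypothetical pre-parsed rows: (concept identifier, source, term).
# The strings are copied from the example above; the layout is an assumption.
ROWS = [
    ("C0000936", "MSHFRE", "Accommodation des yeux"),
    ("C0000936", "MSHFRE", "Accommodation oculaire"),
    ("C0000936", "SNMIGIPFRE", "accommodation visuelle"),
    ("C00001558", "MSHF", "Voie cutanée"),
    ("C00001558", "MSHF", "Voie percutanée"),
]


def group_terms_by_concept(rows):
    """Map each concept identifier to the list of terms attached to it."""
    groups = defaultdict(list)
    for cui, _source, term in rows:
        groups[cui].append(term)
    return dict(groups)


def word_forms(rows):
    """Return the distinct lower-cased word forms used in the terms.

    Assumed tokenisation: split on whitespace and apostrophes.
    """
    forms = set()
    for _cui, _source, term in rows:
        forms.update(term.lower().replace("'", " ").split())
    return forms


if __name__ == "__main__":
    for cui, terms in group_terms_by_concept(ROWS).items():
        print(cui, terms)
    print(sorted(word_forms(ROWS)))
```

Grouping by identifier puts the variants of one concept side by side, which is where graphemic, morphosyntactic and morphosemantic differences become visible.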


Target lexical information

Term variation within Term-Union:

  • Graphemic: équilibre acido-basique – équilibre acidobasique [EN: acid-base balance]

  • Morphosyntactic: adaptation de l'oeil – adaptation des yeux [EN: eye adaptation]

  • Morphosemantic: intoxication à l’alcool – intoxication alcoolique [EN: alcohol intoxication]

  • Others ...


Organisation of the specialised lexicon

3 types of relational tables for the 3 levels of representation (graphemic, inflection, derivation)

A full-entry lexicon (LMF compliant) that gathers all lexical information

Graphemic variants:

inter-maxillaire | intermaxillaire

insulino-sécrétantes | insulinosécrétantes

scléro-cornéenne | sclérocornéenne

...

Derivation (derived word | base):

abdominal | abdomen

aplasique | aplasie

arachnoïdien | arachnoïde

argentique | argent

Inflection (form | lemma | tag):

sérofibrineux | sérofibrineux | Afpms

sérofibrineuse | sérofibrineux | Afpfs

sérofibrineux | sérofibrineux | Afpmp

sérofibrineuses | sérofibrineux | Afpfp
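
The three tables and the full-entry view can be pictured with a small sketch. The rows are copied from the examples above; the variable names, the `describe` function and the returned dictionary are hypothetical, and the result is a plain Python structure, not an LMF serialisation.

```python
# Rows copied from the slide; the in-memory layout is an assumption.
GRAPHEMIC = [
    ("inter-maxillaire", "intermaxillaire"),
    ("scléro-cornéenne", "sclérocornéenne"),
]
DERIVATION = [  # (derived word, base)
    ("abdominal", "abdomen"),
    ("aplasique", "aplasie"),
]
INFLECTION = [  # (form, lemma, tag)
    ("sérofibrineux", "sérofibrineux", "Afpms"),
    ("sérofibrineuse", "sérofibrineux", "Afpfs"),
    ("sérofibrineux", "sérofibrineux", "Afpmp"),
    ("sérofibrineuses", "sérofibrineux", "Afpfp"),
]


def describe(form):
    """Gather what the three tables say about one word form."""
    analyses = [(lemma, tag) for f, lemma, tag in INFLECTION if f == form]
    # Graphemic pairs are treated as undirected: look the form up on both sides.
    variants = sorted(
        {b for a, b in GRAPHEMIC if a == form}
        | {a for a, b in GRAPHEMIC if b == form}
    )
    # Fall back on the form itself when no lemma is known for it.
    lemmas = {lemma for lemma, _tag in analyses} or {form}
    bases = sorted({base for derived, base in DERIVATION if derived in lemmas})
    return {
        "form": form,
        "analyses": analyses,
        "graphemic_variants": variants,
        "derivation_bases": bases,
    }


if __name__ == "__main__":
    for form in ("sérofibrineux", "intermaxillaire", "abdominal"):
        print(describe(form))
```

Keeping the three levels in separate tables lets each one be extended independently, while a lookup such as `describe` assembles them into one entry on demand.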




Acquiring the lexical information

Initial coverage of UMLF (from the previous UMLF project, based on Baud et al. 1998):

  • 17,192 lexical units

  • 5,353 adjectives

  • 11,799 nouns

  • 36,211 word forms


Acquiring the lexical information

  • From a general lexicon: the existing French general lexicon Morphalou

  • With a guessing technique


Acquiring the lexical information

  • With a guessing technique (Tanguy & Hathout 2007)

  • 3 steps:

    • Learning phase: computing the most frequent tag for each ending string in 2 existing lexicons

    • Guessing phase: assigning possible tag(s) to each unknown form

    • Cross-validation: comparing the 2 guesses, one based on each of the 2 lexicons
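
The three steps can be sketched as follows. This is an illustrative toy, not the Tanguy & Hathout implementation: the maximum ending length, the longest-ending-first lookup, the single-tag output, the coarse tags "A" and "N", and both toy lexicons are assumptions.

```python
from collections import Counter, defaultdict


def learn_endings(lexicon, max_len=5):
    """Learning phase: map each word ending (1..max_len characters)
    to the tag most frequently seen with it in the lexicon."""
    counts = defaultdict(Counter)
    for word, tag in lexicon:
        for n in range(1, min(max_len, len(word)) + 1):
            counts[word[-n:]][tag] += 1
    return {ending: c.most_common(1)[0][0] for ending, c in counts.items()}


def guess(word, endings, max_len=5):
    """Guessing phase: return the tag of the longest known ending, or None."""
    for n in range(min(max_len, len(word)), 0, -1):
        tag = endings.get(word[-n:])
        if tag is not None:
            return tag
    return None


def consensus(words, endings_a, endings_b):
    """Cross-validation: keep only the words for which both guessers agree."""
    kept = {}
    for word in words:
        tag_a, tag_b = guess(word, endings_a), guess(word, endings_b)
        if tag_a is not None and tag_a == tag_b:
            kept[word] = tag_a
    return kept


# Toy lexicons (hypothetical): "A" = adjective, "N" = noun.
LEXICON_A = [("abdominal", "A"), ("aplasique", "A"), ("argentique", "A"),
             ("aplasie", "N"), ("abdomen", "N")]
LEXICON_B = [("arachnoïdien", "A"), ("sérofibrineux", "A"), ("argentique", "A"),
             ("argent", "N"), ("arachnoïde", "N")]

if __name__ == "__main__":
    endings_a = learn_endings(LEXICON_A)
    endings_b = learn_endings(LEXICON_B)
    print(consensus(["thoracique", "iode"], endings_a, endings_b))
```

Requiring agreement between two independently trained guessers trades recall for precision: a form on which the two disagree is left for another acquisition route rather than entered with a doubtful tag.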


Acquiring the lexical information

  • Acquiring the full paradigm

    • All the inflectional forms

    • Lemma

  • Based on “productive” inflectional paradigms

    • 9 for adjectives

    • 3 for nouns

  • Algorithm based on lexical tries to cluster forms of the same paradigm
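
One way to picture trie-based clustering, as a sketch under stated assumptions: word forms are stored in a character trie, and at every node (a candidate stem) the code checks which endings of a paradigm lead to an attested form. The single paradigm below is inferred from the sérofibrineux rows shown earlier; the data layout, the function names and the two-form threshold are hypothetical, and the slides do not detail the actual algorithm.

```python
END = "$"  # marker key meaning "a word form ends at this node"

# Hypothetical description of one adjective paradigm: ending -> tag(s).
EUX_PARADIGM = {
    "lemma_ending": "eux",
    "endings": {
        "eux": ("Afpms", "Afpmp"),
        "euse": ("Afpfs",),
        "euses": ("Afpfp",),
    },
}


def build_trie(words):
    """Store the word forms in a nested-dict character trie."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def completes_word(node, ending):
    """True if following `ending` from `node` reaches the end of a stored form."""
    for ch in ending:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


def cluster_paradigm(words, paradigm, min_forms=2):
    """Group the forms that share a stem and fit the paradigm's endings."""
    clusters = []

    def walk(node, stem):
        forms = {
            stem + ending: tags
            for ending, tags in paradigm["endings"].items()
            if completes_word(node, ending)
        }
        if len(forms) >= min_forms:
            clusters.append({
                "lemma": stem + paradigm["lemma_ending"],
                "forms": forms,
                "complete": len(forms) == len(paradigm["endings"]),
            })
        for ch, child in node.items():
            if ch != END:
                walk(child, stem + ch)

    walk(build_trie(words), "")
    return clusters


if __name__ == "__main__":
    forms = ["sérofibrineux", "sérofibrineuse", "sérofibrineuses", "abdomen"]
    print(cluster_paradigm(forms, EUX_PARADIGM))
```

The `complete` flag separates clusters in which every ending of the paradigm is attested from partial ones.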




Acquisition from general lexicon: results

Term-Union: 94,964 word forms to describe

Source | Known word entries | Remaining words to describe

Initial UMLF | 19,599 | 81,595

Morphalou | 6,617 | 74,978


Acquisition with guessing techniques: results

74,978 unknown forms

44,515 analyses from Morphalou-based program

35,438 analyses from UMLF-based program

Cross-validation: 30,137 in common


Acquisition with guessing techniques: evaluation

Error type | Count

Wrong label | 12

Proper names | 49

Latin words | 5

English words | 1

Spelling/segmentation | 10

Other | 5

Total | 82

  • Errors: 82 out of 1,000 (8.2%)


Acquisition of the full paradigm: Results

4,453 paradigms captured (incomplete or not, grouping 9,352 word forms)

3,308 adjectives

514 nouns

Automatic extension for the full paradigms (with canonical forms only)

Manually checked for the others


General improvement

Source | Forms added | Still unknown in Term-Union | Coverage

UMLF-v1 | 36,211 | 81,595 | 14.1%

Morphalou | 17,828 | 74,978 | 21.0%

Acquisition | 8,088 | 70,602 | 25.7%
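
The coverage figures appear to be the share of the 94,964 Term-Union word forms that are no longer unknown after each step. The slides do not state this definition; it is an inference, checked below against the three reported percentages.

```python
TOTAL_WORD_FORMS = 94_964  # word forms in Term-Union

# "Still unknown in Term-Union" after each acquisition step.
STILL_UNKNOWN = {
    "UMLF-v1": 81_595,
    "Morphalou": 74_978,
    "Acquisition": 70_602,
}


def coverage(still_unknown, total=TOTAL_WORD_FORMS):
    """Percentage of Term-Union word forms described by the lexicon
    (assumed definition: known forms divided by all forms)."""
    return 100.0 * (total - still_unknown) / total


if __name__ == "__main__":
    for source, unknown in STILL_UNKNOWN.items():
        print(f"{source}: {coverage(unknown):.1f}%")
```

Under this reading, "Forms added" and coverage measure different things: added forms count everything entered in the lexicon, while coverage counts only the forms that occur in Term-Union.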




Discussion and conclusion

The acquisition and evaluation of specialised lexical resources require a specific reference: Term-Union

Extract (full) lexical information

Assess lexical needs and target

Other acquisition techniques (CRF for inflectional information, rule-based techniques for derivational information)


Acknowledgment

  • This work was partially funded by project InterSTIS (ANR-07-TECSAN-010)

  • InterSTIS project: www.interstis.org

