
Juliana Duque, Pablo Matos, Cristina Ciferri, Thiago Pardo and Ricardo Ciferri

A Process Based on Paragraph for Treatment Extraction in Scientific Papers of the Biomedical Domain. Juliana Duque, Pablo Matos, Cristina Ciferri, Thiago Pardo and Ricardo Ciferri presented by Juliana Duque. UFSCar Database Group and USP Natural Language Processing Group São Carlos, BR.


Presentation Transcript


  1. A Process Based on Paragraph for Treatment Extraction in Scientific Papers of the Biomedical Domain • Juliana Duque, Pablo Matos, Cristina Ciferri, Thiago Pardo and Ricardo Ciferri • presented by Juliana Duque • UFSCar Database Group and USP Natural Language Processing Group São Carlos, BR

  2. Context and Motivation A Process for Treatment Extraction • Many electronic documents report experiments, describing: • the treatment adopted • patients with some kind of disease • the number of patients enrolled in the treatment • symptoms and risk factors • positive and negative effects • Nowadays, researchers and doctors cannot process this huge number of documents

  3. Context and Motivation • These documents are in unstructured format, i.e., plain text, especially PDF. • These data must be transformed from unstructured to structured format before they can be submitted to an automatic knowledge discovery process

  4. Goal • Identify and extract treatments • Drugs, therapies and procedures • Process by paragraph • Empirical analysis of papers on Sickle Cell Anemia: treatments mainly occur in sentences describing complications or in nearby sentences of the same paragraph • Approaches for extracting information • Machine Learning • Rules • Dictionary
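
The paragraph-based idea on this slide can be sketched as a filter that, inside each paragraph, keeps the sentences labelled as complications plus their immediate neighbours. This is a minimal sketch under stated assumptions: the complication flags are taken as already computed (for instance by the complication classifier C1), and "very near" is read as a window of one sentence on each side, since the slides do not give the actual distance. `candidate_sentences` and `window` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# One paragraph = ordered (sentence, is_complication) pairs. The flag is assumed
# to come from an upstream complication classifier (e.g. C1 on the slides).
Paragraph = Sequence[Tuple[str, bool]]


def candidate_sentences(paragraph: Paragraph, window: int = 1) -> List[str]:
    """Keep complication sentences plus their neighbours within `window`
    positions in the same paragraph (the window size is an assumption)."""
    keep = set()
    for i, (_, is_complication) in enumerate(paragraph):
        if is_complication:
            lo = max(0, i - window)
            hi = min(len(paragraph), i + window + 1)
            keep.update(range(lo, hi))
    # Preserve the original sentence order.
    return [paragraph[i][0] for i in sorted(keep)]


if __name__ == "__main__":
    paragraph = [
        ("S0", False),
        ("S1", False),
        ("S2 (complication)", True),
        ("S3", False),
        ("S4", False),
    ]
    print(candidate_sentences(paragraph))  # ['S1', 'S2 (complication)', 'S3']
```

Restricting the search to these sentences keeps the later extraction steps (rules, dictionary) focused on the part of each paragraph where treatments are expected.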

  5. Contributions • Theoretical: • Domain Knowledge • Methodology of Information Extraction • Practical: • Resources: collection of documents, dictionary and rules • Tools: Information Extraction

  6. Extraction Process for Treatment • Final goal: data mining!

  7. Sentence Classification • Input sentences: "This result is both clinically meaningful and statistically significant." "Hydroxyurea (HU) is considered to be the most successful drug therapy for severe sickle cell disease (SCD)." "The HU dose was given orally once a day, initially at 20 mg/kg." • ML Algorithm • Treatment: "Hydroxyurea (HU) is …", "The HU dose was …" • Others: "This result is both …"
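
The classification step above assigns each sentence to Treatment or Others. The slides name Naïve Bayes as one of the algorithms but not the variant or toolkit, so the sketch below uses a Bernoulli (present/absent) Naive Bayes, which fits the binary attributes described on the next slide. It is a minimal illustration, not the authors' classifier: it uses unigrams only, Laplace smoothing with a hypothetical `alpha`, and the three example sentences from the slide as training data with the labels shown in the figure.

```python
import math
import re
from typing import Dict, List, Sequence, Set, Tuple

# Training sentences and labels taken from the slide's figure.
TRAIN_SENTENCES = [
    "Hydroxyurea (HU) is considered to be the most successful drug therapy "
    "for severe sickle cell disease (SCD).",
    "The HU dose was given orally once a day, initially at 20 mg/kg.",
    "This result is both clinically meaningful and statistically significant.",
]
TRAIN_LABELS = ["Treatment", "Treatment", "Others"]


def features(sentence: str) -> Set[str]:
    """Binary unigram features: the set of lowercased words in the sentence."""
    return set(re.findall(r"[\w\-]+", sentence.lower()))


class BernoulliNaiveBayes:
    """Naive Bayes over present/absent word features with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, sentences: Sequence[str], labels: Sequence[str]) -> "BernoulliNaiveBayes":
        docs = [features(s) for s in sentences]
        self.vocab: List[str] = sorted(set().union(*docs))
        self.classes: List[str] = sorted(set(labels))
        self.log_prior: Dict[str, float] = {}
        self.log_present: Dict[Tuple[str, str], float] = {}
        self.log_absent: Dict[Tuple[str, str], float] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            for w in self.vocab:
                # Smoothed probability that word w is present in a class-c sentence.
                k = sum(1 for d in class_docs if w in d)
                p = (k + self.alpha) / (len(class_docs) + 2 * self.alpha)
                self.log_present[c, w] = math.log(p)
                self.log_absent[c, w] = math.log(1.0 - p)
        return self

    def predict(self, sentence: str) -> str:
        doc = features(sentence)  # words unseen in training are ignored

        def log_score(c: str) -> float:
            # Every vocabulary word contributes, whether present or absent.
            return self.log_prior[c] + sum(
                self.log_present[c, w] if w in doc else self.log_absent[c, w]
                for w in self.vocab
            )

        return max(self.classes, key=log_score)


if __name__ == "__main__":
    model = BernoulliNaiveBayes().fit(TRAIN_SENTENCES, TRAIN_LABELS)
    print(model.predict(TRAIN_SENTENCES[1]))  # Treatment
```

With only three training sentences the model merely illustrates the mechanics; predictions on new sentences are not meaningful until it is trained on an annotated collection.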

  8. Sentence Classification Process: training and testing phase (1/2) • Bag-of-words model • AVM configuration: • Minimum frequency = 2 • Attribute selection: • 1 if the n-gram occurs in the sentence (present) • 0 otherwise (absent) • Attributes: 1- to 3-grams • Not applied: stopword removal and stemming • Partitioning method: 10-fold cross-validation • Parentheses, brackets and apostrophes removed
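
The configuration above can be sketched as a binary bag-of-words over 1- to 3-grams. This is a minimal sketch under assumptions not stated on the slide: text is lowercased, tokens are runs of word characters and hyphens, and "minimum frequency = 2" is read as the total count of an n-gram over the whole collection. `tokenize`, `ngrams`, `build_vocabulary` and `binary_vectors` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence

# Parentheses, square brackets and apostrophes are deleted, as on the slide.
_REMOVED = re.compile(r"[()\[\]']")


def tokenize(sentence: str) -> List[str]:
    """Lowercase word tokens; no stopword removal and no stemming.
    Lowercasing and this token pattern are assumptions."""
    return re.findall(r"[\w\-]+", _REMOVED.sub("", sentence).lower())


def ngrams(tokens: Sequence[str], n_max: int = 3) -> List[str]:
    """All contiguous 1- to n_max-grams, joined with single spaces."""
    return [
        " ".join(tokens[i : i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(sentences: Sequence[str], min_freq: int = 2, n_max: int = 3) -> List[str]:
    """Keep n-grams whose total count over the collection is >= min_freq."""
    counts = Counter(g for s in sentences for g in ngrams(tokenize(s), n_max))
    return sorted(g for g, c in counts.items() if c >= min_freq)


def binary_vectors(sentences: Sequence[str], vocab: Sequence[str], n_max: int = 3) -> List[List[int]]:
    """One row per sentence: 1 if the n-gram is present, 0 if absent."""
    index: Dict[str, int] = {g: j for j, g in enumerate(vocab)}
    rows = []
    for s in sentences:
        row = [0] * len(vocab)
        for g in ngrams(tokenize(s), n_max):
            j = index.get(g)
            if j is not None:
                row[j] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    sentences = [
        "Hydroxyurea (HU) is considered to be the most successful drug therapy "
        "for severe sickle cell disease (SCD).",
        "The HU dose was given orally once a day, initially at 20 mg/kg.",
    ]
    print(build_vocabulary(sentences))  # ['hu', 'the']
```

The rows produced by `binary_vectors` form the attribute-value matrix that the classifiers are trained on; the 10-fold cross-validation split would be applied to these rows afterwards.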

  9. Sentence Classification Process: training and testing phase (2/2) • Pre-processing filters: • 1) No filter • 2) Randomize • 3) Remove Misclassified (noise removal) • 4) Resample (class balancing) • Algorithms: Support Vector Machine (SVM) and Naïve Bayes • Best result: SVM + Remove Misclassified + Resample • C1: 95.01% accuracy • C2: 96.62% accuracy
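
The filter names above match instance filters in the Weka toolkit, although the slides do not say which tool was used. The sketch below shows only the idea behind class balancing by resampling: draw examples with replacement so that every class ends up with the same number of instances while the total size stays about the same. It is not a reimplementation of any particular library filter; `resample_uniform` and its `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resample_uniform(
    examples: Sequence[T], labels: Sequence[str], seed: int = 1
) -> Tuple[List[T], List[str]]:
    """Sample with replacement so every class gets the same number of examples.

    Each class receives len(examples) // number_of_classes examples, so
    minority classes are oversampled and majority classes undersampled.
    """
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    if not examples:
        return [], []
    by_class: Dict[str, List[T]] = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append(x)
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    per_class = len(examples) // len(by_class)
    out_x: List[T] = []
    out_y: List[str] = []
    for y in sorted(by_class):
        for _ in range(per_class):
            out_x.append(rng.choice(by_class[y]))
            out_y.append(y)
    # Shuffle so the classes are not grouped together.
    order = list(range(len(out_x)))
    rng.shuffle(order)
    return [out_x[i] for i in order], [out_y[i] for i in order]
```

For example, 8 "Others" sentences and 2 "Treatment" sentences become 5 of each, with the Treatment sentences repeated. Balancing of this kind is normally applied to training data only, so that evaluation still reflects the real class distribution.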

  10. Results - Automatic Classification • SVM Algorithm

  11. Rules • Sentence of Treatment • Without POS: Fourteen patients had brain MRI and MRA evaluation after 4 years of hydroxyurea therapy. • With POS: Fourteen_CD patients_NNS had_VBD brain_NN MRI_NNP and_CC MRA_NNP evaluation_NN after_IN 4_CD years_NNS of_IN hydroxyurea_NN therapy_NN ._. • Rule – representative word + POS: [\w\-]*_IN (?:[\w\-/\\]* )?([\w\-]*_NN|[\w\-]*_NNP|[\w\-]*_NNS) (?:treatment_NN|therapy_NN)
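
The rule above can be applied with an ordinary regular-expression engine to the word_TAG representation of a sentence. This sketch assumes the sentence has already been tagged (the slides do not name the tagger; the tags are Penn Treebank style) and hard-codes the tagged sentence from the slide. Inside the optional-word character class the hyphen is written escaped, `[\w\-/\\]`, because Python's `re` module rejects the unescaped range `\w-/`. `extract_treatments` and `strip_tag` are hypothetical helpers.

```python
import re
from typing import List

# Slide rule "representative word + POS": a preposition, an optional word,
# a captured noun, then the cue word "treatment" or "therapy".
TREATMENT_RULE = re.compile(
    r"[\w\-]*_IN (?:[\w\-/\\]* )?([\w\-]*_NN|[\w\-]*_NNP|[\w\-]*_NNS) "
    r"(?:treatment_NN|therapy_NN)"
)

# POS-tagged sentence from the slide, in word_TAG format.
TAGGED = (
    "Fourteen_CD patients_NNS had_VBD brain_NN MRI_NNP and_CC MRA_NNP "
    "evaluation_NN after_IN 4_CD years_NNS of_IN hydroxyurea_NN therapy_NN ._."
)


def strip_tag(token: str) -> str:
    """Drop the POS tag: 'hydroxyurea_NN' -> 'hydroxyurea'."""
    return token.rsplit("_", 1)[0]


def extract_treatments(tagged: str) -> List[str]:
    """Return the noun captured by the rule for every match in the sentence."""
    return [strip_tag(m.group(1)) for m in TREATMENT_RULE.finditer(tagged)]


if __name__ == "__main__":
    print(extract_treatments(TAGGED))  # ['hydroxyurea']
```

The cue words "treatment" and "therapy" anchor the match, so the captured noun is the word that names the therapy.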

  12. Rules • Sentence of Treatment • Without POS: All patients were treated with antibiotics and, on average, became afebrile after a mean of two days of hospitalization. • With POS: All_DT patients_NNS were_VBD treated_VBN with_IN antibiotics_NNS and_CC ,_, on_IN average_NN ,_, became_VBD afebrile_JJ after_IN a_DT mean_NN of_IN two_CD days_NNS of_IN hospitalization_NN ._. • Rule – only POS: (?:[\w\-]*_VBD|[\w\-]*_VBN) (?:[\w\-]*_IN )?([\w\-]*_NN|[\w\-]*_NNP|[\w\-]*_NNS)
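
The POS-only rule above looks for a past-tense verb or past participle, an optional preposition, and then a noun. A minimal sketch of applying it, assuming the same word_TAG format and the tagged sentence from the slide; `pos_rule_candidates` is a hypothetical helper.

```python
import re
from typing import List

# Slide rule "only POS": VBD/VBN verb, optional preposition, captured noun.
POS_RULE = re.compile(
    r"(?:[\w\-]*_VBD|[\w\-]*_VBN) (?:[\w\-]*_IN )?([\w\-]*_NN|[\w\-]*_NNP|[\w\-]*_NNS)"
)

TAGGED = (
    "All_DT patients_NNS were_VBD treated_VBN with_IN antibiotics_NNS and_CC ,_, "
    "on_IN average_NN ,_, became_VBD afebrile_JJ after_IN a_DT mean_NN of_IN "
    "two_CD days_NNS of_IN hospitalization_NN ._."
)


def pos_rule_candidates(tagged: str) -> List[str]:
    """Return the candidate treatment word (tag removed) for each match."""
    return [m.group(1).rsplit("_", 1)[0] for m in POS_RULE.finditer(tagged)]


if __name__ == "__main__":
    print(pos_rule_candidates(TAGGED))  # ['antibiotics']
```

Two details of the pattern are worth noting. It has no boundary after the tag, so the `_NN` alternative also matches the start of `antibiotics_NNS` and the raw capture is `antibiotics_NN`; removing the tag hides this. And because the rule accepts any noun after such a verb, its candidates generally need further filtering.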

  13. Dictionary • Example sentences: In the MSH study, 299 adults were randomized to receive HU or placebo for a period of approximately 2 years. These results confirm the benefit of HU, even in very young children, and its possible role in primary stroke prevention. • Term: Hydroxyurea • Variation: HU • Biomedical Database
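
Dictionary lookup with term variations can be sketched as one compiled pattern that maps every surface form back to its canonical term. Only the Hydroxyurea/HU pair comes from the slide; the dictionary structure, case-insensitive whole-word matching, and the helpers `build_matcher` and `find_terms` are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionary entry: canonical term -> surface variations.
# Only the Hydroxyurea / HU pair is taken from the slide.
DICTIONARY: Dict[str, List[str]] = {"Hydroxyurea": ["Hydroxyurea", "HU"]}


def build_matcher(dictionary: Dict[str, List[str]]) -> Tuple[re.Pattern, Dict[str, str]]:
    """Compile one case-insensitive, whole-word pattern over all variations."""
    variant_to_term = {
        variant.lower(): term
        for term, variants in dictionary.items()
        for variant in variants
    }
    if not variant_to_term:
        raise ValueError("dictionary is empty")
    # Longer variants first, so a multi-word entry wins over its own prefix.
    alternation = "|".join(
        re.escape(v) for v in sorted(variant_to_term, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return pattern, variant_to_term


def find_terms(
    sentence: str, pattern: re.Pattern, variant_to_term: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (matched text, canonical term) pairs in order of appearance."""
    return [
        (m.group(0), variant_to_term[m.group(0).lower()])
        for m in pattern.finditer(sentence)
    ]


if __name__ == "__main__":
    pattern, lookup = build_matcher(DICTIONARY)
    sentence = ("In the MSH study, 299 adults were randomized to receive HU or "
                "placebo for a period of approximately 2 years.")
    print(find_terms(sentence, pattern, lookup))  # [('HU', 'Hydroxyurea')]
```

Whole-word matching keeps "HU" from firing inside longer words; ignoring case is a trade-off that also matches "hydroxyurea" in lowercase but could over-match short abbreviations in other collections.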

  14. Conclusions • Classification • 79% accuracy – Classifier C1 (Complication) • 71% accuracy – Classifier C2 (Treatment) • Rules • 45% precision • 70% recall • New experiments: 59% precision and 75% recall • Dictionary • 100% precision for known occurrences of treatments • Variations of terms and synonyms

  15. Future Work • Apply the proposed process to other terms in the context of Sickle Cell Anemia • Investigate the identification of treatment and symptom information in scientific papers on other diseases • Use indexes to speed up the identification of terms • Other biomedical areas may also benefit from our text mining approach

  16. A Process Based on Paragraph for Treatment Extraction in Scientific Papers of the Biomedical Domain Questions? • UFSCar Database Group and USP Natural Language Processing Group São Carlos, BR


  18. Formulas • Precision: TP / (TP + FP) • Recall: TP / (TP + FN) • F-measure: (2 x Prec x Rec) / (Prec + Rec) • Accuracy: (TP + TN) / (TP + TN + FP + FN)
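
The formulas above translate directly into code. As a worked example, the sketch computes the F-measure implied by the precision and recall reported for the rules on the Conclusions slide; these F-measure values are derived here and are not reported in the slides. Returning 0.0 when a denominator is zero is a convention assumed for this sketch.

```python
def _safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 for an empty denominator (assumed convention)."""
    return num / den if den else 0.0


def precision(tp: int, fp: int) -> float:
    return _safe_div(tp, tp + fp)


def recall(tp: int, fn: int) -> float:
    return _safe_div(tp, tp + fn)


def f_measure(prec: float, rec: float) -> float:
    """Harmonic mean of precision and recall."""
    return _safe_div(2 * prec * rec, prec + rec)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return _safe_div(tp + tn, tp + tn + fp + fn)


if __name__ == "__main__":
    # Rule results from the Conclusions slide.
    print(f"{f_measure(0.45, 0.70):.3f}")  # 0.548 (first experiments)
    print(f"{f_measure(0.59, 0.75):.3f}")  # 0.660 (new experiments)
```

For the first rule experiments, 2 x 0.45 x 0.70 / (0.45 + 0.70) = 0.63 / 1.15 ≈ 0.548, so the new experiments raise the F-measure by roughly 11 percentage points.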
