
Text Classification for Healthcare Information Support

Rey-Long Liu (劉瑞瓏), Dept. of Medical Informatics, Tzu Chi University, Taiwan


Presentation Transcript


  1. Text Classification for Healthcare Information Support Rey-Long Liu (劉瑞瓏) Dept. of Medical Informatics Tzu Chi University, Taiwan

  2. Background • Text categorization (TC) as a fundamental component for information processing • Many TC techniques have been developed • Unfortunately, high-quality TC is often an unrealizable ideal • Very high precision • Very high recall

  3. Background (Cont.) • An application scenario: healthcare information support [Figure labels: Consultancy; General Users (e.g. patients); Healthcare Professionals; Inquiry; Classified Inquiry; Classification Confirmation; Query; Classified Query; Relevant Information; Information Gathering Systems; Information Gathered; Classified Information; Classified Information Base; High-Quality TC]

  4. Outline • Interaction as an approach to high-quality TC • Main consideration • Reducing the amount of the interaction • Criteria & straightforward interaction strategies • An intelligent interaction strategy: COM (Content Overlapping Measurement) • Empirical evaluation • Chinese cancer texts classification • Conclusion

  5. Interaction for High-Quality TC • Interaction with the user • Possibly a “final” approach • More application scenarios • Information recommendation & archiving • Definitely relevant vs. potentially relevant • Main consideration • Reducing the number of interactions

  6. Interaction for High-Quality TC (Cont.) • Evaluation criteria • Confirmation Precision (CP) • Related to cognitive load to users • Confirmation Recall (CR) • Related to the quality of TC

  7. Interaction for High-Quality TC (Cont.) • Straightforward interaction strategies • Uniform Confirmation (UC): Preferring CR • (A) Setting two thresholds, the Rejection Threshold (RT) and the Acceptance Threshold (AT), to identify the DOA range for confirmation [Figure: validation documents placed between Min DOA and Max DOA; o: positive validation document; x: negative validation document] • (B) Confirmation strategy: Prob = 1.0 (when RT ≤ DOA(d, c) ≤ AT); Prob = 0 (when DOA(d, c) < RT); Prob = 0 (when DOA(d, c) > AT)
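The Uniform Confirmation rule on this slide can be sketched as a small function. This is only an illustrative reading: the function and parameter names are hypothetical, and treating both bounds of the confirmation range as inclusive is an assumption.

```python
def uniform_confirmation_prob(doa: float, rt: float, at: float) -> float:
    """Probability of asking the user to confirm a decision under UC.

    doa: the classifier's DOA value for document d and category c
    rt:  rejection threshold (assumed to be the lower bound of the range)
    at:  acceptance threshold (assumed to be the upper bound of the range)
    """
    if rt > at:
        raise ValueError("rejection threshold must not exceed acceptance threshold")
    # Every document whose DOA falls inside [rt, at] is sent for confirmation;
    # documents outside the range are never confirmed.
    return 1.0 if rt <= doa <= at else 0.0
```

Because every document in the range is confirmed, widening the range can only add confirmations, which fits the slide's remark that UC prefers CR.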

  8. Interaction for High-Quality TC (Cont.) • Probabilistic Confirmation (PC): Preferring CP • (A) Tuning the classifier’s threshold (T) in the hope of optimizing F1 [Figure: validation documents placed between Min DOA and Max DOA; o: positive validation document; x: negative validation document] • (B) Confirmation strategy: Prob = 1.0 (when DOA(d, c) = threshold); Prob = 0 (when DOA(d, c) = Min); Prob = 0 (when DOA(d, c) = Max)
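The slide fixes the confirmation probability at only three points (1.0 at the threshold, 0 at Min DOA and at Max DOA). A minimal sketch follows, assuming the probability is interpolated linearly between those points; the interpolation and all names are assumptions for illustration, not something the slide states.

```python
import random


def probabilistic_confirmation_prob(doa: float, threshold: float,
                                    min_doa: float, max_doa: float) -> float:
    """Confirmation probability under PC, assuming linear interpolation."""
    if not (min_doa < threshold < max_doa):
        raise ValueError("expected min_doa < threshold < max_doa")
    if doa <= min_doa or doa >= max_doa:
        return 0.0  # the two end points stated on the slide
    if doa <= threshold:
        # Rises from 0 at min_doa to 1 at the threshold (assumed linear).
        return (doa - min_doa) / (threshold - min_doa)
    # Falls from 1 at the threshold to 0 at max_doa (assumed linear).
    return (max_doa - doa) / (max_doa - threshold)


def should_confirm(doa: float, threshold: float, min_doa: float,
                   max_doa: float, rng: random.Random) -> bool:
    """Draw a confirmation decision with the probability computed above."""
    prob = probabilistic_confirmation_prob(doa, threshold, min_doa, max_doa)
    return rng.random() < prob
```

Under this reading, documents near the threshold are almost always confirmed and documents far from it rarely are.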

  9. ICCOM: Interactive Confirmation by COM [Figure labels: Training; Testing; Underlying Classifier; Feature Selection; Classifier Building; Threshold Tuning; Classification; ICCOM; (1) Content Overlap Measurement (COM); (2) Threshold Tuning based on Content Overlapping; (3) Content Overlap Measurement (COM); Training Documents for Classifier Building; Training Documents for Threshold Tuning (validation); Incoming Document; Classified/Filtered Documents; Documents to be Confirmed]

  10. ICCOM: Interactive Confirmation by COM (content overlapping measurement) • Procedure COM(c, d), where • (1) c is a category, • (2) d is a document for thresholding or testing • Return: Degree of content overlap (DCO) between d and c • Begin • (1) DCO = 0; • (2) For each term t that is positively correlated with c but does not appear in d, do • (2.1) DCO = DCO - χ²(t,c); • (3) For each term t that is negatively correlated with c but appears in d, do • (3.1) DCO = DCO - (number of occurrences of t in d) × χ²(t,c); • (4) Return DCO; • End.
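The COM procedure above can be sketched in Python as follows. The data layout is an assumption made for illustration: each term of category c is given as a (χ² score, is-positively-correlated) pair, and the document is a list of tokens; the names `com` and `term_stats` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def com(doc_terms: Iterable[str],
        term_stats: Mapping[str, Tuple[float, bool]]) -> float:
    """Degree of content overlap (DCO) between a document and a category.

    term_stats maps each term to (chi-square score with respect to the
    category, True if the term is positively correlated with the category).
    """
    counts = Counter(doc_terms)
    dco = 0.0
    for term, (chi2, positive) in term_stats.items():
        if positive and counts[term] == 0:
            # Step (2): a positively correlated term is missing from d.
            dco -= chi2
        elif not positive and counts[term] > 0:
            # Step (3): a negatively correlated term appears in d;
            # the penalty is scaled by its number of occurrences.
            dco -= counts[term] * chi2
    return dco
```

DCO is never positive in this procedure: 0 means the document contains every positively correlated term and no negatively correlated one, and the value decreases as the content diverges from the category.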

  11. ICCOM: Interactive Confirmation by COM (content overlapping measurement, cont.)

  12. ICCOM: Interactive Confirmation by COM (content overlapping measurement, cont.) • A term t is “positively correlated” with c if AD > BC; otherwise it is “negatively correlated” • N: total number of documents • A: # documents that are in c and contain t • B: # documents that are not in c but contain t • C: # documents that are in c but do not contain t • D: # documents that are not in c and do not contain t
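The counts N, A, B, C, and D defined here are the ingredients of the χ² statistic commonly used for term selection in text categorization. The sketch below assumes that standard formula, χ²(t,c) = N(AD − BC)² / ((A+C)(B+D)(A+B)(C+D)); the transcript does not preserve the formula itself, so this is an assumption, and the function names are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score of a term with respect to a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d  # total number of documents
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the term or category is uninformative
    return n * (a * d - b * c) ** 2 / denominator


def is_positively_correlated(a: int, b: int, c: int, d: int) -> bool:
    """Correlation sign test from the slide: positive if AD > BC."""
    return a * d > b * c
```

For example, with A = 30, B = 10, C = 20, D = 40, the term is positively correlated (1200 > 200) and the score is 100 · 1000² / 6,000,000 ≈ 16.67.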

  13. ICCOM: Interactive Confirmation by COM (thresholding) [Figure labels: Min DOA; Max DOA; Rejection Threshold (RT); The classifier’s threshold (T); Rejection; Invoking COM to compute DCO; Positive Confirmation Threshold (PCT); Negative Confirmation Threshold (NCT); Acceptance; Confirmation; Rejection; o: positive validation document; x: negative validation document]

  14. ICCOM: Interactive Confirmation by COM (collaboration with the classifier) • Procedure InteractiveHighQualityTC(c, d, T, RT, PCT, NCT), where • (1) c is a category, • (2) d is the document to be processed, • (3) T is the classifier’s threshold for c, • (4) RT is the rejection threshold for c, • (5) PCT is the positive confirmation threshold for c, and • (6) NCT is the negative confirmation threshold for c. • Return: • A decision (acceptance, rejection, or confirmation) for d with respect to c. • Begin • (1) DOAd = Invoke the classifier to compute DOA of d with respect to c; • (2) If (DOAd ≤ RT), Return “rejection”; • (3) Else • (3.1) DCOd = Invoke COM to compute DCO of d with respect to c; • (3.2) If (DOAd ≥ T) • (3.2.1) If (DCOd ≥ PCT), Return “acceptance”; • (3.2.2) Return “confirmation”; • (3.3) Else • (3.3.1) If (DCOd ≤ NCT), Return “rejection”; • (3.3.2) Return “confirmation”; • End.
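The decision procedure above can be sketched in Python as follows. The comparison operators did not survive in the transcript, so the directions used here (DOA ≤ RT rejects, DOA ≥ T takes the acceptance branch, DCO ≥ PCT accepts, DCO ≤ NCT rejects) are assumptions; the function name and the `compute_dco` callback are hypothetical.

```python
from typing import Callable


def interactive_high_quality_tc(doa: float,
                                compute_dco: Callable[[], float],
                                t: float, rt: float,
                                pct: float, nct: float) -> str:
    """Return "acceptance", "rejection", or "confirmation" for a document.

    doa:         the classifier's DOA of the document for the category
    compute_dco: callback that runs COM; only called when it is needed
    t, rt:       the classifier's threshold and the rejection threshold
    pct, nct:    positive and negative confirmation thresholds (on DCO)
    """
    if doa <= rt:
        return "rejection"  # step (2): rejected without invoking COM
    dco = compute_dco()     # step (3.1)
    if doa >= t:
        # The classifier would accept; accept directly only if the
        # content overlap is high enough, otherwise ask the user.
        return "acceptance" if dco >= pct else "confirmation"
    # The classifier would reject; reject directly only if the content
    # overlap is low enough, otherwise ask the user.
    return "rejection" if dco <= nct else "confirmation"
```

Passing COM as a callback mirrors the procedure's ordering, in which DCO is computed only for documents that survive the rejection threshold.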

  15. Empirical Evaluation • Chinese disease (cancer) texts • 16 types of cancer (e.g. liver cancer, lung cancer) top-ranked by the Department of Health in Taiwan • Collected by sending cancer names to “知識+” (Knowledge+) on Yahoo! Taiwan • For each cancer, there are 5 subcategories • Cause, symptom, curing, side-effect, and prevention • Therefore, we have 80 (16*5) categories with 2850 documents • 90% for training; 10% for testing • 2-fold cross validation (classifier building vs. thresholding)

  16. Empirical Evaluation (cont.) Classification of cancer information

  17. Empirical Evaluation (cont.) Classification of 40 symptom descriptions without cancer names • Note: For the 40 test symptom documents, RO+ICCOM conducts 35 and 51 confirmations in the 1st and 2nd folds, respectively.

  18. Conclusion • High-quality TC is essential but often unrealizable • Interactive confirmation may be one final resort • Information recommendation & archiving • Healthcare information support • COM as a classifier-independent strategy for interaction

  19. Thank you!
