Databases for Knowledge Discovery Jan H. van Bemmel Erasmus University Rotterdam


Presentation Transcript


  1. Databases for Knowledge Discovery • Jan H. van Bemmel • Erasmus University Rotterdam

  2. Databases for Knowledge Discovery • Natural sciences • physics, chemistry, engineering • models, experiments, theories ► ‘hard’ data • Humanities • arts, social sciences, economics • behavioural studies, text analysis ► ‘soft’ data • Biomedical and health sciences • biomedicine, health sciences • models, experiments, studies ► hard & soft data

  3. Databases for Knowledge Discovery • Biomedicine & health sciences • Biomedical research related to the 'hard' scientific approach as in physics and engineering • Clinical research using rather 'hard' data, and sometimes ‘soft’ subjective observations • Population-based research • data collected from populations of healthy and ill persons • This research can be subdivided into • retrospective research • prospective research

  4. Databases for Knowledge Discovery • Biomedicine & health sciences • Biomedical research related to the 'hard' scientific approach as in physics and engineering • Clinical research using rather 'hard' data, and sometimes ‘soft’ subjective observations • Population-based research • data collected from populations of healthy and ill persons • This research can be subdivided into • retrospective research • prospective research

  5. Databases for Knowledge Discovery • Biomedicine and Health Sciences • Basic Research: experiments • Clinical Research: patients • Health Research: populations • Discovery of new scientific knowledge from large databases of measurements, observations and interpretations [Diagram: three Regional Databases and one Research Database]

  6. Databases for Knowledge Discovery • Biomedicine & health sciences • Until recently, basic research in biomedicine was done on organs and organisms. • Nowadays the fundamental challenges lie an order of magnitude lower: at the level of molecules and cells. • Research on organs and organisms is still of interest: breakthroughs from biomolecular research have to be translated to higher levels.

  7. Databases for Knowledge Discovery • Biomedical research • Knowledge contained in multiple databases • of refereed articles and • databases on genes and proteins • MedLine: 11 million abstracts; 500,000/year • searching for articles in sphere of interest • how to find new knowledge? • how to cope with serendipity?

  8. Databases for Knowledge Discovery • Biomedical research • Different methods to retrieve knowledge: • simple Boolean expressions • too specific: few references • too broad: avalanche of references • use of a more complex ‘fingerprint’ • combination of different databases • complex retrieval using ontology [Diagram: dbase; forward; inverse]
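
The trade-off between too-specific and too-broad Boolean expressions can be illustrated with a toy inverted index. This is a minimal sketch in Python; the terms and abstract ids are invented for illustration and are not taken from MedLine.

```python
# Toy inverted index: term -> set of abstract ids (hypothetical data).
index = {
    "hypertension": {1, 2, 3, 4, 5, 6},
    "guideline": {2, 3, 7},
    "primary care": {3, 8},
}


def query_and(*terms):
    """Abstracts containing every term (narrow query)."""
    return set.intersection(*(index[t] for t in terms))


def query_or(*terms):
    """Abstracts containing at least one term (broad query)."""
    return set.union(*(index[t] for t in terms))


narrow = query_and("hypertension", "guideline", "primary care")
broad = query_or("hypertension", "guideline", "primary care")
print(len(narrow), len(broad))
```

Combining all terms with AND leaves a single abstract, while OR returns every abstract in the index, which mirrors the "few references" versus "avalanche of references" problem on the slide.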

  9. Databases for Knowledge Discovery • Biomedical research

  10. Databases for Knowledge Discovery • Biomedical research

  11. Databases for Knowledge Discovery • Biomedical research [Diagram: organisation fingerprints, content fingerprints and people fingerprints, derived (by averaging) from sources such as emails, Word documents, RFPs, jobs, CVs and skills, articles and books]

  12. Databases for Knowledge Discovery • Biomedical research • Find new associations • Matching methods • Genetics Database • Literature Database • Data mining: A – B, B – C, A – C
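
The A – B, B – C, A – C scheme can be read as follows: if concept A is associated with B in one database and B with C in another, then A – C is a candidate new association. The sketch below assumes associations are stored as simple pairs; the function name and concept names are placeholders, and real matching methods would weight or filter the candidates.

```python
from collections import defaultdict


def candidate_associations(ab_pairs, bc_pairs):
    """Return {(a, c): set of linking b} for A-C pairs not directly known."""
    b_to_c = defaultdict(set)
    for b, c in bc_pairs:
        b_to_c[b].add(c)

    known = set(ab_pairs) | set(bc_pairs)
    candidates = defaultdict(set)
    for a, b in ab_pairs:
        for c in b_to_c.get(b, ()):
            # Skip trivial pairs and pairs that are already recorded.
            if a != c and (a, c) not in known and (c, a) not in known:
                candidates[(a, c)].add(b)
    return dict(candidates)


# Hypothetical pairs from a genetics and a literature database.
genetics = [("gene_A", "protein_B"), ("gene_A", "protein_D")]
literature = [("protein_B", "disease_C"), ("protein_D", "disease_C")]
found = candidate_associations(genetics, literature)
print(found)
```

Keeping the set of linking B concepts for each candidate makes it possible to rank A – C pairs by how many independent intermediates support them.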

  13. Databases for Knowledge Discovery • Biomedical research • Find new associations • Matching methods • Genetics Database • Literature Database • Composition of a thesaurus from separate databases • GDB: AAA; BBB • LocusLink: AAA; CCC • Hugo NC: AAA • OMIM: BBB; CCC • SwissProt: BBB ► concept: AAA; synonyms: BBB; CCC
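
Composing a thesaurus from separate databases amounts to merging every name that shares a database entry with another name into one concept. The sketch below does this with a small union-find structure. The rule for choosing the preferred name (most frequently used, ties broken alphabetically) is an assumption made for illustration; the slide does not state how AAA was chosen.

```python
from collections import Counter


def build_thesaurus(sources):
    """Merge names that share a database entry into one concept.

    sources: {database name: list of names used for the same entry}.
    Returns a list of (preferred name, sorted synonyms).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    counts = Counter()
    for names in sources.values():
        counts.update(names)
        for name in names:
            find(name)  # register the name, even if it stands alone
        for other in names[1:]:
            parent[find(other)] = find(names[0])  # union with first name

    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)

    thesaurus = []
    for members in groups.values():
        # Assumed rule: most frequently used name wins, ties alphabetical.
        ordered = sorted(members, key=lambda n: (-counts[n], n))
        thesaurus.append((ordered[0], sorted(ordered[1:])))
    return thesaurus


sources = {
    "GDB": ["AAA", "BBB"],
    "LocusLink": ["AAA", "CCC"],
    "Hugo NC": ["AAA"],
    "OMIM": ["BBB", "CCC"],
    "SwissProt": ["BBB"],
}
result = build_thesaurus(sources)
print(result)
```

With the five entries from the slide, all three names end up in a single concept with AAA as the preferred name and BBB and CCC as synonyms.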

  14. Databases for Knowledge Discovery • Biomedical research

  15. Databases for Knowledge Discovery • Biomedical research • ACS: Associative Concept Space • Ontology database • Collexion • ACS constructor • ACS viewer • ACS model • ACS validation

  16. Databases for Knowledge Discovery • Biomedicine & health sciences • Biomedical research related to the 'hard' scientific approach as in physics and engineering • Clinical research using rather 'hard' data, and sometimes ‘soft’ subjective observations • Population-based research • data collected from populations of healthy and ill persons • This research can be subdivided into • retrospective research • prospective research

  17. Databases for Knowledge Discovery • Clinical research • Growth of information systems in primary care [Chart: percentage of primary care practices (0 to 100) using computer-based patient records, by year (1978 to 1998), for the UK and NL]

  18. Databases for Knowledge Discovery • Clinical research BloodLink The impact of guidelines-based decision support on lab test ordering in primary care.

  19. Databases for Knowledge Discovery • Clinical research • BloodLink: controlled clinical trial

                            BloodLink Control group   BloodLink Guideline group
      No. of practices      21                        23
      No. of physicians     29                        31
      No. of patients       97,177                    98,432
      Sick funds            52%                       52%
      No. of order forms    12,786                    12,700

  20. Databases for Knowledge Discovery

      Test                 BloodLink Guideline   Difference   BloodLink Control
      ESR                  5612                  -29%         7932
      Hemoglobin           6061                  -17%         7332
      WBC count            3719                  -26%         5039
      Hematocrit           3611                  -25%         4830
      Creatinine           3314                  -34%         5024
      Erythrocytes         3360                  -28%         4690
      MCV                  3159                  -32%         4642
      Differential count   3060                  -26%         4151
      Cholesterol          3413                  -1%          4354
      TSH                  3213                  +9%          2954
      Gamma-GT             2004                  -42%         3466
      Glucose in serum     2964                  +19%         2501
      ALAT (SGPT)          1892                  -34%         2850
      Potassium            1096                  -53%         2320
      ASAT (SGOT)          959                   -58%         2269
      Glucose fasting      1286                  -20%         1611
      Triglycerides        1398                  +1%          1380
      HDL cholesterol      1350                  -2%          1382
      Sodium               745                   -30%         1070
      Free T4              618                   -47%         1163
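
For most rows, the Difference column matches the relative change of the guideline count with respect to the control count. The short check below uses three rows from the table; the formula is inferred from the figures and is not stated on the slide.

```python
def relative_difference(guideline, control):
    """Relative change of the guideline count versus control, in percent."""
    return round(100 * (guideline - control) / control)


# (guideline, control) counts copied from the table.
rows = {"ESR": (5612, 7932), "TSH": (3213, 2954), "Potassium": (1096, 2320)}
for test, (guideline, control) in rows.items():
    print(test, relative_difference(guideline, control))
```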

  21. Databases for Knowledge Discovery • BloodLink test volumes (same table as slide 20; guideline / difference / control), highlighting TSH (3213 / +9% / 2954) and Free T4 (618 / -47% / 1163) • In case of thyroid disease, physicians used to order the T4 test (free thyroxine); the protocol prescribed the TSH test (thyroid-stimulating hormone) instead.

  22. Databases for Knowledge Discovery • BloodLink test volumes (same table as slide 20; guideline / difference / control), highlighting Gamma-GT (2004 / -42% / 3466), ALAT (SGPT) (1892 / -34% / 2850) and ASAT (SGOT) (959 / -58% / 2269) • Tests such as SGOT (serum glutamic oxaloacetic transaminase), Gamma-GT and SGPT had been ordered almost automatically; the protocols, however, did not support such tests. The same applies to K+ (potassium: 1096 / -53% / 2320).

  23. Databases for Knowledge Discovery • Clinical research • BloodLink: controlled clinical trial

                                            BloodLink Control group   BloodLink Guideline group
      No. of practices                      21                        23
      No. of GPs                            29                        31
      No. of patients                       97,177                    98,432
      Sick funds                            52%                       52%
      No. of order forms                    12,786                    12,700
      % of forms generated by BloodLink     89%                       73%
      No. of requested tests                87,634                    70,479
      Average No. of tests per order (1)    6.9                       5.5

      (1) Student's t-test, N=44, p<0.001

  24. Databases for Knowledge Discovery • Clinical research Cardiology

  25. Databases for Knowledge Discovery • Clinical research • Critiquing system for hypertension [Plot: sensitivity (%) versus specificity (%), both axes 0 to 100]

      #     sens   spec
      1     0.94   0.36
      2     0.86   0.70
      3     0.72   0.82
      4     0.65   0.75
      5     0.73   0.69
      6     0.70   0.78
      7     0.88   0.52
      8     0.74   0.77
      CS    0.74   0.88
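
Sensitivity and specificity, as listed for each row above, are derived from counts of correct and incorrect decisions. The sketch below shows the two definitions; the counts are invented so that they reproduce the CS row (0.74, 0.88) and are not taken from the study.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative cases that were left unflagged."""
    return tn / (tn + fp)


# Hypothetical counts chosen to reproduce the CS row.
sens = sensitivity(tp=74, fn=26)
spec = specificity(tn=88, fp=12)
print(round(sens, 2), round(spec, 2))
```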

  26. Databases for Knowledge Discovery • Clinical research • Reference

      Class N NL LVH RVH BVH AMI IMI MIX OTH VH+MI
      NL 382 0.9 0.4 0.0 1.4 1.6 0.0 0.1 95.5
      LVH 183 19.0 0.5 0.0 4.3 6.9 0.2 0.0 69.0
      RVH 55 40.6 6.7 2.7 1.2 2.1 0.0 0.9 45.8
      BVH 53 22.0 54.7 14.5 5.3 1.9 0.0 0.0 1.6
      AMI 170 14.3 2.6 0.6 0.0 1.8 0.7 0.0 80.0
      IMI 273 19.8 2.6 0.2 0.0 0.7 0.1 0.0 76.7
      MIX 73 2.5 4.1 1.6 0.0 51.6 37.4 0.0 2.7
      VH+MI 31 22.6 0.0 0.0 0.0 0.0 0.0 0.0 16.1 61.3

  27. Databases for Knowledge Discovery • Clinical research • Computer-assisted ECG interpretation • Assessment of different interpretation programs [Plot: % agreement with referees (60 to 90) versus % agreement with clinical data (60 to 90), for cardiologists and systems]

  28. Databases for Knowledge Discovery • Biomedicine & health sciences • Biomedical research related to the 'hard' scientific approach as in physics and engineering • Clinical research using rather 'hard' data, and sometimes ‘soft’ subjective observations • Population-based research • data collected from populations of healthy and ill persons • This research can be subdivided into • retrospective research • prospective research

  29. Databases for Knowledge Discovery • Population-based research: • retrospective • Post-marketing surveillance of drugs • Combinations of drugs: interactions • Longitudinal databases of about 500,000 patients • Patient privacy and data security [Diagram: computer-based patient records (CPR) in health care practices and a Central Database, shown over the chart "Growth of information systems in primary care": percentage of primary care practices (0 to 100) by year (1978 to 1998), UK and NL]

  30. Databases for Knowledge Discovery • Population-based research: • retrospective [Diagram: Research database; research data; population-based research]

  31. Databases for Knowledge Discovery • Population-based research: • retrospective • Pedigree tree (recessive) • coupling of clinical data to genealogical database • municipal records of > 20,000 individuals • each disorder could be coupled to a common ancestor: genes involved in diabetes, Alzheimer’s disease, etc. [Diagram: Research database; research data; population-based research]
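
Coupling clinical data to a genealogical database comes down to searching the pedigree for ancestors shared by all affected individuals. The sketch below assumes the municipal records have been reduced to a child-to-parents mapping; the function names and all identifiers are hypothetical.

```python
def ancestors(person, parents):
    """All ancestors of person, given a child -> list of parents mapping."""
    seen = set()
    stack = list(parents.get(person, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def common_ancestors(patients, parents):
    """Ancestors shared by every patient in the list."""
    sets = [ancestors(p, parents) for p in patients]
    return set.intersection(*sets) if sets else set()


# Hypothetical pedigree: two patients descend from founder "F".
parents = {
    "patient_1": ["m1", "f1"],
    "patient_2": ["m2", "f2"],
    "m1": ["F"],
    "f2": ["F"],
}
shared = common_ancestors(["patient_1", "patient_2"], parents)
print(shared)
```

The traversal is iterative and tracks visited individuals, so it also copes with pedigrees in which the same ancestor is reachable along several lines of descent.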

  32. Databases for Knowledge Discovery • Population-based research: • prospective • Rotterdam Study [Diagram: Research database; research data; population-based research]

  33. Databases for Knowledge Discovery • Population-based research: • prospective • Rotterdam Study • Prospective longitudinal database • 10,000 persons > 55 years of age • relationships between risks and diseases • cardiovascular and vessel-wall diseases, glaucoma, neurologic diseases (Alzheimer), osteoporosis [Diagram: Research database; research data; population-based research]

  34. Databases for Knowledge Discovery • Population-based research: • prospective • Generation R [Diagram: Research database; research data; population-based research]

  35. Databases for Knowledge Discovery • Population-based research: • prospective • Generation R • Prospective longitudinal database • 10,000 children from pregnancy onwards • relations between risks and genetic/environmental data • perinatal circumstances, diseases at young age, cultural backgrounds, impact of education, etc. [Diagram: Research database; research data; population-based research]

  36. Databases for Knowledge Discovery • A formal (‘forward’) method of analysing large research databases may hamper the flexible attitude of a researcher who does not know in advance what to expect (serendipity). • ‘Hard’ and ‘soft’ examples from biomedicine and the health sciences show that computers can be very helpful in finding new and unforeseen (‘inverse’) associations between the data stored in research databases. • Well-documented databases are an enormous treasure for the advancement of scientific research.