
Please have a seat. Our program will commence shortly.



Presentation Transcript


  1. Please have a seat. Our program will commence shortly.

  2. Biomarker Automated Retrieval Tool (B.A.R.T.) • Ronny Chan, Kim Ngo • Earth Science Data Systems Dept.

  3. Bioinformatics Relationship • Science produces massive amounts of data • Data needs to be analyzed, stored, & retrieved: this is data mining • We want to apply computer science to improve this process

  4. Motivation • Problems with conventional data mining: • Time consuming • Accuracy not defined (subjective) • No objective scientific information retrieval tool • Where are the biomarkers?

  5. Cancer Biomarkers • An indicator of cancerous growth.

  6. Proposed Solution • Create a program that allows people to quickly scan literature for the most relevant keywords/biomarkers • [Figure: B.A.R.T. with example biomarkers BAG-1, ERBB2, HER-2, EP-CAM, HPEBP4]

  7. Significance • What is the need for the project? • More efficient research • Save time • [Figure labels: B.A.R.T., conventional, enhanced]

  8. Goals • Make biomarker/keyword searches more efficient • Learn Java • Learn SQL

  9. Approach • Write a program • Read in articles • Use part of the Vector Space Model algorithm to rank terms • Output relevant terms in statistical rankings • [Figure labels: BRCA1 vs. they]
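
The term-ranking step in the approach can be sketched in Java, the language the project set out to use. This is a minimal illustration, not the B.A.R.T. implementation: the class name `TermRanker`, the tokenizer, the `log10(N / df)` form of idf, and the three sample sentences are all assumptions made up for the example. It shows why a biomarker such as BRCA1 can outrank a common word such as "they": a word that appears in every article gets an idf of zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ranks every term in a small corpus by its summed tf * idf weight.
class TermRanker {

    // Lowercase the text and split on anything that is not a letter, digit, or hyphen.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9-]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns terms sorted by descending score, where
    // score = sum over documents of tf * log10(N / df). The idf form is an assumption.
    static List<Map.Entry<String, Double>> rank(List<String> documents) {
        int n = documents.size();
        Map<String, Integer> docFreq = new HashMap<>();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        for (String doc : documents) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : tokenize(doc)) tf.merge(term, 1, Integer::sum);
            termFreqs.add(tf);
            // Each distinct term in this document adds one to its document frequency.
            for (String term : tf.keySet()) docFreq.merge(term, 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map<String, Integer> tf : termFreqs) {
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log10((double) n / docFreq.get(e.getKey()));
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        // Highest score first; ties are broken alphabetically for a stable order.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up sentences standing in for article text.
        List<String> articles = Arrays.asList(
            "They report that BRCA1 mutations raise risk",
            "They screened patients and they found BRCA1 BRCA1 variants",
            "They reviewed imaging methods");
        for (Map.Entry<String, Double> e : rank(articles)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With these sample sentences, "brca1" comes out on top and "they" lands at the bottom with a score of zero, because it occurs in all three documents.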

  10. Information Retrieval System • Vector Space Model • Introduced by Gerard Salton in the 1960s • Used widely in different search engines

  11. Algorithm for B.A.R.T. • Keywords Input → PubMed Query Agent → Keyword Parser → Content Analyzer → Content Ranker → Data Store → Data Retrieval and Output

  12. Results • DCIS • CU-TP3982 • ERBB2 • HER-2 • HPEBP4 • BAG-1 • EP-CAM • 99M

  13. Lessons & Difficulties • Deciding on algorithm choice • Ease of implementation and effectiveness • Limited knowledge & experience • Java, SQL • Initial implementation is slow • 5 articles = 160 sec • 20 articles = 1904 sec • 100 articles = 8^38 years • Update (August 18, 2004): 100 articles = 8^19 years

  14. Future work • Apply different term weight functions to make results more robust • Optimize the program for speed
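
As an illustration of what "different term weight functions" could mean, here is a minimal Java sketch of three common ways to turn a raw term count into a weight. The slides do not say which functions were under consideration, so the names `RAW`, `LOG`, and `BINARY` and the choice of these three variants are assumptions.

```java
// Hypothetical sketch of three alternative term-frequency weight functions.
class TermWeights {

    // A weight function maps a raw term count to a weight.
    interface TermWeight {
        double apply(int count);
    }

    // Raw count: a term seen 100 times weighs 100 times more than a term seen once.
    static final TermWeight RAW = count -> count;

    // Log-scaled count: dampens the effect of very frequent terms.
    static final TermWeight LOG = count -> count > 0 ? 1.0 + Math.log10(count) : 0.0;

    // Binary: only records whether the term occurs at all.
    static final TermWeight BINARY = count -> count > 0 ? 1.0 : 0.0;

    public static void main(String[] args) {
        int[] counts = {0, 1, 10, 100};
        for (int c : counts) {
            System.out.println(c + ": raw=" + RAW.apply(c)
                + " log=" + LOG.apply(c)
                + " binary=" + BINARY.apply(c));
        }
    }
}
```

Any of these could be multiplied by an idf factor in place of the raw count; which one gives the most robust rankings would have to be checked against real articles.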

  15. Citations • http://ir.iit.edu/~dagr/cs529/files/handouts/03VectorSpaceImplementation-6per.PDF • http://classes.engr.oregonstate.edu/eecs/spring2004/cs419/10 • http://www.cs.ust.hk/~dlee/Papers/ir/ieee-sw-rank.pdf • http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/04-BooleanVectorSpaceB.pdf • Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69(3), 89-95 (2001).

  16. Acknowledgements • National Science Foundation (NSF) • National Institutes of Health (NIH) • Earth Science Data System, JPL • Tina Xiao • Paul Ramirez • Chris Mattmann • Roshanak Roshandel • Sean Hardman • Southern California Bioinformatics Summer Institute (SoCalBSI) • SoCalBSI Professors • Jacqueline Heras • All SoCalBSI Colleagues

  17. VSM Example • Q: malignant breast cancer • D1: detection of malignant level in the cell • D2: sighting of breast stage in the breast cancer • D3: detection of malignant stage in the cancer

  18. Example Continued… Keyword tf * idf
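
The numbers from the original table are not preserved in this transcript, so the following Java sketch recomputes a tf * idf table for the query and the three documents under stated assumptions: idf is taken as `log10(N / df)` and a document's score is the plain sum of tf * idf over the query keywords. The class name `VsmExample` and both formulas are illustrative choices and may not match what the slide showed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical worked example: tf * idf for the query keywords in D1 to D3.
class VsmExample {

    static final List<String> DOCS = Arrays.asList(
        "detection of malignant level in the cell",
        "sighting of breast stage in the breast cancer",
        "detection of malignant stage in the cancer");

    static final List<String> QUERY = Arrays.asList("malignant", "breast", "cancer");

    // tf: how many times the keyword occurs in one document.
    static int tf(String keyword, String doc) {
        int count = 0;
        for (String word : doc.split(" ")) {
            if (word.equals(keyword)) count++;
        }
        return count;
    }

    // idf: log10(N / df), where df is the number of documents containing the keyword.
    // Assumes the keyword occurs in at least one document, which holds for this example.
    static double idf(String keyword) {
        int df = 0;
        for (String doc : DOCS) {
            if (tf(keyword, doc) > 0) df++;
        }
        return Math.log10((double) DOCS.size() / df);
    }

    // Document score: sum of tf * idf over the query keywords (an assumed scoring rule).
    static double score(String doc) {
        double total = 0.0;
        for (String keyword : QUERY) {
            total += tf(keyword, doc) * idf(keyword);
        }
        return total;
    }

    public static void main(String[] args) {
        for (String keyword : QUERY) {
            System.out.printf("%-10s idf=%.3f%n", keyword, idf(keyword));
        }
        for (int i = 0; i < DOCS.size(); i++) {
            System.out.printf("D%d score=%.3f%n", i + 1, score(DOCS.get(i)));
        }
    }
}
```

Under these assumptions "breast" carries the largest idf (about 0.477) because it appears in only one document, and the scores come out to roughly 0.176 for D1, 1.130 for D2, and 0.352 for D3, so D2 ranks first.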
