Please have a seat. Our program will commence shortly.


Biomarker Automated Retrieval Tool

Ronny Chan, Kim Ngo

Earth Science Data Systems Dept.


Bioinformatics Relationship

  • Science produces massive amounts of data

  • Data needs to be analyzed, stored, & retrieved

    • This is data mining

  • We want to apply computer science to improve this process


Motivation

  • Problems with conventional data mining

    • Time consuming

    • Accuracy not defined (subjective)

  • No objective tool for scientific information retrieval

Where are the Biomarkers?


Cancer Biomarkers

An indicator of cancerous growth.



Proposed Solution

Create a program that lets people quickly scan the literature for the most relevant keywords and biomarkers.

[Figure: B.A.R.T. with example biomarkers BAG-1, ERBB2, HER-2, EP-CAM, HPEBP4]


Significance

  • Why is the project needed?

    • More efficient research

    • Save time

[Figure labels: B.A.R.T., conventional, enhanced]


Goals

  • Make biomarker/keyword searches more efficient

  • Learn Java

  • Learn SQL


Approach

  • Write a program

    • Read in articles

    • Use part of the Vector Space Model algorithm to rank terms

    • Output relevant terms in statistical rankings
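
The ranking step above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes a tf * idf style weight (total term frequency times log10 of articles over articles containing the term), and the `tokenize` and `rank_terms` helpers and the toy articles are invented for the example. Python is used here for brevity.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into tokens, keeping hyphens (e.g. her-2)."""
    return re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())

def rank_terms(articles):
    """Score each term as (total term frequency) * log10(N / document frequency)."""
    n_docs = len(articles)
    term_freq = Counter()  # occurrences of each term across all articles
    doc_freq = Counter()   # number of articles that contain each term
    for text in articles:
        tokens = tokenize(text)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    scores = {
        term: term_freq[term] * math.log10(n_docs / doc_freq[term])
        for term in term_freq
    }
    # Highest score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy "articles" invented for illustration; these are not real abstracts.
articles = [
    "they report that brca1 mutations raise risk and brca1 testing helps",
    "they measured her-2 expression in tumors",
    "they describe screening methods",
]
ranking = rank_terms(articles)
```

With this weighting, a word such as "they" that occurs in every article gets a score of zero, while a term such as "brca1" that is frequent in a single article rises to the top.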

[Figure: "BRCA1" vs. "they"]


Vector Space Model

Information Retrieval System

Introduced by Gerard Salton in the 1960s.

Widely used in search engines.


Algorithm for B.A.R.T.

Keywords Input

PubMed Query Agent

Keyword Parser

Content Analyzer

Content Ranker

Data Store

Data Retrieval and Output
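
One way these stages might fit together is sketched below. Everything here is an assumption made for illustration: the function names mirror the stage labels, `query_agent` is a stub that returns canned text instead of contacting PubMed, the ranker uses raw counts as a placeholder weighting, and an in-memory SQLite table stands in for the data store.

```python
import re
import sqlite3
from collections import Counter

def query_agent(keywords):
    """Hypothetical stand-in for the PubMed Query Agent; returns canned text, no network call."""
    canned = {
        "breast cancer": [
            "ERBB2 HER-2 expression in DCIS",  # invented fragments, not real abstracts
            "BAG-1 ERBB2 markers",
        ]
    }
    return canned.get(keywords, [])

def keyword_parser(text):
    """Split text into uppercase tokens, keeping hyphenated names such as HER-2."""
    return re.findall(r"[A-Z0-9][A-Z0-9-]*", text.upper())

def content_analyzer(articles):
    """Count how often each token occurs across all articles."""
    counts = Counter()
    for text in articles:
        counts.update(keyword_parser(text))
    return counts

def content_ranker(counts):
    """Order terms by count, highest first (placeholder for a real term weighting)."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def data_store(conn, ranked):
    """Save the ranked terms in a SQL table."""
    conn.execute("CREATE TABLE IF NOT EXISTS terms (term TEXT PRIMARY KEY, score REAL)")
    conn.executemany("INSERT OR REPLACE INTO terms VALUES (?, ?)", ranked)
    conn.commit()

def data_retrieval(conn, limit):
    """Read back the top-scoring terms."""
    cur = conn.execute(
        "SELECT term, score FROM terms ORDER BY score DESC, term LIMIT ?", (limit,)
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
articles = query_agent("breast cancer")
data_store(conn, content_ranker(content_analyzer(articles)))
top_terms = data_retrieval(conn, 3)
```

Keeping each stage as a small function with plain inputs and outputs makes it straightforward to swap the placeholder ranker for a different term weight function later.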


Results

  • DCIS

  • CU-TP3982

  • ERBB2

  • HER-2

  • HPEBP4

  • BAG-1

  • EP-CAM

  • 99M


Lessons & Difficulties

  • Deciding on algorithm choice

    • Ease of implementation and effectiveness

  • Limited knowledge & experience

    • Java, SQL

    • Initial implementation is slow

5 articles = 160 sec

20 articles = 1904 sec

100 articles = 8^38 years

Update (August 18, 2004):

100 articles = 8^19 years


Future Work

  • Apply different term weight functions to make results more robust

  • Optimize the program for speed


Citations

  • http://ir.iit.edu/~dagr/cs529/files/handouts/03VectorSpaceImplementation-6per.PDF

  • http://classes.engr.oregonstate.edu/eecs/spring2004/cs419/10

  • http://www.cs.ust.hk/~dlee/Papers/ir/ieee-sw-rank.pdf

  • http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/04-BooleanVectorSpaceB.pdf

  • Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69(3), 89-95 (2001).


Acknowledgements

National Science Foundation (NSF)

National Institutes of Health (NIH)

Earth Science Data Systems, JPL

Tina Xiao

Paul Ramirez

Chris Mattmann

Roshanak Roshandel

Sean Hardman

Southern California Bioinformatics Summer Institute (So Cal BSI)

SoCalBSI Professors

Jacqueline Heras

ALL SoCalBSI Colleagues


VSM Example

Q: malignant breast cancer

D1: detection of malignant level in the cell

D2: sighting of breast stage in the breast cancer

D3: detection of malignant stage in the cancer


Example Continued…

Keyword tf * idf
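
The tf * idf weights for the query Q and documents D1 to D3 from the VSM example can be computed with a short script. This is a sketch under stated assumptions: idf is taken as log10(N / df), documents are compared to the query by cosine similarity, and the helper names (`idf_table`, `tfidf_vector`, `cosine`) are invented here. The slides do not say which variant of the weighting was used.

```python
import math
from collections import Counter

documents = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}
query = "malignant breast cancer"

def idf_table(docs):
    """idf(term) = log10(N / df), where df is the number of documents containing the term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(text.split()))
    return {term: math.log10(n_docs / df) for term, df in doc_freq.items()}

def tfidf_vector(text, idf):
    """Weight each term by its raw count in the text times its idf."""
    counts = Counter(text.split())
    return {term: tf * idf.get(term, 0.0) for term, tf in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

idf = idf_table(documents)
query_vec = tfidf_vector(query, idf)
scores = {
    name: cosine(query_vec, tfidf_vector(text, idf))
    for name, text in documents.items()
}
ranked_docs = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, words that appear in all three documents ("of", "in", "the") get an idf of zero and drop out, and D2 ranks first because "breast" occurs in only one document and appears there twice.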

