Session V: Life Science Identifiers - Use Cases, Future Directions

Recent History
  • LSIDs 3 years old
  • I3C evaluating AGAVE, BSML
    • encoded IDs as tuples/triples
  • If we could not agree on a data standard, could we at least agree on how we write the identifiers?
Today
  • OMG Spec
  • Google search for “+LSID +bioinformatics”
    • 686 results (10/27/04, 2:40pm)
    • 700 results (10/27/04, 7:20am)
How GenePattern is using LSIDs
  • Identify analysis tasks and pipelines via LSIDs
  • Create sharable pipelines referencing tasks via LSIDs
  • Provide a repository from which analysis tasks can be retrieved by LSID
Example: ALL/AML Analysis

  • Training Data: all_aml_train (27 ALL, 11 AML expression samples)
  • Test Data: all_aml_test (20 ALL, 14 AML expression samples)
  • Preprocess (applied to each data set): filter uninformative genes
  • SOM Clustering: cluster samples to separate tumor types
  • Weighted Voting, Train-test: build a classifier and compute its accuracy on a test set
  • Class Neighbors: find genes that most closely match a profile
  • Weighted Voting, Cross-Validation: build a classifier and compute its accuracy using cross-validation

Golub and Slonim et al., 1999

Example: ALL/AML Analysis

Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0

  • Training Data: all_aml_train (27 ALL, 11 AML expression samples)
  • Test Data: all_aml_test (20 ALL, 14 AML expression samples)
  • Preprocess (same module for both data sets): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
  • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0
  • Weighted Voting, Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0
  • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0
  • Weighted Voting, Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0

Golub and Slonim et al., 1999
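Each identifier in this pipeline follows the LSID pattern `urn:lsid:<authority>:<namespace>:<object>[:<version>]`: the authority is `broad.mit.edu`, the namespace separates analysis modules from pipelines, and the trailing `0` is the version. The following is a minimal Python sketch (not GenePattern code) that splits an LSID into those parts. It assumes plain colon separators, accepts the `urn:lsid` prefix in either case, and ignores any escaping or further validation the specification may require.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of urn:lsid:<authority>:<namespace>:<object>[:<version>]."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts (sketch: no escaping, minimal checks)."""
    parts = text.strip().split(":")
    # Expect "urn", "lsid", authority, namespace, object and an optional version.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], version)


def format_lsid(lsid: LSID) -> str:
    """Rebuild the string form; the prefix is written in lower case as on the slides."""
    fields = ["urn", "lsid", lsid.authority, lsid.namespace, lsid.object_id]
    if lsid.version is not None:
        fields.append(lsid.version)
    return ":".join(fields)


if __name__ == "__main__":
    som = parse_lsid("urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0")
    print(som.authority)               # broad.mit.edu
    print(som.namespace)               # cancer.software.genepattern.module.analysis
    print(som.object_id, som.version)  # 00029 0
```

Keeping the version as its own field is what lets a pipeline pin one exact module revision, while the remaining parts still group all versions of the same module together.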

  • LSIDs enable
    • Reproducible research
      • exactly repeating an in silico experiment
    • ‘Modernizing’ pipelines to the latest module versions
    • Tracking module provenance
  • Someday
    • Data will be available via LSID too…
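The difference between exact repetition and ‘modernizing’ can be made concrete. In this sketch a pipeline is just a list of task LSIDs: repeating a run reuses the list unchanged, while modernizing swaps each version for the newest one a repository knows about. The `registry` mapping (unversioned LSID to known version numbers) is a hypothetical stand-in for the GenePattern repository, and treating versions as integers is an assumption based on the `:0` suffixes in the slides.

```python
from typing import Dict, List, Tuple


def split_version(lsid: str) -> Tuple[str, int]:
    """Split 'urn:lsid:auth:ns:obj:ver' into ('urn:lsid:auth:ns:obj', ver).

    Assumes every task LSID ends in an integer version, as on the slides."""
    base, version = lsid.rsplit(":", 1)
    return base, int(version)


def modernize(pipeline: List[str], registry: Dict[str, List[int]]) -> List[str]:
    """Return a new pipeline whose tasks point at their newest known versions.

    The input list is not modified, so the exact LSIDs of the original run
    stay available for repeating it."""
    updated = []
    for lsid in pipeline:
        base, version = split_version(lsid)
        # Never go backwards: include the current version in the comparison.
        latest = max(registry.get(base, []) + [version])
        updated.append(f"{base}:{latest}")
    return updated


if __name__ == "__main__":
    pipeline = [
        "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0",
        "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0",
    ]
    # Hypothetical repository contents: a version 2 of the SOM Clustering module exists.
    registry = {"urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029": [0, 1, 2]}
    for lsid in modernize(pipeline, registry):
        print(lsid)
```

Because modernizing produces a new list instead of editing the old one, both the reproducible and the updated pipeline can be kept side by side, and comparing them shows exactly which modules changed.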
Future…

Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0

  • Training Data: urn:lsid:broad.mit.edu:cancer.microarray:abcde:1.0
  • Test Data: urn:lsid:broad.mit.edu:cancer.microarray:zyxwv:1.0
  • Preprocess (same module for both data sets): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
  • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0
  • Weighted Voting, Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0
  • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0
  • Weighted Voting, Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0

Golub and Slonim et al., 1999
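Once the input data carry LSIDs as well, an entire run can in principle be described by identifiers alone. This sketch writes such a description as JSON using the pipeline, data and module LSIDs from this slide. The manifest layout, the `training`/`test` keys and the loose `is_lsid` check are illustrative assumptions, not a GenePattern format.

```python
import json
from typing import Dict, List


def is_lsid(text: str) -> bool:
    """Loose syntax check for urn:lsid:<authority>:<namespace>:<object>[:<version>]."""
    parts = text.split(":")
    return (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:])
    )


def run_manifest(pipeline: str, inputs: Dict[str, str], tasks: List[str]) -> str:
    """Serialize a run as JSON, rejecting any reference that is not an LSID
    (for example a local file path, which another lab could not resolve)."""
    for ref in [pipeline, *inputs.values(), *tasks]:
        if not is_lsid(ref):
            raise ValueError(f"not an LSID: {ref!r}")
    return json.dumps({"pipeline": pipeline, "inputs": inputs, "tasks": tasks}, indent=2)


if __name__ == "__main__":
    module = "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:"
    print(run_manifest(
        pipeline="urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0",
        inputs={
            "training": "urn:lsid:broad.mit.edu:cancer.microarray:abcde:1.0",
            "test": "urn:lsid:broad.mit.edu:cancer.microarray:zyxwv:1.0",
        },
        tasks=[module + n for n in ("00020:0", "00029:0", "00027:0", "00001:0", "00028:0")],
    ))
```

Rejecting non-LSID references is a deliberate choice in this sketch: a manifest that mixes identifiers with local file paths could not be replayed elsewhere.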

Other LSID use at the Broad
  • Sample management
    • Sharing samples (tissues, clones, etc.) between program groups
    • LSIDs identify samples
    • Permits scientists to find all experiments done with a sample in any Broad program
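The cross-program lookup amounts to an inverted index keyed by sample LSID. This is a minimal sketch, assuming each program group can export records of (program, experiment, sample LSIDs used). The record shape, program names and `example.org` identifiers are hypothetical, and identifiers are compared as exact strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (program group, experiment id, LSIDs of the samples the experiment used)
Record = Tuple[str, str, List[str]]


def index_by_sample(records: Iterable[Record]) -> Dict[str, Set[Tuple[str, str]]]:
    """Invert experiment records into: sample LSID -> {(program, experiment)}."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for program, experiment, samples in records:
        for sample_lsid in samples:
            index[sample_lsid].add((program, experiment))
    return dict(index)


if __name__ == "__main__":
    records = [
        ("program-a", "exp-1", ["urn:lsid:example.org:sample:s1"]),
        ("program-b", "exp-7", ["urn:lsid:example.org:sample:s1", "urn:lsid:example.org:sample:s2"]),
    ]
    index = index_by_sample(records)
    # Every experiment, in any program, that used sample s1:
    print(sorted(index["urn:lsid:example.org:sample:s1"]))
```

Because the key is a shared identifier rather than each group's local sample name, the lookup works across programs without a mapping table between naming schemes.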
Other LSID use at the Broad

2. GeneCruiser web service

  • annotation web service for microarray probes
  • maps probe set identifiers to GO, GenBank, SwissProt, etc.
  • its interface returns the matching identifiers in these other sources as LSIDs
Use Cases and Future Directions
  • What does it actually mean to identify a biological object such as "a gene"?
  • How does LSID address structural elements of biological and chemical objects?
  • What are the lessons learned from early implementations of LSID?
Use Cases and Future Directions
  • What granularity of object do we identify?
  • Should LSID be a URI not a URN?
  • Should virtual persistent identifiers for derived/calculated properties be used?
  • What are the barriers to widespread use?
  • Data/Metadata split – is this a problem?
    • Raised by Phil Lord at the end of yesterday's MyGrid talk
Best LSID quote…
  • “LSIDs are in a sense just a sociological con trick, since they are nothing more than cheap and cheerful URNs” – David Shotton