Session V: Life Science Identifiers - Use Cases, Future Directions


Presentation Transcript

Recent History

  • LSIDs 3 years old

  • I3C evaluating AGAVE, BSML

    • encoded IDs as tuples/triples

  • If we could not agree on a data standard, could we at least agree on how we write the identifiers?


Today

  • OMG Spec

  • Google search for “+LSID +bioinformatics”

    • 686 results (10/27/04, 2:40pm)

    • 700 results (10/27/04, 7:20am)


Broad Use Cases


How GenePattern is using LSIDs

  • Identify analysis tasks and pipelines via LSIDs

  • Create sharable pipelines referencing tasks via LSIDs

  • Provide a repository and retrieval for analysis tasks by LSID
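
The three points above can be illustrated with a minimal sketch. It is not GenePattern's actual implementation: the in-memory `repository` dictionary, the JSON layout, and the helper names `export_pipeline` and `resolve_pipeline` are all assumptions made for illustration. Only the LSIDs and task names are taken from the ALL/AML example slides.

```python
import json

PREFIX = "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis"

# Hypothetical local repository: LSID -> task name (IDs and names from the slides).
repository = {
    f"{PREFIX}:00020:0": "Preprocess",
    f"{PREFIX}:00029:0": "SOM Clustering",
}


def export_pipeline(name, task_lsids):
    """Serialize a pipeline as JSON that references its tasks only by LSID."""
    return json.dumps({"name": name, "tasks": list(task_lsids)})


def resolve_pipeline(document, repo):
    """Split a shared pipeline's tasks into those held locally and those missing."""
    spec = json.loads(document)
    found = [(lsid, repo[lsid]) for lsid in spec["tasks"] if lsid in repo]
    missing = [lsid for lsid in spec["tasks"] if lsid not in repo]
    return found, missing


# A pipeline shared by a colleague; the third task is not in the local repository.
shared = export_pipeline(
    "ALL/AML demo",
    [f"{PREFIX}:00020:0", f"{PREFIX}:00029:0", f"{PREFIX}:00027:0"],
)
found, missing = resolve_pipeline(shared, repository)
print("available locally:", [name for _, name in found])
print("must be retrieved by LSID:", missing)
```

Because the shared document carries only identifiers, the recipient can tell exactly which tasks still need to be retrieved before the pipeline can run.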


Example: ALL/AML Analysis

  • Training Data: all_aml_train (27 ALL, 11 AML expression samples)

  • Test Data: all_aml_test (20 ALL, 14 AML expression samples)

  • Preprocess (two instances): Filter uninformative genes

  • SOM Clustering: Cluster samples to separate tumor types

  • Weighted Voting Train-test: Build a classifier and compute its accuracy on a test set

  • Class Neighbors: Find genes that most closely match a profile

  • Weighted Voting Cross-Validation: Build a classifier and compute its accuracy using cross-validation

  • Source: Golub and Slonim et al., 1999


Example: ALL/AML Analysis

  • Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0

  • Training Data: all_aml_train (27 ALL, 11 AML expression samples)

  • Test Data: all_aml_test (20 ALL, 14 AML expression samples)

  • Preprocess (two instances): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0

  • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0

  • Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0

  • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0

  • Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0

  • Source: Golub and Slonim et al., 1999
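
Each identifier on this slide follows the LSID layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. As a rough sketch of how such a string breaks down, the parser below splits an LSID into its fields. The `LSID` tuple and `parse_lsid` function are illustrative names, not part of any GenePattern or LSID library, and the validation is deliberately minimal.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # e.g. broad.mit.edu
    namespace: str           # e.g. cancer.software.genepattern.module.analysis
    object_id: str           # e.g. 00029
    revision: Optional[str]  # e.g. 0; the revision field is optional


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its colon-separated components."""
    parts = text.strip().split(":")
    # Expect "urn", "lsid", then three required fields and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


som = parse_lsid(
    "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0"
)
print(som.authority, som.object_id, som.revision)
```

Read this way, the SOM Clustering identifier names authority `broad.mit.edu`, object `00029`, and revision `0`.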


  • LSIDs enable

    • Reproducible research

      • exactly repeating an in silico experiment

    • ‘Modernizing’ pipelines to the latest module versions

    • Tracking module provenance

  • Someday

    • Data will be available via LSID too…
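
One way to picture "modernizing" a pipeline is to swap each task LSID for the highest known revision of the same object. The sketch below makes several assumptions that the slides do not state: every LSID carries a revision, revisions are integers or dotted integers that compare numerically, and the list of known LSIDs (`catalogue`) is available locally. Revision `2` of the Preprocess module is made up for the example.

```python
def split_revision(lsid):
    """Return (LSID without revision, revision string)."""
    prefix, _, revision = lsid.rpartition(":")
    return prefix, revision


def revision_key(revision):
    """Assumed ordering: compare dotted revisions numerically, e.g. 1.10 > 1.9."""
    return tuple(int(part) for part in revision.split("."))


def modernize(pipeline, catalogue):
    """Replace each task LSID with the newest revision found in the catalogue."""
    latest = {}
    for lsid in catalogue:
        prefix, rev = split_revision(lsid)
        if prefix not in latest or revision_key(rev) > revision_key(latest[prefix]):
            latest[prefix] = rev
    updated = []
    for lsid in pipeline:
        prefix, rev = split_revision(lsid)
        newest = latest.get(prefix, rev)
        # Never downgrade a task that is already newer than the catalogue.
        if revision_key(newest) < revision_key(rev):
            newest = rev
        updated.append(f"{prefix}:{newest}")
    return updated


BASE = "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis"
pipeline = [f"{BASE}:00020:0", f"{BASE}:00029:0"]
# Hypothetical catalogue: revision 2 of module 00020 is invented for illustration.
catalogue = [f"{BASE}:00020:0", f"{BASE}:00020:2", f"{BASE}:00029:0"]
modernized = modernize(pipeline, catalogue)
print(modernized)
```

Keeping the original list of LSIDs alongside the modernized one preserves the ability to repeat the earlier run exactly, which is the reproducibility point above.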


Future…

  • Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0

  • Training Data: urn:lsid:broad.mit.edu:cancer.microarray:abcde:1.0

  • Test Data: urn:lsid:broad.mit.edu:cancer.microarray:zyxwv:1.0

  • Preprocess (two instances): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0

  • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0

  • Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0

  • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0

  • Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0

  • Source: Golub and Slonim et al., 1999


Other LSID use at the Broad

  • Sample management

    • Sharing samples (tissues, clones, etc.) between program groups

    • LSIDs identify samples

    • Permits scientists to find all experiments done with a sample in any Broad program


Other LSID use at the Broad

2. GeneCruiser web service

  • annotation web service for microarray probes

  • maps probe set identifiers to GO, GenBank, SwissProt, etc.

  • The interface returns LSIDs that point to these other sources' identifiers


Use Cases and Future Directions

  • What does it actually mean to identify a biological object such as "a gene"?

  • How does LSID address structural elements of biological and chemical objects?

  • What are the lessons learned from early implementations of LSID?


Use Cases and Future Directions

  • What granularity of object do we identify?

  • Should an LSID be a URI rather than a URN?

  • Should virtual persistent identifiers for derived/calculated properties be used?

  • What are the barriers to widespread use?

  • Data/Metadata split: is this a problem?

    • Phil Lord mentioned this at the end of yesterday's MyGrid talk


Best LSID quote…

  • “LSIDs are in a sense just a sociological con trick, since they are nothing more than cheap and cheerful URNs” (David Shotton)