Metadata – What is it, and why we need it

By: Roman Olschanowsky

roman2u@sdsc.edu

Metadata - data about data?
  • System metadata (most file systems)
    • Developed for OS, not very helpful to you
    • Size, owner, permissions, timestamps, …
  • Standardized metadata
    • File headers: jpeg, mp3, DICOM(s), …
    • Dublin Core: Title, Creator, Subject, Date, …
  • User defined metadata
    • XML: (Whatever I want !!!)
    • Database: (Whatever I want !!!)
    • SRB: (Whatever I want !!!)
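As a minimal sketch of what "system metadata" looks like in practice, the snippet below reads the size, owner, permissions, and modification time that the file system keeps for a file. It uses only Python's standard library; the function name `system_metadata` and the throwaway demo file are illustrative assumptions, not part of the original slides.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def system_metadata(path):
    """Return the metadata a typical file system keeps about a file."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "owner_uid": st.st_uid,  # numeric owner id (always 0 on Windows)
        "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }


# Demo on a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"slice data")
demo = system_metadata(tmp.name)
os.remove(tmp.name)
print(demo)
```

Nothing in this record says what the file means scientifically, which is the slide's point: it was designed for the operating system, not for the researcher.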
System Metadata

Q: If all I have is a plain file system, how do I do metadata?

A: Organization. Build a meaningful hierarchy.

[Figure: directory tree rooted at Patient (Roman). Directories shown: label, mri, surf, flash, brain, wm, filled, aseg, norm, transforms, Parameter_maps. File types shown: Log File, Surface File, Label File, Transform File, Flash File, Slice File.]
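When the hierarchy is the only metadata store, the path itself carries the information. The sketch below recovers metadata from a path, assuming a hypothetical layout of `patient/category/subdirectories/file`; the layout, the function name `metadata_from_path`, and the file name `slice_001` are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def metadata_from_path(path):
    """Split a hierarchy path into the metadata it implicitly encodes."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError("expected at least patient/category/file")
    patient, category, *subdirs, filename = parts
    return {
        "patient": patient,
        "category": category,  # e.g. label, mri, surf
        "volume": "/".join(subdirs) or None,  # e.g. brain, wm, aseg
        "file": filename,
    }


record = metadata_from_path("Roman/mri/brain/slice_001")
print(record)
```

This only works if everyone who writes into the tree follows the same naming convention, so the convention needs to be agreed on before the data arrives.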

A good hierarchy - Is this enough?
  • I now have thousands of patients.
  • Dr. Such-and-such asks me: How many of your patients have a cranial thickness greater than 0.5 inches?
  • We can dig through all the images and measure the thicknesses, but where do we store the results?
  • 50% are greater than 0.5 inches.
  • Great! Now how many of those are male and were scanned with a GE system?
  • Sir, 75% are male and were scanned with GE; the other 25% are male too, but were scanned with different systems. (Fictional numbers.)
Standardized Metadata
  • Dublin Core: What is the bare minimum metadata that needs to be present?
    • Everybody's idea of ‘bare minimum’ is different
    • What’s left isn’t very useful: Format: PowerPoint File
  • File Headers:
    • Very useful
    • (Think of them as system metadata for that file type)
    • Width: 10px | Bit rate: 128 Kbps | Scanner: GE
    • But the more files you have, the slower it gets!
    • Who decides what that header is? Does everybody actually follow that standard?
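To make the file-header idea concrete, here is a rough sketch that reads the width and height from a PNG header. PNG is used only because its header is simple; the slides name jpeg, mp3, and DICOM instead. The function name `png_dimensions` and the hand-built sample header are assumptions for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header_bytes):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if header_bytes[:8] != PNG_SIGNATURE or header_bytes[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    width, height = struct.unpack(">II", header_bytes[16:24])
    return width, height


# Hand-built header for a hypothetical 10 x 20 image (pixel data omitted).
sample = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)  # IHDR data length
    + b"IHDR"
    + struct.pack(">IIBBBBB", 10, 20, 8, 2, 0, 0, 0)
)
print(png_dimensions(sample))
```

Answering a question across a whole collection this way means opening every file, which is why header-based queries slow down as the number of files grows.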
User defined metadata
  • Finally, a place to store my “cranial thickness” attribute.
  • XML:
    • Great! It’s not platform or application specific.
    • But it’s usually slow, with lots of overhead.
  • Database:
    • Great! It’s fast, it gives me my answers, and it’s more flexible (primary / foreign keys).
    • But it’s expensive (labor, licenses). Worst of all, it’s separate from the data, so things can get out of sync.
  • SRB:
    • Great! It’s fast and it’s a part of the same system as the data.
    • But, what if I take the data out of the system? How does the metadata leave too?
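The database option can be sketched with Python's built-in `sqlite3` module. The table name, column names, and the four patient rows below are made up for illustration; the two queries mirror the questions asked earlier about cranial thickness, sex, and scanner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE patient (
           id INTEGER PRIMARY KEY,
           sex TEXT,
           scanner TEXT,
           cranial_thickness_in REAL
       )"""
)
# Fictional rows, purely for the demo.
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?)",
    [
        (1, "M", "GE", 0.62),
        (2, "M", "Siemens", 0.55),
        (3, "F", "GE", 0.41),
        (4, "M", "GE", 0.58),
    ],
)

# How many patients have a cranial thickness greater than 0.5 inches?
thick = conn.execute(
    "SELECT COUNT(*) FROM patient WHERE cranial_thickness_in > ?", (0.5,)
).fetchone()[0]

# How many of those are male and were scanned with a GE system?
thick_male_ge = conn.execute(
    "SELECT COUNT(*) FROM patient "
    "WHERE cranial_thickness_in > ? AND sex = ? AND scanner = ?",
    (0.5, "M", "GE"),
).fetchone()[0]

print(thick, thick_male_ge)
```

The follow-up question costs one more `WHERE` clause rather than another pass over the images. The drawback the slide mentions still applies: nothing ties these rows to the files they describe.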
SRB metadata
  • User Defined Metadata
    • 9 string fields, 2 integer fields (unlimited elements)
    • Attribute, Value, Units
    • Example: Width = 10 meters
  • DAI (Database Access Interface) README.DAI
    • Register your own DB with SRB
    • Query/Insert/Update with “Ssql” (Shadow object)
  • Extensible Schema README.extensibleschema
    • Add your own metadata tables directly to MCAT
    • You can associate new metadata on a file-by-file basis
    • Need access to MCAT as well as SRB server
    • Not Easy! Lots of additional steps for every file
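The attribute / value / units model can be pictured with a small in-memory stand-in. This is not the SRB API: the `Attribute` class, the `catalog` dictionary, the `annotate` and `find` helpers, and the object paths are all hypothetical, meant only to show how triples attached to stored objects support queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    units: str = ""


# Hypothetical catalog: object path -> list of attribute triples.
catalog = {}


def annotate(path, name, value, units=""):
    """Attach one attribute/value/units triple to a stored object."""
    catalog.setdefault(path, []).append(Attribute(name, str(value), units))


def find(name, predicate):
    """Return paths that have an attribute `name` whose value passes `predicate`."""
    return sorted(
        path
        for path, attrs in catalog.items()
        if any(a.name == name and predicate(a.value) for a in attrs)
    )


annotate("/home/demo/scan1", "Width", 10, "meters")
annotate("/home/demo/scan2", "Width", 3, "meters")
wide = find("Width", lambda v: float(v) > 5)
print(wide)
```

Because the triples live in the same catalog as the objects, a query returns the data locations directly, which is the advantage the slides claim for keeping metadata in the same system as the data.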
BIRN Human Collection and Metadata hierarchy

[Figure: collection tree with XML metadata files attached at each level. Recoverable labels:]
  • Collection levels: Subject, Institution_VisitID, Study, Series
  • Metadata per level (XML files): BIRN_ID and Timestamp; VisitID?; StudyID?; Image/Scanner Parameters?
  • Analysis scopes, from the top of the tree down:
    • Analyses on many subjects across institutions
    • Analyses on a subject across institutions and studies
    • Analyses on many series of a subject within an institution
    • Analyses on multiple Series done at 1 institution
    • Analyses on images from this Series
  • Data collections: Native, Original, DICOM, KSPACE, TaskData
  • Analysis collections: SPM Analysis, SPM Format, Freesurfer, LDMM, Analysis n, Analysis n+1, Snapshot 1 … Snapshot n
  • Analysis collection name parts: Analysis_, user_, timestamp_, toolcode
  • Original is a pointer to the corresponding original scanner format

[Figure: table mapping each level of the directory hierarchy to SRB metadata, XML elements (non-structural), and HID database tables, with notes. Recoverable content by level:]
  • BIRN
    • Note: Should analyses that cross multiple data levels be split out to a separate hierarchy?
  • Human
    • Note: All Analysis collections are writeable so that users can create their own analysis collections (snapshots)
  • Research Project (Name__ID), with an Analysis collection
    • SRB metadata: Project ID
    • XML element: <project>
    • HID database: nc_experiment
  • Subject (BIRN ID), with an Analysis collection
    • SRB metadata: BIRN ID, Timestamp
    • XML element: <subjectConst>
    • HID database: nc_humanSubject
  • Institution Visit (Visit__Site ID_Visit #), with an Analysis collection
    • SRB metadata: Visit ID, Institution ID
    • XML element: <subjectVar>
    • HID database: nc_expComponent
  • Study (Study__ID #), with an Analysis collection
    • SRB metadata: Study ID
    • XML element: <scanner>
  • Series (Series__localID), with an Analysis collection
    • SRB metadata: Series Number, Scanner Parameters?
    • HID database: nc_expSegment and [protocol section]
  • Native and Analysis
    • Note: Separate the native data and analysis for easier access control and separation (Brian’s email)
    • Native Data: Represents an upload of the “original data”
    • Analysis: Represents a different analysis (either partial or full)
    • Other labels: [research and derived data sections], <acqProtocol>, <expProtocol>, <datarec>, Image Parameters?
  • Snapshot 1 … Snapshot N (Ver__SER)
    • Formats: DICOM, AFNI, Analyze
    • Analysis Sub Tree
    • Note: Derived versions of an individual series should remain with that series?

All problems solved?

Why are you calling it “skull thickness”?

It’s supposed to be “cranial thickness”!

You have to query on “brain”, not “Purkinje cell”.

But a “Purkinje cell” is part of the “brain”. Shouldn’t the system know that?

Ontologies

For AI systems, what "exists" is that which can be represented. When the knowledge about a domain is represented in a declarative language, the set of objects that can be represented is called the universe of discourse.

We can describe the ontology of a program by defining a set of representational terms. Definitions associate the names of entities in the universe of discourse (e.g. classes, relations, functions or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms.

Formally, an ontology is the statement of a logical theory.

Distribution of Ryanodine receptor in cerebellum?
  • Navigates down domain map
  • Situates result in context of domain map

Brain → (has a) → Cerebellum → (has a) → Purkinje Cell Layer → (has a) → Purkinje cell → (is a) → neuron
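The has-a chain above is what lets a query on "Brain" find data annotated with "Purkinje cell". A minimal sketch of that navigation follows, using plain dictionaries rather than a real ontology language such as F-logic; the `annotations` entries and image names are invented for the example.

```python
# Relationships taken from the chain above.
HAS_A = {
    "Brain": ["Cerebellum"],
    "Cerebellum": ["Purkinje Cell Layer"],
    "Purkinje Cell Layer": ["Purkinje cell"],
}
IS_A = {"Purkinje cell": "neuron"}

# Hypothetical data items, each tagged with one anatomical term.
annotations = {
    "image_017": "Purkinje cell",
    "image_042": "Cerebellum",
    "image_099": "Retina",
}


def parts_of(term):
    """Return `term` plus everything reachable from it via has-a links."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(HAS_A.get(current, []))
    return seen


def query(term):
    """Find items tagged with `term` or with any of its parts."""
    wanted = set(parts_of(term))
    return sorted(item for item, tag in annotations.items() if tag in wanted)


print(query("Brain"))
print(IS_A["Purkinje cell"])
```

Without the has-a links, the same query would match only items tagged literally with "Brain", which is the complaint raised in the "All problems solved?" slide.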

ANATOM Domain Map
  • Rule-based ontology map
  • Encodes conceptual and semantic relationships using F-logic
Scared?

Do:

  • Design a file hierarchy
  • Agree on a “Standard Vocabulary”
  • Add metadata in the right places, and in several places
  • You can always add or change things later; it doesn’t have to be perfect the first time
  • If it’s there, you will use it!
  • What metadata do other people want?
  • Automate the process! (scripts and/or workflows)

Do not:

  • Wait. It’s harder to add metadata after the fact.
  • Do things manually (see #7 above)
  • Attempt an ontology. Professionals are working on them already! (Unless it’s already in your approved grant)
Thanks!

Questions?

www.sdsc.edu/srb

srb@sdsc.edu