1. What is a LIMS?
LIMS – Laboratory Information Management System
Computerized system that tracks and manages samples through a protocol
interfaces for both laboratory personnel and instruments
helps support high throughput operations
A Laboratory Information Management System (LIMS) is a system designed specifically for a particular laboratory. This might include research and development labs, testing labs, quality assurance labs, and more. Typically, a LIMS connects the analytical instruments in the lab to workstations or personal computers. These instruments, for example chromatographs, are used to collect data. An instrument interface forwards the data from the hardware to the PC, where the data is organized into meaningful information. This information is further sorted and organized into various report formats. A full-featured LIMS will manage the lab data at every step of the protocol.
(Next slide).
<What is high throughput? What is small scale?>
2. Types of LIMS
Enterprise
cover all aspects of scientific research
data capture
reagent use and purchasing tracking
Protocol-specific
cover a specific protocol
data capture
5. Microarrays
large-scale sequencing projects like the Human Genome Project have given us the ability to examine the complete transcriptome (the transcriptional response to an environmental challenge)
new (and expensive) technology
large output of data
6. Microarray Data
produced in a tabular format (rows and columns)
users are relatively unsophisticated in computational and informatics skills
much data ends up in spreadsheets which lack the capability to handle rich datasets (no complex query or visualization capabilities)
7. Microarray Databases
plethora of databases and schemas
three types of interactions:
local data management
publication of data in a repository
analysis of repository data
the latter two interactions require a certain level of sophistication to consolidate exogenous data
8. Microarrays: Concept
9. Microarrays: Raw Data
10. Microarrays: Data
11. Local Databases
make data available to local researchers
may have WWW-based tools
database and compute server centralized and closely linked
12. GeneX
National Center for Genome Resources
www.ncgr.org/research/genex
relational database with Perl, R, and Java components
13. GeneX Features
Free
integrated and extensible toolset
multiple types of array technology in single database
experiment-centric design
supports an XML specification to allow interchange between databases
14. BASE
BioArray Software Environment
http://base.thep.lu.se/
Relational database (MySQL) with WWW interface built upon C++/JavaScript/PHP
15. BASE Features
Free
MIAME compliant
user administration
array production
sample management
17. Repositories
provide public access to multiple datasets
create standard databases similar to those for sequence data
automatic deposition of data upon publication
18. Stanford Microarray Database
genome-www4.stanford.edu/MicroArray
WWW-based database and a dataset distribution system
relational database
Perl/Java toolset
supports some complex querying as well as browsing for datasets
datasets distributed as compressed flat-files and/or graphical images
19. GEO
Gene Expression Omnibus
www.ncbi.nlm.nih.gov/geo/
data repository and distribution system
precomputed definitions and descriptions of data to aid in data set retrieval
20. Data Interchange
Proposed interchange standard
MIAME
Proposed OMG exchange standards
MAML
GEML
NetGenics
21. MIAME
Minimal Information About a Microarray Experiment
www.mged.org/Annotations-wg/
Goal
specify the minimum amount of information needed to ensure interpretability
facilitate creation of repositories
encourage journals and funding agencies to require submission of data to repositories
22. Design Considerations
reflect data accurately
efficient access to data
efficient storage of data
compatibility with other databases
23. Data Representation
24. MIAME Considerations
Experimental design: the set of hybridization experiments as a whole
Array design: each array used and each element (spot) on the array
Samples: samples used, extract preparation and labeling
Hybridizations: procedures and parameters
Measurements: images, quantitation, specifications
Normalization controls: types, values, specifications
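The six MIAME sections listed above can be treated as a completeness checklist for an experiment record. The following is a minimal sketch only: the section keys and the `missing_miame_sections` helper are hypothetical names chosen for illustration, not part of the MIAME specification or of any tool mentioned in these slides.

```python
# Hypothetical short keys for the six MIAME sections named on the slide.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization_controls",
)


def missing_miame_sections(record):
    """Return the MIAME sections that are absent or empty in a record."""
    return [section for section in MIAME_SECTIONS if not record.get(section)]


# Illustrative record: invented values, with three sections left out.
example_record = {
    "experimental_design": "disease vs. normal comparison",
    "array_design": "in-house cDNA array",
    "samples": "",  # present but empty, so it counts as missing
    "hybridizations": "protocol and parameters",
}

missing = missing_miame_sections(example_record)
```

A repository or journal submission check could reject a record while `missing` is non-empty; what counts as sufficient detail inside each section is outside the scope of this sketch.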
26. Background
Center for Biomedical Genomics and Informatics
Engaged in a number of gene expression studies covering liver disease, osteoarthritis, and cancer
Species studied: human and rat
cDNA slides printed in-house (5K human chip, 40K human chip)
27. GMU Clinical Genomics
studying the relationship between disease and genome expression
clinical measurements
standard battery of tests
genomic measurements
gene expression levels
genetic variation
derive correlation between clinical/genomic factors and treatment outcome
29. Dataflow
30. Generic difference in gene expression patterns
We do this via visual inspection following clustering (genes and samples)
Often we will reduce the number of genes by some criterion (e.g., cluster only on genes that show at least a 2-fold expression change in at least one sample/category)
Often we will group the samples by condition in order to compensate for the lack of replicates
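The two reduction steps described above (filtering genes by a fold-change criterion and grouping samples by condition) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: it assumes expression values are log2 ratios, treats both up- and down-regulation as a change, and uses invented gene names and values.

```python
import math
from statistics import mean


def filter_by_fold_change(log_ratios, min_fold=2.0):
    """Keep genes that change at least min_fold in at least one sample."""
    threshold = math.log2(min_fold)  # 2-fold corresponds to |log2 ratio| >= 1
    return {
        gene: values
        for gene, values in log_ratios.items()
        if any(abs(v) >= threshold for v in values)
    }


def group_by_condition(log_ratios, sample_conditions):
    """Average each gene's values over the samples sharing a condition."""
    conditions = sorted(set(sample_conditions))
    grouped = {
        gene: [
            mean(v for v, c in zip(values, sample_conditions) if c == cond)
            for cond in conditions
        ]
        for gene, values in log_ratios.items()
    }
    return conditions, grouped


# Invented example data: one log2 ratio per sample, in sample order.
sample_conditions = ["normal", "normal", "disease", "disease"]
log_ratios = {
    "geneA": [0.1, -0.2, 1.3, 0.9],
    "geneB": [0.2, 0.1, -0.3, 0.4],
    "geneC": [-1.5, -1.1, 0.0, 0.2],
}

kept = filter_by_fold_change(log_ratios)  # geneB never reaches 2-fold
conditions, grouped = group_by_condition(kept, sample_conditions)
```

The filtered, condition-averaged matrix in `grouped` is what would then be passed to a clustering routine; the clustering step itself is omitted here.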
31. Clustering of genes and samples
32. Disease vs. Normal
Normal vs. disease: 9 genes
33. Clinical Data Challenges
Collection
text formats
dispersed sources
Storage
heterogeneous
incomplete
degenerate
Protection
HIPAA regulations
34. Large Clinical Databases
Nadkarni and Brandt (1998) JAMIA 5, 511
Issues involved in data mining EAV databases
Nadkarni et al. (1999) JAMIA 6, 478
Extension of EAV with classes and relationships
Chen et al. (2000) JAMIA 7, 475
Performance of EAV/CR
35. Issues with Clinical Data
Too many columns
Over 43,000 attributes
Sybase capacity
1024 columns per table
32 indexed
up to 50 tables per query
Sparse data
Multiple entries
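A quick calculation shows why a conventional one-column-per-attribute layout is awkward under the limits listed above. The figures are taken from the slide; the calculation uses exactly 43,000 attributes as a lower bound, since the slide says "over 43,000".

```python
import math

attributes = 43_000            # lower bound; the slide says "over 43,000"
max_columns_per_table = 1024   # Sybase limit quoted on the slide
max_tables_per_query = 50      # Sybase limit quoted on the slide

# Minimum number of tables needed to give every attribute its own column.
tables_needed = math.ceil(attributes / max_columns_per_table)

# A query spanning all attributes would have to join this many tables.
fits_in_one_query = tables_needed <= max_tables_per_query
```

With these numbers `tables_needed` is 42, so a query touching every table would already be close to the 50-table join limit, and most cells in those tables would be empty because the data is sparse.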
36. Sample Clinical Table
37. Solution: EAV
Entity-Attribute-Value
form of row modeling
turns columns into rows
eliminates sparse data
reduction in database size
Faster single value queries
Pushes depth rather than width
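To make "turns columns into rows" concrete, here is a minimal sketch that converts a sparse, wide patient table into Entity-Attribute-Value triples. The column names (`patient_id`, `alt`, `ast`, `albumin`) and values are invented for illustration, and `to_eav` is a hypothetical helper rather than part of any system described in these slides.

```python
def to_eav(rows, id_column):
    """Convert wide rows into (entity, attribute, value) triples.

    Empty (None) cells are skipped, which is how the EAV layout
    avoids storing sparse data.
    """
    triples = []
    for row in rows:
        entity = row[id_column]
        for attribute, value in row.items():
            if attribute == id_column or value is None:
                continue
            triples.append((entity, attribute, value))
    return triples


# Invented wide table: one column per test, many cells empty.
wide_rows = [
    {"patient_id": 1, "alt": 42, "ast": None, "albumin": 3.9},
    {"patient_id": 2, "alt": None, "ast": 55, "albumin": None},
]

eav_rows = to_eav(wide_rows, "patient_id")
# eav_rows == [(1, "alt", 42), (1, "albumin", 3.9), (2, "ast", 55)]
```

Six data cells in the wide table become three rows in the EAV form: the table grows in depth (one row per recorded value) rather than in width (one column per possible attribute).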
38. EAV Clinical Table
39. Accessing Single Attributes
40. Limitations for Data Mining
Complex Boolean queries are tough
no set operations
Complex SQL
nested subqueries
self-joins
Performance
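The following sketch illustrates why Boolean queries across attributes need self-joins in an EAV table. It uses Python's built-in `sqlite3` module with an in-memory database purely for demonstration; the table name `eav`, the attribute names, and the values are invented, and the original work used a different database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value REAL)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "alt", 80.0), (1, "albumin", 3.0),
        (2, "alt", 90.0), (2, "albumin", 4.2),
        (3, "alt", 30.0), (3, "albumin", 2.9),
    ],
)

# A condition on one attribute needs only a simple filter.
single = conn.execute(
    "SELECT entity FROM eav "
    "WHERE attribute = 'alt' AND value > 60 ORDER BY entity"
).fetchall()

# "alt > 60 AND albumin < 3.5" refers to two different rows per patient,
# so the table must be joined to itself, once per attribute in the condition.
both = conn.execute(
    """
    SELECT a.entity
    FROM eav AS a
    JOIN eav AS b ON a.entity = b.entity
    WHERE a.attribute = 'alt' AND a.value > 60
      AND b.attribute = 'albumin' AND b.value < 3.5
    ORDER BY a.entity
    """
).fetchall()

conn.close()
```

In a conventional wide table the second query would be a single `WHERE alt > 60 AND albumin < 3.5`; in EAV every additional attribute in the condition adds another self-join, which is where both the SQL complexity and the performance cost come from.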
41. Ad Hoc Query Interface
Presents a user interface which generates the required complex SQL queries
42. EAV/CR
Simulation of a complex logical schema using an extensive yet simple physical schema
Addition of object tables to contain like attributes
strong data typing
Creates metadata about objects to help describe the relationships between data objects
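A loose sketch of the ideas above: metadata declares which class each attribute belongs to and what type its values must have, and values are stored separately per data type. This is an in-memory illustration only, not the EAV/CR schema from the cited papers; `AttributeMeta`, `EavCrStore`, and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMeta:
    """Metadata describing one attribute."""
    name: str
    datatype: type       # strong typing: each attribute declares its value type
    object_class: str    # the class (object table) the attribute belongs to


@dataclass
class EavCrStore:
    attributes: dict = field(default_factory=dict)  # name -> AttributeMeta
    values: dict = field(default_factory=dict)      # type name -> list of triples

    def define(self, name, datatype, object_class):
        self.attributes[name] = AttributeMeta(name, datatype, object_class)

    def add(self, entity, name, value):
        meta = self.attributes[name]  # KeyError if the attribute is undeclared
        if not isinstance(value, meta.datatype):
            raise TypeError(f"{name} expects {meta.datatype.__name__}")
        # One store per data type, so values keep their native type.
        self.values.setdefault(meta.datatype.__name__, []).append(
            (entity, name, value)
        )


store = EavCrStore()
store.define("organism", str, "culture")
store.define("colony_count", int, "culture")
store.add(1, "organism", "E. coli")
store.add(1, "colony_count", 100000)
```

Calling `store.add(1, "colony_count", "many")` raises `TypeError`, which is the kind of check a plain EAV table with a single untyped value column cannot make.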
45. Testing EAV/CR
Data sources
used microbiology data from VA patients
extracted from existing DB
loaded in EAV/CR schema
scaled by replicating data with new IDs
Benchmarking
two attribute-centered queries
two entity-centered queries
48. Results
Comparable speeds for entity queries
massive hit for attribute query
up to 10-fold worse
"ancestor" improvement
represents denormalization
space for performance trade-off
49. EAV for Clinical Genomics?
performance issues are a problem
data mining on attributes
I/O issues
full EAV not feasible
partial row modeling a good option
50. Clinical Database
Used CGO database out of Univ of Arkansas as a template
Myeloma database
Want to generalize it for any cancer
52. Altering CGO
remove gene chip references
Affymetrix
MIAME/MAGE non-compliant
attach to GeneX
generalize clinical system
row model test results
row model questionnaires
54. HIPAA
Health Insurance Portability and Accountability Act
“ensure the integrity and confidentiality of [patient] information, protect against reasonably anticipated threats or hazards to the security or integrity of the information or unauthorized uses or disclosures of the information”
55. Clinical Data Flow