PIMS data management and harvesting



  1. PIMS data management and harvesting

  2. General Introduction • Design a LIMS • Protein Production Data Model • What can PIMS do for you?

  3. Information Management System • Information Management System (IMS) is a joint database and information management system • A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data • Information management is the handling of knowledge acquired by many disparate sources in a way that optimizes access by all who have a share in that knowledge

  4. Scientific goals • Recording laboratory information • A lot of data keeping • 10,000s of experiments • 1,000,000s of samples • Data interchange and interoperation • Collaboration in protein production • Share data between stages and sites • Data transfer to beamline or NMR ops • Data mining and reporting • Analysis • Negative results can be mined to improve methods • Scientific publications • Data deposition

  5. PIMS • Protein Information Management System • Started in January 2005 • 5-year UK project, funded by the Biotechnology and Biological Sciences Research Council (BBSRC) • Based on the Protein Production Data Model paper • Proteins. 2005 Feb 1;58(2):278-84. “Design of a data model for developing laboratory information management and analysis systems for protein production.”

  6. Scope of PIMS • Pipeline stages: Target selection, Target optimisation, Cloning, Expression, Purification & Concentration, Crystallisation, Data collection, Phasing, Model building, Refinement • Diagram labels: Bioinformatics import, Molecular Biology, Microcrystals export, Crystallography

  7. BBSRC SPoRT funding • Scottish Structural Proteomics Facility (SSPF): Universities of Dundee, St. Andrews, Glasgow and Warwick • Membrane Protein Structure Initiative (MPSI): Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial College, Birkbeck College, UMIST and CCLRC Daresbury • Protein Information Management System (PIMS): CCP4, Diamond, Oxford Protein Production Facility, IBBMC (University Paris Sud), European Bioinformatics Institute, York Structural Biology Laboratory, Daresbury Laboratory • Stakeholders: other UK protein scientists, other protein scientists worldwide

  8. Collaborations • Seamless data transfer and a consistent UI ... • ... from target to structure deposition • ... so far as possible • Bioinformatics: SSPF pipeline, EBI workflow • Crystallization: NKI, EMBL Hamburg & Grenoble (BIOXHIT) • Data transfer: e-HTPX • Data collection: DNA, X-track • Structure solution: CCP4, CCPN • Instruments: Kendro, CSols

  9. General Introduction • Design a LIMS • Protein Production Data Model • What can PIMS do for you?

  10. Design • The data model • focuses on what data should be stored • is used to design the entities (classes or tables) that we are dealing with, their various attributes, and their relationships • The goal of the data model is to make sure that all the required data objects are completely and accurately represented
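
To make "entities, attributes and relationships" concrete, here is a minimal sketch in Python. The class names, fields and example values are illustrative assumptions and are not taken from the PIMS data model itself; the point is only to show how entities carry attributes and link to each other.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical entity: a physical sample used or made by an experiment."""
    name: str


@dataclass
class Experiment:
    """Hypothetical entity: one laboratory step linking input and output samples."""
    name: str
    inputs: List[Sample] = field(default_factory=list)
    outputs: List[Sample] = field(default_factory=list)


@dataclass
class Target:
    """Hypothetical entity: the protein under investigation and its experiments."""
    name: str
    experiments: List[Experiment] = field(default_factory=list)

    def samples(self) -> List[Sample]:
        """Return every sample produced by any experiment on this target."""
        return [s for e in self.experiments for s in e.outputs]


# Illustrative data only.
target = Target("example-target")
pcr_product = Sample("PCR product")
plasmid = Sample("plasmid")
target.experiments.append(Experiment("PCR", outputs=[pcr_product]))
target.experiments.append(
    Experiment("cloning", inputs=[pcr_product], outputs=[plasmid])
)
```

Because the relationships are explicit, a question such as "which samples exist for this target?" becomes a simple traversal (`target.samples()`) rather than a search through lab notebooks.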

  11. Reliability • Loss of data is inexcusable • Must be able to correct wrong data • Must keep audit trails • Must allow future changes • All made feasible by • Data model • Database • Software engineering standards
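
The two requirements "must be able to correct wrong data" and "must keep audit trails" can be combined in one small pattern. The sketch below is an assumption about how this could look, not how PIMS implements it; `AuditedRecord`, `AuditEntry` and the attribute name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit row: who changed which attribute, when, and how."""
    when: datetime
    user: str
    attribute: str
    old: Any
    new: Any


class AuditedRecord:
    """Hypothetical record whose corrections are always logged, never silent."""

    def __init__(self, **values: Any) -> None:
        self._values: Dict[str, Any] = dict(values)
        self.history: List[AuditEntry] = []

    def get(self, attribute: str) -> Any:
        return self._values[attribute]

    def correct(self, user: str, attribute: str, new: Any) -> None:
        """Replace a value and append the old and new values to the history."""
        old = self._values.get(attribute)
        self.history.append(
            AuditEntry(datetime.now(timezone.utc), user, attribute, old, new)
        )
        self._values[attribute] = new


# Illustrative data only: a mistyped concentration is corrected.
record = AuditedRecord(concentration_mg_per_ml=1.5)
record.correct("alice", "concentration_mg_per_ml", 15.0)
```

The old value is kept in `record.history`, so a correction never destroys information.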

  12. Ancestry • HalX: an open-source LIMS (Laboratory Information Management System) for small- to large-scale laboratories. Acta Crystallogr D Biol Crystallogr. 2005 Jun;61(Pt 6):671-8. Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I, Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L, Sussman JL, Janin J, van Tilbeurgh H, Poupon A. • OPPF based on Nautilus • MOLE: a data management application based on a protein production data model. Proteins. 2005 Feb 1;58(2):285-9. Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.

  13. PIMS • The aim is to provide a Laboratory Information Management System (LIMS) • for laboratories that produce proteins from target genes • that can be incorporated into commercial software in the area of biotech and protein production • Improve the quality of the experimental data deposited into the PDB • by providing software for lab scientists to harvest their daily experimental data from protein production to structure • My roles • Data Model • Database / Persistence layer / Java API • Java Applet development

  14. General Introduction • Design a LIMS • Protein Production Data Model • What can PIMS do for you?

  15. Why is Data Modelling Important? • A Data Model is a plan for building a database • detailed enough to be used to create the physical structure • simple enough to communicate to the end user the data structure • The Unified Modelling Language (UML)

  16. Data Model • Related to protein production & crystallisation • Suitable for large & small facilities • Required to reproduce the samples & experiments involved • Used for tracking samples, experiments & results • Developed to help software developers to collect, store and exchange information through the provision of a common platform

  17. Area covered • Protein production work is generally the investigation of a particular protein, the Target • The work often aims to produce a derivative of the Target, such as a single domain or a complex • Diagram labels: target, protein production, crystallisation, NMR tube, X-Ray, NMR, phasing, structure

  18. The Core Data Model

  19. Change Control Board • The data model is a work in progress • The science is developing too • Local protocols, which are novel and confidential • Not easy work • Thanks to… • Geoff Barton (Dundee) • Steve Prince (Manchester) • Anne Poupon (IBBMC) • Jon Diprose (OPPF) • Alun Ashton (Diamond) • Rasmus Fogh (CCPN)

  20. Implemented in UML (Object Domain) • Developed within a framework provided by the CCPN project (www.ccpn.ac.uk) • Information stored in the UML Data Model is used to automatically generate the SQL schema, Java Application Program Interfaces (APIs) and documentation • Diagram: UML Data Model, passed through the generation machinery framework, yields XML schema, Python API, Doc, SQL schema and Java API
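
The idea of generating artefacts from a single model can be illustrated with a toy generator. This is only a sketch of the general technique: the `MODEL` dictionary is a hand-written, hypothetical stand-in for a model exported from UML, and `generate_schema` is not part of the CCPN generation machinery.

```python
import sqlite3
from typing import Dict, List

# Hypothetical model description: entity name -> {attribute name -> SQL type}.
MODEL: Dict[str, Dict[str, str]] = {
    "target": {"name": "TEXT NOT NULL"},
    "experiment": {
        "name": "TEXT NOT NULL",
        "target_id": "INTEGER REFERENCES target(id)",
    },
}


def generate_schema(model: Dict[str, Dict[str, str]]) -> str:
    """Emit one CREATE TABLE statement per entity in the model."""
    statements: List[str] = []
    for table, attributes in model.items():
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{name} {sql_type}" for name, sql_type in attributes.items()]
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    return "\n".join(statements)


# Check that the generated schema is valid SQL by loading it into SQLite.
schema = generate_schema(MODEL)
connection = sqlite3.connect(":memory:")
connection.executescript(schema)
tables = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
]
```

A change to the model then needs only a regeneration step, which keeps schema, API and documentation from drifting apart.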

  21. Architecture • The API provides methods to access the underlying DB to store and retrieve data • This allows applications to manipulate data without a detailed knowledge of the way in which the data is stored • Various different applications make use of the API • LIMS • Any High Throughput applications (non-GUI) • They are able to exchange data easily • Diagram layers: Tools (GUI, standalone applications, …), Java API, Persistence layer, storage (DB, SQL schema)
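
The layering described here, where tools call an API and only the API knows about storage, can be sketched as follows. PIMS exposes a Java API; this Python version with SQLite is a simplified assumption for illustration, and `SampleStore` with its methods and table layout is hypothetical.

```python
import sqlite3
from typing import List, Optional


class SampleStore:
    """Hypothetical API layer: callers never see SQL or the table layout."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sample ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, experiment TEXT)"
        )

    def add_sample(self, name: str, experiment: Optional[str] = None) -> int:
        """Store a sample and return its database identifier."""
        cursor = self._db.execute(
            "INSERT INTO sample (name, experiment) VALUES (?, ?)",
            (name, experiment),
        )
        self._db.commit()
        return cursor.lastrowid

    def samples_from(self, experiment: str) -> List[str]:
        """Return the names of all samples recorded for one experiment."""
        rows = self._db.execute(
            "SELECT name FROM sample WHERE experiment = ? ORDER BY id",
            (experiment,),
        )
        return [row[0] for row in rows]


# A GUI and a non-GUI high-throughput script could both use the same calls.
store = SampleStore()
store.add_sample("lysate", experiment="expression")
store.add_sample("purified protein", experiment="purification")
```

Swapping SQLite for another database would change only the inside of `SampleStore`, not the applications that call it.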

  22. From data model to application • Data Model • Use cases • Scientific logic into requirements • Specifications • security, performance, usability, etc • Java API • Test data • UI Design • Application

  23. Modular Construction • http://www.pims-lims.org/project/use-case-suite.html • Training & Support • Workflow • Reporting • Visualisation • Data Mining • Scheduling • Data Capture • Mobile Data Collection • Instrument Management • Inventory Management • Sample Management • Bioinformatics • System Administration • Setup & Configuration • Access Rights Management • Project Management • Reference Data

  24. Reference data • Supplier details • Protocols • documenting set of editable default protocols • user interface design with Ed Daniel • Reagents • protocol-related reference samples • chemical hazard information • e.g. R and S-phrases • documenting lab chemicals as ‘MolComponents’ • includes synonyms, formula, CAS-number and mass • naming system under discussion with NKI • ~400 identified, ~180 based on crystallisation screens
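
A reference-data entry with synonyms, formula, CAS number and mass might be represented as below. The field names and the `ReferenceData` lookup class are assumptions made for this sketch and do not reproduce the actual 'MolComponent' definition; the sodium chloride values are standard reference values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MolComponent:
    """Hypothetical reference entry for one lab chemical."""
    name: str
    synonyms: Tuple[str, ...]
    formula: str
    cas_number: str
    mass: float  # molar mass in g/mol


class ReferenceData:
    """Hypothetical lookup table that resolves any name or synonym."""

    def __init__(self) -> None:
        self._by_label: Dict[str, MolComponent] = {}

    def add(self, component: MolComponent) -> None:
        """Index the component under its name and every synonym."""
        for label in (component.name, *component.synonyms):
            self._by_label[label.lower()] = component

    def find(self, label: str) -> Optional[MolComponent]:
        """Case-insensitive lookup; returns None for unknown labels."""
        return self._by_label.get(label.lower())


reference = ReferenceData()
reference.add(
    MolComponent("sodium chloride", ("NaCl", "salt"), "NaCl", "7647-14-5", 58.44)
)
```

Indexing by synonym means that "NaCl" and "sodium chloride" in two different screens resolve to the same chemical, which is what an agreed naming system is meant to guarantee.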

  25. Instrument management • Analytical Data: A Tower of Babel (LC, MS, IR, NMR) • Integration • CSols produces a widely used Instrument Integration Package • If the PIMS I/O is implemented in a reasonable timescale, CSols may develop a PIMS Driver • Kendro/Thermo

  26. General Introduction • Design a LIMS • Protein Production Data Model • What can PIMS do for you?

  27. What can PIMS do for you? • Not a lot right now • Whatever you want, eventually ... • ... as long as it's data management for protein production

  28. Version 0.2 • October 2005 • Then incremental delivery • … for one customer at a time and integrate with trunk • … and repeat until project complete

  29. Protocol Editor

  30. Applet Protocol Editor • Choose a step from a list • Draw a temperature step • List the protocol steps already defined at the bottom of the screen and reload them from there • Record the protocol in the DB • Display the list of protocols from the DB in the explorer and reload any one of them

  31. Applet Workflow • Select the experiment categories from the tabs • Drag and drop the selected experiments • Build a workflow or load an existing one • Associate a protocol with an experiment

  32. A collaborative framework • … to develop a family of LIMSes • Developers have difficulty in justifying the time required to create the software needed • The biologist doesn't want to wait • The result is a rapidly written LIMS that is fragile and cannot scale as the project grows • Need a generic LIMS • which helps to solve these problems by giving developers a tool that can scale to meet the needs of a large project • and which welcomes plugins for novel methods

  33. Conclusion • Each “Click” could be a lot of coding ... • What do molecular biologists really want? • Expectations are High! • Users make an indispensable contribution • Tell us when it's not good enough ... • ... we will respond

  34. Acknowledgements • PIMS developer group: Chris Morris (CCP4), Anne Pajon (EBI), Ed Daniel (Daresbury), Peter Troshin (MPSI), Jo van Niekerk (SSPF), Susy Griffiths (YSBL), Jon Diprose (OPPF), Katherine Pilicheva (OPPF), Anne Poupon (IBBMC), Eric Oeuillet (IBBMC), Sabrina Haquin (IBBMC), Alun Ashton (Diamond) • EBI-MSD: Kim Henrick, Wim Vranken, John Ionides • CCPN: Wayne Boucher, Rasmus Fogh, Tim Stevens, Dan
