1 / 75

Cornell University Bioinformatics Facility

Development of a cross institutional database and data management system for research, clinical management and quality indicators in GI disease Jaroslaw Pillardy Bioinformatics Facility Cornell University Andrew Talal Division of Gastroenterology, Hepatology , and Nutrition

danyl
Download Presentation

Cornell University Bioinformatics Facility

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Development of a cross institutional database and data management system for research, clinical management and quality indicators in GI disease Jaroslaw Pillardy Bioinformatics Facility Cornell University Andrew Talal Division of Gastroenterology, Hepatology, and Nutrition University at Buffalo, State University of New York Center for the Study of Hepatitis C Weill Cornell Medical College

  2. Cornell University • Bioinformatics Facility Provides bioinformatics state-of-the-art computational resources and analysis tools, and expertise in their applications, to the university community and to outside investigators. • Research collaboration • Access to bioinformatics computational resources • Software, database and website development • Project design and data analysis consultation • Bioinformatics workshops and training

  3. Development of a cross institutional database and data management system for research, clinical management and quality indicators in GI disease. The database and data management system was developed initially for Andrew Talal Lab in Weill Cornell Medical College. The system (Datamart) integrated tissue bank data and clinical data. The development started in 2008.

  4. Presentation Overview • Overview of the Datamart • Current limitations and future development • Applications

  5. Overview of the Datamart

  6. What is Datamart? • Information Management System (IMS) with • Flexible user-defined data structure • Integration of existing data from multiple sources • Management of Lab data (e.g. tissue bank, sequencing) • Web browser access • Point-and click interface for all interface functions • Data export to file or database • Text data mining (in progress)

  7. Datamart components: server Three parts: Web interface ASP.NET C# application. Requires Windows Server. Database MS SQL database. Requires MS SQL Server.Hep C database with 35,299 patients and 33,164,068 labs required 477 GB space (as of 9/3/2013). IRB approval has been obtained to have data on patients examined by WCMC GI division providers.Data from all patients seen by GI providers at Weill Cornell are eligible for inclusion in Datamart.

  8. Sample Repository 3. Repository 2,500 patients with 5,890 blood, liver, plasma, serum, PBMC, and buffy coat samples. We have two separate IRB protocols for samples in which we prospectively consent patients for either blood or liver with or without blood collection. The second consent is given to patients undergoing liver biopsy.

  9. Datamart components: client • Four major web browsers are supported: • Internet Explorer (7 or newer) • Firefox (3 or newer) • Safari (5 or newer) • Mozilla API based (1 or newer) • Most mobile browsers work, we may develop a mobile application to improve mobile content browsing if there is demand • The interface refuses to work with incompatible browsers

  10. How is Datamart different? Meta-IMS: Users define how data is stored and what are relations between forms Various data sources: Integrates with hospital clinical databases Internet access: Accessible from any device with web browser Data mining: Convert text based records into numeric records Using previous data: Easy import of previous lab data from Access or Excel repositories

  11. Data Structure defined by the user In a classical system any change in data structure requires programmatic changes in database and interface. In the Datamart, the data layout is defined by authorized users via web interface. Data is stored in forms and fields. Each field may be unrestricted data of a given type (txt, int, dec, etc.) or restricted to a list of values (pulldown).

  12. Data Structure defined by the user.

  13. Data Structure defined by the user.

  14. Database Layout defined by the user

  15. Accessing Data Browsing data with filters Data can be traversed using browsing form, with filters including all form fields. It is also possible display patient information if needed. Advanced search Custom queries, build online using web interface.

  16. Browsing data with filters Data can be traversed using browsing form, with filters including all form fields. It is also possible display patient information if needed.

  17. Browsing data with filters Data can be traversed using browsing form, with filters including all form fields. It is also possible display patient information if needed.

  18. Advanced search Queries can be built using graphical interface, by choosing forms, fields, filters. Queries can be stored in the server for other users. Resulting data can be exported to a file or a database table.

  19. Datamart advanced search

  20. Entering Data Manually Data can be entered or modified manually if needed. Each record can be edited, linked forms can be edited together. An example is data from the sample repository in which data on liver or blood samples need to be entered. Data import Preferred way of entering data is data import.

  21. Manually entering data

  22. Manually entering data

  23. Importing clinical data from hospital database (e.g. EPIC) Data can be imported from various database sources via database queries. Any data source that can be queried in SQL can be imported. Imported data are matched to the existing data and the format converted as appropriate. External data are then added or internal data is updated.

  24. Importing clinical data from hospital database (e.g. EPIC) Data matching is done using multiple fields (e.g. for patients “last name”, “DOB”, “PAT_ID”, “MRN”, “IDX” etc). Records with inconsistent match or otherwise inconsistent data are reported to the operator for manual inspection. Import can be stopped and restarted at any time. If an external source does not support incremental export, dump of the previous version of the export can be used to limit the amount of data to process.

  25. Importing clinical data from hospital database (e.g. EPIC) Rules for data matching and conversion are defined by an authorized user via graphical interface. Consistency check is performed during import, any data failing consistency check can be ignored or reported as inconsistent for inspection and correction.

  26. Importing data from other repositories Data collected in other media such as a text file, Access database or Excel spreadsheet can also be imported. This type of import is not automated and requires programming help. MS Access file can be configured as SQL query compatible resource by a server administrator, and then imported the same way as any other external source. Datamart has been initialized with data from Access database which was set up as SQL queryableresource.

  27. Importing data from other repositories Data collected in other media such as a text file, Access database or Excel spreadsheet can also be imported. This type of import is not automated and requires programming help. Excel file can be imported directly using a customized C# script written for this particular file, or the data can be imported into SQL database or Access database and then imported using the standard system

  28. Exporting data to text or database • Data can be exported from data browsing or a query. • Data can be exported to • tab separated text file, which can be further converted into any appropriate format (e.g. SAS) or opened in Excel. • database table for further processing • This type of data export can be very useful for clinical trials or transferring data between labs. Data can be further processed.

  29. Exporting data to text or database

  30. Exporting data to text or database

  31. Datamart security The IMS is protected with passwords and encryption / decryption keys. Every user is assigned to a role that defines the access and modifications rights. Sensitive data (names, DOB, MRN) are encrypted and can be only accessed by users with encryption / decryption keys. Not even database administrators have access to these data.

  32. Datamart security: no key login Multiple slides

  33. Datamart security The users’ activity in the LIMS is logged, including logins, logouts, data export and import. No data can be physically removed from the database – new versions are just visible in front of the old. The IMS has auditing and data recovery procedures. Any data record can examined using auditing page, where all the previous versions can be inspected, and the changes reverted if necessary.

  34. Datamart security: auditing previous record versions

  35. Datamart security: auditing previous record versions

  36. Datamart security • Client related security • Encrypted communication enforced • Only selected browser versions are accepted • Page will log out automatically if not used • Client browser caching is switched off • User cannot use “back” function from the browser

  37. Datamart security • Users • Access to data can be controlled on multiple levels • allow or deny access to encrypted data • allow or deny access to particular forms • allow or deny access to editing functions • allow or deny access to export/import functions • Users can be assigned group rights based on projects.

  38. Datamart data mining Many records from hospital databases contain useful data in free text format This data can be extracted and imported as numerical or categorized data points to other forms This feature is under development.

  39. Datamart data mining First, data is imported as is and stored in the database as free text. Second, data mining procedure is run (via web interface). Data mining procedure finds key phrases and words and extracts data in question (e.g. liver cirrhosis stage). The extracted data is converted into appropriate numerical or dictionary format and stored in another form, or another field in the current (mined) form. This way both formats are preserved: the original free text and extracted data. Data mining may be re-run in the future when improved data mining procedures are available to update extracted data.

  40. Current limitations and future development

  41. Optimization • The query design module needs to be further expanded to include more complicated data searching capabilities, including text search inside pre-defined entries and more flexible form joining. • The interaction with the database should be optimized taking into account database size, a balance between different types of data, which will result in significant speed increase.

  42. Data import • The Datamart is designed to import data form one external sourceat a time. • The import module itself is very flexible and can accept a variety of formats and data source types (any data source supporting SQL queries is compatible), but only one source at a time may be configured and used. • This feature is going to be changed to multiple simultaneous sources.

More Related