The CDW Data Lifecycle - Internals, Data Flows, and Business Intelligence (Closet Skeletons Version) Richard Pham Enterprise Architect OI&T Corporate Data Warehouse – Architecture Richard.Pham@va.gov
CDW Informatics and Analytics Ecosystem • Regions: REGION 1, REGION 2, REGION 3, REGION 4 • Hardware Stats • 411 Servers • 4 PB Storage • 54 Racks • CDW – Corporate Data Warehouse • RDW – Regional Data Warehouse
Some Things Never Change • VHA and OI&T have a tense/unhappy relationship • OI&T project management bureaucracy is onerous • The use and oversight of contractors is problematic • Pharmacy knows what they are doing (more so than OI&T)
There were problems… • How do I maintain each file? • If I change one file, what happens to the other files? • How do I control growth of the files?
And there were more problems… • How can the databases share common elements like patient? • What if some idiot changes one table structure that collapses everything else? • Who remembers how this database was designed?
This is only two packages, think of the 100+ that are in VistA • Now, try extrapolating those trends in your head • Have a picture in your mind?
Even more problems…. • Is my data timely (Extract to production system time lag)? • Are the extracts one-time? Are they repeatable? • Who manages all these extracts? • No seriously, this becomes a really ugly problem
Why Am I Giving This Presentation? • Quite simply, feedback on: • “I don’t understand what you mean when you say ‘File’ or ‘Pointer.’” • “Where does the data come from?” • “How does the data get to CDW?” • Also, while you are using the CDW to prepare your work, it really helps to know where the data comes from…
DHCP/VistA/CPRS/HealthEVet • VistA – Veterans Health Information Systems and Technology Architecture – 2nd Generation Architecture. Refers both to the architecture and the database which the architecture supports • DHCP – Decentralized Hospital Computer Program – The DOS (Unix-like) system where many of VistA’s non-clinical entries take place • CPRS – Computerized Patient Record System – A user-friendly GUI providing access to clinical order entry functions • HealthEVet – 3rd Generation of VA’s EMR. Planned inclusions are patient-facing applications, better alignment with coding standards, and MDS compliance.
Objectives • The main objective is to understand the data lifecycle of VA’s VistA/CPRS and the user experience of VistA/CPRS • A high-level overview of VistA Internals • Learn about data structures and outputs in VistA • Learn where data enters and travels throughout the VA • Try to make sense of data resources within the VA and how they are accessed
Core Patient Care Functionality • VistA is first and foremost an Electronic Medical Record. The architecture design supports veteran health care.
Core Patient Care Functionality • VistA Internals • DHCP • CPRS
VistA Internals 101 • MUMPS • Server and Operating System • Kernel • “Three Wise Men (Managers)” • TaskMan • MailMan • FileMan • Modules
Massachusetts General Hospital Utility Multi-Programming System (MUMPS or M) • My definition in English • M is a programming language designed for hierarchical databases that is convenient for medical applications or anything else where speed and data storage upkeep are a problem and programmer intelligence/organization is not • My technical definition • M is a Turing-complete, low and high-level, imperative, machine-compiled (no longer interpreted) programming language utilizing a hierarchical global array file structure • Used commonly in healthcare and financial industry settings
Structure of The Veterans Administration Data Efforts (Late 1970s) • VHA Ancestor – Department of Medicine and Surgery (DMAS) • OI&T Ancestor – Office of Data Management & Telecommunications (ODM&T) • VHA-OI Ancestor – Computer Assisted System Staff (CASS)
Comparing The Two Offices • CASS: • Decentralized design philosophy • Rapid, agile development • SME-involved development • ODM&T: • Centralized design philosophy • Bureaucratic, process-focused development • Development without SMEs
Highlights of ODM&T Development • Took 6 years to deploy APPLES Pharmacy at 10 sites • A 1980 paper detailing ODM&T’s transactional patient treatment file (PTF) system promised an interactive national solution by 1990. • Navigating the mandated 17 steps between system specification and deployment alone is said to have required at least 3 years.
Beginnings of DHCP • Some subject matter experts believed that they could put out useful applications faster than the ODM&T sloth • The principles were developed and tested unofficially from the early to the late 1970s
Original DHCP Design Principles • A commitment to rapid prototype development • All code uses ANSI MUMPS • Modular Design • Actively Maintained Data Dictionary • Code Sharing/Portability • Involve the SMEs
DHCP Kernel • Functions as both an operating system for VistA applications and an M virtual machine • Kernel shields DHCP modules from needing to know hardware and OS configurations on the server • Isolates M to the ANSI standard (1995) • Provides a toolbox of standard functions for most programmers
MUMPS Classic Database • One Data Type • String (Text) • Other types • Cardinal Numbers • Float Numbers • $H Dates • One Data Storage Type • Multidimensional Array aka Globals • Dynamic (duck) typing
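A rough way to picture the slide above, sketched in Python rather than M. This is a toy model under stated assumptions, not how any M implementation stores data: the `Global` class, the `demo` global, and its subscripts are all hypothetical. The one hard fact used is the $H (`$HOROLOG`) format: "days,seconds", where the day count starts from 31 December 1840 and the seconds count from midnight.

```python
from datetime import datetime, timedelta

# $HOROLOG day 0 is 31 December 1840, so day 1 is 1 January 1841.
HOROLOG_EPOCH = datetime(1840, 12, 31)


def horolog_to_datetime(h: str) -> datetime:
    """Convert a $H string of the form 'days,seconds' to a datetime."""
    days, seconds = (int(part) for part in h.split(","))
    return HOROLOG_EPOCH + timedelta(days=days, seconds=seconds)


class Global:
    """Toy stand-in for an M global: a sparse array keyed by subscripts.

    Every value is stored as a string, mirroring the single data type
    on the slide. Unset nodes read back as the empty string.
    """

    def __init__(self):
        self._nodes = {}

    def set(self, *subscripts_and_value):
        *subscripts, value = subscripts_and_value
        self._nodes[tuple(subscripts)] = str(value)

    def get(self, *subscripts):
        return self._nodes.get(tuple(subscripts), "")


# Hypothetical global and subscripts, for illustration only.
demo = Global()
demo.set("PATIENT", 1, "NAME", "PHAM,RICHARD")
demo.set("PATIENT", 1, "WEIGHT", 180)  # stored as the string "180"

print(repr(demo.get("PATIENT", 1, "WEIGHT")))
print(horolog_to_datetime("58074,0"))
```

Only the nodes that were set exist, which is what "sparse" means here: there is no fixed row or column layout to maintain.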
VistA Data Organization • Namespace – 654 (VAMC Reno) • File – File 120.5 (GMR Vitals) • Field – Field 0.01 (DATE/TIME VITALS TAKEN) • Record – IEN-1, BP, 140/90 • Most Files have an entry at the 0.001 Field called “IEN” or “Internal Entry Number” as an identity key to mark the record as unique
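The Namespace, File, Record, Field hierarchy can be sketched as nested lookups. The layout below is a hypothetical in-memory model in Python, not VistA's actual storage; the `vista` dictionary, the `lookup` helper, the field labels, and the timestamp are made up for illustration. Only the station, file number, and BP reading come from the slide.

```python
# Hypothetical layout: namespace -> file -> IEN -> field label -> value.
vista = {
    "654": {                      # namespace: VAMC Reno
        "120.5": {                # file: GMR Vitals
            1: {                  # record, identified by its IEN
                "DATE/TIME VITALS TAKEN": "2013-05-03 08:00",  # made-up value
                "VITAL TYPE": "BP",
                "RATE": "140/90",
            },
        },
    },
}


def lookup(db, namespace, file_number, ien, field):
    """Walk the hierarchy one level at a time and return a single entry."""
    return db[namespace][file_number][ien][field]


print(lookup(vista, "654", "120.5", 1, "RATE"))
```

The point of the sketch is that an entry is only meaningful with its full address: the same IEN in a different file or a different station is a different record.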
From The Beginning - Entry • An entry is a “piece” of data • Richard – First Name • Pham – Last Name • 05/03/1983 – Date of Birth
Record (Row) • A group of related data • Richard Pham • M • 05/03/1983
Field • A named slot that holds the same kind of entry in every record • Richard – First Name • Pham – Last Name • 05/03/1983 – Date of Birth
File • A group of related fields and the records that we have • File 200 – NEW PERSON • Richard – First Name – 200 • Pham – Last Name – 200 • Date of Birth -
File Relationships • One-to-One – Pointer • One-to-Many (Subfile, Multiple) • Self-referential (Recursive) • Reverse Recursive (Past Records) • Forward Recursive (Replace Records) • Pointer with Logic – Variable Pointer
File Relationships - Pointers • When a field in one file refers to a record in another file, this is called a pointer • There are four major types • Pointer – One record in one file matches to one record in another file • Self-Referential – One record in one file matches to one record in the same file (in the past or the future) • Multiple – One record in one file matches to many records in one file (parent-child) • Variable – One record, plus some logic, matches to a record in one of several possible files
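A minimal Python sketch of how a plain pointer and a variable pointer might be resolved. Everything here is an assumption made for illustration: the records, the IENs, the `files` registry, and especially the `(file_number, ien)` tuple used for the variable pointer, which is a simplification and not the format FileMan stores internally.

```python
# Hypothetical records, keyed by IEN.
patients = {7: {"NAME": "PHAM,RICHARD"}}
drugs = {30: {"NAME": "ASPIRIN"}}
ingredients = {12: {"NAME": "PENICILLIN"}}

# The PATIENT field stores only an IEN; the target file is fixed by design.
prescriptions = {101: {"DRUG": 30, "PATIENT": 7}}

# Registry so a variable pointer can name its own target file.
files = {"2": patients, "50": drugs, "50.416": ingredients}


def resolve_pointer(target_file, ien):
    """Plain pointer: the target file is known in advance, only the IEN varies."""
    return target_file.get(ien)


def resolve_variable_pointer(file_registry, value):
    """Variable pointer: the stored value says which file to follow.

    `value` is a (file_number, ien) tuple in this sketch.
    """
    file_number, ien = value
    return file_registry[file_number].get(ien)


rx = prescriptions[101]
print(resolve_pointer(patients, rx["PATIENT"]))
print(resolve_variable_pointer(files, ("50.416", 12)))
print(resolve_variable_pointer(files, ("50", 30)))
```

The design difference is where the "which file?" decision lives: in the data dictionary for a plain pointer, in each stored value for a variable pointer.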
Pointers • File 52 PRESCRIPTION, Field 2 Patient → File 2 PATIENT, all fields (one-to-one)
Self-Referential Pointer • Warning – DO NOT $o these fields without programmer assistance! You will bring down DHCP this way!!! • Present-to-Past: File 100 OE/RR, Field 9 Replaced Order → File 100 OE/RR (Past Order) • Present-to-Future: File 100 OE/RR, Field 9.1 Replaced Order → File 100 OE/RR (Future Order)
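One way to picture a self-referential pointer is as a linked chain inside a single file. The Python sketch below walks such a chain with a cycle guard and a hop limit. It is an illustration only: the `orders` data, the `PAST` and `FUTURE` field labels, and `follow_chain` are hypothetical, and the sketch does not claim to reproduce the failure the warning describes. It just shows why an unguarded walk over records that point back into their own file can run away.

```python
def follow_chain(records, start_ien, field, max_hops=1000):
    """Follow a self-referential pointer field from record to record.

    Stops when the field is empty. Raises ValueError on a cycle or when
    the chain exceeds max_hops, instead of looping forever.
    """
    chain = []
    seen = set()
    ien = start_ien
    while ien is not None:
        if ien in seen:
            raise ValueError(f"cycle detected at IEN {ien}")
        if len(chain) >= max_hops:
            raise ValueError("chain longer than max_hops")
        seen.add(ien)
        chain.append(ien)
        ien = records.get(ien, {}).get(field)
    return chain


# Hypothetical orders: 1 was replaced by 2, which was replaced by 3.
orders = {
    1: {"PAST": None, "FUTURE": 2},
    2: {"PAST": 1, "FUTURE": 3},
    3: {"PAST": 2, "FUTURE": None},
}

print(follow_chain(orders, 3, "PAST"))    # present-to-past walk
print(follow_chain(orders, 1, "FUTURE"))  # present-to-future walk
```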
Multiple Subfile • File 52 PRESCRIPTION, Field 52 Refill Subfile → File 52.1 REFILL, all fields (one-to-many)
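A multiple (subfile) nests child records inside their parent record. The Python sketch below shows that nesting and one way to flatten it into parent/child rows, the shape a relational table would expect. The data, the `REFILLS` and `REFILL DATE` labels, and `flatten_refills` are hypothetical; this is not a description of how CDW actually extracts subfiles.

```python
# Hypothetical prescriptions, each carrying its own refill subfile.
prescriptions = {
    101: {
        "DRUG": "ASPIRIN",
        "REFILLS": {
            1: {"REFILL DATE": "2013-01-05"},
            2: {"REFILL DATE": "2013-02-04"},
        },
    },
    102: {"DRUG": "LISINOPRIL", "REFILLS": {}},  # no refills yet
}


def flatten_refills(rx_file):
    """Turn nested refills into (parent IEN, child IEN, refill date) rows."""
    rows = []
    for rx_ien, rx in sorted(rx_file.items()):
        for refill_ien, refill in sorted(rx["REFILLS"].items()):
            rows.append((rx_ien, refill_ien, refill["REFILL DATE"]))
    return rows


for row in flatten_refills(prescriptions):
    print(row)
```

Note that the child IEN is only unique within its parent, so a flattened row needs both IENs to identify a refill.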
File 50.605 DRUG CLASS Multiple Subfile File 50 DRUG One-to-many files File 120.8 PATIENT ALLERGIES Field 1 GMR Allergy File 50.6 NATIONAL DRUG 120.2 GMR ALLERGIES File 50.416 DRUG INGREDIENT
Computed/MCode • A placeholder that does not contain any stored information • Calculated ad hoc when you look up the value • Warning – For this reason, the value ALWAYS has the possibility of changing
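A computed field behaves like a function call dressed up as a field. The Python sketch below uses age as the example; `PatientRecord` and `age` are hypothetical names, and the slide's 05/03/1983 is assumed to mean 3 May 1983. Nothing is stored for the age, so the answer depends on when you ask.

```python
from datetime import date


class PatientRecord:
    """Hypothetical record with one stored field and one computed field."""

    def __init__(self, name, date_of_birth):
        self.name = name
        self.date_of_birth = date_of_birth  # stored

    def age(self, today=None):
        """Computed on every lookup; never stored."""
        today = today or date.today()
        dob = self.date_of_birth
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        return today.year - dob.year - (0 if had_birthday else 1)


record = PatientRecord("PHAM,RICHARD", date(1983, 5, 3))
print(record.age(date(2013, 5, 2)))  # day before the birthday
print(record.age(date(2013, 5, 3)))  # on the birthday
```

Two extracts taken a day apart can therefore disagree on the same record without any stored data having changed.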
How Complicated Is The Pharmacy Package? • 440 files in the File 50 Series • 3,175 fields • 527 Pointers • 310 External References