RECONSTRUCTING COMPLEX HEALTH SERVICE DATA WITH

by Gilbert MacKenzie & Xuefang Li, The Centre for Medical Statistics

Introduction
Increasingly, there is interest in monitoring and evaluating hospital performance in the NHS. This has led to the compilation of:
• League Tables
Classically, such data are observational, and their formal statistical evaluation is by covariate or “case-mix” adjustment.
But NHS hospital performance indicators may not be patient-based or directly related to patient well-being, which complicates “case-mix” adjustment.

Performance Indicators
These may include:
• Occupied Bed-days
• Re-admission Rates
• Medical Outlier Rates
NB: Often these are episode-based rather than patient-based.

Classical Data Structure
Typically, in any period, hospital data are of the form:
• Patient
• Admission(s)
• Completed Consultant Episode(s)
But performance indicators are typically based on the activity of the hospital, i.e., on the last two above (admissions and episodes).

North Staffs Study
• Aims
  • To compile a Cumulative Event Patient History (CEPH) file
  • For all ordinary admissions to all hospitals in North Staffordshire for 1997-2000
  • In order to provide a patient-based analysis of performance on a year-to-year basis

North Staffs Study
• Data
  • Data source: LHA NHS Warehouse Data
  • Episode-based, for all admissions to all hospitals in North Staffordshire (NS) for 1997-2000
  • Some 325,236 episodes over this period
  • But the NHS identifier is missing in 25% of episodes

Missing Values for Main Variables in the 325,236 Episodes    

North Staffs Study: Ideal Data Structure
Patient 1
  Record 1  Pat 1, Adm 1, E1
  Record 2  Pat 1, Adm 1, E2
  Record 3  Pat 1, Adm 2, E1
Patient 2
  Record 4  Pat 2, Adm 1, E1
  Record 5  Pat 2, Adm 2, E1
  Record 6  Pat 2, Adm 2, E2
  Record 7  Pat 2, Adm 3, E1
Patient 3
  Record 8  Pat 3, Adm 1, E1
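The ideal structure is a three-level hierarchy: patient, then admission, then episode. A minimal Python sketch, assuming each episode record reduces to a hypothetical (patient, admission, episode) triple, shows how flat episode records nest into that hierarchy:

```python
from collections import defaultdict

# Records 1-8 above as (patient, admission, episode) triples.
# This tuple layout is an illustrative assumption, not the study's file format.
RECORDS = [
    (1, 1, 1), (1, 1, 2), (1, 2, 1),
    (2, 1, 1), (2, 2, 1), (2, 2, 2), (2, 3, 1),
    (3, 1, 1),
]


def nest(records):
    """Group flat episode records into {patient: {admission: [episodes]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for patient, admission, episode in records:
        tree[patient][admission].append(episode)
    # Convert the nested defaultdicts to plain dicts for clean printing.
    return {patient: dict(admissions) for patient, admissions in tree.items()}


if __name__ == "__main__":
    for patient, admissions in nest(RECORDS).items():
        print(f"Patient {patient}: {admissions}")
```

This prints `Patient 2: {1: [1], 2: [1, 2], 3: [1]}` for the second patient, i.e. three admissions, the second of which has two episodes.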

Simple Patient Matching Algorithm
NHS matching criteria: C = (Sex, DOB, Postcode)
Step 1: Define Set A as the episodes with a missing NHS ID (n = 82,906 episodes).
Step 2: Define Set B as the episodes with a known NHS ID (n = 242,330 episodes).
Step 3: Use C to match Set A with Set B.
Step 4: Consolidate the 13,114 matches in Set B.
Step 5: Call the reduced Set A "Set A*" and the updated Set B "Set B*".
Step 6: Use C to match Set A* with Set B*.
Step 7: Consolidate the matches in Set B*.
Step 8: Finally, use C to match the remaining set, A**, with itself.
Step 9: Allocate new NHS numbers to the residual episodes in A***.
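The matching steps can be sketched in Python. This is a minimal illustration, not the study's implementation. Episodes are assumed to be dicts with hypothetical keys `nhs`, `sex`, `dob` and `postcode`, and matching is exact on C. The slides do not say what changes between the first and second passes against Set B, so Steps 3 to 7 collapse into a single pass here. The `NEW000001` style of newly allocated ID is likewise hypothetical.

```python
from itertools import count

# Hypothetical demo episodes; the NHS numbers, dates and postcodes are made up.
DEMO = [
    {"nhs": "4000000001", "sex": "F", "dob": "1931-02-14", "postcode": "XX1 1AA"},
    {"nhs": None, "sex": "F", "dob": "1931-02-14", "postcode": "XX1 1AA"},
    {"nhs": None, "sex": "M", "dob": "1950-07-01", "postcode": "XX2 2BB"},
    {"nhs": None, "sex": "M", "dob": "1950-07-01", "postcode": "XX2 2BB"},
]


def match_key(ep):
    """Matching criteria C = (Sex, DOB, Postcode)."""
    return (ep["sex"], ep["dob"], ep["postcode"])


def link_episodes(episodes):
    """Return a copy of `episodes` with every missing NHS number filled in."""
    # Steps 1-2: Set B is every episode with a known ID; build a lookup on C.
    # Assumption: if several known IDs share a key, the first one seen is used.
    lookup = {}
    for ep in episodes:
        if ep["nhs"] is not None:
            lookup.setdefault(match_key(ep), ep["nhs"])

    new_ids = (f"NEW{n:06d}" for n in count(1))  # hypothetical format for new IDs
    residual = {}  # key -> newly allocated ID for unmatched episodes
    linked = []
    for ep in episodes:
        ep = dict(ep)  # copy so the caller's records are left untouched
        if ep["nhs"] is None:  # this episode belongs to Set A
            key = match_key(ep)
            if key in lookup:
                # Steps 3-7: Set A matched against Set B on C.
                ep["nhs"] = lookup[key]
            else:
                # Steps 8-9: residual episodes matched with each other on C,
                # each distinct key receiving one new ID.
                if key not in residual:
                    residual[key] = next(new_ids)
                ep["nhs"] = residual[key]
        linked.append(ep)
    return linked


if __name__ == "__main__":
    print([ep["nhs"] for ep in link_episodes(DEMO)])
```

Because each unmatched key receives exactly one new ID, Steps 8 and 9 fall out of the same loop: the two residual demo episodes share C and so end up with the same new identifier.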

Matching Result
• Overall result (percentages are of all 325,236 episodes)
  • Total missing = 82,906 (25.5%)
  • After 1st match = 69,771 (21.5%)
  • After 2nd & 3rd matches = 46,527 (14.3%)
• Accuracy
  • About 4%-5% are wrongly matched
  • Also, about 7% of those with known NHS numbers were really different people, judged by (Sex, DOB, Postcode)
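The percentages in the overall result are shares of all 325,236 episodes (Set A plus Set B). A short Python check of that arithmetic, using only the figures on the slides:

```python
SET_A = 82_906   # episodes with a missing NHS ID
SET_B = 242_330  # episodes with a known NHS ID
TOTAL = SET_A + SET_B  # 325,236 episodes in all

MISSING = [
    ("Total missing", 82_906),
    ("After 1st match", 69_771),
    ("After 2nd & 3rd", 46_527),
]


def percent_of_total(n, total=TOTAL):
    """Share of all episodes, as a percentage to one decimal place."""
    return round(100 * n / total, 1)


if __name__ == "__main__":
    for label, n in MISSING:
        print(f"{label} = {n:,} ({percent_of_total(n)}%)")
```

The printed lines reproduce the 25.5%, 21.5% and 14.3% figures above.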

Data Structure: Ideal Attained*
Record 1  Pat 1, Adm 1, E1
Record 2  Pat 1, Adm 1, E2
Record 3  Pat 1, Adm 2, E1
Record 4  Pat 2, Adm 1, E1
Record 5  Pat 2, Adm 2, E1
Record 6  Pat 2, Adm 2, E2
Record 7  Pat 2, Adm 3, E1
Record 8  Pat 3, Adm 1, E1
* But with a variable number of records per patient.
Now the target CEPH file is a flat SPSS system file with one record per patient.

Data Structure: Target CEPH File Structure
Record 1  Pat 1, E1 E2 E1 EX
Record 2  Pat 2, E1 E1 E2 E1
Record 3  Pat 3, E1 EX EX EX
. . .
Where:
1) E's are sets of episode data.
2) E1 followed by E2 relates to the same admission.
3) EX denotes a set of system-missing values.
4) Within patient, the E records are in chronological order.
The result is a regular cases-by-variables system file.
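The move from episode records to one record per patient is a long-to-wide pivot with padding. A minimal Python sketch, assuming the same hypothetical (patient, admission, episode) triples as the ideal structure, with `None` standing in for EX and sort order standing in for chronological order:

```python
from itertools import groupby

MAX_EPISODES = 51  # maximum episode slots per patient in the NS study
EX = None          # stands in for a set of system-missing values

# Records 1-8 of the ideal structure as (patient, admission, episode) triples.
RECORDS = [
    (1, 1, 1), (1, 1, 2), (1, 2, 1),
    (2, 1, 1), (2, 2, 1), (2, 2, 2), (2, 3, 1),
    (3, 1, 1),
]


def to_ceph_rows(records, width=MAX_EPISODES):
    """Pivot episode records into one fixed-width row per patient.

    Sorting by (patient, admission, episode) is used here as a stand-in for
    chronological order within patient, which is an illustrative assumption.
    """
    rows = {}
    for patient, group in groupby(sorted(records), key=lambda r: r[0]):
        slots = [f"E{episode}" for _, _, episode in group]
        if len(slots) > width:
            raise ValueError(f"patient {patient} has more than {width} episodes")
        rows[patient] = slots + [EX] * (width - len(slots))  # pad with EX
    return rows


if __name__ == "__main__":
    for patient, row in to_ceph_rows(RECORDS, width=4).items():
        print(patient, row)
```

With `width=4` the three rows reproduce the slide: `E1 E2 E1 EX`, `E1 E1 E2 E1` and `E1 EX EX EX`. With the default width every row has 51 slots, matching the 51 episode record types in the file definition.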

Defining Complex File Structures
SPSS FILE TYPE GROUPED command:
File Type Grouped File='c:\my documents\oldcare\LHA20-21new.dat'
  Record=#epi_id 300-301 Case=nhs_num 1-10 Missing=nowarn.
Record Type 1.
Data list /V0101 12 V0201 14-24 (A) V0301 26-33 (A) V0401 35 …
Record Type 2.
Data list /V0102 12 V0202 14-24 (A) V0302 26-33 (A) V0402 35 …
Etc., up to a maximum of 51 episode record types for the NS study.
End File Type.
NB: Other subcommands include Duplicate, Skip, Ordered and Case.
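FILE TYPE GROUPED treats all records that share a case identifier as one case. A rough Python analogue of that idea reads the case identifier from columns 1-10 and the record type from columns 300-301, as in the command above. The 301-column demo lines are hypothetical, and the episode variables on each record are not parsed.

```python
import io
from collections import defaultdict

CASE_COLS = (1, 10)       # Case=nhs_num 1-10
RECORD_COLS = (300, 301)  # Record=#epi_id 300-301


def field(line, cols):
    """Extract 1-based, inclusive columns, as DATA LIST column specs do."""
    start, end = cols
    return line[start - 1:end].strip()


def read_grouped(stream):
    """Collect fixed-width records into cases keyed by the case identifier.

    Every line carries a case ID and a record type; all lines sharing a case ID
    become one case, with one slot per record type. A repeated record type in
    the same case simply overwrites the earlier one here (SPSS's own handling
    of duplicates is not modelled).
    """
    cases = defaultdict(dict)
    for line in stream:
        if not line.strip():
            continue  # skip blank lines
        line = line.rstrip("\n").ljust(RECORD_COLS[1])
        case_id = field(line, CASE_COLS)
        record_type = int(field(line, RECORD_COLS))
        cases[case_id][record_type] = line
    return dict(cases)


def demo_line(nhs_num, record_type):
    """Build a hypothetical 301-column line with only the key fields filled."""
    return nhs_num.ljust(10) + " " * 289 + f"{record_type:02d}"


if __name__ == "__main__":
    text = "\n".join([
        demo_line("4000000001", 1),
        demo_line("4000000001", 2),
        demo_line("4000000002", 1),
    ])
    for case_id, records in read_grouped(io.StringIO(text)).items():
        print(case_id, sorted(records))
```

The first demo patient ends up as one case holding record types 1 and 2, which is the grouping the SPSS command performs before the episode variables are laid out side by side.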

Data Structure: Making the CEPH File Usable
Record 1  Pat 1, E1 E2 E1 EX {index structure}
Record 2  Pat 2, E1 E1 E2 E1 {index structure}
Record 3  Pat 3, E1 EX EX EX {index structure}
. . .
Now build up useful patient-based performance indicator quantities using SPSS's transformation language: use VECTOR and LOOP to store and search frequently used quantities and addresses, e.g.:

Examples of Index Building
Comment compute SUMR - the number of episodes (records) per patient.
vector vdata48=V4801 to V4851.
compute #sumr=0.
loop #j=1 to 51.
if (not(missing(vdata48(#j)))) #sumr=#sumr+1.
end loop.
compute sumr=#sumr.
Comment compute SUMA - the number of admissions per patient.
Comment V47xx holds the admission sequence number, so the last non-missing value is the count.
vector vdata47=V4701 to V4751.
compute #suma=0.
loop #j=1 to 51.
if (not(missing(vdata47(#j)))) #suma=vdata47(#j).
end loop.
compute suma=#suma.
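The two SPSS loops translate directly into Python. This is a hedged sketch: each patient's V47xx and V48xx slots are assumed to be 51-element lists with `None` for missing. V47 is taken to be a running admission number, as the address-building condition implies. The V48 values are hypothetical, since the slides only rely on V48 being present for every episode.

```python
MAX_EPISODES = 51


def count_episodes(v48):
    """SUMR: number of non-missing values among V4801 to V4851."""
    return sum(1 for value in v48 if value is not None)


def count_admissions(v47):
    """SUMA: the last non-missing value among V4701 to V4751.

    Assumes V47xx holds a running admission number within patient, so the
    final non-missing value equals the number of admissions.
    """
    suma = 0
    for value in v47:
        if value is not None:
            suma = value
    return suma


# Hypothetical slots for Patient 2 (E1 E1 E2 E1), padded with None (missing).
V47_PAT2 = [1, 2, 2, 3] + [None] * (MAX_EPISODES - 4)
V48_PAT2 = [1, 1, 2, 1] + [None] * (MAX_EPISODES - 4)

if __name__ == "__main__":
    print(count_episodes(V48_PAT2), count_admissions(V47_PAT2))
```

For Patient 2 this gives SUMR = 4 episodes and SUMA = 3 admissions.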

Examples of Index Building (cont'd)
Comment compute the first episode address for each admission.
Vector vdata47=V4701 to V4751.
Comment Zeroise.
Do repeat i=Adm01 to Adm51.
Compute i=0.
End repeat.
Comment Declare 0 as missing.
Missing values adm01 to adm51 (0).
Vector Adm=Adm01 to Adm51.
Comment compute address: a new admission starts when V47 increases by 1.
Compute Adm(1)=1.
Compute #k=1.
Loop #j=2 to sumr.
Do if (vdata47(#j) eq vdata47(#j-1)+1).
Compute #k=#k+1.
Compute Adm(#k)=#j.
End if.
End loop.
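The address-building loop can be restated in Python as a hedged sketch. As the SPSS condition implies, it assumes that V47xx is a running admission number. It uses 0-based Python lists internally but stores 1-based slot numbers, like the SPSS vectors. The demo V47 values are hypothetical.

```python
MAX_EPISODES = 51


def admission_addresses(v47, sumr):
    """Return Adm(1..51): the 1-based slot of the first episode of each admission.

    Mirrors the SPSS loop: episode j starts a new admission when its V47 value
    is one more than that of episode j-1. Unused entries stay 0, the value the
    SPSS code declares as missing.
    """
    adm = [0] * len(v47)
    adm[0] = 1  # the first episode always opens the first admission
    k = 1
    for j in range(2, sumr + 1):  # j is 1-based, as in the SPSS loop
        if v47[j - 1] == v47[j - 2] + 1:
            k += 1
            adm[k - 1] = j
    return adm


# Hypothetical V47 slots for Patients 1 and 2 of the example structure.
V47_PAT1 = [1, 1, 2] + [None] * (MAX_EPISODES - 3)
V47_PAT2 = [1, 2, 2, 3] + [None] * (MAX_EPISODES - 4)

if __name__ == "__main__":
    print(admission_addresses(V47_PAT1, 3)[:3])  # first episodes at slots 1 and 3
    print(admission_addresses(V47_PAT2, 4)[:4])  # first episodes at slots 1, 2 and 4
```

Storing these addresses once per patient means later indicator calculations can jump straight to the start of any admission instead of rescanning all 51 episode slots.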

North Staffs Study
• Results
  • Data: Episodes = 321,788, Admissions = 284,965, Patients = 188,745
  • Comprehensive patient-based index built, covering all major NHS performance indicators
  • 5 files: 1997, 1998, 1999, 2000 & 1997-2000
  • Descriptive analysis by hospital type and diagnostic category (modelling to follow)


North Staffs Study: Numbers of patients by trust by year
                  1997     1998     1999     2000
Acute Trust     43,975   44,396   42,006   44,517
Combined Trust   4,392    4,051    5,084    3,965
Total           47,554   48,138   46,194   46,859

North Staffs Study Table 3.3 Average & Median length of stay by disease by year in the Acute Trust

North Staffs Study Table 3.4 Average & Median length of stay by disease by year in the Combined Trust


Some Conclusions
• The complex file commands (FILE TYPE MIXED, GROUPED and NESTED) are very useful: flexible and safe.
• They need to be revised to remove the dependence on ASCII input, since ASCII files of complex health data are too big.
• The transformation language in SPSS means that a database index can be built easily.
• Patient-based performance indicators as a standard are an exciting prospect.
• Results in North Staffs suggest that the health of the population is declining, leading to greater utilisation over time.
