1 / 40

Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon

Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon. What do these have in common ?. One person knows – rest of you will have to wait !. What I hope to cover. Based on the Aberdeen Maternity and Neo-natal Data Bank ( AMNDB ) History of lookups – Dummy case SQL tabfiles

nerita
Download Presentation

Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lookupsor “the Good, the Bad and the Ugly ”John S. Lemon

  2. What do these have in common ? One person knows – rest of you will have to wait !

  3. What I hope to cover • Based on the Aberdeen Maternity and Neo-natal Data Bank ( AMNDB ) • History of lookups – • Dummy case • SQL tabfiles • Second data base • Second data base + secondary indexes + lookup command

  4. What is a “lookup” • Converting text -> numeric code • Or numeric code -> text • Data entry staff enter ‘uterine adhesions’ • Converted to ( and stored as ) 621.5 • Why ? • Space

  5. Why use a ‘lookup’ • Alternatives are storing full text • ‘uterine adhesions’ - stored as 19 bytes / characters • This is one of the shorter ones !!! • Problem of different ‘spelling’ • Uterine Adhesions • UTERINE ADHESIONS etc. • Which to use when searching

  6. Why use a ‘lookup’ • Store ‘uterine adhesions’ as 621.5 - stored as 8 bytes • less if using integers • or fixed length string codes ‘A6215’ - stored as 7 bytes

  7. Why use a ‘lookup’ • Value labels - not practical • Size limit on label length • Record specific • Problems managing an increasing list • Ordering / Sorting • Large numbers of codes:text pairs Occupations 13872 Drugs 5357 Operations 2754 Diseases / Illnesses / Complications 4404

  8. Dummy case • Only option in version 2 • Use a CASEID which was unique • In AMNDB used –999999 • This case only contained data for the ‘lookup’ record types • It worked but ….

  9. Dummy case - problems • Specific to that data base • Not shared – unless • SIR FILE DUMP - from ‘active’ • READ INPUT DATA - to ‘research’ • Have to repeat for all data bases that need lookups

  10. Dummy case - problems • Can only go ‘one way’ unless duplicate data with different Key Fields RECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODE RECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME • Huge problems of maintenance

  11. Dummy case - problems • MAX KEY SIZE • May have seen in LIST STATS • What is it ? • Need to have quick look at the ‘structure’ of a SiR record • Simplified not definitive view • If you want a complete definition – ask Tony !!

  12. SiR record structure 1 CASEID Max Key Size Data 2 CASEID Max Key Size Data Actual Size CASEID Max Key Size Data 3 Actual Size • Essentially two components • Key - fixed length - same for ALL records • May not be ‘filled’ for all records • Data - variable length

  13. Dummy case - problems • Without dummy case • Max Key Size - approx 20 bytes long • Text field – 50 characters long • Max Key Size - approx 60 bytes long • Extra 40 bytes • For EVERY record • 1.5 million * 40 bytes = approx 60 Mbytes • Trivial today but in 1988 ……………

  14. Dummy Case – solutions (sic) • How to resolve size problems • Reduced length of text field – 30 chars • One way only • Only in one data base – data entry • Used ‘marvellous feature’ of sequential data base access • SIR3 file held on tape(s) • Only PROCESS CASES ALL • No CASE IS

  15. Lookups use in AMNDB • Two main programmes / suites of programmes • Batch run to convert text -> code • Interactive programme for each record that requires lookups • Need both for different reasons

  16. Lookups use in AMNDB • Batch run to convert text -> code • Does all record types in data base that has data for looking up • Only converts records where text code is already available • Marks successful lookups • Leave unsuccessful ones for interactive

  17. Lookups use in AMNDB • One interactive programme per record with data for looking up • Functions in same way as batch except • If no match prompts for one of following - • Retype • Delete • Edit • Add new lookup code

  18. SQL tabfiles • Stuck with Dummy case until • SQL tabfiles • SAVE TABLE • SQL indexes • One external file could hold lookups

  19. SQL tabfiles • One external file – many advantages • One multi-user file for all lookups • Simplified maintenance - one file to backup • Multiple indexes per table ( cf. record ) • Max Key Size problem removed • BUT ……………

  20. SQL tabfiles - problems • Can only see the ‘whole’ file picture • No equivalent of File dump / Add recs • Need SQL+ - clumsy • PQL programs - not intuitive • Could use Forms but not easy • Journalling was suspect • Ended up using EXPORT for backup • Exporting SQL tabfiles was idiosyncratic

  21. SQL tabfiles - problems • If tabfile is ‘volatile’ – small frequent updates • Tabfile can get ‘corrupt’ • Verify ‘drops’ table instead of correcting • Updates appeared to work but later find records missing / corrupt • Just had to ‘live with it’ ……… and hope for better

  22. Second data base • No worries about Max Key Size • Multi user • Reliable DBMS utilities and functionality • UNLOAD • Journalling • File Dump / Add Recs • Easily look at the data • But ……………………

  23. Second data base • Considered but not used • Why ? • No alternative ‘views’ / indexes • Back to two copies of data RECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODERECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME • No real advantages – ‘devil you know’

  24. What do these have in common ? Still puzzled ??

  25. Two data bases + Secondary Index • Then in SiR2002 • Two, or more, data bases • Secondary Indexes • LOOKUP command • Decided to ‘go for broke’ • Use all three ‘features’ to remove previous problems

  26. Two data bases + Secondary Index • Two things spring to mind • Can of worms • Snail in a well

  27. Two data bases + Secondary Index • The can of worms was trying to understand code written many years ago • How many of you have ‘revisited’ PQL you wrote 5 years ago ? • Can you remember what you were trying to do ?

  28. Two data bases + Secondary Index • Do you add Comments to explain for future benefit C Only get records for Males over 60 • Use | for ‘inline’ comments . END PROCESS REC | PREGNANCY • It was ‘grey hair’ time – yet again !! • Not sure which is worse • teenage daughters • old SiR code

  29. Two data bases + Secondary Index • By hard work and perseverance managed to ‘decode’ the old code • This time I added Comments as I worked it out !! • So that was the can of worms sorted

  30. Two data bases + Secondary Index • Just left the snail in the well • How does this relate to SiR code • Climb 3 feet during day – slip back two feet at night • At almost every stage I encountered ASFs • ‘Another SiR Feature’

  31. Two data bases + Secondary Index • Enormous thanks to Tony for help, aid, assistance and patience • Six new versions of SiR within one week • Gradually felt that I was ‘climbing’ higher and higher

  32. Two data bases + Secondary Index • One day after yet another new version - programme ran OK • Message to Tony “I’m out of the well !! “ • Even so did some more testing

  33. Two data bases + Secondary Index • Two hours later I sent a message using a phrase from this show “I’ve fallen in the water” • Yet another problem

  34. Two data bases + Secondary Index • Yet again Tony came to the rescue • Apart from one bit of my code I need to sort the programmes are working OK • So how do we use • A second data base • Secondary Indexes • The LOOKUP command

  35. Two data bases + Secondary Index • All lookups are held in a second, caseless data base • The key fields are the numeric codes to keep MAX KEY SIZE to minimum • Journalling is turned on

  36. Two data bases + Secondary Index • The same code might refer to multiple text strings • Threatened Abortion • Thr Abr • TAbr • All mean same • Use AUTOKEY to cope • Secondary indexes on the text strings

  37. Two data bases + Secondary Index • Experiences so far • Still testing but looks good • Faster and more reliable • Can look at the data easily • Correcting invalid data is easy • All the power and features of vPQL and DBMS utilities for maintenance • Only one system to learn / remember

  38. Lookups – a summary • Dummy case • Rigid • Inflexible • Cumbersome • Can use DBMS etc. for maintenance • Now luckily replaced

  39. Lookups – a summary • SQL tabfiles • Very flexible • Obtuse • No easy maintenance • SQL+ is cumbersome • Reliability / integrity of tabfiles • Better but for long term – flawed • Ad-hoc work – re-building every time

  40. Lookups – a summary • Final solution with Second data base, Indexes & LOOKUP • Reliable • PQL is familiar • Fast • Combines good features of dummy case and tabfiles • Perhaps time will tell – but looks good

More Related