Finding & Eliminating Rogue Hex Characters in Text Fields

Martha Cox

Cancer Outcomes Research Program

CDHA / Dalhousie


The Problem

Chart abstraction data containing several comment fields (255 chars each)

Some values with "random" line feeds


Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as
            multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery.
            Biopsy from endoscopy came back as moderatley
            differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon
            and hepatic flexure.
            -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal.
            A second
            sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out.
            Most questions N
            A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3
            4 tumor above reflection, 1
            4 was below reflection
--------------------------------------------------------------------------------------------



Lots of suggestions

  • compress? kcompress? The returns seem to fall between words, so compress would smash two words together.

  • translate or tranwrd? Should work, but these wouldn't take a hex value for me.

    Besides, which character(s) are the problem?
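
The concern about compress can be illustrated with a small sketch. The comment value below is made up for the demonstration, and it assumes the unwanted bytes are a carriage return and line feed ('0D0A'x), which the slides have not yet established at this point.

```sas
/* Hypothetical value: two words separated only by CR + LF */
data compress_demo;
  length comment squeezed $40;
  comment  = 'surgery.' || '0D0A'x || 'Biopsy';
  /* COMPRESS deletes every CR and LF, leaving nothing between the words */
  squeezed = compress(comment, '0D0A'x);
  put squeezed=;
run;
```

Here squeezed ends up as surgery.Biopsy, with the two words run together, which is why a replacement with a blank is preferable to a plain deletion.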


How to Find the Bad Word

data charlist;
  set shrug.sample1 (where=(PATIENT in (28)));
  length single singlhex $1;
  /* show the copy of each character as its two-digit hex code */
  format singlhex $hex2.;
  loopx = length(trim(COMMENT));
  /* write one observation per character of the comment */
  do i = 1 to loopx;
    single   = substr(COMMENT, i, 1);
    singlhex = single;
    output;
  end;
  keep single singlhex;
run;


Patient 28's comment, one char at a time

Obs  single  singlhex
 20  g       67
 21  e       65
 22  r       72
 23  y       79
 24  .       2E
 25          20
 26          0D
 27          0A
 28          0D
 29          0A
 30  B       42
 31  i       69
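
The dump above inspects a single patient. As a possible complement, the sketch below scans every record for a control character and reports where the first one sits. It is not part of the original slides: the sample data set is a made-up stand-in for shrug.sample1, and it relies on the ANYCNTRL function, which is available in SAS 9 and later and may not exist in the release the author was running under VMS.

```sas
/* Hypothetical stand-in for shrug.sample1: one clean comment, one with CR + LF */
data sample;
  length COMMENT $255;
  PATIENT = 13; COMMENT = 'Found hyperplastic polyp'; output;
  PATIENT = 28; COMMENT = 'Pt did not have surgery.' || '0D0A'x || 'Biopsy'; output;
run;

/* Keep only rows containing a control character, and record the
   position and hex code of the first one found */
data flagged;
  set sample;
  length badhex $2;
  pos = anycntrl(COMMENT);
  if pos > 0 then do;
    badhex = put(substr(COMMENT, pos, 1), $hex2.);
    output;
  end;
  keep PATIENT pos badhex;
run;

proc print data=flagged noobs;
run;
```

Only patient 28 is written to flagged, with badhex showing 0D for the carriage return. Records with more than one kind of rogue character would still need the character-by-character dump to see everything.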


Repair Program

data shrug.sample2;
  set shrug.sample1;
  /* 0D = carriage return, 0A = line feed: the pair found in the dump */
  badword  = '0D0A'x;
  /* replace each CR + LF pair with a single blank */
  goodword = ' ';
  COMMENT  = tranwrd(COMMENT, badword, goodword);
  drop badword goodword;
run;
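
A variation on the repair, offered only as a sketch and not as the author's method: TRANSLATE maps each carriage return and line feed to a blank individually, so stray single characters are also caught, and COMPBL then collapses the resulting run of blanks. The sample data set is a hypothetical stand-in for shrug.sample1.

```sas
/* Hypothetical stand-in with a doubled CR + LF, as seen for patient 28 */
data sample;
  length COMMENT $255;
  PATIENT = 28;
  COMMENT = 'surgery.' || '0D0A0D0A'x || 'Biopsy';
run;

data repaired;
  set sample;
  /* TRANSLATE(source, to, from): CR and LF each become a blank */
  COMMENT = translate(COMMENT, '  ', '0D0A'x);
  /* COMPBL reduces any run of blanks to a single blank */
  COMMENT = compbl(COMMENT);
run;
```

The repaired comment reads surgery. Biopsy with one blank between the sentences. Note that COMPBL also collapses any deliberate multiple blanks elsewhere in the text, which may or may not be acceptable for a given field.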


Results

Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as
            multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley
            differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon
            and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second
            sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N
            A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3
            4 tumor above reflection, 1
            4 was below reflection
--------------------------------------------------------------------------------------------


Hmm...

  • Noticed that the remaining breaks seemed to be occurring where one might have used a slash (“/”).

  • Working in a VMS batch environment; no Display Manager.

  • Looking at the data via PROC REPORT with “flow” for the comments column.

So, is this a data problem or a reporting problem?



The Answer!

Split character in PROC REPORT

  • not just for column headers

  • also used to split long text values in the body of the report

  • default character is slash
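
The slides do not show the corrected report code, so the following is only a sketch of one way to apply this: the SPLIT= option on the PROC REPORT statement swaps the default slash for another character. The sample data set, the column widths, and the choice of '~' (assumed never to occur in the comments) are all illustrative assumptions.

```sas
/* Hypothetical stand-in data: a comment that contains slashes */
data sample;
  length COMMENT $255;
  PATIENT = 157;
  COMMENT = '3/4 tumor above reflection, 1/4 was below reflection';
run;

/* SPLIT= replaces the default split character (/) so that FLOW
   no longer breaks the text at each slash */
proc report data=sample nowd split='~' headline;
  column PATIENT COMMENT;
  define PATIENT / display 'Patient ID' width=10;
  define COMMENT / display 'Comments' flow width=80;
run;
```

With the split character changed, the slashes print as ordinary text and FLOW wraps the comment only at the column width.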


Final Results

Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as
            multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley
            differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon
            and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second
            sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N/A
            for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3/4 tumor above reflection, 1/4 was below reflection
--------------------------------------------------------------------------------------------


