1 / 76

Another Way to Attack the BLOB:

Another Way to Attack the BLOB:. Server-side Access via PL/SQL and Perl. Why Server-side?. Your choice of tools to handle queries and generate reports Complete programmatic control Easier to write complex reports No (well, fewer) limitations Easier to restrict database access to the masses.

buffy
Download Presentation

Another Way to Attack the BLOB:

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Another Way to Attack the BLOB: Server-side Access via PL/SQL and Perl

  2. Why Server-side? • Your choice of tools to handle queries and generate reports • Complete programmatic control • Easier to write complex reports • No (well, fewer) limitations • Easier to restrict database access to the masses

  3. Syllabus Brief MARC record review The BLOB Plan of Attack Data Retrieval via PL/SQL Required tools for Perl: getting DBD & DBI Data Retrieval via Perl

  4. Brief MARC record review The BLOB Plan of Attack Data Retrieval via PL/SQL Required tools for Perl: getting DBD & DBI Data Retrieval via Perl

  5. MARC? • MARC is an acronym for MAchine Readable Cataloging

  6. MARC • MARC is an acronym for MAchine Readable Cataloging. • It’s a standard format for storing an item’s data.

  7. MARC • MARC is an acronym for MAchine Readable Cataloging. • It’s a standard format for storing an item’s data. • It’s machine readable, but not so easy for us humans to read.

  8. MARC • MARC is an acronym for MAchine Readable Cataloging. • It’s a standard format for storing an item’s data. • It’s machine readable, but not so easy for us humans to read. • With a bit of practice, a raw MARC record can be parsed by hand.

  9. MARC • MARC is an acronym for MAchine Readable Cataloging. • It’s a standard format for storing an item’s data. • It’s machine readable, but not so easy for us humans to read. • With a bit of practice, a raw MARC record can be parsed by hand. • However, doing so is about as exciting and satisfying as trying to thread a needle one-handed.

  10. A MARC record’s three pieces: • Leader • Directory • Data

  11. Partial view of a MARC record 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 074000900197082001600206086001700222099001700239100001800256245011000274 260011200384300003800496490005400534500016500588500007500753500003400828 500003900862504005200901650004600953650005000999650004901049710002901098 830005001127ocm10726696 OCoLC19961223115432.0840406s1996 dcuab b f000 0 eng a 84600065 a(GPO)97054409 dGPOdDLCdMvI an-us-az awdoc,sudci3114100999573400aQE611.5.U6bF84 1996 a06 /\/\/\/\/\/\/\/\/\/\/\/\/\/\ skipping part of record here /\/\/\/\/\/\/\/\/\/\/\/\ turalzArizonazMohave County.2 aGeological Survey (U.S.) 0aGeologic al Survey professional paper ;v1266. this is the leader

  12. Partial view of a MARC record 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 074000900197082001600206086001700222099001700239100001800256245011000274 260011200384300003800496490005400534500016500588500007500753500003400828 500003900862504005200901650004600953650005000999650004901049710002901098 830005001127ocm10726696 OCoLC19961223115432.0840406s1996 dcuab b f000 0 eng a 84600065 a(GPO)97054409 dGPOdDLCdMvI an-us-az awdoc,sudci3114100999573400aQE611.5.U6bF84 1996 a06 /\/\/\/\/\/\/\/\/\/\/\/\/\/\ skipping part of record here /\/\/\/\/\/\/\/\/\/\/\/\ turalzArizonazMohave County.2 aGeological Survey (U.S.) 0aGeologic al Survey professional paper ;v1266. this is the directory

  13. Partial view of a MARC record 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 074000900197082001600206086001700222099001700239100001800256245011000274 260011200384300003800496490005400534500016500588500007500753500003400828 500003900862504005200901650004600953650005000999650004901049710002901098 830005001127ocm10726696 OCoLC19961223115432.0840406s1996 dcuab b f000 0 eng a 84600065 a(GPO)97054409 dGPOdDLCdMvI an-us-az awdoc,sudci3114100999573400aQE611.5.U6bF84 1996 a06 /\/\/\/\/\/\/\/\/\/\/\/\/\/\ skipping part of record here /\/\/\/\/\/\/\/\/\/\/\/\ turalzArizonazMohave County.2 aGeological Survey (U.S.) 0aGeologic al Survey professional paper ;v1266. this is the data

  14. Dissection of MARC record leader (pertinent details) 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 074000900197082001600206086001700222099001700239100001800256245011000274 260011200384300003800496490005400534500016500588500007500753500003400828 500003900862504005200901650004600953650005000999650004901049710002901098 830005001127ocm10726696 OCoLC19961223115432.0840406s1996 dcuab b f000 0 eng a 84600065 a(GPO)97054409 dGPOdDLCdMvI an-us-az awdoc,sudci3114100999573400aQE611.5.U6bF84 1996 a06 /\/\/\/\/\/\/\/\/\/\/\/\/\/\ skipping part of record here /\/\/\/\/\/\/\/\/\/\/\/\ turalzArizonazMohave County.2 aGeological Survey (U.S.) 0aGeologic al Survey professional paper ;v1266. data starts at this offset, the base address record length

  15. Dissection of MARC record directory how to parse it 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 01551nam 22003738a 4500 header 001 0013 00000 003 0006 00013 005 0017 00019 008 0041 00036 010 0017 00077 035 0018 00094 040 0018 00112 etc. tag len offset Each 12-character “triplet” is associated with one field.

  16. Where in the record does a field’s data start? 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 01551nam 22003738a 4500 header 001 0013 00000 003 0006 00013 005 0017 00019 008 0041 00036 010 0017 00077 035 0018 00094 040 0018 00112 etc. Where a field’s data starts is determined by adding its offset to the base address. Data for the first field, tag 001, begins at position 373, tag 003 begins at 386, tag 005 begins at 392, etc. tag len offset

  17. Partial view of a raw MARC record, data section 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 074000900197082001600206086001700222099001700239100001800256245011000274 260011200384300003800496490005400534500016500588500007500753500003400828 500003900862504005200901650004600953650005000999650004901049710002901098 830005001127ocm10726696 OCoLC19961223115432.0840406s1996 dcuab b f000 0 eng a 84600065 a(GPO)97054409 dGPOdDLCdMvI an-us-az awdoc,sudci3114100999573400aQE611.5.U6bF84 1996 a06 /\/\/\/\/\/\/\/\/\/\/\/\/\/\ skipping part of record here /\/\/\/\/\/\/\/\/\/\/\/\ turalzArizonazMohave County.2 aGeological Survey (U.S.) 0aGeologic al Survey professional paper ;v1266. The “box characters” below are the MARC format binary separation characters.

  18. Partial view of a raw MARC record, data section 01551nam 22003738a 4500001001300000003000600013005001700019008004100036 010001700077035001800094040001800112043001200130049003000142050002500172 074000900197082001600206086001700222099001700239100001800256245011000274 260011200384300003800496490005400534500016500588500007500753500003400828 500003900862504005200901650004600953650005000999650004901049710002901098 830005001127<TAG>ocm10726696 <TAG>OCoLC<TAG>19961223115432.0<TAG>840406s 1996 dcuab b f000 0 eng <TAG><SUB>a 84600065<TAG><SUB>a( GPO)97054409<TAG><SUB>dGPO<SUB>dDLC<SUB>dMvI<TAG><SUB>an-us-az<TAG> <SUB>awdoc,sudc<SUB>i31141009995734<TAG>00<SUB>aQE611.5.U6<SUB>bF84 199 6<TAG><SUB>a06 /\/\/\/\/\/\/\/\/\/\/\/\/\/\ skipping part of record here /\/\/\/\/\/\/\/\/\/\/\/\ tural<SUB>zArizona<SUB>zYavapai County.<TAG>0<SUB>aGeology, Structural<S UB>zArizona<SUB>zMohave County.<TAG>2 <SUB>aGeological Survey (U.S.)<TAG > 0<SUB>aGeological Survey professional paper ;<SUB>v1266.<TAG><EOR> The MARC format uses the following characters: <TAG> hex 1e tag delimiter <SUB> hex 1f subfield delimiter <EOR> hex 1d end of record indicator

  19. Programmer’s MARC format review • Get the record length from the 1st 5 columns.

  20. Programmer’s MARC format review • Get the record length from the 1st 5 columns. • Get the data base-address from columns 13-17.

  21. Programmer’s MARC format review • Get the record length from the 1st 5 columns. • Get the data base-address from columns 13-17. • Parse through the directory for the desired field by looking at the 1st 3 columns of each tag’s 12-character “triplet”. Get the tag’s length (next 4 columns) and offset (last 5 columns of the “triplet”).

  22. Programmer’s MARC format review • Get the record length from the 1st 5 columns. • Get the data base-address from columns 13-17. • Parse through the directory for the desired field by looking at the 1st 3 columns of each tag’s 12-character “triplet”. Get the tag’s length (next 4 columns) and offset (last 5 columns of the “triplet”). • Read the tag’s data by: Adding the tag’s offset to the record’s base address. Starting at that position, read the tag’s data for tag length columns.

  23. Programmer’s MARC format review • Get the record length from the 1st 5 columns. • Get the data base-address from columns 13-17. • Parse through the directory for the desired field by looking at the 1st 3 columns of each tag’s 12-character “triplet”. Get the tag’s length (next 4 columns) and offset (last 5 columns of the “triplet”). • Read the tag’s data by: Adding the tag’s offset to the record’s base address. Starting at that position, read the tag’s data for tag length columns. • Make sure the position you’re reading from is not beyond the end of the record.

  24. Programmer’s MARC format review Beware of the common “off by 1” error. Depending on the language you’re using, you could be off by 1 in either direction regarding your position within the record. • Get the record length from the 1st 5 columns. • Get the data base-address from columns 13-17. • Parse through the directory for the desired field by looking at the 1st 3 columns of each tag’s 12-character “triplet”. Get tag’s length (next 4 columns) and offset (last 5 columns of the “triplet”). • Read the tag’s data by: Adding the tag’s offset to the record’s base address. Starting at that position, read the tag’s data for tag length columns. • Make sure the position you’re reading from is not beyond the end of the record.

  25. Brief MARC record review The BLOB Plan of Attack Data Retrieval via PL/SQL Required tools for Perl: getting DBD & DBI Data Retrieval via Perl

  26. The BLOB Plan of Attack • Voyager’s BLOB data is stored the same way for the Auth, Bib, and Mfhd data tables. table_data (where “table” is auth, bib, or mfhd) table_id record_segment seqnum

  27. The BLOB Plan of Attack table_data (where “table” is auth, bib, or mfhd) table_id record_segment seqnum A MARC record is typically stored entirely in one row in the table. Longer records which are longer than the record_segment size have to be stored in more than one row.

  28. The BLOB Plan of Attack tabledata (where “table” is auth, bib, or mfhd) table_id record_segment seqnum Each table_id is unique to an item’s record. However, if more than one row makes up a record, we will have duplicate table_ids. In that case, we’ll have seqnum = 1, 2, 3, etc., for that record.

  29. The BLOB Plan of Attack An example of a record contained completely in one row. This record is ready to be processed after extraction from the record_segment.

  30. The BLOB Plan of Attack This longer record is spread across 3 rows. Assemble the MARC record by concatenating MARC data in seqnum order: MARC-record = record_segment<-seqnum1 + record_segment<-seqnum2 + record_segment<-seqnum3 This record is then ready to be processed.

  31. Brief MARC record review The BLOB Plan of Attack Data Retrieval via PL/SQL Required tools for Perl: getting DBD & DBI Data Retrieval via Perl

  32. PL/SQL Example The example code retrieves a few MARC records, and displays them on the screen in human-readable format, along with some diagnostics. (The code examined in the following slides starts on Page 2 of the handout.)

  33. PL/SQL Example Use a cursor to retrieve data Also declare necessary variables in this section

  34. PL/SQL Example Open the cursor and start looping through the rows

  35. PL/SQL Example Get a row from the cursor into the program variables

  36. PL/SQL Example Assemble the marc record. The typical record fits into one row, thus seqnum = 1 and we skip the loop.

  37. PL/SQL Example For a longer, multi-segment record (from an earlier example), we 1st have seqnum=3 & put it into marc. Then we have seqnum=2 and PREPEND that to marc. Last we exit the loop since now seqnum=1 and the last statement here takes care of that.

  38. PL/SQL Example Why go “backwards” in assembling a MARC record? If we predicate the segment-to-marc-record assembly on when the auth_id changes in our loop structure, once it changes we've gone too far and can't go back to get the last segment to completely assemble the now previous record. It’s simpler to predicate looping on seqnum in reverse order because there will always be a seqnum of 1. If there are multiple segments, we'll always end with a seqnum of 1 and still be on the same auth_id and can go on processing the record. This reasoning is not for PL/SQL only, although that is “where” the idea came from.

  39. PL/SQL Example Now that we have a MARC record, let’s get the record length and data base-address. We set our pointer to the start of the directory and start looping through the directory.

  40. PL/SQL Example As we loop through the directory, we read the tag id, its length, and its offset in the data part. The actual tag address where we get the data is the data base-address plus the offset.

  41. PL/SQL Example In the last line here, the subfield indicators (hex 1f = dec 31) are replaced by the vertical bar character “|” for better readability.

  42. PL/SQL Example Along with the subfield indicator character substitution, we add some space formatting to further increase readability. Thus, instead of 0aPetroleumxDrilling fluids we get 0|a Petroleum |x Drilling fluids for tag data.

  43. PL/SQL Example

  44. PL/SQL Example Now we can output the tag’s data. Output is broken into 80 character chunks to get around the 255 character limit of dbms_output and for better readability.

  45. PL/SQL Example We’re done with this tag, so we move on to the next tag in the directory. At the end, close loops and clean up. End looping for directory traversal End looping for cursor Don’t forget that this ending character is required for your PL/SQL code to run!

  46. PL/SQL Example Demo… example.pls

  47. Brief MARC record review The BLOB Plan of Attack Data Retrieval via PL/SQL Required tools for Perl: getting DBD & DBI Data Retrieval via Perl

  48. Additional tools required for Perl to talk to Oracle: • DBI, the generic DataBase Interface software. • DBD, the specific DataBase Driver, for Oracle in our case.

  49. Getting and installing DBI and DBD Point your browser to: http://www.cpan.org/authors/id/TIMB/ Complete the above URL with “DBD-Oracle-1.12.tar.gz” to get DBD software “DBI-1.20.tar.gz” to get DBI software

  50. Getting and installing DBI and DBD gunzip each file. un-tar each file. READ the instructions! Installation takes 4 or 5 steps and requires you to be root. If you don’t have root access, or if you’re uncomfortable doing any of this, seek out your SysAdmin for assistance.

More Related