
Subject Analysis: Computer Assisted Indexing

Subject Analysis: Computer Assisted Indexing. Bekele Negeri, INIS Unit, Nuclear Information Specialist (adapted from A. Nevyjel’s presentation). 07 – 11 October 2013, Vienna, Austria. Subject Indexing Tools. There are two main INIS products used for indexing: WinFibre and CAI.


Presentation Transcript


  1. Subject Analysis: Computer Assisted Indexing
     Bekele Negeri, INIS Unit, Nuclear Information Specialist (adapted from A. Nevyjel’s presentation)
     INIS Training Seminar, 07 – 11 October 2013, Vienna, Austria

  2. Subject Indexing Tools
     There are two main INIS products used for indexing: WinFibre and CAI.
     • WinFibre: for input preparation, covering both bibliographic description and subject indexing
     • CAI (Computer Assisted Indexing): for subject classification and indexing
     The INIS/ETDE Thesaurus and the INIS Subject Category Codes are incorporated in both.

  3. Indexing with FIBRE

  4. Computer-assisted Indexing - CAI
     • Kick-off meeting: Jan 2004
     • Implementation and customisation: Jun 2004
     • Production indexing: from Jun 2004, ongoing
     • CAI version 1.0 final acceptance: Aug 2004
     • Tuning of the system: from Aug 2004, ongoing
     • CAI batch processing for Member States: Dec 2004
     • CAI online from remote for Member States: Nov 2007

  5. INIS Training Seminar

  6. CAI Thesaurus Extension
     Thesaurus
     • Valid descriptors: 22,051
     • Forbidden terms: 8,675
     • Total: 30,726
     CAI
     • Hidden terms: ~35,000
     → Terminological Knowledge Base

  7. CAI Thesaurus Extension
     “Hidden terms” are character patterns representing the different appearances in free text of a concept that is indexed by one or more descriptors. They are:
     • handled similarly to “forbidden terms”, with one or more USE relations
     • CAI internal only
     • not exported to the INIS production system
     • not exported to FIBRE
     • not printed in any appearance of the thesaurus
     • used to support the identification of descriptors in the free text

  8. Hidden Terms: Compounds and Isotopes
     Descriptor | hidden term | free text
     MAGNESIUM BORIDES | MgB_2 | MgB₂
     ACETIC ACID | C_2H_4O_2 | C₂H₄O₂
     CESIUM 137 | | Cesium 137, Cesium-137
     | "1"3"7cs | ¹³⁷Cs
     | 137 caesium | 137 Caesium, 137-Caesium
     | caesium 137 | Caesium 137, Caesium-137
     | 137 cesium | 137 Cesium, 137-Cesium
     | 137 cs | 137 Cs, 137-Cs
     | cs 137 | Cs 137, Cs-137
     | cs"1"3"7 | Cs¹³⁷
     | cs137 | Cs137

  9. Hidden Terms: Elementary Particles and Countries
     Descriptor | hidden term | free text
     ELECTRON NEUTRINOS | #nu#_e | νe
     MUON NEUTRINOS | #nu#_#mu# | νμ
     TAU NEUTRINOS | #nu#_#tau# | ντ
     RHO-770 MESONS | #rho#-770 | ρ-770
     OMEGA-782 MESONS | #omega#-782 | ω-782
     Country names:
     Descriptor | hidden term
     CAMBODIA | kampuchea
     COTE D'IVOIRE | ivory coast
     GREECE | hellas
     MYANMAR | burma
     THAILAND | siam

  10. Hidden Terms: UK/US Spellings
      Descriptor | hidden term
      A CENTERS | a centres
      ACTIVITY METERS | activity metres
      ANALOG COMPUTERS | analogue computers
      ANESTHESIA | anaesthesia
      ARCHAEOLOGY | archeology
      AUSTRIAN ORGANIZATIONS | austrian organisations
      BALLISTIC MISSILE DEFENSE | ballistic missile defence
      BAYARD-ALPERT GAGES | bayard-alpert gauges
      BEAM ANALYZERS | beam analysers
      BEHAVIOR | behaviour
      CATALOGS | catalogues

  11. Hidden Terms: Other Spellings
      Descriptor | hidden term
      Singular/plural:
      FUNGI | fungus
      FUNGI | funguses
      G MATRIX | g matrices
      G MATRIX | g matrixes
      Reverse sequence:
      ATOM-MOLECULE COLLISIONS | atom-molecule scattering
      ATOM-MOLECULE COLLISIONS | molecule-atom scattering
      ATOM-MOLECULE COLLISIONS | atom-molecule reactions
      ATOM-MOLECULE COLLISIONS | molecule-atom reactions
      ATOM-MOLECULE COLLISIONS | atom-molecule interactions
      ATOM-MOLECULE COLLISIONS | molecule-atom interactions
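The hidden-term mechanism on slides 7 to 11 can be pictured as a lookup from free-text patterns to descriptors. The following is only a minimal sketch, not the CAI implementation: the table `HIDDEN_TERMS`, the function `suggest_descriptors`, and the whole-word, case-insensitive matching strategy are assumptions made for illustration. Only the term pairs themselves are taken from the slides.

```python
import re

# Hidden term -> descriptor(s). The pairs come from the slides; the table
# layout itself is a hypothetical illustration.
HIDDEN_TERMS = {
    "behaviour": ["BEHAVIOR"],
    "ivory coast": ["COTE D'IVOIRE"],
    "burma": ["MYANMAR"],
    "fungus": ["FUNGI"],
    "molecule-atom scattering": ["ATOM-MOLECULE COLLISIONS"],
    "137 cs": ["CESIUM 137"],
    "cs 137": ["CESIUM 137"],
}


def suggest_descriptors(text):
    """Return the sorted descriptors whose hidden terms occur in the text."""
    lowered = text.lower()
    found = set()
    for term, descriptors in HIDDEN_TERMS.items():
        # Match the hidden term only as a whole word or phrase.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.update(descriptors)
    return sorted(found)


print(suggest_descriptors("Behaviour of Cs 137 in fungus samples from Burma"))
```

Because each hidden term carries a USE relation to one or more descriptors, the values are lists rather than single strings.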

  12. Further Improvements Necessary
      • “+” and “-” signs
        • K+ → KAONS PLUS, KAONS MINUS, POTASSIUM IONS
      • Case sensitivity
        • TiN → TIN (instead of TITANIUM NITRIDES)
        • gas → GALLIUM SULFIDES
        • “… who is the …” → WHO (World Health Organization)
      • Verbs versus nouns
        • “… this leads us to …” → LEAD
        • “… this leaves it …” → LEAVES
      • Homographic terms
        • Solutions → SOLUTIONS or MATHEMATICAL SOLUTIONS
      • Nuclear reactions, e.g. 14N(γ,α)10B
        • Targets
        • Beams
        • Reactions

  13. INDEXING PROBLEMS
      • General terms (energy, physics, materials, uses, etc.)
      • Misleading CAI suggestions:
        • Thesaurus terms:
          PRODUCTION and PARTICLE PRODUCTION
          SOLUTION and MATHEMATICAL SOLUTION
          IGNITION and THERMONUCLEAR IGNITION
          WALLS and THERMONUCLEAR REACTOR WALLS
          PLANTS and NUCLEAR POWER PLANTS
          MEMBRANES (classic) and membrane (in brane theory)
          COLOR and COLOR MODEL (elementary particle characteristics)
          TRANSPORT, etc.

  14. INDEXING PROBLEMS
      • Chemical compounds / case sensitivity / homonyms:
        INDIUM IONS for “in ions”
        ASTATINE 200 for “at 200 °C”
        VISIBLE RADIATION for “light weight”
        HELIUM 6 for “consisting of 6 He 3 tubes”
      • Temperature, pressure, etc. ranges
      • Abbreviations:
        TNA for Thermal Neutron Analysis and TRINONYLAMINE
        MPA for Maximum Permissible Activity and MPa (megapascal)
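Several of the problems on slides 12 to 14 arise when text is matched without regard to case. The sketch below illustrates the effect and one possible mitigation, a two-stage lookup that checks case-sensitive patterns before falling back to case-insensitive ones. The tables and function names are hypothetical and are not part of CAI; the false hits in `NAIVE_TERMS` reproduce the examples from the slides.

```python
# Case-insensitive table: reproduces the false hits listed on the slides.
NAIVE_TERMS = {"tin": "TIN", "gas": "GALLIUM SULFIDES", "who": "WHO"}

# Hypothetical split: formulas and acronyms are matched with exact case,
# ordinary words are matched after lowercasing.
CASE_SENSITIVE = {
    "TiN": "TITANIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
    "WHO": "WHO",
}
CASE_INSENSITIVE = {"tin": "TIN"}


def naive_lookup(token):
    """Ignore case completely, as in the problem cases on the slides."""
    return NAIVE_TERMS.get(token.lower())


def case_aware_lookup(token):
    """Try an exact-case match first, then a lowercased match."""
    if token in CASE_SENSITIVE:
        return CASE_SENSITIVE[token]
    return CASE_INSENSITIVE.get(token.lower())


for token in ["TiN", "gas", "who", "Tin"]:
    print(token, naive_lookup(token), case_aware_lookup(token))
```

This does not address the verb/noun and homograph cases (“leads” versus LEAD, SOLUTIONS versus MATHEMATICAL SOLUTIONS), which need context rather than case information and therefore remain a task for the reviewer.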

  15. CAI Batch used by: China, Czech Republic (seldom), Georgia (only in 2012), Germany, Iran, Uzbekistan, Vietnam
      CAI Online in use by: Austria, Bulgaria, Cuba, Israel (registering), Japan, Mexico, Netherlands (seldom), Uruguay
      CAI online for Member States was introduced in July 2007.
      CAI online and CAI batch are now regular services for Member States.

  16. CAI Batch and Online Processing
      • Input: MemSt-CC-yymmdd-xxxxxxxxxxx
        • MemSt is a standard prefix (meaning “member state”)
        • CC is the country code
        • yymmdd is the date when the file was generated
        • xxxxxxxxxxx is any additional identification
      • Examples
        • MemSt-AR-041203-thisismytestfile
        • MemSt-FR-041212-fileidentification
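The naming convention on slide 16 can be sketched as a pair of helper functions. This is an illustration only: the function names are hypothetical, and the assumption that CC is always two upper-case letters is inferred from the examples (AR, FR) rather than stated on the slide.

```python
import re
from datetime import date

# Assumption: CC is two upper-case letters, as in the slide's examples.
INPUT_NAME = re.compile(r"^MemSt-(?P<cc>[A-Z]{2})-(?P<date>\d{6})-(?P<ident>.+)$")


def build_input_name(country_code, generated, ident):
    """Compose MemSt-CC-yymmdd-identification from its parts."""
    return f"MemSt-{country_code.upper()}-{generated:%y%m%d}-{ident}"


def parse_input_name(name):
    """Split an input file name into (country code, yymmdd, identification)."""
    match = INPUT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a CAI input file name: {name!r}")
    return match.group("cc"), match.group("date"), match.group("ident")


print(build_input_name("ar", date(2004, 12, 3), "thisismytestfile"))
print(parse_input_name("MemSt-FR-041212-fileidentification"))
```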

  17. CAI Batch Processing
      • Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
      • These files carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
      • Example: 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …….
      • The files are sent back to the Member State for reviewing
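A reviewer's script might read the suggestions out of tag 800 along the following lines. This is a sketch based solely on the example line from slide 17; it assumes that “^” separates the tag number from its content and that descriptors are separated by semicolons. The function name is hypothetical.

```python
MARKER = "##CAI suggestions##"


def parse_tag_800(line):
    """Return the CAI-suggested descriptors from a tag 800 line, else []."""
    tag, separator, value = line.partition("^")
    if tag != "800" or not separator:
        return []
    parts = [part.strip() for part in value.split(";")]
    # Only lines that still carry the marker hold unreviewed suggestions.
    if parts[0] != MARKER:
        return []
    return [part for part in parts[1:] if part]


print(parse_tag_800("800^##CAI suggestions##; CESIUM 137; FUNGI; "))
```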

  18. INIS Training Seminar

  19. CAI Batch and Online Processing: Reviewing Process
      • Delete all suggested descriptors which are too general
      • Add relevant descriptors which were not found:
        • numerical values, e.g. pressure ranges, temperature ranges, ...
        • nuclear reactions
        • chemical compounds, alloys, etc.
      • CAI cleans up BT/NTs (broader/narrower terms) in its own suggestions → also clean up BT/NTs among manual additions
      • Clean up suggestions arising from homographic terms

  20. CAI Batch and Online Processing: Finalisation Process
      CAI batch
      • When the review of a record is complete: delete “##CAI suggestions##”
      • When the review of all records is complete: submit the file to the “INIS Input Box”
      CAI online
      • On reaching the last record: press the “export and exit” button
      • The file goes directly to the INIS production system or, if required, is sent back to the Member State for reviewing
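For CAI batch, the finalisation step on slide 20 amounts to removing the marker string once a record has been reviewed. A minimal sketch follows, assuming the tag 800 layout shown on slide 17; the exact form expected by the INIS Input Box is not given in the slides, so the output format and both function names are assumptions.

```python
MARKER = "##CAI suggestions##"


def finalise_tag_800(line):
    """Drop the CAI marker from a reviewed tag 800 line."""
    tag, separator, value = line.partition("^")
    if tag != "800" or not separator:
        return line  # leave all other tags untouched
    terms = [term.strip() for term in value.split(";")]
    kept = [term for term in terms if term and term != MARKER]
    return "800^" + "; ".join(kept)


def unreviewed(lines):
    """Return the indexes of lines that still carry the marker."""
    return [index for index, line in enumerate(lines) if MARKER in line]


record = ["008^Some title", "800^##CAI suggestions##; CESIUM 137; FUNGI"]
print(unreviewed(record))
print(finalise_tag_800(record[1]))
```

A check such as `unreviewed` could be run before submission to confirm that no record in the file still contains the marker.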

  21. Thank you!
