
Types & structures of information resources

What is out there for searching ? What’s under the hood?

essential knowledge for searchers

[email protected]; http://comminfo.rutgers.edu/~tefko/

Tefko Saracevic


Central ideas

As a searcher you start with knowing:

Information resources (content)

  • What is out there available for searching

  • And there is a LOT!

  • In this lecture & course we will explore a sample only

    • to illustrate

      • from which you can generalize

      • and explore later more fully in other courses or professionally

Their organization (structure)

  • How structured, prepared

    • indexed, classified, tagged, labeled, abstracted, full text treated …

  • stored

  • made accessible

All in laying the ground for searching

Knowing what is under the hood


ToC

Definitions & terminology

Examples of vendors

Structure of records in databases

Indexes – as used in searching

Conclusion


1. Definitions & terminology

A few concepts that we are familiar with, but still worth revisiting


Definitions

Resource:

  • source of help: somebody or something that is a source of help or information

Information resource:

  • Generic: A broad range of sources of information in a variety of formats

  • The data and information assets of an organization, incl. a library

  • Databases, files, systems containing organized information records

    • Dialog, Google are inf. resources

Database (from Webopedia):

  • A collection of information organized in such a way that a computer program can quickly select desired pieces of data. You can think of a database as an electronic filing system.


Definitions (cont.)

From Webopedia again:

Traditional databases are organized by fields, records, and files.

  • A field is a single piece of information

  • A record is one complete set of fields

  • And a file is a collection of records.

  • E.g. a telephone book is analogous to a file. It contains a list of records, each of which consists of three fields: name, address, and telephone number

  • A catalog is a file. It contains a list of records (catalog entries) describing books in a library. Each record has fields, such as author, title, publisher, date, subject headings ….
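The field / record / file hierarchy can be sketched in a few lines of Python. This is only an illustration: the names `Record`, `phone_book` and `find_by_field`, and the sample entries, are invented here and do not come from Webopedia or from any real database.

```python
from dataclasses import dataclass, fields


@dataclass
class Record:
    """One record: a complete set of fields (hypothetical telephone-book entry)."""
    name: str
    address: str
    telephone: str


# A file is a collection of records. The entries are made up for illustration.
phone_book = [
    Record("Ada Example", "1 Main St", "555-0100"),
    Record("Ben Sample", "2 Oak Ave", "555-0101"),
]


def find_by_field(file, field_name, value):
    """Return the records whose named field equals the given value."""
    valid = {f.name for f in fields(Record)}
    if field_name not in valid:
        raise ValueError(f"unknown field: {field_name}")
    return [r for r in file if getattr(r, field_name) == value]


matches = find_by_field(phone_book, "name", "Ben Sample")
print(matches[0].telephone)
```

Even this toy lookup has to name a field, which is the point the next slides make about searching.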


On fields for searching

  • Records (documents, objects) used in information resources are always organized in fields

    • but different resources may and do use different sets of fields

    • metadata provides information ABOUT a record; used for instance in Web records; always organized in fields

  • Indexes used in searching are organized, divided by fields

  • Fields serve to guide, point out, or otherwise facilitate searching

  • Searching is automatically always done by fields, even if one does not know that or has no idea of fields

  • But more about fields later


Who provides inf. resources for searching?

  • Terminology as to who & what can be confusing & not consistent - so beware & do your own translation

    • Provider: a producer of databases; there are a great many providers covering many fields

      • e.g. Dept. of Education produces ERIC – a database of abstracts & indexes of educational materials (articles, reports)

    • Vendors or aggregators: organizations or companies that get databases from providers or set of sources like journals from publishers & organize them for searching; there is a large number of vendors

    • some providers are their own vendors:

      • e.g. Chemical Abstracts Service runs STN (Scientific & Technical Information Network)


2. Examples of vendors

Illustrates the ever changing information industry


Example of a vendor:

  • Dialog is the oldest on the market – started in 1972

  • Acquires databases from information providers

    • it has over 900 databases

  • Organizes content according to uniform structures

  • Describes the content

    • done in Bluesheets

      • a most important search tool for you!

  • Provides uniform & complex searching capabilities

    • geared toward professionals

      • you have to master them for effective searching

  • Creates some files of its own

    • e.g. super indexes such as Dialindex

  • Access

    • mostly through libraries & companies as subscribers

    • RUL does not have it, but in class you have free access


Story of Dialog: illustrative of turbulence in the inf. industry

1964 Roger Summit started Information Sciences Laboratory at Lockheed Missiles & Space Company

  • in the 1960’s developed Recon – online system for NASA (government contract)

1972 Summit convinced Lockheed of online commercial potential & it went public as Dialog

  • advent of online information industry

1981 became subsidiary of Lockheed

  • moved to Palo Alto, CA

1989 the company was sold to Knight-Ridder - had other inf. resources

  • incorporated DataStar, a European online company with 350 mostly European oriented databases - still there

1997 Dialog was bought by the U.K.-based M.A.I.D. Corp.

  • moved to Cary, N.C. – still there

2000 The Thomson Corporation (now Thomson Reuters) acquired Dialog

  • in 1992 Thomson bought ISI with citation indexes that became Web of Knowledge incl. Web of Science

Dialog was bought by ProQuest

  • ProQuest was Bell & Howell, also UMI, also University Microfilms …

  • has many inf. products & services

  • among them CSA another online vendor with over 100 databases

Still in business!


    BTW – why do we still teach Dialog?

    • Dialog is a legacy database – the granddaddy

      • some call it a dinosaur

    • So why do we use Dialog for exercises?

    • Several reasons:

      • oldest and largest surviving vendor

      • has by far the most comprehensive set of databases

      • has a well developed instructional program

    But most importantly:

    • serves as a good test bed to develop searching skills that are generalizable

      • learning what is under the hood of all databases

    • what you will systematically learn from using Dialog can be translated to all searching

      • & you get an insight into problems with searching


    Newest large database:

    • Scopus started in 2004 by Elsevier – a HUGE publisher

    • Very different from Dialog

      • integrates over 17,000 journals & other materials (has no separate databases, but could be searched by broad fields, type of materials, etc.)

    • Indexes all (or takes existing indexing for some)

    • Elsevier also has

      • Scirus – free science search engine

      • ScienceDirect – journals’ full texts, available at RUL under Indexes and databases

    • Provides intuitive searching

      • geared toward end user

      • also provides various other capabilities e.g. citation tracking

    • Most subscribers are libraries & companies

      • but through them access to end users

        • RUL subscribed, but dropped it

        • in class you have free access

    • Major competition to Web of Science (RUL has it)


    Types of information databases

    Many types are available:

    • Bibliographic

    • Numeric

    • Full text

    • Directory

    • Image

      • still, film, video

    • Sound

      • spoken word, music

    • Multimedia

    • Real time

    Some that are in Dialog are also available elsewhere or on their own

    Some vendors have exclusive right to some databases

    Many you find in RUL


    Other vendors/aggregators: sample from RUL 275 databases; links require RUL login

    Particularly related to LIS

    • ACM Digital Library

    • ASIST Digital Library

    • Computing Reviews

    • IEEE Xplore

    • Library, Information Science & Technology Abstracts (LISTA)

    • Library and Information Science Abstracts (LISA)

    • Library Literature and Information Science

    • Professional Development Collection

    • Resources for College Libraries (RCL)

    Various disciplines or areas

    • Agricola

    • America: History and Life

    • Business and Industry Database

    • Dissertations and Theses

    • Education Index/Abstracts/Full Text

    • Factiva

    • Hispanic-American Periodicals Index

    • LexisNexis Academic

    • Medline

    • Oceanic Abstracts

    • Pollution Abstracts

    • Women's Studies International


    A BIG, BIG problem

    • In Dialog & some other vendors you can search a number of databases at the same time

      • so called federated searching

        • in Dialog using file 411, Dialindex (get it: 411 … )

    • In Scopus you search the whole thing – if you wish

    • BUT in RUL & elsewhere there is no federated searching

      • you have to search each database separately

      • at RUL through Searchlight you can search 8 databases

        • others you have to search one at a time

      • someday there will be federated searching, but at present do not hold your breath


    as would imagine …


    3. Structure of records in databases

    Describing & organizing nature of content


    Now onto structures – getting under the hood

    • Databases structure their own records – documents, objects …

      • why? to describe various parts of content for computers to recognize – these are fields, as mentioned

        • you can recognize that a section of a document is a title, but a computer has to be told that a title is a title

          • so that it can (among others) search for terms in a title when you request so

    • Fields in records are labeled as to content or function

      • most fields in databases indicate the same content

        • e.g. title, author, index terms, abstract, text parts, source, …

      • but various databases do it in their own way

        • in whatever convoluted way they do it, it is not that hard to decipher


    Labeling schemes

    • Many structure schemes were developed that prescribed what to label & what to call the label – meta languages

      • by providers, vendors, organizations, authorities

      • in different subjects, domains

      • for different types of objects

    • Meta tags are used on the web – to describe & index

      • semantic web is in development, to further enable description of and searching for meaning

    • MARC is a form of meta language

    • To use these schemes for effective searching you have no choice but to get familiar


    Transparency of structures

    • In some databases description of structure is readily available

      • even though it may look forbidding, complicated

        • good example: Bluesheets in Dialog

        • search fields in Scopus

    • In others, structure is there but has to be discovered by surmising

      • even in and particularly in

    • But clever, appropriate use of structure in searching is key to effective searching


    Example: Dialog file 438 Bluesheet

    Describes the content of the file


    file 438 record & fields - each field is searchable, e.g. /TI=title; AU=author; SO=source; JN=Journal; …

    Indicates field & abbreviation


    Organization of indexes in Dialog: it has two kinds of indexes

    • Dialog has a Basic Index – searched by default

    • Entering a command s (or select) digital and libraries

      • finds all documents that have the term digital and the term libraries anywhere in the document

      • s digital and libraries/TI finds documents that have these terms in the title

    • Dialog has also Additional Indexes

      • these are for Authors (AU), Sources (SO) , Publication Years (PY) … & many more

      • searched as s (or select) digital and libraries and AU=Saracevic

    All other databases have similar arrangements of indexes; these are not as clearly visible as in Dialog, but they are searchable through selections
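The split between a default index and field-prefixed indexes can be sketched as follows. This is a toy model, not Dialog's implementation or command syntax: the records are invented, the functions `select_basic` and `select_additional` are hypothetical, and the choice of which fields feed the basic index (TI and AB here) is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical records; field labels follow the slide (TI, AB, AU, PY).
records = {
    1: {"TI": "Digital libraries in practice", "AB": "Survey of library systems",
        "AU": "Saracevic", "PY": "2001"},
    2: {"TI": "Evaluating search engines", "AB": "Digital collections and libraries compared",
        "AU": "Smith", "PY": "2003"},
    3: {"TI": "Digital archives", "AB": "Preservation of images",
        "AU": "Saracevic", "PY": "2005"},
}

BASIC_FIELDS = ("TI", "AB")        # assumption: fields that feed the basic index
ADDITIONAL_FIELDS = ("AU", "PY")   # assumption: fields searched with a prefix

basic = defaultdict(set)       # (word, field) -> record numbers
additional = defaultdict(set)  # (field, value) -> record numbers

for number, record in records.items():
    for field in BASIC_FIELDS:
        for word in record[field].lower().split():
            basic[(word, field)].add(number)
    for field in ADDITIONAL_FIELDS:
        additional[(field, record[field].lower())].add(number)


def select_basic(word, field=None):
    """Search the basic index; with no field given, every basic field is searched."""
    wanted = BASIC_FIELDS if field is None else (field,)
    hits = set()
    for f in wanted:
        hits |= basic.get((word.lower(), f), set())
    return hits


def select_additional(field, value):
    """Search an additional index, which always needs a field prefix."""
    return additional.get((field, value.lower()), set())


# Roughly: s digital and libraries
anywhere = select_basic("digital") & select_basic("libraries")
# Roughly: s digital and libraries/TI
title_only = select_basic("digital", "TI") & select_basic("libraries", "TI")
# Roughly: s digital and libraries and AU=Saracevic
with_author = anywhere & select_additional("AU", "Saracevic")
print(sorted(anywhere), sorted(title_only), sorted(with_author))
```

Boolean AND is modeled as set intersection, so each added restriction can only shrink the result set.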


    file 438: searching in Basic Index - it is searched by default

    Examples of how to search in the Basic Index by words & other fields

    S means select command; W means with – terms next to each other in that order


    file 438: fields in Additional Indexes

    Additional index is searched by indicating the field to be searched – examples of how to search them

    Neat trick:

    If you want to search the latest update only, add to search UD=9999


    file 438: fields in Limit

    Searches can be limited to cover documents with given attributes – examples of how to limit searches

    S2 means set 2 as retrieved previously


    file 438: additional uses of structure

    Results can be sorted or ranked by given fields – examples of how to sort or rank results


    file 438: options in displaying of results

    Results can be displayed & then printed in a number of ways – examples of available formats

    But watch out! In real life some formats are free, others cost $$$$!


    Economics – tail that wags the whole dog

    • In class Dialog searching is free

      • & you can use it for class exercises & learning

    • In real life Dialog (as every other vendor) has an elaborate economic structure

      • different files have different price tags for use

      • time of use is calculated in DialUnits

        • a Byzantine structure of charges - it is beyond understanding

      • in different files different formats have different prices attached

        • full formats in some files are really hefty!


    Where to find all about structure?

    • In Dialog in BlueSheets (file 415)

      • consult often! and again! and again! and again!

      • files have similarities and differences in structure – BlueSheets show that

    • For other vendors:

      • some have descriptions similar to BlueSheets

      • some indicate fields that can be searched

        • it shows structure

      • in some, revelation comes from checking what is available in advanced searching or in tips for searching

      • in some, structure has to be surmised


    Structure in search engines & databases

    • Mostly not readily apparent

      • but all have capabilities to be used in searching

    • Again: revelation comes from checking what is available in Advanced Search, Search Features, Search Tips, Help, & the like

    • Most users do NOT take advantage of using available structures in searching

      • professional searchers do

        • part of their tool kit & competencies


    Example: structure from Advanced Search

    Records are structured & can be searched by these fields & topics

    More fields available


    Example of structure from Scopus (features)

    More choices

    More fields available

    Records are structured & can be searched by additional 10 or so pull down fields

    Subjects areas choices


    Example of structure from Library Literature & Information Science Full Text (at RUL)

    Records are structured & can be searched by additional 20 or so pull down fields

    More fields available


    Similarities & differences

    • All vendors & search engines have basic & advanced Boolean-type search capabilities

      • but how it is done & bells and whistles differ

      • once you master concepts you can then do an AHA! when you encounter a variation & then translate

    • Many vendors & search engines have advanced search features

      • many above & beyond Boolean

    • All vendors rank output results

      • but how it is done differs

      • by default most (Dialog, Scopus & most others) use LIFO – Last in First Out

        • but also allow for a number of other ways, e.g. by source

    • Search engines use ranking by relevance, clustering, PageRank & other criteria

      • proprietary – they do not tell you about it - not easy to discern
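As a small illustration of the LIFO default versus a user-chosen sort, the sketch below orders a made-up result set two ways. The tuples and the assumption that a higher accession number means a later addition are invented for the example; no vendor's actual ranking code is shown.

```python
# Hypothetical hits: (accession number, source, title). A higher accession
# number is assumed to mean the record was added to the database later.
hits = [
    (1001, "Journal B", "Older article"),
    (1003, "Journal A", "Newest article"),
    (1002, "Journal C", "Middle article"),
]

# Default display order: last in, first out.
lifo = sorted(hits, key=lambda hit: hit[0], reverse=True)

# An alternative order the searcher can request, here by source name.
by_source = sorted(hits, key=lambda hit: hit[1])

print([hit[0] for hit in lifo])
print([hit[1] for hit in by_source])
```

Relevance ranking in search engines replaces the sort key with a computed score, which is the part the engines keep proprietary.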


    Similarities & differences …

    • Most users

      • do not know or care about structure

      • do not search beyond default capabilities

      • do not look beyond one or two pages of results

      • miss many potentially relevant results

      • do not know what is under the hood

      • can’t do advanced – more sophisticated – searching

    • Professional searchers

      • know that structure is very much connected to searching

      • learn about & use available structures

      • understand defaults & use advanced capabilities as necessary

      • know “tricks” for not missing stuff or not getting too much or too much junk

      • explore in order to learn what is under the hood


    4. Indexes

    As used in searching


    We all know what an index is, but to refresh

    An index is a list of words and associated pointers to where those words can be found in a document

    Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval

    - example of automatic indexing

    • Many kinds of indexes e.g.

      • back of the book index, alphabetical, subject, classified, faceted, …

    • As to creation:

      • manual, automatic,

      • today trend is toward automatic creation of indexes

        • by means of computer algorithms to select words or phrases to identify content

    Here we deal with index structures & in next lecture we deal with indexing vocabularies


    Inverted indexes

    • All databases have some kind of inverted index

      • searching is done through them

        Inverted index:

          An index containing terms, as keys, mapped to references to the documents they appear in. The index is sorted by its keys. “Inverted” means that the documents are found by matching on terms, rather than the other way around.

        From Apple Glossary

    • End of the book index is an inverted index

    • First inverted indexes were made in the 13th century

      • concordance of the Bible

        • a concordance is an alphabetical list of the principal words used in a book or body of work, with their position indicated as immediate context

    • In contrast, sequential index is a full index for each document – one by one
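The contrast between a sequential index and an inverted index can be sketched as a simple transformation. The document numbers and terms below are made up, and `invert` is a hypothetical helper, not part of any database product.

```python
from collections import defaultdict

# Sequential index: each document number lists its own terms (made-up data).
sequential_index = {
    101: ["digital", "library", "search"],
    102: ["library", "catalog"],
    103: ["digital", "search"],
}


def invert(sequential):
    """Turn document -> terms into term -> sorted document numbers."""
    inverted = defaultdict(set)
    for doc_no, terms in sequential.items():
        for term in terms:
            inverted[term].add(doc_no)
    # Sort by key, as the definition above says, and sort each posting list.
    return {term: sorted(docs) for term, docs in sorted(inverted.items())}


inverted_index = invert(sequential_index)
print(list(inverted_index))
print(inverted_index["digital"])
```

Looking up a term is now a single dictionary access instead of a scan through every document's term list.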


    Making & searching of inverted indexes

    • Inverted indexes can be made from regular sequential indexes for every document

    • But also from regular texts

      • abstracts and full texts

    • Automatic indexes are made from texts – now easily

      • following given algorithms

      • omitting “stop” words

        • Dialog has 9: AN, FOR, THE, AND, FROM, TO, BY, OF, WITH

    • Searching is then done on the inverted index

      • so it is useful to understand the structure

        • for a document every word is identified as where it appears in text

        • search looks for appearance

          e.g. if “digital” is in position 8 in sentence 10 & “library” is in position 9 in sentence 10, then in a search for “digital library” the algorithm checks whether the positions of the terms “digital” & “library” are next to each other in the same sentence, finds that they are & retrieves the document as a hit
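The adjacency check described above can be sketched with a positional index. Several things here are assumptions for illustration: the two documents are invented, sentences are split naively on punctuation, and stop words are assumed to keep their position in the sentence while being left out of the index. The functions `build_positional_index` and `adjacent` are hypothetical names.

```python
import re
from collections import defaultdict

# The nine Dialog stop words listed above.
STOP_WORDS = {"an", "for", "the", "and", "from", "to", "by", "of", "with"}


def build_positional_index(documents):
    """Map term -> doc number -> list of (sentence number, word position)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_no, text in documents.items():
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        for sent_no, sentence in enumerate(sentences, start=1):
            words = re.findall(r"[a-z0-9]+", sentence.lower())
            for position, word in enumerate(words, start=1):
                # Stop words keep their slot in the count but are not indexed.
                if word not in STOP_WORDS:
                    index[word][doc_no].append((sent_no, position))
    return index


def adjacent(index, first, second):
    """Documents where `second` directly follows `first` in the same sentence."""
    hits = []
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    for doc_no in sorted(first_postings.keys() & second_postings.keys()):
        following = set(second_postings[doc_no])
        if any((sent, pos + 1) in following for sent, pos in first_postings[doc_no]):
            hits.append(doc_no)
    return hits


# Made-up documents for illustration.
documents = {
    1: "Searching is a skill. The digital library is open to all.",
    2: "A library of digital images. Digital tools help the library.",
}
index = build_positional_index(documents)
print(adjacent(index, "digital", "library"))
```

Document 2 contains both terms but never side by side in that order, so only document 1 is retrieved.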


    Inverted indexes

    Useful to know how they function to understand search & retrieval. Steps:

    • Each document is indexed

      • every word in a document is taken as index term with exception of stop words

      • position in text is noted

    • Indexes for all documents are merged

      • index terms are arranged alphabetically in the bowels of the system

        • under each index term are document numbers in which it appears & position in text for that document


    Example on creating an inverted index (from Walker & Janes, 1999)

    Four documents: 101, 102, 103, 104

    Fields: TI=Bold; AB=text; DE=descriptor


    Terms for each document – after stop words eliminated


    Inverted index – a few last terms after letter R are missing, no space on page

    Columns: Terms, Doc no., Field, Position
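The two steps (index each document, then merge and alphabetize) can be sketched so that the output has the same four columns. The two documents below are invented stand-ins, not the Walker & Janes example, and `index_document` and `merge` are hypothetical helper names. Position is counted per field, which is an assumption.

```python
STOP_WORDS = {"an", "for", "the", "and", "from", "to", "by", "of", "with"}

# Two made-up documents with the three fields named above (TI, AB, DE).
documents = {
    101: {"TI": "Searching the web",
          "AB": "Methods for searching databases",
          "DE": "online searching"},
    102: {"TI": "Database design",
          "AB": "Design of the records",
          "DE": "databases"},
}


def index_document(doc_no, record):
    """Step 1: list (term, doc no., field, position) for one document."""
    entries = []
    for field, text in record.items():
        for position, word in enumerate(text.lower().split(), start=1):
            if word not in STOP_WORDS:
                entries.append((word, doc_no, field, position))
    return entries


def merge(per_document_entries):
    """Step 2: merge all documents' entries and arrange terms alphabetically."""
    merged = [entry for entries in per_document_entries for entry in entries]
    return sorted(merged)


inverted = merge(index_document(n, rec) for n, rec in documents.items())
for term, doc_no, field, position in inverted:
    print(f"{term:<10} {doc_no}  {field}  {position}")
```

Because the field travels with every entry, a title-only search is just a filter on the field column of the same index.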


    In conclusion

    Searching is more art than science, but an art that needs a lot of knowledge of what is behind it


    Thanks
