
Structured Queries and Boolean Operators for Effective Information Retrieval

Learn how to construct structured queries using connectors and Boolean operators to retrieve relevant information from databases. Explore the use of proximity and set handling to refine search results. Understand the importance of database structure and the blue sheet in optimizing search queries.

ftiedemann




Presentation Transcript


  1. LIS618 lecture 5 Thomas Krichel 2003-02-26

  2. structure • operations on any file (database) • connectors • booleans • set handling • display • file-specifics • bluesheet • Introduction to structured queries

  3. Use of connectors • Connectors are used to put several words together. • One instance where this is useful is when you have words that on their own mean different things. • For example "mate" is a herbal beverage consumed in South America. Looking for mate on the Internet retrieves a lot of singles' pages.

  4. example: terms related to "mate" What other terms could be used? • matear (drink mate) • matero (mate drinker) • cebar (prepare mate) • cebador (mate preparer) • yerba (mate herb) • bombilla (mate straw)

  5. connectors I • '(W)' requires the terms to appear next to each other, in the order given, e.g. 'yerba(W)mate?' matches "yerba mate". • '(i W)', where i is an integer, means the first term is followed by the second with at most i words in between, e.g. 'ceba?(3W)mate?' matches "cebar un maravilloso mate" but not "cebador guapo mirando un buen mate".
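
The ordered connector can be sketched in a few lines of Python. This is only an illustration of the matching rule on the slide, not how the search system implements it. The function names term_matches and w_connector are hypothetical, and a trailing '?' is assumed to mean truncation (any ending), as the slide's examples suggest.

```python
def term_matches(pattern, word):
    """Match one query term against one word; a trailing '?' is assumed to truncate."""
    pattern, word = pattern.lower(), word.lower()
    if pattern.endswith("?"):
        return word.startswith(pattern[:-1])
    return word == pattern


def w_connector(first, second, text, i=0):
    """True if `first` is followed by `second` with at most `i` words in between."""
    words = text.split()
    for pos, word in enumerate(words):
        if not term_matches(first, word):
            continue
        # Only the next i + 1 words after the first term may hold the second term.
        for later in words[pos + 1 : pos + 2 + i]:
            if term_matches(second, later):
                return True
    return False


print(w_connector("yerba", "mate?", "yerba mate"))  # True
print(w_connector("ceba?", "mate?", "cebar un maravilloso mate", i=3))  # True
print(w_connector("ceba?", "mate?", "cebador guapo mirando un buen mate", i=3))  # False
```

With i=0 the function behaves like a plain '(W)': the second term has to be the very next word.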

  6. connectors II • '(N)' requires the terms to be next to each other, in either order, e.g. 'yerba(N)mate?' matches "yerba mate" or "mate yerba". • '(i N)', where i is an integer, means the terms are within at most i words of each other, in either order, e.g. 'ceba?(3N)mate?' matches "cebar mate" or "matear con la cebadora". • '(S)' searches for the occurrence of connected terms in the same paragraph.
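
A minimal Python sketch of the unordered connector, assuming that a trailing '?' truncates the term and that "within i words" means at most i words between the two terms. The names term_matches and n_connector are hypothetical; the search system's real matching is not shown in these slides.

```python
def term_matches(pattern, word):
    """Match one query term against one word; a trailing '?' is assumed to truncate."""
    pattern, word = pattern.lower(), word.lower()
    if pattern.endswith("?"):
        return word.startswith(pattern[:-1])
    return word == pattern


def n_connector(a, b, text, i=0):
    """True if `a` and `b` occur with at most `i` words between them, in either order."""
    words = text.split()
    pos_a = [k for k, w in enumerate(words) if term_matches(a, w)]
    pos_b = [k for k, w in enumerate(words) if term_matches(b, w)]
    # The order does not matter, so only the absolute distance is compared.
    return any(p != q and abs(p - q) <= i + 1 for p in pos_a for q in pos_b)


print(n_connector("yerba", "mate?", "mate yerba"))  # True
print(n_connector("ceba?", "mate?", "matear con la cebadora", i=3))  # True
print(n_connector("ceba?", "mate?", "cebador guapo mirando un buen mate", i=3))  # False
```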

  7. using Boolean operators • In your query, you can combine several expressions with Boolean operators • Example: "S LIBRARY(W)SCHOOL? AND DISTANCE(W)EDUCATION" • But I usually do not issue such fancy queries.

  8. executing several searches • Several searches can be done sequentially, and the result sets are saved by the system. • Each time, the system assigns a set number, Si. • These can be combined in Boolean expressions, e.g. 's S1 or S2 and S3'. • Remember that Boolean operations are set-theoretic!

  9. Boolean operators on sets • When using Booleans, be aware that "and" has higher precedence than "or". • Thus "a or b and c" is not the same as "(a or b) and c"; it is "a or (b and c)". • Use parentheses when in doubt.
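
The precedence rule can be checked with ordinary Python sets, whose operators follow the same convention: & (intersection, "and") binds tighter than | (union, "or"). The three result sets below are made-up record numbers, used only for illustration.

```python
# Three hypothetical result sets, each holding record numbers.
s1 = {1, 2, 3}
s2 = {3, 4}
s3 = {4, 5}

default = s1 | s2 & s3      # no parentheses: "s1 or s2 and s3"
explicit = s1 | (s2 & s3)   # what the unparenthesised form actually means
grouped = (s1 | s2) & s3    # a different query altogether

print(default == explicit)  # True
print(sorted(default))      # [1, 2, 3, 4]
print(sorted(grouped))      # [4]
```

The two readings differ by three records here, which is why parentheses are worth typing whenever "or" and "and" appear in the same expression.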

  10. DS (display sets) • This command can be executed any time to review the sets that have been formed since the last B (begin) command. • This can be useful to review your search history.

  11. the target command • "target set", where set is a search result set, creates a subset of the "statistically most relevant results" in the original set. • I have not seen details about how this subset is computed. • A new result set is formed.

  12. display: the type command • type set/format/range • set is a result set • format is a delivery format • range can be either start – end, where start is the first record number and end is the last, or all
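
To make the three parts of the command concrete, here is a small Python sketch that splits a type command into set, format and range. It assumes the parts are separated by "/" exactly as written on the slide and that a range is either "start-end" or "all". The function name parse_type and the example commands are hypothetical.

```python
def parse_type(command):
    """Split 'type set/format/range' into (set, format, range)."""
    verb, _, argument = command.strip().partition(" ")
    if verb.lower() != "type":
        raise ValueError("not a type command")
    # Exactly three slash-separated parts are expected; anything else raises ValueError.
    result_set, fmt, rng = (part.strip() for part in argument.split("/"))
    if rng.lower() == "all":
        records = "all"
    else:
        start, _, end = rng.partition("-")
        records = (int(start), int(end))
    return result_set, fmt, records


print(parse_type("type s1/5/1-3"))  # ('s1', '5', (1, 3))
print(parse_type("type s2/9/all"))  # ('s2', '9', 'all')
```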

  13. standard delivery formats • 2 – full record except abstract • 3 or medium – citation • 5 or long – full except full text • 6 or free – title and dialog number • 8 or short – title plus indexing terms (useful to find other indexing terms) • 9 or full – everything • KWIC or K – keywords in context

  14. options for delivery • I once tried to email results to myself, to no avail. • You can save the html of the search results in the browser. • You can print the results within the browser.

  15. Looking at database structure • Up until now, we have looked at commands that take a full-text view of the database. • Such commands can be executed for every database. • If we want to make more precise queries, we have to take account of database structure.

  16. blue sheet • Each database name is linked to a blueish pop-up window for that database. • This is called the bluesheet. • It contains the details of the database.

  17. closer look at the bluesheet • file description • subject coverage (free vocabulary) • format options, lists all formats: by number (internal) and by dialog web format (external, i.e. cross-database) • search options: basic index, i.e. subject contents, and additional index, i.e. non-subject

  18. basic vs additional index • the basic index has information that is relevant to the substantive contents of the data; it is usually indexed by word, i.e. connectors are required to search for a phrase • the additional index has data that is not relevant to the substantive matter; it is usually indexed by phrase, i.e. connectors are not required

  19. search options: basic index • select without qualifiers searches in all fields in the basic index • the bluesheet lists the field indicators available for a database • Also note whether a field is indexed by word or by phrase. Proximity searching only works with word indices. When phrases are indexed, you don't need proximity indicators.
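
The difference between the two kinds of index can be sketched in Python. This is a toy model under stated assumptions, not the database's actual index layout: word_index, phrase_index and adjacent are hypothetical names, and the title and source values are invented sample data.

```python
from collections import defaultdict


def word_index(records):
    """Index every word separately, remembering its position in the field."""
    index = defaultdict(list)
    for rec_id, text in records.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((rec_id, pos))
    return index


def phrase_index(records):
    """Index the whole field value as one entry."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        index[text.lower()].add(rec_id)
    return index


def adjacent(index, first, second):
    """Records where `second` directly follows `first`, i.e. first(W)second."""
    following = {(rec, pos + 1) for rec, pos in index[first]}
    return sorted({rec for rec, pos in index[second] if (rec, pos) in following})


titles = {1: "distance education", 2: "education at a distance"}
sources = {1: "mate studies quarterly", 2: "mate studies quarterly"}

words = word_index(titles)
phrases = phrase_index(sources)

print(adjacent(words, "distance", "education"))     # [1]
print(sorted(phrases["mate studies quarterly"]))    # [1, 2]
print("mate" in phrases)                            # False
```

In the word index a phrase has to be rebuilt from word positions, which is the job of the connectors. In the phrase index the full value is looked up directly, and a single word out of it finds nothing.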

  20. http://openlib.org/home/krichel Thank you for your attention!
