LIS618 lecture 1

Learn about different search strategies, such as the building block approach, snowballing approach, and successive fraction approach, for effective online information retrieval. Understand how to use Boolean operators and truncation techniques to refine your queries.

Presentation Transcript


  1. LIS618 lecture 1 Thomas Krichel 2004-02-01

  2. structure of talk • Recap on Boolean (aurally) • Before online searching • Working with DIALOG • Overview • Search command • Boolean exercise (on the fly)

  3. before a search I • What is the purpose of the query? • brief overview • comprehensive search • What perspective on the topic is required? • scholarly • technical • business • popular

  4. before search II • What type of information does the patron want? • fulltext • bibliographic • directory • numeric • Are there any known sources? • authors • journals • papers • conferences

  5. before search III • What are the language restrictions? • What, if any, are the cost restrictions? • How current does the data need to be? • How much of each record is required?

  6. concept analysis • This is the art/science of taking the search topic and developing facets from it. Example: “Internet filtering in Libraries” • Internet filter • Libraries • Controversy, not technical issues • We may also need to think about the aim of the search.

  7. search aims • a known needle in a known haystack • a known needle in an unknown haystack • an unknown needle in an unknown haystack • any needle in a haystack • the sharpest needle in a haystack • most of the sharpest needles in a haystack

  8. search aims • all the needles in a haystack • affirmation of no needles in a haystack • things like needles in a haystack • is there a new needle in the haystack • where are the haystacks • needles, haystacks, anything

  9. types of searches • known-item searches • negative searches • selective dissemination of information • topical or subject searches • passage searching, where the user is only interested in part of the item

  10. search strategies I • Building block approach • Do a number of elementary searches • Combine the resulting sets with Boolean operators • This is what I did in the example in the previous lecture • Works only with the Boolean model
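
The building block approach can be sketched with ordinary Python sets. This is only an illustration: the record numbers are made up, the three facets are borrowed from the concept analysis example, and `combine_and` is a hypothetical helper, not a DIALOG command.

```python
# Hypothetical result sets: record numbers returned by three elementary searches.
internet_filter = {1, 2, 3, 5, 8}
libraries = {2, 3, 5, 7, 9}
controversy = {3, 5, 9, 11}


def combine_and(*result_sets):
    """Intersect all sets: keep only records that match every building block."""
    combined = set(result_sets[0])
    for result_set in result_sets[1:]:
        combined &= result_set
    return combined


final_set = combine_and(internet_filter, libraries, controversy)
print(sorted(final_set))
```

Each elementary search stays available as its own set, so a facet can be dropped or replaced without redoing the others.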

  11. search strategies II • Snowballing approach • Start with a very specific query • Think of other terms that can be added to get more results • Stop when a reasonable number of results is reached. • Not sure this really works well in practice.

  12. search strategies III • The successive fraction approach is the opposite of the snowballing approach • First search for a broad concept • Then repeat the query, adding various limiting factors. • Can work well if the IR system allows you to repeat and edit queries. • But queries can become unwieldy.

  13. search strategies IV • Most specific facet first • Conduct concept analysis • Look for the most specific facet • Search that first, add others later • Presupposes that you have done a decent concept analysis.

  14. two steps in DIALOG • step one: select databases (aka files) to look at • step two: perform searches on the selected databases • You may wonder why one does not have one single step like in a search engine. Discuss. • today we concentrate on the second step

  15. working on selected files • We assume that we have selected a database that we know, and we look at the search interface on that database. • The database selection process is a bit more complicated and is covered next week. • First, let us log in and look at the command prompt. • Then we select the first database (file) with the begin command

  16. the ‘begin’ command • As its name suggests, usually the first command. • "begin number, number,…" • selects the files with the given numbers • Once they are selected they can be searched. • Now select the ERIC database with "begin 1" • "Begin 1" can be abbreviated as "b 1"

  17. substeps in the second step • Identify search terms • Use Dialog basic commands to conduct a search • View records online or print the results

  18. the 's' (select) command • Once we have issued the "begin" command to select a database, we issue the "s" command on the database. • "s query_expression" where query_expression is a query expression. • This will search the index of the selected database in full-text view for the query issued • It will not find any of the following: "an and by for from of the to with". They are stop words.
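
A minimal sketch of what the stop word list means for a query, assuming simple whitespace tokenization. The function name `indexable_terms` is hypothetical; only the stop word list itself comes from the slide.

```python
# Stop words listed on the slide; they are not searchable.
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}


def indexable_terms(text):
    """Return the lower-cased words of text that are not stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


print(indexable_terms("Internet filtering and the controversy of libraries"))
```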

  19. query expression • A query expression contains search terms expressed in special ways • You can truncate search terms. • You can build an elementary expression by putting several keywords together. This is achieved by DIALOG's connectors. • You can combine several expressions with the use of Boolean operators • We will cover these in turn now.

  20. truncation of terms I • Open Truncation • "select path?" retrieves all words that begin with path: paths, pathos, pathway, pathology • Controlled-Length Truncation • "select path??" retrieves the root and up to two additional characters: paths, pathos

  21. truncation of terms II • Embedded Character truncation can be used for variant spellings: • "select organi?ation" -> organization organisation  • "select fib??board" -> fiberboard fibreboard  • This truncation feature is also useful for searching for unusual plural forms: • "select wom?n" -> woman women • Apparently you can also do prefixes by putting the ? in the beginning. • "?mobile" -> automobile metamobile
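
The truncation rules on the last two slides can be approximated with regular expressions. This is a sketch of the behavior as the slides describe it, not of how DIALOG implements it. The function name `dialog_term_to_regex` and the choice of `\w` for "one character" are assumptions.

```python
import re


def dialog_term_to_regex(term):
    """Translate a truncated search term into a compiled regex (sketch only).

    Assumed rules, following the slides:
      trailing "?"             -> any number of extra characters
      trailing "??" (or more)  -> up to that many extra characters
      embedded "?"             -> exactly one character each
      leading "?"              -> any prefix
    """
    n_lead = len(term) - len(term.lstrip("?"))
    n_trail = len(term) - len(term.rstrip("?"))
    core = term.strip("?")

    prefix = r"\w*" if n_lead else ""
    if n_trail == 0:
        suffix = ""
    elif n_trail == 1:
        suffix = r"\w*"
    else:
        suffix = r"\w{0," + str(n_trail) + "}"

    # Inside the core, each "?" stands for exactly one character.
    body = "".join(r"\w" if ch == "?" else re.escape(ch) for ch in core)
    return re.compile(prefix + body + suffix, re.IGNORECASE)


words = ["path", "paths", "pathos", "pathway", "pathology"]
pattern = dialog_term_to_regex("path??")
print([w for w in words if pattern.fullmatch(w)])
```

Using `fullmatch` matters here: the pattern has to cover the whole word, otherwise "path??" would also match the beginning of "pathway".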

  22. use of connectors • Connectors are used to put several words together. • One instance where this is useful is when you have words that on their own mean different things. • For example "mate" is a herbal beverage consumed in South America. Looking for mate on the Internet retrieves a lot of singles' pages.

  23. example: terms related to "mate" • What other terms could be used? • matear (drink mate) • matero (mate drinker) • cebar (prepare mate) • cebador (mate preparer) • yerba (mate herb) • bombilla (mate straw)

  24. connectors I • '(W)' requires terms to appear next to each other, in the given order, e.g. 'yerba(W)mate?' matches "yerba mate". • '(i W)', where i is an integer, means followed within at most i intervening words, e.g. 'ceba?(3W)mate?' matches "cebar un maravilloso mate" but not "cebador guapo mirando un buen mate"

  25. connectors II • '(N)' requires terms to be next to each other, in either order, e.g. 'yerba(N)mate?' matches "yerba mate" or "mate yerba". • '(i N)', where i is an integer, means within at most i intervening words, in either order, e.g. 'ceba?(3N)mate?' matches "cebar mate" or "matear con la cebadora". • '(S)' searches for the occurrence of connected terms in the same paragraph.
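
The difference between the (W) and (N) connectors can be sketched as a word-distance check. The helper names `term_matches` and `proximity` are hypothetical, the text is split on whitespace only, and only open truncation (a single trailing "?") is handled. The (S) connector is not covered.

```python
def term_matches(pattern, word):
    """Match a word against a term; a trailing '?' means open truncation."""
    if pattern.endswith("?"):
        return word.startswith(pattern[:-1])
    return word == pattern


def proximity(text, left, right, max_gap=0, ordered=True):
    """True if left and right occur with at most max_gap words between them.

    ordered=True mimics (iW): left must come before right.
    ordered=False mimics (iN): either order is accepted.
    """
    words = text.lower().split()
    for i, first in enumerate(words):
        if not term_matches(left, first):
            continue
        for j, second in enumerate(words):
            if i == j or not term_matches(right, second):
                continue
            gap = abs(j - i) - 1  # number of words in between
            if gap <= max_gap and (j > i or not ordered):
                return True
    return False


# 'ceba?(3W)mate?' on the two examples from the slides
print(proximity("cebar un maravilloso mate", "ceba?", "mate?", max_gap=3))
print(proximity("cebador guapo mirando un buen mate", "ceba?", "mate?", max_gap=3))
# 'ceba?(3N)mate?' accepts the reverse order
print(proximity("matear con la cebadora", "ceba?", "mate?", max_gap=3, ordered=False))
```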

  26. using Boolean operators • In your query, you can combine several expressions with Boolean operators • Example: "S LIBRARY(W)SCHOOL? AND DISTANCE(W)EDUCATION" • But I usually do not issue such fancy queries.

  27. executing several searches • Several searches can be done sequentially, and the result sets are saved by the system. • Each time, the system assigns a set number, Si. • These can be combined in Boolean expressions, e.g. 's S1 or S2 and S3' • Remember that Boolean operations are set-theoretic!

  28. Boolean operators on sets • When using Booleans, be aware that "and" has higher precedence than "or". • Thus "a or b and c" is not the same as "(a or b) and c"; it is "a or (b and c)". • Use parentheses when in doubt
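
The precedence rule can be checked with Python sets, where `&` (intersection) binds tighter than `|` (union) in the same way the slide describes for "and" and "or". The sets a, b and c hold made-up record numbers.

```python
# Hypothetical result sets of record numbers.
a = {1, 2}
b = {2, 3}
c = {3, 4}

unparenthesized = a | b & c   # evaluated as a | (b & c)
and_first = a | (b & c)
or_first = (a | b) & c

print(sorted(unparenthesized))
print(sorted(and_first))
print(sorted(or_first))
```

The first two results agree and the third differs, which is why parentheses are the safe choice whenever "or" is meant to apply first.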

  29. DS (display sets) • This command can be executed any time to review the sets that have been formed since the last B (begin) command. • This can be useful to review your search history.

  30. the target command • "target set", where set is a search result set, creates a subset of the "statistically most relevant results" in the original set. • I have not seen details about how this subset is computed. • A new result set is formed.

  31. display: the type command • "type set/format/range" • set is a result set • format is a format • range can be • start – end • start is a record number to start • end is a record number to end • all
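
To make the three slash-separated parts concrete, here is a small sketch that splits a command string of the form shown on the slide. `parse_type_command` is a hypothetical helper written for illustration. It assumes the range is written either as "start-end" with a plain hyphen or as "all".

```python
def parse_type_command(command):
    """Split 'type set/format/range' into its parts (illustrative sketch)."""
    verb, _, args = command.strip().partition(" ")
    if verb.lower() != "type":
        raise ValueError("not a type command: " + command)

    set_name, record_format, record_range = args.strip().split("/")
    if record_range.lower() == "all":
        parsed_range = None  # None stands for every record in the set
    else:
        start, end = record_range.split("-")
        parsed_range = (int(start), int(end))

    return {"set": set_name, "format": record_format, "range": parsed_range}


print(parse_type_command("type s1/5/1-3"))
print(parse_type_command("type s2/full/all"))
```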

  32. standard delivery formats • 2 -- full record except abstract • 3 or medium – citation • 5 or long – full except full text • 6 or free – title and dialog number • 8 or short – title plus indexing terms • useful to find other indexing terms • 9 or full – everything • KWIC or K – keywords in context

  33. options for delivery • I once tried to email results to myself, to no avail • You can save the HTML of the search results in the browser. • You can print the results within the browser.

  34. http://openlib.org/home/krichel Thank you for your attention!

  35. to do: set up consistent notation