
University of Palestine

University of Palestine. Topics In CIS, ITBS 3202. Ms. Eman Alajrami. 2nd Semester 2008-2009. Chapter 4: Query Languages. What is a Query? A query is a representation of the user’s information needs. It is composed of keywords, and documents containing those keywords are searched for.



  1. University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2nd Semester 2008-2009

  2. CHAPTER 4 QUERY LANGUAGES

  3. What is a Query? • A query is a representation of the user’s information needs. • It is composed of keywords, and documents containing those keywords are searched for. • A query may not represent the information needs exactly because: • Information needs are difficult to describe (semantic difficulty). • The query must be in a format acceptable to the retrieval system (syntactic difficulty). • A query can be a single word or a combination of several words.

  4. Types of Queries • Single-Word Queries. • Context Queries. • Boolean Queries. • Natural Language Queries.

  5. Single-Word Queries • The elementary query in a text retrieval system is a word. • A word is a sequence of letters surrounded by separators, where the alphabet is split into ‘letters’ and ‘separators’. • The choice of what is a letter and what is a separator is left to the manager of the text database.

  6. Single-Word Queries Cont… Example: in the word ‘On-line’, the hyphen is not a letter, but it does not split the word.
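The letter/separator split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_words` and the `extra_letters` parameter are assumptions made for this example, standing in for whatever the manager of the text database configures.

```python
def split_words(text, extra_letters="-"):
    """Split text into words.

    Alphanumeric characters count as letters. Characters listed in
    extra_letters (hypothetical setting) are not letters in the strict
    sense, but they do not split a word either.
    """
    words, current = [], []
    for ch in text:
        if ch.isalnum() or ch in extra_letters:
            current.append(ch)
        elif current:
            # A separator ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Hyphen kept inside the word.
print(split_words("An on-line system"))
# Hyphen treated as an ordinary separator.
print(split_words("An on-line system", extra_letters=""))
```

With the default setting ‘on-line’ stays one word; with an empty `extra_letters` it becomes the two words ‘on’ and ‘line’.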

  7. Single-Word Queries Cont… • The division of the text into words is not arbitrary, because words carry much of the meaning of the text. • For that reason many models (e.g. the vector model) are built entirely on the concept of words, and words are the only type of query they allow. • The result of a word query is the set of documents containing at least one of the words in the query, and the resulting documents are ranked according to their degree of similarity to the query.

  8. Context Queries • Many systems can search for words in a given context, that is, near other words. • For example: an occurrence of ‘network’ near ‘computer’ or ‘computer science’ is likely to be relevant to computing.

  9. Types of Context Queries • Phrase: • A phrase is a sequence of single-word queries. • Example: search for the word ‘enhance’ followed by the word ‘retrieval’. • The separators in the text need not be the same as those in the query (e.g. two spaces versus one space). • If the system also ignores uninteresting words such as ‘the’, the previous example could match a text such as ‘…enhance the retrieval…’.
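A minimal sketch of phrase matching along these lines, in Python. The function name `find_phrase` and the `stopwords` parameter are assumptions for illustration; real systems normally answer phrase queries from an index rather than by scanning the text.

```python
import re


def find_phrase(text, phrase, stopwords=frozenset()):
    """Return True if the words of phrase occur consecutively in text.

    Separators are ignored, so two spaces match one space. Words listed
    in stopwords (hypothetical setting) are dropped from both sides
    before comparing.
    """
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in stopwords]
    target = [w for w in re.findall(r"\w+", phrase.lower()) if w not in stopwords]
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


text = "... enhance the retrieval ..."
print(find_phrase(text, "enhance retrieval"))                     # 'the' gets in the way
print(find_phrase(text, "enhance retrieval", stopwords={"the"}))  # 'the' is ignored
```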

  10. Types of Context Queries Cont... • Proximity: • A sequence of single words or phrases is given together with a maximum allowed distance between them. • For example: if ‘enhance’ and ‘retrieval’ must occur within four words of each other, a match could be ‘… enhance the power of retrieval …’. • The distance may be measured in characters or in words, depending on the system.

  11. Types of Context Queries Cont… • Depending on the system, the words and phrases may or may not be required to appear in the same order as in the query. • Phrase and proximity queries can be ranked in the same way as single-word queries if the ranking technique does not depend on physical proximity. • Physical proximity still carries meaning, because in most cases it means that the words are in the same paragraph and are therefore related.
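The proximity check can be sketched as follows. This is an illustration under stated assumptions: distance is measured in words, as the difference between word positions, and the name `within_distance` and its `ordered` flag are invented for this example.

```python
import re


def within_distance(text, first, second, max_words, ordered=True):
    """Return True if first and second occur at most max_words apart.

    Distance is the difference between word positions (an assumption;
    some systems count characters instead). With ordered=False the two
    words may appear in either order.
    """
    words = re.findall(r"\w+", text.lower())
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    for a in pos_first:
        for b in pos_second:
            gap = b - a if ordered else abs(b - a)
            if 0 < gap <= max_words:
                return True
    return False


text = "... enhance the power of retrieval ..."
print(within_distance(text, "enhance", "retrieval", 4))
print(within_distance(text, "enhance", "retrieval", 3))
```

The nested loop is quadratic in the number of occurrences, which is fine for a sketch but not how an indexed system would do it.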

  12. Boolean Queries • Composed of: • Atoms (i.e., basic queries) that retrieve documents, and • Boolean operators, which work on their operands (sets of documents) and deliver a set of documents (the result). • A query syntax tree is used to represent a Boolean query: the leaves correspond to the basic queries, and the internal nodes to the operators.

  13. Boolean Queries Cont. Example of a query syntax tree: translation AND (syntax OR syntactic). This query will retrieve all the documents which contain the word ‘translation’ as well as either the word ‘syntax’ or the word ‘syntactic’.

  14. Boolean Queries Cont… • With Boolean systems, no ranking of the retrieved documents is provided: a document either satisfies the Boolean query or it does not. • There is no partial matching between a document and a user query.
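The query syntax tree and its set-based evaluation can be sketched in Python. The tuple encoding of the tree, the `evaluate` function and the four sample documents are all assumptions made for this illustration.

```python
# Hypothetical document collection: id -> text.
docs = {
    1: "translation of syntax trees",
    2: "syntactic analysis",
    3: "machine translation and syntactic parsing",
    4: "translation memory",
}


def evaluate(node, docs):
    """Evaluate a query syntax tree and return the set of matching ids.

    A leaf is a word (basic query); an internal node is a tuple
    (operator, left subtree, right subtree).
    """
    if isinstance(node, str):
        return {d for d, text in docs.items() if node in text.lower().split()}
    op, left, right = node
    left_set, right_set = evaluate(left, docs), evaluate(right, docs)
    if op == "AND":
        return left_set & right_set
    if op == "OR":
        return left_set | right_set
    raise ValueError(f"unknown operator: {op}")


# translation AND (syntax OR syntactic)
query = ("AND", "translation", ("OR", "syntax", "syntactic"))
print(sorted(evaluate(query, docs)))  # [1, 3]
```

The result is a plain set: a document is either in it or not, which is why no ranking comes out of a Boolean query.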

  15. Natural Language • The query is written in the user’s own language (English, Arabic, French, etc.). • The query is treated as an enumeration of words and context queries. • All the documents matching a portion of the user query are retrieved.

  16. Natural Language Cont… • Higher ranking is assigned to those documents matching more parts of the query. • A threshold may be selected so that the documents with very low weights are not retrieved. • Boolean queries are a simplified abstraction of natural language queries.
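A minimal sketch of this ranking idea in Python. The scoring rule (fraction of query words found in the document), the function name `rank` and the sample documents are assumptions for illustration; actual systems use more elaborate weights.

```python
def rank(query, docs, threshold=0.0):
    """Rank documents by the fraction of query words they contain.

    Documents whose score does not exceed threshold are not retrieved.
    Returns (doc_id, score) pairs, best match first.
    """
    terms = set(query.lower().split())
    if not terms:
        return []
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = len(terms & words) / len(terms)
        if score > threshold:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


docs = {
    "a": "query languages for text retrieval",
    "b": "text databases",
    "c": "cooking recipes",
}
print(rank("text retrieval languages", docs))
print(rank("text retrieval languages", docs, threshold=0.5))
```

Document "a" matches all three query words and ranks first, "b" matches one, and "c" matches none and is never retrieved. Raising the threshold to 0.5 also drops "b".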

  17. Pattern Matching • A pattern is a set of syntactic features that must occur in a text segment. • Patterns allow the retrieval of pieces of text that have some property. • The segments that satisfy the pattern are said to ‘match’ the pattern.

  18. Pattern Matching Cont… Types of patterns: • Words: • A string (sequence of characters), e.g. ‘computer’, ‘space’, etc. • Prefixes: • A string which must form the beginning of a word. Given the prefix ‘comput’, all the documents containing words such as ‘computer’, ‘computing’, etc. are retrieved.

  19. Pattern Matching Cont... • Suffixes: • A string which must form the termination of a word. Given the suffix ‘ters’ all the documents containing words such as: ‘computers’, ‘painters’, etc. are retrieved. • Substrings: • A string which can appear within a word. Given the substring ‘tal’ all the documents containing words such as: ‘coastal’, ‘talk’, etc. are retrieved.

  20. Pattern Matching Cont... • Ranges: • A pair of strings which matches any word lying between them in lexicographical order. For example: the range between the words ‘held’ and ‘hold’ will retrieve strings such as ‘hoax’ and ‘hissing’.
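The prefix, suffix, substring and range patterns can be sketched over a small vocabulary in Python. The vocabulary and the function names are assumptions for illustration, and treating the range as inclusive of both end words is also an assumption.

```python
# Hypothetical vocabulary of indexed words.
vocabulary = [
    "coastal", "computer", "computers", "computing",
    "held", "hissing", "hoax", "hold", "painters", "talk",
]


def match_prefix(prefix):
    """Words that begin with prefix."""
    return [w for w in vocabulary if w.startswith(prefix)]


def match_suffix(suffix):
    """Words that end with suffix."""
    return [w for w in vocabulary if w.endswith(suffix)]


def match_substring(sub):
    """Words that contain sub anywhere."""
    return [w for w in vocabulary if sub in w]


def match_range(low, high):
    """Words between low and high in lexicographical order (inclusive)."""
    return [w for w in vocabulary if low <= w <= high]


print(match_prefix("comput"))       # ['computer', 'computers', 'computing']
print(match_suffix("ters"))         # ['computers', 'painters']
print(match_substring("tal"))       # ['coastal', 'talk']
print(match_range("held", "hold"))  # ['held', 'hissing', 'hoax', 'hold']
```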

  21. Pattern Matching Cont... • Allowing Errors: • A word together with an error threshold. This search pattern retrieves all words which are ‘similar’ to the given word. • Errors come from typing, spelling, or OCR software. • The query should retrieve the given word and its likely erroneous variants.

  22. Pattern Matching Cont... • The similarity model used is the Levenshtein distance, or edit distance. • The edit distance between two strings is the minimum number of character insertions, deletions, and replacements needed to make them equal. • Example: the query word ‘flo wer’ is at distance 1 from ‘flower’.
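The edit distance defined above can be computed with the standard dynamic-programming recurrence. The Python sketch below is one straightforward way to do it; the helper `similar_words` and its sample vocabulary are assumptions added to show the error threshold in use.

```python
def edit_distance(a, b):
    """Minimum number of character insertions, deletions and
    replacements needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # replace ca by cb, or keep it
            ))
        prev = curr
    return prev[-1]


def similar_words(word, vocabulary, max_errors):
    """Words whose edit distance to word is at most max_errors."""
    return [w for w in vocabulary if edit_distance(word, w) <= max_errors]


print(edit_distance("flo wer", "flower"))  # 1
print(similar_words("flower", ["flo wer", "flowers", "floor", "power"], 1))
```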

  23. Pattern Matching Cont... • Regular Expressions: • A regular expression is a general pattern built up from simple strings and the following operators: • Union • Concatenation • Repetition • Example: the query ‘pro (blem | tein) (s | ε) (0 | 1 | 2)*’ will match words such as ‘problem02’ and ‘proteins’.
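The example query can be tried with Python's `re` module. The translation into Python syntax is an assumption of this sketch: the union with the empty string ε is written as `s?`, and `(0 | 1 | 2)*` as the character class `[012]*`.

```python
import re

# pro (blem | tein) (s | ε) (0 | 1 | 2)*
pattern = re.compile(r"pro(blem|tein)s?[012]*")

for word in ["problem02", "proteins", "problems1", "program", "protein3"]:
    # fullmatch requires the whole word to fit the pattern.
    print(word, bool(pattern.fullmatch(word)))
```

‘problem02’, ‘proteins’ and ‘problems1’ match; ‘program’ and ‘protein3’ do not.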

  24. Pattern Matching Cont... • Extended Patterns (EP): • Are subsets of the regular expressions which are expressed with a simpler syntax. • The retrieval system converts the EPs into regular expressions, or searches for them with a specific algorithm. • Each system supports its own EPs.

  25. Pattern Matching Cont... Examples (EP): • Classes of characters. • Case-insensitive matching. • Ranges of characters. • Conditional expressions, i.e., a part of the pattern may or may not appear.

  26. Pattern Matching Cont... • Wild characters, which match any sequence of characters in the text, e.g. ‘ret*’. • Combinations that allow some parts of the pattern to match exactly and other parts with errors.
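As an illustration of an extended pattern being converted into a regular expression, the Python sketch below uses the standard `fnmatch` module, whose shell-style wildcards stand in here for a retrieval system's own EP syntax (an assumption; each system defines its own). The sample words are invented for the example.

```python
import fnmatch
import re

# Wild character: '*' matches any sequence of characters.
for word in ["retrieval", "return", "secret"]:
    print(word, fnmatch.fnmatchcase(word, "ret*"))

# A wild character in the middle of the pattern.
for word in ["flowers", "flounders", "floor"]:
    print(word, fnmatch.fnmatchcase(word, "flo*ers"))

# Convert the extended pattern into a regular expression, then add
# case-insensitive matching on top of it.
regex = re.compile(fnmatch.translate("ret*"), re.IGNORECASE)
print(bool(regex.match("Retrieval")))
```

`fnmatch.translate` returns the regular expression as a string, which mirrors the conversion step described for extended patterns.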
