
Databases and Information Retrieval Lecture 1 Basics of Databases and Information Retrieval

Instructor: Mr. Gautam Das, University of Texas at Arlington. Email: gdas@cse.uta.edu




Presentation Transcript


  1. Databases and Information Retrieval, Lecture 1: Basics of Databases and Information Retrieval
     Instructor: Mr. Gautam Das, University of Texas at Arlington
     Email: gdas@cse.uta.edu

  2. Database vs. IR
     Database
     • Data
     • Consists of a schema
     • Relational model
     • Data stored in the form of tables
     • Follows the typical query model, with joins
     • Output is tuples, produced by joining one or more tables
     IR
     • Collection of documents { unstructured pieces of information }
     • Follows the rank-and-relevance query model
     • Output is the document
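The database side of this comparison can be sketched with Python's built-in `sqlite3` module. The `driver` and `accident` tables and their rows are hypothetical, invented only to show that a relational query returns tuples assembled by a join rather than whole documents.

```python
import sqlite3

# Hypothetical schema and rows, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE driver (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accident (id INTEGER PRIMARY KEY, driver_id INTEGER, city TEXT);
    INSERT INTO driver VALUES (1, 'Ann');
    INSERT INTO driver VALUES (2, 'Bob');
    INSERT INTO accident VALUES (10, 1, 'Arlington');
    INSERT INTO accident VALUES (11, 2, 'Dallas');
""")

# The result is a list of tuples built by joining the two tables.
rows = conn.execute(
    "SELECT d.name, a.city "
    "FROM driver d JOIN accident a ON a.driver_id = d.id "
    "ORDER BY d.name"
).fetchall()
print(rows)
```

An IR system, by contrast, would return the matching documents themselves, ordered by relevance.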

  3. Types of Queries
     • Conjunctive queries: { Car , Accident } matches documents that contain both “Car” and “Accident”.
     • General Boolean queries: { Car + Accident – Arlington } matches documents that contain “Car” and “Accident” but do not contain “Arlington”.
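A minimal sketch of the two query types, treating a conjunctive query as an AND over its terms (the usual meaning of "conjunctive"). The whitespace tokenizer, the function names, and the sample documents are assumptions made for illustration.

```python
def tokenize(text):
    """Split on whitespace and lowercase; a deliberately naive tokenizer."""
    return set(text.lower().split())


def conjunctive_match(doc, terms):
    """True if the document contains every query term."""
    words = tokenize(doc)
    return all(t.lower() in words for t in terms)


def boolean_match(doc, required, excluded):
    """True if the document has all required terms and none of the excluded ones."""
    words = tokenize(doc)
    has_required = all(t.lower() in words for t in required)
    has_excluded = any(t.lower() in words for t in excluded)
    return has_required and not has_excluded


# Hypothetical documents.
docs = [
    "car accident in dallas",
    "car accident in arlington",
    "car show in arlington",
]

both = [d for d in docs if conjunctive_match(d, ["Car", "Accident"])]
filtered = [d for d in docs if boolean_match(d, ["Car", "Accident"], ["Arlington"])]
print(both)
print(filtered)
```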

  4. Retrieval Models of IR
     • Boolean retrieval model
     • Ranked / relevance retrieval model { the one that is missing in databases }

  5. Parameters Used for Ranking in a Typical Information Retrieval System
     • Parameter 1: Occurrence and Frequency
     • The number of times the specified word occurs in the document contributes to its rank.
     • The position at which the word occurs also matters, e.g. title or subtitle.
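As a rough sketch of Parameter 1, the score below counts occurrences of a term and gives occurrences in the title extra weight. The function name and the `title_weight` value of 3.0 are arbitrary assumptions, not something the slides specify.

```python
from collections import Counter


def occurrence_score(term, title, body, title_weight=3.0):
    """Count term occurrences, weighting title hits more than body hits."""
    term = term.lower()
    title_count = Counter(title.lower().split())[term]
    body_count = Counter(body.lower().split())[term]
    return title_weight * title_count + body_count


# One hit in the title (weighted 3.0) plus two hits in the body.
score = occurrence_score("car", "Car safety", "the car hit another car")
print(score)
```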

  6. Parameters Used for Ranking in a Typical Information Retrieval System
     • Parameter 2: Proximity
     • If the search string specifies two or more words, documents in which those words occur near each other should be ranked higher.
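One simple way to measure proximity, shown here as an assumption rather than the lecture's method, is the smallest distance in word positions between two query terms. A smaller distance would then translate into a higher rank.

```python
def min_distance(doc, term_a, term_b):
    """Smallest gap in word positions between two terms, or None if either is absent."""
    words = doc.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


# Hypothetical documents: the terms are adjacent in the first, far apart in the second.
near = min_distance("car accident on highway", "car", "accident")
far = min_distance("the car was parked before the accident", "car", "accident")
print(near, far)
```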

  7. Parameters Used for Ranking in a Typical Information Retrieval System
     • Parameter 3: Stemming
     • Uses the various forms of a word when searching.
     • E.g. Run => Ran, Run over, Running
     • An exact match of the word should be ranked higher.
     • E.g. if the word “info” is searched, a document containing the word “infotech” should be ranked after a document containing the exact match “info”.
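The "info" versus "infotech" rule can be sketched as a two-level score: exact word matches rank above partial ones. Prefix matching is used here only as a crude stand-in for stemming; a real stemmer handles forms such as "ran" that share no prefix with "run". The function name and sample documents are hypothetical.

```python
def match_score(query, doc):
    """2 for an exact word match, 1 for a prefix match, 0 for no match."""
    q = query.lower()
    words = doc.lower().split()
    if q in words:
        return 2
    if any(w.startswith(q) for w in words):
        return 1
    return 0


# Hypothetical documents, ranked best match first.
docs = ["infotech news", "info desk", "weather report"]
ranked = sorted(docs, key=lambda d: match_score("info", d), reverse=True)
print(ranked)
```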

  8. Parameters Used for Ranking in a Typical Information Retrieval System
     • Parameter 4: Frequency across Documents
     • Words like a, an, the, etc. should be suppressed, because they are most likely irrelevant to the search criteria.
     • If we are searching for ‘Microsoft Corporation’, the specific word “Microsoft” is more important than the general word “Corporation”.
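A common way to capture this idea is inverse document frequency (IDF), which gives a lower weight to words that appear in many documents. The slides do not name a formula, so the `log(N / df)` form below and the sample documents are assumptions.

```python
import math


def idf(term, docs):
    """log(N / df): N is the number of documents, df the number containing the term."""
    term = term.lower()
    df = sum(1 for d in docs if term in d.lower().split())
    if df == 0:
        return 0.0  # Term appears nowhere; treat it as carrying no weight.
    return math.log(len(docs) / df)


# Hypothetical collection: "corporation" is in every document, "microsoft" in one.
docs = [
    "microsoft corporation",
    "acme corporation",
    "the globex corporation",
]
print(idf("microsoft", docs), idf("corporation", docs))
```

With this weighting, "Corporation" contributes nothing to the score here because it occurs in every document, while "Microsoft" carries the full weight of the query.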

  9. Parameters Used for Ranking in a Typical Information Retrieval System
     • Parameter 5: Page Access Frequency
     • If a page is accessed more often, i.e. if the page is popular, it should be ranked higher.
     • This kind of ranking requires maintaining a log of page access frequency.
     • Useful for systems that store news, stories, or other readable articles.
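A minimal sketch of ranking from an access log, assuming the log is simply a list with one entry per page view. The page paths are hypothetical.

```python
from collections import Counter


def rank_by_access(log):
    """Return pages ordered from most to least frequently accessed."""
    return [page for page, _count in Counter(log).most_common()]


# Hypothetical access log: one entry per page view.
access_log = [
    "/news/a", "/news/b", "/news/a",
    "/stories/c", "/news/a", "/news/b",
]
ranking = rank_by_access(access_log)
print(ranking)
```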

  10. Parameters Used for Ranking in a Typical Information Retrieval System
     • Parameter 6: Number of In-Links to the Page
     • The number of other pages on the web that link to the page being ranked.
     • Again, a parameter for deciding the popularity of a page.
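Counting in-links can be sketched over a small link graph. The graph below is hypothetical, and the choice to count each linking page once and to ignore self-links is an assumption.

```python
def in_link_counts(links):
    """Map each page to the number of distinct other pages that link to it."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):  # Count each linking page only once.
            if target != source:     # Ignore self-links.
                counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
counts = in_link_counts(links)
print(counts)
```

Page "c" has the most in-links in this graph, so it would rank as the most popular.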
