ITEC 4020 M
Group 18
Amna Al-Omari, Divya Love, Norbert Megler, Omer Saleem, Sachin Uppal, Shahla Defileh
WEB SEARCH SYSTEM
Presentation Overview
* Brief Overview of Assignment Objective
* Structure and Functionality
* Search Demonstration
* Questions
Introduction
Brief Overview of Assignment Objective
Structure and Functionality
Our website can be found at http://unix.aml.yorku.ca:8080/w04_g18/search.jsp
Our web search system is based on inverted-file indexing of the XML document created by the crawler that was supplied to us. Our site contains 3 main web pages:
* The main search page, which is built with JSP and contains a text box and 2 buttons (reset and submit).
* The result page, which is built with JSP and contains hyperlinks to all documents that hold the keyword.
* The display page, which is built with XML and displays the “clicked on” document.
1- A Java class reads the given XML file and splits it into 1139 separate XML documents.
- We read the XML file using FileInputStream and BufferedReader.
- The file is read one line at a time, and each line is compared to the marker “<PubmedArticle>”, which signals the beginning of a new article.
- Upon detection of this marker, a new XML document file is created.
- The file number is kept track of, and once the whole article is written into a file, the file counter is incremented by one.
2- Next we create a temporary (merged) file, which goes through all of the roughly 1100 documents and identifies:
their document number
3- Next we create the first-level index.
- It uses a simple Java class which reads from the temporary file.
- This index includes a counter (which represents the total number of terms), each term listed only once, the number of documents that include that term, and the total frequency of each term; all of this is then written into a text file.
4- Next is the creation of the second-level index, built by a simple Java class; it includes the counter, term, document number, and frequency.
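The splitting in step 1 and the two index levels in steps 3 and 4 can be sketched roughly as follows. This is only an illustration under stated assumptions, not our actual classes: the names `IndexBuilder`, `TermEntry` and `Posting` are hypothetical, the tokenizer (strip tags, lowercase, split on non-alphanumerics) is assumed, and everything is kept in memory instead of being written to separate files and a temporary file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; class and field names are not from the original project.
final class IndexBuilder {
    /** First-level line: counter, term, number of documents, total frequency. */
    static final class TermEntry {
        final int counter, docCount, totalFrequency;
        final String term;

        TermEntry(int counter, String term, int docCount, int totalFrequency) {
            this.counter = counter;
            this.term = term;
            this.docCount = docCount;
            this.totalFrequency = totalFrequency;
        }
    }

    /** Second-level line: counter, term, document number, frequency in that document. */
    static final class Posting {
        final int counter, docNumber, frequency;
        final String term;

        Posting(int counter, String term, int docNumber, int frequency) {
            this.counter = counter;
            this.term = term;
            this.docNumber = docNumber;
            this.frequency = frequency;
        }
    }

    final List<TermEntry> firstLevel = new ArrayList<>();
    final List<Posting> secondLevel = new ArrayList<>();

    /** Step 1: start a new document at <PubmedArticle>, close it at </PubmedArticle>. */
    static List<String> splitArticles(BufferedReader reader) throws IOException {
        List<String> articles = new ArrayList<>();
        StringBuilder current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith("<PubmedArticle>")) {
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
                if (trimmed.endsWith("</PubmedArticle>")) {
                    articles.add(current.toString());
                    current = null;
                }
            }
        }
        return articles;
    }

    /** Steps 3 and 4: documents are numbered 1..n in list order. */
    static IndexBuilder build(List<String> articles) {
        // term -> (document number -> frequency); TreeMap keeps both levels sorted
        Map<String, Map<Integer, Integer>> counts = new TreeMap<>();
        for (int docNumber = 1; docNumber <= articles.size(); docNumber++) {
            // Assumed tokenizer: drop tags, lowercase, split on non-alphanumerics.
            String text = articles.get(docNumber - 1).replaceAll("<[^>]*>", " ");
            for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    counts.computeIfAbsent(token, t -> new TreeMap<>())
                          .merge(docNumber, 1, Integer::sum);
                }
            }
        }
        IndexBuilder index = new IndexBuilder();
        int counter = 0;
        for (Map.Entry<String, Map<Integer, Integer>> term : counts.entrySet()) {
            counter++; // each distinct term gets the next counter value
            int total = 0;
            for (Map.Entry<Integer, Integer> doc : term.getValue().entrySet()) {
                index.secondLevel.add(
                    new Posting(counter, term.getKey(), doc.getKey(), doc.getValue()));
                total += doc.getValue();
            }
            index.firstLevel.add(
                new TermEntry(counter, term.getKey(), term.getValue().size(), total));
        }
        return index;
    }
}
```

In this sketch the counter is what links the two levels: each term appears once in `firstLevel`, and every `Posting` in `secondLevel` carries the same counter along with one document number and the frequency in that document.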
We use an MVC (model-view-controller) architecture: a servlet acts as the controller, JSP is used for displaying results, and a JavaBean holds the main business logic. Once the user submits a keyword, the search looks up that term in the first-level index to find its counter number, and then matches that counter number against the second-level index to find the numbers of all XML documents that contain the term. It then goes through those XML documents and grabs the title of each relevant document for display.
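The lookup described above could look something like the following bean. It is a minimal sketch under assumptions: the class name `SearchBean`, the in-memory maps, and the lowercase normalization of the keyword are all hypothetical, and the titles are assumed to have been collected from the XML documents beforehand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model class; the servlet (controller) would call search()
// and hand the resulting titles to the JSP result page (view).
final class SearchBean {
    private final Map<String, Integer> firstLevel;          // term -> counter
    private final Map<Integer, List<Integer>> secondLevel;  // counter -> document numbers
    private final Map<Integer, String> titles;              // document number -> title

    SearchBean(Map<String, Integer> firstLevel,
               Map<Integer, List<Integer>> secondLevel,
               Map<Integer, String> titles) {
        this.firstLevel = firstLevel;
        this.secondLevel = secondLevel;
        this.titles = titles;
    }

    /** Returns the titles of all documents that contain the keyword. */
    List<String> search(String keyword) {
        if (keyword == null) {
            return Collections.emptyList();
        }
        // 1. First-level index: term -> counter (assumes terms were indexed in lowercase).
        Integer counter = firstLevel.get(keyword.trim().toLowerCase(Locale.ROOT));
        if (counter == null) {
            return Collections.emptyList();
        }
        // 2. Second-level index: counter -> document numbers.
        List<String> results = new ArrayList<>();
        for (int docNumber : secondLevel.getOrDefault(counter, Collections.emptyList())) {
            // 3. Document number -> title for the result page.
            String title = titles.get(docNumber);
            if (title != null) {
                results.add(title);
            }
        }
        return results;
    }
}
```

Keeping the lookup in a plain class like this means the servlet only has to read the request parameter, call `search`, and forward the list to the JSP.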
XSL files take care of the display functionality. Once the user clicks on a hyperlink, that specific document is displayed as XML styled by an XSL file.
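The slides do not say whether the stylesheet is applied in the browser or on the server. As one possible illustration, the sketch below applies an XSL stylesheet to an article in Java using the standard `javax.xml.transform` API; the class name `ArticleRenderer` is hypothetical, and both the XML and the XSL are passed in as strings to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; not part of the original project.
final class ArticleRenderer {
    /** Applies the given XSL stylesheet to the given XML document and returns the output. */
    static String render(String xml, String xsl) throws TransformerException {
        // Compile the stylesheet into a Transformer.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        // Run the transform and capture the result as text.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A stylesheet for this purpose would typically select elements such as the article title from the document and wrap them in HTML for the display page.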