ITEC 4020 M Group 18 Amna Al-Omari Divya Love Norbert Megler Omer Saleem Sachin Uppal Shahla Defileh

Presentation Transcript

  1. ITEC 4020 M Group 18 Amna Al-Omari Divya Love Norbert Megler Omer Saleem Sachin Uppal Shahla Defileh

  2. WEB SEARCH SYSTEM: Presentation Overview
     * Brief Overview of Assignment Objective
     * Structure and Functionality
     * Search Demonstration
     * Questions

  3. WEB SEARCH SYSTEM: Introduction
     Our website can be found at http://unix.aml.yorku.ca:8080/w04_g18/search.jsp
     Our web search system is based on inverted file indexing of the XML document created by the crawler that was supplied to us.
     Our site contains 3 main web pages:
     * The main search page, which is built with JSP and contains a text box and 2 buttons (reset and submit).
     * The result page, which is built with JSP and contains hyperlinks to all documents that contain the keyword.
     * The display page, which is built with XML and displays the “clicked on” document.

  4. WEB SEARCH SYSTEM: Logical structure
     1- A Java class reads the given XML file and splits it into 1139 separate XML documents.
     - We read the XML file using FileInputStream and BufferedReader.
     - The file is read one line at a time, and each line is compared to the marker “<PubmedArticle>”, which signals the beginning of a new article.
     - When the marker is detected, a new XML document file is created.
     - The file number is kept track of, and once the whole article has been written into a file, the file counter is incremented by one.
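
The splitting step above can be sketched roughly as follows. This is a minimal illustration, not the group's actual class: the class name `ArticleSplitter`, the `docN.xml` file naming, and the use of the closing tag `</PubmedArticle>` to detect the end of an article are all assumptions. For brevity it takes any `BufferedReader` and writes through `java.nio.file.Files` rather than wrapping a `FileInputStream`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ArticleSplitter {
    // Start marker taken from the slide; the end marker is an assumption.
    static final String START = "<PubmedArticle>";
    static final String END = "</PubmedArticle>";

    /** Writes one file per article into outDir and returns the number of complete articles. */
    static int split(BufferedReader in, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int fileCount = 0;
        BufferedWriter out = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (out == null && line.contains(START)) {
                // A new article begins: open the next numbered file (hypothetical naming).
                out = Files.newBufferedWriter(outDir.resolve("doc" + (fileCount + 1) + ".xml"));
            }
            if (out != null) {
                out.write(line);
                out.newLine();
                if (line.contains(END)) {
                    // The whole article is written: close the file and bump the counter.
                    out.close();
                    out = null;
                    fileCount++;
                }
            }
        }
        if (out != null) {
            out.close(); // input ended mid-article; the partial file is not counted
        }
        return fileCount;
    }
}
```

Lines outside any article (for example the enclosing root element) are skipped, so each output file holds exactly one article.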

  5. WEB SEARCH SYSTEM: Continued
     2- Create a temporary (merged) file by going through the entire set of about 1100 documents and identifying:
     - all terms
     - their document numbers
     - their frequencies
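
As a rough illustration of what one document contributes to such a temporary file, the sketch below counts terms in a piece of text and emits one `term docNumber frequency` line per term. The line layout, the tokenization rule (lowercase, split on anything that is not a letter or digit), and the class name `TermCounter` are assumptions; the slides do not specify them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermCounter {
    /** Counts how often each term occurs in the text (terms sorted alphabetically). */
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumed tokenization: lowercase, split on runs of non-alphanumeric characters.
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Produces "term docNumber frequency" lines for one document (assumed layout). */
    static List<String> toLines(int docNumber, String text) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countTerms(text).entrySet()) {
            lines.add(e.getKey() + " " + docNumber + " " + e.getValue());
        }
        return lines;
    }
}
```

Calling `TermCounter.toLines(7, "Gene therapy: gene!")` yields the two lines `gene 7 2` and `therapy 7 1`; concatenating the lines for every document gives a merged file of the kind described.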

  6. WEB SEARCH SYSTEM: Continued
     3- Next we create the first-level index.
     - It is built by a simple Java class which reads from the temporary file.
     - This index includes a counter (which numbers the terms, so its final value is the total number of terms), each term (listed only once), the number of documents that include that term, and the total frequency of the term; all of this is then written into a text file.

  7. Continued
     4- Next we create the second-level index. It is built by a simple Java class, and each entry includes the counter, the term, the document number, and the frequency.
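
One possible shape for the two index levels is sketched below. It assumes the temporary file holds whitespace-separated `term docNumber frequency` lines and that both indexes are plain text with space-separated fields; the class name `TwoLevelIndex` and the exact field order are hypothetical, chosen only to match the fields the slides list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TwoLevelIndex {
    // First level:  "counter term documentCount totalFrequency"
    final List<String> firstLevel = new ArrayList<>();
    // Second level: "counter term docNumber frequency"
    final List<String> secondLevel = new ArrayList<>();

    /** Builds both levels from "term docNumber frequency" lines (assumed format). */
    static TwoLevelIndex build(List<String> tempLines) {
        // term -> (docNumber -> frequency), both sorted
        Map<String, Map<Integer, Integer>> grouped = new TreeMap<>();
        for (String line : tempLines) {
            String[] p = line.trim().split("\\s+");
            grouped.computeIfAbsent(p[0], k -> new TreeMap<>())
                   .merge(Integer.parseInt(p[1]), Integer.parseInt(p[2]), Integer::sum);
        }
        TwoLevelIndex index = new TwoLevelIndex();
        int counter = 0;
        for (Map.Entry<String, Map<Integer, Integer>> term : grouped.entrySet()) {
            counter++; // each distinct term gets the next counter value
            int total = 0;
            for (Map.Entry<Integer, Integer> doc : term.getValue().entrySet()) {
                total += doc.getValue();
                index.secondLevel.add(counter + " " + term.getKey() + " "
                        + doc.getKey() + " " + doc.getValue());
            }
            index.firstLevel.add(counter + " " + term.getKey() + " "
                    + term.getValue().size() + " " + total);
        }
        return index;
    }
}
```

The shared counter is what ties the levels together: a term appears once in the first level and once per document in the second level, always under the same counter value.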

  8. Searching Functionality
     We are using the MVC (model-view-controller) architecture: a Servlet acts as the controller, JSP is used for displaying results, and a Java Bean holds the main business logic.
     Once the user submits the keyword, the search goes through the first-level index to find the counter number for that term, and then uses that counter number to find all the XML document numbers listed for it in the second-level index. It then goes through those XML documents and grabs each relevant document title for display.
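
The lookup described above might look roughly like this. The sketch assumes space-separated index lines of the form `counter term documentCount totalFrequency` (first level) and `counter term docNumber frequency` (second level); those layouts and the class name `IndexSearch` are assumptions, and fetching the titles from the XML documents is left out.

```java
import java.util.ArrayList;
import java.util.List;

class IndexSearch {
    /** Returns the document numbers that contain the keyword, or an empty list. */
    static List<Integer> search(String keyword, List<String> firstLevel, List<String> secondLevel) {
        String wanted = keyword.trim().toLowerCase();

        // Step 1: find the counter number of the term in the first-level index.
        String counter = null;
        for (String line : firstLevel) {
            String[] p = line.trim().split("\\s+");
            if (p[1].equals(wanted)) {
                counter = p[0];
                break;
            }
        }

        // Step 2: collect every document number stored under that counter.
        List<Integer> docs = new ArrayList<>();
        if (counter == null) {
            return docs; // term is not indexed
        }
        for (String line : secondLevel) {
            String[] p = line.trim().split("\\s+");
            if (p[0].equals(counter)) {
                docs.add(Integer.parseInt(p[2]));
            }
        }
        return docs;
    }
}
```

In an MVC setup like the one described, a method of this kind would sit in the bean, with the servlet passing in the submitted keyword and forwarding the resulting document list to the result JSP.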

  9. Displaying of Results
     XSL files take care of this functionality. Once the user clicks on a hyperlink, the corresponding XML document is displayed, formatted by an XSL file.
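
The slides do not say whether the XSL file is applied in the browser or on the server. As one illustration, the sketch below applies a stylesheet on the server with the standard `javax.xml.transform` API; the class name `XslRenderer` and the choice of server-side rendering are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslRenderer {
    /** Applies the given XSL stylesheet to the XML text and returns the result. */
    static String render(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A stylesheet for this purpose would typically match the article's root element and emit HTML for the fields to be shown, such as the title.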

  10. The End Questions? Thank You