
Johnson Graduate School of Management Library Project


Presentation Transcript


  1. Johnson Graduate School of Management Library Project
     Clients: Ken Bolton, Lynn Brown, Angela K. Horne, Don Schneder, Doris Smith, JGSM Library Reference Team
     Project Team: Jonathan Gong, Benson Lee, Man Fai Matthew Lee, Greg Leedberg, Liz Xu
     JGSM Library Project - CS 501

  2. Since the last presentation…
     Tasks accomplished:
     • Decided to use PHPDig as the backend
     • Implemented many functional requirements
     • Adjusted PHPDig code to improve ranking based on client requirements
     • Discussed with the client additional functionality to be added to the system

  3. Presentation Outline
     • New Requirements / Why PHPDig?
     • Implemented Functionality
     • Abstract Display
     • Advanced Search
     • Administrative Features
     • Ranking Adjustments
     • Task List for Final Milestone (Things to Do)
     • Demo of Current System

  4. New Requirements
     • Boosting
     • Display Statistics Page
     • Batch Adding
     • Search Results Display
     • Add/Remove Categories

  5. Why PHPDig? Non-technical
     • Client prefers PHP/MySQL since both technologies are already on their web server
     • JGSM Library site has fewer than 300 HTML pages
     • A requirement: database
     • Client was involved in the decision to continue with PHPDig
     • Focus on maintainability and usability

  6. Why PHPDig? Technical
     • PHPDig code is relatively short
     • PHPDig = Open Source = Free to modify
     • Florida State University, Dept. of Biology
       • www.bio.fsu.edu/phpdig
     • The Kiwi Search Engine http://www.linknz.co.nz/
       • 123,000+ web sites indexed
     • Ranking is similar to Lucene's, since both use the same ranking algorithm (tf-idf)
     • PHPDig version 1.8.7 www.phpdig.net

  7. Implemented Functionality: Abstract Display
     • Purpose
       • Users can get a description written by a librarian/administrator
     • Implementation
       • Modified PHPDig code to look for an abstract
       • Added a table to the database: auxiliary
         • spider_id : int
         • full_url : string
         • abstract : string
         • category : string
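A minimal sketch of what the auxiliary table and an abstract lookup might look like. The column names come from the slide; the SQL types, the primary key, the helper names, and the use of SQLite in place of the project's MySQL database are all assumptions made so the example is self-contained.

```python
import sqlite3

# Hypothetical schema: only the column names are taken from the slide.
SCHEMA = """
CREATE TABLE auxiliary (
    spider_id INTEGER PRIMARY KEY,  -- assumed to identify one indexed page
    full_url  TEXT NOT NULL,
    abstract  TEXT,                 -- librarian-written description
    category  TEXT
)
"""

def open_db():
    """Create an in-memory database holding the auxiliary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def set_abstract(conn, spider_id, full_url, abstract, category):
    """Store or overwrite the abstract and category for one page."""
    conn.execute(
        "INSERT OR REPLACE INTO auxiliary VALUES (?, ?, ?, ?)",
        (spider_id, full_url, abstract, category),
    )

def get_abstract(conn, spider_id):
    """Return the abstract, or None when no row exists for the page."""
    row = conn.execute(
        "SELECT abstract FROM auxiliary WHERE spider_id = ?", (spider_id,)
    ).fetchone()
    return row[0] if row else None
```

A results page could call get_abstract() for each hit and fall back to the automatically extracted text when it returns None.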

  8. Example of Abstract Display

  9. Example of Abstract Display (Cont’d)

  10. Our Current Working Interface
     • We now have a functional interface that performs searches and displays results.
     • The interface has evolved from the prototype previously presented, based on feedback from our clients.

  11. Evolved Interface
     • Started with the prototype presented for progress report 1 as the target design.
     • Once we started working with PhpDig’s template system, we made some slight changes to the original target interface to fit what PhpDig can handle.

  12. Evolved Interface

  13. Evolved Interface
     • After presenting this design to our clients and discussing possible alternatives, we jointly came up with the current working design:

  14. Our Current Working Interface: Advanced Search

  15. Our Current Working Interface: Search Results

  16. How We Implemented the Interface
     • PhpDig uses a template system
     • This allows us to write HTML code for the search page and use special PhpDig tags to generate form controls, results, etc., within that page
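The general idea of a template system can be sketched as follows: hand-written HTML stays in the template, and placeholders are replaced with generated markup. This uses Python's string.Template purely as an illustration; the placeholder names ($form_field, $results) and the render() helper are hypothetical and are not PhpDig's actual tag syntax.

```python
from html import escape
from string import Template

# Hypothetical placeholders; these are not PhpDig tags.
PAGE = Template(
    "<html><body>\n"
    "<form>$form_field</form>\n"
    "<ul>$results</ul>\n"
    "</body></html>"
)

def render(query, results):
    """Fill the template with a search box and a list of (url, title) results."""
    # Only the dynamic parts are generated here; the page layout lives in PAGE.
    field = '<input type="text" name="q" value="%s">' % escape(query, quote=True)
    items = "".join(
        '<li><a href="%s">%s</a></li>' % (escape(url, quote=True), escape(title))
        for url, title in results
    )
    return PAGE.substitute(form_field=field, results=items)
```

Keeping the layout in the template is what lets the page style be changed without touching the code that produces results.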

  17. How We Implemented the Interface
     • Some problems came up during this process:
     • Problem: Some of the static HTML that PhpDig tags generate automatically for the search form does not match our desired style.
     • Solution: We do not depend on PhpDig to generate all of the form HTML; some is hand-coded by us to match our style.

  18. How We Implemented the Interface
     • Some problems arose during this process:
     • Problem: Some of the dynamic HTML generated by PhpDig tags also does not match our style.
     • Solution: We cannot hand-code this HTML (category drop-down, etc.), so we modified the PhpDig source code that is called in response to these tags so that the generated HTML matches our desired style.

  19. Where To Go From Here
     • Based on future discussions with our client, we will continue to refine the interface towards an ideal goal.
     • More source-level changes to PhpDig to get the details right
       • Example: The result context currently cuts off words in the middle
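One way to avoid cutting words in the middle is to widen the excerpt to the nearest word boundaries. The sketch below is a generic illustration, not PhpDig's code; the snippet() function and its parameters are hypothetical.

```python
def snippet(text, start, length):
    """Return roughly `length` characters of `text` from `start`, widened to whole words."""
    end = min(len(text), start + length)
    start = max(0, start)
    # Move start left to the beginning of the word it landed in.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Move end right to the end of the word it landed in.
    while end < len(text) and not text[end].isspace():
        end += 1
    # Mark excerpts that do not reach the start or end of the text.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix
```

For example, snippet("The library catalog lists business databases", 6, 10) starts inside "library" and ends inside "catalog", and returns "...library catalog..." rather than a fragment of either word.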

  20. Administrative Features
     Implemented:
     • Add a page
       • Options: abstract & category
     • Remove a page from database
     • Update a page in database
       • Options: update abstract & category
       • Content is re-indexed

  21. Administrative Features
     To be implemented:
     • Manual ranking abilities
       • Give a page more weight overall
       • Give a page more weight for certain words
     • Feedback
     • Kerberos authentication

  22. Administrative Features
     To be implemented (continued):
     • Display statistics
       • Statistics useful to the administrators, such as most frequent searches, searches with no results, etc.
     • Batch adding of pages
     • Category administration
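The two statistics named on the slide could be computed from a search log along these lines. The log format, a list of (query, result_count) pairs, and the search_stats() name are assumptions for illustration; the slides do not describe how searches are logged.

```python
from collections import Counter

def search_stats(log, top_n=3):
    """Summarize a log of (query, result_count) pairs.

    Returns the `top_n` most frequent queries with their counts and the
    sorted list of distinct queries that returned no results.
    """
    # Normalize case and whitespace so "Bloomberg" and "bloomberg " count together.
    normalized = [(q.strip().lower(), n) for q, n in log]
    frequent = Counter(q for q, _ in normalized).most_common(top_n)
    no_results = sorted({q for q, n in normalized if n == 0})
    return frequent, no_results
```

Queries that repeatedly return nothing are a natural input for deciding which pages or abstracts to add next.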

  23. Ranking
     • Improved from before, mostly complete
     • Formula is now similar to the Lucene default: [formula not preserved in transcript]
     • Our formula: [formula not preserved in transcript]

  24. coord function
     • coord(): q is the number of query terms matched in the document; Q is the number of terms in the query
     • Only relevant in a search for “any of the terms”
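A minimal sketch of how a coord() factor can scale a tf-idf score, assuming the Lucene-style definition coord = q / Q with q and Q as defined on the slide. The tf and idf shapes used here (square root of term frequency, 1 + ln(N / (df + 1))) are borrowed from Lucene's classic similarity for illustration only; the team's actual formula is not preserved in this transcript, and the function names are hypothetical.

```python
import math

def coord(query_terms, doc_terms):
    """Fraction of distinct query terms that occur in the document: q / Q."""
    query = set(query_terms)
    if not query:
        return 0.0
    matched = sum(1 for t in query if t in doc_terms)
    return matched / len(query)

def score(query_terms, doc_terms, doc_freq, num_docs):
    """Toy tf-idf sum scaled by coord(); not the project's actual formula.

    doc_terms is the document as a list of tokens, doc_freq maps a term to
    the number of documents containing it, num_docs is the collection size.
    """
    total = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf == 0:
            continue  # unmatched terms contribute nothing to the sum
        idf = 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))
        total += math.sqrt(tf) * idf
    return coord(query_terms, set(doc_terms)) * total
```

In an "all of the terms" search every returned document matches all Q terms, so coord() is always 1; it only changes the ordering for "any of the terms" searches, where partial matches are penalized.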

  25. Current Progress
     Completed:
     • Ranking implementation complete
     Left to do:
     • Admin panel to modify boosted pages/words
       • Ranking already uses a boost, but we need to finalize how the boosting parameter is modified

  26. Boosting Methods
     • Two possibilities:
       1. Admin modifies the score of a page relative to its current score.
       2. Admin specifies the position at which a page should appear for a given one-term query.

  27. Pros and Cons
     • Method 1: modify relative to current score
       + More careful manipulation of score possible
       + Faster to code, more time to test
       - More difficult to use
     • Method 2: modify rank
       + Easier to use
       - Adjustments only possible on one-word queries
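The two boosting methods can be contrasted with a small sketch. It assumes Method 1 is a multiplicative per-page factor and Method 2 pins a page to a 1-based position in the result list for one query term; both interpretations, and all function names, are assumptions, since the slides leave the mechanism open.

```python
def apply_relative_boost(scores, boosts):
    """Method 1: scale each page's score by an admin-chosen factor (default 1.0)."""
    return {page: s * boosts.get(page, 1.0) for page, s in scores.items()}

def rank(scores):
    """Pages ordered by descending score; ties broken by page name for stability."""
    return sorted(scores, key=lambda p: (-scores[p], p))

def apply_pinned_positions(ranking, pins):
    """Method 2: move pinned pages to 1-based positions chosen for a one-term query."""
    result = [p for p in ranking if p not in pins]
    # Insert in position order so earlier pins do not shift later ones.
    for page, pos in sorted(pins.items(), key=lambda kv: kv[1]):
        if page in ranking:  # ignore pins for pages that did not match the query
            result.insert(min(pos - 1, len(result)), page)
    return result
```

The sketch shows the trade-off listed above: a factor applies to every query but its effect on the final position is hard to predict, while a pinned position is easy to reason about but has to be stored per query term.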

  28. Task List for Final Milestone
     • Feedback
       • Confirmation and error messages will be displayed on the administrative page to improve usability.

  29. Display stats page
     • Links for the relevant log pages will be added to the main administration page.

  30. Batch adding
     • To facilitate the indexing process, we will add a batch-adding feature to the main administration page.

  31. Adjust search results display
     • The page description will no longer contain cut-off words, and the search results interface will be refined until the client is satisfied.

  32. Limit by category
     • Search by category will be implemented.

  33. Administrative function to add and remove categories
     • Adding and removing categories will be implemented and linked to the administrative page.

  34. Administrative function to weight ranking
     • Manual ranking adjustments will be added so that the client is fully satisfied with the search results.

  35. Authentication
     • Access to the administration page will use Cornell University’s Web Authentication (CUWebAuth).

  36. Unit Testing and Integration Testing
     • Every unit that is implemented will be fully unit tested on our own computers and then integrated into the rest of the code for integration testing.

  37. Installation and Refinement
     • The final system will be installed well before the next milestone to avoid any delay.
     • This time period is reserved for any last-minute minor changes to the system to ensure the client’s satisfaction.

  38. Documentation and Training Slides
     • Our final milestone includes detailed documentation of the project, training slides, and an informal training session to help administrators learn to operate the system.

  39. Deployment
     • After careful testing and feedback, the search system will go live.

  40. Timeline

  41. Demo…

  42. The End.
     • Questions?
     • Comments?
