
Network software system laboratory

Real-Time Search Engine. Network software system laboratory. Rana Shahout & Ibrahim Baransi. Supervisor: Edward Bortnikov. Winter 2011. Agenda: the problem & motivation, background in search systems, the architecture, CIP policies, software design.


Presentation Transcript


  1. Real-Time Search Engine. Network software system laboratory. Rana Shahout & Ibrahim Baransi. Supervisor: Edward Bortnikov. Winter 2011

  2. Agenda • The problem & motivation • Background in search systems  • The architecture • CIP policies • Software design

  3. What? What is the project goal? Serving fresh search results when the data is constantly changing. Nowadays websites such as Twitter, Facebook, and news sites change at a high frequency.

  4. Background in search systems: search caches. Why is that a problem? Search engines cache query results, which makes them faster and more efficient. A query is looked up in the cache first, and the index is searched only on a cache miss. When the data is dynamic, some cached entries become out of date, yet they are still served from the cache, so the search engine returns INCORRECT (stale) results.
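The staleness problem on slide 4 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `CachedSearcher`, the substring matching, and the in-memory list standing in for the index are hypothetical and are not the project's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the project's implementation.
final class CachedSearcher {
    // Stand-in for the index: the raw documents, scanned on a cache miss.
    private final List<String> index = new ArrayList<>();
    // Query -> results computed the last time this query missed the cache.
    private final Map<String, List<String>> cache = new HashMap<>();

    void addDocument(String doc) {
        index.add(doc); // the cache is deliberately NOT touched here
    }

    List<String> search(String query) {
        List<String> cached = cache.get(query);
        if (cached != null) {
            return cached; // cache hit: the index is never consulted
        }
        // Cache miss: search the index and remember the answer.
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> results = new ArrayList<>();
        for (String doc : index) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                results.add(doc);
            }
        }
        cache.put(query, results);
        return results;
    }
}
```

In this sketch, a document added after a query has been cached is invisible to later runs of that query, because nothing invalidates the cached entry. That gap is what the CIP is meant to close.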

  5. General picture

  6. Why?

  7. The Architecture

  8. Data structures required for implementation. Index: a Lucene index directory. Lucene is a free text-indexing and text-searching API written in Java; a typical Lucene index is stored in a single directory in the file system on a hard disk. Cache: implemented as a linked list combined with a hash table. The replacement policy is LRU.
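One common way to get a hash table combined with a linked list and LRU replacement in Java is `java.util.LinkedHashMap` in access order. The sketch below is an assumption about how such a cache could look; the slides do not say whether the project used `LinkedHashMap` or a hand-written list, and the class name `LruCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: LinkedHashMap keeps a hash table plus a doubly linked
// list of entries; with accessOrder = true the list is ordered from least
// recently used to most recently used.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, since `b` is the least recently used entry at that point.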

  9. CIP: Cache Invalidation Predictors. The CIP is made up of two major parts. The synopsis generator prepares synopses of the new documents coming in. The invalidator interacts with the runtime system and decides which cached entries to invalidate according to two policies.
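The slides do not define what a synopsis contains. As a minimal sketch, assume it is simply the set of distinct lowercase terms of the incoming document; the class `SynopsisGenerator` and the method `synopsisOf` are hypothetical names, and a real synopsis might keep only the most significant terms.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; assumes a synopsis is the document's distinct terms.
final class SynopsisGenerator {
    static Set<String> synopsisOf(String document) {
        Set<String> terms = new LinkedHashSet<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : document.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

The invalidator would then compare cached queries against this term set instead of against the full document.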

  10. Invalidation Policies • Basic: invalidates every cached query that appears in the synopsis. • Score: finds all cached queries that are contained in the synopsis, computes score(q,d) for each of them, where d is the added/updated document, and invalidates the top K.
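The two policies can be sketched as follows, under loudly stated assumptions: a query "appears in" the synopsis when all of its terms do, "top K" means the K highest-scoring matching queries, and `score` is a placeholder (the fraction of synopsis terms the query covers) because the slides do not give the actual score(q,d) formula. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the two policies; not the project's code.
final class Invalidator {
    private static String[] termsOf(String query) {
        return query.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    // Assumption: a query matches when every one of its terms is in the synopsis.
    static boolean matches(String query, Set<String> synopsis) {
        return synopsis.containsAll(Arrays.asList(termsOf(query)));
    }

    // Basic policy: invalidate every cached query that matches the synopsis.
    static List<String> basic(List<String> cachedQueries, Set<String> synopsis) {
        List<String> invalidated = new ArrayList<>();
        for (String q : cachedQueries) {
            if (matches(q, synopsis)) {
                invalidated.add(q);
            }
        }
        return invalidated;
    }

    // Placeholder score: share of synopsis terms covered by the query.
    static double score(String query, Set<String> synopsis) {
        long distinct = Arrays.stream(termsOf(query)).distinct().count();
        return (double) distinct / synopsis.size();
    }

    // Score policy: among matching queries, invalidate only the K best-scoring.
    static List<String> topK(List<String> cachedQueries, Set<String> synopsis, int k) {
        List<String> candidates = basic(cachedQueries, synopsis);
        candidates.sort(Comparator.comparingDouble((String q) -> score(q, synopsis)).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

The trade-off this sketch makes visible: the basic policy invalidates more entries and so costs more cache misses, while the score policy keeps more of the cache at the risk of leaving some stale entries in place.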

  11. Illustration

  12-35. Basic Invalidation (animation frames). Each frame shows the cache and the added document: "President Barack Obama meets Mubarak in London". Callouts: "CIP will help here!" (slide 24) and "My work is done" (slide 33).

  36-39. Score Invalidation, K=1 (animation frames). Each frame shows the cache and the added document d: "President Barack Obama meets Mubarak in London".

  40. Software Design – UML Diagrams Search Query, with miss in cache

  41. Software Design – UML Diagrams Add a document to index with basic invalidation

  42. Skills. We acquired the following skills in this project: • Knowledge: reading scientific publications • Java (and advanced Java topics) • Working with a web server (Apache) • Learning Lucene's features and how to use them • Building a software cache • UML • XML parsing • HTML
