
A Specialised Search Engine for Neuroscience WebPages


  1. NeuroSearch: A Specialised Search Engine for Neuroscience WebPages. Fatma Y. ELDRESI (MPhil), Systems Analysis / Programming Specialist, AGOCO; part-time lecturer at the University of Garyounis. Fatmaeldresi@hotmail.com

  2. Contents • Introduction • Components of NeuroSearch and its architecture • Implementation • Software lifecycle: (1) WebCrawler Engine, (2) Indexer Engine, (3) Query Engine, (4) Re-Crawler Engine (specialised crawler) • Challenges • Testing • Conclusions

  3. Introduction • What is a search engine? A server, or a collection of servers, dedicated to indexing internet web pages, storing the results, and returning lists of pages that match particular queries. • Conventional search engines generate their indexes in different ways: Google uses a spider; Yahoo uses a directory; “NeuroSearch” uses a spider together with advance knowledge.

  4. Introduction cont. • Defining the problem: why is a specialised search engine needed? The Web has a decentralised organisation and a huge, mixed collection of information; it is updated continuously, has no standard format, and its pages are extensively linked. Establishing standard measures of relevance is therefore a very challenging task. In addition, (1) users find it hard to choose relevant keywords, and (2) professionals sometimes fail in their searches and are disappointed with the results, because the retrieved pages are unrelated or different from what they are looking for. • The objective: create a specialised search engine that uses advance knowledge to read web documents, index and update all the content on a local server, answer queries from the local database, and update the system at a constant interval.

  5. Components of “NeuroSearch” It has two components: (1) the Search/Crawler Engine and (2) the Query Engine.

  6. Components explained [Diagram: the Query Engine (retriever) and the Crawler Engine, which comprises the Spider, the Indexer, and the Re-crawler.]

  7. “NeuroSearch” Architecture Model [Diagram: users issue queries through the Search Engine Interface to the Query Engine, which reads the Index; the WebCrawler, Indexer, and Re-Crawler build and refresh the Index from the World Wide Web.]

  8. Implementation and Case Study • Creating the database using Microsoft Access. • Implementing all parts of “NeuroSearch” using the Java language and SQL. (A minimal connection sketch follows.)
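As a rough illustration of the Java-plus-SQL setup described on this slide, the sketch below opens a JDBC connection to an Access database and runs one query. The UCanAccess driver, the file name neurosearch.accdb, and the Pages table are all assumptions; the slides only state that Access, Java, and SQL were used.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NeuroSearchDb {
    public static void main(String[] args) throws Exception {
        // The UCanAccess JDBC driver and the database/table names below are
        // assumptions; the slides only say that Access, Java and SQL were used.
        String url = "jdbc:ucanaccess://neurosearch.accdb";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM Pages")) {
            if (rs.next()) {
                System.out.println("Indexed pages: " + rs.getInt(1));
            }
        }
    }
}
```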

  9. The Advance Knowledge [Diagram: the NeuroSearch database holds the advance-knowledge data together with the WebCrawler, Indexer, Re-crawler, and Query data.]

  10. The advance knowledge. Case study: Neuroscience (Vision). NeuroSearch uses advance knowledge about neuroscience (vision) as a case study. Data mining is applied to the domain knowledge of vision to construct keywords and the relations between them. This knowledge is stored in the database and categorised by numbers; related knowledge is likewise categorised and stored in the database in data-network form. [Diagram: Phase 1, Phase 2, Phase 3.] A sketch of such a keyword network is shown below.
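A minimal in-memory sketch of how such a keyword network might look is given below. The category numbers and the example vision keywords (retina, visual cortex, and so on) are purely illustrative, since the slides do not list the actual contents of the advance knowledge.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of the "advance knowledge" network.  Category numbers
 *  and the example vision keywords are illustrative only. */
public class AdvanceKnowledge {
    // category number -> keywords belonging to that category
    private final Map<Integer, List<String>> categories = new HashMap<>();
    // keyword -> related keywords (the "data network" of relations)
    private final Map<String, List<String>> relations = new HashMap<>();

    public AdvanceKnowledge() {
        categories.put(1, List.of("vision", "eye", "retina"));
        categories.put(2, List.of("visual cortex", "optic nerve"));
        relations.put("retina", List.of("eye", "optic nerve"));
        relations.put("visual cortex", List.of("vision", "optic nerve"));
    }

    /** True if the page text mentions any known keyword. */
    public boolean isRelevant(String pageText) {
        String lower = pageText.toLowerCase();
        return categories.values().stream()
                .flatMap(List::stream)
                .anyMatch(lower::contains);
    }

    /** Keywords recorded as related to the given keyword. */
    public List<String> related(String keyword) {
        return relations.getOrDefault(keyword, List.of());
    }
}
```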

  11. Software lifecycle The Crawler Engine consists of: 1. WebCrawler/Spider Engine 2. Indexer Engine 3. Re-Crawler (specialised)

  12. WebCrawler (Spider) 1) This web crawler is a general one that can download any kind of web page. 2) It fetches a URL, retrieves all of its web pages, and saves them on the local drive. 3) In addition, the WebCrawler has to go through the proxy firewall (e.g. on the Newcastle University LAN) before downloading any website. 4) The crawler performs a breadth-first search: it collects a list of all the links on the current page before it follows any of them to a new page. A minimal sketch of such a crawl is shown below.
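The sketch below illustrates the breadth-first strategy described on this slide: every link on the current page is queued before any of them is visited. It uses the modern java.net.http client and a hypothetical seed URL, so it is an illustration of the technique rather than the original implementation (which also saved each page to the local drive).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Breadth-first crawl sketch: collect every link on the current page
 *  before following any of them to a new page. */
public class SpiderSketch {
    // Only absolute http(s) links are followed in this sketch.
    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        Queue<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add("https://example.org/");          // hypothetical seed URL

        while (!frontier.isEmpty() && seen.size() < 50) {
            String url = frontier.poll();
            if (!seen.add(url)) continue;              // skip already-visited pages
            HttpResponse<String> resp = client.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            String html = resp.body();
            // The real crawler would save the page to the local drive here.
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                frontier.add(m.group(1));              // enqueue: breadth-first order
            }
        }
    }
}
```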

  13. WebCrawler - the real challenges • Challenge 1: connecting to the WWW and accessing websites from inside the private network. Solution 1: the crawler has to let its socket connect first to the proxy server. • Challenge 2: carrying the request through that socket on to the WWW. Solution 2: the straightforward use of the GET method over a socket is to send just the file name; in this case, however, the GET command has to carry the full URL. A sketch of this proxy request follows.
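A minimal sketch of the proxy workaround follows: the socket is opened to the proxy server, and the GET request line carries the full URL rather than just the file name. The proxy host, port, and target URL are placeholders for the university LAN settings.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

/** Sketch of the proxy workaround: open the socket to the proxy server,
 *  then send GET with the full URL instead of just the file name.
 *  The proxy host/port and the target URL are placeholders. */
public class ProxyFetch {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("proxy.example.ac.uk", 8080);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            // Through a proxy the request line must carry the absolute URL.
            out.print("GET http://example.org/index.html HTTP/1.0\r\n");
            out.print("Host: example.org\r\n\r\n");
            out.flush();
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
        }
    }
}
```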

  14. Indexer Engine 1) Firstly, it searches the web page using its advance knowledge; the page is deleted if it is not related to the case-study subject. 2) If the page is related to the case-study subject (neuroscience), the indexer collects the following information from the document: 3) all the keywords it contains, how many times they are repeated, the title, and the contents, which are saved in the database for later display in the query results and for further calculation. 4) The ranking method. A keyword-counting sketch follows.
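The keyword-counting step might look like the sketch below, which tallies how often each advance-knowledge keyword occurs in a page. Ranking by raw frequency is an assumption: the slides mention "The Ranking Method" without defining it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Indexer sketch: count how often each advance-knowledge keyword appears in
 *  a page.  Ranking by raw frequency is an assumption; the slides name
 *  "The Ranking Method" without defining it. */
public class IndexerSketch {
    /** Keywords are assumed to be stored in lower case. */
    public static Map<String, Integer> countKeywords(String pageText,
                                                     List<String> keywords) {
        Map<String, Integer> counts = new HashMap<>();
        String lower = pageText.toLowerCase();
        for (String kw : keywords) {
            int freq = 0;
            for (int i = lower.indexOf(kw); i >= 0; i = lower.indexOf(kw, i + 1)) {
                freq++;
            }
            if (freq > 0) {
                counts.put(kw, freq);   // keep only keywords that actually occur
            }
        }
        return counts;  // a relevant page's counts would then go into the index DB
    }
}
```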

  15. Query Engine It has an interface that accepts keywords from the user. It searches for the query keywords in the index database and returns the results in HTML format, and it gives the user two choices: display only the most relevant results, or display the whole result set, which includes the related results. A query sketch is shown below.
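A sketch of the index lookup is given below: the keyword is looked up in the index database and the matches are written out as a small HTML list. The table and column names, the driver, and the "top 10" cut-off for the most-relevant view are assumptions, since the slides do not give the schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Query-engine sketch: look the keyword up in the index and emit a small
 *  HTML result list.  The driver, table and column names are hypothetical. */
public class QuerySketch {
    public static String search(String keyword, boolean onlyMostRelevant) throws Exception {
        String sql = "SELECT p.url, p.title FROM Keywords k "
                   + "JOIN Pages p ON p.id = k.page_id "
                   + "WHERE k.keyword = ? ORDER BY k.frequency DESC";
        StringBuilder html = new StringBuilder("<html><body><ol>");
        try (Connection con =
                     DriverManager.getConnection("jdbc:ucanaccess://neurosearch.accdb");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, keyword.toLowerCase());
            try (ResultSet rs = ps.executeQuery()) {
                int shown = 0;
                // "Most relevant only" is interpreted here as the top 10 hits.
                while (rs.next() && (!onlyMostRelevant || shown < 10)) {
                    html.append("<li><a href=\"").append(rs.getString("url"))
                        .append("\">").append(rs.getString("title"))
                        .append("</a></li>");
                    shown++;
                }
            }
        }
        return html.append("</ol></body></html>").toString();
    }
}
```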

  16. Query Result: this is indeed an edge over other conventional search engines.

  17. Re-Crawling 1) This WebCrawler is specialised to whatever subject has been created in the advance knowledge in the database; it achieves this by reading the URLs from the index database using SQL. 2) Its interface allows special users to decide whether to continue crawling a website or to cancel it. 3) This part of the software aims to update the index whenever a new link is found, which makes it easier to search and crawl websites related to any advance-knowledge subject. A sketch of the URL re-read step follows.
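The URL re-read step might look like the following sketch, which pulls the already-indexed URLs out of the database with SQL so they can be revisited. The driver and table name are hypothetical, and the interactive continue/cancel prompt is omitted.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Re-crawler sketch: read the URLs already held in the index so that only
 *  those pages (and new links found on them) are revisited.  The driver and
 *  table name are hypothetical; the continue/cancel prompt is omitted. */
public class ReCrawlerSketch {
    public static List<String> urlsToRevisit() throws Exception {
        List<String> urls = new ArrayList<>();
        try (Connection con =
                     DriverManager.getConnection("jdbc:ucanaccess://neurosearch.accdb");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT url FROM Pages")) {
            while (rs.next()) {
                urls.add(rs.getString("url"));
            }
        }
        return urls;   // each URL is then re-fetched and the index updated
    }
}
```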

  18. Testing phase The test phase requires checking the first 10 ranked query results of “NeuroSearch” against the results for the same 10 queries on another search engine such as Google. Four keyword categories were used: specific keywords, general keywords, abbreviation keywords, and combined keywords, with 20 tests for each category and a total of 1000 tests. A sketch of the comparison metric follows.
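One simple way to score such a comparison is sketched below: the overlap between the two engines' top-10 result lists for the same query. This metric is an illustrative assumption; the slides only state that the top-10 rankings were compared per keyword category.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Testing sketch: overlap between the top-10 result lists of NeuroSearch and
 *  another engine for the same query.  The metric is an illustrative
 *  assumption; the slides only say the top-10 rankings were compared. */
public class RankingComparison {
    /** Both lists are assumed to hold exactly ten URLs. */
    public static double top10Overlap(List<String> neuroSearchTop10,
                                      List<String> otherEngineTop10) {
        Set<String> common = new HashSet<>(neuroSearchTop10);
        common.retainAll(new HashSet<>(otherEngineTop10));
        return 100.0 * common.size() / 10.0;   // percentage of shared URLs
    }
}
```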

  19. Testing cont. Ranking query test results for general keywords. [Table 1: (Query 1) ranking query test results for the general keyword "Eye".]

  20. Testing cont. [Chart 1: average keyword performance for the category-based test results of Google. Chart 2: average keyword performance for the category-based test results of NeuroSearch.]

  21. Analysing the search engines' ranking results by category [Table 4: the average ranking performance of the engines in the category-based query tests.]

  22. Analysing the average ranking performance of the engines in the category-based query test results.

  23. Visual representation [Chart 3: average category-based ranking performance of the engines. Chart 4: average keyword-based performance in the documents for the category-based query tests.]

  24. Conclusion Although the "NeuroSearch" search engine uses a simple algorithm to judge page quality compared with other conventional search engines, it proves to be very powerful at obtaining relevant results, particularly when its advance knowledge is built by a specialist with domain knowledge, e.g. in oil, medicine, the arts, etc.

  25. References (examples) • Wandell, Brian A. Foundations of Vision. Sunderland, Massachusetts, USA, 1995. • Brin, S. and Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Seventh International WWW Conference, 1998; Computer Science Department, Stanford University, Stanford, CA 94305, USA.

  26. Thank You ! Ready for Questions!!!
