Introduction to Data Structures

  1. Introduction to Data Structures Vamshi Ambati vamshi@andrew.cmu.edu

  2. Overview • Java you need for the Project • Search Engine and Data Structures • THIS Code Structure • On the Data Structure front • Dictionaries (Dictionary Structures) • Java Collections • Linked List • Queue [c] Vamshi Ambati

  3. Java you will need for the Project • Core Programming + I/O and Files • OOPS • Inheritance • Packages • Encapsulation • Java API • Collections

  4. What is a Search Engine? • A sophisticated tool for finding information on the web • An index for the World Wide Web • Analogous to the index in a textbook • Just imagine a world without search engines!

  5. Why Index in the first place? • Which list is easier to search? • sow fox pig eel yak hen ant cat dog hog • ant cat dog eel fox hen hog pig sow yak • A sorted list always helps • It permits binary search: about log2(n) probes into the list • log2(1 billion) ≈ 30
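The probe count on this slide can be checked with a small hand-written binary search. This is only a sketch: the class name `SortedListSearch` and its two methods are made up for illustration and are not part of the project code.

```java
// Hand-written binary search over a sorted String array (no java.util needed).
class SortedListSearch {

    // Returns the index of target in sorted, or -1 if it is absent.
    static int binarySearch(String[] sorted, String target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;          // midpoint without int overflow
            int cmp = target.compareTo(sorted[mid]);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                hi = mid - 1;                      // target sorts before sorted[mid]
            } else {
                lo = mid + 1;                      // target sorts after sorted[mid]
            }
        }
        return -1;
    }

    // Worst-case number of probes for n >= 1 items: floor(log2 n) + 1.
    static int maxProbes(long n) {
        return 64 - Long.numberOfLeadingZeros(n);  // bit length of n
    }

    public static void main(String[] args) {
        String[] animals = {"ant", "cat", "dog", "eel", "fox",
                            "hen", "hog", "pig", "sow", "yak"};
        System.out.println(binarySearch(animals, "pig"));   // 7
        System.out.println(binarySearch(animals, "emu"));   // -1
        System.out.println(maxProbes(1_000_000_000L));      // 30
    }
}
```

Searching the unsorted list would need up to 10 comparisons for 10 words; the sorted list needs at most 4, and the gap widens quickly as the list grows.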

  6. How search engines work • A search engine maintains data about web sites in its database. • It uses programs (often referred to as "spiders" or "robots") to collect information. • The information is then indexed by the search engine. • Users can then look up words, or combinations of words, found in the index.

  7. Inverted Files
     FILE: A file is a list of words and this file contains words at various positions. Each entry of the word is associated with a position.
     POS: 1 10 20 30 36
     INVERTED FILE (each word with the word positions at which it occurs):
     • a (1, 4, 24…)
     • entry (17…)
     • file (2, 10)
     • contains (11…)
     • position (25…)
     • positions (15…)
     • word (20…)
     • words (7, 12…)
     • …
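An inverted file like this one can be built in a single pass over the words. The following is a minimal sketch under stated assumptions, not the project's own code: the class `InvertedFileSketch` and its `build` method are hypothetical, positions are taken to be 1-based word counts, and it uses `java.util` maps for brevity even though the project expects hand-written dictionaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class InvertedFileSketch {

    // Maps each lower-cased word to the 1-based word positions where it occurs.
    static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> index = new TreeMap<>();   // keeps words sorted
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue;   // split() yields "" when the text starts with a separator
            }
            position++;
            index.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
        }
        return index;
    }

    public static void main(String[] args) {
        String file = "A file is a list of words and this file contains words "
                + "at various positions. Each entry of the word is associated "
                + "with a position.";
        Map<String, List<Integer>> index = build(file);
        System.out.println("a    -> " + index.get("a"));      // a    -> [1, 4, 24]
        System.out.println("file -> " + index.get("file"));   // file -> [2, 10]
    }
}
```

Looking up a word is then a single dictionary access instead of a scan of the whole file.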

  8. Inverted Files for Multiple Documents • Figure labels: LEXICON, WORD INDEX • Index entry columns: DOCID, OCCUR, POS 1, POS 2, … • Example: “jezebel” occurs 6 times in document 34, 3 times in document 44, 4 times in document 56, …
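One way to picture the DOCID / OCCUR / POS columns is as a list of postings per word. The sketch below is an assumption about how such a structure could look, not the layout used in the project: `MultiDocIndexSketch`, `Posting`, `addDocument` and `lookup` are hypothetical names, the sample texts are invented, and `java.util` maps stand in for the dictionaries you would write yourself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class MultiDocIndexSketch {

    // One posting: where a word occurs inside a single document.
    static class Posting {
        final int docId;                                     // DOCID
        final List<Integer> positions = new ArrayList<>();   // POS 1, POS 2, ...

        Posting(int docId) {
            this.docId = docId;
        }

        int occurrences() {                                  // OCCUR
            return positions.size();
        }
    }

    // Lexicon: word -> (docId -> posting), documents kept in insertion order.
    private final Map<String, Map<Integer, Posting>> lexicon = new TreeMap<>();

    void addDocument(int docId, String text) {
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue;
            }
            position++;
            lexicon.computeIfAbsent(token, w -> new LinkedHashMap<>())
                   .computeIfAbsent(docId, id -> new Posting(id))
                   .positions.add(position);
        }
    }

    // Returns the postings for a word, or an empty list if the word is unknown.
    List<Posting> lookup(String word) {
        Map<Integer, Posting> postings = lexicon.get(word.toLowerCase(Locale.ROOT));
        return postings == null ? new ArrayList<>() : new ArrayList<>(postings.values());
    }

    public static void main(String[] args) {
        MultiDocIndexSketch index = new MultiDocIndexSketch();
        index.addDocument(34, "jezebel rules and jezebel schemes");  // invented text
        index.addDocument(44, "the story of jezebel");               // invented text
        for (Posting p : index.lookup("jezebel")) {
            System.out.println("doc " + p.docId + ": " + p.occurrences()
                    + " occurrence(s) at " + p.positions);
        }
        // doc 34: 2 occurrence(s) at [1, 4]
        // doc 44: 1 occurrence(s) at [4]
    }
}
```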

  9. A comprehensive form of Inverted Index SOURCE: http://www.searchtools.com/slides/bestsearch/bls-24.html

  10. THIS • Search engine for the website http://www.hinduonnet.com/ • Website of the newspaper The Hindu • Not for the entire web • Results are confined to a single web site

  11. Index Structure for our Project (THIS)

  12. Search Engines

  13. Search Engine Differences • Coverage (what part of the web do they really cover?) • Crawling algorithms • Frequency of crawl • Depth of visits (for example, http://www.msitprogram.net/ is at depth 0 and http://www.msitprogram.net/admissions.html/ is at depth 1) • Indexing policies • Data structures • Representation • Search interfaces • Ranking

  14. Search Engine

  15. Crawl Index Search

  16. (Architecture diagram) • Components: TheWeb, Spider, Parser, URLList, Indexer, Index, Query, ResultSet, ResultPage, FinalResult • Labeled operations: crawl, parse, addUrls, getNextUrl, addPage, store, retrieve, makePage, Sort by Rank

  17. Where do our data structures and algorithms live? • Same architecture diagram as the previous slide (TheWeb, Spider, Parser, URLList, Indexer, Index, Query, ResultSet, ResultPage, FinalResult), annotated with: Priority Queue, Queue, Hashtable, BinaryTree, LinkedList, MergeSort & InsertionSort

  18. Code Structure (THIS) • Diagram legend: Inheritance, Uses, Calls • Classes: SearchDriver, CrawlerDriver, Spider, WebSpider, Queue, Index, Indexer, Query, PageLexer, HttpTokenizer, URLTextReader, PageElement, PageImg, PageHref, PageWord, DictionaryDriver, DictionaryInterface, ListDictionary, TreeDictionary, HashDictionary • Labeled operations: Crawl, Parse, addPage, Save, Restore

  19. Dictionary Structures (Lexicon) • A dictionary is an unordered container of key-element pairs • An ordered dictionary keeps its elements in sorted order • Keys are unique, but the elements can be of any kind

  20. Dictionary ADT
      • size(): Return the number of items in D.
        Output: Integer
      • isEmpty(): Test whether D is empty.
        Output: Boolean
      • elements(): Return the elements stored in D.
        Output: Iterator of elements (objects)
      • keys(): Return the keys stored in D.
        Output: Iterator of keys (objects)
      • findElement(k): If D contains an item with key == k, return the element of that item; otherwise return NO_SUCH_KEY.
        Output: Object
      • findAllElements(k):
        Output: Iterator of elements with key k
      • insertItem(k, e): Insert an item with element e and key k into D.
      • removeElement(k): Remove an item with key == k and return its element. If there is no such item, return NO_SUCH_KEY.
        Output: Object (element)
      • removeAllElements(k): Remove from D the items with key == k.
        Output: Iterator of elements
      Also see the Java Standard API for Dictionary: http://java.sun.com/j2se/1.4.2/docs/api/java/util/Dictionary.html

  21. Dictionary ADT in THIS Project
      • size(): Return the number of items in D.
        Output: Integer
      • isEmpty(): Test whether D is empty.
        Output: Boolean
      • getKeys(): Return all the keys of the elements stored in D.
        Output: String array (ideally it should be a Vector!)
      • getValue(k): If D contains an item with key == k, return the element of that item; otherwise return null.
        Output: Object
      • insertItem(k, e): Insert an item with element e and key k into D.
      • remove(k): Remove the item with key k from D.
      We have customized the Dictionary a bit, as we will be inserting only items of the type <String, Object>.
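The operations on this slide could be written down in Java roughly as follows. This is a sketch only: the real DictionaryInterface in the project may use different signatures, so the names `SimpleDictionary` and `LinkedListDictionary` are deliberately hypothetical. It also assumes that inserting an existing key replaces its element, which keeps keys unique. The list is coded by hand, without `java.util`.

```java
// Hypothetical interface mirroring the operations listed on the slide.
interface SimpleDictionary {
    int size();
    boolean isEmpty();
    String[] getKeys();
    Object getValue(String key);                  // null if the key is absent
    void insertItem(String key, Object element);
    void remove(String key);
}

// Unordered dictionary backed by a hand-written singly linked list.
class LinkedListDictionary implements SimpleDictionary {

    private static class Node {
        final String key;
        Object element;
        Node next;

        Node(String key, Object element, Node next) {
            this.key = key;
            this.element = element;
            this.next = next;
        }
    }

    private Node head;
    private int size;

    public int size() { return size; }

    public boolean isEmpty() { return size == 0; }

    public String[] getKeys() {
        String[] keys = new String[size];
        int i = 0;
        for (Node n = head; n != null; n = n.next) {
            keys[i++] = n.key;
        }
        return keys;
    }

    public Object getValue(String key) {
        Node n = find(key);
        return n == null ? null : n.element;
    }

    // Assumption: an existing key has its element replaced, so keys stay unique.
    public void insertItem(String key, Object element) {
        Node n = find(key);
        if (n != null) {
            n.element = element;
        } else {
            head = new Node(key, element, head);  // new items go to the front
            size++;
        }
    }

    public void remove(String key) {
        Node prev = null;
        for (Node n = head; n != null; prev = n, n = n.next) {
            if (n.key.equals(key)) {
                if (prev == null) {
                    head = n.next;                // unlink the first node
                } else {
                    prev.next = n.next;           // unlink a later node
                }
                size--;
                return;
            }
        }
    }

    // Linear scan: every lookup costs O(n) in a list-based dictionary.
    private Node find(String key) {
        for (Node n = head; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n;
            }
        }
        return null;
    }
}
```

A tree-based or hash-based dictionary could implement the same interface and only change the cost of `find`, which is the point of programming against the ADT.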

  22. Java Collections • java.util.* (a very helpful library) • Has implementations of most data structures • They make life really easy • You may not use the built-in data structures unless a task says so (e.g. Task1 Tasklet-A) • Use the Collections classes for purposes other than implementing data structures • E.g. Arrays, Vectors, Iterators, Lists, Sets etc. • You will definitely be using “Iterator” at least, as you will be dealing with many objects at a time! • http://java.sun.com/j2se/1.4.2/docs/api/java/util/Iterator.html • See: http://java.sun.com/docs/books/tutorial/collections/
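Since the slide singles out Iterator, here is a short sketch of the usual `hasNext()` / `next()` / `remove()` loop over a Vector. The class `IteratorSketch` and the method `dropShortWords` are invented for illustration.

```java
import java.util.Iterator;
import java.util.Vector;

class IteratorSketch {

    // Removes every word shorter than minLength and returns how many were removed.
    static int dropShortWords(Vector<String> words, int minLength) {
        int removed = 0;
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {
            String word = it.next();
            if (word.length() < minLength) {
                it.remove();   // removes the element last returned by next()
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Vector<String> words = new Vector<>();
        words.add("a");
        words.add("file");
        words.add("is");
        words.add("list");
        int removed = dropShortWords(words, 3);
        System.out.println(removed + " removed, left: " + words);
        // 2 removed, left: [file, list]
    }
}
```

Removing through the iterator, rather than calling `words.remove(...)` inside the loop, avoids a ConcurrentModificationException.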

  23. Other Data Structures • Queue • LinkedList • Beware! There are no pointers in Java • However, there are “references” • Learn more about references in Java • Do not use the java.util package for data structures or sorting algorithms! You are expected to code them yourself
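To make the point about references concrete, a queue can be hand-coded as a chain of nodes in which each `next` field is a reference to another node. This is a minimal sketch, not the project's Queue class: `LinkedQueue` and its method names are assumptions.

```java
// FIFO queue built from linked nodes; each "next" field is a reference.
class LinkedQueue<T> {

    private static class Node<T> {
        final T item;
        Node<T> next;          // reference to the following node, or null

        Node(T item) {
            this.item = item;
        }
    }

    private Node<T> head;      // next item to leave the queue
    private Node<T> tail;      // most recently added item
    private int size;

    boolean isEmpty() { return size == 0; }

    int size() { return size; }

    // Adds an item at the back in O(1).
    void enqueue(T item) {
        Node<T> node = new Node<>(item);
        if (tail == null) {
            head = node;       // queue was empty
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Removes and returns the item at the front in O(1).
    T dequeue() {
        if (head == null) {
            throw new IllegalStateException("queue is empty");
        }
        T item = head.item;
        head = head.next;
        if (head == null) {
            tail = null;       // queue became empty
        }
        size--;
        return item;
    }

    public static void main(String[] args) {
        LinkedQueue<String> urls = new LinkedQueue<>();
        urls.enqueue("http://www.msitprogram.net/");
        urls.enqueue("http://www.msitprogram.net/admissions.html/");
        System.out.println(urls.dequeue());   // http://www.msitprogram.net/
        System.out.println(urls.size());      // 1
    }
}
```

A crawler that takes URLs from the front and appends newly found links at the back visits pages in breadth-first order, which is why a queue suits the list of URLs still to be crawled.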

  24. Summary • Learn data structures by implementing THIS • A mini version of a real search engine • A framework is provided • More details in the next video

  25. THANK YOU
