
Peer to Peer Information Retrieval

Explore the evolution of Peer-to-Peer (P2P) Information Retrieval (IR) beyond Napster, the advantages and disadvantages of P2P systems, and the future directions for decentralized and traditional IR methods.

rhammons

Presentation Transcript


  1. Peer to Peer Information Retrieval: Going beyond Napster

  2. What is P2P IR? • No index on a central server • Content is distributed across all users of the system • Content is more than text: binary files and their associated metadata

  3. An example of a P2P system

  4. Why go P2P? • Spiraling costs of maintaining indexes (look at Google’s server farm) • New content forces new thinking on IR: large binary files are hard to index • Freedom of speech: people want to share data that is increasingly being legislated against

  5. First P2P Systems • A central hash (index) of distributed content • Only the central hash was used for queries • Disadvantages: scalability, known location of content, single point of failure • Advantages: quick searching, deterministic search results
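The central-index design can be sketched in a few lines of Python. This is a minimal illustration, not Napster's actual protocol: the `CentralIndex` class, the word-splitting rule, and the peer addresses are hypothetical. It shows why searching was quick and deterministic (one lookup on one server) and why that server was also a single point of failure that knew where every file lived.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical Napster-style central index: keyword -> peers holding matching files."""

    def __init__(self):
        # word -> set of (peer_address, filename) pairs
        self._postings = defaultdict(set)

    def register(self, peer, filename):
        # Each peer uploads its file list; the server indexes every word of each filename.
        for word in filename.lower().replace(".", " ").split():
            self._postings[word].add((peer, filename))

    def search(self, query):
        # Every query hits this one server: fast and deterministic,
        # but the server knows the location of all content.
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)


idx = CentralIndex()
idx.register("10.0.0.5", "Blue Moon.mp3")
idx.register("10.0.0.7", "Blue Sky.mp3")
print(idx.search("blue moon"))
```

The same query always returns the same answer, because there is exactly one index to consult.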

  6. Bumps that caused change • Legal: centralized services were easy targets, and owners of the index could not claim they had no knowledge of the content • Growth: the cost of maintaining the service grew, and hardware requirements exploded

  7. Decentralized P2P • Content is spread among users without any deliberate placement • The centralized server is replaced by a self-maintaining network • Every user is also a server • There is no index of content • How do we search?

  8. Searching Decentralized P2P Systems • Many methods, none perfected yet • Broadcast search • Advantage: every node takes part in the query • Disadvantage: as the system grows, network bandwidth and query time grow rapidly, because the number of messages multiplies at every hop
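A broadcast (flooding) search can be sketched as a breadth-first walk over the overlay network. This is a minimal sketch under stated assumptions: the overlay is a hypothetical `graph` dict of neighbour lists, `files` maps each node to the names it shares, and the shared `seen` set stands in for each node ignoring a query it has already handled. The message counter makes the cost visible: every node forwards the query to all of its neighbours, so a single query costs traffic proportional to the number of links in the whole network.

```python
from collections import deque


def flood_search(graph, files, origin, wanted):
    """Broadcast a query from `origin` to every reachable node (hypothetical helper).

    graph: dict node -> list of neighbour nodes
    files: dict node -> set of filenames that node shares
    Returns (sorted nodes holding `wanted`, number of query messages sent).
    """
    seen = {origin}
    queue = deque([origin])
    hits, messages = [], 0
    while queue:
        node = queue.popleft()
        if wanted in files.get(node, set()):
            hits.append(node)
        for neighbour in graph.get(node, []):
            # Every forward costs bandwidth, even to a node that has already seen the query.
            messages += 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(hits), messages


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(flood_search(ring, {2: {"song.mp3"}}, 0, "song.mp3"))
```

On a four-node ring the query reaches every node and costs 8 messages; doubling the ring to eight nodes doubles that cost for every single query.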

  9. Intelligent P2P Crawls • Ways to improve decentralized P2P queries • Intelligent data placement (FreeNet): knowing the algorithm that distributes data lets queries be routed more intelligently • Clustering (Fireworks model): clients with similar properties are logically grouped, and a query that does not apply to a group is not sent to that group’s clients • Both change the paradigm of what kind of data is shared and how it is shared
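The clustering idea can be illustrated with a simplified sketch. It is not the Fireworks algorithm itself: here peers are grouped by a single declared interest label (a hypothetical `interest` field), and a query is delivered only to the peers in the matching group, so every other group never sees it. The function names are also hypothetical.

```python
from collections import defaultdict


def build_clusters(peers):
    """Group peers by a declared interest label (simplified, hypothetical clustering).

    peers: dict peer -> (interest, set of shared filenames)
    """
    clusters = defaultdict(list)
    for peer, (interest, _files) in peers.items():
        clusters[interest].append(peer)
    return clusters


def clustered_search(peers, clusters, interest, wanted):
    """Send the query only to the cluster whose label matches.

    Returns (sorted peers holding `wanted`, number of peers contacted).
    """
    targets = clusters.get(interest, [])
    hits = [p for p in targets if wanted in peers[p][1]]
    return sorted(hits), len(targets)


peers = {
    "a": ("jazz", {"so_what.mp3"}),
    "b": ("jazz", {"blue_in_green.mp3"}),
    "c": ("metal", {"one.mp3"}),
    "d": ("metal", set()),
}
clusters = build_clusters(peers)
print(clustered_search(peers, clusters, "jazz", "so_what.mp3"))
```

Only the two "jazz" peers are contacted; the "metal" group carries no traffic for this query.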

  10. Other improvements • Today, most networks still rely on brute-force search • CRC/MD5 hashing: a checksum of each file is computed, and instead of searching metadata, clients search for the file hash • Files that are identical but mislabeled are still returned
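Hash-based lookup can be sketched with Python's standard `hashlib` module. The helper names `file_digest`, `index_by_hash`, and `search_by_hash` are hypothetical, and a real network would exchange digests between peers rather than scan local paths; the point is that two copies with the same bytes share a digest whatever their filenames say.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks so large binaries need not fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def index_by_hash(paths):
    """Hypothetical helper: map each digest to the shared files with that content."""
    index = {}
    for path in paths:
        index.setdefault(file_digest(path), []).append(path)
    return index


def search_by_hash(index, digest):
    """Look up by content hash: identical files match no matter how they are named."""
    return sorted(index.get(digest, []))
```

A file named `track01.mp3` and another named `Help - Beatles.mp3` with identical bytes land under the same digest, so a hash query returns both.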

  11. Query time limiting • To save inter-system bandwidth, searches terminate after X hops • The client ends the query after 100 results • Searches time out after X seconds
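The three limits can be combined in one search loop. This sketch assumes a hypothetical overlay given as `graph` (node to neighbour list) and `files` (node to shared names). The defaults for `max_hops` and `timeout_s` are placeholders for the slide's unspecified X values, while `max_results=100` follows the slide; the function name is also hypothetical.

```python
import time
from collections import deque


def limited_search(graph, files, origin, wanted,
                   max_hops=7, max_results=100, timeout_s=5.0):
    """Breadth-first flood that stops at whichever limit is hit first:
    a hop count (TTL), a result cap, or a wall-clock timeout."""
    deadline = time.monotonic() + timeout_s
    seen = {origin}
    queue = deque([(origin, 0)])
    hits = []
    while queue and len(hits) < max_results and time.monotonic() < deadline:
        node, hops = queue.popleft()
        if wanted in files.get(node, set()):
            hits.append(node)
        if hops == max_hops:
            continue  # TTL exhausted: do not forward any further
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return hits


line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
everyone = {n: {"x"} for n in line}
print(limited_search(line, everyone, 0, "x", max_hops=2))
```

On a five-node chain where every node has the file, a two-hop limit returns only nodes 0, 1, and 2; the far end of the network is never queried.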

  12. Distributed IR • Traditional IR with the advantages of distributed systems • A central server still stores the index • Multiple brokers provide access to the data repository • Multiple gatherers crawl the data near them • The advantages show up on the data-acquisition side
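The gatherer/broker split can be sketched as follows. The function and class names are hypothetical and the index is a plain word-to-document map; the sketch only shows the division of labour the slide describes: gatherers index nearby data, a central server merges their partial indexes, and several brokers answer queries against the same merged index.

```python
from collections import defaultdict


def gather(documents):
    """A gatherer indexes only the documents near it (hypothetical helper).

    documents: dict doc_id -> text. Returns word -> set of doc_ids.
    """
    partial = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            partial[word].add(doc_id)
    return partial


def merge(partials):
    """The central server combines the gatherers' partial indexes."""
    central = defaultdict(set)
    for partial in partials:
        for word, doc_ids in partial.items():
            central[word] |= doc_ids
    return central


class Broker:
    """One of several access points to the same central index."""

    def __init__(self, index):
        self.index = index

    def query(self, word):
        return sorted(self.index.get(word.lower(), set()))


central = merge([
    gather({"site-a/1": "peer to peer retrieval"}),
    gather({"site-b/7": "distributed retrieval"}),
])
brokers = [Broker(central), Broker(central)]
print(brokers[0].query("retrieval"))
```

Crawling is spread across gatherers, but every broker still consults one central index, which is why the gains appear on the acquisition side rather than at query time.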

  13. Examples

  14. Future Directions • The next step will be a drastic rethinking of content placement à la FreeNet • Donate X amount of bandwidth and Y amount of disk space • Share Z directories of content • Actual content files are distributed across the network intelligently • The most requested files are blanketed across the network • Unique files are still accessible
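The slide does not say how placement would decide how many copies each file gets. One possible policy, shown here as an assumption rather than FreeNet's actual mechanism, is proportional replication: donated storage slots are split according to request counts, with a floor of one copy so unique files stay reachable. The function name `plan_replicas` and its parameters are hypothetical.

```python
def plan_replicas(request_counts, total_slots, min_copies=1):
    """Hypothetical policy: split donated storage slots across files in
    proportion to how often each file is requested.

    request_counts: dict filename -> number of requests seen
    total_slots:    copies the network's donated disk space can hold
    """
    files = sorted(request_counts)
    base = min_copies * len(files)
    if total_slots < base:
        raise ValueError("not enough donated space for the minimum copies of every file")
    # Every file gets the minimum first, so unique files remain accessible.
    plan = {f: min_copies for f in files}
    total_requests = sum(request_counts.values())
    if total_requests == 0:
        return plan
    spare = total_slots - base
    # Spare slots are shared out in proportion to demand (rounded down).
    for f in files:
        plan[f] += spare * request_counts[f] // total_requests
    # Slots lost to rounding go to the most requested files first.
    leftover = total_slots - sum(plan.values())
    for f in sorted(files, key=lambda name: -request_counts[name])[:leftover]:
        plan[f] += 1
    return plan


print(plan_replicas({"hit.mp3": 90, "rare.ogg": 1, "mid.avi": 9}, total_slots=23))
```

With 23 slots and request counts of 90, 9, and 1, the popular file gets 20 copies while the rare one keeps its single copy.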

  15. Future Directions for Traditional IR • Large central repositories such as Google will fade • The Internet will be fragmented into clusters of interest • Similar-interest groups will have decentralized search facilities • An index of these groups will replace the Googles of today
