Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1)


  1. Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1) Yi-Hung Wu and Arbee L. P. Chen

  2. Outline • Introduction • Related approaches • Proposed approaches

  3. Introduction • Search engine: a centralized processing approach. • Meta-search engine: a distributed processing approach. • Information filtering services

  4. Search engine • A search engine usually collects representative descriptions of web pages before users express their information needs. • Two ways to reach this goal: • Providing a registration form in which a web page author describes what the page contains (Yahoo). • Periodically fetching web pages and extracting their representative information (Google).

  5. Meta-search engine • A meta-search engine only keeps the summary information of the individual search engines, such as the index size, the frequency of updates, the search capability, and the expected response time. • When the user issues a query, a meta-search engine selects the best set of search engines to answer the query

  6. Information filtering services • Why? • The performance of search engines degrades as the number of web pages grows rapidly. • What? • Finding good matches between web pages and users' information needs.

  7. Information filtering services • How? • Users first describe what they need; these descriptions, called user profiles, start the service. A profile index is built on the profiles. Incoming web pages are then fed, one after another, into the matching process. Each incoming web page is represented in the same form as a user profile. • In this way, the users who are interested in an incoming web page can be identified by comparing the description of the web page with each user profile.
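The matching step described on this slide can be sketched as a naive baseline that compares the page with every profile in turn. This is only an illustration: it assumes that profiles and pages are both keyword sets and that a profile matches when all of its keywords appear in the page (the subset condition stated on slide 16). The function name `naive_match` and the sample profiles are hypothetical and are not taken from the presentation.

```python
def naive_match(profiles, page_keywords):
    """Return the ids of profiles whose keywords all occur in the page.

    profiles: dict mapping a profile id to an iterable of keywords
    page_keywords: iterable of keywords extracted from the web page
    """
    page = set(page_keywords)
    # Compare the page against every profile: cost grows with the number of profiles.
    return sorted(pid for pid, kws in profiles.items() if set(kws) <= page)


if __name__ == "__main__":
    # Hypothetical profiles, for illustration only.
    sample = {"P1": ["a", "b"], "P2": ["a", "b", "d"]}
    print(naive_match(sample, ["a", "b", "c", "f"]))
```

Because every profile is inspected for every page, this baseline scales poorly, which is the motivation for building a profile index.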

  8. Information filtering services

  9. Related approaches • An Example • The counting method • The tree method

  10. An Example

  11. The counting method

  12. The counting method • COUNT keeps, for each profile, the number of times the profile is encountered while the inverted lists are traversed in the matching process. • During the matching process, the inverted lists of the keywords extracted from the web page (abcf) are traversed one by one. • Pi is a match if its entries in TOTAL and COUNT are equal.
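A minimal sketch of the counting method follows. It assumes that TOTAL holds the number of distinct keywords in each profile (the slide does not define it) and that the inverted lists map each keyword to the profiles containing it. The names `build_index` and `match_page` and the sample profiles are hypothetical; the actual profiles of the example slide are not reproduced in this transcript.

```python
from collections import defaultdict


def build_index(profiles):
    """Build the inverted lists and the TOTAL table from the profiles."""
    inverted = defaultdict(list)   # keyword -> ids of profiles containing it
    total = {}                     # profile id -> number of distinct keywords (assumed meaning of TOTAL)
    for pid, keywords in profiles.items():
        distinct = set(keywords)
        total[pid] = len(distinct)
        for kw in distinct:
            inverted[kw].append(pid)
    return inverted, total


def match_page(page_keywords, inverted, total):
    """Traverse the inverted list of each page keyword and count hits per profile."""
    count = defaultdict(int)       # COUNT: occurrences of each profile seen so far
    for kw in set(page_keywords):
        for pid in inverted.get(kw, ()):
            count[pid] += 1
    # A profile matches when every one of its keywords was found in the page.
    return sorted(pid for pid, c in count.items() if c == total[pid])


if __name__ == "__main__":
    # Hypothetical profiles, for illustration only.
    sample = {"P1": ["a", "b"], "P2": ["a", "b", "d"], "P3": ["c", "f"]}
    inverted, total = build_index(sample)
    print(match_page(["a", "b", "c", "f"], inverted, total))  # ['P1', 'P3']
```

Only the inverted lists of keywords that actually occur in the page are visited, so profiles sharing no keyword with the page are never touched.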

  13. The tree method

  14. The tree method • Index tree • P-node: a node that stores profiles • K-node: a node that stores a keyword • External path: a path that starts from the root, passes through a consecutive sequence of k-nodes, and ends at a p-node • If the keyword in a k-node is not contained in the set of keywords extracted from a web page, none of the external paths containing this k-node covers a matched profile.
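The pruning rule above can be sketched with a small keyword tree. This is an assumption-laden simplification rather than the structure drawn on the slide: each profile's keywords are inserted in sorted order, and the profile ids are stored directly on the last k-node of the path instead of in a separate p-node. The names `KNode`, `insert_profile`, and `match_tree` and the sample profiles are hypothetical.

```python
class KNode:
    """A keyword node; `profiles` plays the role of an attached p-node."""

    def __init__(self):
        self.children = {}   # keyword -> child KNode
        self.profiles = []   # ids of profiles whose external path ends here


def insert_profile(root, pid, keywords):
    """Insert a profile along the path spelled by its sorted keywords."""
    node = root
    for kw in sorted(set(keywords)):
        node = node.children.setdefault(kw, KNode())
    node.profiles.append(pid)


def match_tree(root, page_keywords):
    """Collect matched profiles, skipping every subtree whose keyword is not in the page."""
    page = set(page_keywords)
    matches = []
    stack = [root]
    while stack:
        node = stack.pop()
        matches.extend(node.profiles)
        for kw, child in node.children.items():
            if kw in page:           # prune: no path through a missing keyword can match
                stack.append(child)
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical profiles, for illustration only.
    root = KNode()
    for pid, kws in {"P1": ["a", "b"], "P2": ["a", "b", "d"], "P3": ["c", "f"]}.items():
        insert_profile(root, pid, kws)
    print(match_tree(root, ["a", "b", "c", "f"]))  # ['P1', 'P3']
```

In this sketch P1 and P2 share the path a, b, and the subtree under d is never entered because d is not among the page keywords.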

  15. Method 1: index path with path signatures

  16. Method 1: index path with path signatures • This method guarantees that no repetition will occur in the index. • Considering the keywords of the example page in Table 1 (abcf), the path signature equals 11 at node b and 110 at node d. Therefore P1 is a match, but P2 is not. (The keywords of a matched profile must be a subset of the web page's keywords.)

  17. To Be Continued… Thanks for your attention
