index structures of user profiles for efficient web page filtering services part 1 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1) PowerPoint Presentation
Download Presentation
Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1)

Loading in 2 Seconds...

play fullscreen
1 / 17

Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1) - PowerPoint PPT Presentation


  • 94 Views
  • Uploaded on

Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1). Yi-Hung Wu and Arbee L. P. Chen. Outline. Introduction Related approaches 1 Proposed approaches. Introduction. Search engine: a centralized processing approach. Meta-search engine:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1)' - ady


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
index structures of user profiles for efficient web page filtering services part 1

Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1)

Yi-Hung Wu and Arbee L. P. Chen

outline
Outline
  • Introduction
  • Related approaches
  • 1 Proposed approaches
introduction
Introduction
  • Search engine:

a centralized processing approach.

  • Meta-search engine:

a distributed processing approach.

  • Information filtering services
search engine
Search engine
  • A search engine often collects the representative descriptions of the web pages before the users’ information needs.
  • 2 ways to reach the goal
    • Providing a registration form for a web page author to describe what the web page contains. (Yahoo)
    • Fetching the web pages and to extract the representative information periodically (Google)
meta search engine
Meta-search engine
  • A meta-search engine only keeps the summary information of the individual search engines, such as the index size, the frequency of updates, the search capability, and the expected response time.
  • When the user issues a query, a meta-search engine selects the best set of search engines to answer the query
information filtering services
Information filtering services
  • Why?
    • the performance of the search engines will get worse if the number of web pages grows rapidly
  • What?
    • Finding good matches between the web pages and the users’ information needs
information filtering services1
Information filtering services
  • How?
  • Users first give descriptions about what they need, which are called user profiles, to start the services. A profile index is built on these profiles. A series of incoming web pages will be put into the matching process. Each incoming web page is represented in the same form of the user profile.
  • In this way, the users who are interested in an incoming web page can be identified by comparing the descriptions of the web page with each user profile.
related approaches
Related approaches
  • An Example
  • The counting method
  • The tree method
the counting method1
The counting method
  • COUNT is used to keep the number of

occurrences for a profile during traversing the inverted lists in the matching process.

  • During the matching process, the inverted lists of the keywords extracted from the web page (abcf) will be traversed one by one
  • Pi is a match if the values in the corresponding entries of TOTAL and COUNT are equal.
the tree method
The tree method
  • Index tree
    • P-node: the nodes which store profiles
    • K-node: the nodes which store keywords
    • External path:a path that starts from the root passes a consecutive sequence of k-nodes and ended with a p-node
  • If the keyword in a k-node is not contained in the set of keywords extracted from a web

page, all the external paths containing this k-node will not cover any matched profiles.

method 1 index path with path signatures1
Method 1: index path with path signatures
  • This method guarantees that no repetition will occur in the index.
  • Considering the keywords of example page in Table 1(abcf), its path signature equals 11 at node b and 110 at node d. Therefore P1 is a match, but P2 is not. (We know that the keyword of matches most be the subsets of web page`s.)
to be continued

To Be Continued…

Thanks for your attention