KEYWORD-BASED FILTERING

Advantages Mature technology PowerPoint PPT Presentation




Presentation Transcript


KEYWORD-BASED FILTERING: Content-based filtering that uses keyword counts from documents as representations of items.

  • Advantages

    • Mature technology

    • Works as well as more sophisticated content-filtering technologies in high-quality document domains.

  • Disadvantages

    • Only works in document domains

    • Cannot capture subjective notions such as the quality or style of the documents being filtered.

  • Applications

    • Filtering high-quality news wires and document databases.

    • Web search engines.
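The keyword-count idea can be sketched as follows. This is a minimal illustration, not a description of any particular product: the sample documents, the `profile` weights, and the `min_score` cutoff are all hypothetical.

```python
import re
from collections import Counter


def keyword_counts(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score(doc_counts, profile):
    """Sum of (keyword count * profile weight) over the profile keywords."""
    return sum(doc_counts[word] * weight for word, weight in profile.items())


def filter_documents(documents, profile, min_score):
    """Keep only documents whose keyword score reaches min_score."""
    return [
        doc for doc in documents
        if score(keyword_counts(doc), profile) >= min_score
    ]


# Hypothetical documents and user profile, for illustration only.
documents = [
    "Stock markets rallied as bank stocks gained.",
    "The local team won the football match.",
    "Central bank raises interest rates; markets react.",
]
profile = {"bank": 2, "markets": 1, "rates": 1}

selected = filter_documents(documents, profile, min_score=3)
for doc in selected:
    print(doc)
```

The sketch only counts words, so it has no way to tell a well-written article from a poor one that uses the same keywords, which is the disadvantage listed above.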


NEURAL NETWORKS: Highly sophisticated content-based filtering technology that can use arbitrary attribute information about the items being filtered.

  • Advantages

    • Very powerful technology (can work with many kinds of items having attribute information).

    • Given sufficient training examples, can learn almost any concept.

  • Disadvantages

    • Require long training.

    • “Black boxes”: no way to determine exactly what they have learned.

    • Not scalable (works only for small samples).

    • Cannot capture subjective notions.

  • Applications

    • Filters any information stream where the items are tagged with attributes (documents, credit records, etc.) or contain keywords.


ACTIVE COLLABORATIVE FILTERING: “Manual” collaborative filtering technique where users explicitly identify other users in the community whose opinions they are interested in.

  • Advantages

    • Works well for small communities where users know each other and their areas of expertise.

    • Combines elements of feature-based filtering with opinion-based filtering.

  • Disadvantages

    • Not feasible in large communities of users.

    • The burden of identifying the appropriate members of the community and constructing the appropriate query rests on the user.

  • Applications

    • Information and document sharing in small workgroup environments.


AUTOMATED COLLABORATIVE FILTERING (ACF): An automated version of “word of mouth,” where the technology uses the opinions of a large community to filter items for each person.

  • Advantages

    • Incorporates subjective notions of quality into the filtering process.

    • Very effective for domains where items cannot be easily analyzed by computer or that are highly subjective.

  • Disadvantages

    • No knowledge about the “kinds of items” being filtered, which can lead to incorrect results.

    • Cannot utilize additional information about the items even when it is available and relevant.

  • Applications

    • Highly subjective domains (music, travel…).

    • Domains that are not amenable to machine analysis (e.g., video).

    • Domains where the perceived quality of items fluctuates very widely (e.g., Web sites).


FEATURE-GUIDED AUTOMATED COLLABORATIVE FILTERING (FGACF): Technology that utilizes features of items to partition items and more effectively apply the ACF algorithm.

  • Advantages

    • Utilizes available feature information to partition the item space to apply ACF effectively.

    • Combines strengths of simple content-based filtering with those of collaborative filtering while addressing the limitations of standard ACF.

  • Disadvantages

    • Feature information used must be relevant to partitioning the item space.

  • Applications

    • “Broad” subjective domains (Web sites, books, restaurants) where additional feature information is available.

    • Any domain standard ACF applies to.


Rule-based Technology

Rule 1: If the visitor is under 40, not married, and has an income greater than $100,000, show a Mercedes ad.

Rule 2: If the visitor is under 40, married, and has an income not greater than $100,000, show a Plymouth ad.

What happens if the visitor is under 40, not married, and has an income not greater than $100,000? Maybe show a VW ad, but that rule must be given explicitly!
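The two rules can be written out directly, which makes the gap visible. This is a minimal sketch; the function name `choose_ad` and the choice to return `None` when no rule fires are assumptions made for illustration.

```python
def choose_ad(age, married, income):
    """Apply Rule 1 and Rule 2; return None when no rule covers the visitor."""
    # Rule 1: under 40, not married, income greater than $100,000.
    if age < 40 and not married and income > 100_000:
        return "Mercedes"
    # Rule 2: under 40, married, income not greater than $100,000.
    if age < 40 and married and income <= 100_000:
        return "Plymouth"
    # No explicit rule matches, so the system has nothing to show.
    return None


print(choose_ad(30, False, 150_000))  # Rule 1 fires
print(choose_ad(30, True, 80_000))    # Rule 2 fires
print(choose_ad(30, False, 80_000))   # uncovered case
```

The third call falls through both rules. A VW ad would only appear if someone added a third rule by hand.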



There are algorithms, such as ID3, that generate a set of business rules from a list of example cases. These rules can then be examined to verify their validity. Neural networks can perform the same type of classification but are “black boxes”: the business rules are not explicit.
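ID3 builds its rules by repeatedly splitting the examples on the attribute with the highest information gain. The sketch below shows only that attribute-selection step, not the full tree construction. The example cases and the attribute names `married` and `high_income` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(examples, attribute, target="ad"):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder


# Hypothetical example cases (all visitors under 40), for illustration only.
examples = [
    {"married": False, "high_income": True, "ad": "Mercedes"},
    {"married": True, "high_income": False, "ad": "Plymouth"},
    {"married": False, "high_income": False, "ad": "VW"},
    {"married": True, "high_income": True, "ad": "Mercedes"},
]

attributes = ["married", "high_income"]
best = max(attributes, key=lambda a: information_gain(examples, a))
print(best, information_gain(examples, best))
```

On these cases the first split is on `high_income`, which reads as an explicit rule ("high income implies Mercedes") that a person can inspect and verify.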


Collaborative Filtering Algorithm

How do we use ratings to make predictions? How do we predict Ken’s rating for product 6?


We use correlation coefficients. The notation $R_{KL}$ denotes the correlation between Ken and Lee. But how?
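One common way to obtain such a coefficient is the Pearson correlation over the products both users have rated. The sketch below assumes that choice. The rating data is hypothetical, because the rating table from the slides is not reproduced in this transcript.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the products rated by both users.

    Each argument maps a product id to a rating. Returns 0.0 when the
    users share fewer than two products or when either has no variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[p] for p in common) / n
    mean_b = sum(ratings_b[p] for p in common) / n
    cov = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    var_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    var_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratings (product id -> rating), for illustration only.
ken = {1: 1, 2: 5, 3: 3, 4: 4}
lee = {1: 5, 2: 1, 3: 3, 6: 2}

r_kl = pearson(ken, lee)
print(r_kl)
```

A value near -1 means the two users tend to disagree, so a low rating from one is weak evidence of a high rating from the other.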


  • Did the user like or dislike the product?

  • How close is the user’s rating to his/her average?

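  • For example, Lee’s average is 3 and Lee gave Product 6 a 2, so use -1: $L_6 - L_{avg} = 2 - 3 = -1$.

  • Start from Ken’s average and add each other user’s deviation, weighted by that user’s correlation with Ken: $K_6 = K_{avg} + (L_6 - L_{avg})R_{KL} + (M_6 - M_{avg})R_{KM} + (N_6 - N_{avg})R_{KN} = 3 + (2 - 3)(-0.8) + (5 - 3)(0.33) + (3 - 2.6)(0) = 3 + 0.8 + 0.66 = 4.46$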
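The worked prediction for Ken can be reproduced in a few lines. The numbers are the ones from the bullet above; the function name `predict` and the tuple layout are assumptions made for this sketch.

```python
def predict(target_avg, neighbors):
    """Target user's average plus correlation-weighted deviations.

    Each neighbor is (rating_for_item, neighbor_average, correlation_with_target).
    """
    return target_avg + sum(
        (rating - avg) * corr for rating, avg, corr in neighbors
    )


# Values from the slide: Lee, M, and N relative to Ken (average 3).
neighbors = [
    (2, 3.0, -0.8),   # Lee: rated 2, average 3, R_KL = -0.8
    (5, 3.0, 0.33),   # M:   rated 5, average 3, R_KM = 0.33
    (3, 2.6, 0.0),    # N:   rated 3, average 2.6, R_KN = 0
]

k6 = predict(3.0, neighbors)
print(round(k6, 2))
```

Note that this follows the slide's formula as written. Many implementations also divide the weighted sum by the sum of the absolute correlations so that the prediction stays within the rating scale.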


Neural Network Algorithm

A diagram of a single-layer neural network.

$x_i$ is the signal level at input $i$ (attribute $i$).

$w_i$ is the weight associated with input $i$.

$w_i(t)$ is the weight associated with input $i$ at time $t$.

$\theta$ is a threshold level. The output is $y = 1$ when $\sum_i w_i x_i > \theta$.



Neural nets “learn” by adjusting the values of the weights. Initially the values of the weights are set to small random values. Training (learning) involves the readjustment of the input weights to develop the correct response to the training set.
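The training loop described above can be sketched for a single threshold unit. Several things here are assumptions and are not stated in the transcript: the classic perceptron update $w_i(t+1) = w_i(t) + \eta\,(d - y)\,x_i$, an output of 0 when the sum does not exceed the threshold, the threshold value 0.5, the learning rate 0.25, and the encoding of the two ad rules as inputs (under 40, married, income over $100,000) with 1 meaning Plymouth and 0 meaning Mercedes.

```python
import random


def output(weights, x, theta):
    """Threshold unit: 1 if the weighted sum exceeds theta, else 0 (assumed)."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > theta else 0


def train(examples, theta=0.5, rate=0.25, max_epochs=50, seed=0):
    """Adjust weights until every training example is classified correctly."""
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    # Start from small random weights, as the slide describes.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    for _ in range(max_epochs):
        errors = 0
        for x, desired in examples:
            delta = desired - output(weights, x, theta)
            if delta != 0:
                errors += 1
                # Assumed perceptron rule: move weights toward the desired output.
                weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
        if errors == 0:
            break  # a full pass with no mistakes: training can stop
    return weights


# Hypothetical encoding: (under 40, married, income > $100,000) -> 1 = Plymouth.
examples = [((1, 0, 1), 0), ((1, 1, 0), 1)]
weights = train(examples)
predictions = [output(weights, x, 0.5) for x, _ in examples]
print(weights, predictions)
```

The learned weights are just numbers. Unlike the explicit rules earlier, nothing in them states why a visitor gets a particular ad, which is the “black box” concern.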


  • 15. Calculate the output $y$: $\sum_i w_i x_i = (0.1)(1) + (2.0)(1) + (-1.7)(0) = 2.1$, and 2.1 exceeds the threshold $\theta$, so $y(4) = 1$. The network therefore also recommends Plymouth correctly, and we can stop.


15 Dimensions of Data Quality (in no particular order of importance)

  • First 5:

    • Believability (believable)

    • Accuracy (data are certified error-free, accurate, correct, flawless, reliable, errors can be easily identified, the integrity of the data, precise)

    • Timeliness (age of data)

    • Accessibility (accessible, retrievable, speed of access, available, up-to-date)

    • Value-added (data give you a competitive edge, data add value to your operations)


  • Second 5:

    • Relevancy (applicable, relevant, interesting, usable)

    • Objectivity (unbiased, objective)

    • Concise representation (well-presented, concise, compactly represented, well-organized, aesthetically pleasing, form of presentation, well-formatted, format of the data)

    • Appropriate amount of data (the amount of data)

    • Representational consistency (data are continuously presented in same format, consistently represented, consistently formatted, data are compatible with previous data)


  • Last 5:

    • Ease of understanding (easily understood, clear, readable)

    • Interpretability (interpretable)

    • Completeness (breadth, depth, and scope of information contained in the data)

    • Reputation (reputation of the data source, reputation of the data)

    • Access security (data cannot be accessed by competitors, data are of a proprietary nature, access to data can be restricted, secure)

