
Tehran University

Using OWA Fuzzy Operator to Merge Retrieval System Results

Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud Rahgozar

School of Electrical and Computer Engineering

University of Tehran

Farhad Oroumchian

University of Wollongong in Dubai


Outline

  • The Persian Language
  • Used Methods
    • Vector Space Model
    • Language Modeling
  • OWA Operator
  • The test collections
  • Experiment results
  • Conclusion

University of Tehran - Database Research Group

The Persian Language

  • It is spoken in countries such as Iran, Tajikistan, and Afghanistan.
  • It uses an Arabic-like script with 32 characters, written in connected form from right to left.
  • Its morphological analyzers need to deal with many word forms that are not originally Farsi.
    • Example
      • The word “عادت” has two plural forms in Farsi:
        • Farsi form: “عادت ها”
        • Arabic form: “عادات”

Vector Space Model

[Table: list of weights that produced the best results]

We used the Lnu.ltu and Lnc.btc weighting schemes.

Language Modeling
  • We used four ways to compute the rank of document d against query q.
  • P(D=d) is taken as the prior probability of relevance of document d to query q.
  • Lambda (λ) is a smoothing parameter and is the same for every query term.
    • If no previous relevance information is available for a query, each query term is considered equally important.
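
The four ranking formulas on this slide did not survive extraction. As a rough illustration only, the sketch below scores documents with one common smoothed language model that uses a document prior P(D=d) and a single λ shared by all query terms. The interpolation formula, the function name `lm_score`, and the toy documents are assumptions, and this is not claimed to match any of the four methods in the talk.

```python
import math
from collections import Counter

def lm_score(query_terms, doc_terms, collection_tf, collection_len,
             lam=0.5, doc_prior=1.0):
    """Log-score of a document under a linearly interpolated language model.

    Assumed form (illustrative, not taken from the slides):
        log P(d) + sum_t log((1 - lam) * P(t | C) + lam * P(t | d))
    The same lam is used for every query term.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = math.log(doc_prior)
    for term in query_terms:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = collection_tf.get(term, 0) / collection_len
        mixed = (1.0 - lam) * p_coll + lam * p_doc
        if mixed == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(mixed)
    return score

# Toy collection of two tokenized documents (made-up data).
docs = {
    "d1": ["law", "court", "law", "judge"],
    "d2": ["news", "sport", "court"],
}
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())

query = ["law", "court"]
ranking = sorted(
    docs,
    key=lambda d: lm_score(query, docs[d], collection_tf, collection_len),
    reverse=True,
)
print(ranking)
```

A larger λ trusts the document's own term frequencies more, while a smaller λ leans on collection statistics, which matters for short documents.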

OWA Operator- Cont.
  • We used the OWA operator as the merge operator.
  • The OWA weight of each document d is computed from the scores x_1, …, x_n, where each score x_i is assigned to d by the i-th search engine. If d is not present in the i-th list, then x_i = 0.
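
The merge step can be sketched as follows, assuming the standard OWA definition (a weighted sum over the scores sorted in ascending order) and a score of 0 for engines that did not return the document. The names `owa` and `merge_lists`, the uniform weight vector, and the toy scores are hypothetical and only serve the illustration.

```python
def owa(scores, weights):
    """Ordered weighted average: sort scores ascending, then take the
    weighted sum with the position-based weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores)  # b_1 <= b_2 <= ... <= b_n
    return sum(w * b for w, b in zip(weights, ordered))

def merge_lists(engine_scores, weights):
    """Fuse per-engine {doc_id: score} dicts into one ranked list.

    A document missing from an engine's list gets the score 0 for that
    engine, as stated on the slide.
    """
    all_docs = set().union(*engine_scores)
    fused = {
        doc: owa([scores.get(doc, 0.0) for scores in engine_scores], weights)
        for doc in all_docs
    }
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

# Made-up, already normalized scores from three hypothetical engines.
engines = [
    {"a": 0.9, "b": 0.4},
    {"a": 0.7, "c": 0.8},
    {"a": 0.5, "b": 0.6, "c": 0.2},
]
uniform = [1 / 3, 1 / 3, 1 / 3]  # plain average, used here only as an example
result = merge_lists(engines, uniform)
print(result)
```

Because the weights attach to sorted positions rather than to particular engines, changing the weight vector changes how many engines must agree before a document scores well.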

OWA Operator- Weighting Method
  • Quantifier Based Weighting
  • Degree of Importance Based Weighting

Quantifier Based Weighting
  • We used the linguistic quantifiers All, Most, Few, and At-Least-One as the weighting schemes.
  • All: considers only documents that appear in the lists of all retrieval engines. This quantifier is suitable when the user is looking for a precise answer.
  • Most: a fuzzy majority operator that treats retrieval by most of the engines as sufficient for inclusion in the fused list.
  • Few: a weaker weighting scheme in which it is enough for a document to be retrieved by a few retrieval engines.
  • At-Least-One: the weakest weighting scheme, in which it is enough for a document to appear in only one retrieval engine’s list to be included in the fused list.
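
The slides do not give the quantifier functions themselves. One simple way to picture them, offered here only as an assumption, is a crisp "at least k of n engines" quantifier: a weight vector with a single 1 placed so that the OWA value equals the k-th largest score. Under that reading, All corresponds to k = n (the minimum score) and At-Least-One to k = 1 (the maximum score). The helper `at_least_k_weights` is hypothetical, and the Most and Few quantifiers used in the talk may well use graded (fuzzy) weights instead.

```python
def at_least_k_weights(n, k):
    """Weight vector for a crisp 'at least k of n' quantifier.

    With scores sorted in ascending order, a single weight of 1 at
    index n - k selects the k-th largest score, which is non-zero only
    if at least k engines returned the document.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    weights = [0.0] * n
    weights[n - k] = 1.0
    return weights

def owa(scores, weights):
    """Weighted sum of the scores sorted in ascending order."""
    return sum(w * b for w, b in zip(weights, sorted(scores)))

n = 6  # six retrieval engines, as in the experiments
all_w = at_least_k_weights(n, n)      # 'All': the minimum score
one_w = at_least_k_weights(n, 1)      # 'At-Least-One': the maximum score
three_w = at_least_k_weights(n, 3)    # hypothetical 'at least 3 lists'

scores = [0.0, 0.0, 0.0, 0.0, 0.6, 0.9]  # document found by only 2 engines
print(owa(scores, all_w), owa(scores, one_w), owa(scores, three_w))
```

In this toy case the document is kept only by the At-Least-One weights, since two engines are too few for the stricter quantifiers.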

Degree of Importance Based Weighting
  • As the second weighting scheme, we use the position of the documents in the retrieved lists.
  • The weight of each document d in the list L_{i,q} is defined in terms of N_i, the number of elements in the i-th list L_{i,q}, and POS_i, the position of document d in L_{i,q}.
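
The exact weighting formula is missing from the extracted slide. The sketch below shows one plausible position-based weight built from the same two quantities, N_i and POS_i. The specific form (N_i - POS_i + 1) / N_i with 1-based positions is an assumption chosen for illustration, as are the function names and the sample list.

```python
def position_weight(list_length, position):
    """Hypothetical degree-of-importance weight from a 1-based rank.

    Assumed form (not taken from the slides): (N_i - POS_i + 1) / N_i,
    so the top document gets 1.0 and the last one gets 1 / N_i.
    """
    if not 1 <= position <= list_length:
        raise ValueError("position must be between 1 and list_length")
    return (list_length - position + 1) / list_length

def weights_for_list(ranked_doc_ids):
    """Map each document in one engine's ranked list to its weight."""
    n = len(ranked_doc_ids)
    return {doc: position_weight(n, pos)
            for pos, doc in enumerate(ranked_doc_ids, start=1)}

ranked = ["d7", "d2", "d9", "d4"]  # made-up result list L_{i,q}
position_weights = weights_for_list(ranked)
print(position_weights)
```

Any monotonically decreasing function of POS_i would serve the same purpose of favoring documents that an engine ranks near the top.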

Test Collections
  • Qvanin Collection
    • Documents: Iranian Law Collection
      • 177089 passages
      • 41 queries and Relevance Judgments
  • Hamshahri Collection
    • Documents: 600+ MB of news from the Hamshahri newspaper
      • 160000+ news articles
      • 60 queries and Relevance Judgments
  • BijanKhan Tagged Collection
    • Documents: 100+ MB from different sources
      • A tag set of 41 tags
      • 2590000+ tagged words

Hamshahri Collection

  • We used HAMSHAHRI, a test collection for Persian text prepared and distributed by the DBRG (IR team) of the University of Tehran.
  • The 3rd version:
    • contains more than 160,000 distinct textual news articles in Farsi
    • includes 60 queries, with relevance judgments for the top 20 relevant documents for each query


Experiment results

[Figure: precision of the six retrieval engines at different document cut-offs]

LM4 and Lnu.ltu with slope 0.25 perform better than the other systems.
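
For readers unfamiliar with the measure, precision at a document cut-off k is the fraction of the top k retrieved documents that are relevant. The sketch below uses made-up document ids and cut-offs; the cut-off values used in the experiments are not recoverable from the slides.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    return hits / k

# Made-up ranking and relevance judgments for a single query.
ranking = ["d1", "d5", "d3", "d8", "d2"]
relevant = {"d1", "d2", "d3"}

for cutoff in (1, 3, 5):  # illustrative cut-offs only
    print(cutoff, precision_at_k(ranking, relevant, cutoff))
```

Averaging this value over all queries at each cut-off gives one curve per retrieval engine.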

Quantifier Based OWA Weighting

The parameter n (in Most and Few quantifiers) indicates the minimum number of retrieval lists sufficient for inclusion in the merge process


Experiment results

The precision of the fusion methods at different document cut-offs. The best are Most3 and Most4.

Experiment results

Comparing LM4 and Lnu.ltu methods with the best OWA results

Statistical significance tests
  • Wilcoxon Signed Rank
  • T-Test

Statistical significance tests
  • Based on the T-Test, both the Most3 and Most4 methods are significantly better than the LM4 method, which confirms the result of the Wilcoxon Signed Rank test.
  • However, with the T-Test we cannot confirm the significance of the Most3 and Most4 methods over the Lnu.ltu method with slope 0.25.
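
Both tests compare two systems on the same set of queries, using the per-query differences. As a minimal sketch, the code below computes only the test statistics (not p-values) in plain Python. The per-query counts are invented for illustration and are not results from the talk; in practice a statistics package would normally be used to obtain p-values.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i] (n - 1 d.o.f.)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def wilcoxon_w(a, b):
    """Wilcoxon signed-rank statistic: the smaller of the positive and
    negative rank sums. Zero differences are dropped; tied absolute
    differences share their average rank."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for idx in range(i, j):
            ranks[idx] = avg_rank
        i = j
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return min(w_plus, w_minus)

# Invented per-query counts of relevant documents in the top 20.
fused = [12, 15, 9, 14, 11, 16]
baseline = [10, 14, 10, 11, 9, 12]

t_stat = paired_t_statistic(fused, baseline)
w_stat = wilcoxon_w(fused, baseline)
print(t_stat, w_stat)
```

The t-test assumes roughly normal differences, while the Wilcoxon test relies only on their ranks, which is one reason the two tests can disagree on borderline comparisons.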

Conclusion
  • We used two weighting methods, namely quantifier-based and degree-of-importance-based weighting.
  • The experimental results show that the best OWA operators, Most3 and Most4 (quantifier-based OWA operators), only marginally improve over LM4, the best retrieval method on Persian text.
  • However, they seem to produce better rankings, since they push the relevant documents to higher ranks.
  • The significance tests we conducted seem to confirm that Most3 and Most4 are significantly better than all other methods except Lnu.ltu with slope 0.25.
  • The superiority over Lnu.ltu with slope 0.25 was not confirmed by the T-Test.

Thanks. Questions?

http://ece.ut.ac.ir/dbrg

OWA Operator- Cont.
  • The OWA weight of each document is computed by this equation:

$$\mathrm{OWA}(x_1, x_2, \ldots, x_n) = W^T B = \sum_{j=1}^{n} w_j b_j$$

W^T is the transpose of the weight vector W = [w_1, w_2, …, w_n], which defines the semantics associated with the OWA operator.

B = [b_1, b_2, …, b_n] is the vector X = [x_1, x_2, …, x_n] reordered so that b_j is the j-th smallest element of x_1, x_2, …, x_n.

  • We used a simple function to bring the scores {x_i, i = 1, …, n} onto the same scale.
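
The slide does not say which scaling function was used. A common choice, shown here purely as an assumption, is min-max normalization of each engine's scores to the range [0, 1]. The function name `min_max_normalize` and the sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale {doc_id: score} to [0, 1]; assumed form, not from the slides."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}  # all equal: treat as top score
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

raw = {"a": 12.0, "b": 7.0, "c": 2.0}  # made-up raw scores from one engine
normalized = min_max_normalize(raw)
print(normalized)
```

One caveat of this particular choice is that the lowest-scoring retrieved document ends up at 0, the same value given to documents an engine did not retrieve at all.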
