FuFaIR: a Fuzzy Farsi Information Retrieval System

Amir Nayyeri

School of Electrical and Computer Engineering

University of Tehran

Farhad Oroumchian

University of Wollongong in Dubai

Overview
  • Persian Language
  • Related Work
    • Fuzzy IR
    • Farsi IR
  • FuFaIR Explanation
  • Experimental Results
  • Conclusion and Future Work
Persian Language
  • Spoken in several countries (Iran, Afghanistan, Tajikistan, …)
  • The language has evolved over the years and has been influenced by many languages
  • Contains foreign words from many languages such as Arabic, Turkish, French, and English
  • In some cases these words still follow the grammatical rules of their original language, for example:
    • “Maktab” مكتب (singular) → “Makateb” مكاتب (plural)
  • In other cases these words can take the grammatical rules of either language, e.g.:
    • “Khabar” خبر (singular) → “Akhbar” اخبار (Arabic plural) or “Khabar-ha” خبرها (Persian plural)
  • Morphological analyzers for Persian therefore need to deal with many forms of each word
Information Retrieval and Natural Language Processing for Persian (Farsi)
  • The Faculty of Engineering at the University of Tehran (UT) started working on Persian language processing about 7 years ago.
  • For the past 3 years, the work has been a joint effort between UT and the University of Wollongong in Dubai (UOWD).
  • Since then, several thousand experiments on processing and retrieval of Persian text have been performed.
Test Collections
  • Qvanin Collection
    • Documents: Iranian Law Collection
      • 177089 passages
      • 41 queries and Relevance Judgments
  • Hamshahri Collection
    • Documents: 300 MB of news from the Hamshahri newspaper
  • Part of Speech Tagging Collection
    • A tag set of 40 tags
    • 2590000+ tagged words
Natural Language Processing
  • Investigating automatic part-of-speech tagging based on machine learning approaches:
    • Probabilistic (Hidden Markov Model)
    • Rule based
    • Entropy based
    • Neural networks
  • The best tagger so far has reached 96% accuracy.
Information Retrieval Experiments
  • All major retrieval models used for English text, and combinations of them, have been tested, for example:
    • Fuzzy Logic
      • MMM, Paice
    • Vector Space
    • Probabilistic
      • BM25
    • N-Grams
      • N=2, N=3, N=4
  • Combinational
  • With many different term weighting schemes.
The context of the current work
  • Improving the quality of Persian retrieval
  • Improving IR systems that use fuzzy logic as their retrieval model
Related Work – Fuzzy IR
  • Fuzzy logic has been used in IR since the early days.
  • However, only a few fuzzy approaches have shown superiority over classical approaches such as vector space.
  • This has also been confirmed for the Persian language.
  • The current work is mostly inspired by one of them:
    • D.E. Losada, F.D. Hermida, A. Bugarin, S. Barro. Experiments on using fuzzy quantified sentences in adhoc retrieval. ACM Symposium on Applied Computing, 2004.
Mixed Min & Max – MMM
  • Calculates the degree of membership of a document in the fuzzy set of the query terms, where $d_{A_i}$ is the document's membership degree for term $A_i$.
  • OR query:
    • (قيموميت يا حضانت) → (Guardian OR Godparent)
    • $Q_{or} = (A_1 \text{ OR } A_2 \text{ OR } A_3 \text{ OR } \dots)$
    • $SIM(Q_{or}, D) = C_{or1} \cdot \max(d_{A_1}, d_{A_2}, \dots) + C_{or2} \cdot \min(d_{A_1}, d_{A_2}, \dots)$
  • AND query:
    • (املاك و ثبت) → (Registration AND Properties)
    • $Q_{and} = (A_1 \text{ AND } A_2 \text{ AND } A_3 \text{ AND } \dots)$
    • $SIM(Q_{and}, D) = C_{and1} \cdot \min(d_{A_1}, d_{A_2}, \dots) + C_{and2} \cdot \max(d_{A_1}, d_{A_2}, \dots)$
  • $C_{and}$ and $C_{or}$ are softness coefficients:
    • $C_{and1} \in [0.5, 0.8]$, $C_{and2} = 1 - C_{and1}$
    • $C_{or1} > 0.2$, $C_{or2} = 1 - C_{or1}$
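A minimal Python sketch of the two MMM formulas above. The function names and the default coefficient values (0.6 and 0.7, picked from inside the stated ranges) are illustrative assumptions, not values taken from the slides.

```python
def mmm_or(degrees, c_or1=0.6):
    """SIM(Q_or, D) = C_or1 * max(d) + C_or2 * min(d), with C_or2 = 1 - C_or1.

    `degrees` holds the document's membership degree for each query term.
    The default c_or1 is an assumed value satisfying C_or1 > 0.2.
    """
    return c_or1 * max(degrees) + (1.0 - c_or1) * min(degrees)


def mmm_and(degrees, c_and1=0.7):
    """SIM(Q_and, D) = C_and1 * min(d) + C_and2 * max(d), with C_and2 = 1 - C_and1.

    The default c_and1 is an assumed value inside the range [0.5, 0.8].
    """
    return c_and1 * min(degrees) + (1.0 - c_and1) * max(degrees)


# Hypothetical membership degrees of one document for two query terms.
doc_degrees = [0.2, 0.9]
print("OR score: ", mmm_or(doc_degrees))
print("AND score:", mmm_and(doc_degrees))
```

With these degrees the OR score leans toward the maximum and the AND score toward the minimum, which is the softening effect the coefficients are meant to give compared with pure MAX and MIN.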
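Paice Model
  • Calculates the degree of membership of a document in the fuzzy set of the query terms as follows:
  • AND query:
    • (املاك و ثبت) → (Registration AND Properties)
    • $Q_{and} = (A_1 \text{ AND } A_2 \text{ AND } A_3 \text{ AND } \dots)$
  • OR query:
    • (قيموميت يا حضانت) → (Guardian OR Godparent)
    • $Q_{or} = (A_1 \text{ OR } A_2 \text{ OR } A_3 \text{ OR } \dots)$
  • $SIM(Q, D) = \dfrac{\sum_{i=1}^{n} r^{i-1}\, td_i}{\sum_{i=1}^{n} r^{i-1}}$, where $td_i$ is the $i$-th term membership degree after sorting and $n$ is the number of query terms
  • $r = 1.0$ for AND queries ($td_i$ in ascending order)
  • $r = 0.7$ for OR queries ($td_i$ in descending order)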
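The Paice similarity above can be sketched in a few lines of Python. The function names are hypothetical; the values of r and the sort orders follow the slide, and the input list is assumed to be non-empty.

```python
def paice_similarity(degrees, r, descending):
    """Weighted average of sorted membership degrees with weights r**(i-1).

    The loop index starts at 0, so r**i here equals r**(i-1) for the
    1-based index used in the formula.
    """
    ordered = sorted(degrees, reverse=descending)
    weights = [r ** i for i in range(len(ordered))]
    weighted_sum = sum(w * d for w, d in zip(weights, ordered))
    return weighted_sum / sum(weights)


def paice_and(degrees, r=1.0):
    # AND queries: degrees in ascending order, r = 1.0 as on the slide.
    return paice_similarity(degrees, r, descending=False)


def paice_or(degrees, r=0.7):
    # OR queries: degrees in descending order, r = 0.7 as on the slide.
    return paice_similarity(degrees, r, descending=True)


# Hypothetical membership degrees of one document for two query terms.
doc_degrees = [0.2, 0.9]
print("AND score:", paice_and(doc_degrees))
print("OR score: ", paice_or(doc_degrees))
```

Note that with r = 1.0 all weights are equal, so the AND score reduces to the plain mean of the degrees and the sort order has no effect.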
FuFaIR
  • The query is treated as a fuzzy set of relevant documents in the database
  • Documents are returned to the client sorted by their degree of membership in the query's fuzzy set
  • The larger the value of µi, the more relevant the document is to the query
FuFaIR (Cont.)
  • Each term is assigned a membership degree in a document based on how important that term is for representing the document’s content.
  • The membership degree can be computed from classical IR parameters such as tf and idf
  • The input query is treated as an algebraic sentence whose elements are:
    • Terms
    • Fuzzy operators such as AND, OR, and NOT
  • Applying the operators to the terms yields the final fuzzy set
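The idea of evaluating a query as an algebraic sentence can be sketched as follows. This is an illustration under stated assumptions, not the FuFaIR implementation: AND, OR, and NOT are taken to be the standard min, max, and complement operators, the nested-tuple query format is invented for the example, and the term membership degrees are made-up numbers rather than values computed from tf and idf.

```python
def evaluate(query, memberships):
    """Return the membership degree of one document in the query's fuzzy set.

    `query` is either a term (str) or a tuple such as ("AND", q1, q2, ...),
    ("OR", q1, q2, ...), or ("NOT", q). `memberships` maps each term to the
    document's membership degree; missing terms count as 0.0.
    """
    if isinstance(query, str):
        return memberships.get(query, 0.0)
    op, *args = query
    values = [evaluate(arg, memberships) for arg in args]
    if op == "AND":
        return min(values)          # assumed: fuzzy AND as minimum
    if op == "OR":
        return max(values)          # assumed: fuzzy OR as maximum
    if op == "NOT":
        return 1.0 - values[0]      # assumed: fuzzy NOT as complement
    raise ValueError(f"unknown operator: {op}")


def rank_documents(query, index):
    """Sort documents by decreasing membership in the query's fuzzy set."""
    scored = [(doc_id, evaluate(query, terms)) for doc_id, terms in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical index: document id -> {term: membership degree}.
index = {
    "d1": {"registration": 0.8, "properties": 0.3},
    "d2": {"registration": 0.5, "properties": 0.6},
}
print(rank_documents(("AND", "registration", "properties"), index))
```

Here d2 ranks above d1 for the AND query because its weaker term (0.5) is still stronger than d1's weaker term (0.3).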

FuFaIR (Cont.)
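  • In our method, the membership degree of a document for an individual term is defined in terms of the following quantities:
    • [Formula shown as an image on the original slide; not preserved in this transcript]
    • $f_{t,d}$ = frequency of term $t$ in document $d$
    • $idf(t)$ = inverse document frequency of term $t$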
Overview
  • Persian Language
  • Related Work
    • Fuzzy IR
    • Farsi IR
  • Fuzzy Logic Overview
  • FuFaIR Explanation
  • Experimental Results
  • Conclusion and Future Work
Experimental Results
  • Parameters:
    • The Hamshahri corpus was used
    • Total size of the collection: 300+ MB
  • Indexing was performed after stop-word elimination
  • No stemming was applied
  • 30 queries were used for these experiments
  • Precision was computed for the top 20 retrieved documents.
Experimental Results (Cont.)

Some Sample Queries:

Experimental Results (Cont.)
  • As a benchmark, the best Persian retrieval model so far was selected: the Vector Space model with the Lnu-ltu weighting scheme.
  • The pivot and slope parameters were set to 13.36 and 0.75, respectively
    • The effectiveness of these values has been shown in previous work (see the paper).
  • To measure the performance of each run, precision at the 5, 10, 15, and 20 document cut-offs was calculated and averaged over all 30 queries.
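The evaluation measure described above, precision at fixed document cut-offs averaged over queries, can be sketched as follows. The function names and the toy data are assumptions for illustration; the sketch divides by the cut-off k even when fewer than k documents were retrieved, which is the usual convention.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_cutoffs(runs, cutoffs=(5, 10, 15, 20)):
    """Average precision at each cut-off over all queries.

    `runs` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    Returns a dict mapping each cut-off to its mean precision.
    """
    return {
        k: sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
        for k in cutoffs
    }


# Toy example with two hypothetical queries over document ids 0..19.
runs = [
    (list(range(20)), {0, 1, 2, 10}),  # three relevant in the top 5, one more at rank 11
    (list(range(20)), set()),          # no relevant documents retrieved
]
print(mean_precision_at_cutoffs(runs))
```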
Experimental Results (Cont.)

Comparison Results:

Conclusion & Future Work

Conclusion

  • Main contribution of this paper:
    • Design, implementation, and testing of FuFaIR, a fuzzy retrieval system for the Persian language.
  • Fuzzy quantifiers are also added to the original model to provide more flexibility
  • Compared with the Vector Space model, FuFaIR achieves significantly better performance

Future Work:

  • Testing different interpretations of the fuzzy operators on the Persian corpora
  • Examining the true value and contribution of a Persian stemmer in retrieval.
Conception of Fuzzy Logic
  • Many decision-making and problem-solving tasks are too complex to be defined precisely
  • However, people succeed at them using imprecise knowledge
  • Fuzzy logic resembles human reasoning in its use of approximate information and uncertainty to generate decisions.
Natural Language
  • Consider:
    • Joe is tall -- what is tall?
    • Joe is very tall -- how does this differ from tall?
  • Natural language (like most other activities in life and indeed the universe) is not easily translated into the absolute terms of 0 and 1.
Fuzzy Logic
  • An approach to uncertainty that combines real values [0…1] and logic operations
  • Fuzzy logic is based on the ideas of fuzzy set theory and fuzzy set membership often found in natural (e.g., spoken) language.
Example: “Young”
  • Example:
    • Ann is 28, 0.8 in set “Young”
    • Bob is 35, 0.1 in set “Young”
    • Charlie is 23, 1.0 in set “Young”
  • Unlike in statistics and probability, the degree does not describe the probability that the item is in the set; it describes the extent to which the item is in the set.
Membership function of fuzzy logic
  • [Figure: degree of membership (DOM, from 0 to 1) plotted against Age for the fuzzy values Young, Middle, and Old, with age ticks at 25, 40, and 55]
  • Fuzzy values have associated degrees of membership in the set.
Benefits of fuzzy logic
  • You want the value to switch gradually as Young becomes Middle and Middle becomes Old. This is the idea of fuzzy logic.
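A gradual switch between Young, Middle, and Old can be sketched with piecewise-linear membership functions. The breakpoints 25, 40, and 55 come from the age axis of the membership-function slide; the linear shapes and the function names are assumptions, so the resulting degrees need not match every value quoted elsewhere in the slides.

```python
def young(age):
    """Fully young up to 25, falling linearly to 0 at 40 (assumed shape)."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15


def middle(age):
    """Triangle peaking at 40, reaching 0 at 25 and 55 (assumed shape)."""
    if age <= 25 or age >= 55:
        return 0.0
    if age <= 40:
        return (age - 25) / 15
    return (55 - age) / 15


def old(age):
    """Zero up to 40, rising linearly to fully old at 55 (assumed shape)."""
    if age <= 40:
        return 0.0
    if age >= 55:
        return 1.0
    return (age - 40) / 15


for age in (23, 28, 40, 50):
    print(age, young(age), middle(age), old(age))
```

Under these assumed shapes a 28-year-old is mostly Young and slightly Middle at the same time, instead of flipping abruptly from one category to the other at a fixed age.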
Fuzzy Set Operations
  • Fuzzy OR (∪): the union of two fuzzy sets takes the maximum (MAX) of each pair of corresponding elements from the two sets.
  • E.g.
    • A = {1.0, 0.20, 0.75}
    • B = {0.2, 0.45, 0.50}
    • A ∪ B = {MAX(1.0, 0.2), MAX(0.20, 0.45), MAX(0.75, 0.50)} = {1.0, 0.45, 0.75}

Fuzzy Set Operations
  • Fuzzy AND (∩): the intersection of two fuzzy sets takes the minimum (MIN) of each pair of corresponding elements from the two sets.
  • E.g.
    • A ∩ B = {MIN(1.0, 0.2), MIN(0.20, 0.45), MIN(0.75, 0.50)} = {0.2, 0.20, 0.50}
Fuzzy Set Operations
  • The complement of a fuzzy variable with DOM x is (1 - x).
  • Complement: the complement of a fuzzy set is formed by taking the complement of each element's DOM.
  • Example:
    • A^c = {1 - 1.0, 1 - 0.2, 1 - 0.75} = {0.0, 0.8, 0.25}
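The three fuzzy set operations can be reproduced with a short Python sketch that uses the sets A and B from the examples. Representing a fuzzy set as a list of degrees in a fixed element order is an assumption made for brevity, and the function names are illustrative.

```python
A = [1.0, 0.20, 0.75]
B = [0.2, 0.45, 0.50]


def fuzzy_or(a, b):
    """Union: element-wise maximum of the two membership lists."""
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_and(a, b):
    """Intersection: element-wise minimum of the two membership lists."""
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_not(a):
    """Complement: 1 - x for every membership degree."""
    return [1.0 - x for x in a]


print("A OR B :", fuzzy_or(A, B))
print("A AND B:", fuzzy_and(A, B))
print("NOT A  :", fuzzy_not(A))
```

The printed lists correspond to the union, intersection, and complement worked out in the examples, up to floating-point rounding in the complement.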