This paper presents a methodology for personalized ranking over categorical attributes in information retrieval, addressing the absence of inherent ordering in categorical attributes. It discusses objectives to enable unified ranked retrieval, support binary encoding, and handle single- and multi-valued attributes with bounded cardinality.
Supporting personalized ranking over categorical attributes • Presenter: Lin, Shu-Han • Authors: Gae-won You, Seung-won Hwang, Hwanjo Yu • Information Sciences 178 (2008)
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation • Problem: personalized ranking in information retrieval over categorical attributes • Categorical attributes do not have an inherent ordering. • How can relevant data be ranked by a categorical attribute? • For example, how can we… • Find an old female with a preference for soda drinks.
Objectives • Enable a uniform ranked retrieval over a combination of categorical attributes and numerical attributes. • Support ranking over the binary representation of categorical attributes • Binary encoding • Sparsity • Single-valued attribute • Multi-valued attribute with bounded cardinality (item set, bc=2)
Overview • [Figure: overview with three parts labeled (1), (2), (3)]
Rank formulation • F = 0.5*age + 3*female + …
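A minimal sketch of how such a ranking function could be evaluated, assuming a categorical attribute is turned into a 0/1 indicator before scoring. Only the two terms shown on the slide (0.5*age and 3*female) are used; the record layout, the helper names `encode` and `score`, and the two sample people are hypothetical.

```python
def encode(record):
    """Map a raw record to numeric features; gender becomes a 0/1 indicator."""
    return {
        "age": record["age"],
        "female": 1 if record["gender"] == "female" else 0,
    }


def score(features, weights):
    """Weighted sum F over the features named in `weights`."""
    return sum(w * features.get(name, 0) for name, w in weights.items())


# Weights taken from the slide; the remaining terms ("...") are not modeled.
weights = {"age": 0.5, "female": 3}

# Hypothetical data, for illustration only.
people = {
    "p1": {"age": 30, "gender": "female"},
    "p2": {"age": 40, "gender": "male"},
}

ranked = sorted(
    people,
    key=lambda pid: score(encode(people[pid]), weights),
    reverse=True,
)
```

Once the categorical value is encoded as an indicator, it contributes to F exactly like a numerical attribute, which is what allows a single ranked retrieval over both kinds of attributes.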
Rank processing (TA) • A simple example query: Find an old female with a preference for soda drinks. Transformed into F = age + female • Candidate identification • Sorted access on age and female • Find the top-k of sa(age) and sa(female), e.g., k=1, sa(age)={O1}; sa(female)={O2} • Candidate reduction • F(O1) = 30 + 0 = 30 • F(O2) = 25 + 1 = 26 • O1 has the highest F score • Termination • F(O1) = 30 is not greater than the upper-bound score F(30,1) = 31, so the search cannot stop yet • Another round of sorted access considers more candidates, e.g., sa(age)={O4}; sa(female)={O3}
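The rounds above can be sketched as a small threshold-algorithm loop. This is an illustrative sketch, not the paper's implementation: the values for O1 and O2 come from the slide, while the values for O3 and O4 and the function name `threshold_algorithm` are assumptions chosen so that the second round matches sa(age)={O4}, sa(female)={O3}.

```python
def threshold_algorithm(objects, attrs, k):
    """Return (top-k list of (id, score), number of sorted-access rounds)."""
    # One list per attribute, sorted by descending value (sorted access order).
    lists = {
        a: sorted(objects, key=lambda oid, a=a: -objects[oid][a])
        for a in attrs
    }
    scores = {}
    for depth in range(len(objects)):
        last_seen = {}
        for a in attrs:
            oid = lists[a][depth]
            last_seen[a] = objects[oid][a]
            # Random access: fetch all attributes of the object to compute F.
            scores[oid] = sum(objects[oid][b] for b in attrs)
        # Upper bound on the score of any object not seen so far.
        threshold = sum(last_seen.values())
        top = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k], len(objects)


objects = {
    "o1": {"age": 30, "female": 0},  # from the slide
    "o2": {"age": 25, "female": 1},  # from the slide
    "o3": {"age": 20, "female": 1},  # hypothetical
    "o4": {"age": 28, "female": 0},  # hypothetical
}

top, rounds = threshold_algorithm(objects, ["age", "female"], k=1)
```

With these assumed values, the first round yields the bound 30 + 1 = 31, which O1's score of 30 does not reach; the second round lowers the bound to 28 + 1 = 29, so O1 can be returned.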
Bitmap – binary encoding • F = v1+v2+v3+v4, k=2 • 1) K={}, C={1111} (Initialization) • 2) OID=execute(C) • 3) OID={o4}, |OID|>0, K={[o4,4]} • 4) C={0111/1011/1101/1110} (Expansion) • 5) K.count<k, back to 2) • …
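The loop above can be sketched as follows, assuming unit weights so that every pattern with the same number of 1s has the same score. Each attribute's bitmap is stored as a Python integer, and `execute` is modeled as bitwise AND / AND-NOT over those bitmaps, which is an assumption about how a pattern is matched. Only o4 = 1111 comes from the slide; the other rows and all function names are hypothetical.

```python
def build_bitmaps(rows, n_attrs):
    """bitmaps[i] has bit j set when object j has attribute v(i+1) equal to 1."""
    bitmaps = [0] * n_attrs
    for j, bits in enumerate(rows.values()):
        for i, b in enumerate(bits):
            if b == "1":
                bitmaps[i] |= 1 << j
    return bitmaps


def execute(pattern, bitmaps, n_objects):
    """Bitmap of the objects whose attribute vector equals `pattern` exactly."""
    result = (1 << n_objects) - 1  # start with every object
    for i, b in enumerate(pattern):
        result &= bitmaps[i] if b == "1" else ~bitmaps[i]
    return result


def top_k(rows, k):
    ids = list(rows)
    n_attrs = len(next(iter(rows.values())))
    bitmaps = build_bitmaps(rows, n_attrs)
    K, C = [], {"1" * n_attrs}  # initialization
    while C and len(K) < k:
        for pattern in sorted(C, reverse=True):
            hits = execute(pattern, bitmaps, len(ids))
            K += [
                (ids[j], pattern.count("1"))
                for j in range(len(ids))
                if hits >> j & 1
            ]
        # Expansion: flip one 1 to 0 in every current candidate pattern.
        C = {
            p[:i] + "0" + p[i + 1:]
            for p in C
            for i, b in enumerate(p)
            if b == "1"
        }
    return K[:k]


rows = {"o1": "1010", "o2": "0110", "o3": "1101", "o4": "1111"}
result = top_k(rows, k=2)
```

With unequal weights the level-by-level expansion would no longer visit patterns in score order, so a priority queue over patterns would be needed instead.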
Bitmap – sparsity • Single-valued attribute • F = w1v1+w2v2+…+w6v6 • Ranked weights w1≧w2≧w3; w4≧w5≧w6; for simplicity, all w=1, k=2 • 1) K={}, C={100.100.100} (Initialization) • 2) OID=execute(C) • 3) OID={o4}, |OID|>0, K=OID={[o4,2]} • 4) C={010.100.100/100.010.100/100.100.010} (Expansion) • 5) K.count<k, back to 2) • …
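A sketch of how sparsity can shrink the candidate space for single-valued attributes: since each attribute sets exactly one of its bits, a candidate is one chosen value per attribute, and expansion moves one attribute to its next-lower-weight value. The example below uses two attributes with three values each and non-uniform weights so the ordering is visible; the weights, the objects, the dictionary lookup standing in for bitmap execution, and all function names are assumptions for illustration.

```python
import heapq


def as_bits(pattern, sizes):
    """Render a pattern such as (0, 1) in the slide's notation, e.g. '100.010'."""
    return ".".join(
        "0" * i + "1" + "0" * (n - i - 1) for i, n in zip(pattern, sizes)
    )


def ranked_patterns(weights):
    """Yield (score, pattern) in descending score order.

    weights[g] holds the weights of attribute g's values, sorted descending;
    pattern[g] is the index of the single value set to 1 for attribute g.
    """
    def total(p):
        return sum(weights[g][i] for g, i in enumerate(p))

    start = (0,) * len(weights)  # the best value of every attribute
    heap = [(-total(start), start)]
    seen = {start}
    while heap:
        neg, p = heapq.heappop(heap)
        yield -neg, p
        # Expansion: shift one attribute to its next value.
        for g in range(len(p)):
            if p[g] + 1 < len(weights[g]):
                q = p[:g] + (p[g] + 1,) + p[g + 1:]
                if q not in seen:
                    seen.add(q)
                    heapq.heappush(heap, (-total(q), q))


def top_k(objects, weights, k):
    by_pattern = {}  # stand-in for executing a pattern against a bitmap index
    for oid, p in objects.items():
        by_pattern.setdefault(p, []).append(oid)
    K = []
    for score, p in ranked_patterns(weights):
        K += [(oid, score) for oid in by_pattern.get(p, [])]
        if len(K) >= k:
            break
    return K[:k]


weights = [[3, 2, 1], [3, 2, 1]]  # hypothetical ranked weights
objects = {"o1": (1, 2), "o2": (2, 0), "o3": (0, 1), "o4": (0, 0)}
result = top_k(objects, weights, k=2)
```

Here only 3 × 3 = 9 patterns can ever match, instead of the 2^6 = 64 patterns of a plain six-bit binary encoding.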
Bitmap – sparsity • Multi-valued attribute with bounded cardinality
Experiments • Sparsity of indicator variables in UCI datasets • 22% of the datasets consist only of categorical attributes. • 56% combine numerical and categorical attributes.
Conclusions • This paper studies • How to support rank formulation • Processing over data with categorical attributes • Instead of adopting existing numerical algorithms, develop a bitmap-based approach to • Binary encoding • Sparsity • Single-valued • Multi-valued with bounded cardinality
Comments • Advantage • … • Drawback • … • Application • …