
Meeting Presentation Sept. 12


Presentation Transcript


  1. Meeting Presentation Sept. 12. Things to do since last meeting: (1) Find out the number of drug names on the FDA website (done; the number is 6,244, which is small enough for us to do a search crawl on Twitter). (2) Read papers to find new ideas about query cost estimation. Papers read: "Predicting Query Performance"; "What Makes a Query Difficult?", by David Carmel; "Learning to Estimate Query Difficulty", SIGIR 2005 best paper; and the publications of Junghoo "John" Cho.

  2. Paper Review. "Predicting Query Performance": This is a great paper, since it introduced a new concept, the clarity score, which measures the similarity between the query model and the collection model. It helps us view query difficulty from a new perspective: query terms that are weak at distinguishing documents may lead to query difficulty. "What Makes a Query Difficult?", by David Carmel: This is a good development of the previous paper. It expands the clarity score into the higher-level concept of a "distance model": distance applies not only between query and collection, but also between query and relevant documents, between relevant documents and collection, etc. What is more, the paper adopts a more reasonable divergence function, the Jensen-Shannon divergence (JSD). (A small JSD sketch follows this slide.)
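
For concreteness, here is a minimal Python sketch of the Jensen-Shannon divergence between a query language model (estimated from top-retrieved documents) and the collection language model, in the spirit of the clarity-score idea. The unsmoothed maximum-likelihood models, the toy documents, and the function names are illustrative assumptions, not the papers' actual estimation procedure (which smooths the models and, for the original clarity score, uses KL divergence).

```python
import math
from collections import Counter

def language_model(docs):
    """Maximum-likelihood unigram model over a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two unigram distributions."""
    m = {t: 0.5 * p.get(t, 0.0) + 0.5 * q.get(t, 0.0) for t in set(p) | set(q)}
    def kl(a):                      # KL(a || m); m covers a's support by construction
        return sum(pr * math.log2(pr / m[t]) for t, pr in a.items() if pr > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

# Toy usage: pretend the first two documents were the top results for a query.
collection = [["aspirin", "dose", "pain"], ["ibuprofen", "dose", "fever"], ["rain", "cold", "wind"]]
top_docs = collection[:2]
print(jsd(language_model(top_docs), language_model(collection)))  # larger = query model stands out more
```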

  3. Paper Review. "Learning to Estimate Query Difficulty": The paper offers a new view that sub-query coverage may also strongly affect query difficulty. To support this view, the authors provide two machine learning methods: a histogram-based estimator and a modified decision tree. The results show that a difficult query is likely to be dominated by a single sub-query. (A small sketch of the overlap features follows this slide.)
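
As a rough illustration of the sub-query idea, the sketch below measures how much each single-term sub-query's top-k results overlap with the full query's top-k results and buckets those overlaps into a histogram feature. The `retrieve` function is a hypothetical placeholder for any search backend, and the bucketing is a simplified reading of the paper; the actual estimators (histogram method and modified decision tree) are trained on features of this kind.

```python
def subquery_overlaps(retrieve, query_terms, k=10):
    """
    Overlap between the top-k documents of the full query and of each
    single-term sub-query. `retrieve(terms, k)` is a placeholder for a
    search function returning a ranked list of document ids.
    """
    full_topk = set(retrieve(query_terms, k))
    return [len(full_topk & set(retrieve([term], k))) for term in query_terms]

def overlap_histogram(overlaps, k=10, bins=5):
    """Bucket the per-sub-query overlaps into a fixed-length feature vector."""
    hist = [0] * bins
    for o in overlaps:
        hist[min(o * bins // (k + 1), bins - 1)] += 1
    return hist
```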

  4. Some Ideas. A straightforward idea from Carmel's paper is that we can delete query terms to maximize the distance between the query and the collection. The idea is not hard to implement (see the sketch below), but I wonder how much improvement we can get this way.
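
A minimal sketch of that term-deletion idea, reusing the jsd() helper from the earlier sketch and assuming a hypothetical query_model_of(terms) callback that re-runs retrieval and re-estimates the query model for a reduced query:

```python
def prune_query(query_terms, query_model_of, collection_model, min_terms=1):
    """Greedily drop the term whose removal most increases the query-collection JSD."""
    terms = list(query_terms)
    best = jsd(query_model_of(terms), collection_model)
    while len(terms) > min_terms:
        scored = [(jsd(query_model_of([t for t in terms if t != drop]), collection_model), drop)
                  for drop in terms]
        score, drop = max(scored)
        if score <= best:            # stop when no single deletion increases the distance
            break
        best = score
        terms = [t for t in terms if t != drop]
    return terms, best
```

Whether this actually improves effectiveness is exactly the open question above; the sketch only shows that the loop itself is cheap to run on top of any retrieval function.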

  5. Some Ideas. A more advanced idea is to connect term deletion with retrieval cost. The traditional retrieval cost is roughly: over the n query terms, the sum of (complexity of the scoring function * DF(i)), so the computing cost is easy to precompute. It is also interesting to consider deleting low-IDF, low-clarity terms: this would greatly reduce the computing cost, while retrieval performance might decrease, or might even increase. (A worked example follows this slide.)
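
A tiny worked example of that cost formula, with made-up document frequencies, shows why deleting a low-IDF term dominates the savings:

```python
def retrieval_cost(df, per_posting_cost=1.0):
    """Baseline cost from the slide: each query term i costs (scoring-function complexity) * DF(i)."""
    return per_posting_cost * sum(df.values())

df = {"aspirin": 12_000, "overdose": 800, "the": 950_000}          # hypothetical document frequencies
print(retrieval_cost(df))                                          # 962800.0, dominated by "the"
print(retrieval_cost({t: f for t, f in df.items() if t != "the"})) # 12800.0 after deleting it
```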

  6. Some Ideas. It is also interesting to discuss term proximity and query expansion here. In my opinion, both term proximity and external query term expansion may help improve query clarity. The additional cost of term proximity is roughly: n*(n-1)/2 * (DF1 + DF2 + averageTF1*averageTF2*comDoc), i.e., for each of the n*(n-1)/2 term pairs, walk both postings lists and compare positions in the documents the pair has in common. The additional cost of external query term expansion is roughly: n*(complexity of function*DF(i)) + k*averageDocLength + N*(complexity of function*DF(i)), where n is the number of query terms, k is the number of top documents used for expansion, and N is the number of expansion terms. It will be interesting to discuss how much clarity term proximity and external query term expansion can add. (A cost sketch follows this slide.)
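
To make the two estimates concrete, here is a small sketch that evaluates them for given statistics. The function names, the per-posting cost constant, and the choice to sum the proximity cost over explicit term pairs (equivalent to the slide's n*(n-1)/2 factor when every pair is listed) are assumptions for illustration only:

```python
def proximity_cost(pairs):
    """
    Extra cost of term-proximity scoring, per the slide: for each query-term pair,
    walk both postings lists (DF_i + DF_j) and compare positions in the documents
    the pair shares (avgTF_i * avgTF_j * common_docs).
    `pairs` holds one tuple (df_i, df_j, avg_tf_i, avg_tf_j, common_docs) per pair.
    """
    return sum(df_i + df_j + tf_i * tf_j * common
               for df_i, df_j, tf_i, tf_j, common in pairs)

def expansion_cost(query_dfs, expansion_dfs, k, avg_doc_length, per_posting_cost=1.0):
    """
    Extra cost of external query expansion, per the slide: score the original query,
    read the top-k documents to pick expansion terms, then score the N expansion terms.
    """
    return (per_posting_cost * sum(query_dfs)
            + k * avg_doc_length
            + per_posting_cost * sum(expansion_dfs))
```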
