
Social Search




  1. Social Search Laks V.S. Lakshmanan

  2. How are web search queries distributed? Taken from Damon Horowitz’s talk slides.

  3. How are web search queries distributed? (Figure adapted from Damon Horowitz’s talk slides.) Annotations on the distribution: “Web search works well!”; “Web search is a good start; more effort needed, possibly on top”; “Based on opinion of friends.”

  4. Social Search: General Remarks • Search in text corpora (IR). • Search in a linked environment (authority, hubs, PageRank). • What if my social context should influence search results? • E.g.: users in a social network (SN) post reviews/ratings on items. • “Items” = anything they want to talk about, share their opinions on with their friends, and implicitly recommend (or recommend against).

  5. Social Search – The Problem & Issues • Search results for a user should be influenced by how the user and his/her friends rated the items, in addition to the quality of the match as determined by IR methods and/or PageRank-like methods. • Transitive friends’ ratings may matter too, up to some distance. • Users may just comment on an item without explicitly rating it.

  6. More Issues • Factoring in transitive friends is somewhat similar to the Katz measure: the longer the geodesic from u to v, the less important v’s rating is to u. • Trust may be a factor. • There is a vast literature on trust computation. • May need to analyze opinions (text) and translate them into a strength (score) and a polarity (good or bad?).
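As a rough illustration of the distance-decay idea above (not from the slides), a Katz-style scheme discounts a transitive friend’s rating by beta^d, where d is the geodesic distance from the searcher; the function and parameter names below are illustrative assumptions.

from collections import deque

def decayed_rating(graph, ratings, user, item, beta=0.5, max_dist=3):
    """Aggregate ratings for `item`, discounting each rater's rating by
    beta**d, where d is the geodesic (BFS) distance from `user`.
    `graph` maps a user to an iterable of friends; `ratings` maps
    (user, item) -> numeric rating. Illustrative sketch only."""
    dist = {user: 0}
    queue = deque([user])
    num, den = 0.0, 0.0
    while queue:
        u = queue.popleft()
        if u != user and (u, item) in ratings:
            w = beta ** dist[u]
            num += w * ratings[(u, item)]
            den += w
        if dist[u] < max_dist:
            for v in graph.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return num / den if den else None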

  7. Other approaches • Google Social Search: a search for “barcelona” returns results including the searcher’s friends’ blogs. • Relevant users need to connect their Facebook, Twitter, ... accounts to their Google profile. • Of particular value when searching for local resources such as shows and restaurants. • But this does not use user-generated content for ranking. • Aardvark – part of Google Labs, which was shut down – is an interesting approach to social search. • There are other companies such as sproose.com; see Wikipedia for a list and check them out. (Sproose seems to take reviews into account in ranking.) [defunct now?] • Some papers develop notions of SocialRank, UserRank, FolkRank, similar to PageRank (see references in Schenkel et al. 2008 [details later]). • Part I is based on: Damon Horowitz and Sepandar D. Kamvar. The Anatomy of a Large-Scale Social Search Engine. WWW 2010.

  8. The Aardvark Approach • Classic web search – roots in IR; authority-centric: return the most relevant docs as answers to a search query. • Alternative paradigm: consult the village’s wise people. • Web search – keyword based. • Social search – natural language; social intimacy/trust instead of authority. • E.g.: what’s a good bakery in the Mag Mile area in Chicago? • What’s a good handyman who is not too expensive, is punctual, and is honest? Can you think of similar systems that already exist? Hint: what do you do when you encounter difficulties with a new computer, system, software, or tool? Note: long, subjective, and contextualized queries. These queries are normally handled offline, by asking real people. Social search seeks to handle them online.

  9. Aardvark Modules • Crawler and Indexer. • Query Analyzer. • Ranking Function. • UI.

  10. Index what? • User’s existing social habitat – LinkedIn (LI) and Facebook (FB) contacts; common groups such as school attended, employer, …; can invite additional contacts. • Topics/areas of expertise, learned from: • Self declaration. • Peer endorsement (a la LI). • Activities on LI, FB, Twitter, etc. • Activities (asking/answering [or not] questions) on Aardvark. • Forward index: user (id), topics of expertise sorted by strength, answer quality, response time, … • Inverted index: for each topic, a list of users sorted by expertise, plus answer quality, response time, etc.
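A minimal sketch (not from the paper) of what the forward and inverted indexes described above might look like; the field names and example values are illustrative assumptions.

from collections import defaultdict

# Forward index: user id -> topics of expertise with strengths, plus
# per-user quality signals (field names are assumptions).
forward_index = {
    "u42": {
        "topics": [("espresso", 0.9), ("cycling", 0.6)],  # sorted by strength
        "answer_quality": 0.8,
        "median_response_time_s": 600,
    },
}

# Inverted index: topic -> users sorted by descending expertise strength.
inverted_index = defaultdict(list)
for uid, rec in forward_index.items():
    for topic, strength in rec["topics"]:
        inverted_index[topic].append((strength, uid))
for topic in inverted_index:
    inverted_index[topic].sort(reverse=True)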

  11. Query Life Cycle: Transport Layer → Conversation Manager → Routing Engine.

  12. Query Answering Model • The probability that u_i can successfully answer question q from asker u_j is modeled as score(u_i, u_j, q) = s(u_i, u_j) · ∑_t p(u_i | t) · p(t | q), where p(u_i | t) = prob. that u_i is an expert in topic t, p(t | q) = prob. that question q is in topic t, and s(u_i, u_j) = prob. that u_i can successfully answer a question from u_j, usually based on strength of social connections/trust, etc. • p(u_i | t) and s(u_i, u_j) can be computed offline and updated periodically; p(t | q) is computed online using soft classification. • Computation of the score is parallelizable. • All this is fine, but it is important to engage a large number of high-quality question askers and answerers to make and keep the system useful.
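A sketch of how the score above might be computed, assuming the composition score(u_i, u_j, q) = s(u_i, u_j) · ∑_t p(u_i | t) · p(t | q); the data-structure and function names are illustrative assumptions.

def routing_score(expertise, topic_dist, connection, u_i, u_j, q):
    """score(u_i, u_j, q) = s(u_i, u_j) * sum_t p(u_i | t) * p(t | q).
    expertise[u_i][t] ~ p(u_i | t); topic_dist(q) returns {t: p(t | q)}
    (the online soft classification); connection[(u_i, u_j)] ~ s(u_i, u_j).
    Illustrative sketch only."""
    p_ui_given_q = sum(
        expertise.get(u_i, {}).get(t, 0.0) * p_t
        for t, p_t in topic_dist(q).items()
    )
    return connection.get((u_i, u_j), 0.0) * p_ui_given_q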

  13. Indexing Users • For each user and topic, learn from: • Positive signals: • Self declaration. • Peer endorsement. • Online profiles – e.g., FB, home pages, etc. (a linear SVM is used). • Parse online activities (FB, LI, Twitter, etc.). • Negative signals: • Muting a topic. • Declining to answer a question on a topic. • Getting negative feedback on an answer from other users. • Topic strengthening: • If your expertise in a topic is non-zero, add up the expertise of your neighbors and renormalize. • Normalize probabilities across topics for a user; this yields a per-user topic distribution Pr(t | u_i). • Connection strength: cosine similarity over a feature space – e.g., social distance, demographics, vocabulary similarity, response time similarity, etc., but (artificially) forced to a probability via normalization. • As users interact, update these two probabilities.
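A sketch (assumed, not from the paper) of the connection-strength step above: cosine similarity over per-user feature vectors, normalized so that the strengths form a probability distribution.

import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def connection_probs(features, u_j):
    """Connection strengths s(u_i, u_j) for all candidates u_i, (artificially)
    forced into a probability distribution by normalizing the cosine
    similarities. `features` maps a user to a feature vector (social distance,
    demographics, vocabulary/response-time similarity, ...). Sketch only."""
    sims = {u_i: cosine(vec, features[u_j])
            for u_i, vec in features.items() if u_i != u_j}
    total = sum(sims.values())
    return {u_i: (s / total if total else 0.0) for u_i, s in sims.items()}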

  14. Question Analysis • Semi-automated: • Soft classification into topics. • Filter out non-questions and inappropriate or trivial questions. • KeywordMatchTopicMapper: maps keywords/terms in the question to topics in user profiles. • TaxonomyTopicMapper: places the question on a taxonomy covering popular topics. • LocationMatching. • Human judges assign scores to topics (evaluation).

  15. Overall ranking • Aggregation of three kinds of scores: • Topic expertise. • Social proximity/match between asker and answerer. • Availability of the answerer (can be learned from online activity patterns, load, etc.). • Answerers are contacted in priority order. • A variety of devices is supported. • See the paper for more details and for experimental results.
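The slide does not give the exact aggregation formula; the sketch below simply multiplies the three scores and ranks candidates, which is an assumption for illustration only.

def rank_answerers(candidates, expertise, proximity, availability, top_k=5):
    """Rank candidate answerers by a combined score. Each argument maps a
    candidate user id to a score in [0, 1]. Combining the three components
    with a product is an illustrative assumption, not the paper's formula."""
    scored = [
        (expertise.get(u, 0.0) * proximity.get(u, 0.0) * availability.get(u, 0.0), u)
        for u in candidates
    ]
    scored.sort(reverse=True)
    return [u for score, u in scored[:top_k] if score > 0]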

  16. Social Wisdom for Search and Recommendation. Ralf Schenkel et al. IEEE Data Engineering Bulletin, June 2008. • Expand the scope of RecSys by storing additional information in a relational DB: Users(username, location, gender, ...), Friendships(user1, user2, ftype, fstrength), Documents(docid, description, ...), Linkage(doc1, doc2, ltype, lweight), Tagging(user, doc, tag, tweight), Ontology(tag1, tag2, otype, oweight), Rating(user, doc, assessment). • Just modeling/scoring aspects; scalability is ignored for now.
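The relational schema above, written out as SQL DDL via Python's sqlite3 for concreteness; the column types are assumptions, since the slide lists only attribute names.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users(username TEXT PRIMARY KEY, location TEXT, gender TEXT);
CREATE TABLE Friendships(user1 TEXT, user2 TEXT, ftype TEXT, fstrength REAL);
CREATE TABLE Documents(docid TEXT PRIMARY KEY, description TEXT);
CREATE TABLE Linkage(doc1 TEXT, doc2 TEXT, ltype TEXT, lweight REAL);
CREATE TABLE Tagging(user TEXT, doc TEXT, tag TEXT, tweight REAL);
CREATE TABLE Ontology(tag1 TEXT, tag2 TEXT, otype TEXT, oweight REAL);
CREATE TABLE Rating(user TEXT, doc TEXT, assessment REAL);
""")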

  17. Friendship types and search modes • Social – computed from the explicit social graph, say using inverse distance; could also be based on measures like Katz. • Spiritual – derived from overlap in activities (ratings, reviews, tagging, ...). • Global – all users given equal weight = 1/|U|. • All measures are normalized so that the weights on all outgoing edges from a user sum to 1. • Combinations are possible: F(u,u’) = a·Fso(u,u’) + b·Fsp(u,u’) + c·Fgl(u,u’), with a + b + c = 1.
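A small sketch of the combination formula above; representing the social and spiritual components as pre-normalized edge-weight maps is an assumption.

def combined_friendship(F_social, F_spiritual, num_users, a, b, c):
    """F(u, u') = a*Fso(u, u') + b*Fsp(u, u') + c*Fgl(u, u'), with a + b + c = 1.
    F_social and F_spiritual map (u, u') -> normalized weight; the global
    component gives every user equal weight 1/|U|. Illustrative sketch."""
    assert abs(a + b + c - 1.0) < 1e-9
    f_gl = 1.0 / num_users

    def F(u, v):
        return (a * F_social.get((u, v), 0.0)
                + b * F_spiritual.get((u, v), 0.0)
                + c * f_gl)

    return F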

  18. Scoring documents for tags – a digression into BM25 • BM25 – a state-of-the-art IR model: score(D, t_i) = idf(t_i) · (k1 + 1) · tf(D, t_i) / ( tf(D, t_i) + k1 · (1 − b + b · len(D)/avgdl) ). • k1, b are tunable parameters. • idf(t_i) = log( (#docs − n(t_i) + 0.5) / (n(t_i) + 0.5) ). • tf = term frequency, idf = inverse document frequency, avgdl = average document length, n(t_i) = #docs containing t_i.
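A compact sketch of the BM25 score above; the defaults k1 = 1.2 and b = 0.75 are common choices, not values given on the slide.

import math

def bm25_score(tf_d_t, len_d, avgdl, n_t, num_docs, k1=1.2, b=0.75):
    """BM25 score of document D for term t_i, as on the slide.
    tf_d_t: term frequency of t_i in D; len_d: length of D; avgdl: average
    document length; n_t: #docs containing t_i; num_docs: total #docs.
    Defaults for k1 and b are common choices (an assumption)."""
    idf = math.log((num_docs - n_t + 0.5) / (n_t + 0.5))
    denom = tf_d_t + k1 * (1 - b + b * len_d / avgdl)
    return idf * (k1 + 1) * tf_d_t / denom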

  19. Adapting BM25 to social search • su(d, t) = ( (k1 + 1) · |U| · sfu(d, t) ) / ( k1 + |U| · sfu(d, t) ) · idf(t), where |U| = #users. • idf(t) = log( (|D| − df(t) + 0.5) / (df(t) + 0.5) ), where |D| = #docs and df(t) = #docs tagged with t. • sfu(d, t) = ∑_{v∈U} Fu(v) · tfv(d, t). • BTW, when we say docs, think items!
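A sketch of the social score above; representing the per-user tagging frequencies tfv as a callback and defaulting k1 = 1.2 are assumptions.

import math

def social_score(d, t, F_u, tf_v, num_users, num_docs, df_t, k1=1.2):
    """Social BM25-style score su(d, t) from the slide.
    F_u: {v: friendship weight Fu(v) from the searcher's viewpoint};
    tf_v: function (v, d, t) -> how often user v tagged doc/item d with t;
    df_t: #docs tagged with t; num_users = |U|; num_docs = |D|."""
    sf = sum(w * tf_v(v, d, t) for v, w in F_u.items())      # sf_u(d, t)
    idf = math.log((num_docs - df_t + 0.5) / (df_t + 0.5))
    return ((k1 + 1) * num_users * sf) / (k1 + num_users * sf) * idf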

  20. Tag expansion • Sometimes (often?) users may use related tags: e.g., tag an automobile as “Ferrari” and as “car”. • tsim(t, t’) = P[t | t’] = df(t & t’) / df(t’). //error in the paper.// • Then sfu*(d, t) = max_{t’∈T} tsim(t, t’) · sfu(d, t’). Plug in sfu*(d, t) in place of sfu(d, t) and we are all set.
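A sketch of tag expansion as defined above; the dictionary layouts for the document frequencies are assumptions.

def tag_similarity(df_joint, df, t, t_prime):
    """tsim(t, t') = P[t | t'] = df(t & t') / df(t').
    df_joint: {(t, t'): #docs tagged with both}; df: {t: #docs tagged t}."""
    return df_joint.get((t, t_prime), 0) / df[t_prime] if df.get(t_prime) else 0.0

def expanded_sf(d, t, tags, sf_u, df_joint, df):
    """sf*_u(d, t) = max over t' in T of tsim(t, t') * sf_u(d, t').
    sf_u is a function (d, t') -> social frequency; tags is the tag set T."""
    return max(
        (tag_similarity(df_joint, df, t, tp) * sf_u(d, tp) for tp in tags),
        default=0.0,
    )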

  21. Socially aware Tag Expansion • Who tagged the documents, and what is the strength of their connection to u? • tsimu(t, t’) = ∑_{v∈U} Fu(v) · dfv(t & t’) / dfv(t’). • Score for a query: s*u(d, t1, ..., tn) = ∑_i s*u(d, t_i). • Experiments (see the paper): librarything.com; mixed results. • Measured improvement in precision@top-10 and NDCG@top-10.
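A sketch of the socially aware tag similarity and the query score above; the per-user document-frequency layouts are assumptions.

def social_tag_similarity(F_u, df_v_joint, df_v, t, t_prime):
    """tsim_u(t, t') = sum over users v of Fu(v) * dfv(t & t') / dfv(t').
    F_u: {v: friendship weight}; df_v_joint: {(v, t, t'): #docs v tagged with
    both t and t'}; df_v: {(v, t'): #docs v tagged with t'}. Sketch only."""
    total = 0.0
    for v, w in F_u.items():
        denom = df_v.get((v, t_prime), 0)
        if denom:
            total += w * df_v_joint.get((v, t, t_prime), 0) / denom
    return total

def query_score(d, query_tags, s_star_u):
    """s*_u(d, t1, ..., tn) = sum over the query tags t_i of s*_u(d, t_i)."""
    return sum(s_star_u(d, t) for t in query_tags)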

  22. Lessons and open challenges • Socializing search across the board is a bad idea. • Need to understand which kinds of queries can benefit from which settings (a, b, c values). Examples below. • 1. Queries with a global information need: perform best when a = b = 0; e.g., “Houdini”, “search engines”, “English grammar”; fairly precise queries; it is reasonably clear what quality results look like.

  23. Lessons & Challenges (contd.) • 2. Queries with a subjective taste (a social aspect): perform best when a≈1; e.g., “wizard”; produces a large number of results but user may like only particular types of novels such as “Lord of the Rings”; the tag “wizard” may be globally infrequent but frequent among user’s friends. • 3. Queries with a spiritual information need: perform best when b ≈ 1; e.g., “Asia travel guide”; very general, need to make full use of users similar (in taste) to searcher. (Think recommendations.)

  24. Lessons & Challenges (contd.) • 4. Queries with a mixed information need: perform best when a≈b≈0.5; e.g.,“mystery magic”. • Challenges: The above is an ad hoc classification. Need more thorough studies and deeper insights. • Can the system “learn” the correct setting (a,b,c values) for a user or for a group? • The usual scalability challenges: see following references. • Project opportunity here.

  25. Follow-up Reading (Efficiency) • S. Amer-Yahia, M. Benedikt, P. Bohannon. Challenges in Searching Online Communities. IEEE Data Eng. Bull. 30(2), 2007. • R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, J.X. Parreira, G. Weikum. Efficient Top-k Querying over Social-Tagging Networks. SIGIR 2008. • M.V. Vieira, B.M. Fonseca, R. Damazio, P.B. Golgher, D. de Castro Reis, B. Ribeiro-Neto. Efficient Search Ranking in Social Networks. CIKM 2007.

  26. Follow-up Reading (Temporal Evolution, Events, Networks, ...) • N. Bansal, N. Koudas. Searching the Blogosphere. WebDB 2007. • M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, A. Tomkins. Visualizing Tags over Time. ACM Transactions on the Web, 1(2), 2007. • S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei. Optimizing Web Search Using Social Annotations. WWW 2007. • Anish Das Sarma, Alpa Jain, Cong Yu. Dynamic Relationship and Event Discovery. WSDM 2011, Hong Kong, China. • Sihem Amer-Yahia, Michael Benedikt, Laks Lakshmanan, Julia Stoyanovich. Efficient Network-aware Search in Collaborative Tagging Sites. VLDB 2008. We will revisit social search later in your talks.
