
Search


amaris

Presentation Transcript


  1. The Networked Economy (10): Information Management, Strategy, and Innovation • Search

  2. Search: Key Points • Technology • Index • Crawl (or spider) • Speed • Store everything → Need search → To be fast, need to build an index • Trade-off: Results are very fast, but pre-computing and storage are needed • Relevance (algorithmic) • Information production → Information finding → Information filtering / ranking
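
The trade-off on this slide (fast results in exchange for pre-computation and storage) can be illustrated with a toy inverted index. This is a minimal sketch, not how any particular search engine works; the names `build_index`, `search`, and the sample `docs` are assumptions made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Pre-compute a map from each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "store everything",
    2: "need search to find everything",
    3: "build index for fast search",
}
index = build_index(docs)  # the storage and pre-computing cost is paid once, here
print(search(index, "everything search"))
```

Each query then only touches the entries for its own words instead of scanning every stored document, which is where the speed comes from.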

  3. Desktop search • Money or attention? • Pay with money: Buy software (e.g., X1 in 2005) • Pay with attention: Yahoo, Google, MSN search toolbars • Sites understand user behavior and situation better, and can target ads better

  4. Search • Desktop • Build index • Intranet is similar • Web • How to find information, products, etc. on the web?

  5. Find without search? • 1. GUESS • If you know the location on the web (URL) • 2. BROWSE • Use directories • Organize manually using expert “surfers” • Does not scale: Manual directories are difficult to maintain! • Organize using community of web users • Note: A URL (Uniform Resource Locator, e.g., http://www.ceibs.edu) is the address that identifies the web page

  6. Crawl • Early search engines • Basic idea • Crawl through the web, following hyperlinks* • Extract words from each page • Then build an index of the web • Match user input (search terms) to the index • *Example: <a href="http://www.weigend.com/"> Home of my professor.</a> Hyperlink: Takes you to another page when you click on it
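
The crawl steps on this slide (follow hyperlinks, extract words, build an index) can be sketched with Python's standard `html.parser`. To stay self-contained, the sketch crawls a small in-memory dictionary instead of the real web; `PAGES`, the `example` URLs, `PageParser`, and `crawl` are all hypothetical names chosen for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect the outgoing links and the words of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(pages, start):
    """Breadth-first crawl from `start`; returns a word -> set of URLs index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(pages.get(url, ""))
        parser.close()
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical miniature "web": page c has no inbound link, so it is never found.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">Home of my professor.</a> search',
    "http://b.example/": "teaching search and data",
    "http://c.example/": "unreachable page",
}
index = crawl(PAGES, "http://a.example/")
print(sorted(index["search"]))
```

A page that nothing links to never enters the index, which is one reason a purely link-following crawler misses part of the web.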

  7. Relevance (in organic search results) • New problem: Relevance • How to rank the pages? What to show on top? • What information can be used to help with this decision? • A) Within page • Location of search term on page • Number of occurrences of the search term on page • Meta tags • B) Static: Link structure • E.g., number of hyperlinks going into the page • Leverages other websites • C) Dynamic: Click behavior • Choice within set of links • Action: Move results up or down • Understand overall trajectory (e.g., for typos) • Q: What information does the user see? • Leverages users • Example: Google search for “weigend” • 309,000 results returned for weigend, in 0.2 seconds
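
Signals A and B on this slide can be combined into a single score. The sketch below adds the number of occurrences of the search term on a page to a weighted count of inbound hyperlinks. The additive formula and the `link_weight` parameter are assumptions made purely for illustration; the slide does not say how any real engine combines these signals.

```python
def inbound_link_counts(links):
    """Count distinct pages linking into each page; links maps page -> targets."""
    counts = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


def rank(pages, links, term, link_weight=1.0):
    """Order the pages containing `term` by occurrences plus weighted inbound links."""
    inbound = inbound_link_counts(links)
    term = term.lower()
    scored = []
    for url, text in pages.items():
        occurrences = text.lower().split().count(term)
        if occurrences == 0:
            continue  # pages without the term are not results at all
        score = occurrences + link_weight * inbound.get(url, 0)
        scored.append((score, url))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


# Hypothetical pages and link structure.
pages = {"a": "weigend weigend weigend", "b": "weigend home page", "c": "other"}
links = {"a": ["b"], "c": ["b"], "d": ["b"]}

print(rank(pages, links, "weigend"))                   # link structure counts
print(rank(pages, links, "weigend", link_weight=0.0))  # within-page signal only
```

With the link signal switched off, the page that merely repeats the term most often wins; with it on, the page that other sites point to moves to the top.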

  8. Business models of search (sponsored search etc.) • Search is a necessary competence… • Has become entry point to everything (or at least key necessity) • Customer has become empowered • Customers get smarter. Can’t fool them any more: Transparency empowers them, too • Power of community • Other examples of search • Product search • Books

  9. Search Inside the Book (Amazon.com, 2003)

  10. Search statistics • 1 billion searches per day (January 2005 estimate) • 0.3 billion searches per day (January 2003) • Search statistics (January 2003) (searchenginewatch.com) • Unique users per month (Google, June 2003) • 81.9 million • (Nielsen/NetRatings)

  11. Vertical search • Many internet businesses are essentially vertical search • Limitations of horizontal search? • Complexity of products and services • Domain knowledge • Information often in deep web, not in surface web • Travel • Aggregation: Intermediation and disintermediation

  12. Vertical search • Shopping comparison • Initially: Spider sites, e.g., Amazon.com • What should be Amazon’s response? • Should they make it hard or easy? • Now feeds and web services • Business models for shopping comparison engines

  13. Vertical search • Insurance comparison • Market structure: Often through agents, health insurance often through employment • Essentially an information good • Still a long way to go

  14. Vertical search • Cars • 70% of customers do research on the web before going to a dealer • Challenge: Dealers don’t think of their business as e-business • Huge advertising budget, need to move to mixed channel marketing • Basically, car market can also be seen as vertical search

  15. Vertical search • Real estate • Large part of the economy: Most expensive purchase for most people • Market structure: 6% commission • But: Real estate also is essentially an information search problem

  16. Vertical search • Music • Information sources • Human ratings • Metadata (composer etc.) • Machine analysis • Payment • From buy (possession) to rent (subscription) • China piracy rate: 92% of consumers are using pirated materials

  17. Local search • Total market size: $90 billion (CitySearch) • Technology • Know location of user via IP address or registration • Mobile: LBS (location-based services)

  18. People search • Dating and social networking sites • Note: Social networking companies are purely an information play • Network effects key • The product is the customer • The buyers are the inventory • Online dating platforms

  19. Other examples • Craigslist • No real-time chat • Local markets (San Francisco, New York etc.) • Monetization • Only people who post jobs pay • Genealogy • Amazing stories of people finding relatives

  20. Personalized search • Explicit • “Customization” • User tells interests explicitly • Implicit • Based on user’s past behavior • Needs persistent history • Problem: Multiple personalities • A9 and Google give users access to their entire search history on their platforms
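
Implicit personalization, as described on this slide, needs nothing more than a stored history of past behavior. The sketch below assumes that each result carries a topic label and that the history is a list of topics the user clicked before; both assumptions, and the name `personalize`, are hypothetical and only serve the illustration.

```python
from collections import Counter


def personalize(results, history):
    """Stable re-rank: results on topics the user clicked more often move up.

    results: list of (title, topic) pairs in their original ranked order.
    history: list of topics the user clicked in the past.
    """
    clicks = Counter(history)
    # sorted() is stable, so results with equal click counts keep their order.
    return sorted(results, key=lambda result: -clicks[result[1]])


# Hypothetical result list and click history.
results = [("Board game shop", "shopping"), ("History essay", "research")]
history = ["research", "research", "shopping"]

print(personalize(results, history))
print(personalize(results, []))  # no history: original order is kept
```

The "multiple personalities" problem shows up directly here: one shared history mixes the clicks of every person or task behind the same account.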

  21. Relevance is everything • The Search Paradigm • 2.4 words, a few clicks, and done • Only possible if results are relevant • Relevance is ‘speed’ • Time from task initiation to resolution • Important factors: • Location of useful result • UI clutter • Latency • Relevance is relative • Context dependent • E.g., ‘football’ in the UK vs. the US • Task dependent • E.g., ‘mafia’ when shopping vs. researching

  22. Tune Ranking • Evaluate Metrics • Relevance is hard to measure • Poorly defined, subjective notion • Depends on task, user context, etc. • Analysts have focused on surrogates that are easier to measure • Index size, traffic, speed • Anecdotal relevance tests • E.g., vanity queries • Methodology important • Averaged over queries • Averaged over users • Development cycle
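
The methodology point on this slide (average over queries and over users rather than trusting single anecdotes) can be made concrete. The sketch assumes relevance judgments arrive as `(user, query, score)` triples with scores between 0 and 1; that data shape and the name `average_relevance` are assumptions for illustration only.

```python
from statistics import mean


def average_relevance(judgments):
    """Average scores over users within each query, then over queries.

    judgments: list of (user, query, score) triples.
    Returns (overall_score, per_query_scores).
    """
    by_query = {}
    for _user, query, score in judgments:
        by_query.setdefault(query, []).append(score)
    per_query = {query: mean(scores) for query, scores in by_query.items()}
    return mean(list(per_query.values())), per_query


# Hypothetical judgments: two users rated "football", one rated "mafia".
judgments = [
    ("u1", "football", 1.0),
    ("u2", "football", 0.0),
    ("u1", "mafia", 1.0),
]
overall, per_query = average_relevance(judgments)
print(overall, per_query)
```

Averaging per query first keeps a heavily judged query from dominating the overall number, which a pooled average over all judgments would allow.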

  23. User interface • Relevance-ranked result lists • Document summaries are critical • Hit highlighting • Dynamic abstracts • Assisted search • Spell correction • Specialized indices • Via tabs • Blended results • Multiple sources • Predefined segmentation • E.g., paid listing • Intermixed with results from other sources • E.g., news • Localization • Country language experience
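
Hit highlighting and dynamic abstracts from this slide can be sketched together: cut a window of text around the first match of the search term and mark every match inside it. The function name `dynamic_abstract`, the `width` and `marker` parameters, and the use of `**` as the highlight marker are assumptions chosen for illustration.

```python
import re


def dynamic_abstract(text, term, width=30, marker="**"):
    """Return a snippet around the first hit of `term`, with hits highlighted."""
    pattern = re.compile(re.escape(term), flags=re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return text[: 2 * width]  # no hit: fall back to the start of the text
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    snippet = pattern.sub(lambda m: marker + m.group(0) + marker, text[start:end])
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + snippet + suffix


# Hypothetical document text.
text = "Early engines crawl the web and build an index of every page."
print(dynamic_abstract(text, "index", width=10))
```

Because the abstract is computed per query, the same document can show a different summary for each search term, unlike a fixed description.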

  24. Future Trends • Question answering • New contexts • Ubiquitous searching • Toolbars, desktop, phone • Implicit searching • Computed links • New tasks • E.g., local search

  25. Search: Summary • No longer about filing and organizing, but about searching • Whether it’s about your email or knowledge in your company • And then about ranking / sorting / relevance • Why does search replace directories / categories? • Can be done automatically, in contrast to manual categories
