Trends in Web Search and its relevance to Digital Libraries

Min-Yen Kan

Web IR NLP Group (WING)

National University of Singapore

Tips on Web Searching
  • Visualize results, then come up with multiple queries
  • Use multiple search engines
  • Advanced Search
    • inurl:, site:
    • “Phrasal search”
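
The operators above can be combined in a single query string. A minimal sketch, assuming a hypothetical helper named `build_query` (not part of any search engine's API); the domain and URL fragment are illustrative values only:

```python
def build_query(terms, phrase=None, site=None, inurl=None):
    """Compose a search query string from plain terms and optional operators.

    build_query is a hypothetical helper for illustration only.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')      # phrasal search: exact word sequence
    if site:
        parts.append(f"site:{site}")     # restrict results to one domain
    if inurl:
        parts.append(f"inurl:{inurl}")   # require this string in the URL
    return " ".join(parts)


# Illustrative values; any domain or URL fragment can be substituted.
query = build_query(["citation"], phrase="digital libraries",
                    site="nus.edu.sg", inurl="wing")
print(query)  # citation "digital libraries" site:nus.edu.sg inurl:wing
```

Generating several such variants from one information need is one way to follow the "multiple queries" tip.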

But that’s just general search…

  • Federated resources / Niche search engines

26 Sep 2008

Site- and Task-specific resources
  • Site Prestige

Know what others think and do

    • Google PageRank (Link structure), Alexa (Traffic)
    • Google Trends / Insight (Queries)
  • Social Searching (Web 2.0)

The voice of the reader / critic

    • (Bookmarks / Tags) Del.icio.us, Citeulike.org, Bibsonomy.org
    • (News) Digg / Slashdot
    • (Blogs) Google Blog, Technorati
  • People Search:

Finding public information on a person

    • Spock (web), Zabasearch (US only)
    • LinkedIn, Facebook
    • Must validate your sources

http://labs.digg.com/arc/

Expert Search

Find people who will advocate on your behalf

  • What do they want?
  • Scholar:
    • Active? → Check their recent articles
    • Names common? → Define area of interest
    • Compare against peers
    • Download vs. citation counts
  • Patent search:
    • Referenced by: (citation count; different than scholar)
  • Identifying webfaced advocates:
    • Blog search, PageRank

→ Impact

http://flickr.com/photos/phauly/

  • How do machines do it?
  • Expert search task as benchmark test
  • Download web pages to analyze
  • Needed to deal with spam pages
  • Used PageRank to assess prestige
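
To make "prestige from link structure" concrete, here is a minimal sketch of the textbook PageRank power iteration. It is not the system used in the expert search benchmark; the damping factor 0.85, the iteration count, and the three-page graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are conventional defaults, assumed here.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical): pages "a" and "b" both link to "c"; "c" links to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In the toy graph, "c" ends up with the highest score because two pages link to it, which is the intuition behind using link structure as a prestige signal.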

Problem or opportunity?
Revenue from print continually declining

Students and researchers rely on internet

Researchers want archiving rights – freedom of academic information

Characteristics:

Not zero-sum content

Distribution is now largely the role of search engines

→Necessitates new role of publisher and new revenue model

Will classic models work? Advertising, Subscription, Transactional & Bundling

Variants? Versioning (Varian), Moving window (JSTOR)

The game has fundamentally changed

http://flickr.com/photos/danielbroche/

Forecasting

Content is becoming free

MIT / Stanford opening up textbooks

Open access archiving

→ long term: content will not be primary revenue source

eBook revenue hasn’t held up its promise yet…

Device gap: iPhone and nextGen devices

→ Revenue may be further down the pipe

+

Academic publishers

Connect to libraries and federations at institution level

Individual customers are secondary

Trusted source

Expertise in copyediting, typesetting, project management, distribution, social networking

Many individual web publishers rediscovering same problems

→ Consultancy model

→ Win-win partnerships with individual authors

Web Trends
Social Content

Wisdom of masses: Crowdsourcing

Rich Media

Open Source / Access

Paradigmatic change

Classifieds → Craigslist

POTS → Skype

CD store → iTunes

Publishers → ??

http://www.informationarchitects.jp/slash/iA_WebTrends_2007_2_1024_768.gif

Where is research going?

Server centric

User centric

  • Search API usage
  • Browser as computer
  • Web page structure, mining text data
  • Modeling web users at tasks: Exploring / Fact-finding
  • Personalization, recommending
  • Social networks
  • Understanding opinion
  • Query and log analysis

http://flickr.com/photos/alisdair/

Webfaced pop quiz – which is which?

American Statistical Society

World Scientific

Springer

courtesy:http://pagerank.si/

Forecast: Know your strengths
Get advocates

Make it easy to get individuals to insist to their institution to buy your materials

Know who is accessing (not necessarily buying) your content

Content revenue will continue to decline

Find an economic model that works for you

Work as partners in content creation

Be savvy on trends

Be visible: do “white hat” Search Engine Optimization (SEO)

Make your abstracts indexable by others


Trends in Digital Libraries

>> WING @ NUS

  • Expanding types of information in search
  • Automated tools for DLs
  • Usability in E-books and online media
  • User modeling
  • Personalization, annotation and relation to other user tasks

http://flickr.com/photos/pathfinderlinden

Scholarly Digital Libraries
  • ForeCite: our scholarly DL
  • Data Cleaning
  • Slide and Document Alignment
  • Searching in the OPAC
  • Math Information Retrieval

ForeCite: Beyond the document as an item

Server

Client

  • A user-centric DL framework
  • Put author / reader functionality together
  • Tagging, correction, annotation and viewing
  • Automatic tools: keyphrases and sentence classification
  • For use on and offline, organizes local PDF files for you
  • Only need your web browser

Data Cleaning
Addresses

Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802

LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343

Products

Honda Fit vs. Honda Jazz

Apple iPod Nano 4GB vs. 4GB iPod nano 4GB

Idea: use web as additional context for disambiguation and clustering

Placed 3rd in Web People Search Task (WEPS 2007)

  • Search results:
    • “Jeffrey D. Ullman” 384,000 pages
    • “Jeffrey D. Ullman” + “aho” 174,000 pages (45%)
    • “J. Ullman” 124,000 pages
    • “J. Ullman” + “aho” 41,000 pages (33%)
    • “Shimon Ullman” 27,300 pages
    • “Shimon Ullman” + “aho” 66 pages (0%)
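
The percentages on this slide are consistent with dividing each joint page count (name plus “aho”) by the page count for the name alone. A minimal sketch of that arithmetic, using the counts from the slide; the function names and the tolerance of 0.15 are assumptions for illustration, not the method behind the WEPS 2007 entry:

```python
def cooccurrence_ratio(name_hits, joint_hits):
    """Fraction of a name's pages that also mention the context term."""
    return joint_hits / name_hits if name_hits else 0.0


def likely_same_entity(ratio_a, ratio_b, tolerance=0.15):
    """Crude heuristic: similar context ratios suggest the same person.

    The tolerance is an arbitrary value assumed for this sketch.
    """
    return abs(ratio_a - ratio_b) <= tolerance


# Page counts copied from the slide: (name alone, name + "aho").
hits = {
    "Jeffrey D. Ullman": (384_000, 174_000),
    "J. Ullman": (124_000, 41_000),
    "Shimon Ullman": (27_300, 66),
}
ratios = {name: cooccurrence_ratio(alone, joint)
          for name, (alone, joint) in hits.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0%}")

print(likely_same_entity(ratios["Jeffrey D. Ullman"], ratios["J. Ullman"]))
print(likely_same_entity(ratios["Jeffrey D. Ullman"], ratios["Shimon Ullman"]))
```

Under these assumptions, “J. Ullman” shares a similar context ratio with “Jeffrey D. Ullman”, while “Shimon Ullman” does not, which is the kind of web evidence the slide proposes for disambiguation.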

Slides and their relationship to documents

Document in focus

Slides in Focus

Searching in Libraries

http://linc.comp.nus.edu.sg

Symbolic Information Search

How do users want to search math materials?

Our answer: Text-to-Expression Linking

  • Resolve text keywords to expressions
  • e.g., “Pythagorean Theorem” → “$a^2+b^2=c^2$” or “$x^2+y^2=z^2$”
    • Reduce the need for expression input
    • Solves the notational variation problem
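
One way to picture text-to-expression linking is a keyword table plus a normalization step that ignores variable names. The sketch below is an illustrative assumption, not the group's actual system: `KEYWORD_TO_EXPRESSION`, `canonicalize`, and `find_matches` are hypothetical names, and the normalization only handles single-letter variables.

```python
import re

# Hypothetical keyword table; a real system would need a much larger one.
KEYWORD_TO_EXPRESSION = {
    "pythagorean theorem": "a^2+b^2=c^2",
}


def canonicalize(expression):
    """Rename single-letter variables to v0, v1, ... in order of first use.

    Simplification assumed for this sketch: every letter is treated as a
    variable, so function names such as "sin" are not handled.
    """
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    compact = expression.replace(" ", "")
    return re.sub(r"[A-Za-z]", rename, compact)


def find_matches(keyword, candidates):
    """Return candidate expressions equivalent to the keyword's expression."""
    target = KEYWORD_TO_EXPRESSION.get(keyword.lower())
    if target is None:
        return []
    key = canonicalize(target)
    return [c for c in candidates if canonicalize(c) == key]


print(find_matches("Pythagorean Theorem",
                   ["x^2 + y^2 = z^2", "E = mc^2", "a^2+b^2=c^2"]))
```

The user types only the keyword, and both notational variants of the theorem are retrieved while the unrelated expression is not.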

Not quite right…

Conclusions
  • Consider us your research WING!
  • Trade data and problems for solutions and interns

Meanwhile:

  • Use better search strategies
  • Practice white hat SEO
  • Identify webfaced advocates

References
  • Kahin and Varian (2000) Internet Publishing and Beyond
  • Towle et al. (2007) Electronic Books in the 2003-2005 Period, Pub Res Q 23:95-104

Photo Credits

  • Flickr Creative Commons Search

Thanks to all of you for listening

& my fellow WING group members

Abstract
  • I will present trends in current academic research on web search and digital libraries, and discuss their relevance to publishers and their economic model. With respect to the web, I will cover how search engines are starting to specialize and use click through and ad data to improve relevance ranking. With respect to digital library research, I discuss my group's research at NUS on advancing the state-of-the-art in scholarly digital libraries. I cover advances on how we deal with data cleaning issues, and slide and equation retrieval and alignment.
