Best Practices for Search. for the Federal Government Marti Hearst Web Manager University November 10, 2009. The Importance of Search for Govt. OMB memorandum, Dec 2005: “When disseminating information to the public-at-large, publish your information directly to the internet.”


Best Practices for Search

for the Federal Government

Marti Hearst

Web Manager University

November 10, 2009

The Importance of Search for Govt
  • OMB memorandum, Dec 2005:

“When disseminating information to the public-at-large, publish your information directly to the internet.”

  • Pres. Obama’s memorandum, Jan 21, 2009:

“Information maintained by the Federal Government is a national asset. My Administration will take appropriate action, consistent with law and policy, to disclose information rapidly in forms that the public can readily find and use.”

A bit about me
  • Professor at the School of Information at University of California, Berkeley.
    • Teach masters students
    • User Interface Design, Search Engines, Computational Linguistics, Visualization.
  • Search User Interfaces
  • Visiting government for 1 year
    • Updating usasearch.gov
    • Looking at site search alternatives.
    • Generally kibitzing
Two Focus Areas
  • Web search engines
    • The quality and form of your content
    • How your results are viewed in search engine listings
    • How your site is crawled
  • Site search
    • The search interface
    • What is crawled
    • How results are presented
Outline
  • Designing your site for effective search
  • Site search interfaces
  • Special considerations for web search engines
  • An example of what not to do.
both content and tech people
  • have to be focused on it together
  • mention why using (free) book example
  • add an exercise towards the start
  • top 3 things to do right away
  • don’t force “H1N1”; be sure “swine flu” works too
  • seo integrated into process
Use Proven Interface Techniques
  • Use modern search UI ideas that are known to have good usability.
  • Apply the principle: recognition over recall.
    • Related query suggestions
    • Auto-suggest as the user types
    • Use faceted navigation where appropriate.
Search-as-you-Type (SAYT)
  • As the user types, shows other people’s queries with the same word stems.
  • Helps people think of additional words
    • (recognition over recall)
  • Proven to improve search results.
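
A minimal sketch of the suggestion step, assuming a small in-memory log of past queries. The function name `suggest` and the sample log are hypothetical, and plain prefix matching on words stands in for real stemming.

```python
from collections import Counter

def suggest(prefix, query_log, limit=5):
    """Return the most frequent past queries with a word starting with `prefix`."""
    prefix = prefix.lower().strip()
    if not prefix:
        return []
    counts = Counter(q.lower() for q in query_log)
    matches = [
        (query, n) for query, n in counts.items()
        if any(word.startswith(prefix) for word in query.split())
    ]
    # Most frequent first; ties broken alphabetically for stable output.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [query for query, _ in matches[:limit]]

# Hypothetical query log
log = ["passport renewal", "passport fees", "passport renewal",
       "tax forms", "renew passport"]
print(suggest("pass", log))
```

Showing whole past queries, rather than single completed words, is what lets the user recognize a better phrasing instead of having to recall it.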
Evidence-based Decision Making

DATA TRUMPS INTUITIONS

(Kohavi)

Use Evidence-based Decision Making

User behavior determines if an idea is retained.

A/B testing is a standard way to do this.

  • Make small changes to an interface.
  • Show the changed interface to a significant sample of the user population; show everyone else the original version.
  • Do this over time (~ two weeks) and for (tens of) thousands of users.
  • Compare what the two groups do over time.
  • Based on this, decide whether to keep or reject the feature.
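
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a method from the talk: users are split by hashing a (hypothetical) user id, and the two groups are compared with a standard two-proportion z-test. The counts in the usage example are made up.

```python
import hashlib
import math

def assign_variant(user_id, treatment_share=0.1):
    """Deterministically place a user in 'B' (changed UI) or 'A' (original)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "B" if bucket < treatment_share else "A"

def two_proportion_z(success_a, total_a, success_b, total_b):
    """z-statistic for the difference between two success rates."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se

# Made-up outcome counts: successful searches out of users per group
z = two_proportion_z(1200, 20000, 1350, 20000)
print(round(z, 2), "keep the change" if z > 1.96 else "not enough evidence")
```

Hashing the user id keeps each user in the same group for the whole test period, so the comparison over time is between stable populations.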
Evidence-based Decision Making
  • Example:
    • Dan Siroker on Obama for America’s website and video design decisions
    • Easy to measure the outcome: it is in money donated.
    • http://www.siroker.com/archives/2009/05/14/obama_lessons_learned_talk_at_google.html
Vote: Which Button is Best?


Ease of Use: Summary

USE PROVEN UI TECHNIQUES

REDUCE EXTRA STEPS

USE CLEAR LANGUAGE

MAKE EVIDENCE-BASED UI DECISIONS

How Search Engines Work

Three main parts:

  • Gather the contents of all web pages (crawling)
  • Organize the contents of the pages in a way that allows efficient retrieval (indexing)
  • Take in a query, determine which pages match, and show the results (ranking and display of results)
Standard Web Search Engine Architecture

[Diagram: crawler machines crawl the web; the pages are checked for duplicates and the documents are stored (DocIds); an inverted index is created from them and served by the search engine servers.]

Standard Web Search Engine Architecture

[Diagram: the same architecture with the query path added. Crawler machines crawl the web; the pages are checked for duplicates and the documents are stored (DocIds); an inverted index is created. A user query goes to the search engine servers, which consult the inverted index and show results to the user.]

Spiders or crawlers
  • How to find web pages to visit and copy?
    • Can start with a list of domain names, visit the home pages there.
    • Look at the hyperlinks on the home page, and follow those links to more pages.
    • Keep a list of urls visited, and those still to be visited.
    • Each time the program loads in a new HTML page, add the links in that page to the list to be crawled.
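
The loop described above can be sketched as follows. This is a simplified illustration: the `fetch` argument is a stand-in for a real HTTP client, and the toy `pages` dictionary and the example.gov URLs are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, fetch, max_pages=100):
    to_visit = deque(seed_urls)   # urls still to be visited
    visited = set()               # urls already visited
    crawled = []
    while to_visit and len(crawled) < max_pages:
        url = to_visit.popleft()
        if url in visited:
            continue              # avoids getting stuck in link cycles
        visited.add(url)
        html = fetch(url)
        if html is None:          # server down or page missing
            continue
        crawled.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            if link not in visited:
                to_visit.append(link)
    return crawled

# Toy "web" standing in for real HTTP fetches
pages = {
    "http://example.gov/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.gov/a.html": '<a href="/">home</a> <a href="/b.html">B</a>',
    "http://example.gov/b.html": '<a href="/missing.html">gone</a>',
}
print(crawl(["http://example.gov/"], pages.get))
```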
Spider behaviour varies
  • Parts of a web page that are indexed
  • How deeply a site is indexed
  • Types of files indexed
  • How frequently the site is spidered
Four Laws of Crawling
  • A Crawler must show identification
  • A Crawler must obey the robots exclusion standard

http://www.robotstxt.org/wc/norobots.html

  • A Crawler must not hog resources
  • A Crawler must report errors
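
As an illustration of the second law, Python's standard library includes a parser for the robots exclusion standard. The sketch below assumes a hypothetical crawler name and a made-up robots.txt; `Crawl-delay` is a widely used extension rather than part of the original standard.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleGovCrawler"  # hypothetical crawler name (identification)

# Made-up robots.txt content; a real crawler would download it from the site
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def allowed(url):
    """True if the robots.txt rules permit this crawler to fetch the url."""
    return parser.can_fetch(USER_AGENT, url)

print(allowed("http://example.gov/index.html"))
print(allowed("http://example.gov/private/report.html"))
print(parser.crawl_delay(USER_AGENT))  # seconds to wait between requests
```

Honoring the delay between requests is one simple way for a crawler to avoid hogging a server's resources.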
Lots of tricky aspects
  • Servers are often down or slow
  • Hyperlinks can get the crawler into cycles
  • Some websites have junk in the web pages
  • Now many pages have dynamic content
  • The web is HUGE
The Internet Is Enormous

Image from http://www.nature.com/nature/webmatters/tomog/tomfigs/fig1.html

“Freshness”
  • Need to keep checking pages
    • Pages change
      • At different frequencies
      • Pages are removed
    • Many search engines cache the pages (store a copy on their own servers)
What really gets crawled?
  • A small fraction of the Web that search engines know about; no search engine is exhaustive
  • Not the “live” Web, but the search engine’s index
  • Not the “Deep Web”
  • Mostly HTML pages but other file types too: PDF, Word, PPT, etc.
ii. Index (the database)

Record information about each page

  • List of words
    • In the title?
    • How far down in the page?
    • Was the word in boldface?
  • URLs of pages pointing to this one
  • Anchor text on pages pointing to this one
Inverted Index
  • How to store the words for fast lookup
  • Basic steps:
    • Make a “dictionary” of all the words in all of the web pages
    • For each word, list all the documents it occurs in.
    • Often omit very common words
      • “stop words”
    • Sometimes stem the words
      • (also called morphological analysis)
      • cats -> cat
      • running -> run
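
The basic steps above can be sketched as follows. The stop-word list is a tiny illustrative sample, and `stem` is a deliberately crude suffix stripper written for this example; production systems use a real stemmer.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "are", "on"}

def stem(word):
    """Crude suffix stripping: running -> run, cats -> cat."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) > 2 and word[-1] == word[-2]:
            word = word[:-1]          # runn -> run
        return word
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def tokenize(text):
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def build_index(docs):
    """Map each term to the set of document ids it occurs in."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

docs = {
    1: "The cat is running",
    2: "Cats and dogs",
    3: "A dog runs in the park",
}
index = build_index(docs)
print(sorted(index["cat"]))  # documents containing "cat" or "cats"
print(sorted(index["run"]))  # documents containing "running" or "runs"
```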
Inverted Index Example

Image from http://developer.apple.com/documentation/UserExperience/Conceptual/SearchKitConcepts/searchKit_basics/chapter_2_section_2.html

Inverted Index
  • In reality, this index is HUGE
  • Need to store the contents across many machines
  • Need to do optimization tricks to make lookup fast.
Query Serving Architecture

[Diagram: the query “travel” passes through a load balancer to one of the front ends (FE1, FE2, ... FE8), then to a query integrator (QI1, QI2, ... QI8), which sends it to a grid of index nodes (Node1,1 ... Node4,N).]
  • Index divided into segments each served by a node
  • Each row of nodes replicated for query load
  • Query integrator distributes query and merges results
  • Front end creates an HTML page with the query results
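
A toy version of this flow, assuming each index segment is just a dictionary from a term to scored hits. All function names, scores, and document ids here are hypothetical; replication and load balancing are left out.

```python
import heapq
from html import escape

def search_segment(segment, term):
    """One node: return (score, doc_id) hits from its slice of the index."""
    return segment.get(term, [])

def query_integrator(segments, term, k=3):
    """Distribute the query to every segment and merge the partial results."""
    partial = [search_segment(segment, term) for segment in segments]
    all_hits = [hit for hits in partial for hit in hits]
    return heapq.nlargest(k, all_hits)

def front_end(term, hits):
    """Create an HTML page fragment with the query results."""
    items = "".join(
        f"<li>{escape(doc_id)} ({score:.2f})</li>" for score, doc_id in hits
    )
    return f"<h1>Results for {escape(term)}</h1><ol>{items}</ol>"

# Three index segments, each served by a different node
segments = [
    {"travel": [(0.9, "doc1"), (0.4, "doc2")]},
    {"travel": [(0.7, "doc7")]},
    {"passport": [(0.8, "doc9")]},
]
top = query_integrator(segments, "travel", k=2)
print(front_end("travel", top))
```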
iii. Results ranking
  • Search engine receives a query, then
  • Looks up the words in the index, retrieves many documents, then
  • Rank orders the pages and extracts “snippets” or summaries containing query words.
    • Most web search engines assume the user wants all of the words
  • These are complex and highly guarded algorithms unique to each search engine.
Some ranking criteria
  • For a given candidate result page, use:
    • Number of matching query words in the page
    • Proximity of matching words to one another
    • Location of terms within the page
    • Location of terms within tags e.g. <title>, <h1>, link text, body text
    • Anchor text on pages pointing to this one
    • Frequency of terms on the page and in general
    • Link analysis of which pages point to this one
    • (Sometimes) Click-through analysis: how often the page is clicked on
    • How “fresh” is the page
  • Complex formulae combine these together.
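
Real ranking formulae are proprietary, so the following is only a toy illustration of combining a few of these criteria. The field weights are invented for the example, and only term counts by location are used; it also requires all query words to be present, as most web search engines assume.

```python
# Hypothetical weights: a match in the title counts more than one in the body
DEFAULT_WEIGHTS = {"title": 3.0, "h1": 2.0, "anchor": 2.0, "body": 1.0}

def score_page(page, query_terms, weights=None):
    """Weighted count of query-term matches, by where in the page they occur."""
    weights = weights or DEFAULT_WEIGHTS
    fields = {name: page.get(name, "").lower().split() for name in weights}
    all_words = set().union(*fields.values())
    # Require every query word to appear somewhere on the page
    if not all(term in all_words for term in query_terms):
        return 0.0
    score = 0.0
    for name, words in fields.items():
        for term in query_terms:
            score += weights[name] * words.count(term)
    return score

page_a = {"title": "Car recalls", "h1": "Recent car recalls",
          "body": "Search car recalls by make and model"}
page_b = {"title": "Vehicle safety",
          "body": "Information about car recalls and vehicle defects"}
page_c = {"title": "Car seats", "body": "Choosing a car seat"}

query = ["car", "recalls"]
for name, page in [("a", page_a), ("b", page_b), ("c", page_c)]:
    print(name, score_page(page, query))
```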
Measuring Importance of Linking
  • PageRank Algorithm
    • Idea: important pages are pointed to by other important pages
    • Method:
      • Each link from one page to another is counted as a “vote” for the destination page
      • But the importance of the starting page also influences the importance of the destination page.
      • And those pages’ scores, in turn, depend on those linking to them.

Image and explanation from http://www.economist.com/science/tq/displayStory.cfm?story_id=3172188
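
A simplified sketch of the iterative computation this describes. The damping factor and the way pages without outgoing links are handled are common textbook choices assumed here, not details from the slide; the four-page link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each link is a "vote", weighted by the voter's own rank
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its rank evenly
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Made-up link graph: A, B and D all point to C; C points back to A
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph A outranks B and D even though none of them has many incoming links, because A's single link comes from the highly ranked page C.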

Making Web Sites Attractive to Search Engines
  • Called “Search Engine Optimization” (SEO)
  • There is a LOT of information about this on the web
    • Most is about how to improve your site
    • Some is about “cheating”; avoid this
  • There are many tools to help you too.

The Most Important Principle:

Good, unique content trumps everything else.

Content is Key
  • Web sites that are primarily high-quality, unique content will be ranked highly.
    • Not just links to other content
    • Not re-packaging of other content
  • Example:
    • My online book was top ranked for “search user interfaces” within one day of site launch.
    • It is also top ranked for many related queries.
Web Site Characteristics
  • These can lead to high search engine ranking (but no guarantees):
    • High-quality, unique content.
    • Linked to by high-quality sites.
    • Been around a long time with consistent content.
Keyword Placement
  • Search engines place “weight” on words according to where they are used
  • Place important words in
    • Title tags
    • Headings (H1 is key) and emphasized text
    • Visible body text
    • Description metadata – often used in search results snippets.
    • Alt text in images
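
One way to check a page against this list is a small audit script. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, the sample page is made up, and matching is by simple substring, so it covers only four of the placements.

```python
from html.parser import HTMLParser

class KeywordPlacementAudit(HTMLParser):
    """Collect the text of the title, H1, description metadata and alt text."""
    def __init__(self):
        super().__init__()
        self.found = {"title": "", "h1": "", "description": "", "alt": []}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._current = tag
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.found["description"] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.found["alt"].append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.found[self._current] += data

def placements(html, keyword):
    """Report where the keyword appears (substring match, case-insensitive)."""
    audit = KeywordPlacementAudit()
    audit.feed(html)
    keyword = keyword.lower()
    return {
        "title": keyword in audit.found["title"].lower(),
        "h1": keyword in audit.found["h1"].lower(),
        "description": keyword in audit.found["description"].lower(),
        "alt": any(keyword in text.lower() for text in audit.found["alt"]),
    }

# Made-up page for illustration
sample_page = """<html><head><title>Car Recalls | Recalls.gov</title>
<meta name="description" content="Search vehicle safety notices."></head>
<body><h1>Automotive Recalls</h1>
<img src="car.png" alt="Recalled car"></body></html>"""
print(placements(sample_page, "car"))
```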
Keyword Variation
  • Describe the same concepts using different words within the relevant pages.
    • Compare “search interfaces” with “search user interfaces” in the next slide.
    • 1 hit versus 4 hits in the top 6
    • I need to use more variation for the key concepts
  • But it must make sense in your page;
    • Don’t hide dictionaries of words!
    • Can include them in the description metadata.
The Importance of URLs
  • Meaningful, short urls improve search engine ranking and usability
  • Urls that consist of computer-generated database queries can hurt rankings.
  • Urls with lots of redirects also hurt.
The Importance of Titles
  • The title tag determines what words show up in the search results title.
    • Make them descriptive of the site
    • Vary them to differentiate them.
  • Example (next page)
    • Consistently varies the title to show how they differ.
    • But makes a mistake in the metadata description by putting the part that varies too far from the start, so it all looks the same.
Robots Exclusion
  • It is important to check your robots.txt files to be sure they are allowing crawling.
  • If your server can’t handle a lot of traffic, use the site map file to slow crawlers down.
Site Maps
  • There are two kinds of site maps:
    • A navigation structure visible to users
    • An XML file visible only to search engines
      • The latter is important to help ensure the pages on your site are crawled.
      • You can also specify the frequency with which you hope the pages will be crawled.
      • There are free tools to help you do this.
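
For illustration, an XML site map in the sitemaps.org format can be generated with a few lines of code. The URLs below are hypothetical; `changefreq` is the element that carries the hoped-for crawl frequency, which search engines treat as a hint.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: (url, changefreq) pairs, e.g. changefreq 'daily' or 'weekly'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages
xml_text = build_sitemap([
    ("http://www.example.gov/", "daily"),
    ("http://www.example.gov/recalls/cars.html", "weekly"),
])
print(xml_text)
```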

Examples of What Not To Do

For both site design and SEO.

Or … don’t mess with my dog!


What happens when you type http://recalls.gov ?


The lesson: make your url (web address) easy to find

There should at least be a redirect from recalls.gov to www.recalls.gov

Also, the url should match its description in the site title field!


The point: the search entry form should be highly visible and in a standard position.

Usually wide and centered towards the top or else shorter and on the upper right.


The point: do not make the user guess how your information is structured.

There should be one search engine for all government recall information.


The point: do not require users to fill out structured search forms.

This can be an option but should not be required.

Showing categories with previews of how many hits are associated with each is better than lots of entry forms.


The point: use standard layout (unless there is a good reason not to)

This site puts too much text at the top before showing search results.

Also, searchers frequently modify their query, so it is standard to show the search form with the previous query at the top.


The point: do promote commonly requested information to the top of the results.

This site uses “best bets” to promote popular content to the top; the user finds what they want.


The point: use descriptive titles.

It is important to put the distinguishing information first so the repeated part does not dominate. For example:

Home page: Recalls.gov

Recent Recalls

Food Safety Recalls

Automotive Recalls


What happens if I search for car recalls at major search engines?

Answer: I don’t see recalls.gov


The point: use words that your users use.

Notice that the main page for cars at recalls.gov does not appear towards the top. The word “car” does not play an important role on the relevant page.

Search Engine Information
  • SEO
    • http://www.ninebyblue.com/
  • Keep current with industry
    • http://www.searchengineland.com
    • http://battellemedia.com
  • Search Interface Principles
    • http://searchuserinterfaces.com
  • Search Design Patterns (Peter Morville)
    • http://www.flickr.com/photos/morville/collections/72157603785835882/

Faceted Navigation

For Structured Web Site Search

The Idea of Facets
  • Facets are a way of labeling data
    • A kind of Metadata (data about data)
    • Can be thought of as properties of items
  • Facets vs. Categories
    • Items are placed INTO a category system
    • Multiple facet labels are ASSIGNED TO items
The Idea of Facets
  • Create INDEPENDENT categories (facets)
    • Each facet has labels (sometimes arranged in a hierarchy)
  • Assign labels from the facets to every item
    • Example: recipe collection

  • Ingredient: Chicken, Bell Pepper
  • Cooking Method: Stir-fry, Curry
  • Course: Main Course
  • Cuisine: Thai
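
A minimal sketch of assigning labels from independent facets to items, using the recipe facets above. The three recipes, and the labels "Bake" and "Italian", are invented for illustration.

```python
# Each item carries labels from several independent facets (hypothetical data)
recipes = {
    "Thai chicken curry": {
        "Ingredient": {"Chicken", "Bell Pepper"},
        "Cooking Method": {"Curry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "Chicken stir-fry": {
        "Ingredient": {"Chicken"},
        "Cooking Method": {"Stir-fry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "Stuffed bell peppers": {
        "Ingredient": {"Bell Pepper"},
        "Cooking Method": {"Bake"},
        "Course": {"Main Course"},
        "Cuisine": {"Italian"},
    },
}

def filter_items(items, selections):
    """Keep items that carry every selected facet label."""
    return sorted(
        name for name, facets in items.items()
        if all(label in facets.get(facet, set())
               for facet, label in selections.items())
    )

print(filter_items(recipes, {"Ingredient": "Chicken"}))
print(filter_items(recipes, {"Ingredient": "Bell Pepper", "Cuisine": "Thai"}))
```

Because the facets are independent, selections can be combined in any order and always narrow the same set of items.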

The Idea of Facets
  • Break out all the important concepts into their own facets
  • Sometimes the facets are hierarchical
    • Assign labels to items from any level of the hierarchy

  • Preparation Method
    • Fry
    • Saute
    • Boil
    • Bake
    • Broil
    • Freeze
  • Desserts
    • Cakes
    • Cookies
    • Dairy
      • Ice Cream
      • Sorbet
      • Flan
  • Fruits
    • Cherries
    • Berries
      • Blueberries
      • Strawberries
    • Bananas
    • Pineapple

Using Facets
  • Now there are multiple ways to get to each item

  • Preparation Method
    • Fry
    • Saute
    • Boil
    • Bake
    • Broil
    • Freeze
  • Desserts
    • Cakes
    • Cookies
    • Dairy
      • Ice Cream
      • Sherbet
      • Flan
  • Fruits
    • Cherries
    • Berries
      • Blueberries
      • Strawberries
    • Bananas
    • Pineapple

Paths to one item:

  • Fruit > Pineapple
  • Dessert > Cake
  • Preparation > Bake

Paths to another item:

  • Dessert > Dairy > Sherbet
  • Fruit > Berries > Strawberries
  • Preparation > Freeze
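
The paths above can be modeled by storing each label as a path through its facet hierarchy, so that an item is reachable from any level of any of its paths. The item names below are hypothetical; the slide shows only the paths.

```python
# Each item is labeled with one path per facet (item names are invented)
items = {
    "Pineapple upside-down cake": [
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    ],
    "Strawberry sherbet": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}

def items_under(items, path):
    """Items with a label at or below the given point in a facet hierarchy."""
    path = tuple(path)
    return sorted(
        name for name, labels in items.items()
        if any(label[:len(path)] == path for label in labels)
    )

print(items_under(items, ("Fruit",)))
print(items_under(items, ("Dessert", "Dairy")))
print(items_under(items, ("Preparation", "Bake")))
```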

Advantages of Faceted Navigation
  • Systematically integrates search results:
    • reflect the structure of the info architecture
    • retain the context of previous interactions
  • Gives users control and flexibility
    • Over order of metadata use
    • Over when to navigate vs. when to search
Example: Medicare Prescription Drug Plan Scam

If you have folders, you have to place the item into multiple folders:

  • Health
  • Elderly
  • Safety
  • Drugs
  • Fraud

Alternative: assign stickers to the item: Medicare Prescription Drug Plan Scam

Assign categories to the item, rather than put the item into categories

  • Health
  • Elderly
  • Drugs
  • Safety
  • Physicians
  • Scams

Faceted Navigation
  • User can start with any category, and see the results grouped by the other categories.
  • Example:
    • Start with Health
      • See results grouped by subcategories of Health, such as Drugs, Nutrition
    • Alternatively, user can group results by other categories:
      • Click on Financial, see Insurance, Payments, etc
      • Click on Teens, see results relevant to teens
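
A small sketch of this grouping, assuming each document is tagged with subcategories under one or more top-level categories. The documents and function names are hypothetical; the hit counts are the kind of preview shown next to each category.

```python
from collections import defaultdict

# Hypothetical documents tagged under several top-level categories
docs = {
    "Medicare drug plan scam alert": {"Health": ["Drugs"], "Financial": ["Insurance"]},
    "Healthy school lunches": {"Health": ["Nutrition"], "Teens": ["School"]},
    "Paying for college": {"Financial": ["Payments"], "Teens": ["School"]},
}

def group_by(items, start_category, group_category):
    """Among items under start_category, group them by group_category labels."""
    groups = defaultdict(list)
    for name, categories in sorted(items.items()):
        if start_category in categories:
            for label in sorted(categories.get(group_category, [])):
                groups[label].append(name)
    return dict(groups)

def preview_counts(groups):
    """Number of hits per label, for display next to each category link."""
    return {label: len(names) for label, names in groups.items()}

by_health = group_by(docs, "Health", "Health")        # start with Health
by_financial = group_by(docs, "Health", "Financial")  # regroup the same results
print(by_health)
print(preview_counts(by_financial))
```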

Best Practices for Search

Thank you!

Marti Hearst

Web Manager University

November 10, 2009

Ease of Use

REDUCE STEPS