Efficient Online Information Searching

Efficient Online Information Searching. 251111 Internet And Online Community Week 3. Review. Computer Technologies & The Modern World Evolution of Communication & Technology Telecommunication Input Devices Output Devices Future Technology Context Aware Computing. This Week.

mirnag

Presentation Transcript


  1. Efficient Online Information Searching • 251111 Internet And Online Community • Week 3

  2. Review • Computer Technologies & The Modern World • Evolution of Communication & Technology • Telecommunication • Input Devices • Output Devices • Future Technology • Context Aware Computing

  3. This Week • Efficient Online Information Searching • How do you search for information? • Search Engines • Search Engine Optimisation (SEO)

  4. A Search Engine

  5. Search • [Diagram of the search process; labels: Documents, Query, Indexing, Matching, Relevance/Feedback, Document Representations, Retrieved Documents]

  6. Measuring Search Efficiency • Recall • (a.k.a. Sensitivity) • Fraction of relevant instances retrieved • Precision • (a.k.a. Positive Predictive Value) • Fraction of retrieved instances that are relevant

  7. Recall & Precision (Walber)

  8. Returned Results • The “Blue” area represents all the relevant articles • The “Orange” area represents other articles that could be returned • [Diagram: overlapping areas with regions labelled A, B and C]

  9. Recall • A = Relevant Returned Articles • C = Relevant Unreturned Articles • Recall = A / (A + C)

  10. Precision • A = Relevant Returned Articles • B = Irrelevant Returned Articles • Precision = A / (A + B)

  11. Recall & Precision • Suppose there are 200 relevant articles • A search engine returns 40 articles, of which 25 are relevant… • What is the recall? • What is the precision?
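
The question on slide 11 can be worked through with the definitions from slides 6, 9 and 10: recall is the fraction of relevant articles that were returned, and precision is the fraction of returned articles that are relevant. A minimal sketch in Python; the function and variable names are illustrative and do not come from the slides.

```python
def recall(relevant_returned, total_relevant):
    # Fraction of all relevant articles that the search actually returned.
    return relevant_returned / total_relevant


def precision(relevant_returned, total_returned):
    # Fraction of the returned articles that are relevant.
    return relevant_returned / total_returned


# Numbers from slide 11: 200 relevant articles exist,
# 40 articles are returned, 25 of those are relevant.
r = recall(25, 200)
p = precision(25, 40)
print(f"recall = {r:.3f}, precision = {p:.3f}")  # recall = 0.125, precision = 0.625
```

So this search finds only one in eight of the relevant articles, although most of what it does return is relevant.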

  12. Google Search Refinement • Quotes! “…” • Force Google to look for something • Star Wars I vs Star Wars “I” • Jobs in central LA vs Jobs in central “LA” • - • Stop Google from looking for something • Dolphins -football • ~ • “Is similar to”: look for synonyms • ~inexpensive

  13. Google Search Refinement • OR or “|” • Chiang Mai | Chiangmai • .. • Specify a range • * • Replaces one or more words • google * my life

  14. Google Search Refinement • allintitle: • makes sure the search appears in the title • allintitle: ken cosh • cache: • returns cached copy of page • link: • returns pages that link to the specified page • site: • restrict results to a particular website
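
The operators on slides 12 to 14 are simply part of the query text, so they can be combined and turned into a search URL. A small sketch in Python, assuming the familiar `https://www.google.com/search?q=...` URL form; the helper name `google_search_url` is made up for illustration, and the example queries are the ones from the slides.

```python
from urllib.parse import urlencode


def google_search_url(query):
    # Assumption: Google accepts the query text in the "q" parameter.
    # urlencode escapes spaces, quotes and other special characters.
    return "https://www.google.com/search?" + urlencode({"q": query})


example_queries = [
    'Star Wars "I"',            # quotes force the exact term
    "Dolphins -football",       # minus excludes a term
    "Chiang Mai OR Chiangmai",  # either spelling
    "google * my life",         # wildcard for one or more words
    "allintitle: ken cosh",     # all words must appear in the title
]

for q in example_queries:
    print(google_search_url(q))
```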

  15. Search Engine Market Share • Which Search Engine do you use? • Which is the most popular?

  16. Video Break! • How Search Works • https://www.youtube.com/watch?v=BNHR6IQJGZs

  17. Search Engines • A great source of traffic for your site. • But how do they decide which sites to display, and in which order to display them on their SERPs? • SERPs = Search Engine Results Pages • Obviously, being #1 in Google for a popular search term will bring you lots of traffic.

  18. Ranking Algorithm • We don’t know, but it takes plenty of factors into account: • Page Content • Meta tags • Age • Keyword density • Links • And the algorithm appears to evolve over time.

  19. Google’s Magic • Gone are the days when you could just say what your page is about; now it’s much more technical… • Much of Google’s magic comes from their patented “PigeonRank” algorithm • https://archive.google.com/pigeonrank/

  20. Pigeon -> PageRank • PageRank is a numeric value that represents how important a page is on the web.

  21. PageRank • Google figures that when one page links to another page, it is effectively casting a vote for the other page. • The more votes that are cast for a page, the more important the page must be. • The importance of the page that is casting the vote determines how important the vote itself is. • Google calculates a page's importance from the votes cast for it. • How important each vote is is taken into account when a page's PageRank is calculated.

  22. PageRank • PageRank is Google's way of deciding a page's importance. • It matters because it is one of the factors that determines a page's ranking in the search results. • It isn't the only factor that Google uses to rank pages, but it is an important one.

  23. Link Farms etc. • Not all links are counted by Google. For instance, they filter out links from known link farms. Some links can cause a site to be penalized by Google. They rightly figure that webmasters cannot control which sites link to their sites, but they can control which sites they link out to. For this reason, links into a site cannot harm the site, but links from a site can be harmful if they link to penalized sites. So be careful which sites you link to. If a site has PR0, it is usually a penalty, and it would be unwise to link to it.

  24. Calculating PageRank • To calculate the PageRank for a page, all of its inbound links are taken into account. These are links from within the site and links from outside the site. • PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn)) • That's the equation that calculates a page's PageRank. It's the original one that was published when PageRank was being developed, and it is probable that Google uses a variation of it but they aren't telling us what it is. It doesn't matter though, as this equation is good enough.

  25. Calculating PageRank • PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn)) • 't1 - tn' are pages linking to page A • 'C' is the number of outbound links that a page has • 'd' is a damping factor, usually set to 0.85.
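
Because each page's PageRank depends on the PageRank of the pages linking to it, the equation is usually evaluated by repeated substitution: start from a guess and reapply the formula until the values settle. The sketch below does this in Python for a made-up three-page site; the link graph, the starting value of 1.0 and the iteration count are assumptions for illustration, and this is the published equation from the slide, not whatever variation Google actually runs.

```python
def pagerank(links, d=0.85, iterations=50):
    """Apply PR(A) = (1-d) + d * sum(PR(t)/C(t)) repeatedly.

    links maps each page to the list of pages it links out to.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # t1..tn: the pages that link to this page
            inbound = [t for t in pages if page in links[t]]
            # each inbound page passes on PR(t) / C(t)
            votes = sum(pr[t] / len(links[t]) for t in inbound)
            new_pr[page] = (1 - d) + d * votes
        pr = new_pr
    return pr


# Hypothetical site: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

In this toy graph C ends up ahead of B, since C receives a share from both A and B while B only receives half of A's vote.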

  26. PageRank simplified • We can think of it in a simpler way:- • a page's PageRank = 0.15 + 0.85 * (a "share" of the PageRank of every page that links to it) • “share” = the linking page’s PageRank divided by the number of outbound links on the page. • A page "votes" an amount of PageRank onto each page that it links to. The amount of PageRank that it has to vote with is a little less than its own PageRank value (its own value * 0.85). This value is shared equally between all the pages that it links to.

  27. PageRank • From this, we could conclude that a link from a page with PR4 and 5 outbound links is worth more than a link from a page with PR8 and 100 outbound links. • The PageRank of a page that links to yours is important but the number of links on that page is also important. • The more links there are on a page, the less PageRank value your page will receive from it.
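
The comparison on slide 27 can be checked with the "share" from slide 26, taking the PR numbers at face value (that is, assuming a linear scale). A short Python sketch; the function name `vote_share` is illustrative.

```python
def vote_share(page_rank, outbound_links, d=0.85):
    # PageRank passed along one link: d * PR / number of outbound links.
    return d * page_rank / outbound_links


pr4_share = vote_share(4, 5)     # PR4 page with 5 outbound links
pr8_share = vote_share(8, 100)   # PR8 page with 100 outbound links
print(round(pr4_share, 3), round(pr8_share, 3))  # 0.68 0.068
```

Under this linear assumption the PR4 link passes on ten times as much as the PR8 link.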

  28. Or perhaps not… • If the PageRank value differences between PR1, PR2, … PR10 were equal then that conclusion would hold up, but many people believe that the values between PR1 and PR10 (the maximum) are set on a logarithmic scale, and there is very good reason for believing it. • Nobody outside Google knows for sure one way or the other, but the chances are high that the scale is logarithmic, or similar. • If so, it means that it takes a lot more additional PageRank for a page to move up to the next PageRank level than it did to move up from the previous PageRank level. • The result is that it reverses the previous conclusion, so that a link from a PR8 page that has lots of outbound links is worth more than a link from a PR4 page that has only a few outbound links.
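
To see how a logarithmic scale could reverse the conclusion, suppose each PR level corresponds to an underlying value of BASE raised to that level. The base is not known outside Google; the value 5 below is purely an assumption chosen for illustration, as are the names in this Python sketch.

```python
BASE = 5  # hypothetical: the real base of the scale, if any, is not public


def underlying_value(pr_level):
    # Assumed mapping from a displayed PR level to its underlying PageRank.
    return BASE ** pr_level


def vote_share(pr_level, outbound_links, d=0.85):
    # Share passed along one link, computed on the underlying value.
    return d * underlying_value(pr_level) / outbound_links


pr4_share = vote_share(4, 5)     # PR4 page with 5 outbound links
pr8_share = vote_share(8, 100)   # PR8 page with 100 outbound links
print(pr4_share, pr8_share)
```

With this assumed base the PR8 link is worth roughly thirty times the PR4 link, even though it is shared among 100 outbound links.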

  29. Either way… • Whichever scale Google uses, we can be sure of one thing. A link from another site increases our site's PageRank. Just remember to avoid links from link farms.

  30. SEO • Search Engine Optimisation • Has become an important job for website owners

  31. What is SEO? • Search Engine Optimisation • Making webpages more search engine friendly. • SEO should be considered from the start. • Domain Name • Site Structure • Site Design • Site Navigation • Site Topics • Headings • Subheadings • Content • Links • Usability • Accessibility

  32. Why is it important? • 24% of marketers said that >75% of their traffic comes from search engines • 60% of students use search engines to find online retailers • 55% of online purchases were made on sites found through search engines • 80% of users reach sites through search engines • 48% of websites depend on search engines for the majority of their traffic (Various sources)

  33. Why is it important? • Following Search Engine rules. • If your webpage fits the criteria for a certain search term, you’ll get top ranking. • Search Engine Optimisers • Modify webpages to fit the criteria to give a page a better chance of being selected.

  34. Design with SEO in mind • It’s tempting to build a website, and then think about SEO. • Better to design with SEO in mind

  35. Domain Name • Get a domain name that contains your keywords • But make sure it is still memorable… • www.AAA1-Chiang-Mai-Travel-Hotel-Guide-Bookings-Tourist.com • Is not a good domain name!

  36. Website Structure • Usability • It doesn’t matter how good the content is if the site is frustrating to use. • Linkability • Remember the internal linking structure, and its effect on PageRank

  37. Website Design • Flash? • NO! Search engines rely on keywords to classify pages, while Flash is mostly for entertainment. • Search engines do not index Flash files. • HTML • Yes! It’s easy and spiders have no problem indexing it. • But PHP etc. is fine so long as you use search-engine-friendly URLs & links

  38. Webpage Content • Spiders use the content to know where to categorise each page. • A page with no text (Flash site) • Where should it be put? • A page with lots of text on lots of topics • Where should it be put? There are too many competing keywords. • The amount of content is also important.

  39. Links • After content, links are the most important thing… • Some would even argue it’s the opposite way around. • PageRank • The link text is just as important as the link. • It is tempting to use an attractive graphical button for the link – but how can the spider associate keywords with the link?

  40. How many keywords? • Keyword Frequency • The number of times a keyword, or phrase, appears within a page. • Keyword Density • The ratio of keyword occurrences in the page to the total number of indexable words • Perhaps 1-3%
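
Frequency and density as defined on slide 40 are easy to compute for a single-word keyword. A minimal Python sketch, assuming every word on the page counts as indexable (no stop-word removal or stemming, which real search engines may apply); the function name `keyword_stats` is illustrative.

```python
import re


def keyword_stats(text, keyword):
    # Split the page text into lowercase words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Frequency: how many times the keyword appears.
    frequency = words.count(keyword.lower())
    # Density: keyword occurrences divided by total words.
    density = frequency / len(words) if words else 0.0
    return frequency, density


text = "We sell the most delicious dog biscuits in the world!"
freq, density = keyword_stats(text, "biscuits")
print(freq, f"{density:.0%}")  # 1 10%
```

A density of 10% is only reached here because the "page" is a single sentence; on a full page the same keyword would need to appear far less often to land in the 1-3% range.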

  41. Keyword Density • Is it more complicated than that? • Different search engines have different preferences • Different search engines will also calculate a different density for your page: • Stop words? • Word Stemming? • Keywords in particular HTML tags

  42. Keyword Prominence • As well as frequency and density, prominence is also a factor • Words appearing near the beginning of the page, paragraph, sentence. • Certain HTML tags (title)

  43. Keyword Proximity • How close keywords are together could also be a factor. • Consider a search for ‘dog biscuits’ • “We sell delicious biscuits for all breeds of dogs!” • “We sell the most delicious dog biscuits in the world!”
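
One simple way to put a number on proximity is the smallest distance, in words, between the two keywords. The Python sketch below applies this to the two sentences from slide 43. It is only an illustration: the distance measure and the very naive stemming (dropping a trailing "s" so that "dogs" matches "dog") are assumptions, not a description of how any search engine scores proximity.

```python
import re


def stem(word):
    # Naive stemming assumption: strip a trailing "s" from longer words.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def proximity(text, first, second):
    # Smallest distance in words between the two keywords, or None if missing.
    words = [stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    first_pos = [i for i, w in enumerate(words) if w == stem(first.lower())]
    second_pos = [i for i, w in enumerate(words) if w == stem(second.lower())]
    if not first_pos or not second_pos:
        return None
    return min(abs(i - j) for i in first_pos for j in second_pos)


sentence_1 = "We sell delicious biscuits for all breeds of dogs!"
sentence_2 = "We sell the most delicious dog biscuits in the world!"
print(proximity(sentence_1, "dog", "biscuits"))  # 5
print(proximity(sentence_2, "dog", "biscuits"))  # 1
```

The second sentence keeps the two keywords adjacent, which is why it would be expected to match a search for ‘dog biscuits’ more closely.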
