
Web Intelligence

By Otto Borchert, April 28, 2003.


Presentation Transcript


  1. Web Intelligence By Otto Borchert April 28, 2003

  2. Background • Application Layer / HTTP • Agents • Present - Google / PageRank • Future - Semantic Web / OWL

  3. Hypertext Transfer Protocol (HTTP) • Application-level protocol (World Wide Web) • Runs over TCP, normally port 80 • Information retrieved using a URL (Uniform Resource Locator) protocol://host:port/path • Typical HTTP message format • START_LINE<CRLF> • MESSAGE_HEADER<CRLF> • <CRLF> • MESSAGE_BODY<CRLF>
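The message layout on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP parser: the function name `split_http_message` is made up for this example, and it assumes the whole message is already in memory as bytes.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The blank line (<CRLF><CRLF>) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each MESSAGE_HEADER line has the form "Name: value".
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.cs.ndsu.nodak.edu\r\n"
    b"\r\n"
)
start_line, headers, body = split_http_message(raw)
print(start_line)
print(headers)
```

Header names are lower-cased here because HTTP header names are case-insensitive.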

  4. Request Messages • Request method given by the client on the START_LINE • Methods include: • OPTIONS: request information about available options • GET: (one of the 2 most commonly used) retrieve the document identified in the URL • HEAD: (the other most commonly used) retrieve metainformation about the document identified in the URL (e.g., to find out how old a page is) • POST: give information to the server • PUT: store a document under the specified URL • DELETE: delete the specified URL • TRACE: loopback request message • CONNECT: for use by proxies

  5. Example request • GET http://www.cs.ndsu.nodak.edu/index.html HTTP/1.1 • Entire URL given in START_LINE • GET /index.html HTTP/1.1 Host: www.cs.ndsu.nodak.edu • Path to the page given in START_LINE, host in MESSAGE_HEADER
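The two request forms on this slide could be generated as follows. This is a sketch under stated assumptions: `build_get_request` and its `absolute_form` flag are names invented for this example. The Host header is sent in both cases because HTTP/1.1 requires it on every request.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, absolute_form: bool = False) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Either the entire URL or only the path goes on the START_LINE.
    target = url if absolute_form else path
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


url = "http://www.cs.ndsu.nodak.edu/index.html"
print(build_get_request(url).decode("ascii"))
print(build_get_request(url, absolute_form=True).decode("ascii"))
```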

  6. Server reply • Server replies with a Response Message • Contains the version of HTTP being used, a 3-digit code indicating whether or not the request was successful, and the reason for giving that code

  7. Codes • 1xx – Informational (Request received, continuing process) • 2xx – Success (Action successfully received, understood, and accepted) • 3xx – Redirection (further action must be taken to complete the request) • 4xx – Client Error (request contains bad syntax or cannot be fulfilled) • 5xx – Server Error (server failed to fulfill an apparently valid request)
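As a small illustration of the reply format and the code classes above, the sketch below splits a status line into its three parts and maps the first digit of the code to its class. The function names are made up for this example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Split 'HTTP/1.1 404 Not Found' into (version, code, reason)."""
    # Split at most twice so a multi-word reason phrase stays intact.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code: int) -> str:
    """Return the class name for a 3-digit status code."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


version, code, reason = parse_status_line("HTTP/1.1 301 Moved Permanently")
print(version, code, reason, "->", status_class(code))
```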

  8. Example Replies • HTTP/1.1 200 OK • Request succeeded; the server returns the page, which the browser displays • HTTP/1.1 404 Not Found • The usual not found error • HTTP/1.1 301 Moved Permanently • The page has moved; the reply includes a MESSAGE_HEADER (Location), as in a request, telling where the page has moved to

  9. HTTP extras • Version 1.0 used one TCP connection for each request; version 1.1 allows persistent connections • HTTP was designed with web caching in mind: one can check the date a page was last updated and keep the newest versions of frequently accessed pages on a local machine
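The caching idea can be sketched as a date comparison. This is only an illustration: `is_stale` is a name invented here, the dates are made up, and a real cache would normally ask the server with a conditional request rather than compare dates itself.

```python
from email.utils import parsedate_to_datetime


def is_stale(cached_last_modified: str, server_last_modified: str) -> bool:
    """Return True if the server copy is newer than the cached copy.

    Both arguments are HTTP date strings, as found in a Last-Modified header.
    """
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(server_last_modified)
    return current > cached


# Hypothetical dates for illustration only.
print(is_stale("Mon, 28 Apr 2003 12:00:00 GMT", "Tue, 29 Apr 2003 08:00:00 GMT"))
print(is_stale("Mon, 28 Apr 2003 12:00:00 GMT", "Mon, 28 Apr 2003 12:00:00 GMT"))
```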

  10. Is the web intelligent? • Intelligence is a poorly defined word anyway. For example, would you consider these intelligent? • Document analysis systems for cataloging and summarizing Web pages • Profiling systems for placing selective Web advertising • Data mining and analysis • Tools for searching databases supported by Web browsers • Translation tools that convert to and from human languages • Statistical software for network caching, routing, and tracking • Knowledge-based systems for automated e-mail reading • Smart agents for Internet-based product and service marketing • Video object recognition and searching

  11. Is the web intelligent? (2) • One of the most important advances in making the web intelligent is the use of agents. • These agents take many forms, including many of those listed on the previous slide

  12. What is an agent? • No standard definition • Can be: • Web Crawler • Travel Agent • Secretary • Hard to distinguish between agent and program. Agent normally performs actions based on data it finds, without much human intervention • Agents can be defined as intelligent as well • Act as the glue for many of the following ideas

  13. The Present of Web Intelligence - Google • Presently the most used search engine the Internet has to offer. • Provides a unique blend of computer hardware and software to complete millions of user searches each day • Based on a system called PageRank

  14. PageRank • Developed by Larry Page and Sergey Brin at Stanford University (Google’s founders) • Uses a system of link ranking • If there is a link from page A to page B, page B is correlated to page A • If page A is a strong page to begin with, page B becomes stronger as well
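The slide gives only the intuition (a link from a strong page A strengthens page B). One common way to turn that intuition into numbers is the iterative formulation sketched below. The damping factor of 0.85, the iteration count, the handling of pages with no outgoing links, and the tiny four-page graph are all assumptions made for this example, not details from the presentation or a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed rule: a page without links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: B is linked from both A and D.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["B"]}
for page, score in sorted(pagerank(links).items()):
    print(page, round(score, 4))
```

In this toy graph B ends up with the highest score because it has the most incoming links, and D, which nobody links to, ends up with the lowest.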

  15. Word Association • On top of PageRank, there is also a system of word matching. • Word counts (Do the words exist on the page?) • Proximity checks (Are the words close together?)
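The two checks on this slide can be sketched as follows. The tokenizer, the function names, and the sample sentence are assumptions made for illustration; the slides do not say how Google actually implements word matching.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_counts(text, query_words):
    """Word count check: how often does each query word occur on the page?"""
    tokens = tokenize(text)
    return {word: tokens.count(word) for word in query_words}


def min_distance(text, word_a, word_b):
    """Proximity check: smallest gap (in words) between the two words.

    Returns None if either word is missing from the text.
    """
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


page = "The semantic web enriches the current web with meaning"
print(word_counts(page, ["web", "meaning"]))
print(min_distance(page, "web", "meaning"))
```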

  16. Can’t you cheat PageRank? • People try every day! • Higher search ranking == More exposure • Link Farms • Sites that simply host millions of links to a web page in the hope that the target will move higher on the list. • Google’s answer: Page importance. Once link farms are discovered, they are given a negative rank, so if you have a page on a link farm, its rank will go down as well

  17. Another way to cheat • Put lots of words related to your page in your page (even if they are not visible) • Google’s answer: PageRank is primary, cheaters are given lower priority

  18. Moral Decisions • Wired article • Computer screen shows location, query pairs for random searches on Google’s engines. • One search during the late hours on the West Coast was “How to stop a friend from committing suicide” • Can’t do much about it but make sure they get the right information the next time

  19. The Future of Web Intelligence • The Semantic Web

  20. What is the Semantic Web? • As the web presently stands, it is complete nonsense to most software applications. • Two completely different statements • The ball is round • The round ball • The semantic web is a series of protocols meant to enrich the current web with meaning

  21. Series of Protocols • RDF – Resource Description Framework • OWL – Web Ontology Language (extension of RDF)

  22. Resource Description Framework • From World Wide Web Consortium webpage • RDF “defines a mechanism for describing resources that makes no assumptions about a particular application domain, nor defines (a priori) the semantics of any application domain. The definition of the mechanism should be domain neutral, yet the mechanism should be suitable for describing information about any domain”

  23. RDF – Some examples • Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila. • Abstract, conceptual Framework • Concrete syntax using XML

  24. Abstract example • Subject (Resource): http://www.w3.org/Home/Lassila • Predicate (Property): Creator • Object (literal): "Ora Lassila" • [Graphic]

  25. Concrete syntax • Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.

      <rdf:RDF>
        <rdf:Description about="http://www.w3.org/Home/Lassila">
          <s:Creator>Ora Lassila</s:Creator>
        </rdf:Description>
      </rdf:RDF>
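To show how software could recover the subject, predicate, and object from this syntax, here is a rough sketch using Python's standard XML parser. The slide's snippet omits namespace declarations, so they are added here to make the document parseable; the URI bound to the `s` prefix is a placeholder assumed for this example, and `extract_triples` is a made-up helper, not part of any RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S_NS = "http://description.org/schema/"  # assumed placeholder for the "s" prefix

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:s="{S_NS}">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get("about")
        for prop in description:
            # prop.tag is the property name in "{namespace}local" form.
            triples.append((subject, prop.tag, prop.text))
    return triples


for triple in extract_triples(DOC):
    print(triple)
```

This handles only the flat pattern shown on the slide (one description with literal-valued properties); a real application would use a dedicated RDF parser.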

  26. Web Ontology Language • What is an ontology? • “defines the terms used to describe and represent an area of knowledge” • OWL defines ontologies for use on the web • Actually an extension of RDF

  27. Ontologies • Date and Time • Countries of the World • Wines • Space Shuttle Information

  28. Some example OWL statements

      <owl:Class rdf:ID="WineGrape">
        <rdfs:subClassOf rdf:resource="&food;Grape" />
      </owl:Class>

      <owl:Class rdf:ID="WhiteWine">
        <owl:intersectionOf rdf:parseType="Collection">
          <owl:Class rdf:about="#Wine" />
          <owl:Restriction>
            <owl:onProperty rdf:resource="#hasColor" />
            <owl:hasValue rdf:resource="#White" />
          </owl:Restriction>
        </owl:intersectionOf>
      </owl:Class>
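Read informally, the first statement says every WineGrape is a Grape, and the second says a WhiteWine is anything that is both a Wine and has the color White. The sketch below restates those two rules by hand in Python to show what a program could conclude from them. It is not an OWL reasoner, and the individuals `chardonnay` and `merlot` are hypothetical examples added for illustration.

```python
# Hand-written restatement of: WineGrape rdfs:subClassOf food:Grape
SUBCLASS_OF = {"WineGrape": "Grape"}


def is_subclass_of(cls, ancestor):
    """Follow subClassOf links upward until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_white_wine(individual):
    """WhiteWine = intersection of Wine and the restriction hasColor = White."""
    is_wine = "Wine" in individual.get("types", ())
    has_white_color = individual.get("hasColor") == "White"
    return is_wine and has_white_color


# Hypothetical individuals for illustration only.
chardonnay = {"types": {"Wine"}, "hasColor": "White"}
merlot = {"types": {"Wine"}, "hasColor": "Red"}

print(is_subclass_of("WineGrape", "Grape"))
print(is_white_wine(chardonnay), is_white_wine(merlot))
```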

  29. Conclusion • Web intelligence is a broad new field for exploration • Present efforts like Google can be improved upon with more semantic information • Any questions?
