
COMPSCI 101 S1 2014 Principles of Programming



Presentation Transcript


  1. COMPSCI 101 S1 2014 Principles of Programming: 33 Web programming

  2. Learning outcomes • At the end of this lecture, students should be able to: • use Python libraries to access and process data from the Web • Examples and Exercises: • Example 1: Opening a URL • Case Study 1: Word Count • Case Study 2: Downloading Files • Case Study 3: Working on the headers COMPSCI101

  3. Internet: A collection of networks • The Internet is a network of networks. • If you put a device in your home so that your computers can talk to one another, you have a network. • A wireless base station, or an Ethernet router, perhaps. • You can probably reach printers on your network, or copy files between computers. • If you now connect your network (through an Internet Service Provider (ISP)) to the global Internet, your network becomes yet another part of the whole Internet.

  4. The World Wide Web • Tim Berners-Lee wanted a way to create readable documents that could reference material on the Internet in a hypertext format. • The Web is a set of agreements, started by Tim Berners-Lee: • On how to refer to everything on the Internet: the URL (Uniform Resource Locator) • On how to transfer documents between computers: HTTP (HyperText Transfer Protocol) • On how those documents will be formatted, including how they refer to things all over the Internet: HTML (HyperText Markup Language)

  5. HyperText Transfer Protocol (HTTP) • HTTP defines a very simple protocol for how to exchange information between computers. • It defines the pieces of the communication. • What resource do you want? • Where is it? • Okay, here’s the type of thing it is (JPEG, HTML, whatever), and here it is. • It is a set of rules to allow browsers to retrieve web documents from servers over the Internet.

  6. Uniform Resource Locators (URL) • URLs allow us to reference any material anywhere on the Internet. • A URL is the address used for any web resource. • URLs have four parts: • The protocol to use to reach this resource: http • The domain name of the host computer where the resource is: www.cs.auckland.ac.nz • The path on the computer to the resource: courses/compsci101s1c/ • The name of the resource (filename): en.html • Example: http://www.cs.auckland.ac.nz/en.html (protocol: http, domain name: www.cs.auckland.ac.nz, filename: en.html)
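The four parts listed above can be pulled out of a URL with the standard urllib.parse module. This is a minimal sketch; the function name url_parts and the dictionary keys are chosen here for illustration and do not come from the slides.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the four parts named on the slide.

    url_parts is a hypothetical helper written for this example.
    """
    parts = urlsplit(url)
    # The path component holds both the directory and the resource name.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "path": directory,
        "filename": filename,
    }


print(url_parts("http://www.cs.auckland.ac.nz/en.html"))
```

For the example URL on the slide, the directory part is just "/" because en.html sits at the top level of the site.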

  7. Terms • Web Site • A collection of Web pages related to a single topic or theme, normally designed and maintained by a single individual or organization • Web Page • A hypermedia document designed for the WWW • Web Browser • Software used to access information on the World Wide Web • Sends requests to a web server • Acts as the client (Internet Explorer, Firefox, Safari, …) • Knows how to interpret HTML and display it graphically • Web Server • Software that makes local files available through the web • Fulfils requests from a web browser • Acts as the server

  8. Accessing a web page • Client (Web Browser) runs on the local machine • User requests a web page [Diagram: Browser, web page requested]

  9. Accessing a web page • Web server runs on the destination machine • Request sent to destination domain • Web server accepts the request and finds the web page [Diagram: Browser, web page requested, Web Server]

  10. Accessing a web page • Web page is sent from the server to the client • Client (web browser) displays the page [Diagram: Browser, web page requested, Web Server]
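The request and response cycle on the three slides above can be tried on a single machine. The sketch below is not from the lecture: it starts a tiny web server with the standard http.server module, then plays the role of the browser with urllib.request. The names PAGE, PageHandler and fetch_from_local_server are made up for this example.

```python
import http.server
import threading
import urllib.request

# The "web page" that our toy server hands out (hypothetical content).
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class PageHandler(http.server.BaseHTTPRequestHandler):
    """Web server side: accepts a GET request and sends the page back."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet: no request log on stderr


def fetch_from_local_server():
    # Port 0 asks the operating system for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), PageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        # Client side: skip any system proxy, since the server is local.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as con:
            return con.status, con.read()
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_from_local_server()
print(status, len(body))
```

A real browser would go on to interpret the returned HTML and display it; here the client only reports the status code and the number of bytes received.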

  11. Using urllib in Python • Python has modules that allow you to use these protocols. • In Python, we can read any URL as if it were a file. • The urllib.request module defines functions and classes which help in opening URLs (mostly HTTP). • Add an import statement to your .py file:

          import urllib.request

  12. Example 1: Opening a URL and reading it • The urlopen() function: • Opens the URL url, which can be either a string or a Request object. • Returns a file-like object that allows you to read the identified resource

          def viewpage(url):
              con = urllib.request.urlopen(url)
              contents = con.read()
              print(len(contents))

          viewpage("http://www.cs.auckland.ac.nz/courses/compsci101s1c")

      Output: 23488

  13. The info() and geturl() functions • The info() function • returns the meta-information of the page, such as headers • The geturl() function • returns the URL of the resource retrieved (the final URL, after any redirects)

          print(con.info())
          print(con.geturl())

      Output:
          Server: Apache
          …
          Content-Type: text/html; charset=UTF-8
          Content-Length: 23488
          Accept-Ranges: bytes
          Date: Mon, 26 May 2014 00:35:46 GMT
          …
          https://www.cs.auckland.ac.nz/courses/compsci101s1c/
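Putting urlopen(), read(), info() and geturl() together might look like the sketch below. The function name describe_page is an assumption made for this example. The with statement closes the connection once the block finishes, which the slide code leaves out.

```python
import urllib.request


def describe_page(url):
    """Return (size in bytes, final URL, headers) for a URL.

    describe_page is a hypothetical helper, not part of the lecture code.
    """
    with urllib.request.urlopen(url) as con:
        contents = con.read()
        return len(contents), con.geturl(), con.info()


# Example use (needs network access, so it is left commented out):
# size, final_url, headers = describe_page(
#     "http://www.cs.auckland.ac.nz/courses/compsci101s1c")
# print(size)
# print(final_url)
# print(headers)
```

In recent Python versions the same values are also available as the url and headers attributes of the response object.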

  14. Encoding • Note that the read() method of the object returned by urlopen returns a bytes object. This is because there is no way for urlopen to automatically determine the encoding of the byte stream it receives from the HTTP server. • Use 'utf-8' to decode the bytes object into a string.

          viewpage("https://www.cs.auckland.ac.nz/courses/compsci101s1c/lectures/words.txt")

      Byte format: b'The woods are lovely dark and deep\r\nBut …

          print(con.read().decode('utf-8'))

      Output: The woods are lovely dark and deep …
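The difference between the bytes object and the decoded string can be seen without any network access. The sketch below also shows one possible way to pick the encoding: ask the response headers for a declared charset and fall back to 'utf-8' when none is given. The names decode_body and read_text are assumptions made for this example.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Turn the bytes from read() into a str, defaulting to utf-8."""
    return raw.decode(charset or "utf-8")


def read_text(url):
    """Read a URL and decode it using the charset named in its headers.

    Falls back to utf-8 when the headers do not declare a charset.
    """
    with urllib.request.urlopen(url) as con:
        charset = con.info().get_content_charset()
        return decode_body(con.read(), charset)


raw = b"The woods are lovely dark and deep\r\nBut"
print(raw)               # bytes: the line break shows up as \r\n
print(decode_body(raw))  # str: the line break is printed as a real line break
```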

  15. Case Study 1: Word Count Revisit • Task: • Complete the following program which reads a web page, counts the frequency of each word in the page using a dictionary, and prints the dictionary

          url = "https://www.cs.auckland.ac.nz/courses/compsci101s1c/lectures/words.txt"
          con = urllib.request.urlopen(url)
          contents = con.read().decode('utf-8')
          ...

      Output:
          {'keep': 1, 'promises': 1, 'And': 2, 'sleep': 2, 'But': 1, 'before': 2, 'have': 1, 'to': 3, 'The': 1, 'and': 1, 'dark': 1, 'I': 3, 'miles': 2, 'go': 2, 'deep': 1, 'are': 1, 'lovely': 1, 'woods': 1}

  16. Case Study 1: Word Count Revisit • Algorithm:
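The algorithm itself did not survive in this transcript. One possible way to do the counting step is sketched below: split the decoded text on whitespace and keep a running count per word in a dictionary. The function name count_words is an assumption, and the sample text is reconstructed from the expected output on the previous slide rather than read from words.txt.

```python
def count_words(text):
    """Count how often each whitespace-separated word occurs in text."""
    counts = {}
    for word in text.split():
        # get() returns 0 the first time a word is seen.
        counts[word] = counts.get(word, 0) + 1
    return counts


# Sample text assumed to match words.txt (reconstructed, not downloaded).
sample = ("The woods are lovely dark and deep\r\n"
          "But I have promises to keep\r\n"
          "And miles to go before I sleep\r\n"
          "And miles to go before I sleep")
print(count_words(sample))
```

In the full program, the contents string obtained from con.read().decode('utf-8') would be passed to count_words in place of the sample. Words are counted case-sensitively, which is why 'And' and 'and' appear separately in the expected output.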

  17. Case Study 2: Downloading Files • Task: • Complete the get_files() function which takes a url and a list of filenames as parameters and downloads the list of files into your current working directory

          file_list = ["words.txt", "sample.txt"]
          url = "http://www.cs.auckland.ac.nz/courses/compsci101s1c/lectures/"
          get_files(url, file_list)

  18. Case Study 2: Downloading Files • Algorithm
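The algorithm is missing from this transcript, so the following is only one possible sketch of get_files(). It assumes that url ends with a slash, so that each filename can simply be appended to it, and it writes the raw bytes unchanged so that non-text files are also copied correctly.

```python
import urllib.request


def get_files(url, file_list):
    """Download each file in file_list from url into the current directory.

    Assumes url ends with "/" so that url + filename is a valid address.
    """
    for filename in file_list:
        with urllib.request.urlopen(url + filename) as con:
            data = con.read()
        # "wb" writes the bytes exactly as received, without decoding.
        with open(filename, "wb") as out:
            out.write(data)


# Example use from the slide (needs network access, so it is commented out):
# file_list = ["words.txt", "sample.txt"]
# url = "http://www.cs.auckland.ac.nz/courses/compsci101s1c/lectures/"
# get_files(url, file_list)
```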

  19. Case Study 3: Working on the Headers • Task: • Complete the get_headers() function which reads the headers (string) of a web page, and returns a dictionary containing all headers

          url = "https://www.cs.auckland.ac.nz/courses/compsci101s1c/"
          con = urllib.request.urlopen(url)
          print(get_headers(con.info()))

      Output:
          {'Strict-Transport-Security': 'max-age=31536000', 'Age': '0', 'Server': 'Apache', 'Vary': 'Accept-Encoding', 'X-Webroute-Cache': 'MISS', 'Date': 'Wed, 28 May 2014 00:45:48 GMT', …}

  20. Case Study 3: Working on the Headers • Algorithm
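The algorithm is not preserved in this transcript. A possible sketch of get_headers() is shown below. It assumes the headers arrive as text with one "Name: value" pair per line, which is what str(con.info()) produces; the sample header text is taken from the output shown on slide 13.

```python
def get_headers(info):
    """Build a dictionary of header names and values from header text.

    info may be a string or the object returned by con.info(); it is
    converted to a string first. Lines without a colon are skipped, and a
    repeated header name keeps only its last value.
    """
    headers = {}
    for line in str(info).splitlines():
        # partition splits at the first colon only, so values that contain
        # colons (such as the time in a Date header) stay intact.
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip()] = value.strip()
    return headers


sample = ("Server: Apache\n"
          "Content-Type: text/html; charset=UTF-8\n"
          "Content-Length: 23488\n"
          "Date: Mon, 26 May 2014 00:35:46 GMT\n")
print(get_headers(sample))
```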
