
Easy Web Page Retrieval - Network Programming at Kansas State University

Slides from the network programming course at Kansas State University at Salina on retrieving web pages. They cover HTTP basics with a client-side focus, the switch to application protocols, and the use of pre-built modules, including the trade-offs of relying on someone else's code. The web-related network programming principles are language independent; the examples are in Python.

jackbrown

Presentation Transcript


  1. Network Programming
     Kansas State University at Salina
     Retrieving Web Pages (HTTP), Topic 3, Chapter 6

  2. First, some comments
     • Switch to application protocols
     • Client-side focus
     • Pre-built modules
     • A natural OO thing – a matter of productivity
     • Argh! Someone else's code
     • Lots of choices, language-independent principles
     • Web-related network programming
     • Chapter 6 – Retrieving web pages – easy
     • Chapter 7 – Parsing HTML – hard
     • Chapter 8 – XML and XML-RPC – interesting

  3. HTTP Basics
     • Stateless, connectionless protocol
     • Basic GET …

         import socket

         s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
         s.connect(('www.sal.ksu.edu', 80))
         # Each HTTP header line ends in CRLF; a blank line ends the request
         request = ("GET /faculty/tim/index.html HTTP/1.0\r\n"
                    "From: tim@sal.ksu.edu\r\n"
                    "User-Agent: Python\r\n"
                    "\r\n")
         s.sendall(request)
         # Note: the saved file holds the response headers as well as the page
         fp = open("index.html", "w")
         while 1:
             data = s.recv(1024)
             if not len(data):
                 break
             fp.write(data)
         s.close()
         fp.close()
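The slide's code targets Python 2, where a socket accepts a plain string. As a rough Python 3 sketch of the same idea, the request can be built as bytes and the raw response split into status line, headers, and body. The helper names `build_get_request`, `split_response`, and `fetch` are hypothetical, and the `Host` header is an addition that is not on the slide (HTTP/1.0 does not require it, but servers hosting several sites usually expect it).

```python
import socket


def build_get_request(host, path, user_agent="Python"):
    """Return the bytes of a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),  # assumption: not on the slide
        "User-Agent: {}".format(user_agent),
        "",  # the two empty strings produce the blank line
        "",  # that marks the end of the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (status_line, header_block, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, _, headers = head.partition(b"\r\n")
    return status_line.decode("iso-8859-1"), headers.decode("iso-8859-1"), body


def fetch(host, path, port=80):
    """Send the request over a TCP socket and return the whole raw response."""
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = s.recv(1024)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `split_response(fetch('www.sal.ksu.edu', '/faculty/tim/index.html'))` would then give the page body separately from the headers, assuming the host from the slide is still reachable.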

  4. Now, for the easy way …

         import sys, urllib2

         page = "http://www.sal.ksu.edu/faculty/tim/"
         req = urllib2.Request(page)
         fd = urllib2.urlopen(req)
         while 1:
             data = fd.read(1024)
             if not len(data):
                 break
             sys.stdout.write(data)
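In Python 3 the `urllib2` module no longer exists; its functionality lives in `urllib.request`. A minimal sketch of the same chunked read, assuming Python 3 and using a hypothetical helper name `fetch_url`, might look like this:

```python
import urllib.request


def fetch_url(url, chunk_size=1024):
    """Read a URL in fixed-size chunks and return the complete body as bytes."""
    chunks = []
    with urllib.request.urlopen(url) as fd:
        while True:
            data = fd.read(chunk_size)
            if not data:  # an empty read means the body is exhausted
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `fetch_url("http://www.sal.ksu.edu/faculty/tim/")` would return the page from the slide as bytes, provided that address still resolves. The result needs decoding (for instance with `.decode("utf-8")`) before it can be printed as text.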

  5. Submitting with GET

         >>> import urllib
         >>> encoding = urllib.urlencode([('activity', 'water ski'),
         ...                              ('lake', 'Milford'), ('code', 52)])
         >>> print encoding
         activity=water+ski&lake=Milford&code=52
         >>> url = "http://www.example.com" + '?' + encoding
         >>> print url
         http://www.example.com?activity=water+ski&lake=Milford&code=52
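Python 3 moves `urlencode` into `urllib.parse`. The following sketch, assuming Python 3, rebuilds the same URL and then decodes the query string again with `parse_qsl` to show that the encoding is reversible. The helper name `build_get_url` is hypothetical.

```python
from urllib.parse import urlencode, parse_qsl


def build_get_url(base, fields):
    """Append URL-encoded form fields to base as a query string."""
    return base + "?" + urlencode(fields)


fields = [("activity", "water ski"), ("lake", "Milford"), ("code", 52)]
url = build_get_url("http://www.example.com", fields)
print(url)

# parse_qsl reverses the encoding; every value comes back as a string
decoded = parse_qsl(url.split("?", 1)[1])
print(decoded)
```

Note that the integer `52` comes back as the string `'52'`, since a query string carries no type information.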

  6. Submitting with POST

         >>> encoding = urllib.urlencode([('activity', 'water ski'),
         ...                              ('lake', 'Milford'), ('code', 52)])
         >>> print encoding
         activity=water+ski&lake=Milford&code=52
         >>> import urllib2
         >>> # Passing data along with the URL makes this a POST
         >>> req = urllib2.Request("http://www.example.com", encoding)
         >>> fd = urllib2.urlopen(req)
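A comparable Python 3 sketch uses `urllib.request.Request` with a `data` argument, which must be bytes rather than a string. The helper name `build_post_request` is hypothetical, and the explicit `Content-Type` header is an addition for clarity that does not appear on the slide. The example only constructs the request, so it runs without network access.

```python
import urllib.request
from urllib.parse import urlencode


def build_post_request(url, fields):
    """Return a Request whose URL-encoded body makes urlopen() send a POST."""
    body = urlencode(fields).encode("ascii")  # Request data must be bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


fields = [("activity", "water ski"), ("lake", "Milford"), ("code", 52)]
req = build_post_request("http://www.example.com", fields)
print(req.get_method())  # the method is chosen by whether data is present
print(req.data)
```

Passing `req` to `urllib.request.urlopen` would submit the form. The same fields end up in the request body instead of the URL, which is the only difference from the GET case.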
