Web Page Retrieval: Network Programming Basics

Practical methods for retrieving web pages over HTTP in network programming, with a client-side focus and an emphasis on the productivity of pre-built modules. Covers a raw-socket GET request, the simpler urllib2 approach, and URL encoding for GET and POST submissions.


Presentation Transcript


  1. Network Programming, Kansas State University at Salina: Retrieving Web Pages (HTTP), Topic 3, Chapter 6

  2. First, some comments
     • Switch to application protocols
     • Client-side focus
     • Pre-built modules
     • A natural OO thing: a matter of productivity
     • Argh! Someone else's code
     • Lots of choices; the principles are language-independent
     • Web-related network programming
     • Chapter 6: retrieving web pages (easy)
     • Chapter 7: parsing HTML (hard)
     • Chapter 8: XML and XML-RPC (interesting)

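  3. HTTP Basics
     • Stateless protocol: each request is handled independently, and in HTTP/1.0 each request uses its own TCP connection, which the server closes after sending the response
     • Basic GET:

       import socket

       s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
       s.connect(('www.sal.ksu.edu', 80))
       # Header lines end in CRLF; an empty line ends the request.
       request = ("GET /faculty/tim/index.html HTTP/1.0\r\n"
                  "Host: www.sal.ksu.edu\r\n"
                  "From: tim@sal.ksu.edu\r\n"
                  "User-Agent: Python\r\n"
                  "\r\n")
       s.sendall(request)
       # The saved file includes the status line and response headers.
       fp = open("index.html", "wb")
       while 1:
           data = s.recv(1024)
           if not data:        # empty string: server closed the connection
               break
           fp.write(data)
       s.close()
       fp.close()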

  4. Now, for the easy way:

       import sys, urllib2

       page = "http://www.sal.ksu.edu/faculty/tim/"
       req = urllib2.Request(page)
       fd = urllib2.urlopen(req)     # urllib2 handles the socket and headers
       while 1:
           data = fd.read(1024)
           if not len(data):         # end of the response body
               break
           sys.stdout.write(data)
       fd.close()

  5. Submitting with GET

       >>> import urllib
       >>> encoding = urllib.urlencode([('activity', 'water ski'),
       ...                              ('lake', 'Milford'), ('code', 52)])
       >>> print encoding
       activity=water+ski&lake=Milford&code=52
       >>> url = "http://www.example.com" + '?' + encoding
       >>> print url
       http://www.example.com?activity=water+ski&lake=Milford&code=52

  6. Submitting with POST

       >>> import urllib, urllib2
       >>> encoding = urllib.urlencode([('activity', 'water ski'),
       ...                              ('lake', 'Milford'), ('code', 52)])
       >>> print encoding
       activity=water+ski&lake=Milford&code=52
       >>> # POST: the encoded data goes in the request body, not the URL
       >>> req = urllib2.Request("http://www.example.com", encoding)
       >>> fd = urllib2.urlopen(req)
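
Slide 3 builds the HTTP request by hand and saves everything the server sends back, status line and headers included. The following is a minimal Python 3 sketch of the same exchange, not code from the slides: the helper names `build_get_request`, `parse_response`, and `fetch_raw` are hypothetical. It sends CRLF-terminated header lines plus a `Host` header, then splits the reply at the first blank line into status code, headers, and body. Header parsing is deliberately simplified (no folded lines, and a repeated header overwrites the earlier one).

```python
import socket

# Hypothetical helpers (not from the slides): build a raw HTTP/1.0 GET
# request and split a raw response into status code, headers, and body.


def build_get_request(host, path, extra_headers=None):
    """Return the bytes of an HTTP/1.0 GET request for path on host."""
    lines = ["GET %s HTTP/1.0" % path, "Host: %s" % host]
    for name, value in (extra_headers or {}).items():
        lines.append("%s: %s" % (name, value))
    # Every line ends in CRLF; an empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split raw response bytes into (status_code, headers dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.0 200 OK"; the code is field 2.
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # later repeats win
    return code, headers, body


def fetch_raw(host, path, port=80):
    """Send the GET over a TCP socket and parse the reply (needs network)."""
    with socket.create_connection((host, port)) as s:
        s.sendall(build_get_request(host, path, {"User-Agent": "Python"}))
        chunks = []
        while True:
            data = s.recv(1024)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))
```

Only `fetch_raw` touches the network; the other two functions can be checked offline against hand-written bytes.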
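
The slides use Python 2; in Python 3, `urllib2` became `urllib.request`. A sketch of slide 4's "easy way" under that assumption, with hypothetical helper names `copy_url` and `fetch_bytes`: the chunked read loop matches the slide, and the `with` block closes the response when the loop ends.

```python
import io
import urllib.request

# Hypothetical helpers (not from the slides): a Python 3 version of the
# read-in-chunks loop from slide 4.


def copy_url(url, out, chunk_size=1024):
    """Read url in chunk_size pieces, write them to the binary stream out,
    and return the number of bytes copied."""
    total = 0
    with urllib.request.urlopen(url) as fd:
        while True:
            data = fd.read(chunk_size)
            if not data:  # b"" marks the end of the response body
                break
            out.write(data)
            total += len(data)
    return total


def fetch_bytes(url):
    """Convenience wrapper: return the whole body of url as bytes."""
    buf = io.BytesIO()
    copy_url(url, buf)
    return buf.getvalue()
```

To mirror the slide, pass `sys.stdout.buffer` as `out`, since Python 3's `sys.stdout` accepts text rather than bytes. Because `urllib.request` also handles `data:` URLs (Python 3.4 and later), `fetch_bytes("data:text/plain,hello")` is a quick way to try it without a network.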
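
In Python 3, `urlencode` lives in `urllib.parse` rather than `urllib`. A sketch of slide 5's GET submission, assuming the hypothetical helpers `build_get_url` and `decode_query`. The second helper shows that `parse_qsl` reverses the encoding, though every value comes back as a string (`52` becomes `'52'`).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical helpers (not from the slides) for GET-style form submission.


def build_get_url(base, params):
    """Append URL-encoded params to base as a query string."""
    query = urlencode(params)  # spaces become "+", non-str values use str()
    # Use "&" if base already carries a query string, otherwise "?"
    separator = "&" if "?" in base else "?"
    return base + separator + query


def decode_query(url):
    """Recover the (name, value) pairs from url's query string as str."""
    return parse_qsl(urlsplit(url).query)
```

With the slide's parameters, `build_get_url("http://www.example.com", [("activity", "water ski"), ("lake", "Milford"), ("code", 52)])` produces the same URL printed on slide 5.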
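
A Python 3 sketch of slide 6's POST, assuming the hypothetical helper names `build_post_request` and `post_form`. The encoded form goes in the request body, which Python 3 requires as bytes; supplying `data` is what makes `urllib.request` send a POST instead of a GET.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical helpers (not from the slides) for POST-style submission.


def build_post_request(url, params):
    """Return a Request that will POST params, form-encoded, to url."""
    body = urlencode(params).encode("ascii")  # request body must be bytes
    req = Request(url, data=body)             # data present -> method POST
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def post_form(url, params):
    """Send the POST and return the response body as bytes (needs network)."""
    with urlopen(build_post_request(url, params)) as fd:
        return fd.read()
```

Only `post_form` touches the network; the request from `build_post_request` can be inspected offline, for example `req.get_method()` returns `'POST'` and `req.data` holds the same encoded string shown on the slide.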
