
URLs, InetAddresses, and URLConnections



Presentation Transcript


  1. URLs, InetAddresses, and URLConnections High Level Network Programming Elliotte Rusty Harold elharo@metalab.unc.edu http://metalab.unc.edu/javafaq/slides/

  2. We will learn how Java handles • Internet Addresses • URLs • CGI • URLConnection • Content and Protocol handlers

  3. I assume you • Understand basic Java syntax and I/O • Have a user’s view of the Internet • Have no prior network programming experience

  4. Applet Network Security Restrictions • Applets may: • send data to the code base • receive data from the code base • Applets may not: • send data to hosts other than the code base • receive data from hosts other than the code base

  5. Some Background • Hosts • Internet Addresses • Ports • Protocols

  6. Hosts • Devices connected to the Internet are called hosts • Most hosts are computers, but hosts also include routers, printers, fax machines, soda machines, bat houses, etc.

  7. Internet addresses • Every host on the Internet is identified by a unique, four-byte Internet Protocol (IP) address. • This is written in dotted quad format like 199.1.32.90 where each byte is an unsigned integer between 0 and 255. • There are about four billion unique IP addresses, but they aren’t very efficiently allocated
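Java's byte type is signed, so printing an address in dotted quad format takes a mask on each byte. A minimal sketch of that conversion follows; the class and method names (DottedQuad, format) are invented for illustration and are not part of java.net.

```java
class DottedQuad {
    // Renders four raw address bytes as a dotted quad string.
    static String format(byte[] address) {
        if (address.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + address.length);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < address.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            // Java bytes range from -128 to 127; masking with 0xFF recovers 0 to 255.
            sb.append(address[i] & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 199, 1, 32, 90};
        System.out.println(format(raw)); // prints 199.1.32.90
    }
}
```

Without the mask, the first byte of 199.1.32.90 would print as -57.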

  8. Domain Name System (DNS) • Numeric addresses are mapped to names like "www.blackstar.com" or "star.blackstar.com" by DNS. • Each site runs domain name server software that translates names to IP addresses and vice versa • DNS is a distributed system

  9. The InetAddress Class • The java.net.InetAddress class represents an IP address. • It converts numeric addresses to host names and host names to numeric addresses. • It is used by other network classes like Socket and ServerSocket to identify hosts

  10. Creating InetAddresses • There are no public InetAddress() constructors. Arbitrary addresses may not be created. • All addresses that are created must be checked with DNS

  11. The getByName() factory method public static InetAddress getByName(String host) throws UnknownHostException InetAddress utopia, duke; try { utopia = InetAddress.getByName("utopia.poly.edu"); duke = InetAddress.getByName("128.238.2.92"); } catch (UnknownHostException e) { System.err.println(e); }

  12. Other ways to create InetAddress objects public static InetAddress[] getAllByName(String host) throws UnknownHostException public static InetAddress getLocalHost() throws UnknownHostException

  13. Getter Methods • public boolean isMulticastAddress() • public String getHostName() • public byte[] getAddress() • public String getHostAddress()
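As a rough illustration of these getters, the sketch below calls them on numeric address literals, which getByName() accepts without a name lookup. The class name AddressGetters and the output format are assumptions made for the example; getHostName() is left out because it may trigger a reverse DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class AddressGetters {
    // Summarizes an address given as a numeric literal such as "128.238.2.92".
    static String describe(String literal) throws UnknownHostException {
        InetAddress a = InetAddress.getByName(literal);
        return a.getHostAddress()
                + " length=" + a.getAddress().length
                + " multicast=" + a.isMulticastAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(describe("128.238.2.92")); // prints 128.238.2.92 length=4 multicast=false
        System.out.println(describe("224.0.0.1"));    // prints 224.0.0.1 length=4 multicast=true
    }
}
```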

  14. Utility Methods • public int hashCode() • public boolean equals(Object o) • public String toString()

  15. Ports • In general a host has only one Internet address • This address is subdivided into 65,536 ports • Ports are logical abstractions that allow one host to communicate simultaneously with many other hosts • Many services run on well-known ports. For example, http tends to run on port 80

  16. Protocols • A protocol defines how two hosts talk to each other. • The daytime protocol, RFC 867, specifies an ASCII representation for the time that's legible to humans. • The time protocol, RFC 868, specifies a binary representation for the time that's legible to computers. • There are thousands of protocols, standard and non-standard

  17. IETF RFCs • Requests For Comment • Document how much of the Internet works • Various status levels from obsolete to required to informational • TCP/IP, telnet, SMTP, MIME, HTTP, and more • http://www.faqs.org/rfc/

  18. W3C Standards • IETF is based on “rough consensus and running code” • W3C tries to run ahead of implementation • IETF is an informal organization open to participation by anyone • W3C is a vendor consortium open only to companies

  19. W3C Standards • HTTP • HTML • XML • RDF • MathML • SMIL • P3P

  20. URLs • A URL, short for "Uniform Resource Locator", is a way to unambiguously identify the location of a resource on the Internet.

  21. Example URLs http://java.sun.com/ file:///Macintosh%20HD/Java/Docs/JDK%201.1.1%20docs/api/java.net.InetAddress.html#_top_ http://www.macintouch.com:80/newsrecent.shtml ftp://ftp.info.apple.com/pub/ mailto:elharo@metalab.unc.edu telnet://utopia.poly.edu ftp://mp3:mp3@138.247.121.61:21000/c%3a/stuff/mp3/ http://elharo@java.oreilly.com/ http://metalab.unc.edu/nywc/comps.phtml?category=Choral+Works

  22. The Pieces of a URL • the protocol, aka scheme • the authority • user info • user name • password • host name or address • port • the path, aka file • the ref, aka section or anchor • the query string

  23. The java.net.URL class • A URL object represents a URL. • The URL class contains methods to • create new URLs • parse the different parts of a URL • get an input stream from a URL so you can read data from a server • get content from the server as a Java object

  24. Content and Protocol Handlers • Content and protocol handlers separate the data being downloaded from the protocol used to download it. • The protocol handler negotiates with the server and parses any headers. It gives the content handler only the actual data of the requested resource. • The content handler translates those bytes into a Java object like an InputStream or ImageProducer.

  25. Finding Protocol Handlers • When the virtual machine creates a URL object, it looks for a protocol handler that understands the protocol part of the URL such as "http" or "mailto". • If no such handler is found, the constructor throws a MalformedURLException.
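One way to see this behavior is to probe whether the running VM has a handler for a given protocol by attempting to construct a URL and catching the exception. This is only a sketch: the class name ProtocolCheck, the placeholder host www.example.com, and the made-up protocol "nosuchprotocol" are assumptions, and no connection is opened.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {
    // Returns true if a protocol handler is found for the given protocol name.
    static boolean isSupported(String protocol) {
        try {
            // The URL constructors are deprecated since Java 20 but still behave as described.
            new URL(protocol, "www.example.com", "/");
            return true;
        } catch (MalformedURLException e) {
            // Thrown when no handler understands the protocol.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] protocols = {"http", "file", "nosuchprotocol"};
        for (String p : protocols) {
            System.out.println(p + ": " + isSupported(p));
        }
    }
}
```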

  26. Supported Protocols • The exact protocols that Java supports vary from implementation to implementation though http and file are supported pretty much everywhere. Sun's JDK 1.1 understands ten: • file • ftp • gopher • http • mailto • appletresource • doc • netdoc • systemresource • verbatim

  27. URL Constructors • There are four (six in 1.2) constructors in the java.net.URL class. public URL(String u) throws MalformedURLException public URL(String protocol, String host, String file) throws MalformedURLException public URL(String protocol, String host, int port, String file) throws MalformedURLException public URL(URL context, String url) throws MalformedURLException public URL(String protocol, String host, int port, String file, URLStreamHandler handler) throws MalformedURLException public URL(URL context, String url, URLStreamHandler handler) throws MalformedURLException

  28. Constructing URL Objects • An absolute URL like http://www.poly.edu/fall97/grad.html#cs try { URL u = new URL("http://www.poly.edu/fall97/grad.html#cs"); } catch (MalformedURLException e) {}

  29. Constructing URL Objects in Pieces • You can also construct the URL by passing its pieces to the constructor, like this: • URL u = null; • try { • u = new URL("http", "www.poly.edu", "/schedule/fall97/bgrad.html#cs"); • } • catch (MalformedURLException e) {}

  30. Including the Port • URL u = null; • try { • u = new URL("http", "www.poly.edu", 8000, "/fall97/grad.html#cs"); • } • catch (MalformedURLException e) {}
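To check what the piecewise constructors produce, the sketch below assembles a URL from its parts and prints the resulting string. The class and method names (UrlPieces, assemble) are hypothetical. Strings are compared rather than URL objects because URL.equals() may resolve host names.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPieces {
    // Builds a URL from its pieces and returns its string form.
    // Passing -1 as the port means "use the protocol's default port".
    static String assemble(String protocol, String host, int port, String file)
            throws MalformedURLException {
        // Deprecated since Java 20; used here because it is the constructor the slide shows.
        URL u = new URL(protocol, host, port, file);
        return u.toExternalForm();
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints http://www.poly.edu:8000/fall97/grad.html#cs
        System.out.println(assemble("http", "www.poly.edu", 8000, "/fall97/grad.html#cs"));
        // prints http://www.poly.edu/schedule/fall97/bgrad.html#cs
        System.out.println(assemble("http", "www.poly.edu", -1, "/schedule/fall97/bgrad.html#cs"));
    }
}
```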

  31. Relative URLs • Many HTML files contain relative URLs. • Consider the page http://metalab.unc.edu/javafaq/index.html • On this page a link to "books.html" refers to http://metalab.unc.edu/javafaq/books.html.

  32. Constructing Relative URLs • The fourth constructor creates URLs relative to a given URL. For example, try { URL u1 = new URL("http://metalab.unc.edu/index.html"); URL u2 = new URL(u1, "books.html"); } catch (MalformedURLException e) {} • This is particularly useful when parsing HTML.
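The resolution rule can be checked with a small helper that resolves a link against a base page and returns the result as a string. This is a sketch, and the names RelativeUrls and resolve are invented for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrls {
    // Resolves a (possibly relative) link against the URL of the page it appears on.
    static String resolve(String base, String link) throws MalformedURLException {
        URL context = new URL(base);       // deprecated since Java 20, still functional
        URL resolved = new URL(context, link);
        return resolved.toExternalForm();
    }

    public static void main(String[] args) throws MalformedURLException {
        String page = "http://metalab.unc.edu/javafaq/index.html";
        // A bare file name replaces the last path segment of the base.
        System.out.println(resolve(page, "books.html"));  // prints http://metalab.unc.edu/javafaq/books.html
        // A link starting with "/" replaces the whole path.
        System.out.println(resolve(page, "/index.html")); // prints http://metalab.unc.edu/index.html
    }
}
```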

  33. Parsing URLs • The java.net.URL class has five methods to split a URL into its component parts. These are: public String getProtocol() public String getHost() public int getPort() public String getFile() public String getRef()

  34. For example, • try { • URL u = new URL("http://www.poly.edu/fall97/grad.html#cs"); • System.out.println("The protocol is " + u.getProtocol()); • System.out.println("The host is " + u.getHost()); • System.out.println("The port is " + u.getPort()); • System.out.println("The file is " + u.getFile()); • System.out.println("The anchor is " + u.getRef()); • } • catch (MalformedURLException e) { }

  35. Parsing URLs • JDK 1.3 adds three more: public String getAuthority() public String getUserInfo() public String getQuery()
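The sketch below runs the newer getters alongside the older ones on two of the example URLs from earlier in the deck. The class name UrlParts and the map keys are assumptions chosen for readability. Pieces that are absent come back as null (or -1 for the port).

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

class UrlParts {
    // Splits a URL into named pieces, keeping null for pieces that are absent.
    static Map<String, String> parts(String spec) throws MalformedURLException {
        URL u = new URL(spec); // deprecated since Java 20, but it is the API described here
        Map<String, String> m = new LinkedHashMap<>();
        m.put("protocol", u.getProtocol());
        m.put("authority", u.getAuthority());
        m.put("userInfo", u.getUserInfo());
        m.put("host", u.getHost());
        m.put("port", String.valueOf(u.getPort()));
        m.put("path", u.getPath());
        m.put("query", u.getQuery());
        m.put("ref", u.getRef());
        return m;
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(parts("http://elharo@java.oreilly.com/"));
        System.out.println(parts("http://metalab.unc.edu/nywc/comps.phtml?category=Choral+Works"));
    }
}
```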

  36. Missing Pieces • If a port is not explicitly specified in the URL it's set to -1. This means the default port is to be used. • If the ref doesn't exist, it's just null, so watch out for NullPointerExceptions. Better yet, test to see that it's non-null before using it. • If the file is left off completely, e.g. http://java.sun.com, then it's set to "/".
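A defensive way to handle the missing port and ref is sketched below. The helper names (MissingPieces, effectivePort, refOrEmpty) are hypothetical, and getDefaultPort() is a later addition to java.net.URL (Java 1.4) than the methods listed in this deck. The sketch deliberately does not depend on the value of a missing file, which in later JDKs may be an empty string rather than "/".

```java
import java.net.MalformedURLException;
import java.net.URL;

class MissingPieces {
    // Returns the explicit port if present, otherwise the protocol's default port.
    static int effectivePort(URL u) {
        int port = u.getPort();
        return port != -1 ? port : u.getDefaultPort();
    }

    // Returns the ref, or an empty string when the URL has none, avoiding null checks later.
    static String refOrEmpty(URL u) {
        String ref = u.getRef();
        return ref == null ? "" : ref;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL plain = new URL("http://java.sun.com");
        URL full = new URL("http://www.poly.edu:8000/fall97/grad.html#cs");
        System.out.println(effectivePort(plain)); // prints 80
        System.out.println(effectivePort(full));  // prints 8000
        System.out.println(refOrEmpty(full));     // prints cs
    }
}
```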

  37. Reading Data from a URL • The openStream() method connects to the server specified in the URL and returns an InputStream object fed by the data from that connection. • public final InputStream openStream() throws IOException • Any headers that precede the actual data are stripped off before the stream is opened. • Network connections are less reliable and slower than files. Buffer with a BufferedReader or a BufferedInputStream.

  38. Webcat import java.net.*; import java.io.*; public class Webcat { public static void main(String[] args) { for (int i = 0; i < args.length; i++) { try { URL u = new URL(args[i]); InputStream in = u.openStream(); InputStreamReader isr = new InputStreamReader(in); BufferedReader br = new BufferedReader(isr); String theLine; while ((theLine = br.readLine()) != null) { System.out.println(theLine); } } catch (IOException e) { System.err.println(e);} } } }

  39. The Bug in readLine() • What readLine() does: • Sees a carriage return, then waits to see whether the next character is a line feed before returning • What readLine() should do: • Sees a carriage return, returns immediately, and throws away the next character if it's a line feed

  40. Webcat import java.net.*; import java.io.*; public class Webcat { public static void main(String[] args) { for (int i = 0; i < args.length; i++) { try { URL u = new URL(args[i]); InputStream in = u.openStream(); InputStreamReader isr = new InputStreamReader(in); BufferedReader br = new BufferedReader(isr); int c; while ((c = br.read()) != -1) { System.out.print((char) c); } } catch (IOException e) { System.err.println(e);} } } }
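A variant of the character-by-character approach is sketched below with two additions that are not in the slides: try-with-resources, so the stream is closed even if reading fails, and an explicit UTF-8 charset, which is an assumption about the server's encoding. The names WebcatSketch and readAll are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class WebcatSketch {
    // Reads everything the URL's stream provides and returns it as one string.
    static String readAll(URL u) throws IOException {
        StringBuilder sb = new StringBuilder();
        // UTF-8 is assumed here; a real client would take the charset from the response.
        try (Reader in = new BufferedReader(
                new InputStreamReader(u.openStream(), StandardCharsets.UTF_8))) {
            int c; // int rather than char, so -1 (end of stream) can be told apart from data
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.print(readAll(new URL(arg))); // URL(String) is deprecated since Java 20
        }
    }
}
```

Because read() never waits for a character after a carriage return, this loop does not depend on how readLine() treats line endings.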

  41. CGI • Common Gateway Interface • A lot is written about writing server side CGI. I’m going to show you client side CGI. • We’ll need to explore HTTP a little deeper to do this

  42. Normal web surfing uses these two steps: • The browser requests a page • The server sends the page • Data flows primarily from the server to the client.

  43. Forms • There are times when the server needs to get data from the client rather than the other way around. The common way to do this is with a form like this one:

  44. CGI • The user types the requested data into the form and hits the submit button. • The client browser then sends the data to the server using the Common Gateway Interface, CGI for short. • CGI uses the HTTP protocol to transmit the data, either as part of the query string or as separate data following the MIME header.

  45. GET and POST • When the data is sent as a query string included with the file request, this is called CGI GET. • When the data is sent as data attached to the request following the MIME header, this is called CGI POST
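For GET, the form data has to be encoded before it is appended to the URL after a "?". A minimal sketch of that encoding step using java.net.URLEncoder follows; the class name QueryString is hypothetical, UTF-8 is an assumed encoding, and the Charset overload of encode() requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryString {
    // Encodes name-value pairs as name1=value1&name2=value2 with unsafe characters escaped.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8));
            sb.append('=');
            sb.append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("category", "Choral Works");
        // Spaces become '+', matching the example URL shown earlier in the deck.
        String url = "http://metalab.unc.edu/nywc/comps.phtml?" + encode(fields);
        System.out.println(url); // prints http://metalab.unc.edu/nywc/comps.phtml?category=Choral+Works
    }
}
```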

  46. HTTP • Web browsers communicate with web servers through a standard protocol known as HTTP, an acronym for HyperText Transfer Protocol. • This protocol defines • how a browser requests a file from a web server • how a browser sends additional data along with the request (e.g. the data formats it can accept), • how the server sends data back to the client • response codes

  47. A Typical HTTP Connection • Client opens a socket to port 80 on the server. • Client sends a GET request including the name and path of the file it wants and the version of the HTTP protocol it supports. • The client sends a MIME header. • The client sends a blank line. • The server sends a MIME header • The server sends the data in the file. • The server closes the connection.

  48. What the client sends to the server GET /javafaq/images/cup.gif Connection: Keep-Alive User-Agent: Mozilla/3.01 (Macintosh; I; PPC) Host: www.oreilly.com:80 Accept: image/gif, image/x-xbitmap, image/jpeg, */*
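The text a client writes to the socket can be assembled as a plain string. The sketch below is illustrative only: the HTTP/1.0 version token and the choice of headers are assumptions rather than a reproduction of any particular browser's request, and RawRequest is a made-up class name. Each line ends with a carriage return and line feed, and a blank line marks the end of the header.

```java
class RawRequest {
    // Builds the request line, a small header, and the terminating blank line.
    static String build(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"  // request line: method, path, protocol version (assumed)
                + "Host: " + host + "\r\n"      // header fields are name-value pairs
                + "Accept: */*\r\n"
                + "\r\n";                       // blank line ends the header
    }

    public static void main(String[] args) {
        System.out.print(build("www.oreilly.com", "/javafaq/images/cup.gif"));
    }
}
```

Writing this string to a socket connected to port 80 on the host and then reading the reply corresponds to the client's side of the connection steps listed earlier.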

  49. MIME • MIME is an acronym for "Multipurpose Internet Mail Extensions". • an Internet standard defined in RFCs 2045 through 2049 • originally intended for use with email messages, but has been adopted for use in HTTP.

  50. Browser Request MIME Header • When the browser sends a request to a web server, it also sends a MIME header. • MIME headers contain name-value pairs, essentially a name followed by a colon and a space, followed by a value. Connection: Keep-Alive User-Agent: Mozilla/3.01 (Macintosh; I; PPC) Host: www.digitalthink.com:80 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
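Splitting such header lines into name-value pairs can be sketched as follows. The class name HeaderParser is hypothetical. The split happens at the first colon only, so values that themselves contain a colon (such as a host with a port) stay intact. The sketch keeps names exactly as written, although HTTP treats header names as case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderParser {
    // Parses "Name: value" lines into an ordered map; lines without a colon are skipped.
    static Map<String, String> parse(String[] lines) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a name-value line
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Connection: Keep-Alive",
            "User-Agent: Mozilla/3.01 (Macintosh; I; PPC)",
            "Host: www.digitalthink.com:80"
        };
        for (Map.Entry<String, String> e : parse(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```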
