Socket Options
    • Four functions define the API for getting and setting socket options. We will cover the aspects that address generic, IPv4, TCP, UDP and raw socket options.
  • getsockopt and setsockopt functions.

int getsockopt (int sockfd, int level, int optname, void *optval, socklen_t *optlen)

  • sockfd must refer to an OPEN socket descriptor
  • level specifies the code in the system that interprets the option (the generic socket code, or protocol-specific code such as IPv4, IPv6 or TCP).
  • optval is a pointer to a variable into which getsockopt stores the current value of the option; optlen is a value-result argument.
  • getsockopt is effectively a query; in an OO language a ‘get’ is typically an operation to retrieve a reference to a class and its methods.

int setsockopt (int sockfd, int level, int optname, const void *optval, socklen_t optlen)

  • optval is a pointer to a variable of the datatype required by each option (typically an int or a struct).
  • Options are basically a mechanism whereby certain communication features, whether in the kernel, in IPv4 or in TCP, may be enabled, disabled, modified or queried.
  • Two primary option datatypes:
    • Binary options that enable or disable a feature (flag settings).
    • Value options that fetch and return specific values, which can be either examined or modified.
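The two calls can be wrapped for the common case of an int-valued option. This is a minimal sketch; the helper names get_int_option and set_int_option are made up for illustration and are not part of the sockets API.

```c
#include <sys/socket.h>

/* Read an int-valued option. Returns 0 on success, -1 on error. */
int get_int_option(int sockfd, int level, int optname, int *value)
{
    socklen_t len = sizeof(*value);   /* value-result: size in, size out */
    return getsockopt(sockfd, level, optname, value, &len);
}

/* Set an int-valued option. For a binary option, nonzero enables
   the feature and 0 disables it. Returns 0 on success, -1 on error. */
int set_int_option(int sockfd, int level, int optname, int value)
{
    return setsockopt(sockfd, level, optname, &value, sizeof(value));
}
```

For example, set_int_option(fd, SOL_SOCKET, SO_KEEPALIVE, 1) turns a flag on, and get_int_option(fd, SOL_SOCKET, SO_RCVBUF, &n) fetches a value.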
  • Assignment for ALL (assignment #2).

Write a C program to determine which of the options are available on your particular system (page 180, sockopt/checkopts.c).

email your results to the T/A Mr. Venkatarhagavan by 9/27.

  • Not all implementations support all socket options. Therefore we must (should) employ the #ifdef #else #endif compilation directive of lines 34 to 39 of Figure 7.2.
  • This compiler directive must (should) be used on EVERY socket option.
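A guarded check for a single option might look like the following. This is only a sketch of the #ifdef / #else / #endif pattern, not the code from Figure 7.2; the function name report_reuseport is made up, and SO_REUSEPORT is chosen simply because it is an option that some systems do not define.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Returns 1 if the option is defined and could be queried,
   0 if this system does not define it, -1 if the query failed. */
int report_reuseport(int sockfd)
{
#ifdef SO_REUSEPORT
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &val, &len) < 0)
        return -1;
    printf("SO_REUSEPORT: default = %s\n", val ? "on" : "off");
    return 1;
#else
    (void)sockfd;                      /* unused on this system */
    printf("SO_REUSEPORT: (undefined)\n");
    return 0;
#endif
}
```

The same guard would be repeated (or generated by a macro) for every option the program reports.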
    • Some socket options must be set on the LISTENING socket: accept does not return the connected socket until the connection is already established, so setting them on the connected socket would be too late.
    • Some socket options are inherited by the connected socket from the listening socket (SO_DEBUG, SO_DONTROUTE, SO_KEEPALIVE, SO_LINGER, SO_OOBINLINE, SO_RCVBUF, SO_SNDBUF).
    • Generic socket options are handled not by protocol-specific code but by the protocol-independent code in the kernel (though some apply only to certain types of sockets).
  • SO_BROADCAST socket option
    • Enables/disables the ability to send broadcast messages (only datagram sockets, and only on networks that support broadcast, such as 802.3, 802.4 and 802.5, which is most of the world).
  • SO_DEBUG option (TCP only).
    • When enabled the kernel keeps track of information about all segments sent or received by that socket.
  • SO_DONTROUTE option
    • This option, when enabled, causes outgoing packets to bypass the normal routing mechanisms of the underlying protocol.
    • Often used by routing daemons (routed and gated) to bypass the routing table and force a packet to be sent out a particular interface.
    • Avoid enabling this option; in most cases it will cause a connection failure.
  • SO_ERROR option
    • This option can be queried but cannot be set. A very important option.
    • When a socket error occurs the kernel sets so_error to one of the standard Exxx values. (pending error for that socket).
      • The process affected can be notified if the proc is blocked on select (select returns with appropriate read / write condition set).
      • Or if signal driven I/O is being employed SIGIO signal is generated.
    • In either case the value of so_error can be obtained by querying the SO_ERROR socket option.
    • After this query the value of so_error is reset by the kernel.
    • If so_error is nonzero when the process calls read (with no data to return) or write, the call returns -1 with errno set to the value of so_error, which is then reset to zero.
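Querying the pending error is a plain getsockopt call. A minimal sketch, with fetch_pending_error as an illustrative name:

```c
#include <sys/socket.h>

/* Fetch (and thereby clear) the pending error for sockfd.
   Returns the Exxx value, 0 if no error is pending,
   or -1 if getsockopt itself fails. */
int fetch_pending_error(int sockfd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

A typical place to call this is right after select reports the socket readable or writable, for instance when a non-blocking connect completes.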
  • SO_KEEPALIVE socket option.
    • Generally cannot employ at the user level.
    • When set for a TCP socket, if no data exchanged for 2 hours, a keepalive probe is sent to the peer.
      • The peer responds with the expected ACK (the application is not notified).
      • The peer responds with RST; the pending error is set to ECONNRESET and the socket is closed.
      • If there is no response to the keepalive probe, BSD-derived TCPs send 8 more probes. If none is answered, the pending error is set to ETIMEDOUT and the socket is closed.
      • If an ICMP error is received in response to the probes, that error is returned instead, typically EHOSTUNREACH.
  • Most common question regarding SO_KEEPALIVE is whether the two hour inactive period can be modified.
    • Generally NOT. This is a kernel issue and not a socket issue. If changed then it may end up being changed for ALL sockets.
  • SO_KEEPALIVE is designed to detect a peer host crash, not a process crash (process crash will result in a FIN being sent).
  • SO_KEEPALIVE is generally used by servers to prevent de facto half-open connections.
  • Rlogin and Telnet servers use SO_KEEPALIVE to terminate the connection on hang-up or power down.
  • FTP handles the timeout within the application.
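A server that wants this behavior enables the option once per socket and then interprets the error it eventually sees. The sketch below assumes the three error values listed earlier; the function names and the message strings are illustrative only.

```c
#include <errno.h>
#include <sys/socket.h>

/* Turn on keepalive probes for a TCP socket. Returns 0 on success. */
int enable_keepalive(int sockfd)
{
    int on = 1;
    return setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Map the error reported on a keepalive-enabled socket to the
   probe outcomes described above. */
const char *keepalive_outcome(int err)
{
    switch (err) {
    case ECONNRESET:   return "peer answered the probe with RST";
    case ETIMEDOUT:    return "no response to any probe";
    case EHOSTUNREACH: return "ICMP error received for a probe";
    default:           return "not a keepalive-related error";
    }
}
```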

TCP is sending - Peer process crashes:

Then peer sends FIN, detected using select.

TCP is sending - Peer host crashes:

Our TCP will timeout and errno = ETIMEDOUT.

TCP is sending - peer host becomes unreachable:

Our TCP will timeout and the pending error is set to EHOSTUNREACH

TCP is receiving - Peer process crashes:

peer will send a FIN (will read as EOF).

TCP is receiving - peer host crashes:

will stop receiving data and will go to inactive period (if SO_KEEPALIVE is set, probes are sent after 2 hours).

TCP is receiving - peer host is unreachable

will stop receiving data.


Connection is idle - SO_KEEPALIVE is set. Peer process crashes.

Peer TCP sends FIN, detect using select.

Connection is idle - SO_KEEPALIVE is set. Peer host crashes.

Keepalive probes are sent after 2 hours. If no answer then errno is = ETIMEDOUT.

Connection is idle - SO_KEEPALIVE is set. Peer host is unreachable.

Keepalive probes are sent after 2 hours. If no answer then the pending error (so_error) is set to EHOSTUNREACH.

Connection is idle - SO_KEEPALIVE is not set.

If peer proc crashes then a FIN is sent by peer.

If peer host crashes then we are lost.

If peer becomes unreachable then we are lost.

  • SO_LINGER socket option.
    • Default on a close() is to return immediately but the kernel will attempt to send any data in the socket send buffer.
    • SO_LINGER allows modification of the default behavior for a close().
    • SO_LINGER requires the passing of the linger struct to the kernel.

struct linger {
    int l_onoff;    /* 0 = off, nonzero = on */
    int l_linger;   /* linger time; POSIX specifies units of seconds */
};

    • If l_onoff is nonzero and l_linger is 0, TCP aborts the connection on close: it discards the send buffer and sends a RST (no 4-way termination).
    • This avoids the TIME_WAIT state, but it leaves open the possibility that old duplicate segments from the aborted connection are delivered to a new incarnation of the same connection.
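The abortive close described above can be sketched as follows. The function name abortive_close is made up for illustration.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Close with l_onoff = 1 and l_linger = 0. On a connected TCP socket
   this discards unsent data and sends a RST instead of a FIN.
   Returns 0 on success, -1 on error. */
int abortive_close(int sockfd)
{
    struct linger lg;

    lg.l_onoff  = 1;   /* linger behavior on */
    lg.l_linger = 0;   /* zero linger time: abort */
    if (setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        return -1;
    return close(sockfd);
}
```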
  • SO_LINGER option (continued).
    • If l_onoff nonzero and l_linger nonzero then on close the kernel will first attempt to send any data in the send buffer until the linger time expires. (meanwhile process is blocked).
    • If the linger time expires before all data is sent and acknowledged, close() returns EWOULDBLOCK and any data remaining in the send buffer is discarded.
    • The ‘correct’ way for a client to terminate a connection is to set the SO_LINGER option for some finite time and to use shutdown in conjunction with this option.
      • This way the client can KNOW that the server has read any transferred data.
      • The problem is that close() returns immediately (or can linger only until the ACK of its FIN).
      • But shutdown(), followed by a read(), waits until the peer's FIN is received.
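The shutdown-then-read sequence might be coded like this. It is a sketch only: the name close_after_peer is illustrative, and EINTR handling is left out for brevity.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Send our FIN, then block until the peer closes its end.
   Returns 0 once the peer's FIN has been read as EOF and the
   descriptor is closed, -1 on error. */
int close_after_peer(int sockfd)
{
    char buf[256];
    ssize_t n;

    if (shutdown(sockfd, SHUT_WR) < 0)     /* we are done sending */
        return -1;
    while ((n = read(sockfd, buf, sizeof(buf))) > 0)
        ;                                  /* discard late data from the peer */
    if (n < 0)
        return -1;
    return close(sockfd);                  /* n == 0: peer's FIN received */
}
```

Because the peer normally closes only after it has read everything, seeing its FIN tells the client that the transferred data was consumed.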
Stevens suggests an application-layer ACK; to make this work there has to be an agreed-upon data send size or an agreed-upon end-of-record marker.
  • My commentary is that this is NOT a sound engineering approach, as it violates the OSI stack model; it mixes the responsibilities and methods of the application and network layers.
  • The approach of application-layer ‘hacks’ (or acks) reflects the old-school approach to Unix coding: do whatever you have to do without regard to later consequences.
  • The result of this approach is software that is hard to understand and difficult to modify. Though it does create many jobs.
Some comments on buffer sizes; should be at least three times the MSS (most systems 8192 bytes for a send/receive buffer versus a typical MSS of 512 (some 1460)).
  • Buffer size can be a problem on systems with large MTU’s (some ATMs have MTU’s as large as 4096+ bytes).
  • Recommended that the user not employ the SO_RCVBUF or the SO_SNDBUF socket options. Unless serious network programming is being done.
  • Most important concept is the relationship between a full-duplex pipe and the socket buffer sizes on the machines using the pipe.
SO_RCVBUF and SO_SNDBUF socket options
    • Receive buffers are used to hold received data until it is read by the application.
    • The available room in the receive buffer is the window that TCP advertises to the other end.
      • Therefore it cannot overflow
      • This is the core of TCP flow control.
      • If a peer ignores the window and sends data beyond it, the receiving TCP discards the data. UDP has no flow control; a datagram that does not fit in the receive buffer is discarded.
    • SO_RCVBUF and SO_SNDBUF allow the buffer sizes to be modified.
      • Modern TCP systems have buffers between 8k and 64k bytes.
      • Modern UDP systems have 9k byte send buffers and about 40k byte receive buffers.
    • For a client, SO_RCVBUF must be set before calling connect().
    • For a server, it must be set on the listening socket before calling listen().
    • TCP socket buffers should be an even multiple of the MSS.
    • To size the buffers, compute the bandwidth-delay product of the pipe:
      • Multiply the bandwidth in bits/sec by the RTT in seconds, then convert the bits to bytes.
      • RTT can be measured with ping.
      • Example: a T1 with a bandwidth of 1.536 Mb/s and an RTT of 60 ms gives a bandwidth-delay product of 11,520 bytes.
      • If the socket buffers are smaller than this, the pipe will not stay full and performance will be less than optimal.
      • Large buffers are required with high bandwidth or a long RTT.
      • When the bandwidth-delay product exceeds TCP's maximum window size (64 KB), the long fat pipe options are required.
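The sizing arithmetic and the option calls can be sketched together. All three function names are made up for illustration, and 1460 is used below only as an example MSS.

```c
#include <sys/socket.h>

/* Bandwidth-delay product in bytes: (bits/sec * RTT) / 8. */
long long bdp_bytes(long long bits_per_sec, long long rtt_ms)
{
    return bits_per_sec / 8 * rtt_ms / 1000;
}

/* Round a byte count up to a whole number of segments. */
long long round_up_to_mss(long long bytes, int mss)
{
    return (bytes + mss - 1) / mss * mss;
}

/* Set both socket buffers. Call before connect() on a client,
   or on the listening socket before listen() on a server. */
int set_buffers(int sockfd, int bytes)
{
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
}
```

For the T1 example, bdp_bytes(1536000, 60) gives 11520, and rounding up to an MSS of 1460 gives 11680 bytes (8 segments).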
SO_RCVLOWAT and SO_SNDLOWAT socket options
      • Allow the changing of the low-water marks.
      • The receive low-water mark is the amount of data that must be in the socket receive buffer for select() to return readable.
        • Defaults to 1 for TCP and UDP sockets.
      • The send low-water mark is the amount of available space that must exist in the socket send buffer for select() to return writable.
        • Normally defaults to 2048 for TCP
      • With UDP the low-water mark is used, but since the number of bytes of available space in the UDP send buffer never changes, the UDP socket is writable as long as its send buffer size is greater than the low-water mark.
        • UDP does NOT keep a copy of the datagrams sent by the application.
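Changing the receive low-water mark is a one-line setsockopt. A minimal sketch with illustrative names; note that not every system allows both marks to be changed (Linux, for one, does not let SO_SNDLOWAT be set).

```c
#include <sys/socket.h>

/* Require at least `bytes` of buffered data before select()
   reports the socket readable. Returns 0 on success. */
int set_recv_lowat(int sockfd, int bytes)
{
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVLOWAT, &bytes, sizeof(bytes));
}

/* Current receive low-water mark, or -1 on error. */
int get_recv_lowat(int sockfd)
{
    int bytes = 0;
    socklen_t len = sizeof(bytes);

    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVLOWAT, &bytes, &len) < 0)
        return -1;
    return bytes;
}
```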
SO_RCVTIMEO and SO_SNDTIMEO socket options
    • Allow the setting of timeouts on socket receives and sends.
    • The argument is a pointer to a timeval struct, identical to the one used with select().
      • Disable a timeout by setting its value to 0 seconds and 0 microseconds.
      • Both timeouts are disabled by default.
    • Posix 1.g does not require support for these options.
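A receive timeout could be set as below. This is a sketch; set_recv_timeout is an illustrative name, and systems that do not support the option will simply return an error.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Make blocking receives on sockfd give up after `seconds`.
   Passing 0 disables the timeout. Returns 0 on success. */
int set_recv_timeout(int sockfd, long seconds)
{
    struct timeval tv;

    tv.tv_sec  = seconds;
    tv.tv_usec = 0;
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

SO_SNDTIMEO takes the same struct and applies to sends.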
    • SO_REUSEADDR option.
      • allows a listening server to start and bind() its well known port even if previously established connections exist that use this port as their local port.
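In a server the option is set between socket() and bind(). The sketch below uses a made-up helper name, make_listener, and an arbitrary backlog of 5.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on the wildcard address.
   Returns the descriptor, or -1 on error. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Must be set before bind() to have any effect on it. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    if (listen(fd, 5) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```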
Class 8 - CSE 7348/5348
  • Goals for Class 8
    • Understand hierarchical IP addressing
    • Subnetting
    • Gateways.
    • Anagram? Abendego? Or abendego, or abnormal end to ego. Carolyn Meinel?
  • Five classes of addresses in the original RFC-950
  • Network Working Group, Request for Comments: 950. J. Mogul (Stanford), J. Postel (ISI). August 1985.
  • Internet Standard Subnetting Procedure
  • Status Of This Memo
    • This RFC specifies a protocol for the ARPA-Internet community. If subnetting is implemented it is strongly recommended that these procedures be followed. Distribution of this memo is unlimited.

Class A:  0      network ID (7 bits)   host ID (24 bits)
Class B:  10     network ID (14 bits)  host ID (16 bits)
Class C:  110    network ID (21 bits)  host ID (8 bits)
Class D:  1110   multicast group (28 bits)
Class E:  11110  reserved for future use (27 bits)
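The class of an address follows directly from its leading bits. A small sketch, assuming the address is held in host byte order; ipv4_class is an illustrative name, and for simplicity it reports every address starting with 1111 as class E.

```c
#include <stdint.h>

/* Return the class letter of an IPv4 address (host byte order). */
char ipv4_class(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0x00000000u) return 'A';  /* 0...    */
    if ((addr & 0xC0000000u) == 0x80000000u) return 'B';  /* 10...   */
    if ((addr & 0xE0000000u) == 0xC0000000u) return 'C';  /* 110...  */
    if ((addr & 0xF0000000u) == 0xE0000000u) return 'D';  /* 1110... */
    return 'E';                                           /* 1111... */
}
```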

  • Most IP addresses are now ‘classless’.
    • What is assigned is a 32 bit network address and a corresponding 32 bit mask.
    • Bits of 1 in the mask cover the NETWORK address.
    • Bits of 0 in the mask cover the HOST address.
    • Bits of 1 in the mask are ALWAYS CONTIGUOUS FROM THE LEFT.
    • Bits of 0 in the mask are ALWAYS CONTIGUOUS FROM THE RIGHT.
    • This allows the mask to be specified as a PREFIX length, i.e. a class A address can be specified as having a PREFIX length of 8.
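Because the ones are contiguous from the left, a mask and a prefix length convert into each other mechanically. A sketch with illustrative function names, using host byte order:

```c
#include <stdint.h>

/* Build the 32-bit mask for a prefix length of 0..32. */
uint32_t prefix_to_mask(int prefix_len)
{
    if (prefix_len <= 0)
        return 0;
    if (prefix_len >= 32)
        return 0xFFFFFFFFu;
    return 0xFFFFFFFFu << (32 - prefix_len);
}

/* Count the leading ones of a contiguous mask. */
int mask_to_prefix(uint32_t mask)
{
    int len = 0;

    while (mask & 0x80000000u) {
        len++;
        mask <<= 1;
    }
    return len;
}
```

A class A network, for instance, corresponds to prefix_to_mask(8), which is 0xFF000000.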
  • The advantage of a classless address space is that we are no longer restricted by using fixed PREFIX lengths.
    • IPv4 addresses are sometimes written as a dotted decimal number, followed by a slash, followed by the prefix length.
    • Using classless addresses requires classless routing, which is called CIDR (Classless Inter-Domain Routing, RFC 1519).
    • The purpose of CIDR was to reduce the size of the Internet routing tables and to reduce the rate of IPv4 address depletion.
    • IPv4 addresses are normally SUBNETTED.
    • Extends addressing Hierarchy
      • network ID (assigned to site)
      • subnet ID (chosen by site)
      • host ID (chosen by site)
    • The boundary between the subnet ID and the network ID is fixed by the PREFIX length of the assigned NETWORK id.
      • The prefix length is normally assigned by the ISP.
    • The boundary between the subnet ID and the host ID is chosen by the site.
    • All hosts on a given subnet share a common subnet MASK.
    • The subnet MASK specifies the boundary between the subnet ID and the host ID.
  • Subnetting (continued)
    • Example
    • - an entire class C network.
    • The user then divides the remaining 8 bits into a 3 bit subnet ID and a 5 bit host ID.
    • The subnet mask is therefore (0xE0 for the last octet, i.e. binary 11100000).
    • So how does this allow the more efficient use of the available address space?
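The 3-bit / 5-bit split of the last octet can be checked with a few lines of bit arithmetic. A sketch only; the macro and function names are made up for illustration.

```c
#include <stdint.h>

#define SUBNET_BITS 3   /* chosen by the site */
#define HOST_BITS   5   /* 8 - SUBNET_BITS */

/* Mask for the last octet: ones over the subnet bits. */
uint8_t last_octet_mask(void)
{
    return (uint8_t)(0xFF << HOST_BITS);          /* 11100000 */
}

/* Subnet ID: the top SUBNET_BITS bits of the last octet. */
unsigned subnet_id(uint8_t last_octet)
{
    return last_octet >> HOST_BITS;
}

/* Host ID: the low HOST_BITS bits of the last octet. */
unsigned host_id(uint8_t last_octet)
{
    return last_octet & ((1u << HOST_BITS) - 1);
}
```

With this split the single class C network yields 8 subnet values of 32 addresses each, so several small physical networks can share one assigned network number.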
  • Gateways
    • From RFC 950
  • 2.2 Changes to Host Software to Support Subnets
    • In most implementations of IP, there is code in the module that handles outgoing datagrams to decide if a datagram can be sent directly to the destination on the local network or if it must be sent to a gateway.
    • Generally the code is something like this:
  IF ip_net_number(dg.ip_dest) = ip_net_number(my_ip_addr)
      THEN
          send_dg_locally(dg, dg.ip_dest)
      ELSE
          send_dg_locally(dg,
              gateway_to(ip_net_number(dg.ip_dest)))
  • Gateways
  • To support subnets, it is necessary to store one more 32-bit quantity, called my_ip_mask. This is a bit-mask with bits set in the fields corresponding to the IP network number, and additional bits set corresponding to the subnet number field. (or the subnet MASK).
  • The code then becomes:

  IF bitwise_and(dg.ip_dest, my_ip_mask)
          = bitwise_and(my_ip_addr, my_ip_mask)
      THEN
          send_dg_locally(dg, dg.ip_dest)
      ELSE
          send_dg_locally(dg,
              gateway_to(bitwise_and(dg.ip_dest, my_ip_mask)))
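The masked comparison at the heart of that pseudocode is a one-liner in C. A sketch, assuming addresses and mask are in host byte order; is_local is an illustrative name.

```c
#include <stdint.h>

/* Returns 1 if ip_dest is on our own subnet (send directly),
   0 if the datagram must be handed to a gateway. */
int is_local(uint32_t ip_dest, uint32_t my_ip_addr, uint32_t my_ip_mask)
{
    return (ip_dest & my_ip_mask) == (my_ip_addr & my_ip_mask);
}
```

For a host at 192.168.1.10 with mask 255.255.255.224, 192.168.1.20 is local while 192.168.1.40 is not.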

  • RFC 1812; the ‘mother of all RFC’s’. Defines the addressing, connections and routing of the IP layer.
  • F. Baker, Editor Cisco Systems, June 1995
  • 2.2.5 Addressing Architecture
    • An IP datagram carries 32-bit source and destination addresses, each of which is partitioned into two parts - a constituent network prefix and a host number on that network. Symbolically:
  • IP-address ::= { <Network-prefix>, <Host-number> }
    • To finally deliver the datagram, the last router in its path must map the Host-number (or rest) part of an IP address to the host's Link Layer address.
  • RFC 1812 (continued)
  • This simple notion (the notion of class based addressing) has been extended by the concept of subnets. These were introduced to allow arbitrary complexity of interconnected LAN structures within an organization, while insulating the Internet system against explosive growth in assigned network prefixes and routing complexity. Subnets provide a multi-level hierarchical routing structure for the Internet system. The subnet extension, described in [INTERNET:2], is a required part of the Internet architecture. The basic idea is to partition the <Host-number> field into two parts: a subnet number, and a true host number on that subnet:
  • IP-address ::=
  • { <Network-number>, <Subnet-number>, <Host-number> }
  • RFC 1812

The interconnected physical networks within an organization use the same network prefix but different subnet numbers. The distinction between the subnets of such a subnetted network is not normally visible outside of that network. Thus, routing in the rest of the Internet uses only the <Network-prefix> part of the IP destination address. Routers outside the network treat <Network-prefix> and <Host-number> together as an uninterpreted rest part of the 32-bit IP address. Within the subnetted network, the routers use the extended network prefix:

{ <Network-number>, <Subnet-number> }

  • RFC 1812
  • The bit positions containing this extended network number have historically been indicated by a 32-bit mask called the subnet mask. The <Subnet-number> bits SHOULD be contiguous and fall between the <Network-number> and the <Host-number> fields. More up to date protocols do not refer to a subnet mask, but to a prefix length; the "prefix" portion of an address is that which would be selected by a subnet mask whose most significant bits are all ones and the rest are zeroes. The length of the prefix equals the number of ones in the subnet mask. This document assumes that all subnet masks are expressible as prefix lengths.

  • RFC 1812
  • Classless Inter Domain Routing (CIDR)
    • The explosive growth of the Internet has forced a review of address assignment policies. The traditional uses of general purpose (Class A, B, and C) networks have been modified to achieve better use of IP's 32-bit address space. Classless Inter Domain Routing (CIDR) [INTERNET:15] is a method currently being deployed in the Internet backbones to achieve this added efficiency. CIDR depends on deploying and routing to arbitrarily sized networks. In this model, hosts and routers make no assumptions about the use of addressing in the internet. The Class D (IP Multicast) and Class E (Experimental) address spaces are preserved, although this is primarily an assignment policy.
    • By definition, CIDR comprises three elements:
    • topologically significant address assignment,
    • routing protocols that are capable of aggregating network layer reachability information, and
    • consistent forwarding algorithm ("longest match").
  • RFC-1812
  • 2.2.6 IP Multicasting
    • IP multicasting is an extension of Link Layer multicast to IP internets. Using IP multicasts, a single datagram can be addressed to multiple hosts without sending it to all. In the extended case, these hosts may reside in different address domains. This collection of hosts is called a multicast group. Each multicast group is represented as a Class D IP address. An IP datagram sent to the group is to be delivered to each group member with the same best- effort delivery as that provided for unicast IP traffic. The sender of the datagram does not itself need to be a member of the destination group.
    • The semantics of IP multicast group membership are defined in [INTERNET:4]. That document describes how hosts and routers join and leave multicast groups. It also defines a protocol, the Internet Group Management Protocol (IGMP), that monitors IP multicast group membership.
  • RFC 1812
  • Forwarding of IP multicast datagrams is accomplished either through static routing information or via a multicast routing protocol. Devices that forward IP multicast datagrams are called multicast routers. They may or may not also forward IP unicasts. Multicast datagrams are forwarded on the basis of both their source and destination addresses. Forwarding of IP multicast packets is described in more detail in Section [5.2.1]. Appendix D discusses multicast routing protocols.
  • Embedded Routers
    • A router may be a stand-alone computer system, dedicated to its IP router functions. Alternatively, it is possible to embed router functions within a host operating system that supports connections to two or more networks. The best-known example of an operating system with embedded router code is the Berkeley BSD system. The embedded router feature seems to make building a network easy, but it has a number of hidden pitfalls:
      • Classless Inter-Domain Routing (CIDR):
  • Conceptually collapses a block of contiguous Class C addresses into a single routing table entry.
  • A routing table entry consists of: (Network Address, Count)
  • (Network Address, Count) = 1 table entry:
  • Network Address - smallest network address in the block.
  • Count - total number of network addresses in the block.
  • Entry (, 3) = [ | |]
  • CIDR Does not restrict network numbers to Class C addresses
  • CIDR Does not use an integer count for the block size.
  • CIDR Requires each block of addresses to be a power of two
  • CIDR Uses bit masking to identify the block size.
      • Coordination of Address Allocation:
  • To equally distribute the remaining available Class C addresses, they have been broken up into address groups. Each continent receives a block of addresses to be administered by a continent-level authority. Sub-authorities are delegated to disperse addresses within each country. All addresses were divided so that each address group differs by two numbers in the first octet.
  • Address allocation
      • CIDR Address Range Values:
  • Two values are required to specify the range:
    • Lowest address
    • 32-bit mask (which operates like a subnet mask)
  • Lowest - First address in the range
  • Highest - Last valid address in range, usually broadcast.
  • Mask - Similar to subnet masks
  • Example: (2048 contiguous addresses starting at
  • lowest:
    • (
  • Highest:
    • (
  • CIDR Mask:
    • (
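Given the lowest address and the block size, the other two values follow from bit arithmetic. The sketch below is illustrative: the function names are made up, and the base address used in the note (192.168.0.0) is a hypothetical stand-in, not the address from the slide.

```c
#include <stdint.h>

/* 1 if count is a power of two, as CIDR requires of a block size. */
int is_power_of_two(uint32_t count)
{
    return count != 0 && (count & (count - 1)) == 0;
}

/* CIDR mask for a block of `count` addresses (count a power of two). */
uint32_t block_mask(uint32_t count)
{
    return ~(count - 1);
}

/* Highest address in the block that starts at `lowest`
   (lowest must be aligned to the block size). */
uint32_t block_highest(uint32_t lowest, uint32_t count)
{
    return lowest | (count - 1);
}
```

For 2048 contiguous addresses, block_mask(2048) is 0xFFFFF800 (255.255.248.0, a /21); starting from the hypothetical 192.168.0.0, the highest address is 192.168.7.255.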
      • CIDR Router Functionality:
  • Routers at a site with classless addresses must be changed to correctly route datagrams.
  • Address Class Interpretation within the router must be disabled.
  • To determine the correct destination, each routing table entry holds an (address, mask) pair, and the routing software uses a "longest-match" paradigm to select a route.
  • A given block of addresses can be subdivided and separate routes can be setup for each subdivision.
  • All nodes on a given network will be assigned addresses from the same fixed range.
  • Hosts and routers that use supernetting need unconventional routing software that understands ranges of addresses.
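A longest-match lookup over (address, mask) pairs can be sketched with a linear scan. Real routers use faster structures; the struct layout, the next_hop field and the function name here are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t network;   /* lowest address of the block */
    uint32_t mask;      /* ones contiguous from the left */
    int      next_hop;  /* hypothetical: identifies the outgoing route */
};

/* Return the matching entry with the longest mask, or NULL if
   no entry matches dest. */
const struct route *longest_match(const struct route *table, size_t n,
                                  uint32_t dest)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if ((dest & table[i].mask) != table[i].network)
            continue;                      /* dest is not in this block */
        /* For contiguous masks, a numerically larger mask
           means a longer prefix. */
        if (best == NULL || table[i].mask > best->mask)
            best = &table[i];
    }
    return best;
}
```

An entry with network 0 and mask 0 matches every destination and so acts as the default route, chosen only when nothing more specific matches.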
  • Subnet interpretation can be chosen independently for each physical network. The standard specifies that a site using subnet addressing must choose a 32-bit subnet mask for each network.
  • It is recommended that sites use contiguous subnet masks and that all physical networks sharing the same IP address, use the same subnet mask.
      • Subnet Addressing and Routing:
  • This is the most accepted IP extension method to date. It has been standardized by the IAB.
  • Subnet addressing involves dividing the HOSTID part of an IP address into two sub-parts that identify:
  • A physical network (usually within an autonomous system)
  • A host on that network.
  • Subnet routing
  • Subnet Routing: Uses a modified IP routing algorithm that includes subnet masks as well as NETID and next-hop addresses.
  • A subnet routing table entry is made of {subnet mask, network address, next-hop address}.
  • Subnet10 -> looking ahead
  • Anyone building a huge corporate intranet knows that life would be a lot simpler if the InterNIC would simply fork over one of those big Class A IP network addresses, the kind that supports 16 million hosts or thousands of subnetworks.
  • Barring that near-miracle, however, there's always Plan B: Create a huge corporate intranet by subdividing a special Class A address space into smaller subnetworks, thereby protecting corporate information assets while providing the flexibility of Internet access and address management and security.
  • These special network IP addresses are the ones designated in the Internet's RFCs (Request for Comment documents) for use as private networks--those TCP/IP networks not connected to the Internet. (Of course, you can use these addresses and still have full Internet connectivity.)
  • Not only can you use as much address space as you need (actually, smaller networks may find special Class B or C addresses adequate), the Subnet 10 strategy lets you build protected intranets, isolated behind firewalls and proxy servers, and manage the networks' IP address space any way you like.
  • The Subnet 10 strategy also allows you to create a large-scale template for a comprehensive set of Internet and intranet services. It frees you from the restraints of configuring and managing a number of small Class C networks.
  • And even though it's a large-scale design capable of supporting a campuswide or regionwide network, it can be scaled down to individual offices, small networks or a tightly controlled intranet structure.
  • The key to building a Subnet 10 is that unlike networks with unique, Internet service provider- or InterNIC-assigned IP addresses, you must keep every host, router or workstation that uses these special addresses hidden from the Internet behind a firewall and a proxy host. They'll still be able to reach the Internet through the proxy, but you'll be free to tailor the internal network addresses any way you like.
  • There also must be a second, external network that is directly accessible from the Internet. Hosts that are publicly available--World Wide Web sites, an anonymous FTP host and the DNS (Domain Naming System) host, for example--reside on the external network.
  • The hosts on the internal, protected network have a unique, special identity. Their signature is those special Class A, B or C network addresses that mark them as members of the private network. Their special network addresses help to protect the intranet from intruders. An intruder will find it difficult to target a private network host by forging a private network address, because any IP datagrams bearing the special addresses will be discarded by external routers.
  • Behind the firewall and the proxy, intranet users can communicate freely with each other. When users venture out onto the Internet, they connect to the proxy, which establishes the connection for them.
  • The design has three elements: the inner network, the proxy hosts (sometimes called "bastion" hosts) and the outer network.
  • All user workstations, servers, E-mail hosts and a DNS server that knows about the inner network machines go on the inner network or its subnets. E-mail hosts or post offices are usually hidden from Internet E-mail systems by an SMTP gateway and a mail host on the outside network, which is visible to the Internet.
  • The proxy hosts sit on the border between the inner and outer networks. They're the line of defense that protects the hosts on the inner network. Proxies on the border connect to both the inner and the outer networks, so they know about both worlds, as well as what each should know about the other.
  • The outside network is the public address space--a separate network that is not part of Subnet 10. It has a real, registered and fully routable InterNIC or ISP-assigned network address. The outer network only has a few hosts, so a Class C address will do.
  • The outer network contains the hosts that Internet wanderers can see, contact and (since we live in dangerous times) attempt to hack. These include the organization's Web site, its anonymous FTP host, its E-mail gateway and the external DNS. The outer network portions of the proxy servers also reside on this network.
  • The DNS only knows the identities of the hosts on the outside network, along with the outside network identities of the proxies. The DNS on the inner network points to the external network DNS to resolve Internet host names.
  • Network Address Translation (NAT) is a vitally important Internet technology for a variety of reasons. It can provide load balancing for parallel processing, it can provide several types of strong access security, and it can provide fault-tolerance and high-availability. Finally, it can simplify some basic network administration functions. Below, we sketch the possible uses, and then follow up with Linux-specific applications.
  • RFC 1631
    • RFC 1631 describes the "traditional" NAT (Network Address Translation) that can be used for this kind of a task. Basically, the idea behind NAT is to re-write the IP headers and substitute one numeric address for another. This document discusses some basic implementation issues, such as computing header checksums, and mentions problems with packet encryption, and ICMP. It does not discuss load-balancing or masquerading issues.
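The header rewrite and checksum step might look like the following. This is a sketch of the general idea only, not the algorithm given in RFC 1631: it recomputes the IPv4 header checksum from scratch, the function names are made up, and it does not touch the TCP or UDP checksum, which also covers the addresses and would need updating in a real translator.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over a header: one's-complement sum of its
   16-bit big-endian words. The checksum field must be zero on entry
   when computing; over a valid header the result is 0. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    if (len & 1)
        sum += (uint32_t)hdr[len - 1] << 8;
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Substitute a new source address (bytes 12..15 of the IPv4 header)
   and store the recomputed checksum (bytes 10..11). */
void rewrite_source(uint8_t *hdr, size_t hdr_len, uint32_t new_src)
{
    uint16_t sum;

    hdr[12] = (uint8_t)(new_src >> 24);
    hdr[13] = (uint8_t)(new_src >> 16);
    hdr[14] = (uint8_t)(new_src >> 8);
    hdr[15] = (uint8_t)new_src;

    hdr[10] = hdr[11] = 0;                  /* zero the field first */
    sum = ip_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(sum >> 8);
    hdr[11] = (uint8_t)sum;
}
```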
  • Masquerading
    • One variation of NAT, called masquerading, is already available in stock Linux kernels. The theory, tools and installation procedure are discussed in the IP Masquerade mini-HOWTO. Masquerading is designed to provide security. It is intended for use as a type of firewall, hiding many hosts behind one IP address and relabeling all packets from behind the firewall so that they appear to be coming from one location, the firewall itself.
    • IP Masq is very powerful and flexible in this respect, and the filter and accounting rules can be configured to handle complex network topologies. However, it does not currently support the inverse operation of distributing incoming packets to multiple servers.
  • Since NAT gateways operate at the IP packet level, most of them have built-in internetwork routing capability. The internetwork they serve can be divided into several separate subnetworks (either using different backbones or sharing the same backbone), which further simplifies network administration and allows more computers to be connected to the network.
  • NAT and Proxies
  • A proxy is any device that acts on behalf of another. The term is most often used to denote Web proxying. A Web proxy acts as a "half-way" Web server: network clients make requests to the proxy, which then makes requests on their behalf to the appropriate Web server. Proxy technology is often seen as an alternative way to provide shared access to a single Internet connection. The main benefits of Web proxying are:
        • Local caching: a proxy can store frequently-accessed pages on its local hard disk; when these pages are requested, it can serve them from its local files instead of having to download the data from a remote Web server. Proxies that perform caching are often called caching proxy servers.
        • Network bandwidth conservation: if more than one client requests the same page, the proxy can make one request only to a remote server and distribute the received data to all waiting clients.
  • Both these benefits only become apparent in situations where multiple clients are very likely to access the same sites and so share the same data.
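The caching benefit can be sketched with a toy in-memory cache. Everything here is an assumption made for illustration: the names `proxy_get` and `fetch_from_origin`, the fixed table sizes, and the stand-in "download" that only formats a string and counts how often the origin would have been contacted. A real caching proxy would store pages on disk and honour HTTP expiry rules.

```c
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS  8
#define URL_MAX_LEN  128
#define BODY_MAX_LEN 256

struct cache_entry {
    int  used;
    char url[URL_MAX_LEN];
    char body[BODY_MAX_LEN];
};

struct proxy {
    struct cache_entry cache[CACHE_SLOTS];
    int origin_requests;              /* how many times we went upstream */
};

/* Stand-in for a real download from the remote Web server. */
static void fetch_from_origin(struct proxy *px, const char *url,
                              char *out, size_t out_len)
{
    px->origin_requests++;
    snprintf(out, out_len, "<page for %s>", url);
}

/* Serve from the cache when possible; otherwise fetch once and remember. */
static const char *proxy_get(struct proxy *px, const char *url)
{
    struct cache_entry *slot = NULL;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &px->cache[i];
        if (e->used && strcmp(e->url, url) == 0)
            return e->body;                    /* cache hit */
        if (!e->used && slot == NULL)
            slot = e;                          /* first free slot */
    }
    if (slot == NULL)
        slot = &px->cache[0];                  /* naive eviction when full */
    snprintf(slot->url, sizeof slot->url, "%s", url);
    fetch_from_origin(px, url, slot->body, sizeof slot->body);
    slot->used = 1;
    return slot->body;
}
```

With a zero-initialised `struct proxy`, two calls to `proxy_get` for the same URL reach the origin only once, which is the bandwidth saving described above. Requests for distinct URLs gain nothing, matching the caveat that the benefits depend on clients sharing the same sites.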
  • Unlike NAT, Web proxying is not a transparent operation: it must be explicitly supported by its clients. Due to early adoption of Web proxying, most browsers, including Internet Explorer and Netscape Communicator, have built-in support for proxies, but this must normally be configured on each client machine, and may be changed by the naive or malicious user.
  • The basic purpose of NAT is to multiplex traffic from the internal network and present it to the Internet as if it were coming from a single computer with only one IP address.
  • A modern NAT gateway must change the Source address on every outgoing packet to be its single public address. It therefore also renumbers the Source Ports to be unique, so that it can keep track of each client connection. The NAT gateway uses a port mapping table to remember how it renumbered the ports for each client's outgoing packets. The port mapping table relates the client's real local IP address and source port plus its translated source port number to a destination address and port. The NAT gateway can therefore reverse the process for returning packets and route them back to the correct clients.
  • When a remote server responds to a NAT client, the incoming packets arriving at the NAT gateway all have the same Destination address, but the Destination Port number is the unique Source Port number that was assigned by the NAT gateway. The gateway looks in its port mapping table to determine which "real" client address and port number a packet is destined for, and replaces these numbers before passing the packet on to the local client.
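The reverse lookup for returning packets might look like the sketch below. The struct layouts, field names and the function `nat_translate_inbound` are hypothetical. As a simplification, the table is searched by mapped port only, without also checking the remote address and port that the table is described as recording.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical port mapping table. */
struct nat_entry {
    uint32_t client_addr;   /* client's real local IPv4 address       */
    uint16_t client_port;   /* client's original source port          */
    uint16_t mapped_port;   /* unique port assigned by the NAT gateway */
};

/* Only the fields that NAT rewrites on a returning packet. */
struct pkt {
    uint32_t dst_addr;
    uint16_t dst_port;
};

/* Returns 1 and rewrites the destination if a mapping exists;
 * returns 0 and leaves the packet untouched otherwise. */
static int nat_translate_inbound(const struct nat_entry *table, size_t n,
                                 struct pkt *p)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].mapped_port == p->dst_port) {
            p->dst_addr = table[i].client_addr;
            p->dst_port = table[i].client_port;
            return 1;
        }
    }
    return 0;   /* no mapping: a real gateway would typically drop it */
}
```

A linear scan keeps the sketch short; a gateway handling many connections would more plausibly index the table by mapped port.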
  • This process is completely dynamic. When a packet is received from an internal client, NAT looks for the matching source address and port in the port mapping table. If the entry is not found, a new one is created, and a new mapping port allocated to the client:
        • Incoming packet received on non-NAT port
        • Look for source address, port in the mapping table
        • If found, replace source port with previously allocated mapping port
        • If not found, allocate a new mapping port
        • Replace source address with NAT address, source port with mapping port
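The steps above can be sketched as a single lookup-or-allocate function. The table size, the starting port 61000, and all type and function names are assumptions made for illustration. Entries never expire in this sketch, which a real gateway could not afford.

```c
#include <stddef.h>
#include <stdint.h>

#define NAT_TABLE_SIZE 64
#define NAT_FIRST_PORT 61000   /* assumed start of the mapping port range */

struct nat_entry {
    uint32_t client_addr;      /* client's real local IPv4 address */
    uint16_t client_port;      /* client's original source port    */
    uint16_t mapped_port;      /* port allocated by the gateway    */
};

struct nat_table {
    struct nat_entry entries[NAT_TABLE_SIZE];
    size_t   count;
    uint32_t nat_addr;         /* the gateway's single public address */
};

/* Only the fields that NAT rewrites on an outgoing packet. */
struct pkt {
    uint32_t src_addr;
    uint16_t src_port;
};

/* Returns the mapping port used, or 0 if the table is full
 * (in which case the packet is left unchanged). */
static uint16_t nat_translate_outbound(struct nat_table *t, struct pkt *p)
{
    uint16_t mapped = 0;

    /* Look for source address and port in the mapping table. */
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].client_addr == p->src_addr &&
            t->entries[i].client_port == p->src_port) {
            mapped = t->entries[i].mapped_port;   /* found: reuse it */
            break;
        }
    }

    /* Not found: allocate a new mapping port and record it. */
    if (mapped == 0) {
        if (t->count == NAT_TABLE_SIZE)
            return 0;
        mapped = (uint16_t)(NAT_FIRST_PORT + t->count);
        t->entries[t->count].client_addr = p->src_addr;
        t->entries[t->count].client_port = p->src_port;
        t->entries[t->count].mapped_port = mapped;
        t->count++;
    }

    /* Replace source address with NAT address, source port with mapping port. */
    p->src_addr = t->nat_addr;
    p->src_port = mapped;
    return mapped;
}
```

Two packets from the same client address and port receive the same mapping port, while a different client receives a fresh one; that uniqueness is what lets the gateway route replies back to the correct host.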