
Socket I/O

2005. 6. 8

백 일 우



  • Code Introduction

  • Socket Buffer

  • write, writev, sendto, sendmsg

  • sendit function

  • sosend function

  • read, readv, recvfrom, recvmsg

  • recvmsg system call

  • recvit function

  • soreceive function

Socket Buffers

  • Each socket has an associated send buffer and receive buffer

sb_cc : total number of data bytes in the buffer

sb_hiwat, sb_lowat : high-water and low-water marks used by the socket flow control algorithms

sb_mbcnt : total amount of memory allocated to the mbufs in the buffer

sb_mbmax : upper bound on the amount of memory that can be allocated as mbufs for each socket buffer

sb_mb : points to the first mbuf in the chain

sb_timeo : measured in clock ticks; limits the time a process blocks during a read or write call
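
As a rough sketch of how these fields fit together, the C fragment below models a socket buffer in user space and computes its free space as the smaller of the room left under sb_hiwat and the room left under sb_mbmax. The names toy_sockbuf and toy_sbspace are hypothetical, and the struct is a simplification for illustration, not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model of a socket buffer (hypothetical, not the kernel struct). */
struct toy_sockbuf {
    long sb_cc;     /* data bytes currently in the buffer */
    long sb_hiwat;  /* high-water mark: limit on sb_cc */
    long sb_lowat;  /* low-water mark */
    long sb_mbcnt;  /* memory held by the mbufs in the buffer */
    long sb_mbmax;  /* limit on sb_mbcnt */
};

/* Free space: the smaller of the room left under each of the two limits. */
static long toy_sbspace(const struct toy_sockbuf *sb)
{
    long by_data, by_mem;

    assert(sb != NULL);
    by_data = sb->sb_hiwat - sb->sb_cc;
    by_mem = sb->sb_mbmax - sb->sb_mbcnt;
    return by_data < by_mem ? by_data : by_mem;
}
```

With sb_cc = 1000 and sb_hiwat = 8192 the data limit leaves 7192 bytes; the result only drops below that when the mbuf memory limit is the tighter of the two.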

sb_flags

Default socket buffer limits for the Internet protocols

Socket Macros and Functions

  • Macros and functions for socket buffer locking and synchronization

Socket Macros and Functions (cont.)

  • Macros and functions for socket buffer allocation and manipulation

write, writev, sendto, and sendmsg

  • All the write system calls call sosend, directly or indirectly

    • sosend copies data from the process into the kernel and passes it to the protocol associated with the socket


  • Writing from multiple buffers is called gathering

  • The analogous read operation is called scattering

  • In a gather operation, the kernel accepts data from each buffer specified in an array of iovec structures

  • Without this type of interface, a process would have to either

    • Copy its buffers into a single larger buffer, or

    • Make multiple write system calls to send data from multiple buffers

    • Both alternatives are inefficient, hence the iovec interface

iov_base : points to the start of a buffer of iov_len bytes
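
A minimal user-space sketch of a gather write, assuming a POSIX system: two separate buffers are handed to a single writev call on a pipe and then read back as one contiguous byte stream. The function name gather_demo and the buffer contents are made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gathers two buffers into one writev call on a pipe, then reads the bytes
 * back into out. Returns the number of bytes read, or -1 on failure.
 */
static ssize_t gather_demo(char *out, size_t outlen)
{
    char hdr[] = "HDR:";
    char body[] = "payload";
    struct iovec iov[2];
    int fds[2];
    ssize_t nw, nr;

    assert(out != NULL);
    if (pipe(fds) == -1)
        return -1;

    iov[0].iov_base = hdr;           /* start of the first buffer */
    iov[0].iov_len = strlen(hdr);    /* its length in bytes */
    iov[1].iov_base = body;
    iov[1].iov_len = strlen(body);

    nw = writev(fds[1], iov, 2);     /* one system call, two buffers */
    close(fds[1]);
    if (nw == -1) {
        close(fds[0]);
        return -1;
    }

    nr = read(fds[0], out, outlen);
    close(fds[0]);
    return nr;
}
```

The reader sees the 11 bytes "HDR:payload" with no trace of the buffer boundary, which is the point of gathering: no intermediate copy and no second system call.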


  • iovec arguments to writev

iovp : points to the first element of the array

iovcnt : the number of elements in the array

  • Datagram protocols require a destination address

    • write, writev, and send do not accept an explicit address, so they can be called only after a destination has been associated with a connectionless socket by calling connect

    • Otherwise, the destination must be provided with sendto or sendmsg


  • Only the sendmsg call supports control information

    • Control information and several other arguments to sendmsg are specified within a msghdr structure

msg_name : should be declared as a pointer to a sockaddr structure, since it contains a network address

Control information

1. A control message is formatted as a cmsghdr structure

2. Control information is not interpreted by the socket layer, but messages are typed (cmsg_type) and have an explicit length (cmsg_len)
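
To make the msghdr and control-message layout concrete, here is a hedged user-space sketch that passes a file descriptor across a UNIX-domain socket with an SCM_RIGHTS control message. It uses the standard CMSG_* macros; the helper names send_fd and recv_fd and the union fd_ctl are hypothetical, and descriptor passing is only one example of control information.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for one int, aligned like a cmsghdr. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one data byte plus a control message carrying fd_to_pass. */
static int send_fd(int sock, int fd_to_pass)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;               /* the message's type */
    cm->cmsg_len = CMSG_LEN(sizeof(int));     /* its explicit length */
    memcpy(CMSG_DATA(cm), &fd_to_pass, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the byte; return the descriptor from the control message, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The socket layer only carries the typed, length-prefixed message; interpreting SCM_RIGHTS is left to the protocol domain.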

msghdr Structure

  • msghdr structure for the sendmsg system call

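sendmsg System Call

UIO_SMALLIOV = 8, UIO_MAXIOV = 1024

// Copy the msghdr from user space into the kernel

// Too many iovec entries: message too long (EMSGSIZE)

An iovec array with 8 entries is allocated automatically on the stack

If that is not large enough, sendmsg calls MALLOC to allocate a larger array

copyin : places a copy of the iovec array from user space into the kernel's array (the on-stack array or the larger allocated one)

When sendit returns, the data has been delivered to the appropriate protocol or an error has occurred

sendmsg releases the iovec array (if it was allocated with MALLOC) and returns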

sendit Function

  • sendit is the common function called by sendto and sendmsg

    • Initializes a uio structure

    • Copies control and address information from the process into the kernel

    • uiomove function

      • Moves n bytes between a single buffer referenced by cp and the multiple buffers specified by an iovec array in uio

// UIO_USERISPACE : the buffers are in user instruction space

uio_iov : points to an array of iovec structures

// uio_offset : counts the number of bytes transferred by uiomove

// uio_resid : counts the number of bytes remaining to be transferred

Each time uiomove is called, uio_offset increases by n and uio_resid decreases by n
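
The bookkeeping can be illustrated with a toy user-space model of uiomove, sketched below under stated assumptions: toy_uio and toy_uiomove are hypothetical names, plain memcpy stands in for the kernel's copy routines, and TOY_WRITE means data flows from the process's iovec buffers into the single buffer cp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

enum toy_rw { TOY_READ, TOY_WRITE };

struct toy_uio {
    struct iovec *uio_iov;  /* current element of the iovec array */
    int uio_iovcnt;         /* elements remaining */
    size_t uio_offset;      /* bytes transferred so far */
    size_t uio_resid;       /* bytes still to be transferred */
    enum toy_rw uio_rw;     /* direction of the transfer */
};

/* Move up to n bytes between cp and the buffers described by uio. */
static size_t toy_uiomove(char *cp, size_t n, struct toy_uio *uio)
{
    size_t moved = 0;

    assert(cp != NULL && uio != NULL);
    while (n > 0 && uio->uio_resid > 0 && uio->uio_iovcnt > 0) {
        struct iovec *iov = uio->uio_iov;
        size_t cnt = iov->iov_len < n ? iov->iov_len : n;

        if (cnt == 0) {              /* this buffer is used up: next one */
            uio->uio_iov++;
            uio->uio_iovcnt--;
            continue;
        }
        if (uio->uio_rw == TOY_WRITE)
            memcpy(cp, iov->iov_base, cnt);   /* process -> single buffer */
        else
            memcpy(iov->iov_base, cp, cnt);   /* single buffer -> process */

        iov->iov_base = (char *)iov->iov_base + cnt;
        iov->iov_len -= cnt;
        uio->uio_offset += cnt;      /* grows by the amount moved */
        uio->uio_resid -= cnt;       /* shrinks by the same amount */
        cp += cnt;
        n -= cnt;
        moved += cnt;
    }
    return moved;
}
```

Moving 5 bytes out of a 3-byte and a 5-byte buffer drains the first buffer, takes 2 bytes from the second, and leaves uio_offset at 5 and uio_resid at 3.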

uio structure before and after uiomove


cp : points to a buffer within the kernel, typically the data area of an mbuf


After uiomove, the data from the buffer in the process has been moved into the kernel's buffer because uio_rw was UIO_WRITE

sendit Code

Code : initialization of the uio structure

getsock : gets the file structure associated with descriptor s

Initialize the uio structure to gather the output buffers into mbufs in the kernel

Calculate the length of the transfer and save it in uio_resid

Ensure that each buffer length (iov_len) is nonnegative

Ensure that uio_resid does not overflow (it is a signed integer)
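
A user-space sketch of the same length calculation, assuming the total must fit in a nonnegative int as uio_resid does. The function name toy_total_len is hypothetical; because iov_len is an unsigned size_t in user space, the "nonnegative" and "no overflow" checks collapse into one comparison against INT_MAX here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Sum the iov_len fields into *resid. Returns 0 on success, or EINVAL if
 * the running total cannot be represented as a nonnegative int.
 */
static int toy_total_len(const struct iovec *iov, int iovcnt, int *resid)
{
    int total = 0;
    int i;

    assert(iov != NULL && resid != NULL);
    for (i = 0; i < iovcnt; i++) {
        /* Reject the entry before adding it, so total never overflows. */
        if (iov[i].iov_len > (size_t)(INT_MAX - total))
            return EINVAL;
        total += (int)iov[i].iov_len;
    }
    *resid = total;
    return 0;
}
```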

sendit Code (cont.)

Code : collect address and control information from the process

sockargs makes copies of the destination address and control information into mbufs if they are provided by the process

The transfer length is saved first, so that the number of bytes transferred can be calculated if sosend does not accept all the data (uio_resid then holds the remaining length)

1. If the transfer is interrupted by a signal or would block after some data has been transferred, the error is discarded and the partial transfer is reported

2. If sosend returns EPIPE, the SIGPIPE signal is sent to the process

3. If no error occurred, the number of bytes transferred is calculated and saved in *retsize
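
The three outcomes can be sketched as a small pure function. This is a model under assumptions, not the kernel code: toy_result and toy_sendit_finish are hypothetical names, only EINTR and EWOULDBLOCK are treated as "interrupted or blocking" errors, and the signal is represented by a flag instead of actually being raised.

```c
#include <assert.h>
#include <errno.h>

struct toy_result {
    int error;          /* errno value reported to the caller, 0 on success */
    int raise_sigpipe;  /* nonzero if SIGPIPE would be sent to the process */
    long retsize;       /* bytes transferred, meaningful when error == 0 */
};

/* len: requested length saved before the send; resid: bytes left afterwards. */
static struct toy_result toy_sendit_finish(int error, long len, long resid)
{
    struct toy_result r = { 0, 0, 0 };

    assert(resid >= 0 && resid <= len);
    if (error != 0) {
        /* Some data was sent before the interruption: report a short count. */
        if (resid != len && (error == EINTR || error == EWOULDBLOCK))
            error = 0;
        if (error == EPIPE)
            r.raise_sigpipe = 1;
    }
    r.error = error;
    if (error == 0)
        r.retsize = len - resid;
    return r;
}
```

For example, an EINTR after 60 of 100 bytes yields error 0 and retsize 60, while an EINTR before any byte moved is still reported as EINTR.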

sosend Function

  • Passes data and control information to the pr_usrreq function of the protocol associated with the socket

    • Before passing data, it checks that there is enough space in the send buffer

    • sosend never places data in the send buffer

      • Storing and removing the data is the protocol's responsibility

    • How sosend interprets the send buffer's sb_hiwat and sb_lowat values depends on whether the protocol implements reliable or unreliable data transfer semantics

  • For a reliable protocol, the send buffer holds both

    • Data that has not yet been transmitted

    • Data that has been sent but not yet acknowledged

    • sb_cc is the number of bytes of data that reside in the send buffer

      • 0 <= sb_cc <= sb_hiwat

sosend Function : how data is passed

  • If PR_ATOMIC is set,

    • sosend must preserve message boundaries between the process and the protocol layer

      • In this case, sosend waits for enough space to become available to hold the entire message

      • When the space is available, an mbuf chain containing the entire message is constructed and passed to the protocol in a single call

  • If PR_ATOMIC is not set,

    • sosend passes the message to the protocol one mbuf at a time

    • It may pass a partial mbuf to avoid exceeding the high-water mark


  • Unreliable protocol buffering

    • No data is ever stored in the send buffer and no acknowledgment is expected

    • Each message is passed to the protocol immediately

      • So sb_cc is always 0, and sb_hiwat specifies the maximum message size (the largest single write)

    • The default sb_hiwat for UDP is 9216 (9 x 1024)

      • Unless the process changes sb_hiwat with the SO_SNDBUF socket option, an attempt to write a datagram larger than 9216 bytes returns an error

sosend Code

so : pointer to the relevant socket

addr : pointer to a destination address

uio : pointer to a uio structure

top : mbuf chain that holds data to be sent

control : mbuf that holds control information to be sent

flags : contains options for this write call

/* initialization (Figure 16.23) */

Lock the send buffer

The lock ensures orderly access to the socket buffer by multiple processes

/* wait for space in send buffer (Figure 16.24) */

Obtain the lock and prepare to deliver data to the protocol

End of record

If uio is not NULL, transfer data from the process

/* fill a single mbuf or an mbuf chain (Figure 16.25) */

/* pass mbuf chain to protocol (Figure 16.26) */

After all the data has been passed to the protocol, the socket buffer is unlocked and any remaining mbufs are discarded

sosend : initialization

If sosendallatonce is true, atomic is set

This flag controls whether data is passed to the protocol as a single mbuf chain or in separate mbufs

resid : number of bytes in the iovec buffers or in the top mbuf chain

// clen : length of the optional control mbuf

sosend : error and resource checking

// The socket can no longer send data (EPIPE is returned)

If the protocol requires a connection and the connection is not established or a connection attempt has not been started, ENOTCONN is returned

// No destination address was supplied for a connectionless send

// Compute the amount of free space in the send buffer

// If atomic is set and the message is larger than the high-water mark, EMSGSIZE is returned

If there is not enough free space for the message and any of the following holds:

  • The message must be passed in a single request (atomic)

  • The message may be split, but the free space has fallen below the low-water mark

  • The control information does not fit in the available space

=> sosend must wait for space

mp : holds a pointer used to construct the mbuf chain
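
The size and space checks on this slide can be modelled as one decision function. The sketch below is a simplification under assumptions: toy_sndbuf, toy_action, and toy_sosend_check are hypothetical names, the free space is passed in precomputed, and the non-blocking case (returning an error instead of waiting) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_action { TOY_PROCEED, TOY_WAIT, TOY_FAIL };

struct toy_sndbuf {
    long space;  /* free space currently in the send buffer */
    long hiwat;  /* high-water mark */
    long lowat;  /* low-water mark */
};

/* resid: bytes of data to send; clen: bytes of control information. */
static enum toy_action toy_sosend_check(const struct toy_sndbuf *sb, int atomic,
                                        long resid, long clen, int *error)
{
    assert(sb != NULL && error != NULL);
    *error = 0;

    /* A message that can never fit is rejected outright. */
    if ((atomic && resid > sb->hiwat) || clen > sb->hiwat) {
        *error = EMSGSIZE;
        return TOY_FAIL;
    }
    /* Not enough room now: wait if the message cannot be split, if space
     * is below the low-water mark, or if the control info does not fit. */
    if (sb->space < resid + clen &&
        (atomic || sb->space < sb->lowat || sb->space < clen))
        return TOY_WAIT;

    return TOY_PROCEED;
}
```

With a 9216-byte high-water mark, an atomic 10000-byte message fails with EMSGSIZE, while a non-atomic 4000-byte write with 3000 bytes free proceeds and is sent in pieces.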

sosend Function : data transfer

Allocate a packet header or a standard mbuf

If atomic is set, a packet header is allocated during the first iteration of the loop and standard mbufs afterward

If atomic is not set, a packet header is always allocated, because top is cleared before entering the loop

A cluster is attached to the mbuf when the amount of data warrants one

If atomic is set, room is reserved for protocol headers; if it is not set, no room is reserved

// len is limited by the message length, the buffer space, and the mbuf length

For the first mbuf in the chain, the data is located at the end of the buffer

This may leave room for headers, depending on how much data is placed in the mbuf

// Copy len bytes of data from the process to the mbuf

// Update the mbuf length

// The new mbuf is linked to the previous mbuf

// The mbuf chain length is updated

When the last byte has been transferred from the process, M_EOR is set in the packet if the process specified MSG_EOR, and sosend breaks out of the loop
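
The loop structure (copy, update the mbuf length, link the new mbuf, update the chain length) can be sketched with a toy chain in user space. Everything here is illustrative: toy_mbuf, TOY_MLEN, toy_fill_chain, and toy_free_chain are hypothetical, the 8-byte capacity is deliberately tiny, and memcpy stands in for uiomove. The pointer mp plays the same role as on the slide: it always addresses the link that the next mbuf should be stored in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN 8  /* hypothetical per-mbuf capacity, tiny for illustration */

struct toy_mbuf {
    struct toy_mbuf *m_next;  /* next mbuf in the chain */
    size_t m_len;             /* bytes of data in this mbuf */
    char m_dat[TOY_MLEN];
};

static void toy_free_chain(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *next = m->m_next;
        free(m);
        m = next;
    }
}

/* Build a chain holding resid bytes from src; returns NULL on allocation failure. */
static struct toy_mbuf *toy_fill_chain(const char *src, size_t resid, size_t *chain_len)
{
    struct toy_mbuf *top = NULL;
    struct toy_mbuf **mp = &top;  /* link to fill in next */

    assert(chain_len != NULL);
    *chain_len = 0;
    while (resid > 0) {
        size_t len = resid < TOY_MLEN ? resid : TOY_MLEN;
        struct toy_mbuf *m = calloc(1, sizeof *m);

        if (m == NULL) {
            toy_free_chain(top);
            return NULL;
        }
        memcpy(m->m_dat, src, len);  /* copy len bytes from the "process" */
        m->m_len = len;              /* update the mbuf length */
        *mp = m;                     /* link the new mbuf to the previous one */
        mp = &m->m_next;
        *chain_len += len;           /* update the chain length */
        src += len;
        resid -= len;
    }
    return top;
}
```

A 20-byte message becomes a chain of three mbufs holding 8, 8, and 4 bytes.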

sosend Function : protocol dispatch

// Can be enabled only for the duration of this one message

// Reset afterward

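recvmsg System Call

socket descriptor

for control information

An iovec array with 8 entries is allocated on the stack

// Copy the msghdr structure from the process into the kernel

// More iovec entries than fit in the 8-entry array: a larger array is allocated

// Too many iovec entries (the limit is 1024): message too long

Copy the iovec array into the kernel

After the data has been received, copy the msghdr back to the process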

recvit Function : initialization

// Get the file structure associated with descriptor s

// Compute the number of bytes to be transferred by adding up the iovec lengths

The total length is computed and saved

recvit Function (cont.)

// Compute the number of bytes of data transferred

Copy the address to the process

Copy the control information to the process

soreceive Function

  • soreceive transfers data from the socket's receive buffer to the buffers specified by the process

    • recvmsg is the only read system call that returns flags to the process

      • In the other calls, the information is discarded by the kernel before control returns to the process

  • Out-of-band (OOB) data

    • Two mechanisms facilitate handling of OOB data

      • Tagging and synchronization

OOB Handling

  • Tagging

    • The sending process tags data as OOB by setting the MSG_OOB flag in a send call

    • sosend passes this information to the socket's protocol

    • When the protocol receives OOB data, the data is set aside instead of being placed in the socket's receive buffer

    • The receiving process reads the OOB data by setting MSG_OOB in a receive call

  • Synchronization

    • The receiving process can ask the protocol to place OOB data inline with the regular data

      • In this case, the MSG_OOB flag is not used

    • Read calls return either all regular data or all OOB data, never a mix of the two


  • Receiving out-of-band data

Receive Buffer Organization

  • Message boundaries

    • For protocols that support message boundaries, each message is stored in a single chain of mbufs

      • Multiple messages in the receive buffer are linked together by m_nextpkt

    • The protocol layer adds data to the receive queue and the socket layer removes data from the receive queue

      • The high-water mark for the receive buffer restricts the amount of data that can be stored

    • When PR_ATOMIC is not set,

      • The protocol layer stores as much data in the buffer as possible and discards the portion of the incoming data that does not fit

        • For TCP, this means that any data that arrives outside the receive window is discarded

    • When PR_ATOMIC is set,

      • The protocol uses sbappendaddr to construct an mbuf chain and add it to the receive queue
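
The two kinds of links can be pictured with a toy structure: m_next strings together the mbufs of one message, and m_nextpkt, set in the first mbuf of a message, leads to the next message in the buffer. The struct rec_mbuf and the helpers rec_count and rec_first_len are hypothetical and carry only the fields needed for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct rec_mbuf {
    struct rec_mbuf *m_next;     /* next mbuf of the same message */
    struct rec_mbuf *m_nextpkt;  /* first mbuf of the next message */
    size_t m_len;                /* bytes of data in this mbuf */
};

/* Number of messages queued, following m_nextpkt from the head of the buffer. */
static size_t rec_count(const struct rec_mbuf *sb_mb)
{
    size_t n = 0;

    for (; sb_mb != NULL; sb_mb = sb_mb->m_nextpkt)
        n++;
    return n;
}

/* Length of the message that starts at m, following m_next. */
static size_t rec_first_len(const struct rec_mbuf *m)
{
    size_t len = 0;

    for (; m != NULL; m = m->m_next)
        len += m->m_len;
    return len;
}
```

A buffer holding a 150-byte message in two mbufs followed by a 30-byte message in one mbuf therefore has two m_nextpkt hops' worth of records but three mbufs in total.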

Receive Buffer Organization (cont.)

  • No message boundaries

    • For protocols such as TCP, incoming data is appended to the end of the last mbuf chain in the buffer with sbappend

      • Incoming data is trimmed to fit within the receive buffer, and sb_lowat puts a lower bound on the number of bytes returned by a read system call

soreceive Code

so : socket, paddr : address information, mp0 : mbuf pointer

controlp : pointer to an mbuf pointer for control information

// Size of the receive request; if address or control information is copied to the process, it is set to 0

If data is copied, uio_resid is updated


// Before accessing the buffer, get the lock

// m is the first mbuf on the first chain in the receive buffer

Check several conditions to decide whether to wait for more data

If soreceive sleeps in this code, it jumps back to restart when it wakes up to see whether enough data has arrived

This continues until the request can be satisfied

soreceive Code (cont.)

// dontblock : soreceive jumps here when it has enough data to satisfy the request

Address and control information are processed before any other data is transferred from the receive buffer

Set up the data transfer

: remember the type of data at the front of the queue, so that soreceive can stop the transfer when the type changes


soreceive Function

< Out-of-band data >

Because OOB data is not stored in the receive buffer, soreceive allocates a standard mbuf and issues a PRU_RCVOOB request to the protocol

The while loop copies the data returned by the protocol to the buffers specified by uio

After the copy, soreceive returns 0 or an error

< Connection confirmation >

If the data is to be returned in an mbuf chain, *mp is initialized to NULL

If the socket is in the SS_ISCONFIRMING state, a PRU_RCVD request notifies the protocol that the process is attempting to receive data

soreceive Function

  • Enough data? soreceive waits when any of the following holds:

  • 1. There is no data in the receive buffer (m == 0)

  • 2. There is not enough data to satisfy the entire read (sb_cc < uio_resid), the minimum amount of data is not available (sb_cc < sb_lowat), and more data can be appended to this chain when it arrives (m_nextpkt == 0 and PR_ATOMIC is not set)

  • 3. There is not enough data to satisfy the entire read, the minimum amount of data is available, and data can be added to the chain, but MSG_WAITALL indicates that soreceive must wait until the entire read can be satisfied
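
One way to read the three cases is as a single predicate over the receive buffer state. The sketch below is a model, not the kernel test: toy_rcv and toy_must_wait are hypothetical names, the "more data can be appended" condition is folded into one boolean, non-blocking flags are ignored, and the sb_hiwat bound on MSG_WAITALL is an assumption added so that a request larger than the buffer cannot wait forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rcv {
    bool has_data;         /* false corresponds to m == 0 */
    long sb_cc;            /* bytes in the receive buffer */
    long sb_lowat;         /* minimum amount worth returning */
    long sb_hiwat;         /* capacity of the buffer */
    bool more_may_arrive;  /* data can still be appended to this chain */
};

/* Returns true if the read should wait for more data. */
static bool toy_must_wait(const struct toy_rcv *rb, long uio_resid, bool waitall)
{
    assert(rb != NULL && uio_resid >= 0);

    if (!rb->has_data)
        return true;                      /* case 1: nothing to return */
    if (rb->sb_cc >= uio_resid || !rb->more_may_arrive)
        return false;                     /* request satisfied, or nothing more will join */
    if (rb->sb_cc < rb->sb_lowat)
        return true;                      /* case 2: below the minimum */
    /* case 3: minimum is there, but the caller asked for everything
     * (assumed bound: the request must fit in the buffer). */
    return waitall && uio_resid <= rb->sb_hiwat;
}
```

With 100 bytes buffered and a low-water mark of 1, a 500-byte read returns immediately with a short count unless MSG_WAITALL is in effect.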

soreceive : wait for more data

If the socket has an error pending and m is NULL, the error is returned

If there is an error and m is not NULL, the data is returned

=> If MSG_PEEK is set, the error is not cleared, since a read call with MSG_PEEK should not change the state of the socket

If the read half of the connection has been closed and data remains in the receive buffer, soreceive does not wait and returns the data to the process

If the receive buffer is empty, soreceive jumps to release and the read system call returns 0

If the receive buffer contains OOB data or the end of a logical record, soreceive does not wait for additional data and jumps to dontblock

If the protocol requires a connection but none exists, ENOTCONN is posted and soreceive jumps to release

soreceive Function : return address and info

[Return address]

For protocols such as UDP, the mbuf containing the address is removed from the mbuf chain and returned in *paddr

If MSG_PEEK is set, the address is copied and the mbuf is not removed from the buffer

If paddr is NULL, the address is discarded

soreceive Function : control information

// Each control mbuf is removed from the buffer (or copied, if MSG_PEEK is set)

If the process is prepared to receive control information, the mbuf is attached to *controlp

// If controlp is NULL, the control information is discarded

controlp then points to the next mbuf

After the control information has been processed, the chain should contain regular data mbufs, OOB data mbufs, or no mbufs at all

soreceive Function : uiomove

// The loop continues while there are more mbufs, the process's buffers are not full, and no error has occurred

If the type of mbuf changes, the transfer stops

So regular data and OOB data are never both returned in the same message

The distance to the OOB mark is computed and limits the size of the transfer, so the byte before the mark is the last byte transferred
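
The length computation for one pass of the loop can be sketched as a pure function: the transfer is capped by the request, by the distance to the OOB mark, and by the data left in the current mbuf. The name toy_xfer_len and its parameters are hypothetical; a mark of 0 is taken to mean that no mark is set, and offset is the position already reached in the buffer.

```c
#include <assert.h>

/*
 * Bytes to copy in this pass.
 * uio_resid:  bytes the process still wants
 * mbuf_avail: bytes left in the current mbuf
 * so_oobmark: offset of the OOB mark in the buffer, 0 if none
 * offset:     bytes of the buffer already passed over
 */
static long toy_xfer_len(long uio_resid, long mbuf_avail, long so_oobmark, long offset)
{
    long len = uio_resid;

    assert(uio_resid >= 0 && mbuf_avail >= 0 && offset >= 0);
    if (so_oobmark != 0 && len > so_oobmark - offset)
        len = so_oobmark - offset;   /* stop at the byte before the mark */
    if (len > mbuf_avail)
        len = mbuf_avail;            /* never read past this mbuf */
    return len;
}
```

A 100-byte request against a 60-byte mbuf with the mark at offset 25 transfers 25 bytes; without a mark it transfers 60.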

soreceive Function : update buffer

// If all the bytes in the mbuf have been transferred, the mbuf must be discarded or the pointer advanced

Finished with the mbuf?

More data to process

If the request did not consume all the data,

if so_oobmark cut the request short, or

if additional data arrived during uiomove,

=> there may be more data to process

soreceive Function : OOB mark

If the OOB mark is nonzero, it is decremented by the number of bytes transferred


If the mark has been reached, SS_RCVATMARK is set and soreceive breaks out of the loop

If MSG_PEEK is set, offset is updated instead of so_oobmark

// End of record

soreceive Function : MSG_WAITALL processing

// If MSG_WAITALL is set, there is no more data in the receive buffer (m == 0), the process wants more data, and this is the last record in the receive buffer

=> soreceive must wait for additional data

When the receive buffer is changed by the protocol layer, sbwait returns

If the wait was interrupted by a signal, soreceive returns immediately

Synchronize m and nextrecord with the receive buffer

soreceive Function : cleanup

Truncated message

If the process's buffer is too small for the message, the message is truncated
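
From user space, truncation is visible through the flags that recvmsg returns. The sketch below, assuming a POSIX system with UNIX-domain datagram sockets, sends a 10-byte datagram and receives it into a buffer of a chosen size; when the buffer is too small, MSG_TRUNC appears in msg_flags. The function name trunc_demo is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Sends a 10-byte datagram over a socketpair and receives it into a buffer
 * of cap bytes (cap <= 16). Stores the byte count in *nread and returns the
 * msg_flags reported by recvmsg, or -1 on failure.
 */
static int trunc_demo(size_t cap, ssize_t *nread)
{
    int sv[2];
    char in[16];
    struct iovec iov;
    struct msghdr msg;

    assert(nread != NULL);
    if (cap > sizeof in)
        return -1;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;
    if (send(sv[0], "0123456789", 10, 0) != 10) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    memset(&msg, 0, sizeof msg);
    iov.iov_base = in;
    iov.iov_len = cap;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    *nread = recvmsg(sv[1], &msg, 0);  /* flags come back in msg.msg_flags */
    close(sv[0]);
    close(sv[1]);
    return *nread == -1 ? -1 : msg.msg_flags;
}
```

The same datagram read with read or recv would also be cut short, but only recvmsg reports the fact to the process.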

End of record processing

: the next mbuf chain is attached to the receive buffer

Nothing transferred