A Wander through GHC’s New IO library

Presentation Transcript


  1. A Wander through GHC’s New IO library Simon Marlow

  2. The 100-mile view
     • The API changes:
       • Unicode
         • putStr "A légpárnás hajóm tele van angolnákkal" works! (if your editor is set up right…)
         • locale encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
         • changing the encoding on the fly:
             hSetEncoding   :: Handle -> TextEncoding -> IO ()
             hGetEncoding   :: Handle -> IO (Maybe TextEncoding)
             data TextEncoding
             latin1, utf8, utf16, utf32, … :: TextEncoding
             mkTextEncoding :: String -> IO TextEncoding
             localeEncoding :: TextEncoding
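As a rough illustration of the encoding API listed on this slide, the sketch below writes a Unicode string through a Handle set to utf8, switches the same Handle to binary mode, and reads the text back. The helper name encodingDemo and the temporary file name are assumptions made for this example, and it relies on the directory package that ships with GHC.

```haskell
import Control.Exception (evaluate)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical helper: write a string as UTF-8, then report the Handle's
-- encoding in text mode, its encoding in binary mode, and the decoded text.
encodingDemo :: String -> IO (Maybe String, Maybe String, String)
encodingDemo s = do
  tmp <- getTemporaryDirectory
  (path, h) <- openTempFile tmp "encoding-demo.txt"
  hSetEncoding h utf8            -- change the encoding on the fly
  textEnc <- hGetEncoding h
  hPutStr h s
  hSetBinaryMode h True          -- binary Handles have no text encoding
  binEnc <- hGetEncoding h
  hClose h
  h' <- openFile path ReadMode
  hSetEncoding h' utf8
  s' <- hGetContents h'
  _ <- evaluate (length s')      -- force the lazy read before closing
  hClose h'
  removeFile path
  return (fmap show textEnc, fmap show binEnc, s')

main :: IO ()
main = do
  (textEnc, binEnc, s') <- encodingDemo "A légpárnás hajóm"
  print textEnc
  print binEnc
  print (s' == "A légpárnás hajóm")
```

The comparison is printed as a Bool rather than echoing the string, so the program does not depend on the terminal's own locale encoding.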

  3. The 100-mile view (cont.)
     • Better newline support
       • Teletypes needed both CR and LF to start a new line, and we've been paying for it ever since.
           hSetNewlineMode :: Handle -> NewlineMode -> IO ()
           data Newline = LF {- "\n" -} | CRLF {- "\r\n" -}
           nativeNewline :: Newline
           data NewlineMode = NewlineMode { inputNL :: Newline, outputNL :: Newline }
           noNewlineTranslation = NewlineMode { inputNL = LF, outputNL = LF }
           universalNewlineMode = NewlineMode { inputNL = CRLF, outputNL = nativeNewline }
           nativeNewlineMode    = NewlineMode { inputNL = nativeNewline, outputNL = nativeNewline }
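To make the effect of the newline modes concrete, here is a small sketch that writes CRLF-terminated bytes to a file and reads them back under two different modes. The helper name readWithMode and the temporary file name are assumptions made for this example; only functions exported by System.IO and the directory package are used.

```haskell
import Control.Exception (evaluate)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical helper: store "one\r\ntwo\r\n" as raw bytes, then read the
-- file back as text with the given NewlineMode applied to the input side.
readWithMode :: NewlineMode -> IO String
readWithMode mode = do
  tmp <- getTemporaryDirectory
  (path, h) <- openBinaryTempFile tmp "newline-demo.txt"
  hPutStr h "one\r\ntwo\r\n"     -- binary mode: no translation on output
  hClose h
  h' <- openFile path ReadMode
  hSetNewlineMode h' mode
  s <- hGetContents h'
  _ <- evaluate (length s)       -- force the lazy read before closing
  hClose h'
  removeFile path
  return s

main :: IO ()
main = do
  readWithMode universalNewlineMode >>= print   -- "one\ntwo\n"
  readWithMode noNewlineTranslation >>= print   -- "one\r\ntwo\r\n"
```

With inputNL = CRLF each "\r\n" pair is collapsed to "\n" on input, while noNewlineTranslation hands the bytes through unchanged.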

  4. The 10-mile view
     • Unicode codecs:
       • built-in codecs for UTF-8, UTF-16 (LE, BE), UTF-32 (LE, BE)
       • other codecs use iconv on Unix systems
       • built-in codecs only on Windows (no code pages) … yet
     • The pieces for building a codec are provided…
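One way to see the built-in codecs at work is to write the same string under several encodings and compare the resulting file sizes. This is only a sketch: the helper name encodedSize and the temporary file name are assumptions made for this example.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical helper: number of bytes produced when the string is written
-- through a Handle using the given TextEncoding.
encodedSize :: TextEncoding -> String -> IO Integer
encodedSize enc s = do
  tmp <- getTemporaryDirectory
  (path, h) <- openTempFile tmp "codec-demo.txt"
  hSetEncoding h enc
  hPutStr h s
  hClose h
  size <- withBinaryFile path ReadMode hFileSize
  removeFile path
  return size

main :: IO ()
main = do
  let s = "héllo"                        -- five characters, one non-ASCII
  utf16be' <- mkTextEncoding "UTF-16BE"  -- look a codec up by name
  encodedSize utf8     s >>= print       -- 6: 'é' takes two bytes
  encodedSize utf16le  s >>= print       -- 10: two bytes per character
  encodedSize utf16be' s >>= print       -- 10
  encodedSize utf32be  s >>= print       -- 20: four bytes per character
```

The explicit-endianness encodings are used here because they do not emit a byte-order mark, which keeps the sizes easy to predict.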

  5. The 10-mile view
     • Build your own codec: API in GHC.IO.Encoding
         data BufferCodec from to state = BufferCodec {
           encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
           close    :: IO (),
           getState :: IO state,
           setState :: state -> IO ()
         }
         type TextEncoder state = BufferCodec Char Word8 state
         type TextDecoder state = BufferCodec Word8 Char state
         data TextEncoding = forall dstate estate . TextEncoding {
           mkTextDecoder :: IO (TextDecoder dstate),
           mkTextEncoder :: IO (TextEncoder estate)
         }
     • Saving and restoring state is important, since Handles support buffering, random access, and changing encodings.

  6. The 1-mile view
     • Make your own Handles!
     • why mkFileHandle, not mkHandle?
         mkFileHandle :: (IODevice dev,    -- type class providing I/O device operations: close, seek, getSize, …
                          BufferedIO dev,  -- type class providing buffered reading/writing
                          Typeable dev)    -- Typeable, in case we need to take the Handle apart again later
                      => dev
                      -> FilePath            -- for error messages
                      -> IOMode              -- ReadMode/WriteMode/…
                      -> Maybe TextEncoding
                      -> NewlineMode
                      -> IO Handle

  7. IODevice
         -- | I/O operations required for implementing a 'Handle'.
         class IODevice a where
           -- | closes the device. Further operations on the device should
           -- produce exceptions.
           close :: a -> IO ()
           -- | seek to the specified position in the data.
           seek :: a -> SeekMode -> Integer -> IO ()
           seek _ _ _ = ioe_unsupportedOperation
           -- | return the current position in the data.
           tell :: a -> IO Integer
           tell _ = ioe_unsupportedOperation
           -- | returns 'True' if the device is a terminal or console.
           isTerminal :: a -> IO Bool
           isTerminal _ = return False
           … etc …
     • The default is for the operation to be unsupported.

  8. BufferedIO
         class BufferedIO dev where
           newBuffer         :: dev -> BufferState -> IO (Buffer Word8)
           fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
           fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
           emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
           flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
           flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
     • The device gets to allocate the buffer. This allows the device to make the buffer point directly at the data in memory, for example.
     • The 0-versions are non-blocking; the non-0 versions must read or write at least one byte (but may transfer less than the whole buffer).

  9. RawIO
         -- | A low-level I/O provider where the data is bytes in memory.
         class RawIO a where
           read             :: a -> Ptr Word8 -> Int -> IO Int
           readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
           write            :: a -> Ptr Word8 -> Int -> IO ()
           writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int
         readBuf             :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
         readBufNonBlocking  :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
         writeBuf            :: RawIO dev => dev -> Buffer Word8 -> IO ()
         writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)

  10. Example: a memory-mapped Handle
     • Random-access read/write doesn't perform very well with ordinary buffered I/O.
     • Let's implement a Handle backed by a memory-mapped file.
     • We need to:
       • define our device type
       • make it an instance of IODevice and BufferedIO
       • provide a way to create instances

  11. Example: memory-mapped files
     • Define our device type
         data MemoryMappedFile = MemoryMappedFile {
             mmap_fd     :: FD,             -- ordinary file descriptor, provided by GHC.IO.FD
             mmap_addr   :: !(Ptr Word8),   -- address in memory where our file is mapped,
             mmap_length :: !Int,           --   and its length
             mmap_ptr    :: !(IORef Int)    -- the current file pointer
           }
           deriving Typeable
     • Handles have a built-in notion of the "current position" that we have to emulate.
     • Typeable is one of the requirements for making a Handle.

  12. Aside: Buffers
         module GHC.IO.Buffer ( Buffer(..), .. ) where
         data Buffer e = Buffer {
             bufRaw   :: !(ForeignPtr e),
             bufState :: BufferState,  -- ReadBuffer | WriteBuffer
             bufSize  :: !Int,         -- in elements, not bytes
             bufL     :: !Int,         -- offset of first item in the buffer
             bufR     :: !Int          -- offset of last item + 1
           }
     [Diagram: a buffer of bufSize elements starting at bufRaw; the data occupies the region from bufL up to bufR.]
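The bufL/bufR bookkeeping can be explored directly with the helpers in GHC.IO.Buffer. The sketch below assumes that newByteBuffer, bufferAdd, bufferRemove, bufferElems and bufferAvailable are exported by the installed base library, as they are in the versions of base I am aware of; bufferSummary is a hypothetical helper written for this example.

```haskell
import GHC.IO.Buffer

-- Hypothetical helper: (bufL, bufR, elements held, free space at the end).
bufferSummary :: Buffer e -> (Int, Int, Int, Int)
bufferSummary b = (bufL b, bufR b, bufferElems b, bufferAvailable b)

main :: IO ()
main = do
  buf0 <- newByteBuffer 16 WriteBuffer   -- empty: bufL == bufR == 0
  let buf1 = bufferAdd 5 buf0            -- pretend 5 bytes were added: bufR moves
      buf2 = bufferRemove 2 buf1         -- pretend 2 were consumed: bufL moves
  print (bufferSummary buf0)
  print (bufferSummary buf1)
  print (bufferSummary buf2)
```

Only the offsets change here; no bytes are actually written into the underlying memory.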

  13. Example: memory-mapped files
     • (a) make it an instance of BufferedIO
         instance BufferedIO MemoryMappedFile where
           newBuffer m state = do
             fp <- newForeignPtr_ (mmap_addr m)
             return (emptyBuffer fp (mmap_length m) state)
           -- fillReadBuffer returns the entire file!
           fillReadBuffer m buf = do
             p <- readIORef (mmap_ptr m)
             let l = mmap_length m
             if (p >= l)
               then do return (0, buf{ bufL=p, bufR=p })
               else do writeIORef (mmap_ptr m) l
                       return (l-p, buf{ bufL=p, bufR=l })
           -- flush is a no-op: just remember where to read from next
           flushWriteBuffer m buf = do
             writeIORef (mmap_ptr m) (bufR buf)
             return buf{ bufL = bufR buf }

  14. Example: memory-mapped files
     • (b) make it an instance of IODevice
         instance IODevice MemoryMappedFile where
           close = IODevice.close . mmap_fd
           seek m mode val = do
             let sz = mmap_length m
             ptr <- readIORef (mmap_ptr m)
             let off = case mode of
                         AbsoluteSeek -> fromIntegral val
                         RelativeSeek -> ptr + fromIntegral val
                         SeekFromEnd  -> sz + fromIntegral val
             when (off < 0 || off >= sz) $ ioe_seekOutOfRange
             writeIORef (mmap_ptr m) off
           tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)
           getSize = return . fromIntegral . mmap_length
           … etc …
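The offset arithmetic inside seek can be studied on its own as a pure function. The sketch below mirrors the case analysis and the bounds check from the slide; resolveSeek is a hypothetical name, and returning Either instead of raising ioe_seekOutOfRange is a simplification made for this example.

```haskell
import System.IO (SeekMode (..))

-- Hypothetical pure model of the slide's seek: given the seek mode, the
-- current position, the mapped length and the requested value, compute the
-- new position or report that it falls outside the file.
resolveSeek :: SeekMode -> Int -> Int -> Integer -> Either String Int
resolveSeek mode cur size val
  | off < 0 || off >= size = Left "seek out of range"
  | otherwise              = Right off
  where
    off = case mode of
      AbsoluteSeek -> fromIntegral val
      RelativeSeek -> cur + fromIntegral val
      SeekFromEnd  -> size + fromIntegral val

main :: IO ()
main = do
  print (resolveSeek AbsoluteSeek 0 100 10)    -- Right 10
  print (resolveSeek RelativeSeek 10 100 5)    -- Right 15
  print (resolveSeek SeekFromEnd 0 100 (-1))   -- Right 99
  print (resolveSeek AbsoluteSeek 0 100 100)   -- Left "seek out of range"
```

As on the slide, an offset equal to the file size is rejected, so this model does not allow positioning exactly at end of file.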

  15. Example: memory-mapped files
     • provide a way to create instances
         mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
         mmapFile filepath iomode binary = do
           -- Open the file and mmap() it
           (fd,_devtype) <- FD.openFile filepath iomode
           sz <- IODevice.getSize fd
           addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
           ptr <- newIORef 0
           let m = MemoryMappedFile {
                     mmap_fd = fd,
                     mmap_addr = castPtr addr,
                     mmap_length = fromIntegral sz,
                     mmap_ptr = ptr }
           let (encoding, newline)
                 | binary    = (Nothing, noNewlineTranslation)
                 | otherwise = (Just localeEncoding, nativeNewlineMode)
           -- Call mkFileHandle to build the Handle
           mkFileHandle m filepath iomode encoding newline

  16. Demo…
         $ ./Setup configure
         Configuring mmap-handle-0.0...
         $ ./Setup build
         Preprocessing library mmap-handle-0.0...
         Building mmap-handle-0.0...
         [1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
         Registering mmap-handle-0.0...
         $ ./Setup register --inplace --user
         Registering mmap-handle-0.0...
         $ ghc-pkg list --user
         /home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
             mmap-handle-0.0

  17. Demo…
         $ cat test.hs
         import System.IO
         import System.Posix.IO.MMap
         import System.Environment
         import Data.Char

         main = do
           [file,test] <- getArgs
           h <- if test == "mmap"
                  then mmapFile file ReadWriteMode True
                  else openBinaryFile file ReadWriteMode
           sequence_ [ do hSeek h SeekFromEnd (-n)
                          c <- hGetChar h
                          hSeek h AbsoluteSeek n
                          hPutChar h c
                     | n <- [ 1..10000] ]
           hClose h
           putStrLn "done"
         $ ghc test.hs --make
         [1 of 1] Compiling Main ( test.hs, test.o )
         Linking test ...

  18. Timings…
         $ time ./test /tmp/words file
         done
         0.24s real 0.14s user 0.10s system 99% ./test /tmp/words file
         $ time ./test /tmp/words mmap
         done
         0.09s real 0.09s user 0.00s system 99% ./test /tmp/words mmap
         $ time ./test ./words file    # ./ is NFS-mounted
         done
         10.44s real 0.20s user 0.52s system 6% ./test tmp file
         $ time ./test ./words mmap    # ./ is NFS-mounted
         done
         0.10s real 0.09s user 0.00s system 93% ./test tmp mmap

  19. More examples
     • A Handle that pipes output bytes to a Chan
     • Handles backed by Win32 HANDLEs
     • A Handle that reads from a ByteString/text
     • A Handle that reads from text

  20. The -1 mile view
     • Inside the IO library
     • The file-descriptor functionality is cleanly separated from the implementation of Handles:
       • GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
       • GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
       • GHC.IO.Handle has nothing to do with FDs

  21. Implementation of Handle
         data Handle__
           = forall dev enc_state dec_state .
               (IODevice dev, BufferedIO dev, Typeable dev) =>
             Handle__ {
               haDevice     :: !dev,
               haType       :: HandleType,   -- read/write/append etc.
               haByteBuffer :: !(IORef (Buffer Word8)),
               haCharBuffer :: !(IORef (Buffer CharBufElem)),
               haEncoder    :: Maybe (TextEncoder enc_state),
               haDecoder    :: Maybe (TextDecoder dec_state),
               haCodec      :: Maybe TextEncoding,
               haInputNL    :: Newline,
               haOutputNL   :: Newline,
               .. some other things ..
             }
             deriving Typeable
     • Existential: packs up the IODevice, BufferedIO, and Typeable dictionaries; the codec state is also existentially quantified.
     • Two buffers: one for bytes, one for Chars.

  22. Where to go from here
     • This is a step in the right direction, but there is still some obvious ugliness
       • We haven't changed the external API, only added to it
     • There should be a binary I/O layer
       • hPutBuf working on Handles is wrong: binary Handles should have a different type
       • in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
     • FilePath should be an abstract type.
       • On Windows, FilePath = String, but on Unix, FilePath = [Word8].
     • Should we rethink Handles entirely?
       • OO-style layers: binary IO, buffering, encoding
       • Separate read Handles from write Handles?
         • read/write Handles are a pain
