Chapter 3 File I/O

File Operations

open and openat take the pathname of the file to open or create. Both return a file descriptor, which is always the lowest-numbered unused descriptor. A descriptor becomes unused again once it is closed.

The fd parameter of openat names an open directory; a relative pathname is resolved starting from that directory instead of the current working directory.

creat is equivalent to calling open with O_WRONLY | O_CREAT | O_TRUNC, so the new file is opened for writing only. close releases the descriptor and also releases all record locks the process holds on that file.
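The descriptor allocation rule and the openat-style lookup can be tried out with a small sketch. It uses Python's os module as a thin wrapper over the system calls; the function name lowest_fd_demo and the file name a.txt are made up for illustration, and the sketch assumes no other thread opens files in between.

```python
import os
import tempfile


def lowest_fd_demo():
    """Return (first_fd, reused_fd, same_file) using a scratch directory."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.txt")  # hypothetical file name
        # These three flags are what creat() amounts to.
        first = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        os.close(first)  # the number becomes unused again
        # The lowest unused number is handed out, so it should match `first`.
        reused = os.open(path, os.O_RDONLY)
        # openat-style lookup: resolve "a.txt" relative to a directory descriptor.
        dirfd = os.open(d, os.O_RDONLY)
        rel = os.open("a.txt", os.O_RDONLY, dir_fd=dirfd)
        same_file = os.fstat(rel).st_ino == os.fstat(reused).st_ino
        for fd in (rel, dirfd, reused):
            os.close(fd)
        return first, reused, same_file
```

Passing dir_fd makes Python call openat instead of open, which is why the relative name resolves without changing the working directory.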

lseek explicitly sets the offset of an open file. Not every file type supports it: on a pipe, FIFO, or socket, lseek fails with ESPIPE. The call only records the new offset in the kernel and causes no I/O by itself. The offset may be set beyond the current end of the file; the next write then leaves a hole that reads back as zero bytes and need not be backed by disk blocks.
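A minimal sketch of both behaviours, assuming a file system that accepts seeking past the end of file. The helper names make_hole and lseek_errno_on_pipe are illustrative only.

```python
import errno
import os
import tempfile


def make_hole(path, gap=1 << 20):
    """Write 'head', skip `gap` bytes with lseek, write 'tail'.

    Returns (file_size, first_four_bytes_of_the_hole).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"head")
        os.lseek(fd, gap, os.SEEK_CUR)  # moves past EOF; no I/O happens here
        os.write(fd, b"tail")           # this write is what creates the hole
        size = os.fstat(fd).st_size
        os.lseek(fd, 4, os.SEEK_SET)    # go back to the start of the hole
        hole = os.read(fd, 4)           # bytes in a hole read back as zeros
        return size, hole
    finally:
        os.close(fd)


def lseek_errno_on_pipe():
    """Return the errno raised by lseek on a pipe, or None if it succeeded."""
    r, w = os.pipe()
    try:
        os.lseek(r, 0, os.SEEK_CUR)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(r)
        os.close(w)
```

Whether the hole really occupies no disk blocks depends on the file system; comparing st_size with st_blocks from os.stat is one way to check.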

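read and write transfer data through a caller-supplied buffer buf. Choosing the size of buf: make it at least as large as the disk block size; beyond that, larger buffers bring little further gain. Most file systems use read-ahead: when they detect sequential reads, they fetch more data than requested into the page cache. For the same reason, when measuring I/O performance repeatedly, run each measurement on a different copy of the file so that data cached by an earlier run does not distort the result.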
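A sketch of the usual read/write loop, assuming a buffer size of 4096 bytes (a common block size, not a value taken from these notes). copy_fd is an illustrative name.

```python
import os

BUFSIZE = 4096  # assumed block size; tune for the file system in use


def copy_fd(src_fd, dst_fd, bufsize=BUFSIZE):
    """Copy everything from src_fd to dst_fd; return the number of bytes copied."""
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)  # returns b"" at end of file
        if not chunk:
            break
        view = memoryview(chunk)
        while len(view) > 0:              # write may accept fewer bytes than offered
            written = os.write(dst_fd, view)
            view = view[written:]
            total += written
    return total
```

The inner loop matters for pipes and sockets, where a single write can be partial.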

File Sharing

Process table entry (file descriptor table) -> File table entry -> i-node

File table entry: file status flags, current file offset, and i-node pointer. The current offset is the value lseek changes, and write advances it by the number of bytes written. Every open call creates its own file table entry, so each process that opens the file gets its own offset.
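A sketch showing that two separate opens of one file keep separate offsets. Both descriptors live in one process here for simplicity; the same holds across processes. independent_offsets is an illustrative name.

```python
import os
import tempfile


def independent_offsets():
    """Open one file twice, read through one descriptor, report both offsets."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        a = os.open(tmp.name, os.O_RDONLY)  # first file table entry
        b = os.open(tmp.name, os.O_RDONLY)  # second, independent entry
        try:
            os.read(a, 4)                   # advances only a's offset
            return os.lseek(a, 0, os.SEEK_CUR), os.lseek(b, 0, os.SEEK_CUR)
        finally:
            os.close(a)
            os.close(b)
```

The call returns (4, 0): reading through the first descriptor leaves the second one at the start of the file.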

Atomic Operations

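If two processes write at the same offset of the same file, one write overwrites the other. Calling lseek and then read or write is two separate system calls, not one atomic operation: the kernel may switch to another process between them, and that process may change the file in the meantime!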
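The lost update can be reproduced deterministically by interleaving the calls by hand. This sketch uses two descriptors in one process in place of two processes. It also shows the O_APPEND flag, which these notes do not mention, as one way to make "seek to end and write" atomic. interleaved_writes is an illustrative name.

```python
import os
import tempfile


def interleaved_writes(extra_flags=0):
    """Simulate two writers that both seek to the end before either one writes.

    Returns the final file contents.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"log:")
        tmp.flush()
        a = os.open(tmp.name, os.O_WRONLY | extra_flags)
        b = os.open(tmp.name, os.O_WRONLY | extra_flags)
        try:
            os.lseek(a, 0, os.SEEK_END)  # writer A finds the end of file
            os.lseek(b, 0, os.SEEK_END)  # writer B finds the same end
            os.write(a, b"AAAA")         # A appends
            os.write(b, b"BBBB")         # B writes at its stale offset
        finally:
            os.close(a)
            os.close(b)
        with open(tmp.name, "rb") as f:
            return f.read()
```

Without extra flags the second write lands on top of the first. With os.O_APPEND the kernel repositions to the current end of file as part of each write, so both records survive.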

pread and pwrite perform the seek and the I/O as one atomic operation that cannot be interrupted between the two steps. Neither call changes the current file offset.
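A short sketch of positional I/O, assuming a platform where Python exposes os.pread and os.pwrite (Linux and most other UNIX systems). positional_io is an illustrative name.

```python
import os
import tempfile


def positional_io():
    """Write and read at explicit offsets, then report the descriptor's offset."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            os.pwrite(fd, b"XY", 3)                # overwrite bytes 3 and 4
            data = os.pread(fd, 5, 2)              # read 5 bytes starting at offset 2
            offset = os.lseek(fd, 0, os.SEEK_CUR)  # offset was never moved
            return data, offset
        finally:
            os.close(fd)
```

The descriptor's own offset stays at 0 throughout, which is what makes these calls safe to use from several threads sharing one descriptor.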

Copying File Descriptors

dup returns a new descriptor that points to the same file table entry as the descriptor passed in, so the two share the file status flags and the current offset. The new descriptor follows the same lowest-unused-number rule. dup2 lets the caller choose the new descriptor number; if that number is already open, it is closed first.
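A sketch of the shared file table entry, assuming Python 3.7 or later (where os.dup2 returns the new descriptor). dup_shares_offset is an illustrative name, and the way it picks a free target number for dup2 is just one convenient trick.

```python
import os
import tempfile


def dup_shares_offset():
    """Return (offset seen via the copy, bytes read via the copy, dup2 worked)."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        copy = os.dup(fd)                  # same file table entry as fd
        try:
            os.read(fd, 6)                 # moves the shared offset to 6
            seen = os.lseek(copy, 0, os.SEEK_CUR)
            rest = os.read(copy, 4)        # continues where fd stopped
            # Pick a number known to be free, then ask dup2 for exactly that one.
            spare = os.dup(fd)
            os.close(spare)
            chosen = os.dup2(fd, spare)
            os.close(chosen)
            return seen, rest, chosen == spare
        finally:
            os.close(fd)
            os.close(copy)
```

Contrast this with opening the file twice, where each descriptor would have its own offset.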

sync, fsync, fdatasync

sync only queues modified block buffers for writing and then returns immediately without waiting for write operations to complete. The update daemon periodically calls sync to ensure regular flushing of block buffers.

fsync applies to a single file, identified by its descriptor, and returns only after the disk writes for that file have completed. fdatasync does the same for the file’s data only; fsync also updates the file’s attributes synchronously.
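A minimal sketch of a write that is pushed to disk before the function returns. The name durable_write and the data_only parameter are illustrative; os.fdatasync is not available on every platform, so the sketch falls back to os.fsync.

```python
import os


def durable_write(path, data, data_only=False):
    """Write `data` to `path` and wait until the kernel reports it is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:              # handle partial writes
            written = os.write(fd, view)
            view = view[written:]
        if data_only and hasattr(os, "fdatasync"):
            os.fdatasync(fd)              # data only; attributes may lag behind
        else:
            os.fsync(fd)                  # data and attributes
    finally:
        os.close(fd)
    return len(data)
```

For a newly created file, making the directory entry durable as well usually takes an fsync on the containing directory; that step is left out here.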

fcntl

fcntl changes the properties of a file that is already open, such as its file descriptor flags, file status flags, and record locks.
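A sketch of the common read-modify-write pattern for file status flags, using Python's fcntl module. add_status_flag is an illustrative name.

```python
import fcntl
import os


def add_status_flag(fd, flag):
    """Turn on one file status flag while keeping the others; return the new flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)        # fetch the current flags
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | flag)  # set the old flags plus the new one
    return fcntl.fcntl(fd, fcntl.F_GETFL)
```

Fetching the flags first matters: calling F_SETFL with only the new flag would clear every other status flag that was set.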

ioctl

ioctl is the catchall for I/O operations that the other functions cannot express, mostly on special devices such as terminals.

Opening the path /dev/fd/n, where descriptor n is already open, is equivalent to duplicating descriptor n.

Buffered I/O

Reads and writes go through memory pages maintained by the kernel for buffering. Each page can map to multiple blocks on disk. Based on block size, each block can map to one or more sectors on disk hardware. According to kernel policies, dirty pages are lazily flushed to hard disk, reducing I/O frequency and improving performance.

Direct I/O

Reads and writes bypass the kernel’s page cache entirely. Opening a file with the O_DIRECT flag makes data move directly between the user-space buffer and the device, which avoids the extra copy through the page cache.

In ordinary applications, direct I/O often degrades performance, because the program gives up the kernel’s caching and read-ahead. In exchange, the application gets full control over its I/O and can improve performance with its own caching and scheduling. Databases are the typical users of direct I/O. For example, they use it to implement the write-ahead log (WAL), ensuring that records of data changes are written to disk first.

Block Alignment

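When using direct I/O, note that the length of each read or write must be an integer multiple of the block size, commonly 512 bytes. On Linux, the address of the user buffer and the file offset generally have to be aligned to the same boundary.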
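A hedged sketch of an aligned direct write. It assumes a 512-byte block size (some devices need 4096), relies on an anonymous mmap to get a page-aligned buffer, and returns None where O_DIRECT is missing or rejected, as on some file systems. align_up and direct_write are illustrative names.

```python
import mmap
import os

BLOCK = 512  # assumed logical block size; some devices require 4096


def align_up(n, block=BLOCK):
    """Round n up to the next multiple of block."""
    return (n + block - 1) // block * block


def direct_write(path, payload, block=BLOCK):
    """Try to write `payload` with O_DIRECT, zero-padded to a whole block.

    Returns the number of bytes written, or None if direct I/O is unavailable.
    """
    flag = getattr(os, "O_DIRECT", None)  # not defined on every platform
    if flag is None:
        return None
    length = max(block, align_up(len(payload), block))
    buf = mmap.mmap(-1, length)           # anonymous mapping, page-aligned
    buf[:len(payload)] = payload          # the rest stays zero
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | flag, 0o644)
    except OSError:
        buf.close()
        return None                       # file system refused O_DIRECT
    try:
        return os.write(fd, buf)          # aligned address, length, and offset 0
    except OSError:
        return None                       # alignment or device constraint not met
    finally:
        os.close(fd)
        buf.close()
```

The padding is part of what lands in the file, so a real application has to record the logical length itself.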

mmap

Memory mapping lets a process access any position in a file as ordinary memory. The process’s virtual pages map directly onto the kernel page cache, so the buffered I/O overhead of copying data between user space and kernel space disappears, and no read or write system call is needed for each access. A read reaches the disk only when it triggers a page fault for a page that is not yet in memory.
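A sketch of random access through a mapping, using Python's mmap module. read_via_mmap is an illustrative name, and the file is assumed to be non-empty, since an empty file cannot be mapped.

```python
import mmap


def read_via_mmap(path, offset, length):
    """Return `length` bytes at `offset`, read through a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing touches only the pages that hold these bytes.
            return m[offset:offset + length]
```

No read call is issued for the slice itself; pages are brought in by page faults the first time they are touched.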

Private Mode and Shared Mode

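In private mode (MAP_PRIVATE), the mapping is copy-on-write: the process reads the file through the page cache without a copy, much like a no-copy version of buffered I/O, but its modifications stay private and are not written back to the file. In shared mode (MAP_SHARED), multiple processes can map the same file and see each other’s modifications, which are also carried through to the file.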
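The difference between the two modes can be sketched with Python's mmap module, where ACCESS_COPY corresponds to a private mapping and ACCESS_WRITE to a shared one. write_through_mapping is an illustrative name.

```python
import mmap
import tempfile


def write_through_mapping(path, access):
    """Modify the first 4 bytes through a mapping.

    Returns (bytes seen in the mapping, bytes found in the file afterwards).
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[0:4] = b"MMAP"
            seen = m[0:4]
            if access == mmap.ACCESS_WRITE:
                m.flush()  # ask the kernel to write the shared pages back
    with open(path, "rb") as f:
        return seen, f.read(4)
```

With a private mapping, the change is visible only inside the mapping and the file keeps its old contents. With a shared mapping, the file itself changes.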