Process Management and Inter-Process Communication

A process is an instance of a running program, including the program counter, registers, and current values of variables.

Process Internals

A process consists of a virtual address space plus at least one thread of control. The address space is divided into segments: the r-x segment holds program code, the read-only segment holds defined constants, and the rw segment holds defined variables. The rw region is further divided into stack and heap; malloc'd memory lives on the heap.

Threads

A thread is the entity scheduled for execution by the CPU. Each thread has its own program counter, registers, and stack. A process, by contrast, is mainly a container that groups resources together: multiple processes share physical memory, disk, and other system resources, while multiple threads within one process share the same address space and the process's other resources.

Threads can simply be viewed as processes that share certain resources with others; whether the address space is shared is almost the only distinction between processes and what Linux calls threads.

Difference between multiple child processes and multithreading: threads share the same address space and can access each other's data without protection, while child processes have separate address spaces.

User-Level Threads

User-level threads require a runtime system in user space with its own thread table to manage each thread, analogous to the kernel table that records processes.

Process Lifecycle

Creation: through system calls or during system initialization
Daemon processes: processes handling background activities
Termination conditions: normal exit, error exit, fatal error, killed by another process
States: running, blocked, ready

Process Management

The operating system maintains an array structure called the process table. Each entry is a Process Control Block (PCB), containing the state information needed to suspend and later resume a process. When an interrupt occurs (such as I/O completion), the interrupt hardware pushes the program counter, program status word, and one or more registers onto the stack, then transfers control to the interrupt service routine. The routine first saves the remaining registers, then switches the stack pointer to temporary space for its own use. When it finishes, the scheduler runs and decides which process from the ready queue to execute next.

Process System Calls

fork()

What does fork copy? The calling thread, the process address space (data and code segments), and the file descriptor table
Uses copy-on-write, so the pages are physically still mapped to the parent's memory until either side writes
The open file descriptions themselves are not copied: after fork, the parent's and child's descriptor tables point to the same open file descriptions, so file offsets are shared
Return value 0 – child process
Return value = child's PID – parent process

vfork()

vfork is designed for the case where the child process calls execve to load a new program immediately after creation. Until the child exits or execs the new program, the kernel keeps the parent blocked (suspended), guaranteeing the child runs first. A vfork'd child shares all memory and data segments with the parent, which avoids the copying (even the copy-on-write setup) that fork would perform.

clone()

clone() can selectively copy resources to the new task; data structures that are not copied are shared by duplicating only the pointers to them. It is generally invoked with flags such as CLONE_VM to create threads.

Zombie Processes

When a process uses fork to create a child, and the child exits but the parent never calls wait or waitpid to retrieve the child's status, the child's process descriptor remains in the system – the child becomes a zombie.

The kernel releases most resources when a process exits, but retains some information (PID, exit status, accounting data) until the parent retrieves it; only then is the release completed.

How to prevent accumulated zombie processes from consuming excessive system resources?

  1. When a child calls exit(), the parent receives a SIGCHLD signal prompting it to call wait() and retrieve the child's exit code
  2. Call waitpid() in a loop to collect all exited children; with WNOHANG it does not block the caller, whereas wait() blocks when no exited child is found
  3. Kill the parent so that init (process 1) adopts and reaps the children

Orphaned Processes

When a parent process exits while one or more of its child processes are still running, those children become orphaned processes. Orphaned processes are adopted by the init process (process ID 1), which completes status collection work via wait, so orphaned processes pose no harm.

Inter-Process Communication

  1. Pipe: A half-duplex communication method where data flows in only one direction and works only between processes with a kinship relationship (usually parent and child). A pipe is essentially a kernel buffer.
  2. Named pipe (FIFO): Named pipes are also half-duplex, but they allow communication between unrelated processes. A FIFO has an inode on disk but no data blocks; data still passes through a kernel buffer.
  3. Message Queue: Message queues are linked lists of messages stored in the kernel and identified by message queue identifiers. They overcome the limitations of signals (which carry little information) and pipes (which carry only unformatted byte streams with limited buffer size), and allow receivers to request messages of a specific type.
  4. Shared Memory: maps the same memory segment into the address spaces of multiple processes. The segment is created by one process but accessible by several. Shared memory is the fastest IPC method, designed precisely because other IPC mechanisms are less efficient (every transfer crosses the kernel). It usually works together with other mechanisms, such as semaphores, to synchronize access. Implementations:
    • mmap-ing the same ordinary file makes the file contents appear as memory in each process, eliminating read/write calls to transfer data
    • UNIX System V shared memory maps through a special filesystem: a process obtains a shared-memory ID from a standard key (shmget), then attaches the segment by ID (shmat)
  5. Semaphore: A semaphore is a counter used to control multiple process access to shared resources. It commonly serves as a locking mechanism preventing other processes from accessing resources while one process is using them. Therefore, it mainly serves as synchronization means between processes and between different threads within the same process.
  6. Socket: Sockets provide another inter-process communication mechanism, unique in that they enable process communication between different machines.
  7. Signal: Signals represent a more complex communication method, the only asynchronous communication mechanism in inter-process communication. Used to notify receiving processes that certain events have occurred.

Machine-level atomic instructions: TSL (Test and Set Lock) – the executing CPU locks the memory bus, preventing other CPUs from accessing memory until the instruction completes; the process leaving the critical section releases the lock with a plain mov. XCHG works the same way by atomically exchanging a register with a memory word, and CAS (compare-and-swap) is implemented with the CMPXCHG assembly instruction. All of these busy-wait, which can be avoided with the sleep and wakeup system calls that block/suspend and wake processes. Semaphores build on this for synchronization: P decrements the semaphore, V increments it.

Condition variables: pthread_cond_wait and pthread_cond_signal are commonly used for threads waiting on each other – for example, in the producer-consumer problem, a thread waits until another thread signals it. When multiple threads wait on one condition, pthread_cond_broadcast can wake them all. Condition variables are used together with mutexes: a thread locks the mutex, checks for the expected result, and if it is not there calls pthread_cond_wait, which atomically releases the mutex and blocks. If no thread is waiting on a condition variable, a signal disappears forever, so the wait must occur before the signal – which is why the predicate is re-checked under the mutex.

Monitor: a high-level synchronization primitive; only one process may be active inside the monitor at any time, with mutual exclusion enforced by the compiler (in Go this role is played directly by channels). If another process is already active in the monitor, the caller is suspended. Condition variables provide blocking inside the monitor: when the buffer is full, a producer waits on a condition variable, and a consumer signals that variable to wake its waiting partner.

Barrier: Used to synchronize process groups; unless all processes are ready to proceed to next phase, no process can enter next phase.

Pipe: Extensively used when executing shell commands in the terminal. However, pipes only work between processes with a kinship relationship (a common ancestor): as long as the common ancestor calls pipe before forking, the opened pipe file descriptors are shared by all descendants.

Redirection: Replace file descriptors with dup2. For example, dup2(4, 1) makes fd 1 refer to the same open file as fd 4, so when a program writes to fd 1 it doesn't know which file it is actually writing to.

Deadlock

Conditions:

  1. Resource mutual exclusion
  2. Hold and wait
  3. Non-preemption
  4. Circular wait

Detection: the resource allocation graph contains one or more cycles; or compare resource vectors

Recovery:

  1. Recovery through preemption
  2. Rollback recovery: Set checkpoints
  3. Kill processes

Avoidance: Resource trajectory graph

Process Scheduling Timing

  1. Process creation and exit
  2. Process blocking on I/O or semaphores or other blocking causes
  3. When I/O interrupts occur

Scheduling Algorithms

First-Come First-Served, Shortest Job First, Shortest Remaining Time (preemptive), Round Robin, Priority

Thread scheduling:

  1. User-level threads: Low switching overhead, kernel unaware
  2. Kernel-level threads: High switching overhead, but the whole process doesn't hang when one thread blocks on I/O

Linux scheduling algorithm: Dynamic priority, rewarding interactive processes, penalizing CPU-intensive processes