Program, Process, and Thread

DevOps/OS 2020. 9. 19. 23:43

1. Overview

1.1 Program

A program is a set of instructions and associated data that resides on the disk and is loaded by the operating system to perform some task. An executable file and a Python script file are both examples of programs. To run a program, the operating system's kernel is first asked to create a new process, which is the environment in which the program executes.

A program is an executable file residing on the disk (secondary storage) in a directory.
Hence, a program can also be described as a set of instructions that is stored on a secondary storage device and intended to carry out a specific job.
The kernel reads it into memory, where the CPU executes it.
A program is therefore a ‘passive entity’ that persists in secondary storage even after the machine reboots.
Example:
- On a Microsoft Windows® system: The ‘Calculator’ executable that is usually stored at “<drive>:\windows\system32\calc.exe”.
- On a Linux system: The ‘ls’ binary that is normally stored at “/bin/ls”.

1.2 Process

A process is a program in execution. It is an execution environment that consists of instruction, user-data, and system-data segments, plus resources acquired at runtime such as CPU time, memory, address space, disk, and network I/O. Several copies of a program can run at the same time, but each process belongs to exactly one program.

An executing instance of a program is called a process.
Some operating systems use the term ‘task’ to refer to a program that is being executed.
A process resides in main memory (also called primary memory or random access memory) while it runs.
A process is therefore an active entity. It disappears when the machine is rebooted.
Several processes may be associated with the same program.
On a multiprocessor system, multiple processes can be executed in parallel.
On a uni-processor system, true parallelism is not possible. Instead, a process scheduling algorithm gives each process the processor in turn, one at a time, which achieves concurrency and yields an illusion of parallelism.
Example:
- Executing multiple instances of the ‘Calculator’ program. Each instance is a separate process.
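
The same idea can be sketched in Python. This is a minimal illustration, not a definitive recipe: the helper name spawn_instances is hypothetical, and the Python interpreter on disk stands in for the program. Each launch asks the operating system for a new process, and each process reports its own process ID.

```python
import subprocess
import sys


def spawn_instances(count):
    """Start `count` processes from the same program and return their PIDs."""
    # The "program" is the interpreter executable stored on disk.
    # Every Popen call asks the OS to create a separate process from it.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", "import os; print(os.getpid())"],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(count)
    ]
    pids = []
    for child in children:
        out, _ = child.communicate()  # wait for this process to exit
        pids.append(int(out.strip()))
    return pids


if __name__ == "__main__":
    # One program, three processes, three different process IDs.
    print(spawn_instances(3))
```

All three processes come from one executable file, yet the operating system tracks each of them separately.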

1.3 Thread

A thread is the smallest unit of execution in a process. A thread simply executes instructions serially, and a process can have multiple threads running as part of it. Usually, some state associated with the process is shared among all of its threads, while each thread also has some state private to itself. The shared state is visible and accessible to every thread of the process, so special care is needed whenever a thread reads or writes it. Programming languages offer several constructs, such as locks, to guard and discipline access to this shared state.

A thread is a subset of the process.
It is also called a ‘lightweight process’, since it is similar to a real process but executes within the context of a process and shares the resources allotted to that process by the kernel (See http://kquest.co.cc/2010/03/operating-system for more info on the term ‘kernel’).
Usually, a process has only one thread of control – one set of machine instructions executing at a time.
A process may also be made up of multiple threads of execution that execute instructions concurrently.
Multiple threads of control can exploit the true parallelism possible on multiprocessor systems.
On a uni-processor system, a thread scheduling algorithm is applied and the processor is scheduled to run each thread one at a time.
All the threads running within a process share the same address space, file descriptors, heap, and other process-related attributes. Each thread, however, has its own stack, register values, and program counter.
Since the threads of a process share the same memory, synchronizing access to the shared data within the process becomes critically important.
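
As a minimal sketch of guarded access to shared state, the Python example below lets several threads increment one counter. The function name count_with_threads is hypothetical, and a lock is only one of several possible synchronization constructs.

```python
import threading


def count_with_threads(num_threads, increments):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run this read-modify-write step.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return counter


if __name__ == "__main__":
    print(count_with_threads(4, 10_000))
```

Without the lock, two threads could read the same old value and one update could be lost. With it, each increment completes before the next one begins.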

2. Related Elements

2.1 Process Control Block(PCB) or Descriptor

A process control block (PCB) is a data structure used by computer operating systems to store all the information about a process. It is also known as a process descriptor.

When a process is created (initialized or installed), the operating system creates a corresponding process control block.
Information in a process control block is updated during the transition of process states.
When the process terminates, its PCB is returned to the pool from which new PCBs are drawn.
Each process has a single PCB.

2.1.1 Structure of PCB

In multitasking operating systems, the PCB stores data needed for correct and efficient process management. Though the details of these structures are system-dependent, common elements fall into three main categories:

Process Identification
Process State
Process Control

The operating system keeps status tables for each kind of entity it manages: memory, I/O devices, files, and processes.
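
The three categories can be pictured as fields of one record. The Python sketch below is illustrative only: every field name, the SchedulingState enum, and the save_context helper are assumptions made for this example, not the layout of any real kernel.

```python
from dataclasses import dataclass, field
from enum import Enum


class SchedulingState(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"


@dataclass
class ProcessControlBlock:
    # 1. Process identification
    pid: int
    parent_pid: int
    # 2. Process state: the saved CPU context needed to resume the process
    program_counter: int = 0
    registers: dict = field(default_factory=dict)
    # 3. Process control: what the OS uses to manage the process
    scheduling_state: SchedulingState = SchedulingState.NEW
    priority: int = 0
    open_files: list = field(default_factory=list)


def save_context(pcb, program_counter, registers):
    """Record the CPU context in the PCB when the process is switched out."""
    pcb.program_counter = program_counter
    pcb.registers = dict(registers)
    pcb.scheduling_state = SchedulingState.READY


if __name__ == "__main__":
    pcb = ProcessControlBlock(pid=1, parent_pid=0)
    save_context(pcb, program_counter=42, registers={"sp": 100})
    print(pcb)
```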

2.2 Program Counter(PC) or Instruction Pointer(IP) or Instruction Address Register(IAR) or Instruction Counter

Usually, the PC is incremented after fetching an instruction and holds the memory address of ("points to") the next instruction that would be executed.

2.3 Processor Register and Stack Pointer(SP)

Processor registers are normally at the top of the memory hierarchy and provide the fastest way to access data. The stack pointer (SP) is one such register: it holds the address of the top of the current call stack.

2.4 Call Stack

A call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, program stack, control stack, run-time stack, or machine stack, and is often shortened to just "the stack". Although maintenance of the call stack is important for the proper functioning of most software, the details are normally hidden and automatic in high-level programming languages.

Since the call stack is organized as a stack, the caller pushes the return address onto the stack, and the called subroutine, when it finishes, pulls or pops the return address off the call stack and transfers control to that address. If a called subroutine calls yet another subroutine, it pushes another return address onto the call stack, and so on, with the information stacking up and unstacking as the program dictates. If the pushing consumes all of the space allocated for the call stack, an error called a stack overflow occurs, generally causing the program to crash. Adding a subroutine's entry to the call stack is sometimes called "winding"; conversely, removing entries is "unwinding".
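
The winding and unwinding order can be observed from Python. This is a small sketch under two assumptions: the trace list is a hypothetical logging aid, and Python's RecursionError stands in for a stack overflow (the interpreter raises it at its recursion limit before the real stack is exhausted).

```python
def factorial(n, trace):
    """Compute n! recursively, logging when each call is entered and left."""
    trace.append(f"push factorial({n})")  # winding: a new frame is added
    result = 1 if n <= 1 else n * factorial(n - 1, trace)
    trace.append(f"pop factorial({n})")   # unwinding: the frame is removed
    return result


def recurse_forever(depth=0):
    # No base case, so each call adds a frame until the limit is hit.
    return recurse_forever(depth + 1)


def overflows():
    """Return True if unbounded recursion is stopped by RecursionError."""
    try:
        recurse_forever()
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    trace = []
    print(factorial(3, trace))
    print("\n".join(trace))
    print(overflows())
```

The trace shows last-in, first-out behavior: the call entered last, factorial(1), is the first one to return.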

3. Thread

3.1 User-level Thread

3.2 Kernel-level Thread

3.3 Difference between User-level thread and Kernel-level thread

USER-LEVEL THREAD	KERNEL-LEVEL THREAD
User-level threads are implemented by user-space libraries.	Kernel-level threads are implemented by the OS.
OS doesn’t recognize User-level threads.	Kernel-level threads are recognized by OS.
Implementation of User-level threads is easy.	The implementation of the Kernel-level thread is complicated.
Context switch time is shorter.	Context switch time is longer.
Context switch requires no hardware support.	Hardware support is needed.
If one user-level thread performs a blocking operation then the entire process will be blocked.	If one kernel thread performs a blocking operation then another thread can continue execution.
User-level threads are designed as dependent threads.	Kernel-level threads are designed as independent threads.
Example: Java green threads (early JVMs), user-space POSIX thread libraries.	Example: Windows, Solaris.
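
To make the user-level column more concrete, the sketch below schedules tasks entirely in user space with Python generators. It is an analogy rather than a real thread library, and the names task and run_round_robin are hypothetical. The operating system sees only one thread here; the switching happens inside the program.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task that records each step and then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """Run the tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the task until its next yield
        except StopIteration:
            continue               # task finished, do not requeue it
        ready.append(current)      # task still has work, requeue it


if __name__ == "__main__":
    log = []
    run_round_robin([task("A", 2, log), task("B", 2, log)])
    print(log)
```

If one task made a blocking call instead of yielding, no other task could run, which mirrors the blocking row in the table above.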

3.4 Thread Control Block(TCB)

A thread control block (TCB) is a data structure in the operating system kernel that contains the thread-specific information needed to manage a thread. The TCB is "the manifestation of a thread in an operating system."

4. Central Processing Unit(CPU)

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the electronic circuitry within a computer that executes the instructions that make up a computer program.

4.1 Operation

4.1.1 Fetch

The first step involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The instruction's location (address) in program memory is determined by the program counter (PC), which stores a number that identifies the address of the next instruction to be fetched.

4.1.2 Decode

The instruction that the CPU fetches from memory determines what the CPU will do. In the decode step, performed by the circuitry known as the instruction decoder, the instruction is converted into signals that control other parts of the CPU.

4.1.3 Execute

After the fetch and decode steps, the execution step is performed.
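
The three steps can be sketched as a loop over a toy instruction set. Everything here is an assumption made for illustration: the opcodes LOAD, ADD, JMP, and HALT, the single accumulator, and the tuple encoding do not correspond to any real CPU.

```python
def run(program):
    """Run a list of (opcode, operand) tuples and return the accumulator."""
    pc = 0    # program counter: index of the next instruction to fetch
    acc = 0   # a single accumulator register
    while pc < len(program):
        instruction = program[pc]      # fetch
        pc += 1                        # PC now points at the next instruction
        opcode, operand = instruction  # decode
        # execute
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JMP":
            pc = operand               # a jump simply overwrites the PC
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


if __name__ == "__main__":
    print(run([("LOAD", 2), ("ADD", 3), ("HALT", 0)]))
```

Note that the program counter is incremented right after the fetch, so a jump only has to store a new address in it.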

4.2 Structure and Implementation

5. Compare Process and Thread

Process	Thread
Processes are heavyweight operations	Threads are lightweight operations
Every process has its own memory space	Threads use the memory of the process they belong to
Interprocess communication is slow because processes have separate address spaces	Inter-thread communication is fast because threads of the same process share that process's address space
Context switching between processes is more expensive	Context switching between threads of the same process is less expensive
Processes don't share the memory with other processes	Threads share the memory with other threads of the same process
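
The memory rows of this table can be illustrated with a short Python sketch. The function names are hypothetical, and a pipe on standard output is just one possible form of interprocess communication.

```python
import subprocess
import sys
import threading


def thread_shares_memory():
    """A thread appends directly to a list owned by its process."""
    items = []
    worker = threading.Thread(target=items.append, args=("from thread",))
    worker.start()
    worker.join()
    return items


def process_needs_ipc():
    """A child process cannot touch the parent's list; it replies via a pipe."""
    items = []
    child_code = "items = ['from child']; print(items[0])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    # The child's `items` lived in its own address space and is gone now.
    return items, result.stdout.strip()


if __name__ == "__main__":
    print(thread_shares_memory())
    print(process_needs_ipc())
```

The thread changes the list in place, while the child process leaves the parent's list empty and has to send its value back through the pipe.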