How processes talk

When I was starting out on my programming journey, I often wondered how two programs written in different languages could talk to each other. If I wanted to call a Python function from a JavaScript program, how would I do that? Problems like these often plagued me. That's when I stumbled upon inter-process communication (IPC). It sounds daunting, but it's actually pretty simple. It's exactly what it says: communication between processes. But first we have to establish what exactly a process is.

What is a Process?

Put simply, a process is a program (regular code) that has been loaded into the computer's memory for execution. This means that whether it's a Python program, a JavaScript program, or a C program, it ends up as a process when you execute it. You can inspect all running processes and their states by opening your task manager.
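To make this concrete, here is a minimal sketch of a running Python program asking the operating system about itself. Every process gets a numeric ID, and it also knows the ID of the process that launched it. The function name describe_current_process is just an illustrative choice.

```python
import os

def describe_current_process() -> dict:
    """Return the ID of this process and of the process that started it."""
    return {"pid": os.getpid(), "parent_pid": os.getppid()}

# The numbers differ on every run; the pid should match what the task manager shows.
print(describe_current_process())
```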

Creating a Process

Most programming languages provide functions to create and terminate processes. Let's see how we can create a child process in JavaScript to list all files in the current directory.

const { spawn } = require('child_process');
 
function listFilesInCurrentDirectory() {
    const child = spawn('dir', {shell : true}); 
    // Replace 'dir' with 'ls' on linux. shell param must be true on windows
 
    child.stdout.on('data', (data) => {
        console.log(`Files in the current directory:\n${data}`);
    });
    
    /* Log out any error in child process */
    child.stderr.on('data', (data) => {
        console.error(`Error: ${data}`);
    });
 
    child.on('close', (code) => {
        if (code === 0) {
            console.log('Listing files completed.');
        } else {
            console.error(`Child process exited with code ${code}`);
        }
    });
}
 
listFilesInCurrentDirectory();

You need a JavaScript runtime like Node since OS utilities are not accessible from the browser.
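For comparison, the same idea can be sketched in Python with the standard subprocess module. To keep the sketch independent of the operating system, the child here is another Python interpreter rather than dir or ls. The helper name run_child is hypothetical.

```python
import subprocess
import sys

def run_child(code: str) -> tuple[int, str, str]:
    """Run a snippet of Python in a child process and collect its results."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # sys.executable is the current interpreter
        capture_output=True,           # capture stdout and stderr instead of printing them
        text=True,                     # decode the captured bytes to str
    )
    return result.returncode, result.stdout, result.stderr

exit_code, out, err = run_child("import os; print(sorted(os.listdir('.')))")
if exit_code == 0:
    print("Files in the current directory:", out)
else:
    print(f"Child process exited with code {exit_code}: {err}")
```

As in the JavaScript version, an exit code of 0 signals success and anything else signals a failure in the child.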

Now let's do a more complex calculation with NumPy in Python and consume the result in JavaScript.

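complicated.py
import json
import numpy as np

def calculate_eigenvalues():
    matrix = np.random.rand(5, 5)
    eigenvalues = np.linalg.eigvals(matrix)
    # A random real matrix can have complex eigenvalues, which JSON cannot
    # represent directly, so each value is sent as a [real, imaginary] pair.
    return [[float(v.real), float(v.imag)] for v in eigenvalues]

# json.dumps guarantees the output is something JSON.parse can read
print(json.dumps(calculate_eigenvalues()))

The JavaScript program that runs it:

const { spawn } = require('child_process');

function calculateEigenvaluesPython() {
  // Use 'python3' instead of 'python' if that is the command name on your system
  const pythonProcess = spawn('python', ['complicated.py'], {shell : true});
  let output = '';

  // stdout can arrive in several chunks, so collect them before parsing
  pythonProcess.stdout.on('data', (data) => {
    output += data.toString();
  });

  pythonProcess.stderr.on('data', (data) => {
    console.error('Error:', data.toString());
  });

  pythonProcess.on('close', (code) => {
    if (code !== 0) {
      console.error(`Python process exited with code ${code}`);
      return;
    }
    // Each eigenvalue is a [real, imaginary] pair
    const eigenvalues = JSON.parse(output);
    console.log('Eigenvalues:', eigenvalues);
  });
}

calculateEigenvaluesPython();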

Let's break down what's happening here.

  1. We have a Python program that performs a calculation with NumPy and writes the result to the console (stdout).
  2. Our JavaScript program creates a Python process, passing the filename of the Python program as an argument. This is akin to running “python complicated.py” in a shell.
  3. We can then read the output of the Python process through its stdout stream.
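The three steps above work because both sides agree on a text format for the data. Here is a minimal sketch of that handoff with Python on both ends, assuming JSON as the agreed format. The child's code and the payload are made up for illustration.

```python
import json
import subprocess
import sys

# Hypothetical child program: it writes one JSON document to stdout.
CHILD_CODE = "import json; print(json.dumps({'values': [1.5, 2.5, 3.0]}))"

def read_json_from_child() -> dict:
    """Start the child, wait for it to finish, and parse what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits with a non-zero code
    )
    return json.loads(result.stdout)

print(read_json_from_child())
```

Because the format is plain text, the child could just as well be written in any other language that can print JSON.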

Inter-process communication (IPC)

So far we have discussed what processes are and how to create them. Now let's dive deeper into inter-process communication.

Shared memory

Usually, the variables we declare in a program (local or global) are bound to that process's state. More technically, those variables live within the address space of the process. This means that when the program ends, that memory is freed and all variables are lost. It also means that the memory of one process cannot be accessed directly by another process. However, we can allocate shared memory that exists outside the state of any single process. This shared memory can be accessed by multiple processes at once and can be used as a means of communication. Let's take a look at an example of how we can send data from one process to another.

import multiprocessing
from multiprocessing import shared_memory

def child():
    # Attach to the existing shared memory by its name
    shm = shared_memory.SharedMemory("my_shared_memory")
    print(bytes(shm.buf[:5]).decode())  # prints Hello
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory("my_shared_memory", create=True, size=10)
    shm.buf[:5] = b"Hello"
    # Creating a process that executes the receiving function
    child_process = multiprocessing.Process(target=child)
    child_process.start()
    child_process.join()
    # Release our handle, then free the shared memory itself
    shm.close()
    shm.unlink()

Let's walk through this example. All we have done is create a child process and a block of shared memory named my_shared_memory. The child process uses the same name to get a reference to this shared memory. The parent writes “Hello” to the shared memory, and the child reads it and prints it to the console.

Although the example showcases a parent-child relationship between the processes, that is not necessary. To use shared memory, all the processes need is to agree on its identifier (the name parameter in this case). Multiple independent processes can all use the same name to get access to the same shared memory.
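The point that the name is the only thing that has to be agreed on can be sketched as follows. To keep the sketch short and easy to run, both handles are opened inside one program, which is an assumption made purely for illustration. In real use the second handle would be opened by a different process that was given the name. The function name roundtrip is hypothetical.

```python
from multiprocessing import shared_memory

def roundtrip(message: bytes) -> bytes:
    """Write through one handle, then read through a second one found by name."""
    # No name is given, so the library generates a unique one for us.
    writer = shared_memory.SharedMemory(create=True, size=len(message))
    try:
        writer.buf[:len(message)] = message
        # The second handle knows nothing except the name.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            return bytes(reader.buf[:len(message)])
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # free the block once nobody needs it

print(roundtrip(b"Hello"))  # prints b'Hello'
```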

Pipes

Another powerful way of communicating between two processes is a pipe. A pipe is basically a communication channel with two handles, through which data can be sent and received.

import multiprocessing
 
def child(conn):
    received_message = conn.recv()
    print("Hello", received_message) # prints Hello World
    conn.close()
 
if __name__ == "__main__":
    parent_handle, child_handle = multiprocessing.Pipe()
    
    # Creating a process that executes the receiving function
    child_process = multiprocessing.Process(target=child, args=(child_handle,))
    parent_handle.send("World")
 
    child_process.start()
    child_process.join()

In the above example, we have created a child process and passed it one end of the pipe as an argument. Both the parent and the child process can now send and receive data.

A parent-child relationship is not strictly necessary for pipes either; the other process just needs to obtain a connection handle. In practice, that is easiest between related processes, and unrelated processes typically use a named pipe instead.
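Pipes are also what connects a program to the standard input and output of a child it launches. Here is a minimal sketch of a two-way exchange over such pipes, assuming a small child program written inline. CHILD_CODE and greet_through_pipes are illustrative names.

```python
import subprocess
import sys

# Hypothetical child program: read a name from stdin, answer on stdout.
CHILD_CODE = "import sys; name = sys.stdin.read().strip(); print('Hello', name)"

def greet_through_pipes(name: str) -> str:
    """Send a name down one pipe and read the reply from the other."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,   # pipe from parent to child
        stdout=subprocess.PIPE,  # pipe from child to parent
        text=True,
    )
    # communicate() writes the input, closes the pipe, and waits for the child
    reply, _ = child.communicate(name)
    return reply.strip()

print(greet_through_pipes("World"))  # prints Hello World
```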

HTTP server (Hypertext Transfer Protocol)

An HTTP server can also be seen as a form of IPC. HTTP servers are the backbone of the web. Whenever you go to a website like wikipedia.org, you are basically sending a request to an HTTP server, which responds by sending you the HTML, CSS, and JavaScript required to render the page. In terms of processes, your browser (or the browser tab that made the request) is one process, and the server is another. Networking makes it possible for these two distant processes on different machines to communicate. Let's see how we can implement our own server in Python using the Flask library and then query it from a JavaScript client.

server.py
from flask import Flask
 
app = Flask(__name__)
 
@app.route('/')
def hello_world():
    return 'Hello, World!'
 
if __name__ == '__main__':
    app.run()

In this code, we have defined a single endpoint, which responds with “Hello, World!”.

const url = 'http://127.0.0.1:5000/'; 
/* The server is running at port 5000 */
 
fetch(url)
    .then(response => response.text())
    .then(data => {
        console.log('Response from Flask server:', data);
    })
    .catch(error => {
        console.error('Error fetching data:', error);
    });

The JavaScript client is fetching that endpoint, and displaying the results.
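The same request and response cycle can be sketched using only the Python standard library. Note the assumptions: this sketch swaps Flask for http.server, and it runs the server in a background thread of the same program so that it is self-contained, whereas a real server would be its own process. HelloHandler and fetch_hello are illustrative names.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello, World!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet

def fetch_hello() -> str:
    """Start a throwaway server, send it one GET request, return the body."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # An empty ProxyHandler makes sure the request goes straight to localhost
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/") as response:
            return response.read().decode()
    finally:
        server.shutdown()
        server.server_close()

print("Response from server:", fetch_hello())
```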

Use cases

  1. Process Creation and Binary Execution: Process creation is used to launch multiple processes, each handling specific tasks concurrently. It allows harnessing the power of multiple CPU cores and executing diverse binaries for specialized processing. For instance, a video processing application might utilize separate processes for video decoding, encoding, and rendering, making use of command line utilities like FFmpeg to handle multimedia tasks efficiently.

  2. Parallel and Distributed Computing: Shared memory facilitates efficient communication and data sharing between processes running on the same machine, since data does not have to be copied between them. This enables parallel processing of computationally intensive tasks such as large-scale data analysis, scientific simulations, and real-time data processing. When the work is spread across multiple nodes in a cluster or grid, processes on different machines cannot share memory directly and communicate over the network instead.

  3. Graphics and Visualization: Pipes and shared memory are often employed in graphics-intensive applications and data visualization tasks. For instance, a C++ program can generate data, pass it through a pipe to a Python script, and then use a plotting library like Matplotlib to draw a graph. This seamless integration of diverse tools streamlines complex data visualization workflows.

  4. Web Servers and Frontend-Backend Communication: HTTP-based communication is at the heart of web servers and web applications. It allows for seamless interaction between the frontend and backend components of web applications. Through RESTful APIs, WebSocket connections, or GraphQL, web servers can communicate with clients and other backend services, enabling dynamic web content and real-time data updates.

Synchronization

Although this is a vast topic of its own, it deserves a brief overview here. When we have multiple processes interacting asynchronously, and communicating with each other, it becomes the responsibility of the programmer to synchronize them. Consider the following scenario. In a bank account system, multiple processes may concurrently attempt to access and modify the same account balance. For example, customers can make deposits, withdrawals, and balance inquiries concurrently. Without proper synchronization, the following issues can arise:

  1. Race Conditions: Race conditions occur when multiple processes try to update the same account balance simultaneously. If two processes attempt to withdraw money from the account at the same time, the balance might end up incorrect, and one of the transactions could be lost.

  2. Inconsistent Balances: When transactions are not synchronized, reading the account balance while a deposit or withdrawal is in progress may lead to inconsistent or outdated values. The account balance might not accurately reflect the recent transactions.

  3. Deadlocks: Improper synchronization can lead to deadlocks, where processes become stuck because they are waiting for each other to release shared resources, causing the entire system to halt.
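One common remedy for the race condition described above is a lock: only the process holding it may touch the balance. Here is a minimal sketch using multiprocessing.Value, which stores a number in shared memory and comes with its own lock. The deposit function and the amounts are made up for illustration.

```python
import multiprocessing

def deposit(balance, amount, times):
    """Add `amount` to the shared balance `times` times."""
    for _ in range(times):
        # balance.value += amount is a read, an add, and a write. Holding the
        # lock keeps another process from slipping in between those steps.
        with balance.get_lock():
            balance.value += amount

if __name__ == "__main__":
    # "i" means a C int stored in shared memory, starting at 0
    balance = multiprocessing.Value("i", 0)
    workers = [
        multiprocessing.Process(target=deposit, args=(balance, 1, 1000))
        for _ in range(4)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(balance.value)  # prints 4000: no deposit is lost
```

Without the lock, two processes could read the same old balance and each write back their own result, so one of the deposits would silently disappear.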

Conclusion

In conclusion, Inter-Process Communication is a fundamental concept that empowers the development of complex and distributed software systems. Understanding the diverse IPC mechanisms and their use cases equips developers to design efficient, scalable, and robust applications that harness the full potential of modern computing infrastructures. Whether it's concurrent processing, distributed computing, or seamless frontend-backend interaction, IPC remains an essential tool in the developer's arsenal for crafting innovative and performant software solutions.