Saturday, 19 March 2016

Sub-processing with modern C++

Every now and then, developers need to execute shell commands or launch other processes from their code. The target OS usually provides APIs or system calls for this. Linux and other Unix-like systems, for example, offer several routes through their system calls and the C library (libc): execl, execle, execlp, execv, execvp, execvP (on BSD and Mac OS), system, popen and so on. Many of these run the given command through a shell, which brings its own set of security concerns. And once you need to chain two or more commands with pipes, or monitor a running process, things become quite low level and error prone. This is where the sub-processing library swoops in.
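
To make "low level and error prone" a bit more concrete, here is a rough sketch of what chaining just two commands (echo hello | tr a-z A-Z) can look like with raw POSIX calls. The function name echo_piped_through_tr is made up for this illustration and is not part of any library; error handling is deliberately minimal.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs the equivalent of `echo hello | tr a-z A-Z`
// with raw POSIX calls and returns what `tr` wrote to its stdout.
// Failures are collapsed into an empty string to keep the sketch short.
std::string echo_piped_through_tr() {
  int link[2];    // echo's stdout -> tr's stdin
  int result[2];  // tr's stdout   -> parent
  if (pipe(link) == -1) return "";
  if (pipe(result) == -1) {
    close(link[0]); close(link[1]);
    return "";
  }

  pid_t first = fork();
  if (first == 0) {
    // Child 1: stdout becomes the write end of `link`.
    dup2(link[1], STDOUT_FILENO);
    close(link[0]); close(link[1]); close(result[0]); close(result[1]);
    execlp("echo", "echo", "hello", static_cast<char*>(nullptr));
    _exit(127);  // only reached if exec failed
  }

  pid_t second = (first > 0) ? fork() : -1;
  if (second == 0) {
    // Child 2: stdin comes from `link`, stdout goes to `result`.
    dup2(link[0], STDIN_FILENO);
    dup2(result[1], STDOUT_FILENO);
    close(link[0]); close(link[1]); close(result[0]); close(result[1]);
    execlp("tr", "tr", "a-z", "A-Z", static_cast<char*>(nullptr));
    _exit(127);
  }

  // Parent: a forgotten close here means a reader never sees EOF.
  close(link[0]); close(link[1]); close(result[1]);

  std::string out;
  char chunk[256];
  ssize_t n;
  while (second > 0 && (n = read(result[0], chunk, sizeof chunk)) > 0)
    out.append(chunk, static_cast<std::size_t>(n));
  close(result[0]);

  if (first > 0) waitpid(first, nullptr, 0);
  if (second > 0) waitpid(second, nullptr, 0);
  return out;
}
```

Two pipes, two forks, eleven close calls and two waits for a two-stage pipeline; every additional stage adds more of the same bookkeeping.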

Python subprocess module

The whole idea of creating a subprocess module for C++ came from the Python subprocess module, specifically the one distributed with Python 2.7.
It has become the de facto module for executing commands in Python. It offers a rich set of APIs that cover functionality such as:
  1. Switch to execute a command from the shell or without it.
  2. Redirection of standard input/output/error streams.
  3. Piping/chaining multiple commands.
  4. Interacting with the child process by sending messages to its standard input channel.
  5. Streamed/block reading of the output.
  6. Monitoring the process using poll / wait functions.
  7. Running user-defined functions before exec'ing the child.
  8. Low-level details such as working with pipes, file descriptors etc. are completely wrapped.
These are a few of the features that came off the top of my head. For a more detailed look, check out the Python subprocess module documentation.

C++ subprocess module

I looked around quite a bit for something similar and lightweight for C++ and, surprisingly, found nothing except Boost.Process. For some reason it never got into Boost, and I didn't much like the way its source was written; it looked too complicated for the task at hand.
Also, with modern C++ (C++11 to be precise), we can actually get syntax similar to Python's. So, with the Python subprocess module as the reference and C++11 compiler support, I came up with my own version of the subprocess module.


Examples

I will jump straight into examples, since there is not much point in explaining how this works. It's basic fork-exec with quite a bit of low-level plumbing around it.
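
For readers who have not seen the fork-exec pattern before, a minimal sketch of it follows. It is not the library's code; run_and_wait is a hypothetical name chosen for this illustration, and the choice of 127 for "exec failed" simply mirrors the shell convention.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks, execs `args` in the child without a shell,
// waits for it, and returns its exit code (or -1 on failure/abnormal exit).
int run_and_wait(const std::vector<std::string>& args) {
  if (args.empty()) return -1;

  // execvp wants a NULL-terminated array of char*.
  std::vector<char*> argv;
  for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid < 0) return -1;  // fork failed
  if (pid == 0) {
    // Child: replace the process image; execvp searches PATH.
    execvp(argv[0], argv.data());
    _exit(127);  // only reached if exec failed
  }

  // Parent: block until the child terminates.
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Under these assumptions, run_and_wait({"ls", "-l"}) behaves roughly like a shell-less system("ls -l").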

To run a simple command without any extra interaction, much like the system() call:


#include "subprocess.hpp"
namespace sp = subprocess;
auto ret = sp::call({"ls", "-l"});


Now, if we want to run the same command by spawning a shell process and passing the command to it, then just do:

auto ret = sp::call({"ls", "-l"}, sp::shell{true});

We have just set the shell parameter to true. As you can see, the way we pass arguments is quite different. We will see more of it later.

Just for the sake of comparison, here is what a python program would look like:

import subprocess as sp
ret = sp.call(["ls", "-l"], shell=True)

Almost identical, isn't it?

The example below shows how to capture the output:

auto obuf = check_output({"ls", "-l"}, shell{false});
std::cout << obuf.buf.data() << std::endl;

check_output runs the command to completion and returns the data collected from the output stream of the process.
obuf is of type Buffer, which looks roughly like this:

struct Buffer {
  std::vector<char> buf;
  size_t length = 0;
};

This structure is used instead of a plain vector because the vector is grown with resize (not reserve), so its size() does not tell how many bytes of data were actually written into it. The length member tracks that separately.

We can also check the return code of the executed command via the retcode function.
For that, however, one has to use the Popen class.

auto p = Popen({"ls", "-l"}, shell{true});
int ret = p.retcode();

Now, let's see an example of redirection. Here, we redirect the error channel to the output channel.

auto p = Popen("./write_err.sh", output{"write_err.txt"}, error{STDOUT});

write_err.sh is a script that just writes to stderr. The output channel is set to write to a file named write_err.txt, and the error channel is redirected to the output channel, i.e. to the write_err.txt file.

Piping In & Piping Out

Python subprocess supports pipelining in a very intuitive way, by letting each successive command set its input channel to the output channel of the previous command. This library supports that kind of pipelining too, but also takes the syntax (in terms of ease of use) to another level.

As an example, take the Unix/Linux command below:
cat subprocess.hpp | grep template | cut -d, -f 1

Translating to cpp-subprocess:


auto cat  = Popen({"cat", "subprocess.hpp"}, output{PIPE});
auto grep = Popen({"grep", "template"}, input{cat.output()}, output{PIPE});
auto cut  = Popen({"cut", "-d,", "-f", "1"}, input{grep.output()}, output{PIPE});
auto res  = cut.communicate().first;
std::cout << res.buf.data() << std::endl;

This is pretty much the same thing you would do in Python. But by making use of variadic templates, we can chain these commands and make pipelining look like:

auto res = pipeline("cat subprocess.hpp", "grep Args", "grep template");
std::cout << res.buf.data() << std::endl;

Does that make anyone happy?
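
To give a feel for the variadic-template part, here is a sketch of how a function can accept any number of command strings in C++11 and turn each one into an argument vector. It only collects the stages and spawns nothing; make_pipeline_stages, collect_stages and split_command are hypothetical names for this illustration, not the library's internals, and the splitting is plain whitespace splitting with no quoting support.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Argv = std::vector<std::string>;

// Splits a command line on whitespace (no quoting or escaping support).
Argv split_command(const std::string& cmd) {
  std::istringstream in(cmd);
  Argv words;
  for (std::string w; in >> w;) words.push_back(w);
  return words;
}

// Base case: no commands left to collect.
inline void collect_stages(std::vector<Argv>&) {}

// Recursive case: peel off the first command, then recurse on the rest.
template <typename First, typename... Rest>
void collect_stages(std::vector<Argv>& stages, First&& first, Rest&&... rest) {
  stages.push_back(split_command(std::forward<First>(first)));
  collect_stages(stages, std::forward<Rest>(rest)...);
}

// Accepts any number of command strings and returns one Argv per stage.
template <typename... Cmds>
std::vector<Argv> make_pipeline_stages(Cmds&&... cmds) {
  std::vector<Argv> stages;
  collect_stages(stages, std::forward<Cmds>(cmds)...);
  return stages;
}
```

A real pipeline function would then walk the stages and connect each stage's output to the next stage's input, much like the three explicit Popen calls shown earlier.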

Working with environment variables

Let's see an example directly:


int st = Popen("./env_script.sh", environment{{
                                        {"NEW_ENV1", "VALUE-1"},
                                        {"NEW_ENV2", "VALUE-2"},
                                        {"NEW_ENV3", "VALUE-3"}
                                  }}).wait();

It's more about the syntax.

Other options supported are:
  1. executable - Sets the executable.
  2. cwd - Sets the current working directory.
  3. bufsize - Sets the buffer size of the input/output/error streams.
  4. environment - Sets the environment variables required by the spawned process.
  5. defer_spawn - Defers the spawning of the process. Start it explicitly with the start_process function.
  6. shell - Execute with the shell.
  7. input - Sets the input channel.
  8. output - Sets the output channel.
  9. error - Sets the error channel.
  10. close_fds - Flag to indicate whether all file descriptors need to be closed before exec'ing.
  11. preexec_func - User-defined function to be executed before exec'ing.
  12. session_leader - Flag to indicate whether the spawned process must be a session leader or not.
Most of them are similar to what the Python subprocess module supports, so it is worth reading its documentation as well, since it is more exhaustive.
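
Several of these options are typically applied in the child, between fork and exec. The sketch below illustrates that for cwd, environment, session_leader and preexec_func using plain POSIX calls. It is an assumption about the general technique, not the library's implementation: ChildOptions and spawn_with_options are hypothetical names, session_leader is assumed to map to setsid(), and the command is run through /bin/sh for brevity.

```cpp
#include <cstdlib>
#include <functional>
#include <map>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical option bundle for this sketch only.
struct ChildOptions {
  std::string cwd;                                 // empty: inherit
  std::map<std::string, std::string> environment;  // added to inherited env
  bool session_leader = false;                     // assumed to mean setsid()
  std::function<void()> preexec_func;              // runs just before exec
};

// Runs `sh -c script` with the given options applied in the child.
// Returns the exit code, or -1 on failure/abnormal exit.
int spawn_with_options(const char* script, const ChildOptions& opts) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    // Everything here affects only the child, never the parent.
    if (opts.session_leader && setsid() == -1) _exit(126);
    if (!opts.cwd.empty() && chdir(opts.cwd.c_str()) == -1) _exit(126);
    for (const auto& kv : opts.environment)
      setenv(kv.first.c_str(), kv.second.c_str(), 1);  // 1 = overwrite
    if (opts.preexec_func) opts.preexec_func();
    execl("/bin/sh", "sh", "-c", script, static_cast<char*>(nullptr));
    _exit(127);  // only reached if exec failed
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

One caveat with this approach: in a multi-threaded parent, only async-signal-safe functions are safe to call between fork and exec, so a production implementation has to be more careful than this sketch.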

About the library

  1. Does not support Windows. I do not have any expertise working on Windows.
  2. Exception handling is still primitive.
  3. More tests need to be added.
  4. Needs C++11 support to compile.
  5. Tested on Linux (Ubuntu) with g++ 4.8 and on Mac OS with clang 3.4.