Saturday 19 March 2016

Sub-processing with modern C++

Every now and then, developers need to execute shell commands or launch other processes from their code. The target OS usually provides APIs or system calls for such tasks. For example, Linux offers many ways via its system calls and C library (libc): execl, execle, execlp, execv, execvp, execvP, system, popen, etc. Some of these (system and popen, for instance) run the provided command through the shell, which has its own set of security concerns. And when it comes to chaining two or more commands via pipes, or monitoring the running process, things become quite low level and error prone. This is where the subprocess library swoops in.

Python subprocess module

The whole idea of creating a subprocess module for C++ came from the Python subprocess module, specifically the one distributed with Python 2.7.
It has become the de facto module for executing commands in Python. It offers a rich set of APIs that can be used to achieve various things, such as:
  1. Switching between executing a command through the shell or without it.
  2. Redirection of the standard input/output/error streams.
  3. Piping/chaining multiple commands.
  4. Interacting with the child process by sending messages to its standard input channel.
  5. Streamed/block reading of the output.
  6. Monitoring the process using the poll / wait functions.
  7. Running user-defined functions before exec'ing the child.
  8. Low-level details such as working with pipes, file descriptors, etc. are completely wrapped.
These are a few of the features that came off the top of my head. For a more detailed look, check out the Python documentation for the subprocess module.

C++ subprocess module

I looked around quite a bit for something similar and lightweight for C++ and, surprisingly, found nothing except Boost.Process. For some reason it never got into Boost, and I didn't like the way the source was written; it looked a bit complicated for the task at hand.
Also, with the modern C++ standard (C++11 to be precise), we can actually obtain syntax similar to Python. So, with Python subprocess as reference and with C++11 compiler support, I came up with my own version of the subprocess module.


Examples

I will jump straight into examples, since there is not much point in explaining how this works: it is basic fork-exec with quite a bit of low-level plumbing.
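
For readers who have not met fork-exec before, the idea can be sketched as follows. This is a minimal illustration using plain POSIX calls, not the library's actual code; run_and_wait is a made-up helper name, and error handling is kept to the bare minimum.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of the library): runs args[0] with the given
// arguments, waits for it, and returns its exit status, or -1 on failure.
int run_and_wait(const std::vector<std::string>& args) {
    assert(!args.empty());

    // execvp expects a null-terminated array of char*; build it before fork.
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;  // fork failed

    if (pid == 0) {
        // Child: replace the process image. execvp only returns on error.
        execvp(argv[0], argv.data());
        _exit(127);  // conventional status for "could not execute"
    }

    // Parent: block until the child terminates, then decode its status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_and_wait({"ls", "-l"}) would run ls directly, without a shell in between. Everything else (pipes, redirection, environment) is extra work done in the child between fork and exec.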

To run a simple command without any extra interaction, much like the system() call:


#include "subprocess.hpp"
namespace sp = subprocess;
auto ret = sp::call({"ls", "-l"});


Now, if we want to run the same command by spawning a shell process and passing the command to it, then just do:

auto ret = sp::call({"ls", "-l"}, shell{true});

We have just set the shell parameter to true. As you can see, the way we pass arguments is quite different from a typical C++ call. We will see more of it later.

Just for the sake of comparison, here is what a python program would look like:

import subprocess as sp
ret = sp.call(["ls", "-l"], shell=True)

Almost identical, isn't it?

The example below shows how to capture the output:

auto obuf = check_output({"ls", "-l"}, shell{false});
std::cout << obuf.buf.data() << std::endl;

check_output collects the data written to the output stream of the process and returns it once the command has finished.
obuf is of type Buffer, which looks something like this:

struct Buffer {
  std::vector<char> buf;
  size_t length = 0;
};

This structure is used instead of a plain vector because the vector is resized (not reserved) up front, so its size() does not tell how many bytes of data were actually written into it; length keeps track of that.
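
The difference between the vector's size and the number of bytes written can be seen in a small sketch. It assumes a plain POSIX read into the buffer; read_some is a hypothetical helper for illustration and is not how the library necessarily fills its Buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <unistd.h>

struct Buffer {
    std::vector<char> buf;
    std::size_t length = 0;
};

// Hypothetical helper: reads at most `capacity` bytes from fd in one call.
Buffer read_some(int fd, std::size_t capacity) {
    assert(capacity > 0);
    Buffer out;
    out.buf.resize(capacity);  // size() is now `capacity`, whatever gets read
    ssize_t n = read(fd, out.buf.data(), out.buf.size());
    // Only `length` records how many bytes actually arrived.
    out.length = (n > 0) ? static_cast<std::size_t>(n) : 0;
    return out;
}
```

After a short read, buf.size() still reports the full capacity while length holds the real byte count, which is the reason for carrying both.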

We can also check the return code of the executed command via the retcode function.
For that, however, one has to use the Popen class.

auto p = Popen({"ls", "-l"}, shell{true});
int ret = p.retcode();
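
How a return code becomes available can be sketched with waitpid. The ChildProcess class below is a hypothetical handle written for this illustration (it is not the library's Popen); it assumes the command is run through /bin/sh and reports -1 until the exit has been observed by poll or wait.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical process handle, loosely modelled on the poll/wait/retcode idea.
class ChildProcess {
public:
    explicit ChildProcess(const std::string& command) {
        assert(!command.empty());
        pid_ = fork();
        if (pid_ == 0) {
            execl("/bin/sh", "sh", "-c", command.c_str(),
                  static_cast<char*>(nullptr));
            _exit(127);  // exec failed
        }
    }

    // Non-blocking check: returns true once the child is known to have exited.
    bool poll() { return reap(WNOHANG); }

    // Blocks until the child exits and returns its exit status.
    int wait() { reap(0); return retcode_; }

    // Exit status if already observed, -1 otherwise.
    int retcode() const { return retcode_; }

private:
    bool reap(int flags) {
        if (done_) return true;
        if (pid_ < 0) { done_ = true; return true; }  // fork had failed
        int status = 0;
        pid_t r = waitpid(pid_, &status, flags);
        if (r == pid_) {
            done_ = true;
            retcode_ = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        } else if (r < 0) {
            done_ = true;  // nothing left to wait for
        }
        return done_;  // r == 0 means the child is still running
    }

    pid_t pid_ = -1;
    int retcode_ = -1;
    bool done_ = false;
};
```

The point of the sketch is that an exit status only exists after the parent has waited on the child, either by blocking or by polling.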

Now, let's see an example of redirection. Here, we will redirect the error channel to the output channel.

auto p = Popen("./write_err.sh", output{"write_err.txt"}, error{STDOUT});

write_err.sh is a script that just writes to stderr. The output channel is set to write to a file named write_err.txt, and the error channel is redirected to the output channel, i.e. to the same write_err.txt file.
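
At the file-descriptor level, this kind of redirection usually boils down to a couple of dup2 calls in the child before exec. The sketch below shows that idea under stated assumptions: run_redirected is a made-up name, the command is run through /bin/sh for brevity, and none of this is taken from the library's source.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `command` via /bin/sh with stdout written to
// `path` and stderr duplicated onto stdout. Returns the exit status or -1.
int run_redirected(const std::string& command, const std::string& path) {
    assert(!command.empty() && !path.empty());

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) _exit(126);
        dup2(fd, STDOUT_FILENO);             // roughly output{"file"}
        dup2(STDOUT_FILENO, STDERR_FILENO);  // roughly error{STDOUT}
        if (fd != STDOUT_FILENO) close(fd);  // the duplicates are enough
        execl("/bin/sh", "sh", "-c", command.c_str(),
              static_cast<char*>(nullptr));
        _exit(127);  // exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The order matters: stdout must point at the file before stderr is duplicated from it, otherwise stderr would still follow the old stdout.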

Piping In & Piping Out

Python subprocess supports pipelining in a very intuitive way, by allowing each successive command to set its input channel to the output channel of the previous command. This library supports that kind of pipelining too, and also takes the syntax (in terms of ease of use) to another level.

Take the Unix/Linux command below as an example:
cat subprocess.hpp | grep template | cut -d, -f 1

Translating to cpp-subprocess:


auto cat  = Popen({"cat", "subprocess.hpp"}, output{PIPE});
auto grep = Popen({"grep", "template"}, input{cat.output()}, output{PIPE});
auto cut  = Popen({"cut", "-d,", "-f", "1"}, input{grep.output()}, output{PIPE});
auto res  = cut.communicate().first;
std::cout << res.buf.data() << std::endl;
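
Connecting one command's output to the next command's input comes down to pipe plus dup2 underneath. Here is a rough sketch for two commands, assuming both are run through /bin/sh; pipe_two is a hypothetical name and the code is an illustration of the mechanism, not the library's implementation.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs the equivalent of `first | second` and returns
// everything `second` writes to its stdout (empty string on setup failure).
std::string pipe_two(const std::string& first, const std::string& second) {
    assert(!first.empty() && !second.empty());

    int link[2];    // first's stdout -> second's stdin
    int result[2];  // second's stdout -> parent
    if (pipe(link) != 0) return "";
    if (pipe(result) != 0) { close(link[0]); close(link[1]); return ""; }

    pid_t producer = fork();
    if (producer == 0) {
        dup2(link[1], STDOUT_FILENO);
        // Close every pipe end so that readers can see EOF later.
        close(link[0]); close(link[1]); close(result[0]); close(result[1]);
        execl("/bin/sh", "sh", "-c", first.c_str(), static_cast<char*>(nullptr));
        _exit(127);
    }

    pid_t consumer = fork();
    if (consumer == 0) {
        dup2(link[0], STDIN_FILENO);
        dup2(result[1], STDOUT_FILENO);
        close(link[0]); close(link[1]); close(result[0]); close(result[1]);
        execl("/bin/sh", "sh", "-c", second.c_str(), static_cast<char*>(nullptr));
        _exit(127);
    }

    // Parent keeps only the read end of the result pipe.
    close(link[0]); close(link[1]); close(result[1]);

    std::string out;
    char chunk[4096];
    ssize_t n;
    while ((n = read(result[0], chunk, sizeof(chunk))) > 0)
        out.append(chunk, static_cast<std::string::size_type>(n));
    close(result[0]);

    if (producer > 0) waitpid(producer, nullptr, 0);
    if (consumer > 0) waitpid(consumer, nullptr, 0);
    return out;
}
```

Forgetting to close an unused write end is the classic mistake here: the downstream command never sees end-of-file and the whole chain hangs.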

The Popen chain above is pretty much what you would write in Python as well. But, by making use of variadic templates, we can chain these commands and make pipelining look like:

auto res = pipeline("cat subprocess.hpp", "grep Args", "grep template");
std::cout << res.buf.data() << std::endl;

Does that make anyone happy?
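
To give a feel for how a variadic template can accept any number of command strings, here is a small sketch that only collects and splits them. The names split_command and pipeline_stages are hypothetical, the whitespace split is deliberately naive (no quoting or escaping), and the actual spawning is left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

using Command = std::vector<std::string>;

// Naive whitespace split; quoting and escaping are not handled.
Command split_command(const std::string& line) {
    std::istringstream in(line);
    Command words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Accepts one or more command strings and returns one argv-like list per
// pipeline stage, in order. The parameter pack is expanded into the
// initializer list, which is all the "variadic" part amounts to here.
template <typename... Rest>
std::vector<Command> pipeline_stages(const std::string& first,
                                     const Rest&... rest) {
    std::vector<Command> stages{split_command(first), split_command(rest)...};
    for (const auto& stage : stages) assert(!stage.empty());
    return stages;
}
```

A real pipeline function would then walk the stages and wire each one's output to the next one's input.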

Working with environment variables

Let's see an example directly:


int st = Popen("./env_script.sh", environment{{
                 {"NEW_ENV1", "VALUE-1"},
                 {"NEW_ENV2", "VALUE-2"},
                 {"NEW_ENV3", "VALUE-3"}
             }}).wait();

It's more about the syntax.
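
Underneath the syntax, passing variables to a child typically means changing the environment in the child between fork and exec. The sketch below assumes the new variables are added on top of the inherited environment (the post does not say whether the library merges or replaces); run_with_env and EnvList are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

using EnvList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: runs `command` via /bin/sh with the extra variables
// set in the child only. Returns the exit status or -1.
int run_with_env(const std::string& command, const EnvList& extra) {
    assert(!command.empty());

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child only: the parent's environment stays untouched.
        // Note: setenv after fork is fine for a single-threaded sketch, but
        // is not async-signal-safe in a multi-threaded program.
        for (const auto& kv : extra)
            setenv(kv.first.c_str(), kv.second.c_str(), 1);
        execl("/bin/sh", "sh", "-c", command.c_str(),
              static_cast<char*>(nullptr));
        _exit(127);  // exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A more careful version would build the full environment array before forking and hand it to execve instead.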

Other supported options are:
  1. executable - Sets the executable.
  2. cwd - Sets the current working directory.
  3. bufsize - Sets the buffer size of the input/output/error streams.
  4. environment - Sets the environment variables required by the spawned process.
  5. defer_spawn - Defers the spawning of the process. Start it explicitly with the start_process function.
  6. shell - Executes with the shell.
  7. input - Sets the input channel.
  8. output - Sets the output channel.
  9. error - Sets the error channel.
  10. close_fds - Flag to indicate whether all file descriptors need to be closed before exec'ing.
  11. preexec_func - User-defined function to be executed before exec'ing.
  12. session_leader - Flag to indicate whether the spawned process must be a session leader or not.
Most of them are similar to what the Python subprocess module supports, so it is worth reading its documentation as well, since it is more exhaustive.
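
Several of these options map onto one extra call in the child before exec. The following sketch illustrates that for three of them (cwd, session_leader and preexec_func) using chdir, setsid and a std::function hook. SpawnOptions and run_with_options are hypothetical names invented for this illustration; the library's own option handling may well differ.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical option bundle; the field names mirror the list above.
struct SpawnOptions {
    std::string cwd;                // empty: inherit the parent's directory
    bool session_leader = false;    // true: call setsid() in the child
    std::function<void()> preexec;  // empty: nothing extra to run
};

// Hypothetical helper: runs `command` via /bin/sh after applying the options
// in the child. Returns the exit status or -1.
int run_with_options(const std::string& command, const SpawnOptions& opts) {
    assert(!command.empty());

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Everything here affects only the child, and happens before exec.
        if (!opts.cwd.empty() && chdir(opts.cwd.c_str()) != 0) _exit(126);
        if (opts.session_leader) setsid();
        if (opts.preexec) opts.preexec();
        execl("/bin/sh", "sh", "-c", command.c_str(),
              static_cast<char*>(nullptr));
        _exit(127);  // exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The common thread is that the window between fork and exec is where the child gets customised.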

About the library

  1. Does not support Windows; I do not have any expertise working on Windows.
  2. Exception handling is still primitive.
  3. More tests need to be added.
  4. Needs C++11 support to compile.
  5. Tested on Linux (Ubuntu) and Mac OS with g++ 4.8 and clang 3.4 respectively.

12 comments:

  1. Does your c++ code support both win32 and posix as the boost.process does?

    Replies
    1. The answer is found at the end of the article.

    2. Ah, sorry, you're right...

    3. I just starred https://github.com/eidheim/tiny-process-library - it may be what you need, and may be useful for Arun to add Windows support

      thanks, Arun, your syntax look great and there are lot of features, so i'm interested in Windows support in your library too

    4. Thanks Zed. I will have a look at the library you mentioned and see what I can do about it.

  2. >pipeline("cat subprocess.hpp", "grep Args", "grep template");

    it looks like typo

    from your examples it's not obvious whether it's possible to communicate with stdin and stdout simultaneously. f.e if i want to generate data on the fly, pass them through bzip2 and then process the data it has produced

    Replies
    1. I am not sure if I understood completely, but 'pipeline' is just a convenience function. It works generally when the first command provided is the data generator. If you have a bit more complex requirement, you can always chain them via Popen object.

  3. This comment has been removed by a blog administrator.

  4. All of the examples have the initializer list of command and arguments hard-coded. Can this module be used to execute a command for which the command name and arguments are specified via configuration and of a variable length? How is the initializer list created for such a scenario?

    Replies
    1. Hi Lara, I do not see any reason why it should not already be handled. `Popen` has an overload which takes a string and splits it (though the split operation is not super intelligent).

    2. I found it doable but the syntax wasn't obvious to me:
      std::string mycommand = "cp /tmp/a /tmp/b";
      subprocess::Popen({mycommand});
