Sunday, 20 September 2015

Slicing in C++

For the last few days, I have been playing around with Go language. I was basically doing the algorithms part so that I could in parallel write a concise yet beautiful code in its c++11 version as well (but you know thats quite ironic..). So, it was pretty much useless thing, but I wanted to do it anyways just for the fun part. That means, don't blow me away if you don't see me using any standard library functions for achieving something which I am intending to do.


Slices in Go

If you have not yet crossed your path with Go (I suggest you do), slices are basically your 'std::vector' counterpart in Go. Now there is a huge difference technically, but it is still a data structure which you would choose in Go if you are to store elements in memory contiguously.

Slices in Go abstracts away the underlying representation of the container actually storing the elements. But, it is usually an array, because thats what you want to store elements in contiguous memory location.

You can get down to its basics by reading through the below link:

Now what can you do with this slices ? Well, lets see this below example:

func edit_slice(elems []int) {
    for idx, _ := range elems {
        elems[idx] += 3
    }
}

func main() {
    elems := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    edit_slice(elems[len(elems)/2:])
}

This is a pretty simple code which does nothing but increment each element from the last half by 3. It need not have been the last half, it could have been any 'range' (There is a reason why I have marked it as bold. Will come to that soon).

Now, lets see how someone would do that in c++11:
#include <vector>
#include <algorithm>

void edit_elems_1(std::vector<int>& vec,
                  std::vector<int>::iterator it1,
                  std::vector<int>::iterator it2)
{
    for (; it1 != it2; it1++) {
        *it1 += 3;
    }
}

void edit_elems_2(std::vector<int>& vec)
{
    std::for_each(vec.begin() + vec.size()/2, vec.end(),
                  [&](int& elem) { elem += 3; } );
}


int main() {
    std::vector<int> vec{1,2,3,4,5,6,7,8,9,10};
    edit_elems_1(vec, vec.begin() + vec.size()/2, vec.end());
    edit_elems_2(vec);
    return 0;
}

This time 'std::for_each' came to our rescue, otherwise it looks terrible ('edit_elems_1'). Even the 'std::for_each' is a bit harsh on the eyes compared to Go version.

Ranges in C++


But, this is not just me who is complaining about the inflexibility of 'iterators' in C++. Introducing the 'Range' concept is becoming more and more vocal in c++ world.

I believe it all started with the man 'Andrei Alexandrescu'. You can read through his 15 pages article to understand what he needs to convey:
http://www.informit.com/articles/article.aspx?p=1407357

Andrei implemented this concept in 'D' language. But our c++ experts wont accept defeat that easily, do they? We have got 'boost ranges' in answer to that. There is a series of posts by the author itself ('Eric Neibler') at his own weblog ('http://ericniebler.com/category/ranges/').

The implementation for the ranges in c++ seems to be very complex for me to comprehend right now. But there obviously has to be an easier way to implement what we can do in Go (or even python for that matter, but python is getting complex..huh, I will leave that for another talk).

So, what I have here is a POC for making use of slices in C++. There are many limitations in my design currently:
1) Not at all production ready. Its just a POC.
2) Some design issues. Adding new containers is bit contrived.
3) Many functionalities available in Go not yet supported.
4) Proper exception handling. With little more work it can be improved.
5) I am still not happy with the overall design. Did things in a hurry.

That said, I might not be continuing with this effort unless I really feel an urge to finish it off. So, If you guys think it would be cool or you think you have a much better way to do it, then surely let me know in the comments section.

Slices in C++

Lets just take a dive and see how an implementation of a naive binary search would look in Go and in C++ with my slice implementation.

Go binary search:
func binary_search_rec(arr []int, find int) bool {
    mid := len(arr) / 2

    if arr[mid] == find {
        return true
    }

    if mid == 0 {
        return false
    }

    if find > arr[mid] {
        return binary_search_rec(arr[mid+1:len(arr)], find)
    } else {
        return binary_search_rec(arr[0:mid], find)
    }
}

C++ binary search - with slices:
#include "slice.hpp" // My slice header

template <typename C>
bool binary_search(slice<C> slice, typename C::value_type val)
{
    int mid = len(slice)/2 ; 
                             
    if (slice[mid] == val) return true;
    if (mid == 0) return false;

    if (val > slice[mid]) return binary_search(slice(mid+1, end), val);
    else return binary_search(slice(beg,mid), val);
}

I am sure you can just take a guess and how it would look like without slices. So does this look anything close to 'cool' to you ? This technique can be used at all places where we want to operate on only a part of the data set at a time. Eg: QuickSort etc.

Also, note that the C++ version above just works well with a std::vector or std::string or std::array (Not testes, but should work), but thats not the gist of this comparison, its the terseness.

Besides these trivial examples, there are lots of cases where we need to have just the view of the container subset. For eg, while implementing string algorithms like pattern matching, you will always need to have a view of the substring of the text or whatever. The current available option is to call 'substr' method of string which will return you a copy of the string, now thats expensive. Otherwise, pass the start and end iterator which can be error prone.


Under the hood

The slice class adds a very thin wrapper around the container on which you want to create the slice. One thing it does currently is to derive from the base class i.e if you are creating a slice for a vector, then the slice<C> class is derived from vector. Here template parameter 'C' is the container itself.

Below example shows how you create a slice:

    std::vector<int> vec{1,2,3,4,5,6,7,8,9,10};

    // all elements
    auto vslice = make_slice(vec, 0, vec.size());
    assert(len(vslice) == 10);

    vslice = vslice(beg, 8); // slice of first 8 elements


1. 'make_slice' is a template function which creates a new slice for you.
2. 'operator()' is overloaded to provide the slicing functionality.
3. Two or more slices created by copy construction or by assignment shares the same container underneath since the slice just holds the reference to the container.
4. slice does not own the container. In the above case vector 'vec' is controlled by the user. If vec gets destroyed, no slice must be used after that.
5. 'len' is a global function for getting the length of the slice. Same as 'len' function in Go.

Source Code Location : https://github.com/arun11299/Slices-in-Cpp

The source is currently in a .cc file instead of a header (I am just too lazy sometimes). You can find few more examples on using slices in it.

Compilation : It should work just fine if you have enabled C++11. I have tested it on both Clang and g++ 4.8.


4 comments:

  1. You clearly missed C++11 for-range statement, that allows you to write

    for (int& elem : vec) {
    elem += 3;
    }

    which is even shorter than Go.

    ReplyDelete
    Replies
    1. Well, aren't we talking about slices here ? :)
      c++11 range for cannot be used on a slice or portion of the container. It always iterates the whole container from begin to end.
      Having said that, boost library does offer iterator range facility, but I feel it's usage bit non-intuitive.

      Delete
    2. Also, you might be interested in checking out 'array_view' feature which is part of GSL. I came to know about it last week while watching Bjarne's cppcon 2015 keynote. I think the idea behind it is the same.

      Delete