Sunday, 12 April 2015

Who moved my object? No, not std::move!

It's been quite a long time since I really wanted something to blog about. So I thought, why not write about the C++11 features which I initially found difficult to grasp or come to terms with?
It was not lambda expressions, auto type deduction, or the new C++11 threading and memory model, but something much easier (yes, easy because I understand it now ;) ): move semantics and everything related to it.
The intention of this post is to ease your way into understanding move semantics and its 'not-so-obvious' pitfalls.


Temporaries reloaded!


Anyone who has been working with C++ for some time knows what I want to say here. And you would be right, there is nothing new about temporaries as such in the new standard. What has been added is a way to keep the creation of temporaries from adding overhead to the run-time performance of your C++ code (and I know you want it as fast as possible, don't you?).
Let us consider an example:

void my_func(const std::string& name) {
    names_vec.push_back(name);
}

std::string name("Galadriel");
my_func(name);                    // Passing lvalue
my_func(std::string("Gandalf"));  // Passing temporary string object
my_func("Goku");                  // Passing string literal

This looks like a 'good enough' piece of code. Passing temporaries like this works because a temporary can bind to a const reference, and it stays alive until the call returns.
When doing 'push_back' inside the function, 'name' gets copied in all three cases. For the third case (string literal), a temporary string object additionally has to be created before it is copied into the 'names_vec' container.
Huh! Was that the best we could do? Yeah, kind of, in C++98/C++03. Hell no for C++11! We can do much better in C++11, thanks to move semantics.

Welcome to move semantics


With C++11, a new reference type that binds to temporaries was standardized. It is called an 'R-value reference' and is written 'T&&' (where T is any built-in or user-defined type).
An R-value reference binds to temporaries and to other r-values (for example the result of 'std::move'), but not to lvalues. Note that a named R-value reference variable is itself an lvalue.

void my_func(std::string&& name) {
    names_vec.push_back(name);  // NOTE: 'name' is a named variable (an lvalue) here, so this still copies
}

my_func(std::string("Vegeta"));
std::string lval("lvalue");
my_func(lval);                // Compile error! my_func cannot accept lvalue

See how the temporary string object was accepted by our new function signature, while the lvalue was rejected at compile time! What I have shown here is one way an R-value reference parameter accepts arguments, i.e. as temporaries.
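The binding rules can be checked with a small overload pair. This is only a sketch: 'Picked', 'which', 'forward_named' and 'forward_moved' are hypothetical names made up for illustration and are not part of the example above. It also shows that a named R-value reference parameter behaves as an lvalue inside the function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical tag telling which overload was selected.
enum class Picked { Lvalue, Rvalue };

// Overload pair: the compiler picks one based on the value category
// of the argument expression.
Picked which(const std::string&) { return Picked::Lvalue; }
Picked which(std::string&&)      { return Picked::Rvalue; }

// Inside this function 'name' has a name, so it is an lvalue,
// even though its declared type is std::string&&.
Picked forward_named(std::string&& name) { return which(name); }

// Same as above, but the parameter is cast back to an r-value first.
Picked forward_moved(std::string&& name) { return which(std::move(name)); }

void check_overloads() {
    std::string lval("Galadriel");
    assert(which(lval) == Picked::Lvalue);                    // lvalue argument
    assert(which(std::string("Gandalf")) == Picked::Rvalue);  // temporary
    assert(forward_named(std::string("Goku")) == Picked::Lvalue);
    assert(forward_moved(std::string("Goku")) == Picked::Rvalue);
}
```

Calling check_overloads() from main runs all four checks.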

Now, the question is, how would I make 'my_func' accept my lvalue? This is where 'std::move' comes into the picture.

What does std::move do ? 



A hell lot of stuff you might be thinking, but no! All it does is a casting operation! See it for yourselves.
Below is the implementation of std::move in g++ 4.8.2 (Cleaned up a bit for easier understanding):

  template <typename T>
  typename std::remove_reference<T>::type&&
    move(T&& t) 
    { return static_cast<typename std::remove_reference<T>::type&&>(t); }

NOTE: 'std::remove_reference' is a type trait which, as the name suggests, yields the type passed in without any reference. This is required because of the 'reference-collapsing' rule which we will discuss later; for the current context, it can be ignored.

If you ignore the 'remove_reference' type trait, the function template boils down to an unconditional cast to an R-value reference! That means 'std::move' does not 'move' anything. Whether a move actually happens is decided later by overload resolution: if the receiving type has a move constructor or move assignment operator, the cast expression selects it, otherwise the copy version is used. Doing an 'std::move' is just a way of saying that the object may be moved from.

So, I repeat the question: what would you do to make 'my_func' accept an lvalue? Yes, you 'std::move' the lvalue, which casts it to an r-value.
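That 'std::move' is only a cast can be verified directly. The following is a minimal sketch; the function names 'cast_only', 'cast_and_consume' and 'check_move_is_a_cast' are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// std::move applied to an lvalue std::string yields std::string&&:
// only the value category of the expression changes.
static_assert(
    std::is_same<decltype(std::move(std::declval<std::string&>())),
                 std::string&&>::value,
    "std::move yields an r-value reference");

// Casting alone moves nothing: no move constructor or move
// assignment operator runs on 's' before the return.
std::string cast_only() {
    std::string s("Gandalf");
    std::string&& ref = std::move(s);  // just another reference to s
    (void)ref;
    return s;                          // s still holds "Gandalf"
}

// The move happens only when the r-value reaches a move constructor
// (or move assignment operator).
std::string cast_and_consume() {
    std::string s("Gandalf");
    std::string taken(std::move(s));   // move constructor runs here
    return taken;
}

void check_move_is_a_cast() {
    assert(cast_only() == "Gandalf");
    assert(cast_and_consume() == "Gandalf");
}
```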

Where is the optimization ?



This is the meaty part of the discussion. In simple terms, moving means transferring ownership of a resource instead of duplicating it. For an object that keeps its data on the heap, the destination simply takes over the pointer to the existing memory, and the source is left without it. So, instead of copying the data to a new memory location, only a few pointers and sizes change hands. This is obviously a lot more efficient than copying!
This can be shown easily by the below diagram:



As you can see from the diagram, moving vector 'vl' does not require copying the entire vector. 
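The buffer hand-over can also be observed in code. This is a sketch under the assumption of the default allocator, where every mainstream standard library implements the vector move constructor by taking over the source's heap buffer; the function names are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// True if move-constructing a vector reuses the source's heap buffer.
bool move_reuses_buffer() {
    std::vector<int> src = {1, 2, 3, 4, 5};
    const int* before = src.data();        // address of the heap buffer
    std::vector<int> dst(std::move(src));  // move constructor
    return dst.data() == before && dst.size() == 5;
}

// True if copy-constructing a vector allocates a separate buffer
// holding equal elements.
bool copy_allocates_new_buffer() {
    std::vector<int> src = {1, 2, 3, 4, 5};
    std::vector<int> dst(src);             // copy constructor
    return dst.data() != src.data() && dst == src;
}

void check_buffers() {
    assert(move_reuses_buffer());
    assert(copy_allocates_new_buffer());
}
```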
But the same is the case if we had passed it by reference or const reference, so what is so special here? To answer this question, let's look at another slightly bigger example (read the comments in the code to understand):

#include <iostream>
#include <utility>
#include <vector>
#include <algorithm>

template<typename T>
void do_work_on_thread(T&& ip_vec) {
    std::for_each(ip_vec.begin(), ip_vec.end(), 
                    [](int v) { std::cout << v << std::endl; } );
    return;
}

void accept_vec(std::vector<int> vec) {       // First complete copy
    std::cout << "Accept vec non optimized" << std::endl;
    // do some work on the input vector
    // .....

    // Asynchronously work on the vector by feeding 
    // it to a thread.
    do_work_on_thread(vec);                   // Meant as a second copy, but T deduces to std::vector<int>&, so only a reference is passed (see UPDATE below)
    return;
}

void accept_vec2(std::vector<int>&& vec) {    // No copy, vector just moved
    std::cout << "Accept vec optimized" << std::endl;
    // do some work on the input vector
    // .....

    // Asynchronously work on the vector by feeding
    // it to a thread.
    do_work_on_thread(std::move(vec));        // No copy, vector just moved
    return;
}

int main() {
    std::vector<int> v = {1,2,3,4,5,6,7,8,9};
    accept_vec(v);              // Old style
    accept_vec2(std::move(v));  // New move optimized style
    return 0;
}

By being a bit smarter, one copy can be avoided in the 'old style', but one copy is still required to guarantee the lifetime of the vector. With move, no copy is required at all!

UPDATE:
In the comments section, Pavel raised a question on where the second copy happens in the 'accept_vec' function. And he is right on target, there is no second copy happening there. The vector gets passed as a reference (because of the reference collapsing rules, which we will probably discuss in the next post).
So, the problem has just become more severe for me. For the 'accept_vec' case, I need 'do_work_on_thread' to take its own copy of the vector, since the vector is going to be handled asynchronously and I need to guarantee its lifetime. At the same time, no copy should be made when I am moving the vector in.

Well, it seems like it can be achieved with some template magic. Here it goes:

#include <iostream>
#include <utility>
#include <type_traits>

class Test {
public:
    Test() {
        std::cout << "Cons" << std::endl;
    }   
    Test(const Test& other) {
        std::cout << "Copy Cons" << std::endl;
    }   
    Test(Test&& other) {
        std::cout << "Move cons" << std::endl;
    }   
    void operator= (const Test& other) {
        std::cout << "Assignment op" << std::endl;
    }   
    void operator= (Test&& other) {
        std::cout << "Move assignment op" << std::endl;
    }
};

template <typename T, bool U>
struct detail {
        // Constructor copies it to arg1
        detail(const T& param): arg1(param) { 
                std::cout << "Generic detail" << std::endl; 
        }
        typename std::remove_reference<T>::type arg1;
}; 

// Template specialization for detail when T is not a reference
// Here I am just assuming it to be an r-value reference, whereas
// in real scenario one should handle for other types as well.
template <typename T>
struct detail<T, false>  {
        // Constructor just moves the param to arg1
        detail(T& param): arg1(std::move(param)) { 
                std::cout << "Specialized detail" << std::endl; 
        }
        typename std::remove_reference<T>::type arg1;
};

/* Assume that this function
* will process 'arg' asynchronously.
* That means, fwd_ref must own the arg and its
* life time must not be controlled by any other frame
*/
template <typename T>
void fwd_ref(T&& arg) {
    std::cout << "Called fwd_ref" << std::endl;
    std::cout << std::is_reference<T>::value << std::endl;
    detail<T, std::is_reference<T>::value> d(arg);
    return;
}

int main() {
    Test t;
    fwd_ref(std::move(t)); // pass as R-value reference
    fwd_ref(t);            // Pass an lvalue (T deduces to Test&)
    return 0;
}


So, in the above code, the magic happens in the 'detail' struct, which copies the constructor argument when T is a reference (an lvalue was passed) and moves the argument when T is 'not-a' reference (which we treat as an r-value for simplicity).
One thing to remember here is that there are simpler ways to achieve the same with some design change. The only thing that made it complicated is the use of a 'forwarding reference', assuming it is to be used along with a thread.
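One possible design change, sketched here as an assumption rather than as the approach the post settles on, is to let the worker take its argument by value and then move from the parameter. The caller then decides between copy and move. 'Counted' and 'own_by_value' are hypothetical names used only to count the operations.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how often it is copied or moved.
struct Counted {
    static int copies;
    static int moves;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// By-value parameter: an lvalue argument is copied into 'arg',
// an r-value argument is moved into it. Either way the function
// owns 'arg' and can move it on to wherever it must live.
void own_by_value(Counted arg) {
    Counted owned(std::move(arg));  // always a move
    (void)owned;
}

void check_own_by_value() {
    Counted c;
    own_by_value(c);             // one copy (into arg) + one move
    assert(Counted::copies == 1 && Counted::moves == 1);
    own_by_value(std::move(c));  // two moves, no further copy
    assert(Counted::copies == 1 && Counted::moves == 3);
}
```

The price of this simpler design is one extra move per call compared to the template version.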


Move Constructors and Move Assignment


We all know that if a class manages a memory resource, it is important to provide the class with a copy constructor, an assignment operator and a destructor. Support for move operations in user-defined classes is provided via the 'move constructor' and 'move assignment operator'. So, the 'Big three' becomes the 'Big five' in C++11.
Let's look at an example right away (it does not use smart pointers, apologies for that) [Compiled with g++ 4.8.2]:

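#include <algorithm>   // std::copy
#include <cstddef>     // size_t
#include <cstdint>     // uint8_t
#include <iostream>
#include <utility>     // std::move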
                  
class MemoryMgr {
public:
    MemoryMgr(size_t size):mp_block(new uint8_t[size]()),
                           m_blk_size(size) {
    }
        
    MemoryMgr(const MemoryMgr& other) {
        std::cout << "Copy cons" << std::endl;
        if (this == &other) return;
        mp_block = new uint8_t[other.m_blk_size]();
        m_blk_size = other.m_blk_size;
        std::copy(other.mp_block, other.mp_block + other.m_blk_size,
                  mp_block);
    }
/*
    MemoryMgr& operator=(MemoryMgr other) {
        std::cout << "Ass op" << std::endl;
        std::swap(other.mp_block, mp_block);
        std::swap(other.m_blk_size, m_blk_size);

        return *this;
    }   
*/ 
   MemoryMgr(MemoryMgr&& other) {
        std::cout << "Move copy cons" << std::endl;
        mp_block = other.mp_block;
        m_blk_size = other.m_blk_size;
        
        other.mp_block = nullptr;
        other.m_blk_size = 0;
    }

    MemoryMgr& operator=(MemoryMgr&& other) {
        std::cout << "Move assignment op" <<  std::endl;
        if (this == &other) return *this;

        delete[] mp_block;
        mp_block = other.mp_block;
        m_blk_size = other.m_blk_size;

        other.mp_block = nullptr;
        other.m_blk_size = 0;
        return *this;  // was missing: flowing off the end of a non-void function is undefined behaviour
    }


    virtual ~MemoryMgr() noexcept {
        delete[] mp_block;
        m_blk_size = 0;
    }

private:
    uint8_t* mp_block = nullptr;
    size_t m_blk_size = 0;
};


int main() {
    MemoryMgr obj1(100);
    MemoryMgr obj2(std::move(obj1)); // Move constructor will be called
                                     // NOTE: Once moved, obj1 should not be used again inside main

    MemoryMgr obj3(40);
    obj3 = std::move(obj2);          // Move assignment will be called. Much more efficient than regular assignment op
                                     // NOTE: Once moved, obj2 should not be used again inside main

    return 0;
}

One can go through the example and check how the move constructor and move assignment operator are implemented. It should be clear that the move operations are much cheaper than the regular copy constructor and assignment operator: they only reassign a pointer and a size, whereas copying allocates a new block and copies every byte.

NOTE: I have commented out the assignment operator, since the compiler was finding it ambiguous with the move assignment operator. This is because it accepts its argument by value, which can be initialized from r-values as well. This can be corrected by making the assignment operator take its parameter by const reference. I leave this to the reader as an exercise :)
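For readers who want to compare notes on the exercise, here is one possible sketch. It uses a hypothetical class 'Block' (a trimmed-down stand-in for MemoryMgr, with accessors added only so the behaviour can be checked) whose copy assignment takes a const reference, so it no longer competes with the move assignment for r-value arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

class Block {
public:
    explicit Block(std::size_t size)
        : mp_block(new std::uint8_t[size]()), m_blk_size(size) {}

    Block(const Block& other)
        : mp_block(new std::uint8_t[other.m_blk_size]()),
          m_blk_size(other.m_blk_size) {
        std::copy(other.mp_block, other.mp_block + other.m_blk_size, mp_block);
    }

    Block(Block&& other) noexcept
        : mp_block(other.mp_block), m_blk_size(other.m_blk_size) {
        other.mp_block = nullptr;
        other.m_blk_size = 0;
    }

    // Copy assignment by const reference: chosen for lvalues only.
    Block& operator=(const Block& other) {
        if (this != &other) {
            Block tmp(other);  // copy first, so *this is untouched on failure
            std::swap(mp_block, tmp.mp_block);
            std::swap(m_blk_size, tmp.m_blk_size);
        }
        return *this;
    }

    // Move assignment: chosen for r-values, steals the buffer.
    Block& operator=(Block&& other) noexcept {
        if (this != &other) {
            delete[] mp_block;
            mp_block = other.mp_block;
            m_blk_size = other.m_blk_size;
            other.mp_block = nullptr;
            other.m_blk_size = 0;
        }
        return *this;
    }

    ~Block() { delete[] mp_block; }

    std::size_t size() const { return m_blk_size; }
    const std::uint8_t* data() const { return mp_block; }

private:
    std::uint8_t* mp_block;
    std::size_t m_blk_size;
};

void check_assignments() {
    Block a(100), b(40), c(8);
    const std::uint8_t* a_buf = a.data();
    b = a;             // copy assignment: b gets its own buffer
    assert(b.size() == 100 && b.data() != a_buf && a.size() == 100);
    c = std::move(a);  // move assignment: c takes over a's buffer
    assert(c.data() == a_buf && a.size() == 0 && a.data() == nullptr);
}
```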


Not everything is green !


I would be very skeptical if everything looked so easy and straightforward in C++. So, let's get down to the quirkiness.

1. Move is more efficient for heap based containers/objects
Though all containers in the standard library support the move operation, moving is not as cheap for every container as was shown for std::vector. For example, take the case of std::array, in which the elements are stored directly inside the std::array object.
Therefore, in this case, the move operation needs to be done for each element inside the array. If the element type provides no move constructor, then the move takes as long as a copy.
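The difference can be made visible by counting element moves. This is a small sketch with a hypothetical element type 'Counted'; it assumes a C++11 or later library, where constructing a vector with a size default-constructs the elements.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copied or moved.
struct Counted {
    static int copies;
    static int moves;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Element move-constructions caused by moving a std::array of 4.
int moves_for_array() {
    std::array<Counted, 4> src;
    Counted::moves = 0;
    std::array<Counted, 4> dst(std::move(src));  // moves element by element
    (void)dst;
    return Counted::moves;
}

// Element move-constructions caused by moving a std::vector of 4.
int moves_for_vector() {
    std::vector<Counted> src(4);
    Counted::moves = 0;
    std::vector<Counted> dst(std::move(src));    // hands over the buffer
    (void)dst;
    return Counted::moves;
}

void check_array_vs_vector() {
    assert(moves_for_array() == 4);   // one move per element
    assert(moves_for_vector() == 0);  // no element is touched
}
```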

2. Once moved, the object is in a valid but unspecified state
Consider the below example:
    std::vector<int> v1 = {1, 2, 3, 4};
    assert( v1.size() == 4 );            // Assertion is valid

    std::vector<int> v2 = std::move(v1);
    assert( v2.size() == 4 );            // Assertion is valid
  
    assert( v1.size() == 4 );            // Assertion is INVALID, size() for v1 may or may not be 4

    v1 = v2;                             // v1 can be assigned to new vector


As can be seen from the example, once an object has been moved from, it is incorrect to rely on its state, e.g. on the results of size(), empty(), etc.
The moved-from object can still be assigned to from another object of the same type, since assignment does not have any preconditions on the object's state.
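Besides assignment, other operations without preconditions, such as clear(), are also fine on a moved-from standard container. A minimal sketch of bringing a moved-from vector back into a known state ('reuse_after_move' is a hypothetical name):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Moves v1 away, then puts it back into a known state before reuse.
std::vector<int> reuse_after_move() {
    std::vector<int> v1 = {1, 2, 3, 4};
    std::vector<int> v2 = std::move(v1);
    assert(v2.size() == 4);  // the destination has the elements

    // No assumption is made about v1's contents at this point.
    v1.clear();              // no preconditions: v1 is now known to be empty
    v1.push_back(42);        // safe, the state is known again
    return v1;
}
```

After clear(), v1 can be used like a freshly constructed vector.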

3. Move constructor and Move assignment operators are not generated by default in many cases
Unlike the copy constructor and copy assignment operator, the compiler does not always (*) generate a default move constructor and default move assignment operator when they are not provided by the user.

NOTE(*): Even the default constructor, copy constructor and copy assignment operator are not implicitly created by the compiler in all cases.

a. Default move operations are generated only if the user has not explicitly declared a copy constructor, a copy assignment operator, a destructor, or the other move operation.
b. If the user has declared any of these, the compiler assumes that something different or extra needs to be done to move the resource(s) managed by the class, and does not generate the default move operations. If only a copy operation or the destructor was declared, moving such an object silently falls back to copying.
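The effect of these rules can be observed with a member that counts copies and moves. This is a sketch only; 'Tracker', 'Plain', 'WithDtor' and 'Restored' are hypothetical types made up for the illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical member type that counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Nothing user-declared: the compiler generates a move constructor.
struct Plain {
    Tracker t;
};

// User-declared destructor: no implicit move constructor,
// so "moving" falls back to the implicit copy constructor.
struct WithDtor {
    Tracker t;
    ~WithDtor() {}
};

// The move constructor can be requested back explicitly.
struct Restored {
    Tracker t;
    Restored() = default;
    Restored(Restored&&) = default;
    ~Restored() {}
};

void check_generation() {
    Plain p;
    Plain p2(std::move(p));     // implicit move constructor
    (void)p2;
    assert(Tracker::moves == 1 && Tracker::copies == 0);

    WithDtor w;
    WithDtor w2(std::move(w));  // silently copies
    (void)w2;
    assert(Tracker::moves == 1 && Tracker::copies == 1);

    Restored r;
    Restored r2(std::move(r));  // defaulted move constructor
    (void)r2;
    assert(Tracker::moves == 2 && Tracker::copies == 1);
}
```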

4. Do not move objects in return statement
Consider below example:
// Return by Copy version
Object create() {
    Object obj;
    return obj;
}


// Move Version
Object create() {
    Object obj;
    return std::move(obj);
}

Which one do you think would be more efficient, the copy version or the move version? The heading has given it away anyway :) But, for the sake of explanation, it's the copy version that would be more efficient. This is because of NRVO (Named Return Value Optimization), which every decent compiler does (mostly with optimization flags turned on). If RVO/NRVO kicks in, the compiler can completely elide the copy and move construction.

With NRVO, the object is created directly at the destination, requiring no copy at all. This is called 'copy elision'.

With the move version, the compiler does not do NRVO/RVO, and one has to depend upon the move constructor of Object.

But NRVO/RVO is not applicable at all times, e.g. when different objects are returned from the branches of an if-else statement! In this case, wouldn't it be better to move rather than depend on the compiler? Cool down. As per the standard, if copy elision is not applied, the returned local object is first treated as an r-value, i.e. 'std::move' is implicitly applied. :)
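The implicit move for the if-else case can be checked by counting copies. A minimal sketch, with the hypothetical names 'Tracker' and 'pick': whether the compiler elides or moves is up to the implementation, but no copy constructor call should be observed.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Two different locals are returned from different branches, so the
// compiler generally cannot construct both in the return slot.
Tracker pick(bool first) {
    Tracker a;
    Tracker b;
    if (first) {
        return a;  // no std::move needed: treated as an r-value
    }
    return b;      // same here
}

void check_pick() {
    Tracker r1 = pick(true);
    Tracker r2 = pick(false);
    (void)r1;
    (void)r2;
    assert(Tracker::copies == 0);  // moved or elided, never copied
}
```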

UPDATE (from Scott Meyers' Effective Modern C++): NRVO/RVO requires that we return the local object itself from the function. But this is not the case if we return the moved object: the expression 'std::move(obj)' is a reference to the object, not the object itself.

Code demonstrating the same [Compiled with g++ 4.8.2]:
#include <iostream>
#include <array>
#include <utility>  // std::move

class Object {
public:
    Object() {
        std::cout << "Cons" << std::endl;
    }   
    Object(const Object& other) {
        std::cout << "Copy Cons" << std::endl;
    }   
    Object(Object&& other) {
        std::cout << "Move cons" << std::endl;
    }   
    void operator= (const Object& other) {
        std::cout << "Assignment op" << std::endl;
    }   
    void operator= (Object&& other) {
        std::cout << "Move assignment op" << std::endl;
    }   
};

std::array<Object, 2> get_array_no_move() {
    std::cout << "Enter get_array" << std::endl;
    std::array<Object, 2> arr;
    std::cout << "Array init" << std::endl;
    arr[0] = Object();
    arr[1] = Object();
    std::cout << "Leaving" << std::endl;
    
    return arr;
}


std::array<Object, 2> get_array() {
    std::cout << "Enter get_array" << std::endl;
    std::array<Object, 2> arr;
    std::cout << "Array init" << std::endl;
    arr[0] = Object();
    arr[1] = Object();
    std::cout << "Leaving" << std::endl;
    
    return std::move(arr);
}

int main() {
    std::array<Object, 2> res = get_array();
    std::cout << "Got res" << std::endl;

    std::array<Object, 2> res2 = get_array_no_move();
    return 0;
}

If you look at the output (after compiling with -O2 optimization flag), it would be clearly visible that get_array_no_move does not have any move constructors called while returning the array object, but the move constructor gets called in case of get_array function. This is because for get_array_no_move the copy and move constructor got elided because of NRVO.

5. Move does not work with const objects
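Obviously, how can a constant object move!! Well, the technical reason is that a move may modify the source object, which is not allowed for const objects. 'std::move' on a const object yields a const r-value reference, which cannot bind to the move constructor's 'T&&' parameter but can bind to the copy constructor's 'const T&' parameter.
For this reason, the below example will compile, but it silently copies and you cannot reap the benefits of move: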
class Test {
public:
    Test(const std::string& name): m_name(std::move(name)){
    }
private:
    std::string m_name;
};
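The silent copy can be demonstrated by counting, along with one common alternative: taking the parameter by value and moving from it. This is a sketch; 'Tracker', 'ByConstRef' and 'ByValue' are hypothetical names, with 'Tracker' standing in for std::string.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Same shape as the Test class above: std::move on a const reference
// yields const Tracker&&, which only the copy constructor accepts.
class ByConstRef {
public:
    explicit ByConstRef(const Tracker& t) : m_t(std::move(t)) {}
private:
    Tracker m_t;
};

// Alternative: take by value, then move from the (non-const) parameter.
class ByValue {
public:
    explicit ByValue(Tracker t) : m_t(std::move(t)) {}
private:
    Tracker m_t;
};

void check_const_move() {
    Tracker src1;
    ByConstRef a(std::move(src1));  // still copies
    (void)a;
    assert(Tracker::copies == 1 && Tracker::moves == 0);

    Tracker src2;
    ByValue b(std::move(src2));     // move into parameter, move into member
    (void)b;
    assert(Tracker::copies == 1 && Tracker::moves == 2);
}
```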


Phew! So far so good. I was planning to continue with Forwarding references and  Perfect Forwarding in this same post, but now I think it would be better to have them in a separate post else someone would sue me for causing mental depression.

Will update the post with the link to the next post. Till then, wait-for-it.

Update : Part two of the series  

10 comments:

  1. Stephane Molina, 14 April 2015 at 04:44

    Move operators are also not generated if there's a user defined destructor

    Replies
    1. Thanks for pointing it out, Stephane. I have updated the section.

  2. Should it be a compile error here:
    do_work_on_thread(vec); // Second complete copy
    as do_work_on_thread cannot be called on l-value?

    Replies
    1. This comment has been removed by the author.

    2. I think it is a false alarm here. My fault.

      As it is a template, T for this function is std::vector& and ip_vec is std::vector& too.
      Then... Could you explain why it is "second complete copy" here, please?

    3. Something ate the std::vector template parameter. It was int.

    4. "Then... Could you explain why it is "second complete copy" here, please?"
      Hmm..I do see now why you asked this question. For making the example simple enough, I left an important part :)
      Second copy is required so that the thread doing the work has access to it even when 'accept_vec' function has finished executing since 'do_work_on_thread' is assumed to be async function i.e running on another thread.
      Current code does not show this copy happening and it must be done explicitly by the user.

    5. So, things just got more interesting :). The question boils down to, whether I can have a thread running a function with forwarding reference ?
      i.e whether I can have some magic inside the function with forward reference such that it copies the parameter when passed by value (With forward reference, pass-by-value becomes a reference due to the reference collapsing rule) and not-copy it when moved

    6. Yes, it looks like I missed the "implicit" thing: do_work_in_thread keeps working in a separate thread.
      I think in the example in "undefined behaviour": as far as accept_vec finishes, vec gets destroyed and (because it was passed by reference) do_work_in_thread works on an object that got "destroyed".
      It is my "theory". And, I agree with you, it is interesting and worthwhile checking.

    7. Yes. I have added a new example under the same section as an 'UPDATE' which takes care of this automagically.
      Note that I have used template to solve it. It could have been done with if-else as well, but that would make the code dirty when one starts handling other types.
