Sunday, 19 April 2015

Moving forward....

In the previous post, I did my best to explain 'move' semantics in C++11. In this post I plan to cover the remaining topics, namely:
1) Forwarding References
2) Reference Collapsing Rules
3) Forwarding Constructors
4) Type Limiting Forwarding Constructors
And as many examples as possible ! Also, all examples were compiled with g++ 4.8.2 and run on Ubuntu 14.02.

Since this post is a follow-up to my previous post (the subject line does not indicate that), I recommend reading that one before jumping in here.

Link to previous post : Who moved my object ? No, not std::move!

Let's get started!!

Forwarding References

Now that we are fairly comfortable with R-Value references, let's go through some very simple examples and guess (it wouldn't be a guess if you know :) ) what would happen.

class Elf {
};

void take_elf(Elf&& elf) {  //..... (1)
    // do elf stuffs
}

template <typename T>
void templated_elf(T&& param) { // .. (6)
    // do elf stuffs 
}

int main() {
    Elf&& legolas = Elf();    // .... (2)  
    
    Elf&& dobby = legolas;    // .... (3)
    
    Elf rl_dobby = legolas;   // .... (4)

    take_elf(rl_dobby);     // .... (5)

    templated_elf(rl_dobby);// .... (7) 

    return 0;
}

Let's go through the code in numbered order and find the issues as we go. Everything is done with/on class Elf.

1. It is a function taking an r-value reference of type class Elf. Looks good.

2. Variable 'legolas' is an r-value reference and we are initializing it with an unnamed (temporary) object of type class Elf. This is exactly the purpose of an r-value reference, i.e. binding to temporaries. So, all looks fine here.

[Important]
3. Here too, variable 'dobby' is an r-value reference, but in this case we are initializing it with 'legolas' instead of a temporary Elf object. The difference is that a named r-value reference is not itself an r-value. Once a temporary is bound to an r-value reference, e.g. 'legolas', any later use of that name is an l-value expression. That means 'legolas' has the type 'r-value reference to Elf' as declared at (2), but from there on it behaves just like an l-value.
Thus, this statement results in a compilation error, since we are trying to bind an l-value ('legolas') to an r-value reference ('dobby').

4. If (3) is clear, this statement should look like a regular copy initialization and there is nothing more to it. We are copy-constructing 'rl_dobby' from 'legolas'. Since 'legolas' is an l-value here, the copy constructor is used and there is no compilation error.

5. This is similar to (3). We are trying to call function 'take_elf', which expects an r-value reference as its parameter, with the l-value 'rl_dobby'. The reasons mentioned for (3) hold true here as well.
So, this results in a compilation error. You would see the same error if the call argument were replaced with 'legolas'.

6. Welcome to 'Forwarding References'. You might now ask, what is the big difference ? It looks exactly like 'take_elf' except for the fact that it is templated. Let's answer this in the next section.


Welcome to Forwarding References


So, your question was "what's the big difference? It's just templated", wasn't it ? Well, what makes it special is the template itself.

Had it been a regular function template, i.e. taking 'T' or 'T&' or 'T*' instead of 'T&&', it would have behaved (mostly) just like its explicit type counterpart, except that the template version accepts all data types.

But a template r-value reference, or forwarding reference (T&&), does not behave like its explicit type counterpart, e.g. 'take_elf'. The difference is that 'T&&' binds to r-values/temporaries like an r-value reference, but it can also bind to l-values, in which case it becomes an l-value reference!
This duality allows it to bind to both r-values and l-values.

Thus, statement (7) in the code is valid and does not result in a compilation error like (5) did. That is because of the dual nature of the forwarding reference (T&&).

Besides this, they can also bind to const, volatile and both const-volatile objects making them 'greedy functions'. It is so greedy that we have a separate section on it.
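
To make the dual nature a bit more tangible, here is a small sketch. The helper 'bound_to_lvalue' is a made-up name for this illustration (it is not part of the earlier listing); it simply reports what 'T' was deduced as for each kind of argument.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

class Elf {};

// Hypothetical helper: true when the forwarding reference was bound to an
// l-value (T deduced as Elf& or const Elf&), false for an r-value (T = Elf).
template <typename T>
bool bound_to_lvalue(T&&) {
    return std::is_lvalue_reference<T>::value;
}

void forwarding_reference_demo() {
    Elf rl_dobby;
    const Elf const_elf = Elf();

    assert(bound_to_lvalue(rl_dobby));              // an l-value is accepted
    assert(bound_to_lvalue(const_elf));             // so is a const l-value
    assert(!bound_to_lvalue(Elf()));                // a temporary is accepted
    assert(!bound_to_lvalue(std::move(rl_dobby)));  // so is a moved l-value
}
```

All four calls compile against the single 'T&&' signature, which is exactly what 'take_elf(Elf&&)' could not do.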

We have already seen one such example of forwarding reference in my previous post, where a vector was being passed to a function 'do_work_on_thread' which used a forwarding reference. (Well, that example was slightly incorrect. There is an update beneath that section).

How to write a forwarding reference ?
Quoting from Scott Meyers' 'Effective Modern C++':
"""
For a reference to be universal, type deduction is necessary, but it's not sufficient. The form of the reference declaration must also be correct, and that form is quite constrained. It must be precisely "T&&" 
""" 
NOTE: Here, universal reference means forwarding reference. It was Scott Meyers who first identified the need for separate terminology for such references, and he named them 'universal' references. Later, the standards committee adopted the term 'forwarding reference'.

Based upon the above, something like below is not a forwarding reference:

template <typename T>
void func(std::vector<T>&& arg) {...}

But this is a forwarding reference:
auto&& var = Test();
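
A rough way to check both statements yourself is sketched below. The function names 'classify' and 'auto_ref_follows_initializer' are invented for this example. Because 'std::vector<T>&&' is a plain r-value reference, an l-value has to fall through to the const reference overload, whereas 'auto&&' adapts to whatever initializes it.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// std::vector<T>&& is NOT a forwarding reference: only r-values match it.
template <typename T>
std::string classify(std::vector<T>&&) { return "rvalue"; }

// L-values therefore end up here.
template <typename T>
std::string classify(const std::vector<T>&) { return "lvalue"; }

// auto&& is deduced the same way as T&&, so it is a forwarding reference.
bool auto_ref_follows_initializer() {
    std::vector<int> v;
    auto&& from_lvalue = v;                   // std::vector<int>&
    auto&& from_rvalue = std::vector<int>();  // std::vector<int>&&
    return std::is_same<decltype(from_lvalue), std::vector<int>&>::value &&
           std::is_same<decltype(from_rvalue), std::vector<int>&&>::value;
}

void forwarding_form_demo() {
    std::vector<int> v;
    assert(classify(v) == "lvalue");
    assert(classify(std::vector<int>()) == "rvalue");
    assert(auto_ref_follows_initializer());
}
```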


Reference Collapsing Rules

There are basically four rules associated with reference collapsing. They come into play whenever a reference to a reference is formed during type deduction, which is exactly what happens with forwarding references, so they must be remembered while working with them.

1) T& &   collapses to T&
2) T&& &  collapses to T&
3) T& &&  collapses to T&
4) T&& && collapses to T&&

The easiest way to remember is that, only when there are four '&', it collapses into an r-value reference, for all other cases it collapses into an l-value reference. :)
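
The "four ampersands" rule of thumb can be checked at compile time. The sketch below uses two typedefs ('LRef' and 'RRef', names chosen just for this example) and adds a further '&' or '&&' on top of them; it compiles only if every combination collapses as described.

```cpp
#include <type_traits>

typedef int&  LRef;  // already an l-value reference
typedef int&& RRef;  // already an r-value reference

static_assert(std::is_same<LRef&,  int&>::value,  "& + &   collapses to &");
static_assert(std::is_same<RRef&,  int&>::value,  "&& + &  collapses to &");
static_assert(std::is_same<LRef&&, int&>::value,  "& + &&  collapses to &");
static_assert(std::is_same<RRef&&, int&&>::value, "&& + && collapses to &&");
```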

Below example depicts the first two rules:

#include <iostream>

template <typename T>
void func(T& val) {
    val += 10; // For the sake of simplicity,
               // hoping its always int :D
}

int main() {
    int a = 10; 
    int& ans = a;
    func(a);
    std::cout << a << " " << ans << std::endl; // prints 20 20

    int&& ans2 = std::move(a);
    func(a);
    std::cout << ans2 << std::endl; // prints 30

    return 0;
}


Below example depicts the remaining 2 rules which are more important.

#include <iostream>

template <typename T>
void func(T&& param) {
    param += 10; // Assuming its int
}

int main() {
    int a = 10; 

    func(a);
    std::cout << a << std::endl; // prints 20, Rule no 3

    func(std::move(a)) ; // Rule no 4
    return 0;
}


As you have probably gathered by now, it is because of these rules that when an l-value is passed to a forwarding reference, it is accepted as an l-value reference.

Now, you could ask, why are these rules required at all ? They are required for 'perfect forwarding' to work, which we will see in the next section.

Perfect Forwarding


Why is such a thing there in the first place ? Well, it is because of the dual nature of forwarding references which we saw earlier. Whatever was passed in, l-value or r-value, the named parameter is an l-value inside the function body. After that, there is no way (without std::forward or some template magic) to recover whether the caller passed an l-value or an r-value.

Consider below example:
void do_work(??) { // What type of argument ?
}


template <typename T>
void wrapper(T&& args) {
    // do some work on args
    do_work(args);
}

So, here you are. Inside function 'wrapper', when you call 'do_work', you want 'args' to be copied if it was passed as an l-value, or passed along as an r-value if it was passed as an r-value. Without 'std::forward', you could use some template magic (check the previous blog post for the source) to get the same effect, but that is error prone and I am not even sure it would be a practical solution.
Let's see how std::forward solves this issue.

std::forward

Like std::move, std::forward is also a function template that does nothing but cast. But unlike std::move, which does an unconditional cast, std::forward does a conditional cast.

This is what the implementation of std::forward looks like in g++ 4.8.2 (cleaned up to be visually appealing).

template <typename T>
T&&  forward(typename std::remove_reference<T>::type& t) noexcept
{ 
    return static_cast<T&&>(t); 
}

template <typename T>
T&&  forward(typename std::remove_reference<T>::type&& t) noexcept
{   
      return static_cast<T&&>(t);
}

Basically, the first function is enough for the way std::forward is used in this post, and in my experiments the second overload never got called. The second overload is selected only when the argument passed to std::forward is itself an r-value expression, for example the result of a function call. In our examples, we will consider std::forward as the first function only.
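
To see when each of the two overloads gets picked, here is a small sketch with instrumented copies of them. The name 'my_forward', the two counters and the 'Widget' type are made up for this illustration; this is not the libstdc++ code, only the same two signatures with a counter added.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

static int lvalue_overload_calls = 0;
static int rvalue_overload_calls = 0;

// Same shape as the first overload: takes a named object (l-value).
template <typename T>
T&& my_forward(typename std::remove_reference<T>::type& t) noexcept {
    ++lvalue_overload_calls;
    return static_cast<T&&>(t);
}

// Same shape as the second overload: takes an r-value expression.
template <typename T>
T&& my_forward(typename std::remove_reference<T>::type&& t) noexcept {
    ++rvalue_overload_calls;
    return static_cast<T&&>(t);
}

struct Widget {};
Widget make_widget() { return Widget(); }

void exercise_overloads() {
    Widget w;
    my_forward<Widget&>(w);             // named object: first overload
    my_forward<Widget>(make_widget());  // temporary: second overload
    my_forward<Widget>(std::move(w));   // result of std::move: second overload
}
```

Inside a function like 'wrapper', the argument is always a named parameter, which would explain why only the first overload shows up in experiments of that kind.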

Now, lets get back to our previous example and try to solve it by using std::forward.

class Test {
};

void do_work(const Test& t) { 
    Test a(t);
    //.....
}

void do_work(Test&& t) {
    // .......
}


template <typename T>
void wrapper(T&& args) {
    // do some work on args
    do_work(std::forward<T>(args));
}

We solved the problem by creating 2 overloads of the 'do_work' function. The first one accepts the argument by const l-value reference and the second one accepts it by r-value reference.
It's the job of std::forward to cast to the correct type and thereby call the correct overload.

Let's see how that is done. Consider the case of an l-value first:
Test t;
wrapper(t);

// Inside wrapper std::forward is called with template
// type as T.
// Since an l-value was passed, template type deduction makes T = Test&,
// so std::forward<T> would essentially be std::forward<Test&>

//So std::forward would be instantiated like

Test& &&  forward(typename std::remove_reference<Test&>::type& t) noexcept
{ return static_cast<Test& &&>(t); }

// Applying reference collapsing rules on it
Test& forward(Test& t) noexcept
{ return static_cast<Test&>(t); }

So, as you see for the above case, 'std::forward' casts it to an l-value reference, thereby calling the first 'do_work' overload.

Let's see the same for an r-value:
Test t;
wrapper(std::move(t));

// Inside wrapper, since an r-value was passed, template type deduction makes T = Test,
// so std::forward<T> would essentially be std::forward<Test>

// So std::forward would be instantiated like this
Test&& forward(typename std::remove_reference<Test>::type& t) noexcept {
    return static_cast<Test&&>(t);
}

// After resolving std::remove_reference (no collapsing is needed here)
Test&& forward(Test& t) noexcept {
    return static_cast<Test&&>(t);
}

As you can see, this time it correctly casts to an r-value reference, thereby calling the second overload of the 'do_work' function.
This is the reason why std::forward is called a conditional cast.
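
Putting the pieces together, below is a compact sketch of the wrapper in a form that can be checked. To make the chosen overload observable, the 'do_work' functions here return a string, which is a change made only for this illustration, and 'wrapper_no_forward' is a made-up counter-example.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Test {};

std::string do_work(const Test&) { return "copy overload"; }
std::string do_work(Test&&)      { return "move overload"; }

template <typename T>
std::string wrapper(T&& args) {
    return do_work(std::forward<T>(args));  // keeps the caller's value category
}

// Counter-example: without std::forward the named parameter is always
// an l-value, so the const Test& overload wins every time.
template <typename T>
std::string wrapper_no_forward(T&& args) {
    return do_work(args);
}

void perfect_forwarding_demo() {
    Test t;
    assert(wrapper(t) == "copy overload");
    assert(wrapper(std::move(t)) == "move overload");
    assert(wrapper(Test()) == "move overload");
    assert(wrapper_no_forward(Test()) == "copy overload");
}
```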


With that we finish the basics of forwarding references and std::forward. The thing to remember is that std::move is used on r-value references and std::forward is used on forwarding references.

Where to go from here ?

Back to your life :)
Take a break. Once you are ready to face fresh challenges, go through below resources:
1) http://thbecker.net/articles/rvalue_references/section_01.html
2) Scott Meyers "Effective Modern C++". Some examples in this post are influenced by his explanation.
3) What we have covered are the basics. There is more to move semantics depending on its usage. For eg: Move constructor and its fallacies.
4) Know about move only types like std::unique_ptr, std::thread etc and how to work with them.



Sunday, 12 April 2015

Who moved my object ? No, not std::move!

It's been quite a long time since I had something I really wanted to blog about. So, I thought why not write something on the new features introduced in C++11 which I found difficult to grasp or come to terms with initially.
It was not lambda expressions or auto type deduction or the new C++11 threading memory model, but something much easier (Yes, easy because I understand it now ;) ): move semantics and everything related to it.
The intention of this post is to ease your way into understanding move semantics and its 'not-so-obvious' fallacies.


Temporaries reloaded!


Anyone who has been working with C++ for some time knows what I want to say here. And you would be right, there is nothing new about temporaries as such in the new standard. What has been added is a way to keep the creation of temporaries from adding overhead to the run-time performance of your C++ code (and I know you want it as fast as possible, don't you?)
Let us consider an example code:

void my_func(const std::string& name) {
    names_vec.push_back(name);
}

std::string name("Galadriel");
my_func(name);                    // Passing lvalue
my_func(std::string("Gandalf"));  // Passing temporary string object
my_func("Goku");                  // Passing string literal

This looks like a 'good enough' piece of code. Passing temporaries like this works because a temporary can bind to a const reference and lives until the end of the full call expression.
When doing 'push_back' inside the function, 'name' gets copied in the first two cases, whereas for the third case (string literal), a temporary string object needs to be created before it is copied into the 'names_vec' container.
Huh! Was that the best we could do ? Yeah, kind of in C++98/C++03. Hell no! for C++11. We can do much better in C++11, thanks to move semantics.

Welcome to move semantics


With C++11, a new reference type for binding to temporaries was standardized. It is called the 'R-Value Reference', written as 'T&&' (where T is any built-in or user defined type).
R-value references bind to temporaries and to other r-value expressions.

void my_func(std::string&& name) {
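    names_vec.push_back(std::move(name)); // 'name' is an l-value inside the function; std::move (explained below) lets push_back move from it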
}

my_func(std::string("Vegeta"));
std::string lval("lvalue");
my_func(lval);                // Compile error! my_func cannot accept lvalue

You see how the temporary string object was accepted by our new function signature, while the lvalue was rejected at compile time! What I have shown you here is one way in which an R-value reference parameter accepts arguments, i.e. as temporaries.

Now, the question is, how would I make 'my_func' accept my lvalue? Here is where 'std::move' comes into picture.

What does std::move do ? 



A hell of a lot of stuff, you might be thinking, but no! All it does is a cast! See it for yourself.
Below is the implementation of std::move in g++ 4.8.2 (Cleaned up a bit for easier understanding):

  template <typename T>
  typename std::remove_reference<T>::type&&
    move(T&& t) 
    { return static_cast<typename std::remove_reference<T>::type&&>(t); }

NOTE: 'std::remove_reference' is a type trait which as the name suggests provides the actual type passed in but without the reference. This is required because of the 'reference-collapsing' rule which we would be discussing later, but for the current context, it can be ignored.

If you ignore the 'remove_reference' type trait, the function template boils down to an unconditional cast to an R-value reference! That means 'std::move' does not 'move' anything. Whether a move actually happens is decided by overload resolution: the r-value selects a move constructor or move assignment operator if one is available, otherwise a copy is made. Doing a 'std::move' is just a request to treat the object as movable.
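
A tiny sketch of that point follows; the function names are invented for this example. In the first function the result of 'std::move' is only bound to a reference, so nothing consumes it and the string is untouched. In the second, std::string's move constructor is what actually takes the contents.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Only casts: no constructor or assignment consumes the r-value,
// so the source string keeps its value.
std::string cast_only() {
    std::string s("Galadriel");
    std::string&& ref = std::move(s);
    (void)ref;
    return s;
}

// Casts and then hands the r-value to std::string's move constructor,
// which is the part that transfers the contents.
std::string cast_and_consume() {
    std::string s("Galadriel");
    std::string taken(std::move(s));
    return taken;
}

void move_is_a_cast_demo() {
    assert(cast_only() == "Galadriel");
    assert(cast_and_consume() == "Galadriel");
}
```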

So, I repeat the question: what would you do to make 'my_func' accept an lvalue ? Yes, you 'std::move' the lvalue, which casts it to an r-value.
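
As a sketch, here is 'my_func' from above in a form that can be checked. The global 'names_vec' is declared here only so that the snippet is complete, and the 'std::move' inside 'my_func' is my assumption about the intended behaviour (without it, 'push_back' would copy).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

static std::vector<std::string> names_vec;

void my_func(std::string&& name) {
    // 'name' is a named variable, hence an l-value: cast it again to move.
    names_vec.push_back(std::move(name));
}

void accept_lvalue_demo() {
    names_vec.clear();
    std::string lval("lvalue");

    my_func(std::string("Vegeta"));  // temporary: accepted as before
    my_func(std::move(lval));        // l-value cast to an r-value: now accepted

    assert(names_vec.size() == 2);
    assert(names_vec[1] == "lvalue");
    // 'lval' is valid but unspecified here, so it is not inspected.
}
```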

Where is the optimization ?



This is the meaty part of the discussion. In simple terms, to move means that the new object takes over the memory (or other resources) already allocated by the source object. So, instead of copying the contents to a new memory location, the new object simply points to their current location. This is obviously a lot more efficient than copying!
This can be shown easily by the below diagram:



As you can see from the diagram, moving vector 'vl' does not require copying the entire vector. 
But the same is true if we had passed it by reference or const reference, so what is so special here ? To answer this question, let's look at another slightly bigger example (read the comments in the code to understand):

#include <iostream>
#include <utility>
#include <vector>
#include <algorithm>

template<typename T>
void do_work_on_thread(T&& ip_vec) {
    std::for_each(ip_vec.begin(), ip_vec.end(), 
                    [](int v) { std::cout << v << std::endl; } );
    return;
}

void accept_vec(std::vector<int> vec) {       // First complete copy
    std::cout << "Accept vec non optimized" << std::endl;
    // do some work on the input vector
    // .....

    // Asynchronously work on the vector by feeding 
    // it to a thread.
    do_work_on_thread(vec);                   // Second complete copy
    return;
}

void accept_vec2(std::vector<int>&& vec) {    // No copy, vector just moved
    std::cout << "Accept vec optimized" << std::endl;
    // do some work on the input vector
    // .....

    // Asynchronously work on the vector by feeding
    // it to a thread.
    do_work_on_thread(std::move(vec));        // No copy, vector just moved
    return;
}

int main() {
    std::vector<int> v = {1,2,3,4,5,6,7,8,9};
    accept_vec(v);              // Old style
    accept_vec2(std::move(v));  // New move optimized style
    return 0;
}

By being a bit smarter, one copy can be avoided in the 'old style', but one copy is still required for the lifetime guarantees of the vector. Whereas with move, no copy is required at all!

UPDATE:
In the comments section, Pavel asked where the second copy happens in the 'accept_vec' function. And he is right on target, there is no second copy happening there. The vector gets passed as a reference (because of the reference collapsing rules, which we will probably discuss in the next post).
So, the problem has just become more severe for me. In the 'accept_vec' case, I need 'do_work_on_thread' to take its own copy of the vector, since the vector is going to be handled asynchronously and I need to guarantee its lifetime. At the same time, no copy should be made if I am moving it.

Well, it seems like it can be achieved with some template magic. Here it goes:

#include <iostream>
#include <utility>
#include <type_traits>

class Test {
public:
    Test() {
        std::cout << "Cons" << std::endl;
    }   
    Test(const Test& other) {
        std::cout << "Copy Cons" << std::endl;
    }   
    Test(Test&& other) {
        std::cout << "Move cons" << std::endl;
    }   
    void operator= (const Test& other) {
        std::cout << "Assignment op" << std::endl;
    }   
    void operator= (Test&& other) {
        std::cout << "Move assignment op" << std::endl;
    }
};

template <typename T, bool U>
struct detail {
        // Constructor copies it to arg1
        detail(const T& param): arg1(param) { 
                std::cout << "Generic detail" << std::endl; 
        }
        typename std::remove_reference<T>::type arg1;
}; 

// Template specialization for detail when T is not a reference
// Here I am just assuming it to be an r-value reference, whereas
// in real scenario one should handle for other types as well.
template <typename T>
struct detail<T, false>  {
        // Constructor just moves the param to arg1
        detail(T& param): arg1(std::move(param)) { 
                std::cout << "Specialized detail" << std::endl; 
        }
        typename std::remove_reference<T>::type arg1;
};

/* Assume that this function
* will process 'arg' asynchronously.
* That means, fwd_ref must own the arg and its
* life time must not be controlled by any other frame
*/
template <typename T>
void fwd_ref(T&& arg) {
    std::cout << "Called fwd_ref" << std::endl;
    std::cout << std::is_reference<T>::value << std::endl;
    detail<T, std::is_reference<T>::value> d(arg);
    return;
}

int main() {
    Test t;
    fwd_ref(std::move(t)); // pass as R-value reference
    fwd_ref(t);            // Pass by value
    return 0;
}


So, in the above code, the magic happens in the 'detail' struct, which copies the constructor argument when it receives a reference, and moves the argument when it is 'not-a' reference (which we treat as the move case for simplicity).
One thing to remember here is that there are other, simpler ways to achieve the same with some design change. The only thing that made it complicated is the use of a 'forwarding reference', assuming it is used along with a thread.
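
One such simpler shape, sketched here as an assumption rather than as the definitive fix, is to build an owned local object from the forwarded argument. 'std::decay' and 'std::forward' are standard; the 'Payload' counter type is made up so that the copy/move behaviour can be observed.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical payload type that counts how it gets constructed.
struct Payload {
    static int copies;
    static int moves;
    Payload() {}
    Payload(const Payload&) { ++copies; }
    Payload(Payload&&) noexcept { ++moves; }
};
int Payload::copies = 0;
int Payload::moves = 0;

template <typename T>
void fwd_ref(T&& arg) {
    // std::decay strips the reference (and cv-qualifiers), so 'owned' is
    // always a real object: copy-constructed from an l-value argument,
    // move-constructed from an r-value argument.
    typename std::decay<T>::type owned(std::forward<T>(arg));
    (void)owned;  // a real version would hand 'owned' over to the thread
}

void owned_copy_demo() {
    Payload p;

    Payload::copies = Payload::moves = 0;
    fwd_ref(p);                                        // l-value: one copy
    assert(Payload::copies == 1 && Payload::moves == 0);

    Payload::copies = Payload::moves = 0;
    fwd_ref(std::move(p));                             // r-value: one move
    assert(Payload::copies == 0 && Payload::moves == 1);
}
```

The function owns its object in both cases, which is the lifetime guarantee the asynchronous case needs, and the caller decides between copy and move simply by how the argument is passed.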


Move Constructors and Move Assignment


We all know that if a class manages a memory resource, it is important to provide the class with a copy constructor, assignment operator and destructor. Support for move operations in user defined classes comes via the 'move constructor' and 'move assignment operator'. So, the 'Big three' become the 'Big five' in C++11.
Let's look at an example right away (it does not use smart pointers, apologies for that) [Compiled with g++ 4.8.2]:

#include <algorithm>  // std::copy
#include <iostream>
#include <stddef.h>   // size_t
#include <stdint.h>   // uint8_t
#include <utility>
                  
class MemoryMgr {
public:
    MemoryMgr(size_t size):mp_block(new uint8_t[size]()),
                           m_blk_size(size) {
    }
        
    MemoryMgr(const MemoryMgr& other) {
        std::cout << "Copy cons" << std::endl;
        if (this == &other) return;
        mp_block = new uint8_t[other.m_blk_size]();
        m_blk_size = other.m_blk_size;
        std::copy(other.mp_block, other.mp_block + other.m_blk_size,
                  mp_block);
    }
/*
    MemoryMgr& operator=(MemoryMgr other) {
        std::cout << "Ass op" << std::endl;
        std::swap(other.mp_block, mp_block);
        std::swap(other.m_blk_size, m_blk_size);

        return *this;
    }   
*/ 
   MemoryMgr(MemoryMgr&& other) {
        std::cout << "Move copy cons" << std::endl;
        mp_block = other.mp_block;
        m_blk_size = other.m_blk_size;
        
        other.mp_block = nullptr;
        other.m_blk_size = 0;
    }

    MemoryMgr& operator=(MemoryMgr&& other) {
        std::cout << "Move assignment op" <<  std::endl;
        if (this == &other) return *this;

        delete[] mp_block;
        mp_block = other.mp_block;
        m_blk_size = other.m_blk_size;

        other.mp_block = nullptr;
        other.m_blk_size = 0;
        return *this;   // a non-void function must return; flowing off the end is undefined behaviour
    }


    virtual ~MemoryMgr() noexcept {
        delete[] mp_block;
        m_blk_size = 0;
    }

private:
    uint8_t* mp_block = nullptr;
    size_t m_blk_size = 0;
};


int main() {
    MemoryMgr obj1(100);
    MemoryMgr obj2(std::move(obj1)); // Move constructor will be called
                                     // NOTE: Once moved, obj1 should not be used again inside main

    MemoryMgr obj3(40);
    obj3 = std::move(obj2);          // Move assignment will be called. Much more efficient than regular assignment op
                                     // NOTE: Once moved, obj2 should not be used again inside main

    return 0;
}

One can go through the example and see how the move constructor and move assignment operator are implemented. It should be clear that the move operations are much more efficient than the regular copy constructor and assignment operator, since they only copy a pointer and a size instead of the whole block.

NOTE: I have commented out the assignment operator, since the compiler found it ambiguous with the move assignment operator. This is because it accepts its argument by value, which can accept r-values as well. This can be corrected by making the assignment operator take its parameter by const reference. I leave this to the reader as an exercise :)
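
For readers who want to compare notes on the exercise, here is one possible shape, written against a trimmed-down stand-in class. 'Block' and its members are hypothetical names, not the 'MemoryMgr' above, and this is only a sketch of the const reference approach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

class Block {
public:
    explicit Block(std::size_t size)
        : data_(new std::uint8_t[size]()), size_(size) {}

    Block(const Block& other)
        : data_(new std::uint8_t[other.size_]()), size_(other.size_) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    Block(Block&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Copy assignment by const reference: for r-value arguments the
    // Block&& overload below is the better match, so nothing is ambiguous.
    Block& operator=(const Block& other) {
        if (this != &other) {
            Block tmp(other);  // copy first; *this stays intact if this throws
            std::swap(data_, tmp.data_);
            std::swap(size_, tmp.size_);
        }
        return *this;
    }

    Block& operator=(Block&& other) noexcept {
        if (this != &other) {
            delete[] data_;
            data_ = other.data_;
            size_ = other.size_;
            other.data_ = nullptr;
            other.size_ = 0;
        }
        return *this;
    }

    ~Block() { delete[] data_; }

    std::size_t size() const { return size_; }

private:
    std::uint8_t* data_;
    std::size_t size_;
};

void assignment_demo() {
    Block a(100);
    Block b(40);
    b = a;             // l-value: copy assignment
    assert(a.size() == 100 && b.size() == 100);

    Block c(10);
    c = std::move(a);  // r-value: move assignment
    assert(c.size() == 100);
}
```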


Not everything is green !


I would be very skeptical if everything looked so easy and straightforward in C++. So, let's get down to the quirkiness.

1. Move is more efficient for heap based containers/objects
Though all containers in the standard library support the move operation, moving is not as cheap for every container as it was shown to be for std::vector. For eg, take the case of std::array, in which the container elements are stored directly inside the std::array object.
Therefore, for this case, the move operation needs to be done for each element inside the array. If the element type provides no move constructor, then the time taken is the same as for the copy operation.
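
The difference can be made visible by counting element moves. In the sketch below, 'Counted' is a made-up element type whose move constructor bumps a counter; the two helper functions are likewise invented for this example.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that counts its move constructions.
struct Counted {
    static int moves;
    Counted() {}
    Counted(const Counted&) {}
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::moves = 0;

// Moving a vector hands over its heap buffer; no element is touched.
int element_moves_for_vector() {
    std::vector<Counted> src(4);
    Counted::moves = 0;
    std::vector<Counted> dst(std::move(src));
    (void)dst;
    return Counted::moves;
}

// Moving an array has no buffer to hand over; every element is moved.
int element_moves_for_array() {
    std::array<Counted, 4> src;
    Counted::moves = 0;
    std::array<Counted, 4> dst(std::move(src));
    (void)dst;
    return Counted::moves;
}

void container_move_demo() {
    assert(element_moves_for_vector() == 0);
    assert(element_moves_for_array() == 4);
}
```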

2. Once moved from, the object is in a valid but unspecified state
Consider the below example:
    std::vector<int> v1 = {1, 2, 3, 4};
    assert( v1.size() == 4 );            // Assertion is valid

    std::vector<int> v2 = std::move(v1);
    assert( v2.size() == 4 );            // Assertion is valid
  
    assert( v1.size() == 4 );            // Assertion is INVALID, size() for v1 may or may not be 4

    v1 = v2;                             // v1 can be assigned a new value


As can be seen from the example, once an object has been moved from, it is incorrect to rely on its state, for eg: the value returned by size(), empty(), etc.
The moved-from object can be assigned a new value, since assignment does not have any preconditions.
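
Besides assignment, any other operation without preconditions also brings the object back to a known state. A minimal sketch, with 'reuse_after_move' being a name made up for this example:

```cpp
#include <cassert>
#include <utility>
#include <vector>

std::vector<int> reuse_after_move() {
    std::vector<int> v1 = {1, 2, 3, 4};
    std::vector<int> v2 = std::move(v1);
    assert(v2.size() == 4);  // the destination is fully specified

    v1.clear();              // no preconditions; v1 is now known to be empty
    v1.push_back(42);        // safe to use again from here on
    return v1;
}
```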

3. Move constructor and Move assignment operators are not generated by default in many cases
Unlike the copy constructor and copy assignment operator, the compiler does not always (*) generate a default move constructor and default move assignment operator when they are not provided by the user.

NOTE(*): The default constructor, copy constructor and copy assignment operator are not implicitly generated in all cases either.

a. Default move operations are generated only if the user has not explicitly declared a copy constructor, copy assignment operator or destructor.
b. If the user has declared any of them, default move operations are not generated. In this case the compiler assumes that something different or extra needs to be done for moving the resource(s) managed by the class.
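
The effect is easy to miss because the code still compiles: without a generated move constructor, an r-value silently goes to the copy constructor. A small sketch, where 'Tracker', 'Plain' and 'WithDtor' are types invented for this illustration:

```cpp
#include <cassert>
#include <utility>

// Hypothetical member type that counts copies and moves.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

struct Plain    { Tracker t; };                  // nothing declared: move is generated
struct WithDtor { Tracker t; ~WithDtor() {} };   // destructor declared: no generated move

template <typename Holder>
void move_construct_once() {
    Tracker::copies = Tracker::moves = 0;
    Holder source;
    Holder target(std::move(source));
    (void)target;
}

void implicit_move_demo() {
    move_construct_once<Plain>();
    assert(Tracker::moves == 1 && Tracker::copies == 0);   // member was moved

    move_construct_once<WithDtor>();
    assert(Tracker::moves == 0 && Tracker::copies == 1);   // fell back to copy
}
```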

4. Do not move objects in return statement
Consider below example:
// Return by Copy version
Object create() {
    Object obj;
    return obj;
}


// Move Version
Object create() {
    Object obj;
    return std::move(obj);
}

Which one do you think would be more efficient, the copy version or the move version ? The heading would have given it away anyway :) But, for the sake of explanation, it's the copy version that is more efficient. This is because of NRVO (Named Return Value Optimization), which every decent compiler does (mostly with optimization flags turned on). If RVO/NRVO kicks in, the compiler can completely elide the copy and move construction.

With NRVO, the object is created directly at the destination, requiring no copy at all. This is called 'copy elision'.

With the move version, the compiler does not do NRVO/RVO, so one has to depend upon the move constructor of Object.

But NRVO/RVO is not applicable at all times, for eg: when different objects are returned from if-else branches! In that case, wouldn't it be better to move rather than depend on the compiler ?? Cool down. As per the standard, if copy elision is not applied, the returned local object is first treated as an r-value, i.e. as if 'std::move' had been applied implicitly. :)
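
A quick sketch of the if-else case: the 'Object' below is a stripped-down counting version made up for this example, and 'pick' is a hypothetical function. No 'std::move' is written anywhere, yet the copy constructor is never used.

```cpp
#include <cassert>

// Hypothetical counting type for this illustration.
struct Object {
    static int copies;
    static int moves;
    Object() {}
    Object(const Object&) { ++copies; }
    Object(Object&&) noexcept { ++moves; }
};
int Object::copies = 0;
int Object::moves = 0;

Object pick(bool first) {
    Object a;
    Object b;
    if (first) {
        return a;  // a plain return of a local: treated as an r-value
    }
    return b;      // same here, no std::move needed
}

void branch_return_demo() {
    Object::copies = Object::moves = 0;
    Object chosen = pick(true);
    (void)chosen;
    // How many moves happen depends on what the compiler elides,
    // but a copy is never needed.
    assert(Object::copies == 0);
}
```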

UPDATE (From Scott Meyers' 'Effective Modern C++') : NRVO/RVO requires that we return the local object itself from the function. But this is not the case if we return the moved object. The expression 'std::move(obj)' is a reference to the object, not the object itself.

Code demonstrating the same [Compiled with g++ 4.8.2]:
#include <iostream>
#include <array>

class Object {
public:
    Object() {
        std::cout << "Cons" << std::endl;
    }   
    Object(const Object& other) {
        std::cout << "Copy Cons" << std::endl;
    }   
    Object(Object&& other) {
        std::cout << "Move cons" << std::endl;
    }   
    void operator= (const Object& other) {
        std::cout << "Assignment op" << std::endl;
    }   
    void operator= (Object&& other) {
        std::cout << "Move assignment op" << std::endl;
    }   
};

std::array<Object, 2> get_array_no_move() {
    std::cout << "Enter get_array" << std::endl;
    std::array<Object, 2> arr;
    std::cout << "Array init" << std::endl;
    arr[0] = Object();
    arr[1] = Object();
    std::cout << "Leaving" << std::endl;
    
    return arr;
}


std::array<Object, 2> get_array() {
    std::cout << "Enter get_array" << std::endl;
    std::array<Object, 2> arr;
    std::cout << "Array init" << std::endl;
    arr[0] = Object();
    arr[1] = Object();
    std::cout << "Leaving" << std::endl;
    
    return std::move(arr);
}

int main() {
    std::array<Object, 2> res = get_array();
    std::cout << "Got res" << std::endl;

    std::array<Object, 2> res2 = get_array_no_move();
    return 0;
}

If you look at the output (after compiling with -O2 optimization flag), it would be clearly visible that get_array_no_move does not have any move constructors called while returning the array object, but the move constructor gets called in case of get_array function. This is because for get_array_no_move the copy and move constructor got elided because of NRVO.

5. Move does not work with const objects
Obviously, how can a constant object move!! Well, the technical reason is that a move may modify the source object, which a const object does not allow. So, 'std::move' on a const object yields a const r-value, which cannot bind to a move constructor and silently falls back to the copy constructor.
For this reason, the below example will compile, but you cannot reap the benefits of move:
class Test {
public:
    Test(const std::string& name): m_name(std::move(name)){
    }
private:
    std::string m_name;
};
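
Here is a sketch of both halves of that statement. The function 'after_const_move' shows that a const source is left untouched (so it was copied), and the class 'Character' (a made-up name) shows one commonly used alternative, taking the parameter by value and moving from it. Treat it as one option under these assumptions, not the only fix.

```cpp
#include <cassert>
#include <string>
#include <utility>

// std::move on a const string gives a const r-value; std::string's move
// constructor cannot accept it, so the copy constructor runs instead.
std::string after_const_move() {
    const std::string source("Thranduil, King of the Woodland Realm");
    std::string copy(std::move(source));
    (void)copy;
    return source;  // still holds its full value
}

class Character {
public:
    // By value: the parameter is a non-const local, so it can be moved from.
    explicit Character(std::string name) : m_name(std::move(name)) {}
    const std::string& name() const { return m_name; }
private:
    std::string m_name;
};

void const_move_demo() {
    assert(after_const_move() == "Thranduil, King of the Woodland Realm");

    Character legolas(std::string("Legolas"));
    assert(legolas.name() == "Legolas");
}
```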


Phew! So far so good. I was planning to continue with Forwarding references and  Perfect Forwarding in this same post, but now I think it would be better to have them in a separate post else someone would sue me for causing mental depression.

Will update the post with the link to the next post. Till then, wait-for-it.

Update : Part two of the series