- Part-1 : Getting to know the basics
- Part-3 : Performance benchmarks
In the
previous post we saw some basic type traits and metafunctions that are used in the implementation of
std::mem_fn and
std::function. The main gist of this post is to understand the implementation of
std::function in
libstd++.
Also, I would have liked to talk about things like:
- The performance penalty of using std::function.
- When to use std::function.
- Some performance benchmarks.
I will leave the above topics as the centre of discussion in the third part of the series (This was supposed to be the sentinel post as per my original plan). Discussing them here will just make this post a torture to read.
A naive implementation
Library development in C++ is tough. It becomes even more intricate when it is supposed to be generic. The most difficult part or rather detail oriented part is the type system itself. Dealing with the types together with their qualifications results in lot of metaprogramming boilerplate code and if you forget something it
could lead to some nasty UB.
So, let's first try to implement our own but rather very simple version of
std::function. I will be skipping
a-great-many required details to keep the code small.
The requirements
1. Should be able to invoke any callable target.
2. The underlying type of the callable object must be hidden i.e the only type information that should be visible is the pure function signature only.
3. The target callable object must be valid as long as the function object is valid.
I have kept bare minimum requirements for my implementation to keep off from doing metaprogramming hackery which we would be seeing anyways while going through the implementation of
std::function.
The first attempt
This version just implements the basic interface of Function and also implements the specialization for function pointer. In the later versions (or attempts), the interface developed for Function would more or less remain the same only the specialization required to handle other types of callable objects would be handled.
template <typename> class Function;
template <typename Ret, typename... Args>
class Function<Ret(Args...)>
{
public:
using implementation = detail::function_impl_base<Ret, Args...>;
//Default constructor
Function() = default;
template <typename Callable>
Function(Callable f): // Requires callable be copyable
impl_base_(std::make_unique<detail::func_impl<Callable>>(
std::move(f), detail::constructor_tag()))
{}
// Copy constructor
Function(const Function& other)
{
this->impl_base_.reset(other.impl_base_->clone());
}
// Copy Assignment
Function& operator=(Function other)
{
this->impl_base_.reset(other.impl_base_.release());
return *this;
}
template <typename Callable>
Function& operator=(Callable cb)
{
this->impl_base_ =
std::make_unique<detail::func_impl<Callable>>(
std::move(cb), detail::constructor_tag());
return *this;
}
Ret operator()(Args&&... args)
{
assert (impl_base_.get());
return (*impl_base_)(std::forward<Args>(args)...);
}
private:
std::unique_ptr<implementation> impl_base_ = nullptr;
};
Nothing much fancy. We have the same signature as that of
std::function i.e the type
Function is not aware about the type of the
call target. That means that for function pointer or member function pointer or functor, the syntax of the Function object declaration would be exactly the same (provided the function signature is same of course) even though the actual types of all these callable objects are different. This is achieved using a technique called
'Type Erasure'.
Type Erasure
I am not intending to explain type-erasure in detail here. There are already a lot of content on it floating on the web. boost::any is one of the very basic example of type-erasure. shared_ptr, std::function are the other that I know of in the standard library that makes use of this technique.
Type-erasure in it's basic form is implemented by making use of polymorphism and thus requires dynamic memory allocation. This is how my implementation of Function does the type-erasure.
In the above code function_impl_base is the base class defining the interface. The derived classes implementing the specializations for different types of Callable types must implement the functions provided by the interface.
namespace detail {
struct constructor_tag {};
template <typename Ret, typename... Args>
class function_impl_base
{
public:
virtual Ret operator()(Args&&... args) const = 0;
virtual function_impl_base* clone() = 0;
virtual ~function_impl_base() {}
};
template <typename T> class func_impl;
// Specialization for Function pointers
template <typename Ret, typename... Args>
class func_impl<Ret(*)(Args...)> final : public function_impl_base<Ret, Args...>
{
public:
using callable_t = Ret(*)(Args...);
/*
* This is a greedy constructor. constructor_tag is just used so as
* to not interfere with regular copy-constructor.
*/
template <typename F>
func_impl(F&& callable, constructor_tag): callable_(std::move(callable))
{}
func_impl(const func_impl&) = default;
func_impl& operator=(func_impl&) = default;
Ret operator()(Args&&... args) const
{
return callable_(std::forward<Args>(args)...);
}
func_impl* clone()
{
return (new func_impl(*this));
}
private:
callable_t callable_;
};
}
Let's now look at the constructor of
Function in a bit more detail:
template <typename Callable>
Function(Callable f): // Requires callable be copyable
impl_base_(std::make_unique<detail::func_impl<Callable>>(std::move(f), detail::constructor_tag()))
{}
The constructor is a template constructor and is the only part of
Function class (lets forget about assignment and copy constructors for the time being) which actually knows about the type of the callable object. But this knowledge is lost as soon as the constructor exits. But, we need to save this information in some form, right ? This is where inheritance comes into picture. Using inheritance, the type of the information is saved as RTTI (Run Time Type Information) probably in the
vtable. This is exactly the purpose of
function_impl_base.
The class
func_impl<Ret(*)(Args...)> inherits from
function_impl_base and implements the call operator.
The final attempt
Not much difference, only added specialization for member function pointer and functors:
namespace detail {
template <typename Ret, typename... Args>gt;
class function_impl_base
{
public:
virtual Ret operator()(Args&&... args) = 0;
virtual function_impl_base* clone() = 0;
virtual ~function_impl_base() {}
};
template <typename Signature, typename Functor>gt; class func_impl;
// Specialization for Function pointers
template <typename Ret, typename... Args>gt;
class func_impl<Ret(Args...), Ret(*)(Args...)>gt; final
: public function_impl_base<Ret, Args...>gt;
{
public:
using callable_t = Ret(*)(Args...);
// The need for constructor tag removed in this version
func_impl(callable_t callable): callable_(std::move(callable))
{}
func_impl(const func_impl&) = default;
func_impl& operator=(func_impl&) = default;
Ret operator()(Args&&... args)
{
return callable_(std::forward<Args>gt;(args)...);
}
func_impl* clone()
{
return (new func_impl(*this));
}
private:
callable_t callable_;
};
// Specialization for Functors
template <typename Functor, typename Ret, typename... Args>gt;
class func_impl<Ret(Args...), Functor>gt;
: public function_impl_base<Ret, Args...>gt;
{
public:
using callable_t = Functor;
func_impl(Functor callable): callable_(std::move(callable))
{}
func_impl(const func_impl&) = default;
func_impl& operator=(func_impl&) = default;
Ret operator()(Args&&... args)
{
return callable_(std::forward<Args>gt;(args)...);
}
func_impl* clone()
{
return (new func_impl(*this));
}
private:
callable_t callable_;
};
// Specialization for Member function pointers
// Leveraging the use of mem_fn
template <typename Class, typename Member, typename Ret, typename... Args>gt;
class func_impl<Ret(Args...), Member Class::*>gt;
: public function_impl_base<Ret, Args...>gt;
{
public:
using callable_t = Member (Class::*);
func_impl(callable_t callable): callable_(std::move(callable))
{}
func_impl(const func_impl&) = default;
func_impl& operator=(func_impl&) = default;
Ret operator()(Args&&... args)
{
return call(std::forward<Args>gt;(args)...);
}
template <typename ClassType, typename... Params>gt;
Ret call(ClassType obj, Params&&... args)
{
return std::mem_fn(callable_)(obj, std::forward<Params>gt;(args)...);
}
func_impl* clone()
{
return (new func_impl(*this));
}
private:
callable_t callable_;
};
}
template <typename>gt; class Function;
template <typename Ret, typename... Args>gt;
class Function<Ret(Args...)>gt;
{
public:
using implementation = detail::function_impl_base<Ret, Args...>gt;;
using call_signature = Ret(Args...);
//Default constructor
Function() = default;
// Copy constructor
Function(const Function& other)
{
this->gt;impl_base_.reset(other.impl_base_->gt;clone());
}
// Copy Assignment
Function& operator=(Function other)
{
this->gt;impl_base_.reset(other.impl_base_.release());
return *this;
}
template <typename Callable>gt;
Function& operator=(Callable cb)
{
this->gt;impl_base_ =
std::make_unique<detail::func_impl<call_signature, Callable>gt;>gt;(std::move(cb));
return *this;
}
// constructor
template <typename Callable>gt;
Function(Callable f): // Requires callable be copyable
impl_base_(std::make_unique<detail::func_impl<call_signature, Callable>gt;>gt;(std::move(f)))
{
}
Ret operator()(Args&&... args)
{
assert (impl_base_.get());
return (*impl_base_)(std::forward<Args>gt;(args)...);
}
explicit operator bool() const noexcept
{
return impl_base_ != nullptr;
}
private:
std::unique_ptr<implementation>gt; impl_base_ = nullptr;
};
The above code can be accessed from
HERE.
Issues with the custom implementation
1. Memory allocation. Dynamic memory allocation is costly. It is probably ok if you know you are not going to create many instances of Function object and pass it around. But it's a big no-no in a performance critical code. This can be resolved in some cases by doing SBO (Small Buffer Optimization), which is what almost all implementation does.
2. No type checking at all. Since the type of the actual callable object gets erased after constructor and assignment operators. It is better to check whether the user is correctly using the function object. If not, it should be a compile time error rather than leading to subtle undefined behaviour in the code.
3. Overhead of a virtual function call and that of an additional call to the callable object. This can be pretty bad, if you going to call your implementation functions via Function.
4. The cost of generic programming. This is my personal opinion, customized solution can be made faster than the above Function implementation, but when you have to use something like Function as part of your interface to client code, then this is the way we have to do and improve upon it. This is something which I would like to touch upon in the final part of this series.
The Implementation of std::function
In this section we will see the implementation of
std::function in
libstd++ 6.1. We will be making use of some of the type traits discussed in
Part-1 and will see some new ones here. Let's first lookup at some concepts and tricks individually and connect them up later.
Location Invariance
The metafunction for this check is:
template <typename T>
struct is_location_invariant
: is_trivially_copyable<T>::type
{ };
This is one of the checks that is made to determine if SBO can be performed on the passed callable type or not. This is mainly relevant for Functors, since function pointer and member function pointers are trivially copyable. So, it is necessary that one must make sure that his/her functor implements the
TriviallyCopyable concept.
For the lazy people (from cppreference) :
- Every copy constructor is Trivial or deleted (since C++14)
- Every move constructor is Trivial or deleted (since C++14)
- Every copy assignment operator is Trivial or deleted (since C++14)
- Every move assignment operator is Trivial or deleted (since C++14)
- at least one copy constructor, move constructor, copy assignment operator, or move assignment operator is non-deleted
| (since C++14) |
- Trivial non-deleted (since C++14) destructor
This implies that the class has no virtual functions or virtual base classes.
Scalar types and arrays of TriviallyCopyable
objects are TriviallyCopyable
as well, as well as the const-qualified (but not volatile-qualified) versions of such types.
No-Copy Types
This is nothing but the types of callables that does not need to be copied:
class Undefined_class;
union Nocopy_types
{
void* M_object;
const void* M_const_object;
void (*M_function_pointer)();
void (Undefined_class::*M_member_pointer)();
};
This basically means, below types are no-copy types:-
- Pointer to a callable object.
- Pointer to a const callable object.
- A function pointer.
- A member function pointer.
The lack of any template parameters should give you an hint that the actual types mentioned inside the union is not going to be used. Maybe it's the size of the union ? We will get to that shortly.
Small Buffer Optimization
Let's see the data structure employed for the all important small buffer optimization by libstd++
union Any_data
{
void* M_access() { return &M_pod_data[0]; }
const void* M_access() const { return &M_pod_data[0]; }
template <typename T>
T& M_access() {
return *static_cast <T*>(M_access());
}
template <typename T >
const T& M_access() const {
return *static_cast<const T*>(M_access());
}
Nocopy_types M_unused;
char M_pod_data[sizeof(Nocopy_types)];
};
It is just a char buffer having a size of sizeof(Nocopy_types) bytes, which should be 8 bytes on 64 bit platform. This is another constraint that the functor must have other than the location invariance, for the implementation to perform SBO.
If the size of your functor is more than 8 bytes, then you will have to pay for one dynamic allocation.
The Function_base class
This class basically deals with the small buffer or Any_data directly. It makes the decision of what needs to be stored in Any_data i.e whether it should be the callable object as it is or the memory location of the callable object after allocating it on heap.
There are 2 inner classes named Base_manager and Ref_manager (a specialization for reference_wrapped functors), which does the actual dealing with Any_data.
Lets dig into the implementation of Base_manager, which would give us a lot of idea on what is happening behind the scenes.
template<typename Functor>
class Base_manager
{
protected:
static const bool stored_locally =
(is_location_invariant<Functor>::value
&& sizeof(Functor) <= M_max_size // 8 bytes on 64 bit
&& alignof(Functor) <= M_max_align
&& (M_max_align % alignof(Functor) == 0));
typedef integral_constant<bool, stored_locally> Local_storage;
// Retrieve a pointer to the function object
static Functor*
M_get_pointer(const Any_data& source)
{
const Functor* ptr =
stored_locally? std::addressof(source.M_access<Functor>())
/* have stored a pointer */ : source.M_access<Functor*>();
return const_cast<Functor*>(ptr);
}
// Clone a location-invariant function object that fits within
// an _Any_data structure.
static void
M_clone(Any_data& dest, const Any_data& source, true_type)
{
new (dest.M_access()) Functor(source.M_access<Functor>());
}
// Clone a function object that is not location-invariant or
// that cannot fit into an _Any_data structure.
static void
M_clone(Any_data& dest, const Any_data& source, false_type)
{
dest.M_access<Functor*>() =
new Functor(*source.M_access<Functor*>());
}
// Destroying a location-invariant object may still require
// destruction.
static void
M_destroy(Any_data& victim, true_type)
{
victim.M_access<Functor>().~Functor();
}
// Destroying an object located on the heap.
static void
M_destroy(Any_data& victim, false_type)
{
delete victim.M_access<Functor*>();
}
public:
static bool
M_manager(Any_data& dest, const Any_data& source,
Manager_operation op)
{
//skipped
........
}
static void
M_init_functor(Any_data& functor, Functor&& f)
{ M_init_functor(functor, std::move(f), Local_storage()); }
private:
static void
M_init_functor(Any_data& functor, Functor&& f, true_type)
{ new (functor.M_access()) Functor(std::move(f)); }
static void
M_init_functor(Any_data& functor, Functor&& f, false_type)
{ functor.M_access<Functor*>() = new Functor(std::move(f)); }
};
Now that's something. The integral constant
Local_storage is set to true if the
functor can be stored inside the
Any_data buffer else it will be false and the other member functions gets called based on that using
tag-dispatching. For eg. check out the definitions for
M_clone member function.
Let's look at functions
M_init_functor and
M_get_pointer in bit more detail to understand how the callable object is stored and retrieved by std::function.
static void
M_init_functor(Any_data& functor, Functor&& f)
{ M_init_functor(functor, std::move(f), Local_storage()); }
static void
M_init_functor(Any_data& functor, Functor&& f, true_type)
{ new (functor.M_access()) Functor(std::move(f)); }
static void
M_init_functor(Any_data& functor, Functor&& f, false_type)
{ functor.M_access<Functor*>() = new Functor(std::move(f)); }
Considering the passed callable object meets the requirement for being stored on stack,
Local_storage would be of
true_type, hence the second M_init_functor will get called, which makes use of placement new operator to construct the object on the stack itself.
If the callable object does not meet the requirement for being stored locally i.e it is either not
TriviallyCopyable or has data members making its size more than what is allowed (i.e 8 bytes on 64 bit platform), then the callable object is created on heap and its memory location gets stored in the
Any_data buffer. This is what the third
M_init_functor function does.
static Functor*
M_get_pointer(const Any_data& source)
{
const Functor* ptr =
stored_locally ? std::addressof(source.M_access<Functor>())
: source.M_access<Functor*>();
return const_cast<Functor*>(ptr);
}
This must be now easier to reason about. Depending upon whether the callable object is stored locally or not,
M_get_pointer gets the address of the callable object. The other member functions are just based upon this technique.
The
Ref_manager is just basically a specialization to take care of functor passed via a
reference_wrapper. Which is something one must consider to do if their function object size is greater than 8 bytes (on 64 bit platform) AND if the scope of the callable object outlives the scope of the
std::function object.
So, having seen
Base_manager and
Ref_manager class, lets see how Function_base class looks like:
class Function_base {
public:
static const std::size_t M_max_size = sizeof(Nocopy_types);
static const std::size_t M_max_align = alignof(Nocopy_types);
template <typename Functor>
class Base_manager { ... };
template <typename Functor>
class Ref_manager : public Base_manager <Functor*>
{
.....
};
typedef bool (*Manager_type)(Any_data&, const Any_data&,
Manager_operation);
Any_data M_functor;
Manager_type M_manager;
};
As one can see,
Manager_type is a function pointer pointing to either
Base_manager::M_manager or
Ref_manager::M_manager function. These functions basically fill the destination
Any_data with the stored callable object from the source
Any_data based upon the operation. Something like below:
static bool
M_manager(Any_data& dest, const Any_data& source,
Manager_operation op)
{
switch (op)
{
case get_functor_ptr:
dest.M_access <Functor*>() = M_get_pointer(source);
break;
case clone_functor:
M_clone(dest, source, Local_storage());
break;
case destroy_functor:
M_destroy(dest,Local_storage());
break;
}
return false;
}
Function Handlers
The function handlers are kind of similar in functionality to func_impl in my custom implementation i.e to invoke the actual callable object. The classes which we saw till now were to only manage the storage part, they were not concerned with invoking the call operator.
Function handler classes are derived from either Function_base::Base_manager or Function_base::Ref_manager based upon the functor type. The function handlers are also specialized for plain function pointers, member pointers and for functors.
We will be seeing only few of those specialization.
1. Specialization for plain function pointer.
template<typename Signature, typename Functor>
class Function_handler;
template<typename Res, typename Functor, typename... ArgTypes>
class Function_handler<Res(ArgTypes...), Functor>
: public Function_base::Base_manager<Functor>
{
typedef Function_base::Base_manager<Functor> Base;
public:
static Res
M_invoke(const Any_data& functor, ArgTypes&&... args)
{
return (*Base::M_get_pointer(functor))(
std::forward<ArgTypes>(args)...);
}
};
2. Specialization for Member pointer.
template<typename Class, typename Member, typename Res,
typename... ArgTypes>
class Function_handler<Res(ArgTypes...), Member Class::*>
: public Function_handler<void(ArgTypes...), Member Class::*>
{
typedef Function_handler<void(ArgTypes...), Member Class::*>
Base;
public:
static Res
M_invoke(const Any_data& functor, ArgTypes&&... args)
{
return std::mem_fn(Base::M_get_pointer(functor)->value)(
std::forward<ArgTypes>(args)...);
}
};
This is very similar to what we did for member pointers in our custom implementation i.e let
std::mem_fn do the heavy lifting of taking care of both member data pointer and member function pointer.
Design uptill now
Specialization of a regular functor is also not much different than what we saw for plain function pointer. The design is like below:
1.
Function_base is responsible for storing and retrieving the pointers to the callable object.
2. The function handler objects fetches the pointer to the callable objects from
Function_base and invokes them with the passed arguments.
The std::function class
This is what we all have been waiting for.But, you would be surprised (or not) to know that we have already covered all the heavy lifting part ! function class is more about the interface it provides, the type checkings and conversions.
The basic interface is pretty close to what we implemented in our custom solution, so we will be only looking at the interesting parts i.e the things we left out from our custom implementation.
A stripped down version of std::function:
template < typename Res, typename... Args >
class function<Res(Args...)>
: private Function_base
{
typedef Res Signature(Args...);
public:
function(): Function_base() {}
template <typename Functor,
typename = Requires<not<is_same<Functor, function>>, void>,
typename = Requires<Callable<Functor>, void>>
function(Functor);
Res operator()(ArgTypes... args) const;
private:
using _Invoker_type = Res (*)(const Any_data&, ArgTypes&&...);
Invoker_type M_invoker;
};
template <typename Res, typename... ArgTypes>
template <typename Functor, typename, typename>
function<Res(ArgTypes...)>::
function(Functor f)
: Function_base()
{
typedef Function_handler <Signature, Functor> My_handler;
My_handler::M_init_functor(M_functor, std::move(f));
M_invoker = &My_handler::M_invoke;
M_manager = &My_handler::M_manager;
}
This is the part of code which I found pretty interesting. It is interesting because, it does not make use of inheritance the way we used in our custom implementation. With this kind of implementation, it is not required to do any dynamic allocation unless required because of the nature of functor.
The
Functor parameter will determine what kind of
Function_handler class we need to use. The invoker, which is a function pointer will be set to the correct
M_invoke function of the function handler.
Thus, a single call to a callable object would require at most 2 pointer dereferences, no virtual calls and no dynamic allocation (depending on your Functor as we have already seen).
Type checking on assigning target
If we look at the constructor (or copy or assignment) template parameters a little bit more closely, we will see something like:
Callable <Functor>
Let us see what it is:
template<typename From, typename To>
using check_func_return_type
= or<is_void<To>, is_same<From, To>, is_convertible<From, To>>;
template<typename Func,
typename Res2 = typename result_of<Func(ArgTypes...)>::type>
struct Callable : check_func_return_type<Res2, Res> { };
The idea is to check whether the provided callable type yields a return type (
Res2) when invoked with the argument types passed as the template parameters of
std::function and that return type i.e
Res2 is compatible with what was provided in the
std::function declaration or in its template parameter.
The reason for adding
is_same is pretty interesting. For details check
THIS bug report.
I am not sure why
is_convertible is required since its failure would not result in lookup of a constructor with different signature as there is only one definition of the constructor. The only advantage I see is, any wrong usage of
std::function i.e. assigning it with a target with incompatible arguments would result in error thrown at the constructor itself, rather than at the calling site. If anybody knows about any other advantage, then please do let me know.
For example, see the usage of
func1 and
func2 and the error it yields using both my custom implementation (which does not use conversion checks) and
std::function.
Error with
My implementation:
./my_function.hpp:72:14: error: ambiguous conversion from derived class 'D' to base class 'A':
struct D -> struct B -> struct A
struct D -> struct C -> struct A
return callable_(std::forward<Args>(args)...);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As one can see, the error is thrown at the point where we are actually invoking the callable target. This is not good since it is pointing an issue in the library implementers code, which is actually not the case. A good C++ library must try its best to point the error in the users code rather than to its own internal implementation.
inheritance.cc:45:5: error: no viable overloaded '='
f = func2;
~ ^ ~~~~~
It points to the user code which messed it up. This is one nice advantage of using
is_convertible which assigning a target.
A note about libcxx
The libcxx implementation of std::function looked a lot more straightforward than the libstd++ implementation and they provide a buffer size of 24 bytes (on 64 bit arch) for the SBO, which is 3 times more than what libstd++ offers. Also, libcxx supports allocator other than global new (I am not sure whether standard mandates that or not till c++14), which libstd++ does not currently.
Another difference w.r.t libstd++ is that the libcxx version makes use of virtual function call instead of function pointer as in libstd++. Now, I wouldn't state that function pointer would perform faster than a virtual function call, but I remember seeing the performance measurements of both types of call and a call via function pointer was slightly faster than a virtual function call. Now, don't get all paranoid by that, as we will see in the final part of the blog series, std::function should better be not present in a tight code where performance matters most. In all other cases, the difference (if at all) shouldn't matter at all.
One problem that I see with non-standard way of implementing SBO is performance regression when going from one standard library implementation to another i.e. while moving from a library implementation that has larger internal buffer size than to a one having less (eg: libc++ -> libstd++). This would certainly waste quite some time of the developer who is debugging the issue (as in most cases this developer would not be the one who wrote the code :))