Monday, 16 April 2018

Designing a kind of Shared Pointer in C++

What is this about

  • Designing my own variant of policy based shared_ptr class for fun.
  • Understanding the standard library implementation while codifying my design.

What is this not about

  • A replacement for std::shared_ptr type.
  • Understanding the implementation of shared_ptr in the standard libraries.
  • Tutorial on implementing a shared_ptr and thereby teaching some of the advanced concepts.

Introduction

I am aware of the implementation details of libstd++'s shared_ptr implementation to some extent. It is quite daunting to be honest even though the basic idea behind reference counted storage seems to be quite easy. 
So, one free evening I just thought to write my own version of shared_ptr to see how it feels to write one. I had few design choices in mind before I started with its implementation:
  • Should have policy based reference counting mechanism. The standard implementation which we have today makes use of atomic counters even when we are working in single threaded program. So, it is not zero cost abstraction per-say.
  • Will be the make_shared equivalent i.e. both the type and the reference count variables would be stored together in one single allocation.
  • No deleter support! Only Allocator support is there. No particular reason for this. Just wanted to avoid additional complexity of it.
  • No aliasing constructor. Its not a low hanging fruit anyways.
  • No covariance support.
  • No adoption of any other pointer types.
  • No atomic ref count policy support :D (Got lazy, but I do plan to complete it)
  • And some more which I forgot now :) 
NOTE : I wont be surprised if someone finds bugs or UB in the implementation. As said before, it is written for the purpose of understanding (read running away from boredom).

I will be presenting 4 versions of the design, each version incrementally better than its previous with additional features.

All the code presented here can be found at Cpp-Shared-Pointer
The code has been compiled and tested with clang 5 on Macbook.

Version-1

The first version just focuses on getting the shared_ptr constructor, control_block interface correct. There is no support for allocators and weak_ptr.

Control Block

This is the where the object and its associated reference counters are stored. The shared_ptr class would perform just one allocation for this control_block. There would not be two different allocations for the object and the reference counters respectively.

/*!
 * The control block object where the
 * allocated object resides along with
 * its reference count.
 */
template <
  // The allocated object type.
  typename T,
  // The ref count policy to be used.
  // Single threaded Policy or Multi Threaded Policy
  typename RefCntPolicy>
struct control_block;

This structure can be specialised on the basis of RefCntPolicy. For that purpose, we will define two additional structs used for selecting the appropriate control_block at compile time.

// Reference Counting policies
struct single_threaded_t {};
struct multi_threaded_t {};


Version-1 (and all other versions) has only support for single threaded control_block specialization.

/*!
 * Control Block specialized for Single Thread.
 */
template <typename T>
struct control_block<T, single_threaded_t>
{
  using storage_type = typename std::aligned_storage<sizeof(T), alignof(T)>::type;

  storage_type value_buf;
  size_t ref_cnt = 0;

  /**
   * Calls destructor of the embedded object.
   */
  ~control_block()
  {
    (reinterpret_cast<T*>(&value_buf))->~T();
  }
};


There are bunch of free functions for adding/ decrementing / fetching the reference count which one can lookup in the complete source. In later versions these free functions have been moved to as member functions.

The shared_ptr class design:

template 
class shared_ptr
{
public:
  /**
   * Create a shared pointer from the Args.
   */
  template <typename... Args,
           typename Cond = std::enable_if_t<
                             std::is_constructible<T, Args...>::value ||
                             std::is_convertible<T, Args...>::value>>
  explicit shared_ptr(Args&&... args);
  .
  .
  .
public: // Pointer like semantics
  /**
   */
  T* operator->() noexcept
  {
    return detail::get_type_ptr(ctrl_blk_);
  }
  .
  .

  T& operator*() noexcept
  {
    return *(detail::get_type_ptr(ctrl_blk_));
  }
  .
  .
  .
private:
  /// The control block where the pointed type actually lives
  detail::control_block<T, RefCntPolicy>* ctrl_blk_ = nullptr;
};

Constructing the object inside control block is pretty easy using placement new:

template <typename T, typename RCP>
template <typename... Args, typename Cond>
shared_ptr<T, RCP>::shared_ptr(Args&&... args)
{
  ctrl_blk_ = new detail::control_block<T, RCP>{};

  // Construct the object and increment ref count
  detail::construct(ctrl_blk_, std::forward<Args>(args)...);
  detail::add_ref(ctrl_blk_);
}

/**
 * Construct the object in its aligned storage area.
 */
template <typename T, typename... Args>
void construct(
    control_block<T, single_threaded_t>* cb,
    Args&&... args)
{
  new (&cb->value_buf) T{std::forward<Args>(args)...};
}


This with other helper functions is actually what is basically needed for implementing a shared pointer! Oh, the destructor:

template <typename T, typename RCP>
shared_ptr<T, RCP>::~shared_ptr()
{
  // moved from? not in control any more
  if (!ctrl_blk_) return;

  auto ref_cnt = detail::decr_ref(ctrl_blk_);
  if (ref_cnt == 0) {
    delete ctrl_blk_;
  }
}

For the complete implementation : https://github.com/arun11299/Cpp-Shared-Pointer/blob/master/ver_1/shared_ptr.hpp


Version-2

This is where things get serious (and ugly...)
Additions in this version:
  1. Support for Allocator
  2. Support for Deleter (but not used actually and removed in later version)
  3. Type erasure for control_block since Allocator and Deletor are not part of the control block type.

Since Allocator (and Deletor, but I am going to ignore it for the rest of the post), is not part of the type, I need to make it part of my constructor template. And as per my design, if someone wants to use this shared_ptr, he should be able to use it like:

struct MyType {
  explicit MyType(int, int) {}
  .....
};

arnml::shared_ptr<MyType> mtyp_sp{ 10, 42 };


So , we need some way to:
  1. Pass Allocator.
  2. Pass variadic number of arguments for construction of for eg: MyType
This resulted in the real pain point of the design, writing constructors using SFINAE. It took me some time to get it correct. It is probably broken for most use cases in Version-2 and Version-3. More or less fixed in Version-4.

Since it is broken in this version, I don't want waste time here to explain about the constructors. Instead we will see the change in design of control_block.

Since we need to store Allocator (Sorry, no EBO technique used) and not make it part of the Type, we need to use some kind of type erasure technique. So, I create an abstract base class :D

/*!
 */
template <typename T>
struct control_block_base
{
  using storage_type = typename std::aligned_storage<sizeof(T), alignof(T)>::type;
  /// The storage buffer to create the type `T`
  storage_type value_buf;

  /// Create a control object derived impl
  template <typename Allocator, typename Deletor, typename RefCntPolicy>
  static control_block_base* create(Allocator&& allocator, Deletor&& deletor);

  /**
   * Calls destructor of the embedded object.
   */
  virtual ~control_block_base()
  {
    (reinterpret_cast<T*>(&value_buf))->~T();
  }
  
  .
  .
  .
  ///
  virtual void add_ref()   = 0;
  ///
  virtual size_t get_ref() const = 0;
  ///
  virtual size_t dec_ref() = 0;
  ///
  virtual void destruct() noexcept = 0;
};

And the derived specialised control_block:

/*!
 * Control Block specialized for Single Thread.
 */
template <typename T, typename Allocator, typename Deletor>
struct control_block<T, Allocator, Deletor, single_threaded_t>
  : control_block_base<T>
{
  ///
  using allocator_type = typename std::allocator_traits<
    std::decay_t<Allocator>>::template rebind_alloc<control_block>;
.
.
.
.
};

Not mush to explain here. You ask, how we use this in the shared_ptr class ? Ofcourse, as pointer to the base class. You ask, then how do we create the correct specialized derived class instance of the control_block? See:
/**
 * Construct the control block.
 */
template <typename T>
template <typename Allocator, typename Deletor, typename RCP>
control_block_base<T>*
control_block_base<T>::create(
    Allocator&& allocator,
    Deletor&& deletor)
{
  using DerSelf = typename detail::control_block<
                                      T,
                                      std::decay_t<Allocator>,
                                      std::decay_t<Deletor>,
                                      RCP
                                      >;

  typename std::allocator_traits<
    std::decay_t>Allocator>>::template rebind_alloc<DerSelf> rb_alloc;

  auto address = static_cast<DerSelf*>(rb_alloc.allocate(1));
  auto ctrl_blk = new (address) DerSelf{std::forward<Allocator>(allocator),
                                        std::forward<Deletor>(deletor)};

  return static_cast<control_block_base<T>*>(ctrl_blk);
}


Version-3

This was mostly about fixing the shared_ptr constructors.  One thing which I knew but experience this time is to how frustrating it could be to write constructor taking forwarding references! Making use of SFINAE to control overloading makes it worse!!

We will see all the constructors and the need for them in Version-4.



Version-4

This version has following additions:
  • Fixed all the constructor issues (as far as I could make it run for my small set of tests).
  • Added support for weak_ptr

Constructors not constructive enough

  • The constructor for incomplete types. 

/**
 * The default constructor.
 */
explicit shared_ptr(std::nullptr_t)
{
  //...
}

This is needed to construct shared_ptr to incomplete types. The nullptr_t  is there to differentiate between a default construction of a complete type and default construction of an incomplete type. (I am sure there might be other solution for this.)
For eg:
struct A {
    A() : ptr(nullptr) {}     // (1)
    arnml::shared_ptr<A> ptr;
};

arnml::shared_ptr<A> sp_a{};  // (2)


The difference is that, for (1) there is no control_block created, but for (2), the control block is allocated and the object of type A is created inside the control block value buffer.


  • The default constructor.

/**
 * For default constructible types
 */
explicit shared_ptr()
    : shared_ptr(std::allocator<T>{})
{
  //...
}

This we have already seen. Note that it uses the default std::allocator and uses forwarding constructor which we will see later.

  • Forwarding constructor. Accepts only arguments used to create the wrapped object of type T.

  /**
   * Forwarding constructor.
   * Create a shared pointer from the Args.
   */
  template <typename First, typename... Args,
            typename Allocator=std::allocator<T>,
            typename Cond = std::enable_if_t<
                              !std::is_same<std::remove_reference_t<T>, shared_ptr>::value &&
                              arnml::detail::is_allocator<Allocator>::value &&
                              !arnml::detail::is_allocator<First>::value
                            >
           >
  explicit shared_ptr(First&& f, Args&&... args)
    : shared_ptr(Allocator{}, std::forward<First>(f), std::forward<Args>(args)...)
  {
    //....
  }

And all my problems started. Just look at the SFINAE condition. This constructor will only accept arguments (at least one) that are used to create an instance of type T. Otherwise it will fail in compilation at a static_assert which I have added in the control_block creation function. This is still _not_ correct! The allocator here is probably redundant. You can only pass default allocator with it. Then, how to pass my own allocator ? Wait for it.

  • Constructor (sink) for default constructible type. (Check that #2 forwards to this)
  /**
   * Constructor for default constructible types.
   */
  template <typename Allocator=std::allocator<T>,
            typename = std::enable_if_t<detail::is_allocator<Allocator>::value>
           >
  explicit shared_ptr(Allocator&& alloc=Allocator{});



  • Constructor (sink) for Allocator and object construction args.
  /**
   */
  template <typename... Args, typename Allocator=std::allocator<T>,
           typename Cond = std::enable_if_t<
                              !std::is_same<std::remove_reference_t<T>, shared_ptr>::value  &&
                              arnml::detail::is_allocator<Allocator>::value>>
  explicit shared_ptr(Allocator&& alloc, Args&&... args);


And then there are copy constructor, move constructor, assignment operator and move assignment operator.

The thing to learn here is that this is terrible. I will probably never try to do this in some large codebase. Why is it bad (do I really need to say) ? I will point out few:

  1. SFINAE based overloading is bad for maintenance unless it is really very small compile time checks.
  2. Bug prone. There are quite a lot of bugs or unhandled conditions for sure in the above code. It is difficult and time consuming to identify them.
  3. Fixing one overload can break other.
  4. Compilation time is important. The more nasty template stuff you do, the more you pay.
It doesn't mean you shouldn't do this. There might be legit cases where you want to do this. But make sure you think it through many times.

weak_ptr

Nothing fancy about the weak_ptr. It holds a reference to the control_block created by the shared_ptr instance.

The control_block is changed to include the new weak reference count and its associated APIs.

template <typename T, typename Allocator>
struct control_block<T, Allocator, single_threaded_t>
  : control_block_base<T>
{
  ///
  using allocator_type = typename std::allocator_traits<
    std::decay_t<Allocator>>::template rebind_alloc<control_block>;

  /// Reference counter for single threaded applications
  size_t ref_cnt_ = 0;
  /// Weak ref count
  size_t weak_ref_cnt_ = 0;
.
.
.
.


Few things to note about the behaviour of weak_ptr:
  1. If the shared reference count or the strong reference drops down to 0, but the weak reference count is not 0, then only the destructor of type T is called. The control_block is not deallocated because it has the weak reference count.
  2. In the above case, the control block is freed only when the weak reference count reaches 0.
  3. To access the embedded object of type T within control_block, the lock method must be called on the weak_ptr. It creates a new instance of shared_ptr which is valid if and only if there is a strong reference count.

Example:

  struct A
  {
    arnml::weak_ptr<A> ptr;
  };

  arnml::shared_ptr<A> a{};
  a->ptr = a;

Another Example (using the example from cppreference):

  auto observe = [](arnml::weak_ptr<int> weak) {
    if (auto observ = weak.lock()) {
      std::cout << "observe locked to get shared ptr\n";
    } else {
      std::cout << "observe failed to get a shared ptr\n";
    }
  };

  arnml::weak_ptr<int> wp;
  observe(wp);

  {
    arnml::shared_ptr<int> sp_int{42};
    wp = sp_int;
    std::cout << "weak ptr initialized with shared_ptr\n";
    observe(wp);
  }



For the complete implementation : https://github.com/arun11299/Cpp-Shared-Pointer/tree/master/ver_4