••• Exploit the full potential of the CRTP! •••
In this presentation you will learn:
▸ what the curiously recurring template pattern is
▸ the actual cost (memory and time) of virtual functions
▸ how to implement static polymorphism
▸ how to implement expression templates to avoid loops and copies
2. In this presentation...
1. Understand the CRTP (Curiously Recurring Template Pattern)
template <class T>
class Animal { … };
class Chicken: public Animal<Chicken> { … };
2. Implement compile-time polymorphism & avoid overhead of virtual functions
// run-time polymorphism
void call_animal(Animal& a) { std::cout << a.call() << std::endl; } // overhead...
// compile-time polymorphism
template <class T>
void call_animal(Animal<T>& a) { std::cout << a.call() << std::endl; } // no overhead!
3. Implement expression templates & avoid useless loops and copies
Vec<double> x(10), y(10), z(10);
Vec<double> tot = x + y * z; // how to avoid 2 loops and 2 copies here?
3. The Curiously Recurring Template Pattern (CRTP)
● A class template (e.g. std::vector<T>, std::map<T1,T2>, … ) defines a family of classes.
● We can also derive a class X from a class template that uses X itself as template parameter!
Does that sound complicated? Easier done than said.
This C++ template construct is called the Curiously Recurring Template Pattern (CRTP).
● Why do we need this curious pattern?
→ static polymorphism: alternative to virtual, avoids memory and execution time overhead
→ expression templates: computes expressions only when needed, removes loops and copies
// Container storing objects of some type T
template <class T>
class Container {
    T* data_;      // buffer containing the data
    size_t size_;  // buffer's length
};
Container<double> u;            // container of doubles
Container<char> v;              // container of chars
Container<Container<int*> > w;  // container of containers of int ptrs
template <class T>
class Base { … };
class Derived: Base<Derived> { … }; // A curious derivation!
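The derivation above can be completed into a small compilable sketch. The member names name() and name_impl() and the helper describe() are hypothetical, chosen only for illustration; the slides elide the class bodies.

```cpp
#include <string>

// Base class template: T is expected to be the class deriving from Base<T>.
template <class T>
class Base {
public:
    // Forwards to the derived class; no virtual call is involved.
    std::string name() const {
        return static_cast<T const&>(*this).name_impl();
    }
};

// The "curious" derivation: Derived passes itself as the template argument.
class Derived : public Base<Derived> {
public:
    std::string name_impl() const { return "Derived"; }
};

// Works for any class that follows the pattern; T is deduced from the argument.
template <class T>
std::string describe(Base<T> const& b) {
    return "I am " + b.name();
}
```

Calling describe(d) on a Derived object d goes through Base<Derived>::name(), which the compiler resolves to Derived::name_impl() at compile time.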
5. Dynamic Polymorphism
● The keyword virtual allows a method of a class Base to support dynamic polymorphism.
This means that the method can be overridden in a class Derived, so that any call to the virtual
function through a Base pointer/reference to a Derived object is resolved to the Derived version:
struct Animal {
    virtual std::string call() const { return "animal call"; }
    std::string food() const { return "animal food"; }
};
struct Hen: Animal {
    virtual std::string call() const override { return "cluck!"; }
    std::string food() const { return "corn"; }
};
// Example of usage
Hen h;
Animal* ptr = &h; ptr->call(); // "cluck!", because call() is an overridden virtual
Animal& ref = h; ref.food(); // "animal food", because food() is not virtual
● Dynamic polymorphism can be very useful in the following cases:
→ call a method of Derived from an external function
// pass by reference (or pointer), to allow call to overridden virtual function
void print_call(Animal const& a) { std::cout << a.call() << std::endl; }
Hen h; print_call(h); // prints "cluck!"
Rooster r; print_call(r); // prints "cock-a-doodle-doo!"
→ iterate some operation on a vector of pointers to Base
std::vector<Animal*> v; // vector of pointers, to allow call to overridden virtual function
v.push_back(new Hen());
v.push_back(new Rooster());
for (auto a : v) std::cout << a->call() << std::endl;
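The snippets above can be assembled into one compilable sketch. Rooster is used on the slide but never defined, so its definition here is an assumption. The virtual destructor, std::unique_ptr (C++14 for std::make_unique) and the helper functions all_calls() and demo_flock() are also additions for illustration, so that the heterogeneous vector owns its elements and nothing leaks.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Animal {
    virtual ~Animal() = default;  // lets derived objects be destroyed through Animal*
    virtual std::string call() const { return "animal call"; }
};

struct Hen : Animal {
    std::string call() const override { return "cluck!"; }
};

// Assumed definition: the slides use Rooster without showing it.
struct Rooster : Animal {
    std::string call() const override { return "cock-a-doodle-doo!"; }
};

// Collects the calls of a heterogeneous flock through base-class pointers.
std::vector<std::string> all_calls(std::vector<std::unique_ptr<Animal>> const& flock) {
    std::vector<std::string> out;
    for (auto const& a : flock) out.push_back(a->call());  // dispatched at run time
    return out;
}

// Builds a small flock and returns the calls in insertion order.
std::vector<std::string> demo_flock() {
    std::vector<std::unique_ptr<Animal>> flock;
    flock.push_back(std::make_unique<Hen>());
    flock.push_back(std::make_unique<Rooster>());
    return all_calls(flock);
}
```

The loop in all_calls() never names Hen or Rooster: the overridden call() is selected at run time from the dynamic type of each element.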
6. The Cost of Dynamic Polymorphism
● Using virtual functions allows us to implement dynamic polymorphism, a.k.a. runtime polymorphism.
This means that the implementation of the polymorphic method to be used is known only at runtime.
Supporting this mechanism introduces both memory and execution-time overhead!
● Most compilers use vptrs and vtables to implement dynamic polymorphism.
– vtable: hidden static member added to each class with virtual functions, holding pointers to those functions.
– vptr: hidden pointer added to every object of a class with virtual functions, pointing to the correct vtable.
→ Space-cost overhead: each object of a class with virtual functions carries an extra member.
sizeof(vptr) = sizeof(void*) is typically 8 bytes on a 64-bit machine: a lot for small classes!
struct X {
    int val;       // sizeof(int) = 4
    double foo();  // non-virtual function, takes no space in the object
};
// sizeof(X) == 4
struct Y {
    int val;               // sizeof(int) = 4
    virtual double foo();  // virtual function, adds a vptr: sizeof(void*) = 8
};
// sizeof(Y) == 16 on a typical 64-bit platform: 4 (val) + 8 (vptr) + 4 (padding bytes) !!
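The size difference can be measured directly on a given platform. The sketch below uses the hypothetical names Plain, WithVirtual and vptr_overhead(); the exact numbers are implementation-defined, so the code computes the difference instead of hard-coding 4 and 16.

```cpp
#include <cstddef>

// Same data member, no virtual functions.
struct Plain {
    int val;
    double foo() { return val; }
};

// Same data member, one virtual function: the compiler adds a hidden vptr.
struct WithVirtual {
    int val;
    virtual double foo() { return val; }
};

// Extra bytes per object caused by the virtual function on this platform
// (vptr plus any padding required by alignment).
constexpr std::size_t vptr_overhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

On mainstream 64-bit compilers vptr_overhead() is expected to be 12 (8 bytes of vptr and 4 of padding), which matches the sizes quoted above.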
→ Time-cost overhead: virtual functions (in most cases) cannot be inlined.
Moreover, two levels of indirection are required when calling a virtual function:
– get the vtable pointed to by the vptr (pointer dereference)
– get the method pointed to by the vtable (call through a pointer to function)
...is this always necessary? CRTP to the rescue!
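The two levels of indirection can be made visible by writing the mechanism by hand. This is only a rough model under stated assumptions: real vtable layouts are implementation-defined, and the names VTable, AnimalObj, hen_call, rooster_call and dispatch are hypothetical.

```cpp
#include <string>

struct AnimalObj;  // forward declaration, needed by the function-pointer type

// Hand-written "vtable": one table of function pointers per concrete type.
struct VTable {
    std::string (*call)(AnimalObj const&);
};

// Hand-written object layout: the hidden vptr that every object carries.
struct AnimalObj {
    VTable const* vptr;
};

std::string hen_call(AnimalObj const&) { return "cluck!"; }
std::string rooster_call(AnimalObj const&) { return "cock-a-doodle-doo!"; }

// One static table per type, shared by all objects of that type.
const VTable hen_vtable{&hen_call};
const VTable rooster_vtable{&rooster_call};

// Roughly what a virtual call expands to.
std::string dispatch(AnimalObj const& a) {
    VTable const* table = a.vptr;  // first indirection: load the vptr
    auto fn = table->call;         // second indirection: load the function pointer
    return fn(a);                  // indirect call, which is hard to inline
}
```

Because the target of fn is only known once a.vptr has been read, the compiler generally cannot inline the call unless it can prove the dynamic type.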
7. Static Polymorphism
● Using the CRTP we can implement a compile-time alternative to virtual functions: static polymorphism.
This removes the memory and time overhead!
Dynamic polymorphism:
// (Abstract) Base class
struct Animal {
    virtual std::string call() = 0; // pure virtual function
};
// Derived class
struct Hen: Animal {
    virtual std::string call() { return "cluck!"; }
};
void print_call(Animal& a) {
    std::cout << a.call() << std::endl;
}
Hen h; print_call(h); // prints "cluck!"
Static polymorphism:
// (Templated) Base class
template <class T> struct Animal {
    std::string call() { return static_cast<T*>(this)->call(); }
};
// Derived class (using CRTP!!)
struct Hen: Animal<Hen> {
    std::string call() { return "cluck!"; }
};
template <class T> void print_call(Animal<T>& a) {
    std::cout << a.call() << std::endl;
}
Hen h; print_call(h); // prints "cluck!"
● Of course, static polymorphism comes with some limitations in flexibility.
E.g. different CRTP-derived classes cannot be addressed through a common Base pointer/reference!
std::vector<Animal*> vec = {new Hen(), new Frog(), new Hen()}; // only works with dynamic polymorphism!
for (auto x : vec) print_call(*x);
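The limitation can be checked at compile time. In the sketch below, Frog appears on the slide without a definition, so its call() is an assumption, and get_call() is a hypothetical variant of print_call that returns the string instead of printing it.

```cpp
#include <string>
#include <type_traits>

template <class T>
struct Animal {
    std::string call() { return static_cast<T*>(this)->call(); }
};

struct Hen : Animal<Hen> {
    std::string call() { return "cluck!"; }
};

// Assumed definition: the slides mention Frog but do not show it.
struct Frog : Animal<Frog> {
    std::string call() { return "ribbit!"; }
};

// One function template, instantiated separately for each derived type.
template <class T>
std::string get_call(Animal<T>& a) { return a.call(); }

// Animal<Hen> and Animal<Frog> are unrelated types, so no single pointer
// type can address both a Hen and a Frog.
static_assert(!std::is_same<Animal<Hen>, Animal<Frog>>::value,
              "each derived class gets its own base type");
static_assert(!std::is_base_of<Animal<Hen>, Frog>::value,
              "Frog does not derive from Animal<Hen>");
```

Note that a derived class that forgets to define call() would make Animal<T>::call() call itself forever, since the static_cast then finds the base version again; nothing like a pure virtual function catches this at compile time.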
8. Recap: Static VS Dynamic Polymorphism
Dynamic Polymorphism
● resolved at run time (dynamic binding) using vptr and vtable
● base class is abstract if any of its functions is a pure virtual (virtual void foo() = 0;)
● memory cost to store the vptr in each object, can be significant for small classes
● time cost for dynamic dispatch at every virtual function call, no inlining
● very flexible: pass base pointer/reference to a function, iterate over arrays of base pointers, ...
Static Polymorphism
● resolved at compile time (early binding) using CRTP template resolution
● base class is a class template, its methods call the ones of its derived class instantiation (static_cast<T*>(this)->call();)
● no memory overhead
● no time overhead, possible inlining for optimization
● limited flexibility: no addressing via a common Base pointer/reference
10. Math Expressions: Readability VS Performance
● Using C++’s operator overloading and external functions we can implement complex operations
while producing very readable code close to natural language.
Consider, e.g., a class Vec3 and assume we have defined sum and cross product:
Vec3 cross(Vec3 const& v, Vec3 const& w);
Vec3 Vec3::operator+(Vec3 const& w);
Using these functions we can write code for (a x b) + c that would be totally obfuscated otherwise!
// 1) without operator overloading
Vec3 d;
d[0] = a[1]*b[2] - a[2]*b[1] + c[0];
d[1] = a[2]*b[0] - a[0]*b[2] + c[1];
d[2] = a[0]*b[1] - a[1]*b[0] + c[2];
// 2) with operator overloading !
Vec3 d = cross(a, b) + c;
● The second version is far more readable and maintainable. But this has a high performance cost!
→ the first version requires only 1 loop, the second 2 loops (1 for the cross product, 1 for the sum)
→ the first version allocates memory only once, while the second allocates memory
– for the result tmp1 of cross(a, b)
– for the result tmp2 of tmp1 + c
– for the copy construction of d (unless copy elision and/or move semantics apply)
● How can we keep readability while avoiding useless loops, memory allocations and copies?
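The two versions compared on this slide can be reproduced with a minimal Vec3. The slides only declare the interface, so the std::array storage and the helper names fused() and readable() are assumptions; with this storage the temporaries are stack copies, whereas a heap-backed vector would pay an allocation for each of them.

```cpp
#include <array>
#include <cstddef>

// Minimal Vec3 for illustration only.
struct Vec3 {
    std::array<double, 3> v{};
    double& operator[](std::size_t i) { return v[i]; }
    double operator[](std::size_t i) const { return v[i]; }

    // Eager sum: loops over the components and returns a new Vec3.
    Vec3 operator+(Vec3 const& w) const {
        Vec3 r;
        for (std::size_t i = 0; i < 3; ++i) r[i] = v[i] + w[i];
        return r;
    }
};

// Eager cross product: also materializes a temporary Vec3.
Vec3 cross(Vec3 const& a, Vec3 const& b) {
    Vec3 r;
    r[0] = a[1] * b[2] - a[2] * b[1];
    r[1] = a[2] * b[0] - a[0] * b[2];
    r[2] = a[0] * b[1] - a[1] * b[0];
    return r;
}

// Version 1: fused by hand, one pass over the components, no temporaries.
Vec3 fused(Vec3 const& a, Vec3 const& b, Vec3 const& c) {
    Vec3 d;
    d[0] = a[1] * b[2] - a[2] * b[1] + c[0];
    d[1] = a[2] * b[0] - a[0] * b[2] + c[1];
    d[2] = a[0] * b[1] - a[1] * b[0] + c[2];
    return d;
}

// Version 2: readable, but cross(a, b) is fully computed before the sum runs.
Vec3 readable(Vec3 const& a, Vec3 const& b, Vec3 const& c) {
    return cross(a, b) + c;
}
```

Both functions return the same vector; they differ only in how many intermediate Vec3 objects and passes over the data are needed to get there.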
11. Expression Templates
● The observed performance hit comes from evaluation of expressions before needed.
Expression templates exploit the CRTP to achieve lazy evaluation and loop fusion.
This allows us to produce readable code without any performance loss:
→ the result of an operation between two vectors or expressions is itself an expression
→ an abstract syntax tree of expressions is built at compile time…
→ ...and is evaluated only when needed
[Figure: expression tree for c + (a x b). The root node + has children c and x; the node x has children a and b.]
● In practice, expression templates can be implemented as follows.
→ the nodes are templated, generic expressions
→ the leaves are actual vectors
→ all the expressions are built at compile-time without evaluation
→ evaluation is triggered by assignment (or copy c-tor) to an actual vector
Vec3 d = c + cross(a, b);
12. Expression Templates: Example (1/3)
● First of all we define the “abstract” base class for a generic vector expression.
Then, the actual vector class inherits through the CRTP and “overloads” the operators.
Remember: the assignment of a vector expression to an actual vector triggers the evaluation!
// Generic vector expression
template <class E>
class VecExpression {
public:
    // "virtual" operator[], const version
    double operator[](int i) const {
        return static_cast<E const&>(*this)[i];
    }
    // "virtual" size()
    size_t size() const {
        return static_cast<E const&>(*this).size();
    }
    // cast to E
    E& operator()() {
        return static_cast<E&>(*this);
    }
    E const& operator()() const {
        return static_cast<E const&>(*this);
    }
};
// Actual vector, inheriting through CRTP
class Vec : public VecExpression<Vec> {
public:
    // "overload" operator [], const and non-const versions
    double operator[](int i) const { return data_[i]; }
    double& operator[](int i) { return data_[i]; }
    // "overload" size()
    size_t size() const { return data_.size(); }
    // c-tor
    Vec(size_t sz) : data_(sz) {}
    // c-tor from VecExpression, triggers evaluation!
    template <class E>
    Vec(VecExpression<E> const& ve) : data_(ve.size()) {
        for (size_t i = 0; i < ve.size(); ++i)
            data_[i] = ve[i];
    }
private:
    std::vector<double> data_;
};
13. Expression Templates: Example (2/3)
● Now we define classes (still inheriting through CRTP) for the operations we want to implement:
let us consider, e.g., sum between vectors and (elementwise) logarithm of a vector.
● Finally, we add two overloads as syntactic sugar.
template <class E1, class E2>
class VecSum: public VecExpression<VecSum<E1, E2> > {
public:
    // operator [], const version
    double operator[](int i) const { return ve1_[i] + ve2_[i]; }
    // size()
    size_t size() const { return ve1_.size(); }
    // c-tor
    VecSum(E1 const& ve1, E2 const& ve2) :
        ve1_(ve1), ve2_(ve2)
    {
        assert(ve1.size() == ve2.size());
    }
private:
    E1 const& ve1_;
    E2 const& ve2_;
};
template <class E>
class VecLog: public VecExpression<VecLog<E> > {
public:
    // operator [], const version
    double operator[](int i) const { return std::log(ve_[i]); }
    // size()
    size_t size() const { return ve_.size(); }
    // c-tor
    VecLog(E const& ve) :
        ve_(ve)
    {}
private:
    E const& ve_;
};
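// Syntactic sugar. The parameters are VecExpression<...> rather than bare template
// types, so that these overloads only match vector expressions and not arbitrary types.
template <class E1, class E2>
VecSum<E1,E2> operator+(VecExpression<E1> const& ve1, VecExpression<E2> const& ve2)
{
    return VecSum<E1,E2>(ve1(), ve2()); // operator()() casts back to E1 / E2
}
template <class E>
VecLog<E> log(VecExpression<E> const& ve)
{
    return VecLog<E>(ve());
}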
14. Expression Templates: Example (3/3)
● We can now use our expression templates to form complex expressions!
Let us now consider the following expression of Vec objects:
e = log(a + b + log(c)) + d
If we had used naive overloads of operator+ and log, the above expression would have
required 5 loops and 5 memory allocations (or even more, without copy elision / move semantics!).
● Using the expression templates that we have implemented, we reduce this to a single loop and a
single allocation (the construction of e) while preserving the natural syntax:
Vec e = log(a + b + log(c)) + d;
● Note that without “syntactic sugar” the syntax would have been cumbersome! Just to write
e = log(a + b)
we would have needed to type
Vec e = VecLog<VecSum<Vec,Vec> >(VecSum<Vec,Vec>(a,b));
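The pieces from the three example slides can be assembled into one compilable sketch. It departs from the slides in a few labeled ways: the cast helper is called self() instead of operator()(), indices are std::size_t instead of int, Vec gets a fill value, the overloads take VecExpression<...> parameters so that they only match vector expressions, and evaluate_example() is a hypothetical wrapper added so the result can be checked.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CRTP base: every vector expression E derives from VecExpression<E>.
template <class E>
class VecExpression {
public:
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
    E const& self() const { return static_cast<E const&>(*this); }
};

// Leaf of the expression tree: owns the data.
class Vec : public VecExpression<Vec> {
public:
    explicit Vec(std::size_t n, double value = 0.0) : data_(n, value) {}

    // Constructing from an expression runs the single fused loop.
    template <class E>
    Vec(VecExpression<E> const& ve) : data_(ve.size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = ve[i];
    }

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

// Node: lazy elementwise sum, evaluated one element at a time on demand.
template <class E1, class E2>
class VecSum : public VecExpression<VecSum<E1, E2>> {
public:
    VecSum(E1 const& l, E2 const& r) : l_(l), r_(r) { assert(l.size() == r.size()); }
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }
private:
    E1 const& l_;
    E2 const& r_;
};

// Node: lazy elementwise logarithm.
template <class E>
class VecLog : public VecExpression<VecLog<E>> {
public:
    explicit VecLog(E const& e) : e_(e) {}
    double operator[](std::size_t i) const { return std::log(e_[i]); }
    std::size_t size() const { return e_.size(); }
private:
    E const& e_;
};

// Overloads that build nodes; no element is computed here.
template <class E1, class E2>
VecSum<E1, E2> operator+(VecExpression<E1> const& l, VecExpression<E2> const& r) {
    return VecSum<E1, E2>(l.self(), r.self());
}

template <class E>
VecLog<E> log(VecExpression<E> const& e) {
    return VecLog<E>(e.self());
}

// Builds the expression from the slide and evaluates it in one pass.
Vec evaluate_example(Vec const& a, Vec const& b, Vec const& c, Vec const& d) {
    Vec e = log(a + b + log(c)) + d;
    return e;
}
```

One design consequence is worth keeping in mind: the nodes hold references to their operands, including temporary nodes, so an expression should be evaluated into a Vec within the same statement. Storing it with auto and evaluating it later would leave references to temporaries that have already been destroyed.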