## Friday, 16 March 2012

### Deferring execution

Go language has defer statements which allow for postponing execution of a function or method call until the end of the current function. Similarly, D has scope guard statements which allow statement execution to be delayed until the end of the scope (optionally specified to only execute under successful or failed scenarios).

 Go D lock(l) // unlocking happens before // surrounding function returns defer unlock(l)  lock(l); // unlock on leaving the scope scope(exit) unlock(l); 

This is similar to 'finally' statements that are used in languages like Java and Python. In C++, we have our trusty RAII technique to execute clean up code at the end of scope. But for the cases when that is not convenient, boost provides ScopeExit library to accomplish something akin to D:

lock(l);
BOOST_SCOPE_EXIT( (&l) )
{
unlock(l);
} BOOST_SCOPE_EXIT_END


If you look at Boost.ScopeExit source, it is filled with enough macro magic to make your head spin. The upside is that it works with C++03. But with C++11 and lambdas, we can build the same functionality and more with easy to read, macro free code.

The basic idea is to put deferred code into a lambda function and then store the lambda in a class that will execute it when it destructs. Let's start with this:

template <typename F>
struct deferrer {
F fun;
~deferrer() { fun(); }
};

template <typename F>
deferrer<F> defer(F const& f) {
return deferrer<F>{ f };
}

With this simple class (struct for now) and helper function, we can defer code like this:
void foo() {
lock_t l;
lock(l);
auto d = defer([&l] { unlock(l); });
// ...
}


However, there are a number of problems with this code. First, if lambda function throws, we will get a throwing destructor and the program will go to terminate() in those cases when the stack was already being unwound due to an exception. We can either leave things as is and disallow deferred code to throw or surround fun() invokation in try/catch block. Since this problem plaques RAII as well, I'll just resort to the former.

Second, and more serious problem, is that deferrer<F> is copyable and since defer() returns the object by value, there will be temporaries that get constructed and destructed. Since the destructor executes our function, our deferred code will get executed multiple times and prematurely! Well, that's the worst case scenario. Most compilers now implement copy elision and so RVO will actually not make this scenario appear. But relying on a compiler optimization is a bad practice anyway.

You might be saying right now, "wait a minute, this is C++11, shouldn't the return value be moved instead of copied?". Yes, if the move constructor is available. But by defining the destructor, we opted out of move constructor (and move assignment operator) being implicitly defined for us. Remember, the C++ committee tightened the rules regarding when the implicit generation is OK. Dave Abrahams has a nice post on the topic. BTW, gcc 4.6 still generates the move constructor and I haven't tested the soon to be released 4.7.

The best thing to do here is to forbid copying and define a move constructor:
template <typename F>
class deferrer {
F fun;
bool enabled;

public:
deferrer(F const& f) : fun(f), enabled(true) {}

// move constructor
deferrer(deferrer<F>&& rhs) : fun(rhs.fun), enabled(rhs.enabled) {
rhs.enabled = false;
}

// move assignment
deferrer<F>& operator=(deferrer<F>&& rhs) {
if( this != &rhs ) {
fun = rhs.fun;
enabled = rhs.enabled;
rhs.enabled = false;
}
return *this;
}

// no copying
deferrer(deferrer<F> const& ) = delete;
deferrer<F>& operator=(deferrer<F> const&) = delete;

~deferrer() {
if( enabled )
fun();
}

// add this as a bonus
void cancel() { enabled = false; }
};


Now let's turn our attention to the case when the deferred action should happen not at the end of lexical scope of where deferment was specified but at the end of the function scope or the end of some other lexical scope. For example:
void foo(const char* path, bool cleanup) {
int fd = creat(path, S_IRUSR);
if( cleanup ) {
auto d = defer([path, fd]{
close(fd);
});
}
else {
auto d = defer([fd]{ close(fd); });
}
// oops, closed and maybe unlinked too early!
// ....
}


We can fix this by creating another class that we can instantiate in the lexical scope at the end of which we want the deferred action to be executed:
void foo(const char* path, bool cleanup) {
deferred d; // deferred action will execute at the end of foo()
int fd = creat(path, S_IRUSR);
if( cleanup ) {
d = defer([path, fd]{
close(fd);
});
}
else {
d = defer([fd]{ close(fd); });
}
// ....
}

To implement such a beast, we'll need to use std::function to type erase the lambda into nullary function:
template <typename F>
class deferrer {
friend class deferred;
// remainder not changed
};

class deferred {
std::function<void ()> fun;

public:
deferred() = default;
deferred(deferred const&) = delete;
deferred& operator=(deferred const&) = delete;

template <typename F>
deferred(deferrer<F>&& d) {
if( d.enabled ) {
fun = d.fun;
d.enabled = false;
}
}

~deferred() {
if( fun )
fun();
}
};


Conclusion

While I think that the time it takes to define an RAII object is well spent, I admit that the occasional use of scope exit facility can be convenient. It is also great to see that lambda functions allow us to implement few simple utility classes that can do what other languages had to provide in the language itself.

## Saturday, 3 March 2012

### Recursing constexpr - Part II

In the previous post, we developed a logarithmic depth accumulate() function that can be used to emulate a for-loop. In this post, we'll look at how to construct a function that emulates a while-loop with the recursive depth of O(lg(n)) where n is the number of iterations of the while-loop. Recall that our initial motivation is to be able to determine if a large number is prime at compile time as was done in this post.

A while-loop can be generally written as follows:

state = initial state;
while( test(state) )
mutate state;

Translating this to functional form, we can write the signature as:

template <typename T, typename Test, typename Op>
T while_loop(T init, Test test, Op op);


The main idea behind the implementation of while_loop() is to call a helper function that will execute up to n iterations of the loop. As we have seen in previous post, this can be done with recursive depth of O(lg(n)). If there are more iterations that are still needed (that is test(state) is still true), while_loop() recursively calls itself with 2n. Instead of n, the implementation uses depth argument to signify the maximum recursion depth. This is illustrated below:

The code looks like this:

template <typename T, typename Test, typename Op>
constexpr T exec(int first, int last, T init, Test test, Op op) {
return
(first >= last || !test(init)) ? init :
(first + 1 == last) ?
op(init) :
exec((first + last) / 2, last,
exec(first, (first + last) / 2, init, test, op), test, op);
}

template <typename T, typename Test, typename Op>
constexpr T while_loop(T init, Test test, Op op, int depth = 1) {
return !test(init) ? init :
while_loop(exec(0, (1 << depth) - 1, init, test, op), test, op, depth+1);
}


There is one downside to this algorithm. The exec() function performs the test to determine if it should continue or bail at the point of entry. This performs more tests than necessary. For example, as exec() recurses along the left edge of the tree, it performs the test over and over without mutating the state. This can be fixed by introducing another helper and performing the test only on the right branch of the recursion:

template <typename T, typename Test, typename Op>
constexpr T guarded_exec(int first, int last, T init, Test test, Op op) {
return test(init) ? exec(first, last, init, test, op) : init;
}

template <typename T, typename Test, typename Op>
constexpr T exec(int first, int last, T init, Test test, Op op) {
return
(first >= last) ? init :
(first + 1 == last) ?
op(init) :
guarded_exec((first + last) / 2, last,
exec(first, (first + last) / 2, init, test, op), test, op);
}


Finally, we can use while_loop() to implement is_prime():

struct test_expr {
int n;
constexpr bool operator()(int x) const { return x*x <= n && n % x != 0; }
};

constexpr int inc(int x) {
return x + 1;
}

constexpr bool gt_than_sqrt(int x, int n) {
return x*x > n;
}

constexpr bool is_prime(int n) {
return gt_than_sqrt(while_loop(2, test_expr{ n }, inc), n);
}

int main() {
static_assert(is_prime(94418953), "should be prime");
return 0;
}


Logarithmic Growth

To see that the recursion depth will be O(lg(n)) where n is the iteration count, observe that when the last iteration completes, while_loop() will have a recursive depth d and exec() will also produce depth d. Thus, the maximum recursion depth will be 2d. At the first level, exec() would have completed 1 iterations, 3 at the second, 7 at the third, and $2^d-1$ at level d. If $d = \left \lceil lg(n) \right \rceil$, we need to show that

$\sum_{i=1}^{\left \lceil lg(n) \right \rceil}2^i-1 \geq n$

for all sufficiently large n. In that case, the recursion of depth $\left \lceil lg(n) \right \rceil$ can fit more than n iterations. By the use of geometric series formula, the sum becomes $2^{\left \lceil lg(n) \right \rceil+1}-\left \lceil lg(n) \right \rceil-2$; and since the first term is dominating, we just need to show that $2^{\left \lceil lg(n) \right \rceil+1}\geq n$. Note that $2^{\left \lceil lg(n)+1 \right \rceil}=2(2^{\left \lceil lg(n) \right \rceil})\geq 2n\geq n$.

Practical Simplification

Since our initial goal was to not exceed the compiler recursion limit, we do not actually need truly logarithmic depth. It will suffice to map the loop iterations onto a tree of certain depth. For example, if we choose a depth of 30, we can have up to $2^{30}=1073741823$ iterations, more than enough for most practical purposes. In this case, the while_loop() function simplifies to:

template <typename T, typename Test, typename Op>
constexpr T while_(T init, Test test, Op op) {
return guarded_accumulate(0, (1 << 30) - 1, init, test, op);
}