2015-04-22

Boost coroutines instead of state machines? Maybe...

If you are reading this post you are probably already familiar with the concept of coroutine.

If you are not, let’s say coroutines are a way to have cooperative (non pre-emptive) multitasking, possibly on a single thread. You can have multiple flows of execution that yield control to each other without being pre-empted, and without using threads, semaphores, and condition variables. In simple terms, coroutines are an extension to the concept of function: while functions cannot be resumed after “returning,” coroutines can be resumed from where they were suspended when “yielding” to another coroutine.

There is no language support in C++ for coroutines, but some libraries are available, that use some platform-specific code and a few ingenious hacks to make coroutines possible in C++. A noteworthy one is Boost.Coroutine, based on Boost.Context, a low-level library capable of doing the “magic” of a userspace context switch.

I had problems myself understanding coroutines when I saw them for the first time, and I consistently find it difficult to explain them to other developers, however smart and experienced. I think that parsing stream data is a good case study that makes it immediately understandable what coroutines can give you in real-world applications. When you parse data coming from a stream you have limited control on (e.g. a socket), you often end up coding a state machine that consumes one byte at a time, as the data you are parsing may be coming to you chunked in various ways. In this post, I will introduce a simple C++ example of a coroutine-based parser, present a few benchmarks, and try to cause in you mixed feelings about Boost.Coroutine by showing you a few pros and cons of using it.

Let’s take a very simple network protocol, that is based on variable-length messages, the length of which is contained in the first byte of the message. A message of length zero represents the connection-close message.

class state_machine_parser
{
private:

    char m_message[255];
    uint8_t m_message_length = 0;
    uint8_t m_message_offset = 0;

    enum { MESSAGE_HEADER, MESSAGE_BODY, COMPLETE } m_state = MESSAGE_HEADER;

public:

    void process(char c)
    {
        switch (m_state)
        {
        case MESSAGE_HEADER:
            m_message_length = c;
            if (m_message_length == 0)
            {
                m_state = COMPLETE;
            }
            else
            {
                m_message_offset = 0;
                m_state = MESSAGE_BODY;
            }
            break;

        case MESSAGE_BODY:
            m_message[m_message_offset++] = c;
            if (m_message_offset == m_message_length)
            {
                // do something with the message
                m_state = MESSAGE_HEADER;
            }
            break;

        case COMPLETE:
            break;
        }
    }

    bool complete() const
    {
        return m_state == COMPLETE;
    }
};

Usually protocols are much more complex than this, and I don’t know you, but I get lost in those massive state-machine parsers with many states (IDLE, HEADER_VERSION, HEADER_LENGTH, HEADER_LENGTH_EXTENDED, PRE_BODY_MASK, BODY, WAITING_CR, WAITING_LF, etc...) plus a number of variables named offset, len, bytes_to_read, and similar.

Let’s look at the following example:

void coroutine_parser(data_source<char>& s, data_source<char>::yield_type& yield)
{
    char message[255];
    uint8_t message_length;

    do
    {
        s.read((char*)&message_length, 1, yield);
        s.read(message, message_length, yield);
        // do something with the message
    }
    while (message_length > 0);
}

For now, try not to think about data_source. Assume it is some sort of socket. If it really were a socket, this code would implement some old fashioned, blocking I/O stuff running on a separate thread.

By the means of Boost.Coroutine, you can code data_source like this:

template<typename DataType>
class data_source final
{
public:

    typedef DataType data_type;
    typedef boost::coroutines::symmetric_coroutine<void>::call_type call_type;
    typedef boost::coroutines::symmetric_coroutine<void>::yield_type yield_type;

private:

    data_type* m_dest_buffer;
    size_t m_pending = 0;

public:

    data_source() = default;
    data_source(const data_source&) = delete;
    data_source& operator=(const data_source&) = delete;
    data_source(data_source&&) = delete;
    data_source& operator=(data_source&&) = delete;

    template<typename InputIt>
    void write(InputIt begin, InputIt end, call_type& parse_coroutine)
    {
        while (parse_coroutine)
        {
            if (m_pending == 0)
            {
                parse_coroutine();
            }

            if (begin == end)
            {
                break;
            }

            while ((begin != end) && (m_pending != 0))
            {
                *m_dest_buffer++ = *begin++;
                m_pending--;
            }
        }
    }

    void read(data_type* dest_buffer, size_t len, yield_type& yield)
    {
        if (len != 0)
        {
            m_dest_buffer = dest_buffer;
            m_pending = len;
            yield();
        }
    }
};

This simple helper class abstracts away most of the complexity, thus making the parsing routine itself very neat. You can provide the data to be parsed by calling write with an STL-like range, and the parsing coroutine will resume when the buffer contains exactly the number of bytes requested. The buffer is provided by the coroutine itself, which means no unnecessary copies.

You may argue this is more code than the state machine alone, but bear in mind that data_source can be reused, and does not depend on the protocol―maybe you need to add something: for a start, some way to handle and report errors.

An example of how to bootstrap and feed this parser is available in this GitHub repository.

At this point, if you are an experienced Boost.Coroutine user, you are either yawning, or yelling at me because you like to do things in a different way and/or I did something you find silly. The point of this post however is that even if I feel compelled to use coroutines, there are a few points that concern me:

  1. Performance. I made some benchmarks and the results are not encouraging. I started by comparing the time required to parse 1,000,000 very short sequences with the state machine parser vs. a straightforward coroutine-based implementation, and the result was a 16x slowdown for the coroutine.
    I managed to get some improvement by reusing the same stack for all the coroutine invocations: the slowdown reduced to 10x.
    Finally, I thought of re-using the same coroutine instance rather than returning from it (when receiving the last, zero-length message) and creating a fresh instance every time. This reduced the slowdown to 7x.
    It seems still a lot to me. Of course, if you are parsing data from the network, its inherent latency plus the overhead of the numerous system calls involved will probably be a dominant factor, rather than a few milliseconds wasted every million messages.
    Nevertheless, don’t assume coroutines come for free.
    The benchmarks I am talking about are available here for you to inspect and possibly try on your machine. Feel free to improve them, and explain me how to further reduce the performance impact of the coroutines.
    If you get a crash, scroll down to the last point of this list to learn why.
  2. (Lack of) consensus. In most real-world cases, you are not the owner of the code you write, and I challenge you to convince your colleagues (especially those fearing Boost or the Standard Library itself!) that everybody should use coroutines because they are cool. It is something that C and C++ programmers have never seen, and they will be suspicious. If anything goes wrong, coroutines will instantaneously be accused, and will be eradicated from the codebase overnight at the first sign of trouble. Let’s wait for them to be part of the C++ programming language itself, it will be way easier to convince resistant developers to use and trust them.
  3. Portability. You may be sacrificing portability for something that, after all, is just syntactic sugar.
  4. Valgrind. Boost.Context and Boost.Coroutine puzzle Valgrind. The fix didn’t work for me (it instantaneously segfaulted) and it is a pain in the neck because you have to rebuild Boost.
  5. Exceptions. There is some quirky stuff to care about if you want to catch exceptions inside your coroutine. Make sure you carefully read what the documentation says.
  6. The API keeps changing. Just compare Boost.Coroutine in version 1.54, 1.55, and 1.56. Three consecutive versions, three completely different worlds. Backward compatibility is guaranteed I think, but the old API disappeared from the documentation, not even reported as deprecated, just swallowed by the dark. This is confusing to say the least, and reduces consensus (see point 2).
  7. I found a bug, without doing anything particularly strange. The code contained in my benchmark will trigger it. Your copy of Boost is probably affected. Here’s the patch.

The bottom line is that coroutines are an effective tool that can improve the readability of complex parsing routines, and much more (there are plenty of use cases). Boost.Coroutine is an impressive tool you can use to bring coroutines into your C++ code, not just for parsing, of course. I just recommend to go easy on them, for the above reasons.

Tags: cpp, boost
comments powered by Disqus
Fork me on GitHub