An Introduction to Monads for C++ Programmers

This web page would not be complete without a monad tutorial, so I will use this occasion to write one. This tutorial will be using C++, as others have already written tutorials using other programming languages.

I find reasoning about monads in C++ to be relatively hard, not because monads are hard to understand, but because their implementation in C++ is beyond terrible. Haskell code is very declarative, in contrast, and understanding how a particular piece of code works is often as simple as applying substitutions like you would do in mathematics to go from an initial expression to its final form. For this reason, I will be using Haskell-style pseudo-code to clarify parts that need further explanation.

DISCLAIMER: Haskell programmers will want to kill me for the aberrations that I am about to state. At some point, I will provide the definitions of return and bind for the Maybe monad and say that the monad definition is complete, without a single mention of functors or applicatives. My goal was to keep the discussion as simple as possible; I will introduce functors and applicatives in another series.

Why monads?

Attempting to explain what monads are without explaining why we need them in the first place would be futile.

In a pure functional programming language like Haskell, functions are said to be pure. A pure function is a function in the mathematical sense: it maps an input to an output, and does nothing else. Pure functions are referentially transparent: replacing a function with its expression leaves the program's semantics unaltered. Given the same input, a pure function always yields the same output, no matter how many times the function is applied to that input. In addition, pure functions have no side effects: they do not write to the console, talk over the network, or mutate any kind of global state. As a consequence, a compiler has the freedom to evaluate independent computations in whatever order it feels like.

In many situations, we need to run computations in a specific order, write the result to the console, or save a trace of the program's execution to a log file. Bringing order to chaos and performing side effects is exactly what monads allow us to do.

What is a Monad?

Leaving category theory aside and staying on more earthly grounds, a monad can be defined as a computation mechanism that allows us to sequence actions in a context. The key words here are sequence and context.

Sequence

Monads help us solve the problem of running computations in a specific order. That is exactly what we mean by sequence: do x, then y, then z. As a C++ programmer, you should feel at home with this concept:

int x = 2+3;
int y = f(x);
std::cout << "y is " << y;

Here we have a sequence of statements. We tell the computer to compute 2+3 and assign the result to x, then assign f(x) to y, and then show the result on the console, in that particular order (ignoring compiler optimisations). The concept of sequence in monads is exactly the same.

"But why do we need anything special to run statements in a sequence?" - I hear you say. After all, a C++ program is, essentially, a sequence of statements. Remember where we are coming from: a world in which functions are so well behaved (functions are pure) that the underlying machine has much freedom as to the order in which to evaluate them in. Monads allow us to determine that order.

When describing monads, we often use the word monadic action to refer to an action, computation, step or statement in a sequence of statements. We can use all of these terms interchangeably to refer to each of the statements in a sequence like the above code snippet. A monadic action is to monads what a function is to regular computations: the basic building block with which we build more complex computations.

Context

The context is what allows side effects to take place, solving our second problem. This concept will, ironically enough, be new to most C++ programmers, so I will illustrate it by moving on to the definition of our first monad.

The Maybe Monad

The Maybe monad allows us to sequence actions in the context of failure. That is, an action in the Maybe monad can either yield a value or result in failure. This allows us to express computations that may go wrong and fail, such as taking the head of an empty list.

In C++, we would start by defining the following Maybe data type:

template <typename a>
struct Maybe
{
    std::shared_ptr<a> value;

    // Evaluate the monadic action and yield its value
    const a& operator() () const
    {
        return *value;
    }
};

The Maybe struct is a little container that may or may not contain a value, depending on the state of its internal pointer. However, to fully understand monads, we should not think of Maybe just as a container, but rather as a special kind of computational entity, as hinted by the addition of a function operator. The function operator returns the value being pointed to, assuming that it exists. This gives Maybe objects a new spin: they can be treated as something we can call or evaluate to yield a value. This is what we mean by computational entity, or function object in plain C++ terms. In addition, Maybe objects will have special denotations. A Maybe<a> object containing a null pointer will denote a failed computation, whereas a Maybe<a> object with an initialised pointer will denote a successful computation that yields a value of type a. Think of Maybe, then, as a special kind of function object or computational entity that may potentially yield a value depending on the state of its internal pointer.

At this point, we can introduce some new vocabulary. As discussed, a Maybe value can be seen as a special kind of computational entity, and we just happen have a word for it. We say that a Maybe object is a monadic value or monadic action. It is a value because it is really just another object in memory that we can store and pass around, like an int or an std::vector. It is an action because the Maybe object is also a function object, so we can look at it as something we can evaluate by calling its function operator on to yield a value.

To construct Maybe values, we define two functions, Nothing and Just, which construct failed and successful computations, respectively:

// Construct a Nothing value, denoting failure
template <typename a>
Maybe<a> Nothing ()
{
    return Maybe<a>(); // null pointer, empty value / failed computation
}

// Construct a Just value, denoting a successful computation
template <typename a>
Maybe<a> Just (const a& x)
{
    Maybe<a> maybe;
    maybe.value.reset(new a(x));
    return maybe;
}

The function names Nothing and Just here are not arbitrary. The Nothing function constructs empty Maybe values, which stand for failed computations, hence the word Nothing. The Just function, on the other hand, constructs Maybe values that do contain an internal value. These non-empty Maybe values represent successful computations that just yield the value we pass to the Just function, hence the name Just. The functions Nothing and Just therefore construct failed and successful Maybe computations, respectively.

I defined the Nothing and Just functions instead of adding two constructors to the Maybe struct because we cannot give constructors a name, and I really wanted these constructor functions to have a proper name. The reason is that we can now perform an exercise that is very common in functional programming: we will refer to values constructed with the Nothing function as Nothing values, and to values constructed with the Just function as Just values. Thus, the distinction between the functions and the values they construct becomes very blurry; this is the essence of functional programming.

From this point on, when we talk about Nothing or Just, it will not matter whether we are referring to the functions or to values constructed with those functions. The expression Just(3) can be regarded as a function call to the Just function with value 3, or as a Maybe value that yields a 3 (the Just function returns a Maybe value). Similarly, the expression Nothing() can be viewed as a function call to the Nothing function, or as a Maybe value denoting a failed computation. Function and value are equal.

Let us wrap up by exercising our new vocabulary. A Nothing value, named after the Nothing function, represents a failed computation. A Just value, named after the Just function, represents a successful computation. We say that a Maybe<a> value represents a computation that can either yield Nothing or Just<a> value, hence the words Maybe, Just and Nothing in the definitions above. A Maybe value can also be called a monadic value or monadic action, depending on how we want to think about it.

Are you still thinking of Maybe as a simple container at this point, or is it starting to look a bit more magical? In a few moments we will see how Nothing and Just work as failed and successful computations.

Having defined the Maybe struct together with the Nothing and Just functions, we can now create some example monadic values or actions:

Maybe<int> x = Just(3);   // A successful computation that yields value 3
Maybe<int> y = Nothing(); // A failed computation

So far we have defined what the Maybe monad looks like, but we have not defined how it behaves. This behaviour is what will shed light on the real meaning of Nothing and Just. For a monad definition to be complete, we must define two basic operations.

return

The return function takes a value of type a and yields a monadic value Maybe<a>:

template <typename a>
Maybe<a> ret (const a& x) // I am calling the function `ret` because
{                         // `return` is a special keyword in C++.
    return Just<a>(x);
}

If we look at Maybe as a container, we can say that return takes a simple value and puts it in a Maybe. This is like taking a value and using push_back to add it to an std::vector. But we should not think of Maybe as a yet another container.

If we look at Maybe as a special kind of computational entity, then the return function takes a simple value and wraps it in the Maybe monad. The return function answers the question - "What is the simplest computation that this monad can express?" - In the particular case of the Maybe monad, the simplest computation is one that yields Just<a> value.

Another way of seeing the return function is as a function that takes a simple value and puts it in the context of the Maybe monad. Remember when we said that monads allow us to sequence actions in a context? The return function allows us to take a simple value and lift it into the context of the Maybe monad. It transforms a value of type a into a monadic value or monadic action Maybe<a> that yields that value when evaluated (its function operator is called on).

You might have noticed that return is just Just with another name. The definition of return for the Maybe monad just happens to be that simple.

An example of return in action:

Maybe<int> x = ret(3); // Just<int>(3)

bind

Context and sequence, those are the two key words in the definition of a monad. The return function allows us to put values into context, so there must be another function to sequence actions in that context. That is exactly what the bind function does.

The bind function takes a Maybe<a>, a function from a to Maybe<b>, and applies the function to the a value contained in the Maybe<a> to yield a Maybe<b>:

template <typename a, typename b>
Maybe<b> bind (const Maybe<a>& x, const std::function<Maybe<b>(const a&)>& f)
{
    if (is_nothing(x))
        return Nothing<b>();
    else
        return f(x());
}

where

// Return true if the Maybe value is Nothing, false if it is Just a value.
template <typename a>
bool is_nothing (const Maybe<a>& x)
{
    return x.value.get() == nullptr;
}

Note that if the input Maybe<a> is Nothing, then there is nothing to apply the function to (the internal pointer is null) and bind simply yields Nothing. If, on the other hand, the input Maybe<a> value is Just<a> value (the internal pointer actually points to something) then bind applies the function to the a value inside of it, yielding a Maybe<b> value. In pseudo-Haskell (okay, this is actually Haskell):

bind :: Maybe a -> (a -> Maybe b) -> Maybe b
bind Nothing f = Nothing
bind m f = f m

Looks simple? Fantastic, because this completes the definition of the Maybe monad.

Visualising the Maybe Monad

At this point, you might be asking yourself what we have actually done. You understand the Maybe struct and the Nothing and Just functions, as well as return and bind. But you still do not see how a Maybe value is supposed to represent any kind of computation beyond a function object we can call, or what role the return and bind functions play at all. That is fine. What I wanted to illustrate is how relatively simple the definition of the Maybe monad is. The monad's true power shall manifest itself when we put it into use.

Recall the definition of the bind function and think about what we are doing for a second:

template <typename a, typename b>
Maybe<b> bind (const Maybe<a>& x, const std::function<Maybe<b>(const a&)>& f)
{
    if (is_nothing(x))
        return Nothing<b>();
    else
        return f(x());
}

Binding Nothing to a function yields Nothing. Binding Just<a> value to a function yields the result of applying that function to the a contained in the Just<a>, which is itself another monadic value, now of type Just<b>. In other words, if the input Maybe<a> is a failed computation, or Nothing, bind propagates the failure and yields Nothing. In contrast, if the input Maybe<a> is a successful computation, or "Just<a> value", then bind runs the next step in the sequence by applying the function to the a in the Just<a>, yielding Just<b>.

And now we can repeat the process. The Just<b> value can itself be passed to bind again together with another function to yield a Just<c> value. And the Just<c> can then again be transformed with bind into Just<d>. Or maybe this last computation fails, yielding Nothing, in which case binding this failed Maybe<d> value to the next step in the computation will also yield Nothing. And the Nothing will propagate down the chain to make the entire sequence of monadic actions yield Nothing, denoting that the overall computation has failed. If one action in the sequence fails, the entire sequence fails. If all of the actions are successful, the sequence yields the expected result. Welcome aboard, we have just defined computations under the context of failure.

Still not fully grasping it? Let us illustrate the Maybe monad in action with some examples.

Defining functions in the Maybe monad

Consider a function to divide an integer x by an integer y:

int div (int x, int y)
{
    if (y == 0)
        // what here?
    else
        return x/y;
}

Division is defined for all pairs of integers except when the denominator is 0. What is the div function supposed to yield in that case? As a C++ programmer, you might opt for throwing an exception. That is perfectly fine. In fact, exceptions are just another monad, and the entire C++ language itself can be seen as one big monad. Since we want to illustrate how the Maybe monad works, however, let us represent failure using Maybe:

Maybe<int> div (int x, int y)
{
    if (y == 0)
        return Nothing<int>(); // Nothing denotes failure
    else
        return Just(x/y); // Just denotes success
}

Notice how the div function now yields a value of type Maybe<int> instead of just a plain int. When we do this, we say we that have lifted the function into the Maybe monad. If the denominator is 0, the div function yields Nothing, otherwise it yields Just the result of the division x/y.

Cool! So even though division is not defined for every pair of integers, the Maybe monad does allow us to write div as a total function. The Maybe monad allows us to handle the failure case when y=0.

Finally, let us define a function that increments a number by one:

Maybe<int> inc (int x)
{
    return Just(x+1);
}

The inc function always returns a Just value, since there is no failure case here. This is perfectly valid; we never said that actions in the Maybe monad have to fail.

Having these two functions in hand, let us see how we can thread or sequence computations in the Maybe monad together with bind.

Sequencing computations with bind

Remember that bind takes a Maybe<a> value, a function from a to Maybe<b>, and returns a Maybe<b>:

template <typename a, typename b>
Maybe<b> bind (const Maybe<a>& x, const std::function<Maybe<b>(const a&)>& f)
{
    if (is_nothing(x))
        return Nothing<b>();
    else
        return f(x());
}

A value of type Maybe<a> represents a computation that may or may not yield a value of type a. The bind function allows us to thread computations by taking the a wrapped in these Maybe<a> values and applying a function to it to yield a new monadic value of type Maybe<b>. This new Maybe value can in turn be fed as an input to the next computation, and the result of that computation to the next one, and so on. But remember that monads sequence actions in a context. The context in the case of the Maybe monad is failure. If bind comes across a Nothing - a failed computation - then the entire sequence of actions results in Nothing, or failure. If everything goes well, bind yields the expected result wrapped in a Just.

The following examples use bind to feed Maybe int values to functions taking plain int values and yielding Maybe int values:

Maybe<int> a = bind(ret(2), inc); // Just(3)
Maybe<int> b = bind(bind(ret(2), inc), inc); // Just(4)
Maybe<int> c = bind(ret(4), std::bind(div, _1, 2)); // Just(2)

In the first example, we take value 2 and lift it into the monad with ret. This results in a Just(2) value, a computation yielding a 2. This Just(2) value is fed to bind together with inc. The result is that the 2 in the Just(2) is incremented, resulting in a Just(3). To see why this happens, use the definitions of ret, bind and inc and substitute them in:

bind (ret(2), inc) = bind (Just(2), inc) = inc(2) = Just(3)

In the second example, we inject 2 into the monad to obtain Just(2). The Just(2) value is fed to bind together with inc. This yields Just(3). The value 3 contained in the Just(3) is once again fed to bind with inc, yielding a final result of Just(4). Again, substitute in the function appropriate definitions to see how this works:

  bind ( bind (ret(2), inc), inc )
= bind ( bind (Just(2), inc), inc )
= bind (inc(2), inc)
= bind (Just(3), inc)
= inc(3)
= Just(4)

The third example feeds the value 4 wrapped in a Just to a function that divides its argument by 2, yielding Just(4/2) = Just(2).

Sequencing failure

So far so good, but none of the above examples have a failure case. What makes the Maybe monad interesting is that it allows us to deal with failure, so let us see how bind sequences failed computations.

Consider a function that computes (1/x) + 1:

int compute (int x)
{
    return (1/x) + 1;
}

The compute function is undefined for x=0, so it is unsafe in this form. We can lift the compute function into the Maybe monad to make it safe:

Maybe<int> compute (int x)
{
    std::function<Maybe<int>(const int&)> div_by_x = std::bind(div, _1, x);
    return mbind(mbind(ret(1), div_by_x), inc);
}

Using the monadic version of compute, we can now apply this function to integer values with no fear of dividing by 0:

Maybe<int> d = compute(1); // Just(2)
Maybe<int> e = compute(0); // Nothing

The first example computes (1/1) + 1 = Just(2). What is more interesting is the second example, which attempts to compute (1/0) + 1. The Maybe monad detects the erroneous division by 0 and yields Nothing in return, denoting that the computation has failed. The value 1 is never divided by 0, so the program does not crash. Maybe handles the failure gracefully and yields Nothing.

To understand how each of these computations work, apply substitutions as usual:

  compute 1
= bind (bind (ret(1),  div_by_1), inc)
= bind (bind (Just(1), div_by_1), inc)
= bind (div_by_1(1), inc)
= bind (Just(1), inc)
= inc(1)
= Just(1)

and

  compute 0
= bind (bind (ret(1),  div_by_0), inc)
= bind (bind (Just(1), div_by_0), inc)
= bind (div_by_0(1), inc)
= bind (Nothing, inc)
= Nothing

The first example is just like the ones in the previous section, in the sense that all computations produce Just values and the final result is the expected value of (1/x) + 1 wrapped in a Just.

The second example is where the Maybe monad manifests its power. When we evaluate compute(0), everything goes well until we attempt to divide 1 by 0:

bind (div_by_0(1), inc)

The div function yields Nothing in this case:

Maybe<int> div (int x, int y) // x = 1, y = 0
{
    if (y == 0) // true
        return Nothing<int>(); // failure
    else
        return Just(x/y);
}

This leaves us with

bind (Nothing, inc)

Now recall the definition of bind:

template <typename a, typename b>
Maybe<b> bind (const Maybe<a>& x, const std::function<Maybe<b>(const a&)>& f) // x = Nothing
{
    if (is_nothing(x)) // true
        return Nothing<b>(); // propagate failure
    else
        return f(x());
}

bind finds Nothing in its input and yields Nothing, propagating the failure down the monadic chain:

  bind (Nothing, inc)
= Nothing

The final result is therefore Nothing. One simply cannot divide by 0.

The Side Effects of Maybe

We have said that monads help us solve two problems: determining order and allowing side effects. We have seen how bind solves the order problem by threading or sequencing monadic actions, running one after the other. But at what point were our Maybe actions allowed to have side effects?

A Maybe action is allowed to do two things: yielding a value or yielding Nothing. Compare this to regular (pure) functions, which can only do one thing: yielding a value. When a Maybe action results in Nothing, the entire sequence of actions is short-circuited and the overall result is Nothing. This is the side effect that a Maybe action can have. A Maybe action can yield a value just like regular functions, or it can fail the entire computation and make it result in failure. This second nature of monads is what determines their side effects. In the case of the Maybe monad, the side effect is failure. The Maybe monad therefore allows us to deal with failure.

Aftermath

A monad allows us to sequence actions in a context. The context is what makes one monad different from another. The Maybe monad puts us in the context of failure: a Maybe action can either yield a value or result in failure. This second nature of monads, being able to do something other than computing values, is what allows monadic actions to have side effects. Monads also answer the question of what order to run computations in.

"But I can throw exceptions in C++, this whole monads thing is a waste of time". Sure you can. In fact, the Maybe monad is less powerful than an exception, because while the Maybe monad will allow you to represent computations that may fail, it will not tell you why they failed. Exceptions, on the other hand, often contain a message describing the problem encountered.

But that is not the point. The Maybe monad is just one of the many monads that exist, simple enough to be able to write a short tutorial about it. Other monads can be defined, such as the Either monad, which can deal with exceptions, or the List monad, which handles non-deterministic computations.

Monads are also often used to limit the side effects a computation may have. We may want write code that can send or receive messages over the network, for example, and be able to do only that. By defining a monad to deal with the specifics of the network, we make sure our network code is not having other side effects, such as writing changes to a database. This makes reasoning about our code much easier, since the monad (together with a decent type system) ensures that the code is doing only the kinds of things it was intended to do.

Moving Forward

Had fun? Start learning you a haskell!

Source Code

The C+ source code below is a little different from what we have seen throughout the tutorial. bind was ambiguous with std::bind, and the compiler was also having trouble matching certain types and deducing others. But hey, you can get things to work with a little extra effort.

C++

Haskell

data Maybe a = Just a | Nothing

return :: a -> Maybe a
return = Just

bind :: Maybe a -> (a -> Maybe b) -> Maybe b
bind Nothing _ = Nothing
bind (Just x) f = f x

mydiv :: Int -> Int -> Maybe Int
mydiv _ 0 = Nothing
mydiv x y = Just (x `div` y)

inc :: Int -> Maybe Int
inc = return . (+1)

compute :: Int -> Maybe Int
compute x = Just x >>= \y -> (mydiv 1 y >>= inc)