Archives for drcabana.org

Monty Hall problem

There is currently an active discussion of the Monty Hall problem on the Clojure discussion group. The discussion centers on the simulation of the problem; here I want to look at it from first principles. The problem is actually an easy one to reason about when looked at from the right perspective.

Here is how the Monty Hall problem was stated in the discussion mentioned above.

Suppose you’re on a game show and you’re given the choice of three doors [and will win what is behind the chosen door]. Behind one door is a car; behind the others, goats [unwanted booby prizes]. The car and the goats were placed randomly behind the doors before the show. The rules of the game show are as follows: After you have chosen a door, the door remains closed for the time being. The game show host, Monty Hall, who knows what is behind the doors, now has to open one of the two remaining doors, and the door he opens must have a goat behind it. If both remaining doors have goats behind them, he chooses one [uniformly] at random. After Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and the host opens Door 3, which has a goat. He then asks you “Do you want to switch to Door Number 2?” Is it to your advantage to change your choice?

Let’s generalize the problem. You are presented with N doors, behind one is the car, behind the other N-1 doors there are goats. Monty knows what is behind each door. You choose a door, then Monty opens all but two of the doors, yours and one other. Behind the N-2 open doors are goats. Two doors remain closed. Should you switch from your original choice?

Not to ruin the suspense, but for N > 2 you should switch. Let’s see why.

We need to consider the probabilities of several events; here is some notation to describe the relevant events.

  • C: the event that your original choice of door hides the car.
  • ~C: the event that your original choice of door does not hide the car.
  • M: the event that Monty opens N-2 doors, none of which reveals the car.
  • MC: both M and C occur.
  • M~C: both M and ~C occur.

By the time you are asked to reconsider your choice, event M has happened, and one but not both of C and ~C has happened. Another way to put it is that MC xor M~C has happened. These two events are mutually exclusive, and one or the other occurred, but you don’t know which. So you need to decide which was more probable.

The choice before you is to decide which is larger: P(MC), the probability of MC, or P(M~C), the probability of M~C. If P(MC) is larger, you should stick with your original choice. If P(M~C) is larger, you should switch. If the two are equal, it does not matter whether you switch.

The first key observation: P(M) = 1. No matter what door you chose, Monty always can and always will open all doors but yours and one other, and not reveal the car. He can easily do this because he knows what is behind every door.

The next key observation concerns the notion of probabilistically independent events. Events A and B are independent iff P(AB) = P(A)*P(B). This is a special relationship; not all events are independent. But we have here a special case: an event of probability 1 is independent of any other event. To see why, note that if P(M) = 1, then A and AM differ only by the event A~M, whose probability is at most P(~M) = 0; hence P(AM) = P(A) = P(A)*P(M).

In particular, since P(M)=1, M is independent of C, hence P(MC) = P(M)*P(C) = P(C). Similarly, P(M~C) = P(M)*P(~C) = P(~C).

This reduces the choice before us to deciding which is larger, P(C) or P(~C). But this is trivial: P(C) is 1/N, and P(~C) is (N-1)/N.

An interesting special case of the game is for N=2. There are only two doors, Monty opens no doors, and it makes no difference whether or not you switch doors. For N>2 you should switch.
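
If you prefer empirical evidence, here is a minimal sketch of a simulation in Clojure (my own illustration; the function names are invented). It exploits the fact that under the always-switch strategy, you win exactly when your original choice was a goat:

(defn play-round
  "Play one round with n doors, using the always-switch strategy.
   Returns true when switching wins the car."
  [n]
  (let [car    (rand-int n)
        choice (rand-int n)]
    ;; Monty opens n-2 goat doors, leaving your door and one other.
    ;; Switching wins exactly when your original choice was a goat.
    (not= car choice)))

(defn win-rate [n trials]
  (/ (count (filter true? (repeatedly trials #(play-round n))))
     (double trials)))

;; user> (win-rate 3 100000)   ;=> about 2/3
;; user> (win-rate 11 100000)  ;=> about 10/11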

If you are not convinced that it pays to switch, I have a proposition for you. Let’s play the game a few times with a deck of cards, say using 10 red cards as goats and one black card as the car (N=11). You play the role of Monty, I’ll be your opponent. I’ll consistently play the always-switch strategy. Play is for a dollar a round (more if you prefer). Monty pays if the opponent gets the car, the opponent pays otherwise. I’ll be happy to play for as long as you like.

The transpose function

I am going to discuss a function I think belongs in the Clojure core language. The function is called transpose. It is defined as follows.

(defn transpose [xs]
  (apply map vector xs))

Here is what it does:

user> (def m [ ['a 'b 'c 'd] [1 2 3 4] ])
#'user/m

user> m
[[a b c d] [1 2 3 4]]

user> (transpose m)
([a 1] [b 2] [c 3] [d 4])

user> (transpose (transpose m))
([a b c d] [1 2 3 4])

Frequently we are presented with data physically organized as rows or records. These rows may be semantically meaningful, but even so, we often want to consider the same data as columns. The transpose function allows us to move easily between the row and column views of the data. Here’s a typical example.

Suppose we took measurements of some test subjects. For each subject we record height in inches, weight in pounds, age in years. The data looks like this:

user> (def data [ [60.0 180.1 22] [62.0 197.4 19.1] [58.8 166.6 29.1] [59 174 31] ]) 

It is typical to want to compute the sample means for the data. No problem:

user> (map mean (transpose data))
(59.95 179.525 25.3)
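
Note that mean is not part of clojure.core; a minimal helper along these lines is assumed here:

(defn mean [xs]
  (/ (reduce + xs) (count xs)))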

Here's another use case. We have several functions f, g, h, etc. We want a way to apply all the functions to the same argument x, and return the functions in the order corresponding to (sort [(f x) (g x) ...]). The problem is a bit contrived, but will serve to illustrate what I think of as the TST pattern: transpose, sort, transpose. Here's the code.

(defn f [x] x) ;; set up test functions
(defn g [x] 3)
(defn h [x] (* x x))
(defn i [x] (* 2 (- x)))

(defn vals [funcs x]
  (for [f funcs] (f x)))

user> (vals [f g h i] 13)
(13 3 169 -26)

With that machinery in place, here's the code showing the transpose-sort-transpose pattern.

(defn fsort [funcs x]
  (let [v (vals funcs x)]
    (->> [v funcs]          ;;line A
         (transpose)
         (sort-by first)
         (transpose)
         (second))))

This will return the functions in the desired order. Here's why. Look at the [v funcs] pair in line A. It looks like

[ [(f x) (g x) (h x) (i x)]  [f g h i] ]

Running that through the first transpose results in

[  [(f x) f ] [(g x) g] [(h x) h] [(i x) i]  ]

Then that is sorted on the first coordinate, reordering it, say to

[  [(g x) g ] [(i x) i] [(h x) h] [(f x) f]  ]

Now the functions are in the order we want. The second transpose decouples the functions and values, resulting in

[ [(g x) (i x) (h x) (f x)]  [g i h f] ]

Running this result through second leaves us with just the functions in the desired order. If you are of a mind to test this, here's one way

(defn in-order? [funcs x]
  (apply <= (vals (fsort funcs x) x)))

user> (every? true? (map (partial in-order? [f g h i]) (range 50000)))
true

I have found the transpose-sort-transpose pattern pops up with surprising regularity. But a pattern pops up in part because we have reified it by giving it a name, and so can recognize it. Giving the transpose function a name makes it much easier to see uses for the TST pattern.

The transpose function is not something I made up. It has a long tradition in linear algebra, where it is called transpose. I hope the transpose function can make the jump into Clojure core. In the meantime, I'll just keep its definition loaded as a yasnippet shortcut.

Synergy, ftw

These are some notes on running Synergy over an ssh tunnel.

The problem: I use two computers at work. The one named Beavis runs Ubuntu, the other, named d73649, runs Win7. Each has its own monitor, mouse, and keyboard, but I prefer to use a single mouse and keyboard as much as possible. Synergy is a sweet piece of free and open software designed for just that purpose. With Synergy running, I can use the mouse to drag the cursor from one monitor to the other. The keyboard addresses the machine whose monitor the cursor is on, and I can copy/paste from one machine to the other.

Here’s the setup. I run the Synergy server, synergys, on Beavis, and the client, synergyc, on d73649. The server is configured via a single config file, ~/.synergy.conf. Mine looks like this:

section: screens
   beavis:
   d73649:
end

section: links
    beavis:
           right = d73649
    d73649:
           left = beavis
end

section: aliases
    d73649:
          d73649.myjob.com
end

Basically I am identifying the allowed client(s), and specifying that the monitor for Beavis is to the left of the monitor for d73649. The server is started by running the synergys command with no options. It runs in the background and listens for client requests.

The drill is to first start the server, then the client. I like to tunnel the Synergy traffic through ssh, to protect it from snooping. I do so by starting the client with the following script.

#!/usr/bin/sh
## Connect windows box running synergy client to synergy server listening on port 24800.
## Windows machine has cygwin installed, and we are using the cygwin ssh client.
## Connection is tunneled through ssh to prevent snooping of keyboard traffic.

#------- FQHN of machine running the synergy server
server=beavis.myjob.com

#------- Location of synergy client on local box
#------- Note use of cygwin paths
client=/cygdrive/c/programs/synergy/synergyc.exe

#------- Either the ssh key should have no password set, or there should be
#------- an ssh agent configured to provide the password at execution time.
key=~/.ssh/frd_rsa

ssh -i ${key} -f -N -L localhost:24800:${server}:24800 ${server} && ${client} localhost

I leave the connection running for days at a time. Works like a champ.

Other thoughts: Synergy works with Macs too. There are various GUIs for configuring and running the client, but I have not tried them. You'll still need a mouse and keyboard on both machines, for dealing with reboots and times when the client session is locked. This is great software. It fills a specific need, and does so quite well.

Sow the wind, reap the whirlwind

At the most recent meeting of TriClojure, I had the pleasure of chatting with Michael Fogus, one of the authors of The Joy of Clojure. I mentioned that I had done some Mathematica (Mma) programming, and Michael asked me about a couple of Mma constructs called Sow and Reap. Sow and Reap are always used in tandem, and together form a powerful debugging tool. I’ll try to illustrate what they do with some simple examples.

By itself, Sow behaves much like the identity function:

In[1]:= 3 + Sow[4]
Out[1]= 7

To make Sow useful, we have to enclose it inside a call to Reap:

In[2]:= Reap[3 + Sow[4]]
Out[2]= {7, {{4}}}

Disregarding all the nesting for a moment, notice we got back the result of the computation, 7, and the value of the expression wrapped in a Sow call, 4. Together Sow and Reap let you easily examine intermediate results inside of computations. Let’s look at a more complicated example.

First we’ll define a couple of zero argument random functions; one returns an even integer, the other an odd integer.

randEven[] := 2 * Random[Integer, {1, 5}]
randOdd[] := randEven[] - 1

Next we define a function of one argument which uses both of the functions above.

f[x_] := x + randEven[] + randOdd[] + randEven[]

Evaluating f[2] will return a random integer. What if we wanted the option to know what random intermediate values were used to compute the final output of f[2]? We can use Sow and Reap to instrument f in a way that does not disturb its normal behavior, but allows us to look inside a call to f by simply wrapping that call in a Reap.

Here’s the instrumented version of f:

f[x_] := x + Sow[randEven[]] + Sow[randOdd[]] + Sow[randEven[]]

The revised version of f behaves exactly like the original version: it returns a random integer. But if we want to look inside a call to f, we can wrap it in a Reap:

In[4]:= Reap[f[2]]
Out[4]= {15, {{2, 5, 6}}}

We see the final value of the call, 15, and the three intermediate results, {2,5,6}. But why does the Reap output have the double nesting {{2, 5, 6}}? It is because we have the option to decorate our calls to Sow with tags of our choosing, and Reap will collect the values in separate buckets for us.

Here’s the idea. Let’s use tags to separate the odds and evens; the tags can be arbitrary.

f[x_] := x + Sow[randEven[], foo] + Sow[randOdd[], bar] + Sow[randEven[], foo]

Now a call to Reap shows us the odds and evens in separate bins as it were:

In[6]:= Reap[f[2]]
Out[6]= {15, {{4, 2}, {7}}}

The instance of Reap used to enclose our Sow calls need not immediately enclose:

In[9]:= Reap[Min[0, f[2]]]
Out[9]= {0, {{2, 10}, {5}}}

My examples are contrived, but I think they illustrate the idea. A bit of thought might just make you wish you had this facility available in your favorite language. Reap and Sow can be just the thing for getting insight into what is going on inside a computation.
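
For Clojure programmers who want to experiment with the idea, here is a minimal sketch of a Sow/Reap analogue. This is my own invention, not part of the original Mma facility: a dynamically bound atom collects the sown values.

(def ^:dynamic *sown* nil)

(defn sow
  "Record v if we are inside a reap; either way, return v unchanged."
  [v]
  (when *sown* (swap! *sown* conj v))
  v)

(defn reap
  "Run thunk, returning [result collected-sown-values]."
  [thunk]
  (binding [*sown* (atom [])]
    (let [result (thunk)]
      [result @*sown*])))

;; user> (reap #(+ 3 (sow 4)))
;; [7 [4]]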

And I have to give mad props to whoever picked such evocative names as Sow and Reap. I am reminded of Hosea 8:7: “They sow the wind, and reap the whirlwind”. Pure genius.

Cracker Barrel puzzle

A recent posting to the Trifunc group invited the members to solve the Cracker Barrel puzzle.

Brian Adkins, the author of the challenge, posted a solution in Haskell. Here is my Clojure solution.

The game is played on a triangular board with 15 holes, labeled 0 through 14. Initially, 14 of the holes contain pegs, one hole is empty. The goal is to eliminate pegs by jumping over them. For a full explanation of the game, see the links cited above. My numbering scheme for the game board is similar to the one shown here, except that I number from 0 to 14 rather than from 1 to 15.

I represent moves by triples I call neighborhoods. Each neighborhood (for example [0 2 5]) designates the coordinates of three contiguous holes. Corresponding to the triple of coordinates is a triple of booleans denoting the presence/absence of pegs in the holes. If the triple of booleans is either [true true false] or [false true true], then a capture is possible, and the triple of integers unambiguously determines a legal move.
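
As a sketch of that move logic (my reconstruction, consistent with the description above; the map keys are my own naming):

(defn moves-for
  "Given pegs, a vector of 15 booleans, and a neighborhood of three
   contiguous hole indices, return the legal jumps in that neighborhood."
  [pegs [a b c]]
  (case (mapv pegs [a b c])
    [true true false] [{:from a :over b :to c}]
    [false true true] [{:from c :over b :to a}]
    []))

(defn apply-move
  "Jump the peg at :from over :over into :to, removing the jumped peg."
  [pegs {:keys [from over to]}]
  (assoc pegs from false, over false, to true))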

Though the solve and solve-all functions are mutually recursive, I did not bother to use a trampoline. Blowing the stack is not a danger here. The stack depth is proportional to the length of the game, and the maximum length for a game is thirteen moves.

Lazy coins

Here’s a problem I ran across on the CompSci Stack Exchange. Suppose you have a biased coin that comes up heads with probability p. You don’t know what p is, but assume it is neither 0 nor 1. How can you use the coin to simulate a fair coin, i.e. one that produces heads with probability very close to 1/2?

The procedure is surprisingly simple. To produce one “fair” toss, you toss the biased coin n times, and count the number of heads. If the count is even, the result is heads, if odd, tails. The value of n needed will depend on the bias of the coin, but typically something like n=20 will suffice.

Lazy sequences provide a simple and elegant way to build probabilistic models of coin tossing.
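
The code originally embedded here was not preserved; what follows is a sketch in that spirit. A coin is an infinite lazy sequence of tosses, 1 for heads and 0 for tails:

(defn coin
  "An infinite lazy sequence of tosses of a coin biased to come up
   heads (1) with probability p."
  [p]
  (repeatedly #(if (< (rand) p) 1 0)))

;; user> (take 10 (coin 0.5))
;; (0 1 1 0 1 0 0 1 1 0)
;; user> (take 10 (coin 0.9))
;; (1 1 1 0 1 1 1 1 1 1)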

As seen above, eyeballing a few tosses is not going to suffice to distinguish fair coins from unfair. We’ll need to collect some statistics.
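
Again the original snippet is lost; a sample-mean helper in the same spirit:

(defn sample-mean [xs]
  (/ (reduce + xs) (double (count xs))))

;; user> (sample-mean (take 10000 (coin 0.9)))
;; 0.9015   ;; clearly not fair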

Here’s how we use repeated tosses to simulate a fairer coin.
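
A sketch of the unbias function the rest of this post refers to (the original code is lost): partition the tosses into blocks of n, and map each block to the parity of its head count.

(defn unbias
  "Simulate a fairer coin: each block of n tosses of the biased coin
   yields heads (1) if the block contains an even number of heads,
   tails (0) otherwise."
  [n tosses]
  (map (fn [block] (if (even? (reduce + block)) 1 0))
       (partition n tosses)))

;; Sample means oscillate with the blocksize n. For p = 0.9 the
;; expected means for n = 1..6 are roughly 0.10, 0.82, 0.24, 0.70,
;; 0.34, 0.63, converging to 1/2 as n grows:
;; user> (for [n (range 1 7)]
;;         (sample-mean (take 10000 (unbias n (coin 0.9)))))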

Notice the oscillation in the sequence of sample means. To understand where it’s coming from, imagine flipping a coin that always comes up heads, an evil coin. The parity of the number of heads observed is exactly correlated to the parity of the blocksize n in our unbias function. This causes our scheme to fail miserably for evil coins, as below.
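
For instance (my reconstruction of the lost example):

;; An evil coin: every block of n tosses holds exactly n heads, so
;; the output is the constant parity of n -- never fair.
user> (sample-mean (take 1000 (unbias 20 (coin 1.0))))
1.0
user> (sample-mean (take 1000 (unbias 21 (coin 1.0))))
0.0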

With a highly biased (but not evil) coin we see much the same oscillation for small blocksize n. If we track the sample means over a contiguous range of blocksizes n, high sample means alternate with low sample means. Regardless of these oscillations, as n grows the sample mean does converge to 1/2. Can we prove it?

It is convenient to model the coin as a random variable X which assumes values 1 and -1 with probabilities p and q=1-p respectively. Flipping the coin repeatedly results in a sequence of independent, identically distributed random variables X_i.

Consider the random variable Y_n defined as:
Y_n = \frac 12 + \frac 12 \left(\prod_{i=1}^n X_i\right)

Notice that Y_n takes on values 0 and 1, depending on the parity of the number of heads seen in n coin tosses, exactly as in the unbias function.

The expected value of Y_n is:
E\left(Y_n\right) = E\left(\frac 12 + \frac 12 \left(\prod_{i=1}^n X_i\right)\right) = \frac 12 + \frac 12 \,E\left(\prod_{i=1}^n X_i\right)

Because the X_i. are statistically independent, we can exchange the order of product and expectation:
E\left(\prod_{i=1}^n X_i\right) = \prod_{i=1}^n E\left(X_i\right) = \prod_{i=1}^n (p-q) = (p-q)^n

Since |p-q| < 1, the term (p-q)^n converges to zero as n increases, and we are done: E(Y_n) converges to 1/2.

I'll close by pointing out a couple of potential pitfalls of using lazy sequences to model coin tossing. Both are easily avoided. Here's the first: a given coin is a fixed sequence. Once realized, its elements do not change; each time I look at the first 10 elements, they are the same. Make new coins as required.

The second pitfall is head retention. Realize a million elements of a sequence named fair, and they won’t be garbage collected as long as you hold a reference to its head. If you must name a coin, do it inside a let block, so that the sequence can be garbage collected when the let block falls out of scope.
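
In code (my reconstruction of the lost example):

;; Don't do this: a top-level def pins the head of the sequence,
;; so all million realized elements stay in memory.
(def fair (unbias 20 (coin 0.9)))
(sample-mean (take 1000000 fair))

;; Prefer this: the head falls out of scope when the let exits.
(let [fair (unbias 20 (coin 0.9))]
  (sample-mean (take 1000000 fair)))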

A thread safe fsm?

Today I wanted to play around with building a small thread-safe finite state machine. I’m mostly interested in the thread safety angle, so the machine is kept extremely simple. Clojure’s protocol feature turned out to be just what the doctor ordered.

The fsm consists of three things: the current state, the input alphabet, and a transition function which maps the current state and an input symbol to the next state. The fsm has two ‘methods’: state, which returns the current state, and act which acts on the input symbol and current state to compute the next state. Here is how act and state are implemented.
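
The original code is no longer embedded here; a reconstruction consistent with the description (the protocol and method names are as described, the implementation details are mine) might look like this. The state lives in an atom, and swap! makes each transition atomic:

(defprotocol FSM
  (state [fsm] "Return the current state.")
  (act   [fsm symbol] "Consume one input symbol; return the new state."))

(defn make-fsm [initial-state transition]
  (let [current (atom initial-state)]
    (reify FSM
      (state [_] @current)
      (act [_ symbol] (swap! current transition symbol)))))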

Let’s make things more concrete by building a machine. Our input alphabet will contain only a single symbol, 1. The machine will count the parity mod 4 of the number of times it has seen the symbol 1. It ignores input other than the symbol 1. We’ll call the machine blue, in honor of the Sweetwater Blue Ale I had while building it.
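
Under the reconstruction above, blue might be built like this; the transition table just cycles through four states on input 1 and ignores everything else:

(def blue
  (make-fsm ::0
            (fn [s input]
              (if (= 1 input)
                ({::0 ::1, ::1 ::2, ::2 ::3, ::3 ::0} s)
                s))))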

Playing with it at the repl we see that at first glance everything appears to work.
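
Something like:

user> (state blue)
:user/0
user> (act blue 1)
:user/1
user> (act blue 1)
:user/2
user> (act blue :ignored)
:user/2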

But is it thread safe? Let’s beat on it with some threads.
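
A sketch of the test (again, the original code is lost): fire up a number of threads, each feeding the machine the symbol 1, for 100000 inputs in total.

(defn pound
  "Hammer the machine with the symbol 1 from nthreads concurrent
   threads, then return the final state."
  [machine nthreads inputs-per-thread]
  (let [workers (doall (for [_ (range nthreads)]
                         (future (dotimes [_ inputs-per-thread]
                                   (act machine 1)))))]
    (doseq [w workers] @w)   ;; wait for all threads to finish
    (state machine)))

;; user> (pound blue 10 10000)  ;; 100000 inputs in total
;; :user/0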

Well, 100000 mod 4 is 0, so it got the right answer. And blue passed various similar tests, which I’ll spare you. But the thought occurs that maybe my test is not a very good test. So let’s test the test. Yes, I’m kind of goofing on you, but not entirely. The way to test the test is to build a flawed version of the act function, one with a deliberate concurrency bug, and see whether the test catches it.
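
Here is a deliberately flawed act (my reconstruction of the idea): it reads the state, computes the next state, then writes it back non-atomically, leaving a window in which updates from other threads are lost.

(defn make-flawed-fsm [initial-state transition]
  (let [current (atom initial-state)]
    (reify FSM
      (state [_] @current)
      (act [_ symbol]
        (let [next-state (transition @current symbol)] ;; read, compute
          (reset! current next-state))))))             ;; write -- not atomic

(def red (make-flawed-fsm ::0
                          (fn [s input]
                            (if (= 1 input)
                              ({::0 ::1, ::1 ::2, ::2 ::3, ::3 ::0} s)
                              s))))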

Using the new, flawed definition, we build another machine and test again. Notice the final state is ::1, which is incorrect. The red machine started in state ::0, so after seeing the symbol 1 exactly 100000 times it should have been in state ::0. This suggests the beat-on-it-with-threads test has some merit, and that having passed it repeatedly gives the thread safety of the blue machine some plausibility.

bind, unit, and all that

I’ve given in to the unholy urge to write a monad tutorial. Here are some remarks that might help you decide whether it’s worth your time to read on.

My intended readers are programmers interested in functional programming, not mathematicians. No category theory is required.

All the code will be in Clojure. The level of Clojure expertise required is intermediate. The reader should be comfortable defining functions, especially functions that return functions as values, and should understand the comp operator. That said, there is nothing in the code that exploits features unique to Clojure. Everything I’m going to show you could easily be done in Scheme or (I suspect) Ruby or Python.

My goal will be to explain what a monad is. That might seem too obvious to mention, but bear with me. Suppose I were writing about regular expressions. I might come up with several very different tutorials. One might discuss the languages recognized by finite state automata. Another might concern the symbolic technology used to describe regexes. Another tutorial might show how to use regexes to solve some particular class of problems. Someone new to regular expressions and looking for the third tutorial might well find the first confusing.

This tutorial will explain what a monad is, with as little mathematical formalism as possible. It does not attempt to show how to use monads to solve a variety of practical problems, nor does it make any use of the clojure.contrib.monads library. I try to keep things as concrete as I can, but a certain level of abstraction is inherent in the topic. I have put much effort into explaining the meaning of the so-called monad laws as clearly and intuitively as the subject and my ability allow.

Here’s the plan. We’ll start by solving a simple but not entirely trivial programming problem, using pure functional programming. We will not use mutable state, nor macros. A fairly natural solution falls out; that solution contains the elements that make up a monad. We’ll discuss what those elements are, what the monad laws are, and what those laws mean.

So much for the prelude; let’s get started.

I will take as given that the function is the fundamental unit of modularity in functional programming, and that functions interact via composition. Here is our starting point.

The following two expressions are equivalent; both are examples of the composition of functions. The first example is more idiomatic, but we will be using the comp operator and the latter form extensively.
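
The original example expressions were not preserved; with sq and dbl as the running type 0 functions (reconstructed here), they were presumably along these lines:

(defn sq  [x] (* x x))   ;; type 0: Number -> Number
(defn dbl [x] (* 2 x))   ;; type 0: Number -> Number

(dbl (sq 3))        ;; idiomatic nesting      => 18
((comp dbl sq) 3)   ;; the same, via comp     => 18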

Our goal will be to build an enhanced composition operator, call it comp*, that traces the flow of data through composed functions. The comp* operator should compute the value one would expect, and also present intermediate results as data flows into and out of the functions that make up the computation. Here’s an example of what I have in mind.
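
The original transcript is lost; in terms of the reconstructions developed below, the goal looks roughly like this (the exact trace format is an assumption):

user> ((comp* v-dbl v-sq) 3)
["sq dbl " 18]

The second element is the expected value; the first is a trace of the functions the data flowed through.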

The first problem we have to tackle is that Clojure functions do not have access to their names (they may not even have names), and so cannot provide their names to the proposed comp* operator. One way to get around that is to create enhanced functions that do know their names.
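
One possible reconstruction of verbose (the original is lost): pass in the var rather than the bare function, and recover the name from the var's metadata.

(defn verbose
  "Lift a named type 0 function into a type 1 function that reports
   its own name alongside its result."
  [fvar]
  (let [fname (:name (meta fvar))]
    (fn [x] [(str fname " ") (fvar x)])))

(def v-sq  (verbose #'sq))
(def v-dbl (verbose #'dbl))

;; user> (v-sq 3)
;; ["sq " 9]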

The type signature of (verbose f) is not the same as that of f. We are going to need to keep track of type signatures, so let’s introduce some terminology. I’ll refer to three types of functions, imaginatively called type 0, type 1, or type 2, depending on type signature, as follows:

  • type 0 has signature Number->Number
  • type 1 has signature Number->[String, Number]
  • type 2 has signature [String, Number]->[String, Number]

By using verbose to create type 1 functions we have traded one problem for another. The type 0 functions sq and dbl could be composed with one another, but did not know their own names. The type 1 functions v-sq and v-dbl know their own names but they cannot be composed with one another. Their type signatures don’t allow it.

We seem to be at an impasse, since our goal is precisely to compose type 1 functions. Strictly speaking that is not possible, but let’s see what we can do. The way forward is to define a new function, which we will call bind.
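
The original definition is lost; this reconstruction matches the bind identity used in the proofs later in this post:

(defn bind
  "Take a type 1 function g and return a type 2 function that threads
   the accumulated trace string through a call to g."
  [g]
  (fn [[trace x]]
    (let [[t v] (g x)]
      [(str trace t) v])))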

Bind takes a type 1 function as input, and returns a type 2 function. Notice that unlike type 1 functions, type 2 functions can be composed.
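
Composing bound functions works, but we must wrap the initial argument by hand:

user> ((comp (bind v-dbl) (bind v-sq)) ["" 3])
["sq dbl " 18]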

The output traces the computation, but the calling syntax is ugly and puts too much burden on the user. We don’t want to have to call bind explicitly, nor to wrap the final argument by hand. And it would be nice to be able to compose an arbitrary number of functions. Not a problem.
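
Reconstructions of unit and comp* (the originals are lost): unit injects a plain value into the traced world, and comp* binds every type 1 function before composing, wrapping the argument for us.

(defn unit
  "Inject a plain value into the traced world: no trace yet."
  [x]
  ["" x])

(defn comp*
  "Compose any number of type 1 functions, tracing as we go."
  [& fs]
  (fn [x]
    ((apply comp (map bind fs)) (unit x))))

;; user> ((comp* v-dbl v-sq) 3)
;; ["sq dbl " 18]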

That completes the first part of this tutorial. We solved the problem we set out to solve. Notice that we did so without using mutable state, without using macros, and without modifying the source code of the type 1 functions whose compositions we set out to trace. The entire solution was based on the creation of two functions, unit and bind, which we then used to implement some syntactic sugar, comp*.

In solving our toy problem, we built a monad. What is a monad? It is essentially a pair of functions, bind and unit, that satisfy three conditions known as the monad laws. The principal goal of this post is to present the monad laws in the simplest way possible. But we will start with a somewhat complicated version of the monad laws. Why?

Clojure has an excellent monad library called clojure.contrib.monads. Its author, Konrad Hinsen, has written some fine monad tutorials which I hope this article will complement. I want this presentation to be useful to someone who might want to use Konrad’s work. That means I need to bridge the gap between my exposition and his. I will start by considering the monad laws as Konrad presents them in part 2 of his Monad Tutorial for Clojure Programmers.
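
Konrad states the laws roughly as follows (reconstructed from memory of his tutorial, not a verbatim quote; note that his bind takes both a monadic value and a function):

(= (m-bind (m-result x) function)
   (function x))                                          ;; first law

(= (m-bind monadic-expression m-result)
   monadic-expression)                                    ;; second law

(= (m-bind (m-bind monadic-expression function1) function2)
   (m-bind monadic-expression
           (fn [x] (m-bind (function1 x) function2))))    ;; third law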

Yow! That is quite something to get your head around. I promised a simple version of the monad laws. Here’s my first cut, with a simpler version still to come.
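
For type 1 functions f and g, the laws read (these are exactly the forms proved below):

(= (comp (bind f) unit) f)                              ;; first law
(= (comp (bind unit) f) f)                              ;; second law
(= (comp (bind f) (bind g))
   (bind (comp (bind f) g)))                            ;; third law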

Comparing Konrad’s presentation and mine, it is evident that we are not using the same terminology. Allow me to translate. The function I call bind, Konrad calls m-bind. What I call unit, Konrad calls m-result. I use f and g, he uses function, function1, and function2. But those are just names, and not terribly important.

The important difference between Konrad’s version of the monad laws and mine stems from the fact that I define bind as a function of a single argument. My bind takes a function as an argument, and returns a function. Konrad’s version of bind takes two arguments, a function f and a value x, and it returns a value computed using f and x. The returned value is generally not a function. The monad laws describe the contracts that bind and unit must fulfill. Not surprisingly, the arity of bind affects the written formulation of the monad laws. By reducing the arity of bind, I get a simpler set of laws.

Why did I choose one way and Konrad the other? Because we had different purposes in mind. Konrad wanted to build a monad library, I wanted to present the monad laws in the simplest manner possible.

I chose to write bind in a point-free style precisely because that style allows a simpler formulation of the monad laws. Konrad chose his version of bind because it is better suited than mine to the business of building a monad library. In particular, his choice to reverse the usual order of arguments and go with (m-bind [x function] …) instead of (m-bind [function x]…) was brilliant. I hope to explore that idea in another posting.

I promised you a simple version of the monad laws. We’re not there yet. To get there, we are going to do two things. One is that we will abandon Clojure’s prefix notation in favor of infix notation. The second amounts to a change of variables. Notice the repetition of the pattern (comp (bind f) g) in my version of the monad laws. This is a sign of a missing abstraction. We’ll write (comp (bind f) g) as f * g, and write (comp f g) as f . g.

Then we can write the monad laws as follows. Here parentheses serve only to indicate operator precedence, and bind has precedence over composition.
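
Reconstructed from the prefix versions above, they are:

f * unit = f                          ;; first law
unit * f = f                          ;; second law
bind (f * g) = (bind f) . (bind g)    ;; third law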

Together, the first two monad laws require that unit be an identity element with respect to the * operator. Just as 0 is an identity element with respect to addition and for every x, x+0 = 0+x = x, so unit is an identity element with respect to *, and for every f, f*unit = unit*f = f. Identity elements are sometimes called unit elements, which is why I prefer the name unit over the more commonly used return, or Konrad’s m-result. I think return is an especially unfortunate choice of name, because the word return is already overloaded with other associations.

Take a hard look at that third monad law. Does it seem familiar? You have probably seen that pattern before. Compare the third monad law to the law of logarithms:
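
log(a * b) = log(a) + log(b)

versus the third monad law in the same shape:

bind(f * g) = bind(f) . bind(g)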

Both log and bind follow the general pattern foo(a ⊕ b) = foo(a) ⊗ foo(b). Functions that have this property are called homomorphisms. The logarithm function is typically the first and certainly the most important homomorphism that most of us encounter. But I promised no category theory, and the spirit of that promise demands that I not pursue abstract algebra either. Let’s take a different approach, and think about type signatures.

We invented bind in order to get around the fact that type 1 functions cannot be composed. Recall that if f and g are type 1, then (bind f) and (bind g) are of type 2, and type 2 functions can be composed with one another. Notice that (bind f) can also be composed with g, resulting in a type 1 function. So there are two ways to use bind, f, and g to create a type 2 function. One is (comp (bind f) (bind g)). The other is (bind (comp (bind f) g)). The third monad law demands that these two ways of building a type 2 function result in exactly the same function.

Okay, we have defined bind and unit. We have contemplated the monad laws. Now it’s time to bite the bullet and show that our particular bind and unit in fact satisfy the monad laws. In each of the cases below, the functions f and g are assumed to be of type 1.

The first monad law states that (= (comp (bind f) unit) f).

Asserting the equality of the functions f and (comp (bind f) unit) is equivalent to asserting that for any and every type-appropriate argument n, (f n) is equal to ((comp (bind f) unit) n). Here’s the proof.
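
Here is one way the lost proof goes, using the reconstructed definitions of bind and unit. Write (f n) as [t x]. Then:

((comp (bind f) unit) n)
= ((bind f) (unit n))
= ((bind f) ["" n])
= [(str "" t) x]     ;; by the definition of bind, since (f n) is [t x]
= [t x]
= (f n)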

The second monad law asserts that (= (comp (bind unit) f) f). We need to show that for any f and any type appropriate n, (f n) is equal to ((comp (bind unit) f) n). Here we go.
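
Again writing (f n) as [t x]:

((comp (bind unit) f) n)
= ((bind unit) (f n))
= ((bind unit) [t x])
= [(str t (first (unit x))) (second (unit x))]
= [(str t "") x]
= [t x]
= (f n)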

Keep in mind that this sort of gruntwork is not part of the day to day use of monads. Only the monad designer needs to establish that the monad laws hold.

We have just one more law to go. I tried doing the next proof in the style of the first two proofs, but found the prefix notation tough sledding. I’m going to switch to infix notation. I love prefix for programming, but infix seems better suited to doing algebra.

Here are the notational shortcuts I’ll be using:

  • Bg denotes (bind g)
  • f x denotes that the function f is applied to x
  • f . g denotes (comp f g)
  • f . g x denotes ((comp f g) x); notice composition has precedence over application.
  • Functions are right associative: f g h x denotes f (g (h x)).
  • Notice that f g h x is the same as f.g.h x
  • x : y denotes the string concatenation (str x y)
  • F x denotes (first x)
  • S x denotes (second x)
  • commas are whitespace
  • [a, b] denotes a pair with elements a and b

Let’s describe the action of bind in the infix notation:
Instead of ((bind g) [t x]) we’ll write Bg [t x].

We’ll make heavy use of this identity, which is the definition of bind written in the new notation: Bg [t x] = [t : F.g x, S.g x ]
Make sure that makes sense to you, or just skip the next section entirely.

The third monad law states that (comp (bind f) (bind g)) = (bind (comp (bind f) g)). We’ll write that as Bf . Bg = B(Bf . g).

To show this equality holds, we’ll show that for arbitrary [t x],
Bf . Bg [t x] = (B(Bf .g)) [t x] ;; call this ML3

First consider the left hand side of ML3.
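
Using the bind identity twice:

Bf . Bg [t x]
= Bf [t : F.g x, S.g x]
= [t : F.g x : F.f.S.g x, S.f.S.g x]    ;; call this A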

Now consider the right hand side of ML3:
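
Write h for Bf . g, so the right hand side is Bh [t x]. By the bind identity,

B(Bf . g) [t x]
= Bh [t x]
= [t : F.h x, S.h x]    ;; call this B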

Hold that thought, while we figure out what h x is.
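
Using the bind identity once more:

h x
= Bf . g x
= Bf [F.g x, S.g x]
= [F.g x : F.f.S.g x, S.f.S.g x]    ;; call this C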

Now we know what h x is, and can substitute for F.h x and S.h x in expression B. In particular,
expression C shows us what h x is, and we see that
F.h x = F.g x : F.f.S.g x
and
S.h x = S.f.S.g x.
Substituting these into expression B gives us
[t : F.h x, S.h x] =
[t : F.g x : F.f.S.g x, S.f.S.g x] ;; Call this D

Notice that expressions A and D are identical, which establishes ML3. We’re done. We showed that bind and unit satisfy the monad laws.

A remark by Donald Knuth comes to mind: “Beware of bugs in the above code; I have only proved it correct, not tried it”. I am not as brave as Professor Knuth, so I wrote some code to test whether the monad laws actually hold. The idea is to randomly generate functions, perform the appropriate compositions, and then test whether the laws hold. It’s actually pretty easy to do; you can get the code here. This post has already run longer than I expected; I’ll say no more about testing.

So what is a monad? It’s a pair of functions, bind and unit, which satisfy the monad laws. Bind and unit do not have fixed definitions like sine and cosine. Instead, they vary from monad to monad. But in all cases they must satisfy the monad laws. There is a bit more to it, in that the monad laws presumably operate on some specific set of functions. Here I waved my hands and called those functions type 1 functions. A more formal treatment would specify those functions more precisely, and consider a monad to be a triple, say bind, unit, and a type constructor, or some such. I chose to omit such considerations in favor of simplicity.

But in the context of programming, what are monads for? It is unfortunate that the term design pattern has been turned into a synonym for meta-boilerplate, because design pattern might otherwise have been a fair description of the methodological role of monads. The choice to write or use a monad is the choice to organize one’s code in a coherent, principled, and flexible fashion. My one example is hardly enough to show that. But perhaps it may help a programmer new to monads to better understand what the fuss is about, and to get started reading other material.

Speaking of further reading, most treatments of programming with monads default to the use of Haskell. But the Clojure community is fortunate to have some very good monad tutorials. Check out Konrad Hinsen’s and Jim Duey’s articles.

Credit where credit is due. I stole the use of the name unit, and the idea for my tracing example, from You Could Have Invented Monads, by Dan Piponi. I think this is a fantastic article, but it does require some knowledge of Haskell.

I hope you found this informative; thank you for your attention.

The expressiveness of functional programming

I’d like to revisit the discrete convolution problem described in my previous post, to consider the expressiveness of functional programming. Here I am interested in the power of code to communicate intent; the bare metal speed of the code is secondary.

The various comments and proposed implementations in this Google Groups thread are what got me thinking about this. My thanks to all the contributors. I’ll begin by quoting some imperative code posted to the thread:
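
The quoted snippet did not survive in this archive. For reference, typical imperative convolution code in this style looks something like the following stand-in (mine, not the actual posted code):

(defn convolve
  "Imperative-style convolution: nested index loops accumulating
   into a mutable array."
  [xs ys]
  (let [n   (count xs)
        m   (count ys)
        out (double-array (+ n m -1))]
    (dotimes [i n]
      (dotimes [j m]
        (aset out (+ i j)
              (+ (aget out (+ i j))
                 (* (nth xs i) (nth ys j))))))
    (vec out)))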

The code is brief, and the style is familiar to most programmers. But it is all how, and no what. I look at it, and I do not see what it means. This proposed solution to an inherently mathematical problem makes no use of mathematical constructs other than addition and subtraction. I think this is a shame, since convolution has a mathematical structure. A convolution is a sequence of inner products:
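
the k-th output is (x ∗ y)_k = Σ_i x_i · y_(k−i). My solution is also lost from this archive; here is a sketch in the spirit described, with inner-product and tails written out (neither is in clojure.core):

(defn inner-product [xs ys]
  (reduce + (map * xs ys)))

(defn tails
  "The successive tails of a sequence: xs, (rest xs), ..."
  [xs]
  (take-while seq (iterate rest xs)))

(defn conv [xs ys]
  (let [pad (repeat (dec (count xs)) 0)
        rx  (reverse xs)]
    (map #(inner-product rx %)
         (take (+ (count xs) (count ys) -1)
               (tails (concat pad ys pad))))))

;; user> (conv [1 2 3] [4 5 6 7])
;; (4 13 28 34 32 21)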

My proposed solution is longer than the imperative solution, but not much longer. And I should not have had to write inner-product; the inner product is so fundamental a mathematical operation that it should be part of every language.

I believe conv describes convolution more clearly than does convolve, because it does so in terms of operations on mathematical structures rather than as the outcome of changes to the contents of computer memory.

It also seems to me that convolve overspecifies the solution. The introduction of loops and mutation imposes a sequential structure on the solution that is not present in the underlying mathematics. For some fascinating thoughts along those lines I highly recommend Guy Steele’s ICFP 2009 presentation. It is an eye-opener.

While I’m ranting, I don’t think I should have had to write the tails function either, at least in so naked a form. The notion of an iterated function deserves a place in a functional programming language. I think Clojure is missing something along the lines of nest-list (stolen from Mathematica’s NestList).

Postscript: a few days later, while browsing core.clj, I noticed that Clojure already has the function iteration feature I wanted, it is called iterate. The implementation is quite pretty, much nicer than what I came up with:
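
At the time, the definition in core.clj read roughly as follows (quoting from memory):

(defn iterate
  "Returns a lazy sequence of x, (f x), (f (f x)) etc.
   f must be free of side-effects."
  [f x]
  (cons x (lazy-seq (iterate f (f x)))))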

Discrete convolution of finite vectors

Isaac Hodes posted a request to both StackOverflow and the Clojure mailing list, asking for some help implementing a discrete convolution function in Clojure.

I am trying to create a lazy/functional/efficient/Clojuresque function to carry out convolution on two lists/vectors of (ideally BigDecimals but that may be inefficient) doubles.

Right now, it seems as though this is something Clojure cannot do.

Let’s cut right to the code. Here’s my implementation:
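
The code embedded here was lost; the following reconstruction is consistent with everything below — it is lazy, and it leans on drop, which matters for the performance discussion:

(defn convolve [xs ys]
  (let [pad (repeat (dec (count xs)) 0)
        yy  (concat pad ys pad)   ;; zero-pad ys at both ends
        rx  (reverse xs)]
    ;; the k-th output is the inner product of (reverse xs) with
    ;; the padded ys shifted k places
    (map (fn [k] (reduce + (map * rx (drop k yy))))
         (range (+ (count xs) (count ys) -1)))))

;; user> (convolve [1 1] [2 3 5])
;; (2 5 8 5)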

So, how does this implementation work? And just what does the convolve function compute? Let’s consider the latter question first.

Recall how you multiply polynomials. Suppose we want to multiply 1+x by 2+3x+5x^2. Using the standard rules of algebra, we find the answer by gathering like terms. The order-0 term is 1*2=2. The order-1 term is 1*3x + x*2=5x, the order-2 term is x*3x + 5x^2=8x^2, and the order-3 term is x*5x^2=5x^3. It is notationally convenient to drop the variables and represent the polynomials as vectors of coefficients in standard form. In the case above, we multiply [1 1] by [2 3 5] and get [2 5 8 5]. The convolve function performs exactly the computation required to multiply the polynomials.

So, how does it work? My hope is that some (crude) pictures will tell the story; you be the judge. Let’s consider a particular example, (convolve [1 2 3] [4 5 6 7 ])
whose value is (4 13 28 34 32 21). Here’s the picture:

3---2---1
--------4---5---6---7    ;; 1*4=4

----3---2---1
--------4---5---6---7    ;; 2*4+1*5=13

--------3---2---1
--------4---5---6---7    ;; 3*4+2*5+1*6=28

------------3---2---1    
--------4---5---6---7    ;; 3*5+2*6+1*7=34

----------------3---2---1
--------4---5---6---7    ;; 3*6+2*7=32

--------------------3---2---1
--------4---5---6---7    ;; 3*7=21

As requested, the code is lazy:

user> (class (convolve [1 2 3] [4 5 6 7 ]))
clojure.lang.LazySeq

But does it perform? Let’s take a look.

user> (def foo2 (into [] (range 100)))
#'user/foo2

user> (def foo3 (into [] (range 1000)))
#'user/foo3

user> (def foo4 (into [] (range 10000)))
#'user/foo4

user> (time (do (doall (convolve [1 2 3] foo2 )) nil))
"Elapsed time: 2.827 msecs"
nil

user> (time (do (doall (convolve [1 2 3] foo3 )) nil))
"Elapsed time: 76.275023 msecs"
nil

user> (time (do (doall (convolve [1 2 3] foo4 )) nil))
"Elapsed time: 4101.002754 msecs"

Ouch. The time appears to be quadratic in the length of the second argument. We can do better than that with just a minor tweak. The problem is the performance of the drop function. Let’s get rid of it. Here’s the new version of the code.
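
Again reconstructed: replace the repeated drops with a single pass of partition, which slides a window of size (count xs) along the padded sequence one step at a time:

(defn convolve [xs ys]
  (let [pad (repeat (dec (count xs)) 0)
        yy  (concat pad ys pad)
        rx  (reverse xs)]
    (map #(reduce + (map * rx %))
         (partition (count xs) 1 yy))))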

And here are the new performance numbers, which look much better. A million points in under three seconds, with no dropping into java-isms. Sweet.

user> (def foo5 (into [] (range 100000)))
#'user/foo5

user> (def foo6 (into [] (range 1000000)))
#'user/foo6

user> (time (do (doall (convolve [1 2 3] foo3 )) nil))
"Elapsed time: 6.909401 msecs"
nil

user> (time (do (doall (convolve [1 2 3] foo4 )) nil))
"Elapsed time: 62.894431 msecs"
nil

user> (time (do (doall (convolve [1 2 3] foo5 )) nil))
"Elapsed time: 255.064899 msecs"
nil

user> (time (do (doall (convolve [1 2 3] foo6)) nil))
"Elapsed time: 2784.131159 msecs"
nil