
Discrete convolution of finite vectors

Isaac Hodes posted a request to both StackOverflow and the Clojure mailing list, asking for some help implementing a discrete convolution function in Clojure.

I am trying to create a lazy/functional/efficient/Clojuresque function to carry out convolution on two lists/vectors of (ideally BigDecimals but that may be inefficient) doubles.

Right now, it seems as though this is something Clojure cannot do.

Let’s cut right to the code. Here’s my implementation:
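A minimal sketch of it, lazy and built around drop to slide the reversed first vector across a zero-padded copy of the second (illustrative rather than a verbatim reproduction of the original listing):

(defn convolve [xs ys]
  ;; pad ys with zeros so the reversed xs can slide past both ends
  (let [pad (repeat (dec (count xs)) 0)
        padded (concat pad ys pad)
        rx (reverse xs)]
    ;; lazily compute each output point; drop shifts the window
    (for [n (range (+ (count xs) (count ys) -1))]
      (reduce + (map * rx (drop n padded))))))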

So, how does this implementation work? And just what does the convolve function compute? Let’s consider the latter question first.

Recall how you multiply polynomials. Suppose we want to multiply 1+x by 2+3x+5x^2. Using the standard rules of algebra, we find the answer by gathering like terms. The order-0 term is 1*2=2. The order-1 term is 1*3x + x*2=5x, the order-2 term is x*3x + 5x^2=8x^2, and the order-3 term is x*5x^2=5x^3. It is notationally convenient to drop the variables and represent the polynomials as vectors of coefficients in standard form. In the case above, we multiply [1 1] by [2 3 5] and get [2 5 8 5]. The convolve function performs exactly the computation required to multiply the polynomials.
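With the sketch above loaded, the polynomial example from the text checks out at the REPL:

user> (convolve [1 1] [2 3 5])
(2 5 8 5)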

So, how does it work? My hope is that some (crude) pictures will tell the story; you be the judge. Let’s consider a particular example, (convolve [1 2 3] [4 5 6 7]), whose value is (4 13 28 34 32 21). In each picture below, the first vector appears reversed and slides one position to the right; the aligned pairs are multiplied and summed. Here are the pictures:

3---2---1
--------4---5---6---7    ;; 1*4=4

----3---2---1
--------4---5---6---7    ;; 2*4+1*5=13

--------3---2---1
--------4---5---6---7    ;; 3*4+2*5+1*6=28

------------3---2---1
--------4---5---6---7    ;; 3*5+2*6+1*7=34

----------------3---2---1
--------4---5---6---7    ;; 3*6+2*7=32

--------------------3---2---1
--------4---5---6---7    ;; 3*7=21

As requested, the code is lazy:

user> (class (convolve [1 2 3] [4 5 6 7]))
clojure.lang.LazySeq

But does it perform? Let’s take a look.

user> (def foo2 (into [] (range 100)))

user> (def foo3 (into [] (range 1000)))

user> (def foo4 (into [] (range 10000)))

user> (time (do (doall (convolve [1 2 3] foo2 )) nil))
"Elapsed time: 2.827 msecs"

user> (time (do (doall (convolve [1 2 3] foo3 )) nil))
"Elapsed time: 76.275023 msecs"

user> (time (do (doall (convolve [1 2 3] foo4 )) nil))
"Elapsed time: 4101.002754 msecs"

Ouch. The time appears to be quadratic in the length of the second argument. We can do better than that with just a minor tweak. The problem is the performance of the drop function. Let’s get rid of it. Here’s the new version of the code.
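Again in sketch form rather than the exact original, one way to get rid of the repeated drop is to let partition generate the sliding windows, walking the padded sequence only once:

(defn convolve [xs ys]
  ;; pad ys with zeros as before
  (let [pad (repeat (dec (count xs)) 0)
        padded (concat pad ys pad)
        rx (reverse xs)]
    ;; partition yields each window lazily, with no repeated dropping
    (map #(reduce + (map * rx %))
         (partition (count xs) 1 padded))))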

And here are the new performance numbers, which look much better. A million points in under three seconds, with no dropping into java-isms. Sweet.

user> (def foo5 (into [] (range 100000)))

user> (def foo6 (into [] (range 1000000)))

user> (time (do (doall (convolve [1 2 3] foo3 )) nil))
"Elapsed time: 6.909401 msecs"

user> (time (do (doall (convolve [1 2 3] foo4 )) nil))
"Elapsed time: 62.894431 msecs"

user> (time (do (doall (convolve [1 2 3] foo5 )) nil))
"Elapsed time: 255.064899 msecs"

user> (time (do (doall (convolve [1 2 3] foo6)) nil))
"Elapsed time: 2784.131159 msecs"

British Lottery Puzzle

Have I mentioned that I’m a sucker for probability puzzles? Here’s one I found:

Two teams from opposite ends of the country have been drawn to play each other in a competition. They need to determine which team plays at home and do so fairly at a distance. They decide to try to generate a 50:50 probability from the following week’s UK national lottery draw.

The lottery draw consists of drawing 7 balls from 49.

To make the situation clear, there are 49 balls, labeled 1-49. The balls are thoroughly mixed, and 7 are drawn at random.

The puzzle’s author points out that a very simple (and incorrect) approach to generating a 50:50 distribution is to use the parity of the first ball drawn. Let’s simulate the process in Clojure.

(defn balls
  "Random shuffle of the integers 1 ... n, inclusive"
  [n]
  (shuffle (range 1 (inc n))))  ;; shuffle introduced in version 1.2

(defn lottery [] (take 7 (balls 49)))

(defn trial []
  (first (lottery)))

(defn trials [n]
  (let [outcomes (take n (repeatedly trial))
        odds (count (filter odd? outcomes))]
    (float (/ odds n))))

The results of the simulation confirm the expected bias, given that 25 of the 49 balls are odd:

user> (trials 100000)

user> (float (/ 25 49))

So, how do we use the lottery drawing to simulate a fair coin? The puzzle’s author proposes a somewhat complicated method:

We know that there are C(49,6)=13,983,816 ways of drawing 6 balls from 49. This is an even number so there's definitely a way of describing a 50% probability, even if it involves writing down 6,991,908 combinations explicitly. But can we find a more succinct way?

It turns out that such an approach is possible. A bit of trial and error shows that there's exactly a 50% chance that the lowest of the six main balls is one of {2,3,4,5,6,15,16,17,21,36,39,41,43,44}.

I'm not even going to think about that. There is a much simpler way to simulate the coin toss. Here's the code, with an explanation following.

(defn trial-adjusted []
  (let [draw (lottery)
        fst (first draw)
        snd (second draw)]
    (if (= 1 fst) snd fst)))

We go back to the parity approach, but we fix the sample space. The original approach uses 25 odd balls and 24 even ones, introducing a bias in favor of odd. We simply throw out one of the odd balls by disregarding it. I chose "1", but it does not matter which of the odd balls is thrown out. Simulation confirms the bias is gone.

user> (binding [trial trial-adjusted] (trials 100000))

The Matchbook Problem

I have been reading a lovely book by Paul Nahin, Digital Dice. The book is subtitled Computational Solutions to Practical Probability Problems. I don’t know how practical the problems are, but they are a lot of fun.

Probability theory is rife with seemingly simple problems which turn out to be less simple than first appears. When possible, I like to approach such problems from two sides: analytically and computationally. If I can come up with two solutions, I can test one against the other. If the solutions don’t agree, at least one of them is wrong. Nahin’s book encourages this approach, and provides some fine practice material. For example, Nahin offers this problem, taken from a 1970’s textbook:

A pipe smoker has two booklets of matches in his pocket, each containing 40 matches initially. Whenever a match is required, he picks one of the booklets at random, removing one match. Write a program using random numbers to simulate the situation 100 times and determine the average number of matches that can be removed until one booklet is completely empty.

This is easily simulated in terms of coin tosses. We repeatedly flip a fair two-sided coin; heads represents one book of matches, tails the other. We’ll tally the outcome of each coin toss, and the trial is over as soon as we see the 40th head or the 40th tail, whichever comes first. The outcome of the trial is the total number of coin tosses. What we want to know is the average number of coin tosses per trial.

We’ll do the simulation in Clojure. First we’ll need to model a coin toss:

(defn coin-toss []
  (if (zero? (rand-int 2))
    [1 0]
    [0 1]))

Here’s how we use the coin to simulate one trial.

(def *N* 40)

(defn trial
  ([] (trial 0 0))
  ([heads, tails]
     (if (or (= *N* tails) (= *N* heads))
       (+ heads tails)
       (let [[h t] (coin-toss)]
         (recur (+ heads h) (+ tails t))))))

We need some code to conduct repeated trials, gather the results, and compute the average.

(defn average [coll]
  (float (/ (reduce + coll) (count coll))))

(defn trials [n]
  (average (take n (repeatedly trial))))

With a simulation in hand, let's think about solving the problem mathematically. For fixed N (40, in our case), we will repeatedly flip a coin until we see the Nth head or the Nth tail. Clearly a trial will require at least N coin tosses. The most tosses required will be 2N - 1. That happens when we manage to toss N-1 heads and N-1 tails, for a total of 2N - 2 tosses. The next toss will be decisive, no matter what the outcome, ending the trial at 2N - 1 tosses.

So every trial ends in N + K tosses, where 0 <= K < N. Write P(N, K) for the probability that the trial ends after exactly N+K tosses. The expected value we are trying to compute is the sum of P(N,K) * (N+K), taken over 0 <= K < N. The task boils down to computing P(N,K). How can we calculate P(N,K)? Suppose the trial ends with N heads and K tails. The final toss had to have been heads. Then we must have had K tails in the first N+K-1 tosses. How many ways can that happen? The answer is well known: (N+K-1)!/[(N-1)! K!]. What is the probability of any one of those outcomes? Given a fair coin, it is 1/2^(N+K).

Let's implement the calculation so we can compare it to the results of the earlier simulation.

(defn factorial [n]
  (reduce * (range 1 (inc n))))

(defn choose [n k]
  (/ (factorial n)
     (factorial (- n k))
     (factorial k)))

;; expt comes from the old clojure.contrib.math library; load it first
;; with (require 'clojure.contrib.math)
(defn prob [n k]
  (/ (choose (dec (+ n k)) k)
     (clojure.contrib.math/expt 2 (+ n k))))

(defn expected-outcome []
  (let [n *N*
        probs (for [k (range n)] (prob n k))
        tosses (range n (+ n n))]
    (float (reduce + (map * probs tosses)))))

Now that we have both the simulation code and the calculation code, the only thing needed is a bit of scaffolding to compare the two approaches.

(def *num-trials* 10000)

(defn comparison [n]
  (binding [*N* n]
    [*N* (trials *num-trials*) (expected-outcome)]))

The comparison function is parameterized so we can vary the problem a bit. First let's try the original N=40.

user> (comparison 40)
[40 72.7887 36.442886]

The simulation and the calculation differ by a factor of about two. Why? Look back to the line that reads "Suppose the trial ends with N heads and K tails". I left out half the cases, the ones that end with N tails and K heads. The thing to do is correct the prob function:

(defn prob [n k]
  (/ (choose (dec (+ n k)) k)
     (clojure.contrib.math/expt 2 (dec (+ n k)))))
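In symbols, the corrected probability counts both the trials that end in heads and those that end in tails, and the expected value follows directly:

\[
P(N,K) \;=\; 2\binom{N+K-1}{K}\frac{1}{2^{N+K}} \;=\; \binom{N+K-1}{K}\frac{1}{2^{\,N+K-1}},
\qquad
E \;=\; \sum_{K=0}^{N-1} (N+K)\,P(N,K).
\]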

After that correction the comparison function shows that the simulation and the calculation agree nicely across a range of values of N:

user> (map comparison (range 20 60 5 ))
([20 34.9941 34.985172] [25 44.3824 44.386242] [30 53.7716 53.84531] [35 63.3641 63.348217] [40 72.8602 72.88577] [45 82.5092 82.4516] [50 92.0544 92.04108] [55 101.6011 101.65071])

In my experience, when the simulation and the calculation disagree, the simulation is usually more trustworthy. Simulations tend to be straightforward, while analytic approaches can be tricky to get right.

Cartesian product

I was browsing the 6-24 Clojure-log archive and ran across an invitation from Craig Andera:

I’ve knocked out a quick definition of something that computes the Cartesian cross product of multiple sequences. But as always when I write Clojure, I have to assume someone else can do it in less code. Anyone care to take a look?

So I took a look. As luck would have it, that morning I had been thinking about the reduce function, and reduce was exactly the tool needed for Craig’s challenge. Here’s my version.

(defn cross-prod [& colls]
  (->> colls
       (reduce #(for [x %1 y %2] [x y]))
       (map flatten)))
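As a quick usage sketch (this example call is mine, not Craig’s): reduce builds nested pairs, and flatten smooths them back out.

user> (cross-prod [1 2] [:a :b] [true false])
((1 :a true) (1 :a false) (1 :b true) (1 :b false)
 (2 :a true) (2 :a false) (2 :b true) (2 :b false))

Note that flatten is doing real work here, which also means the elements of the input collections should not themselves be collections.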

Antoine de Saint-Exupery famously remarked that “perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away”. I think he would have appreciated Clojure.

Head to Head testing

Dilbert cartoon on agile programming

Wally’s comment is insightful. Giving a thing a name makes it possible to discuss that thing. Witness the enormous discussion of Test Driven Development on the internet.

The purpose of this post is to give a name to a testing practice I have found useful, and to explain why it is useful. I call the practice Head to Head testing, or HTH. I propose HTH as a solution to a problem that arises in test driven development, and will introduce HTH by first explaining the problem. My explanation will use functional programming notation, but the underlying issues are the same in object oriented or any other form of programming. This is because the issues are mathematical in nature.

Suppose my task is to write a function to compute logarithms base 10. TDD demands that I begin by constructing some tests. OK; here they are.

(assert (= (log 1) 0))
(assert (= (log 10) 1))
(assert (= (log 100) 2))
(assert (= (log 1000) 3))

What do these tests accomplish? A function that fails these tests is certainly an inadequate log function, but passing this suite proves little. Infinitely many functions satisfy these tests. Let’s step back and consider the logic of testing.

Suppose we are testing the implementation of a function F defined on a finite domain D. An ideal test suite would have one test case for each element of D, and would exhaustively check whether the implementation of F was correct at each point in D. An implementation that passed this ideal test suite would be demonstrably correct.

The problem is that exhaustive testing is impossible in practice. Many functions have infinite domains, or domains so large that exhaustive testing is not physically feasible. Since we cannot test our implementation at each point in its domain, we are forced to settle for testing on a subset of the domain. But which subset? I contend that we have now entered the realm of statistical sampling.

My test suite for the log function proposed to test at the points {1, 10, 100, 1000}. Looked at through the lens of statistical sampling, this test suite suffers from two defects. The lesser defect is that we have no idea whether a test of size N=4 is sufficient. The fatal defect is that the points chosen for testing were not chosen randomly. This makes the test suite statistically worthless.

Imagine by way of analogy that you engaged me to conduct a survey. I offer to use my friends as the sample population. You ought to conclude either that I am pulling your leg or that I am an incompetent statistician. But in the programming world, this approach to sampling is commonly accepted. Test suites are hand written by programmers who write whatever test cases they deem appropriate. This flies in the face of statistical practice. A statistically valid test requires randomly sampled data, not data hand-chosen by a human being.

It could be argued that as software testers we are not in search of statistically valid tests, but of errors. And that is true as far as it goes, but the search for errors demands randomized data for two reasons. First, the computer can generate untold millions of random test cases for each one a tester writes by hand. But the deeper purpose of randomization is to avoid blind spots or bias in the selection of test data. If I design the tests, presumably I choose the data based on what I think might go wrong in the software. What about all the things that never occur to me to test? In the long run, randomized tests go everywhere. They have no long term biases or blind spots.

Suppose I decide to do the right thing by testing the log function on randomly sampled data. How could I do that? It’s not hard. Here’s how to conduct a single trial. Randomly pick two positive real numbers x and y. The corresponding test is this:

(assert (= (log (* x y))
           (+ (log x) (log y))))

The test exploits the fact that log(x * y) must equal log(x) + log(y). It would be a simple matter to automate this test by constructing a stream of randomly generated pairs (x y). Doing this would make it possible to test the log function at hundreds of millions of points rather than the relative few a developer or tester could cover with hand written tests.
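A sketch of what that automation might look like, with Math/log10 standing in for the implementation under test and the comparison done within a small tolerance rather than demanding exact floating point equality:

(defn log [x] (Math/log10 x))   ;; stand-in for the implementation under test

(defn random-log-trial []
  ;; pick two random positive reals and check log(x*y) = log(x) + log(y)
  (let [x (+ 1e-6 (rand 1e6))
        y (+ 1e-6 (rand 1e6))]
    (< (Math/abs (- (log (* x y))
                    (+ (log x) (log y))))
       1e-9)))

(defn run-log-trials [n]
  (every? true? (repeatedly n random-log-trial)))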

That’s all well and good, but my example is exceptionally well suited to randomized testing. In my example I could exploit a mathematical property of the log function to construct tests with known expected outcomes. It is not obvious how to do this while testing non-mathematical software. That is where Head to Head testing enters the picture.

The HTH approach is as follows. Suppose we are implementing a function F. We build not one but two independent implementations of F; call the implementations g and h. We also create a suitable set of randomized data D, and test at each point d in D whether g(d) = h(d). If g and h disagree at d, then at least one of them is in error. Notice that we do not have to trust that either g or h is correct, nor do we need to know the correct value of F at d. A disagreement means both implementations must be investigated. This is a feature, not a bug. Examining the differences between the two implementations will lead us to the flaw much more quickly and easily than would the study of a single implementation.
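Here is a minimal sketch of the idea in Clojure; the generator and the two implementations (Clojure’s built-in sort versus a hand-rolled insertion sort) are stand-ins for whatever g and h you are really testing:

(defn hth-test
  "Run n head-to-head trials of g and h on inputs from gen,
   returning the inputs on which they disagree."
  [g h gen n]
  (doall (filter #(not= (g %) (h %))
                 (repeatedly n gen))))

;; Example stand-ins: pit clojure.core/sort against an insertion sort.
(defn insertion-sort [coll]
  (reduce (fn [sorted x]
            (let [[smaller larger] (split-with #(<= % x) sorted)]
              (concat smaller [x] larger)))
          []
          coll))

(defn random-int-vector []
  (vec (repeatedly (rand-int 20) #(rand-int 100))))

;; Any inputs returned point to a bug in at least one implementation.
;; (hth-test sort insertion-sort random-int-vector 10000)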

A nice side effect of this approach is that a failed HTH test does not result in a mere assertion of failure. We are handed a test case known to trigger the failure, a very useful thing for debugging. And it is rare to find a lone bug. Very likely, running more tests will result in a growing collection of data points that trigger failures.

Return for a moment to the question of sample size. How many test cases are appropriate? As software testers we have a tremendous advantage over statisticians. In general it is very difficult to determine how large a sample size N is required to create a statistical test of known power. But we are not doing mathematical statistics; we are testing software. Just make N huge. Better still, make N infinite. Run HTH tests continuously. Many advocate continuous builds; why not continuous testing? By continuous testing I don’t mean a test suite that runs as part of the build, with successful completion acting as a gate on code check-in. I mean tests that run 24/7 on dedicated hardware, never completing, just searching tirelessly for bugs. Machines are cheap and bugs are expensive.

Is it practical to build two independent implementations of a function F as part of testing? Sometimes it’s not that hard to do. Other times you don’t have to, because someone else has already written the other implementation for you. Imagine you are writing a statistical function. Chances are you are not the first to write that function. Test your version HTH against a commercial product, or an open source package. Let your competitor help you test. Maybe your implementation of F is multi-threaded. Run F head to head against itself on different boxes, with different thread pool sizes. Run the latest version of F against a previous version for regression testing, but do it against randomly generated data instead of a few dozen hand written tests.

But is it really practical to build two independent implementations of a function F in support of testing? I think it is. Go look at the history of your code in version control. How many times have you built and rebuilt the thing? How many times was another round of changes triggered by a bug? You are going to build more than one version. You can do it reactively in response to bugs, or you can do it in advance, by design. Your choice.

HTH is a conceptual approach. You can find ways to use it if you try. And it’s not an either-or thing. By all means, keep writing test cases by hand. Just don’t stop there.

These thoughts were set into motion by the recent Clojure bowling problem and the ensuing discussion of the role of TDD in functional programming.

Rebinding Mathematica keys

Out of the box, Mathematica expects the user to press SHIFT-RETURN to trigger the evaluation of an expression. Because of my hard-wired emacs reflexes, I frequently type CONTROL-RETURN instead. Today I finally did something about that and swapped the two bindings.

Mathematica does not make that easy to do. I found nothing in the documentation that even suggested rebinding keys is possible. Much digging around in newsgroups led me to suspect that the answer lay in a file under /usr/local/Wolfram/Mathematica/7.0/SystemFiles/FrontEnd/TextResources/X/, where I made the following changes:

(* Item[KeyEvent["Return", Modifiers -> {Shift}], "HandleShiftReturn"], *)
    Item[KeyEvent["Return", Modifiers -> {Control}], "HandleShiftReturn"],

(* Item[KeyEvent["Return", Modifiers -> {Control}], "NewRow"], *)
    Item[KeyEvent["Return", Modifiers -> {Shift}], "NewRow"], 

No joy. A bit more digging and I discovered that I had to modify two files, the second also under /usr/local/Wolfram/Mathematica/7.0/SystemFiles/FrontEnd/TextResources/X/, where I made the following change:

(* MenuItem["&Evaluate Cells", "HandleShiftReturn", MenuKey["Return", Modifiers->{"Shift"}]], *)
   MenuItem["&Evaluate Cells", "HandleShiftReturn", MenuKey["Return", Modifiers->{"Control"}]],

Ah, sweet relief.

The two envelope paradox

I’m a sucker for probability puzzles. Here’s an interesting and unusual one I found on the site of Amos Storkey:

You are taking part in a game show. The host introduces you to two envelopes. He explains carefully that you will get to choose one of the envelopes, and keep the money that it contains. He makes sure you understand that each envelope contains a cheque for a different sum of money, and that in fact, one contains twice as much as the other. The only problem is that you don’t know which is which.

The host offers both envelopes to you, and you may choose which one you want. There is no way of knowing which has the larger sum in, and so you pick an envelope at random (equiprobably). The host asks you to open the envelope. Nervously you reveal the contents to contain a cheque for 40,000 pounds.

The host then says you have a chance to change your mind. You may choose the other envelope if you would rather. You are an astute person, and so do a quick sum. There are two envelopes, and either could contain the larger amount. As you chose the envelope entirely at random, there is a probability of 0.5 that the larger check is the one you opened. Hence there is a probability 0.5 that the other is larger. Aha, you say. You need to calculate the expected gain due to swapping. Well the other envelope contains either 20,000 pounds or 80,000 pounds equiprobably. Hence the expected gain is 0.5×20000+0.5×80000-40000, ie the expected amount in the other envelope minus what you already have. The expected gain is therefore 10,000 pounds. So you swap.

Does that seem reasonable? Well maybe it does. If so consider this. It doesn’t matter what the money is, the outcome is the same if you follow the same line of reasoning. Suppose you opened the envelope and found N pounds in the envelope, then you would calculate your expected gain from swapping to be 0.5(N/2)+0.5(2N)-N = N/4, and as this is greater than zero, you would swap.

But if it doesn’t matter what N actually is, then you don’t actually need to open the envelope at all. Whatever is in the envelope you would choose to swap. But if you don’t open the envelope then it is no different from choosing the other envelope in the first place. Having swapped envelopes you can do the same calculation again and again, swapping envelopes back and forward ad-infinitum. And that is absurd.

That is the paradox. A simple mathematical puzzle. The question is: What is wrong? Where does the fallacy lie, and what is the problem?

Storkey proposes a somewhat involved Bayesian explanation of what is wrong. I think a much simpler explanation is possible.

Let’s go back and examine the player’s reasoning. He opened the first envelope and found 40,000 pounds. Then as Storkey puts it “You need to calculate the expected gain due to swapping. Well the other envelope contains either 20,000 pounds or 80,000 pounds equiprobably.”

No. The player is in no position to calculate expected values because he does not know what the sample space is. He does not know that 20,000 and 80,000 pounds are both possible, much less equiprobable.

To make this concrete, imagine we are the hosts, and we prepared the envelopes. We put 20,000 in one, 40,000 in the other. The player observes 40,000 in the first envelope. If from that he infers that there is a fifty percent chance of 80,000 in the other envelope he is just mistaken. There is no chance of it.

The player cannot compute expectations without knowing the sample space. We, the hosts, know the sample space and can easily compute expected values. The player has two pure strategies available, call them Hold and Swap. A player using the Hold strategy opens the envelope and then does not swap. The expected value of this strategy is 30,000. A player using the Swap strategy opens the first envelope and then swaps. The expected value of this strategy is 30,000. Any mixed strategy will be a weighted sum of Hold and Swap, and hence will have expected value of 30,000. Observing the first value does not provide the player any useful information. He cannot perform conditional probability computations of the form P(Y|X=40,000) because he does not know what values of Y are admissible.
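Here is a small Clojure sketch of that hosts’-eye view, using the concrete 20,000/40,000 pair from above; both strategies come out at about 30,000:

(defn play [strategy]
  ;; the hosts stuffed the envelopes: 20,000 in one, 40,000 in the other
  (let [[opened other] (shuffle [20000 40000])]
    (case strategy
      :hold opened
      :swap other)))

(defn average-payoff [strategy n]
  (double (/ (reduce + (repeatedly n #(play strategy))) n)))

;; (average-payoff :hold 100000) ;=> roughly 30000.0
;; (average-payoff :swap 100000) ;=> roughly 30000.0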

Too lazy to flip coins

I was helping my daughter with some math homework; one of the problems was to flip three coins and tabulate the number of heads over the course of eighty trials. This struck me as excessively tedious, so I suggested we write a computer simulation instead.

Here it is, in its full glory:

coin := Random[Integer, {0, 1}]
flips := coin + coin + coin
Length /@ Split @ Sort @ Table[flips, {80}]

The code is written in the Mathematica language. The first two lines should be self explanatory. The last line is the workhorse; I'll illustrate how it works, but with only twenty trials so that results fit the screen better.

The code should be read from right to left. Table[flips, {20}] produces a list of twenty values of flips, something like this:

{2, 2, 0, 0, 3, 1, 0, 2, 1, 2, 2, 1, 2, 0, 3, 0, 0, 1, 1, 1}

This is then sorted: Sort @ Table[flips, {20}], resulting in

{0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3}

The Split operator breaks the above into runs of identical elements

{{0, 0, 0, 0, 0, 0}, {1, 1, 1, 1, 1, 1}, {2, 2, 2, 2, 2, 2}, {3, 3}}

I then map the Length operator over the list of runs, resulting in tabulated counts:

{6, 6, 6, 2}

The use of @ and /@ for function application and mapping, respectively, is syntactic genius. It doesn't get much sweeter than that.

Wolfram recently started licensing Mathematica for home use at $300. It is a bargain at that price. The academic version costs even less, somewhere around $150. The academic and the home versions are functionally identical to the much more expensive commercial version. Grab a copy and you are armed to go do some Project Euler problems.

x + x = 2x

This post is the second in a series. The first is here. The plan for this post is to introduce just enough notation to define some functions in the Mathematica language, and then use Mathematica to solve a not-entirely-trivial problem.

The central idiom in functional programming is applying a function f to an argument x. Mma offers several notations for doing so. I’ll illustrate the two I use most. By far the most common way to write f applied to x is

f[x]

Notice the use of square brackets, not parentheses. In Mma, parentheses are used to control operator precedence, nothing else. The most common notational alternative to f[x] is

f @ x

This form is nice when chaining a series of functional applications. It is easier to write h @ g @ f @ x than h[ g[ f[x] ] ]. The whitespace is optional; I could have written h@g@f@x. The square bracket notation is more convenient with functions of several arguments.

The next bit of notation illustrates how to define functions. I’ll define a couple of example functions, double and recip. Their meaning should be clear.

double[x_] := 2 * x 

recip[x_] := 1/x 

Notice the underscore following the parameter. It needs to be there. I’ll say more about that shortly. For now just go with it.

Most functional languages put a lot of stock in working with lists. Mma is no different; lists abound. Mma lists are delimited by curly braces, and have comma separated values. Here is a list of integers: {2, 3, 7, 13}. The handy built-in function Range generates lists of integers. For instance, Range[1, 3] returns { 1, 2, 3 }.

One last bit of notation and we are ready to go. Here are two equivalent ways to map the function recip over the elements of a list.

recip /@ Range[1, 3]

Map[recip, Range[1, 3]]

The value of the expression above is {recip[1], recip[2], recip[3]} which is to say { 1, 1/2, 1/3 }.

As an aside, notice that Mma returned exact rational numbers rather than approximate floating point values.

The problem I have chosen as my example is exercise 8 in Concrete Mathematics, by Graham, Knuth, and Patashnik. Great book, by the way.

The problem is to solve a recurrence. That means that we are given a recursive definition of how to compute the terms of a sequence, and need to figure out an explicit formula for the nth term. You have probably already seen recurrences, even if they weren’t called recurrences. Here’s an easy one, as a warm up.

T_0 = 1
T_n = 2 T_(n-1)

Notice the form of the problem. The first equation is a boundary value which acts as a stopping condition on the recursion described by the second equation. So what does it mean to solve the recurrence? We have to provide an explicit formula for the nth term. A bit of thought will convince you that the solution is

T_n = 2^n

Knuth et alia propose a somewhat harder recurrence. Here it is.

T_0 = a
T_1 = b
T_n = (1 + T_(n-1)) / T_(n-2)

How does one start to solve a recurrence? One very natural thing to do is to generate the first few terms of the sequence and look for a pattern. We’ll let Mma do the heavy lifting. Here’s the code.

t[0] := a
t[1] := b
t[n_] := (1 + t[n - 1])/t[n - 2]

The code neatly mirrors the mathematics. Notice that the definition of the function t occurs in three parts, rather than as a single expression containing conditional logic. This style of definition is used in other languages, Erlang for instance, and is a joy to use. I really miss it when it is not available.

I promised to say something about the notation of an underscore following a parameter, as after the n in the third clause of the definition of t. Roughly, the underscore indicates that we are using n as a parameter. Without the underscore, we would be defining the action of t on the symbol n. In the first two lines we define t on 0 and 1.

What’s that you say? I did not assign values to a and b? Hold that thought.

Let’s generate a few terms of the sequence. Evaluating t /@ Range[0,4] gives the following:

Yow, continued fractions. Maybe we can simplify those numerators. There’s a handy built-in called Simplify for simplifying algebraic expressions. Let’s use it. We’ll map Simplify over the previous result by evaluating Simplify /@ t /@ Range[0, 4]. Here’s the result.

That looks better. Let’s get a few more terms. Evaluating Simplify /@ t /@ Range[0, 9] we get

The recurrence turns out to be cyclic. There are only five distinct terms, a, b, (1 + b)/a, (1 + a + b)/(a b), and (1 + a)/b, which repeat periodically. So the solution is

T_n = T_(n mod 5)

with T_0 = a, T_1 = b, etc., as above.

What is x + x ? Algebra says it’s 2x, no matter what the value of x. Most programming languages treat x not as a symbol, but as an identifier for some region of memory, and demand that the memory be initialized. In that world x + x is undefined unless x is given a value. It doesn’t have to be that way. You learned how to work with symbols in grade school. Shouldn’t your computer language be able to do the same?

In Mathematica, a symbol can be assigned a value, or it can just be a symbol. I did not have to give either a or b values to be able to work with algebraic expressions involving a and b. That is a very powerful feature. It makes it possible for Mathematica to perform symbolic mathematics such as calculus. Or simple algebra. For instance, we could expand (a + b +c)^3 as follows:

Expand[(a + b + c)^3]

The result is a^3 + 3 a^2 b + 3 a b^2 + b^3 + 3 a^2 c + 6 a b c + 3 b^2 c + 3 a c^2 + 3 b c^2 + c^3.

The next installment in this series will look at the structure of mma expressions. Lisp, anyone?

The unknown functional language

Functional languages are experiencing something of a renaissance, or so I gather from reading reddit and the blogs. I have been known to dabble in lisp, erlang, and haskell. Each is a beautiful language in its own way. But when I actually want to get something done, not just play, I turn to either python or mathematica.

Python has a rich ecosystem and hosts of users. It’s great. But everyone knows about it. There is no need for me to go on about python. Mathematica is not widely appreciated as a powerful and expressive language, so I propose to do a series of posts illustrating some of its extremely elegant features.

Mathematica is a product of Wolfram Research. The word Mathematica refers to both a commercial software product, and the programming language implemented by the product. I will distinguish between the product, Mathematica, and the language, which I will call mma. This is an idiosyncratic distinction, so bear with me.

I propose to talk mostly about functional programming in mma, something I’ve enjoyed doing since version 3, too many years ago. Though I have been a programmer for many years, I was trained as a mathematician. I started programming in imperative languages (pascal and c, moving later to java). When I first discovered functional programming, it was like coming home from a long exile. Mathematicians think in terms of functions. The three fundamental objects in mathematics are numbers, sets, and functions. Of the three, functions are the most fundamental. A functional language lets me think in something close to my native language. Enough philosophy, let’s get started.

The Mathematica “shell” is called a notebook. The notebook interface includes a read-eval-print interpreter loop, but also provides very sophisticated graphics and text layout. Here’s a bit of eye candy to illustrate the point. The first illustration shows a matrix equation.

The second illustrates a topological object known as a Klein bottle. Actual Klein bottles, alas, cannot be embedded in our puny three dimensional space. The cut-away graphics allow one to look inside.

The notebook interface implements a host of other interesting features, including a very robust help system. The notebook communicates with the Mathematica kernel via a protocol called MathLink. This enables Mathematica to effectively separate the user interface from the system’s computational kernel. It also allows the notebook interface to run on one machine, a laptop say, and the kernel to run on some other (perhaps more powerful) machine.

So much for the intro. In the next installment, we get down to business.