This is my review of Coursera’s Machine Learning course, as taught by Andrew Ng of Stanford University. Machine Learning (hereafter, ML) was my first exposure to the world of massive open online courses. I had heard good things about the ML course, so I decided to take the plunge.

I was very curious about one thing. A course on machine learning would presumably require the prospective student to have some level of mathematical sophistication, and some ability to program. I have known very good mathematicians who could not program at all, and some very good programmers who were either indifferent to or afraid of math. It seemed unlikely that the ML course would be populated solely by students skilled in both mathematics and programming. I had heard that Professor Ng was a gifted instructor; I wanted to see how he would tackle the difficult task of teaching both math and programming to a large audience of widely varying backgrounds.

The ML course was ten weeks long. Each week featured several brief video lectures, each roughly 5 to 20 minutes long. Along with the lectures, there were weekly quizzes and programming assignments. The programming language for the course was GNU Octave. There was also a discussion forum where one could ask questions, talk about the assignments, etc.

Here is a brief outline of the topics covered.

  • linear regression, 1 variable
  • linear regression, multiple variables
  • logistic regression
  • neural networks
  • design of machine learning systems
  • support vector machines
  • clustering
  • dimensionality reduction
  • anomaly detection

Along the way we had a very brief review of some ideas from linear algebra, some tutorials on the Octave language, and many valuable protips (practical advice from a seasoned machine learning practitioner).

A typical (supervised) machine learning problem represents its data as points in a high-dimensional space. The task of identifying the points with some property P is cast as the problem of finding a surface (a plane, for instance) that separates the points with property P from those without. Typically there is no such surface, so the problem is to construct an appropriate error cost function, and then find a surface that minimizes that cost. Minimization is typically handled through some flavor of gradient descent.
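
To make that concrete, here is a minimal sketch in Python (not Octave, and not taken from the course materials). It assumes logistic regression as the model and cross-entropy as the cost, and it runs plain batch gradient descent on a made-up toy data set. The names `predict`, `cost`, and `gradient_descent`, the learning rate, and the step count are all illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    # theta[0] is the intercept; theta[1:] are the weights on the features.
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

def cost(theta, xs, ys):
    # Average cross-entropy error; eps guards against log(0).
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(theta, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def gradient_descent(xs, ys, alpha=0.1, steps=1000):
    # Start from theta = 0 and repeatedly step against the gradient of the cost.
    theta = [0.0] * (len(xs[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = predict(theta, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        theta = [t - alpha * g / len(xs) for t, g in zip(theta, grad)]
    return theta

# Made-up toy data: label 1 marks the points "with property P".
xs = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5), (1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
ys = [0, 0, 0, 1, 1, 1]

theta = gradient_descent(xs, ys)
labels = [1 if predict(theta, x) >= 0.5 else 0 for x in xs]
```

The separating surface here is the line where `predict` equals 0.5. On this toy set the fitted line classifies every point correctly, and the cost ends up below its starting value at `theta = 0`.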

Professor Ng’s approach was very pragmatic. In the time available he could not begin to teach the mathematical methods of optimization, particularly in a class where neither calculus nor linear algebra was a prerequisite. Instead he presented plausible appeals to geometric intuition, and urged us to trust that appropriate Octave libraries could be used to perform the minimizations. He put practice ahead of theory, which I believe is the correct pedagogical approach.

The heart of the course was a collection of extremely well-thought-out programming problems. Professor Ng and his staff prepared problem sets, presented as detailed PDF instructions along with enough Octave scaffolding that the student programmer only had to consider the essential core of the task at hand. The grunt work was already taken care of. The problems were graded by a well-tuned automated system: you submit your answers, and it tells you whether they are correct.

My overall assessment? The class was outstanding. Professor Ng’s reputation is well deserved. My thanks and congratulations to him and to everyone involved. A formidable amount of IT support was required to construct the course, the forums, the video production, and the automated grading. A lot of people did good work there, and they deserve recognition.

If you are interested in machine learning, the course is offered again starting Oct 14th. If you decide to go for it, here is a bit of advice.

First, brush up on your linear algebra. Nothing fancy is required, just matrix multiplication. I was amazed at the clarity that framing neural networks in terms of matrices produced. What had been a tangle of nested loops and multiple indices became much simpler. Simpler to work with, and simpler to understand.
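
Here is a small sketch of what that looks like, written in Python rather than Octave and not taken from the course. It assumes a feed-forward network with sigmoid units and a bias input on each layer. The helper names and the hand-picked weights (which make the network compute XNOR) are illustrative. `forward` treats each layer as one matrix-vector product, while `forward_loops` spells out the same computation with explicit indices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, a):
    # Matrix-vector product: one row of W per output unit.
    return [sum(w * x for w, x in zip(row, a)) for row in W]

def layer(W, a):
    # Prepend the bias unit, multiply by the weight matrix, apply the sigmoid.
    return [sigmoid(z) for z in matvec(W, [1.0] + a)]

def forward(weights, x):
    # Matrix view: the whole network is a chain of layer() calls.
    a = list(x)
    for W in weights:
        a = layer(W, a)
    return a

def forward_loops(weights, x):
    # Same computation with explicit layer, unit, and input indices.
    a = list(x)
    for l in range(len(weights)):
        prev = [1.0] + a
        a = []
        for i in range(len(weights[l])):
            z = 0.0
            for j in range(len(prev)):
                z += weights[l][i][j] * prev[j]
            a.append(sigmoid(z))
    return a

# Hand-picked illustrative weights: two hidden units, one output unit.
weights = [
    [[-30.0, 20.0, 20.0], [10.0, -20.0, -20.0]],
    [[-10.0, 20.0, 20.0]],
]

inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
outputs = [round(forward(weights, x)[0]) for x in inputs]
```

Both functions return the same activations. In Octave the `matvec` helper is unnecessary, since the product is the built-in `*` operator and a whole layer fits on one line.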

Second, and this applies especially to programmers, take the time to learn the rudiments of Octave. Time invested up front on learning Octave, especially its methods for constructing and manipulating arrays, will be repaid in time saved during the assignments. Here’s how Doug Crockford puts it in “JavaScript: The Good Parts”:

Most people … don’t even bother to learn JavaScript first, and then they are surprised when JavaScript turns out to have significant differences from SOME OTHER LANGUAGE they would rather be using, and that those differences matter.

Don’t be that guy.