Claus Brod: Closing in on closures (23 Feb 2006)

The other day, I battled global variables in Lisp by using this construct:

(let ((globalFoo 42))
    (defun foobar1()
      (* globalFoo globalFoo))

    (defun foobar2(newVal)
      (setf globalFoo newVal))
  )

globalFoo is neither declared nor bound within the functions foobar1 or foobar2; it is a free variable. When Lisp encounters such a variable, it will search the enclosing code (the lexical environment) for a binding of the variable; in the above case, it will find the binding established by the let statement, and all is peachy.

globalFoo's scope is limited to the functions foobar1 and foobar2; functions outside of the let statement cannot refer to the variable. But we can call foobar1 and foobar2 even after returning from the let statement, and thereby read or modify globalFoo without causing a runtime error.

Lisp accomplishes this by creating objects called closures. A closure is a function plus a set of bindings of free variables in the function. For instance, the function foobar1 plus the binding of globalFoo to a place in memory which stores "42" is such a closure.

To illustrate this:

> (load "closure.lsp")  ;; contains the code above
T
> globalFoo     ;; can we access the variable?
*** Variable GLOBALFOO is unbound
> (foobar1)     ;; we can't, but maybe foobar1 can
1764
> (foobar2 20)  ;; set new value for globalFoo
20
> (foobar1)
400

Hmmm - what does this remind you of? We've got a variable which is shared between two functions, and only those functions have access to the variable, while outside callers have not... he who has never tried to encapsulate data in an object shall cast the first pointer!
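To make the comparison concrete, here is a minimal C++ sketch of the same pair of functions written as a class. The class name Foobar and the helper replaySession are made up for this illustration; the member names simply mirror the Lisp code above.

```cpp
#include <cassert>

// Hypothetical class, for illustration only: the private member plays the
// role of globalFoo, and the two methods play the roles of foobar1 and foobar2.
class Foobar {
private:
  int globalFoo;  // reachable only through the two methods below
public:
  Foobar() : globalFoo(42) {}
  int foobar1() const { return globalFoo * globalFoo; }
  int foobar2(int newVal) { globalFoo = newVal; return globalFoo; }
};

// Replays the Lisp session from above against the class.
inline void replaySession() {
  Foobar f;
  assert(f.foobar1() == 1764);  // 42 * 42
  assert(f.foobar2(20) == 20);  // set new value
  assert(f.foobar1() == 400);   // 20 * 20
}
```

Just like in the Lisp version, code outside of Foobar has no way to name globalFoo directly; it has to go through foobar1 or foobar2.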

So this is how closures might remind us of objects. But let's look at it from a different angle now - how would we implement closures in conventional languages?

Imagine that while we invoke a function, we'd keep its parameters and local variables on the heap rather than on the stack, so instead of stack frames we maintain heap frames. You could then think of a closure as:

A function pointer referring to the code to be executed
A set of references to frames on the heap, namely references to all bindings of any free variables which occur in the code of the function.

Because the "stack" frames are actually kept on the heap and we are therefore no longer obliged to follow the strict rules of the hardware stack, the contents of those frames can continue to live even beyond the scope of the executed function.

So we're actually storing a (partial) snapshot of the execution context of a function, along with the code of the function!
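Here is a rough C++ sketch of that mental model, applied to the foobar1/foobar2 example. All type and function names (Frame, Closure, FoobarClosures, makeFoobar and so on) are invented for this illustration, and std::shared_ptr is merely one assumed way to keep the frame alive. This is not a claim about how any particular Lisp system implements closures.

```cpp
#include <cassert>
#include <memory>

// A "heap frame": the bindings that would normally live in a stack frame.
struct Frame {
  int globalFoo;
};

// A closure: a pointer to the code plus a reference to the frame which holds
// the bindings of the code's free variables.
struct Closure {
  int (*code)(Frame&, int);
  std::shared_ptr<Frame> env;
  int operator()(int arg = 0) const { return code(*env, arg); }
};

// The function bodies receive the frame explicitly and look up globalFoo there.
int foobar1Code(Frame& env, int) { return env.globalFoo * env.globalFoo; }
int foobar2Code(Frame& env, int newVal) { return env.globalFoo = newVal; }

struct FoobarClosures {
  Closure foobar1;
  Closure foobar2;
};

// Plays the role of the let: allocates the frame on the heap, binds globalFoo
// to 42 and returns two closures which share that frame.
FoobarClosures makeFoobar() {
  std::shared_ptr<Frame> frame = std::make_shared<Frame>();
  frame->globalFoo = 42;
  FoobarClosures c = { { foobar1Code, frame }, { foobar2Code, frame } };
  return c;
}

// The frame outlives makeFoobar(), so the closures keep working afterwards.
void demoHeapFrames() {
  FoobarClosures c = makeFoobar();
  assert(c.foobar1() == 1764);
  assert(c.foobar2(20) == 20);
  assert(c.foobar1() == 400);
}
```

The frame is freed only when the last closure referring to it goes away, which is the "continue to live beyond the scope of the executed function" part of the story.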

Let's see how we could implement this. An obvious first-order approximation in C++ is a function object. A function object encapsulates a function pointer and maybe also copies of parameters needed for the function call:

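  typedef bool (*fncptr)(int, float);
  bool foobar_fnc(int, float); // declaration; defined elsewhere

  class FunctionObject {
  private:
    fncptr m_fnc;
    int m_i;
    float m_f;
  public:
    // Members are initialized in declaration order.
    FunctionObject(fncptr fnc, int i, float f) : m_fnc(fnc), m_i(i), m_f(f) {}
    // Replay the stored call with the captured copies of the arguments.
    bool operator()() const { return m_fnc(m_i, m_f); }
  };

  FunctionObject fo(foobar_fnc, 42, 42.0f);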

FunctionObject captures a snapshot of a function call with its parameters. This is useful in a number of situations, as witnessed by the many ways of implementing something like this in C++ libraries such as Boost. However, it is not a closure. We're "binding" function parameters in the function object - but those are, in the sense described earlier, not free variables anyway. On the other hand, if the code of the function referred to by the FunctionObject had any free variables, the FunctionObject wouldn't be able to bind them. So this approach won't cut it.
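A small sketch may help to show the gap. The names g_factor, scaledIsLarge and demoFreeVariable are made up for this example, and FunctionObject is repeated in compilable form so that the sketch stands on its own. The function refers to g_factor, which is neither a parameter nor a local variable, so from the function's point of view it is a free variable.

```cpp
#include <cassert>

typedef bool (*fncptr)(int, float);

// Hypothetical free variable of scaledIsLarge; a plain global here.
static float g_factor = 1.0f;

// Hypothetical function which depends on g_factor in addition to its parameters.
bool scaledIsLarge(int i, float f) { return i * f * g_factor > 1000.0f; }

class FunctionObject {
private:
  fncptr m_fnc;
  int m_i;
  float m_f;
public:
  FunctionObject(fncptr fnc, int i, float f) : m_fnc(fnc), m_i(i), m_f(f) {}
  bool operator()() const { return m_fnc(m_i, m_f); }
};

void demoFreeVariable() {
  // fo stores copies of 42 and 42.0f, but holds nothing for g_factor.
  FunctionObject fo(scaledIsLarge, 42, 42.0f);
  g_factor = 1.0f;
  assert(fo() == true);   // 42 * 42 * 1.0 = 1764 > 1000
  g_factor = 0.5f;
  assert(fo() == false);  // 42 * 42 * 0.5 = 882, anybody can change g_factor
}
```

The arguments travel with the function object, but g_factor does not: the function simply finds it in the global scope, where every other piece of code can get at it as well. A closure would carry its own binding for the variable instead.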

There are other approaches in C++, of course. For example, I recently found the Boost Lambda Library which covers at least parts of what I'm after. At first sight, however, I'm not too sure its syntax is for me. I also hear that GCC implements nested functions, although only as an extension to C, not C++:

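#include <stdio.h>

typedef void (*FNC)(void);

FNC getFNC(void)
{
  int x = 42;
  /* Nested function (GNU C extension); x is a free variable in foo. */
  void foo(void)
  {
    printf("now in foo, x=%d\n", x);
  }
  return foo;
}

int main(void)
{
  FNC fnc = getFNC();
  /* Caution: x lived in getFNC's stack frame, which is gone by now,
     so the behavior of this call is undefined. */
  fnc();
  return 0;
}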

Unfortunately, extensions like this haven't made it into the standards so far. So let's move on to greener pastures. Next stop: How anonymous delegates in C# 2.0 implement closures.

Revision: r1.5 - 26 Feb 2006 - 19:51 - ClausBrod
