Capturing Iteration Variables in Go Language

15 May 2017

Named functions can be declared only at the package level, but we can use a function literal to denote a function value within any expression. A function literal is written like a function declaration, but without a name following the func keyword. It is an expression, and its value is called an anonymous function.

Function literals let us define a function at its point of use. For example, a call to strings.Map that advances every character by one can be written as

strings.Map(func(r rune) rune { return r + 1 }, "HAL-9000")
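Here is a small runnable version of that call. strings.Map applies the function literal to each rune of the string and builds a new string from the results. The wrapper name shiftByOne is hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// shiftByOne returns s with every rune advanced by one code point.
// The mapping function is a function literal defined at its point of use.
func shiftByOne(s string) string {
	return strings.Map(func(r rune) rune { return r + 1 }, s)
}

func main() {
	fmt.Println(shiftByOne("HAL-9000")) // "IBM.:111"
}
```

Because every rune is shifted, '-' becomes '.', '9' becomes ':', and each '0' becomes '1'.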

More importantly, functions defined in this way have access to the entire lexical environment, so the inner function can refer to variables from the enclosing function, as this example shows:

package main

import "fmt"

// squares returns a function that returns
// the next square number each time it is called.
func squares() func() int {
	var x int
	return func() int {
		x++
		return x * x
	}
}

func main() {
	f := squares()
	fmt.Println(f()) // "1"
	fmt.Println(f()) // "4"
	fmt.Println(f()) // "9"
	fmt.Println(f()) // "16"
}

The function squares returns another function, of type func() int. A call to squares creates a local variable x and returns an anonymous function that, each time it is called, increments x and returns its square. A second call to squares would create a second variable x and return a new anonymous function that increments that variable.
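A quick sketch can confirm that two calls to squares give two independent variables. The program repeats squares from the example above and creates two closures, f and g; advancing f leaves g at its own starting point.

```go
package main

import "fmt"

// squares is repeated from the example above.
func squares() func() int {
	var x int
	return func() int {
		x++
		return x * x
	}
}

func main() {
	f := squares()
	g := squares() // a second call creates a second, independent x

	fmt.Println(f(), f(), f()) // 1 4 9
	fmt.Println(g())           // 1: g's x is unaffected by calls to f
	fmt.Println(f())           // 16: f continues from its own x
}
```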

The squares example demonstrates that function values are not just code but can have state. The anonymous inner function can access and update the local variables of the enclosing function squares. These hidden variable references are why we classify functions as reference types and why function values are not comparable. Function values like these are implemented using a technique called closures, and Go programmers often use this term for function values.
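To make the "hidden variable references" concrete, the sketch below has one call return two closures that capture the same variable, so an update through one is visible through the other. It also shows that a function value can be compared only with nil. The function name counter is hypothetical.

```go
package main

import "fmt"

// counter returns two closures that both capture the same variable n.
// Calling inc updates n, and get observes the updated value.
func counter() (inc func(), get func() int) {
	var n int
	inc = func() { n++ }
	get = func() int { return n }
	return inc, get
}

func main() {
	inc, get := counter()
	inc()
	inc()
	fmt.Println(get()) // 2

	// Function values may be compared only with nil;
	// an expression such as inc == inc does not compile.
	var h func()
	fmt.Println(h == nil, inc != nil) // true true
}
```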

Here again we see an example where the lifetime of a variable is not determined by its scope: the variable x continues to exist after squares has returned to main, even though x is hidden inside f.

package main

import "fmt"

func main() {
	var fb func(int) int
	fb = func(n int) int {
		if n == 0 || n == 1 {
			return 1
		}
		return fb(n-1) + fb(n-2)
	}
	// 0:1 1:1 2:2 3:3 4:5 5:8 6:13 7:21 8:34 9:55
	for n := 0; n < 10; n++ {
		fmt.Printf("%d:%d ", n, fb(n))
	}
}

When an anonymous function requires recursion, as in this example, we must first declare a variable and then assign the anonymous function to that variable. Had these two steps been combined in the declaration, the function literal would not be within the scope of the variable fb, so it would have no way to call itself recursively:

	// var fb func(int) int
	fb := func(n int) int {
		if n == 0 || n == 1 {
			return 1
		}
		return fb(n-1) + fb(n-2) // compile error: undefined: fb
	}

In this section, we’ll look at a pitfall of Go’s lexical scope rules that can cause surprising results. We urge you to understand the problem before proceeding, because the trap can ensnare even experienced programmers.

Consider a program that must create a set of directories and later remove them. We can use a slice of function values to hold the clean-up operations. (For brevity, we have omitted all error handling in this example.)

	var rmdirs []func()
	for _, d := range tempDirs() {
		dir := d               // NOTE: necessary!
		os.MkdirAll(dir, 0755) // creates parent directories too
		rmdirs = append(rmdirs, func() {
			os.RemoveAll(dir)
		})
	}
	// ...do some work...
	for _, rmdir := range rmdirs {
		rmdir() // clean up
	}

You may be wondering why we assigned the loop variable d to a new local variable dir within the loop body, instead of just naming the loop variable dir as in this subtly incorrect variant:

	var rmdirs []func()
	for _, dir := range tempDirs() {
		os.MkdirAll(dir, 0755)
		rmdirs = append(rmdirs, func() {
			os.RemoveAll(dir) // NOTE: incorrect!
		})
	}

The reason is a consequence of the scope rules for loop variables. In the program immediately above, the for loop introduces a new lexical block in which the variable dir is declared. All function values created by this loop “capture” and share the same variable, an addressable storage location, not its value at that particular moment. The value of dir is updated in successive iterations, so by the time the cleanup functions are called, the dir variable has been updated several times by the now-completed for loop. Thus dir holds the value from the final iteration, and consequently all calls to os.RemoveAll will attempt to remove the same directory.
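The sharing can be reproduced with a minimal sketch. To make the behavior independent of Go version, the sketch declares dir once before the loop and assigns to it with = (the assignment form of range), so every closure necessarily captures that single variable. The name sharedCapture and the string inputs are hypothetical stand-ins for the directory example.

```go
package main

import "fmt"

// sharedCapture declares dir once, outside the loop, so every closure
// captures the same variable rather than a per-iteration copy.
func sharedCapture(dirs []string) []string {
	var dir string
	var fns []func() string
	for _, dir = range dirs { // assigns to the single outer dir
		fns = append(fns, func() string { return dir })
	}
	var seen []string
	for _, fn := range fns {
		seen = append(seen, fn()) // each call reads dir's current value
	}
	return seen
}

func main() {
	fmt.Println(sharedCapture([]string{"a", "b", "c"})) // [c c c]
}
```

Every closure reports the value from the final iteration, which is exactly why all the os.RemoveAll calls would target the same directory.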

Frequently, the inner variable introduced to work around this problem (dir in our example) is given the exact same name as the outer variable of which it is a copy, leading to odd-looking but crucial variable declarations like this:

	for _, dir := range tempDirs() {
		dir := dir // declares inner dir, initialized to outer dir
		// ...
	}

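The risk is not unique to range-based for loops. The loop in the example below suffers from the same problem due to unintended capture of the index variable i. In fact it is worse: once the loop has finished, i equals len(dirs), so each cleanup function would index past the end of dirs and panic.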

	var rmdirs []func()
	dirs := tempDirs()
	for i := 0; i < len(dirs); i++ {
		os.MkdirAll(dirs[i], 0755) // OK
		rmdirs = append(rmdirs, func() {
			os.RemoveAll(dirs[i]) // NOTE: incorrect!
		})
	}
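The sketch below reproduces that failure. As before, i is declared before the loop so that all closures share it on any Go version, and the resulting index-out-of-range panic is recovered and returned as an error purely for illustration. The function name sharedIndex and the sample paths are hypothetical.

```go
package main

import "fmt"

// sharedIndex builds one closure per element, but all of them capture the
// single variable i declared before the loop. Once the loop ends,
// i == len(dirs), so calling any closure indexes past the end of dirs.
// The panic is recovered and returned as an error.
func sharedIndex(dirs []string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	var i int
	var fns []func() string
	for i = 0; i < len(dirs); i++ {
		fns = append(fns, func() string { return dirs[i] })
	}
	for _, fn := range fns {
		fmt.Println(fn()) // the first call panics: index out of range
	}
	return nil
}

func main() {
	fmt.Println(sharedIndex([]string{"/tmp/a", "/tmp/b"}))
}
```

Copying the element into a variable declared inside the loop body, as in dir := dirs[i], and capturing dir instead avoids the problem.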

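The problem of iteration variable capture is most often encountered when using the go statement or defer, since both may delay the execution of a function value until after the loop has finished. But the problem is not inherent to go or defer. The loop semantics described here were those of Go when this article was written. Starting with Go 1.22, in modules whose go.mod declares go 1.22 or later, each iteration of a for loop gets new instances of the variables declared with := in the for clause, so the incorrect variants above no longer share a single variable. The copying idiom is still needed for older Go versions and for variables declared outside the loop.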

References

Alan A. A. Donovan, Brian W. Kernighan. The Go Programming Language, 2015.11.
Blocks, declarations and scope, The Go Programming Language Specification.
Anonymous functions and closures