Passing Functions in Scala

In Scala there are several ways to pass one function to another: by value (which really passes the function’s result), by name, and as a function, either anonymous or named. The differences are subtle and, if you’re not careful, you may end up with surprising results.

Here’s an example to illustrate how an insidious bug can easily seep into your code. Let’s compute a list of UUIDs by passing a function that generates UUIDs as an argument. Why would we want to do this and not simply write (1 to max).toList.map(_ => java.util.UUID.randomUUID.toString)? We might want a configurable generation function so that unit tests do not depend on random values, or the surrounding code may be difficult to refactor. Anyway, here is some sample code you can run in the REPL:

val max = 10

def id: String = java.util.UUID.randomUUID.toString

def alternative1(u: String) = (1 to max).toList.map(_ => u)         // by value
def alternative2(u: => String) = (1 to max).toList.map(_ => u)      // by name
def alternative3(u: () => String) = (1 to max).toList.map(_ => u())  // as a function

This defines a bunch of functions that we can now call:

alternative1(id).distinct.length       // 1
alternative2(id).distinct.length       // max
alternative3(id _).distinct.length     // max
alternative3(() => id).distinct.length // max

In Scala def f(): SomeType and def f: SomeType are closely related: the first declares an empty parameter list, the second declares none at all. It’s customary to keep the parentheses if f has side effects and to drop them if it has none. The idea behind this convention is referential transparency: if a function has no side effects, any invocation of the function can be replaced by its result without affecting the program’s behaviour. A side-effect-free function without a parameter list is therefore semantically equivalent to a value. This is the reason you can type "Hi!".length rather than "Hi!".length() even though length is really a method (i.e. a function defined in a class). This convention supports the uniform access principle: def, var, and val are accessed in the same way.
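As a small illustration of the uniform access principle, here is a minimal sketch. The names Shape, Square, Circle and areas are made up for this illustration and are not part of the UUID example. A def without a parameter list in a trait can be implemented by either a val or a def, and the call site looks the same in both cases.

```scala
// Hypothetical types, for illustration only.
trait Shape {
  def area: Double                                // no parameter list: reads like a value
}

final class Square(side: Double) extends Shape {
  val area: Double = side * side                  // computed once and stored
}

final class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius    // recomputed on every access
}

// The call site is identical for both implementations.
val areas: List[Double] = List(new Square(2.0), new Circle(1.0)).map(_.area)
```

Whether area is stored or recomputed is an implementation detail the caller never sees.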

Note that in Scala 2 a function with an empty parameter list can also be called as if it had no parameter list. Both are functions of arity 0. Had we defined id with an empty parameter list, def id(): String, we could have called it as either id() or id, although recent Scala 2 compilers warn about dropping the parentheses and Scala 3 requires them for methods defined in Scala. In its current incarnation we cannot write id() though!

When you pass an argument by value, which is the default behaviour in Scala, the argument expression is evaluated once, before any of the function’s body runs. In the case of alternative1 this means that the UUID is generated before the body is entered, and it is constant inside the entire function body. As you can see, the number of distinct UUIDs is therefore only 1: every element holds the same value. If we were to pass the argument u on to additional functions inside our alternatives, these would all share the same value!
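To make the single evaluation visible, the following sketch replaces the random UUID with a counting generator. The names byValueCalls, countedLabel, repeatByValue and byValueLabels are hypothetical and only serve this illustration; repeatByValue has the same shape as alternative1.

```scala
var byValueCalls = 0                      // hypothetical counter: how often the generator ran

def countedLabel(): String = {            // hypothetical stand-in for id
  byValueCalls += 1
  s"label-$byValueCalls"
}

// Same shape as alternative1: u is an ordinary by-value parameter.
def repeatByValue(u: String, n: Int): List[String] =
  (1 to n).toList.map(_ => u)

// countedLabel() runs once, before repeatByValue's body starts.
val byValueLabels = repeatByValue(countedLabel(), 3)
// byValueLabels == List("label-1", "label-1", "label-1") and byValueCalls == 1
```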

In alternative2, the argument is passed by name, which is indicated by the => in front of the type annotation. Its value is not evaluated until it is referenced in the body, which may of course be inside another function, and it is evaluated again on every reference. There will therefore be as many distinct UUIDs as there are elements, because id is evaluated for each element in the list.
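The same counting trick shows what by-name evaluation does. This is a sketch with hypothetical names (byNameCalls, tickLabel, repeatByName, ignoreByName); repeatByName mirrors alternative2. The argument is re-evaluated on every reference, and it is never evaluated if the body does not reference it.

```scala
var byNameCalls = 0                       // hypothetical counter: how often the generator ran

def tickLabel(): String = {               // hypothetical stand-in for id
  byNameCalls += 1
  s"label-$byNameCalls"
}

// Same shape as alternative2: u is re-evaluated each time it is referenced.
def repeatByName(u: => String, n: Int): List[String] =
  (1 to n).toList.map(_ => u)

// A by-name argument that is never referenced is never evaluated.
def ignoreByName(u: => String): Int = 0

val byNameLabels = repeatByName(tickLabel(), 3)   // List("label-1", "label-2", "label-3")
val callsAfterRepeat = byNameCalls                // 3
val ignoredResult = ignoreByName(tickLabel())     // tickLabel() does not run
val callsAfterIgnore = byNameCalls                // still 3
```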

The function alternative3 declares its input as a function of type () => String, which means that nothing is evaluated until that function is called with u(). In the last call, () => id is a function literal, which is also sometimes referred to as an anonymous function or lambda. A function literal is like any other literal: 'A' is a Char literal, 42 is an Int literal, and so on. Likewise, an expression such as (i: Int) => 2*i is a function literal. Such literals can be used as anonymous values, that is, they are not necessarily bound to a named variable (or value). For code that is not (supposed to be) reusable that’s quite handy. The partially applied function notation id _ turns the method id into the function () => id; newer Scala versions discourage this notation for methods without a parameter list, so () => id is the safer spelling. If you want to know more about these, please read the chapter on functions and closures in Martin Odersky’s book Programming in Scala.
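Passing a function value is also what makes the generator swappable in unit tests, which was the motivation mentioned at the start. The sketch below is one possible arrangement, not a prescribed one; generate, randomId, sequentialIds, testIds and randomIds are hypothetical names, and generate has the same shape as alternative3 with the length made a parameter.

```scala
// Same shape as alternative3, with the number of elements as a parameter.
def generate(n: Int, nextId: () => String): List[String] =
  (1 to n).toList.map(_ => nextId())

// A function literal bound to a name: calls the real UUID generator.
val randomId: () => String = () => java.util.UUID.randomUUID().toString

// A deterministic generator for tests: a closure over a private counter.
def sequentialIds(prefix: String): () => String = {
  var counter = 0
  () => {
    counter += 1
    s"$prefix-$counter"
  }
}

val testIds = generate(3, sequentialIds("test"))   // List("test-1", "test-2", "test-3")
val randomIds = generate(3, randomId)              // three different UUID strings
```

Because the function is only invoked inside map, each element triggers a fresh call regardless of which generator is supplied.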

What would happen if we defined id as a val (value) rather than a def (function)? It would be evaluated once (at the declaration site) and therefore make all subsequent uses identical, even if we passed the value by name: every alternative would yield only one distinct UUID. The same goes for a lazy val: it would be evaluated only once (on first access) and cached afterwards.
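A counting generator makes the difference between def, val and lazy val concrete. This is a sketch with hypothetical names (freshCalls, freshLabel, repeatLazily, fixedLabel, deferredLabel); all three results go through the same by-name parameter, so any difference comes from how the argument was defined.

```scala
var freshCalls = 0                                  // hypothetical counter

def freshLabel(): String = {                        // a def: runs on every call
  freshCalls += 1
  s"v$freshCalls"
}

// By-name parameter, as in alternative2.
def repeatLazily(u: => String, n: Int): List[String] =
  (1 to n).toList.map(_ => u)

val fixedLabel: String = freshLabel()               // evaluated right here: "v1"
lazy val deferredLabel: String = freshLabel()       // not evaluated yet

val fromVal = repeatLazily(fixedLabel, 3)           // List("v1", "v1", "v1")
val fromLazyVal = repeatLazily(deferredLabel, 3)    // first access yields "v2", then cached
val fromDef = repeatLazily(freshLabel(), 3)         // List("v3", "v4", "v5")
```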

Would using a var make any difference? No, its initializer is still evaluated only once. The difference between var and val is that the former can be reassigned whereas the latter is constant throughout the code. But reading the same variable without reassigning it does not make a difference: it still holds the value it received at initialization.
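The one case where a var behaves differently is when it is reassigned between reads. The following deliberately contrived sketch (currentLabel, readTwiceByName and readTwiceByValue are hypothetical names) reassigns the variable inside the function body to show that a by-name parameter re-reads it, whereas a by-value parameter keeps the value it had at the call.

```scala
var currentLabel = "first"                 // hypothetical mutable variable

def readTwiceByName(u: => String): (String, String) = {
  val before = u                           // reads currentLabel now
  currentLabel = "second"                  // reassign between the two reads
  (before, u)                              // by name: reads currentLabel again
}

def readTwiceByValue(u: String): (String, String) = {
  val before = u
  currentLabel = "third"                   // has no effect on u
  (before, u)                              // by value: u was fixed at the call
}

val byNamePair = readTwiceByName(currentLabel)     // ("first", "second")
val byValuePair = readTwiceByValue(currentLabel)   // ("second", "second")
```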

And there you have it: be careful with functions that are passed around as values, especially when you intend to evaluate these multiple times inside a function’s body.