Differences in Conversions of Java Numbers in Scala 2.11, 2.12, and 2.13

There are a few subtle changes between Scala 2.11 and 2.12/2.13 in how Java types are converted to Scala types that you may not be aware of. They affect nullable boxed primitives, such as numbers.

Comparison

When dealing with Avro records in data pipelines in Scala (e.g. with Scio, Spotify’s open-source API for Apache Beam), you often need to convert back and forth between Java and Scala types, most commonly when reading from or writing to Avro. The interoperability between Java and Scala is fairly seamless thanks to the JVM. There are, however, differences between Scala 2.11 and 2.12 that may be tricky to spot if you’re not careful, because they affect boxed primitive types, such as nullable boolean and numeric types.

If you want to run the following code yourself, you can either re-use the minimal sbt-based Scala REPL, or you can use Scastie in the convenience of your web browser.

import java.lang.{Boolean => JBoolean, 
  Character => JChar,
  CharSequence => JString,  
  Double => JDouble, 
  Long => JLong}

case class B(b: Boolean)
case class C(c: Char)
case class D(d: Double)
case class L(l: Long)
case class S(s: String)

val b: JBoolean = null
val c: JChar = null
val d: JDouble = null
val l: JLong = null
val s: JString = null

Let’s now run some simple operations with these values and case classes in different Scala versions:

Snippet          2.11(.12)  2.12(.8)  2.13(.0-M5)
_ == null        true       true      true
Option(_)        None       None      None
b.booleanValue   NPE        NPE       NPE
c.charValue      NPE        NPE       NPE
d.doubleValue    NPE        NPE       NPE
l.longValue      NPE        NPE       NPE
c.toChar         NPE        ?         ?
d.toDouble       NPE        0.0       0.0
l.toLong         NPE        0         0
_.toString       NPE        NPE       NPE
B(b)             NPE        B(false)  B(false)
C(c)             NPE        C(?)      C(?)
D(d)             NPE        D(0.0)    D(0.0)
L(l)             NPE        L(0)      L(0)
S(s)             n/a        n/a       n/a

The wildcard ‘_’ indicates that the same expression is applied to all five values b, c, d, l, and s. NPE stands for java.lang.NullPointerException, and ‘n/a’ means not applicable because the code fails to compile due to a type mismatch. The ‘?’ stands for the unprintable null character '\u0000'.

As you can see, the implicit conversions from Java types to Scala’s cause problems. Whereas in 2.11 the behaviour was to throw NPEs when converting boxed null values from Java to Scala, in 2.12 and onwards some of these are converted to a default value (false for booleans and zero for numbers). You can easily check that the same table holds for other number types: Byte, Short, Float, and so on.
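The difference can be reproduced without relying on the version-specific implicit conversions. The sketch below contrasts explicit Java-style unboxing with an unboxing cast. It assumes that the conversions in 2.12 and later behave like the cast, which is what the table suggests; the value names are illustrative only.

```scala
import scala.util.Try

val b: java.lang.Boolean = null
val d: java.lang.Double = null
val l: java.lang.Long = null

// Explicit Java-style unboxing dereferences the null reference and throws.
val viaMethod: Try[Long] = Try(l.longValue)

// An unboxing cast maps null to the primitive default instead of throwing.
val castB: Boolean = b.asInstanceOf[Boolean]
val castD: Double = d.asInstanceOf[Double]
val castL: Long = l.asInstanceOf[Long]
```

Here viaMethod is a Failure wrapping the NPE, whereas the casts quietly produce false, 0.0, and 0.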

Please note that a null ‘instance’ of java.lang.String when passed to S actually becomes S(null) even though null.asInstanceOf[java.lang.String].toString gives an NPE.

Safe Conversions

In Scala 2.11, the following function would have allowed you to capture issues with nullable Java types and wrap these in an Option:

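import scala.util.{Failure, Success, Try}

// Evaluates `value` inside Try; exceptions and nulls both become None
def safeOption0[T](value: => T): Option[T] =
  Try(value) match {
    case Success(v) => Option(v) // Option(null) is None
    case Failure(_) => None
  }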

Because the argument is called by name rather than value (indicated by => in the parameter list), it’s not evaluated when it’s handed to the function for further processing, but rather when it’s needed within the function itself. This means value always ends up inside the Try monad, where it’s safe, even in the event of exceptions.
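To make the effect of calling by name concrete, here is a minimal sketch with two hypothetical helpers, eager and deferred, whose bodies are identical and which differ only in how the argument is passed. The method boom is likewise made up for the illustration.

```scala
import scala.util.Try

// Hypothetical helpers: same body, different parameter passing.
def eager[T](value: T): Option[T] = Try(value).toOption
def deferred[T](value: => T): Option[T] = Try(value).toOption

def boom(): Int = throw new IllegalStateException("evaluated")

// By name: boom() runs inside Try, so the exception is captured.
val captured: Option[Int] = deferred(boom())

// By value: boom() runs before eager is entered, so the exception escapes it
// and has to be caught by an outer Try.
val escaped: Try[Option[Int]] = Try(eager(boom()))
```

The by-name call returns None, while the by-value call only survives because of the outer Try. safeOption0 relies on the same mechanism.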

With safeOption0, you could have executed val safeLong: Option[Long] = safeOption0(l) and it would have given you None. In Scala 2.12 and 2.13 that does not work because of the implicit conversion needed to go from java.lang.Long to Scala’s Long. To ensure type correctness, T in safeOption0 is Long (as evidenced by the type annotation on the val), which means the argument to the function is automatically converted from java.lang.Long to Long. From the table above we can see that the argument passed to the function is essentially l.toLong, which is 0. Because no NPE is thrown, it ends up in the first case and thus becomes Some(0).
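The role of T can be shown in a way that does not depend on the Scala version. In this sketch safeOption0 is repeated from above so that the snippet is self-contained, and an explicit cast stands in for the implicit conversion of 2.12 and later (an assumption based on the table).

```scala
import scala.util.{Failure, Success, Try}

// safeOption0 as defined above, repeated so the snippet runs on its own.
def safeOption0[T](value: => T): Option[T] =
  Try(value) match {
    case Success(v) => Option(v)
    case Failure(_) => None
  }

val l: java.lang.Long = null

// T = java.lang.Long: no conversion happens, so the null reaches Option(v).
val asJava: Option[java.lang.Long] = safeOption0(l)

// T = Long: the argument is unboxed before Try can observe a null.
// The cast mimics the 2.12+ implicit conversion (assumption).
val asScala: Option[Long] = safeOption0(l.asInstanceOf[Long])
```

With T fixed to java.lang.Long the result is None, but once the value has been unboxed to a Scala Long the result is Some(0): the null is lost before safeOption0 can see it.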

Instead, we need a different way to handle this in Scala 2.12 and 2.13. I shall present three equivalent functions:

import scala.util.{Failure, Success, Try}

def safeOption1[T, R](value: => T)(implicit ev: T => R): Option[R] = {
  val maybeT = Try(value) match {
    case Success(null) => None
    case Success(v)    => Option(v)
    case Failure(_)    => None
  }
  maybeT.map(_.asInstanceOf[R])
}

def safeOption2[T, R](value: => T)(implicit ev: T => R): Option[R] =
  Try(value) match {
    case Success(null) => None
    case Success(v)    => Option(v)
    case Failure(_)    => None
  }

def safeOption3[T, R](value: => T)(implicit ev: T => R): Option[R] =
  try {
    val v = value // evaluate the by-name argument only once
    if (v != null) Option(v) else None
  } catch {
    case _: Exception => None
  }

The second parameter list, which reads implicit ev: T => R, states that there has to be an implicit conversion in scope from T to the return type R. We could write this with a view bound instead, in which case the signature would be as follows: safeOption[R, T <% R](value: => T): Option[R]. View bounds have been deprecated, though.
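How such an implicit parameter gets resolved is easier to see in isolation. The following sketch assumes Scala 2.12 or 2.13 and uses a hypothetical method convert that does nothing but apply the evidence it receives.

```scala
// Hypothetical helper: applies whatever conversion the compiler finds for T => R.
def convert[T, R](value: T)(implicit ev: T => R): R = ev(value)

val boxed: java.lang.Integer = java.lang.Integer.valueOf(42)

// R is taken from the expected type Int; the evidence is the
// java.lang.Integer => Int conversion that Predef provides.
val unboxed: Int = convert(boxed)

// When T and R coincide, the compiler supplies identity evidence.
val same: String = convert("text")
```

The caller never mentions ev: the type annotation on the val fixes R, and the compiler looks up a matching conversion on its own. This is the same mechanism that lets the annotated vals in the tests below drive the conversion.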

A battery of ScalaTest unit tests is easily implemented:

import org.scalatest.FlatSpec
import org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks

class SafeOptionSuite extends FlatSpec with ScalaCheckDrivenPropertyChecks {

  "safeOption" should "be empty when an exception is thrown during evaluation" in {
    assert(safeOption(throw new Exception("I am exceptional!")).isEmpty)
  }

  it should "be empty when given null as input" in {
    assert(safeOption(null).isEmpty)
  }

  it should "be empty for null numeric types in Java" in {
    val jInt: java.lang.Integer = null
    val jLong: java.lang.Long = null
    val jFloat: java.lang.Float = null
    val jDouble: java.lang.Double = null
    val jBoolean: java.lang.Boolean = null

    // Force implicit conversions
    val sInt: Option[Int] = safeOption(jInt)
    val sLong: Option[Long] = safeOption(jLong)
    val sFloat: Option[Float] = safeOption(jFloat)
    val sDouble: Option[Double] = safeOption(jDouble)
    val sBoolean: Option[Boolean] = safeOption(jBoolean)

    assert(sInt.isEmpty)
    assert(sLong.isEmpty)
    assert(sFloat.isEmpty)
    assert(sDouble.isEmpty)
    assert(sBoolean.isEmpty)
  }

  it should "return any valid value wrapped in an Option" in {
    forAll((i: Int) => {
      assert(safeOption(i).contains(i))
    })
  }

  it should "return any valid class wrapped in an Option" in {
    case class Data(i: Int, s: String)
    forAll((i: Int, s: String) => {
      val expected = Data(i, s)
      assert(safeOption(Data(i, s)).contains(expected))
    })
  }
}

Note that I have assumed a single method named safeOption rather than the three alternatives. So, which alternative should you pick?

Benchmarks

The third option has consistently proven to be faster, both in micro-benchmarks and in actual production code. It also has a smaller memory footprint. The reason is simple: safeOption3 does not wrap each value in a Try only to unwrap it again. That’s why I prefer that alternative, especially when running this at scale. Of course, you can rely on explicit null checks and explicit type conversions, which would be even faster. However, you often want a generic method you can rely on in different situations.
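For comparison, an explicit null check with an explicit conversion might look like the sketch below. The helper longOption is hypothetical and deliberately tied to a single boxed type, which is exactly the loss of generality mentioned above.

```scala
// Hypothetical, non-generic helper for one boxed type only.
def longOption(value: java.lang.Long): Option[Long] =
  if (value == null) None else Some(value.longValue)

val missing: java.lang.Long = null
val present: java.lang.Long = java.lang.Long.valueOf(42L)

val none: Option[Long] = longOption(missing) // null is mapped to None
val some: Option[Long] = longOption(present) // unboxed via longValue
```

Because the unboxing goes through longValue rather than an implicit conversion, this behaves the same on 2.11, 2.12, and 2.13, but you would need one such helper per boxed type.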

If you want to do a poor person’s benchmark on your machine, here’s some code to get you started (without the need to play with JMH):

import scala.util.Random

def elapsed[R](block: => R): Long = {
  val start = System.nanoTime()
  block
  val end = System.nanoTime()
  end - start
}

/**
  * Computes the mean while ignoring the 10 fastest and 10 slowest run times
  * 
  * @param list List of run times (nanoseconds)
  * @param mul  Multiplier to convert original list to more readable units
  */
def mean(list: List[Long], mul: Double = 1E-6): Double = {
  val sliced = list.sorted.slice(10, list.length - 10)
  mul * sliced.sum / sliced.length
}

val MaxLength = 10000

def longs: List[java.lang.Long] = {
  val jNull: java.lang.Long = null
  val list: List[java.lang.Long] = List.fill(MaxLength)(Random.nextLong.longValue)
  val withNulls: List[java.lang.Long] = list.map(i => if (i % 10 == 0) jNull else i)
  withNulls
}

val master = List.fill(100)(longs)

val opt1 = master.map(list => elapsed { list.map(safeOption1[java.lang.Long, Long](_)) })
val opt2 = master.map(list => elapsed { list.map(safeOption2[java.lang.Long, Long](_)) })
val opt3 = master.map(list => elapsed { list.map(safeOption3[java.lang.Long, Long](_)) })

(mean(opt1), mean(opt2), mean(opt3))