Associativity in Semigroups in Scala: Algebird, Cats, and Scalaz

Algebird, Cats, and Scalaz are Scala libraries that provide, among other things, type classes that are based on algebraic constructs, such as groups, rings, and monoids. Contrary to what I had initially believed, perhaps naively, these libraries do not perform compile- (or run-)time checks on the laws governing these algebraic structures, which means it’s possible to create a class that is an instance of such a type class, but it violates the laws of that type class.

Setup

If you want to run the following code interactively on your computer, you can set up a minimal sbt-based sandbox. That way, you can then either copy-paste the code in the REPL after running sbt console, or you can use a Scala worksheet in IntelliJ IDEA. Note that if you copy-paste it in the sandbox’s REPL, you need to use :paste mode for case classes and their companion objects.

Of course you can also create a standard sbt project in your favourite editor or IDE, which I have already done over here. After you have cloned the repository, you have the following commands at your disposal:

  • sbt dependencyUpdates: to list all updates to dependencies specified in build.sbt, which is helpful if you want to come back to the code after a while but want to use the lastest versions of all libraries without having to search for each libarary on Scaladex;
  • sbt build, which is an alias for sbt clean scalastyle compile test.

You can also automatically re-format the code from the command line according to the configuration provided in .scalafmt.conf.

Semigroups in Algebird

A semigroup is a set with an associative binary operator that is closed. What’s important to know is that Algebird does not check closure and associativity under the binary operation at compile-time (or run-time). How do I know?

import com.twitter.algebird.Semigroup

case class NotASemigroup(i: Int)

object NotASemigroup extends Semigroup[NotASemigroup] {
  override def plus(l: NotASemigroup, r: NotASemigroup): NotASemigroup =
    NotASemigroup(if (l.i == 1) l.i + r.i else l.i * r.i)
}

NotASemigroup satisfies the closure law, courtesy of Scala’s type system: multiplying two integers leads to another integer, and so does addition. After all, Int.MaxValue + 1 == Int.MinValue).

Associativity does, however, not hold:

((1, 2), 3) -> ((1 + 2), 3) -> (3, 3) -> 9
(1, (2, 3)) -> (1, (2 * 3)) -> (1, 6) -> 7

Here, I’ve dropped the NotASemigroup wrapper for clarity. The snippet compiles, and without a unit test the code would happily run never knowing it’s not a semigroup, which we can check in a unit test:

import org.scalacheck.Prop.forAllNoShrink
import org.scalatest.FlatSpec
import org.scalatest.prop.Checkers.check
import org.scalatest.prop.GeneratorDrivenPropertyChecks

class NotASemigroupSuite extends FlatSpec with GeneratorDrivenPropertyChecks {

  "NotASemigroup's binary operation" should "be closed" in {
    check {
      forAllNoShrink((i1: Int, i2: Int) => {
        val (a1, a2) = (NotASemigroup(i1), NotASemigroup(i2))
        NotASemigroup.plus(a1, a2).isInstanceOf[NotASemigroup]
      })
    }
  }

  it should "be associative" in {
    check {
      forAllNoShrink((i1: Int, i2: Int, i3: Int) => {
        val (a1, a2, a3) = (NotASemigroup(i1), NotASemigroup(i2), NotASemigroup(i3))
        val left = NotASemigroup.plus(NotASemigroup.plus(a1, a2), a3)
        val right = NotASemigroup.plus(a1, NotASemigroup.plus(a2, a3))
        left == right
      })
    }
  }
}

This spits out something like this:

...
[info] NotASemigroupSuite:
[info] NotASemigroup's binary operation
[info] - should be closed
[info] - should be associative *** FAILED ***
[info]   GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
[info]    (NotASemigroupSuite.scala:18)
[info]     Falsified after 0 successful property evaluations.
[info]     Location: (NotASemigroupSuite.scala:18)
[info]     Occurred when passed generated values (
[info]       arg0 = 2147483647,
[info]       arg1 = 2147483647,
[info]       arg2 = -1
[info]     )
...

The reason I use forAllNoShrink rather than forAll is that the latter shrinks values that fail tests, which in the case of integers means it’ll reduce these. The final values may actually lead to tests succeeding (by coincidence), even though the test itself fails. To avoid this confusion, I disallow shrinkage of arbitrary values.

There is the algebird-test module, which exposes a few helpers for testing, to make it less explicit. Unfortunately, it does not (yet) include the semigroup out of the box.

You can therefore extend Semigroup[T] to your heart’s desire, but you still need to make sure your class actually forms a semigroup with respect to the binary operator defined in the companion object. Algebird does not do that for you!

Semigroups in Cats and Scalaz

Lest you think Algebird is the only library that does not perform out-of-the-box checks on associativity (as closure is trivially satisfied thanks to the type system), Cats and Scalaz behave in exactly the same way. The documentation for Cats does not say so, but Scala Exercises does:

This operation must be guaranteed to be associative.

In the repository you can find the full code to define a semigroup (and the corresponding unit test) with Cats:

object NotASemigroupCats {
  implicit val binaryOp: cats.Semigroup[NotASemigroup] =
    new cats.Semigroup[NotASemigroup] {
      override def combine(l: NotASemigroup, r: NotASemigroup): NotASemigroup =
        NotASemigroup(if (l.i == 1) l.i + r.i else l.i * r.i)
    }
}

You can then combine instances of NotASemigroup with |+| after you’ve imported cats.syntax.semigroup._ too.

The equivalent Scalaz code is as follows:

object NotASemigroupScalaz {
  implicit val binaryOp: scalaz.Semigroup[NotASemigroup] =
    new scalaz.Semigroup[NotASemigroup] {
      override def append(l: NotASemigroup, r: => NotASemigroup): NotASemigroup =
        NotASemigroup(if (l.i == 1) l.i + r.i else l.i * r.i)
    }
}

Again, |+| is available after you have imported scalaz.syntax.semigroup._. Please observe that the binary operator is called plus (Algebird), combine (Cats), or append (Scalaz).

There is a difference when compared to Algebird though: Cats has convenient, built-in checks that are wrappers around ScalaCheck. You still have to create a test class, but you do not have to remember and correctly implement the laws yourself. These can be used to automatically check the laws governing the type classes, so you don’t have to write these out yourself:

class SemigroupLawsSuite extends CatsSuite {

  import NotASemigroupCats._
  import cats.kernel.laws.discipline.SemigroupTests

  implicit val arbNotASemigroup: Arbitrary[NotASemigroup] = Arbitrary {
    Arbitrary.arbInt.arbitrary.map(NotASemigroup(_))
  }
  implicit val eqNotASemigroup: cats.kernel.Eq[NotASemigroup] =
    cats.kernel.Eq.fromUniversalEquals

  checkAll("Semigroup laws for NotASemigroup (Cats)", SemigroupTests[NotASemigroup].semigroup)
}

The implicit values are needed by SemigroupTests[T].semigroup.

Scalaz offers similar functionality for Specs2 but as of this writing not yet for ScalaTest. I am personally not a huge fan of Specs2, so please fiddle around with it yourself.