Playing with Spark in sbt
Ian Hellström | 13 January 2017 | 1 min read
Unless you have a cluster with Apache Spark installed on it at your disposal, you may want to play a bit with Spark on your own machine. The standard VMs or Docker images (e.g. Cloudera, Hortonworks, IBM, MapR, Oracle) typically do not ship the latest Spark release. If you really want the bleeding edge of Spark, you have to install it locally yourself, roll your own Docker container, or simply use sbt.
With sbt available, create a folder in which you can play around, your ‘sandbox’.
I’ll assume you have created the folder under
On Windows, also create a sub-folder inside it for Spark’s so-called warehouse directory.
Let’s call that sub-folder ‘warehouse’.
All you have to do now is create a very simple build.sbt file that has the following contents:
    name := "sandbox"

    version := "0.0.1"

    scalaVersion := "2.11.8"

    val sparkVersion = "2.1.0"
    val grpId = "org.apache.spark"

    libraryDependencies ++= Seq(
      grpId %% "spark-core" % sparkVersion,
      grpId %% "spark-sql" % sparkVersion,
      grpId %% "spark-streaming" % sparkVersion,
      grpId %% "spark-mllib" % sparkVersion,
      grpId %% "spark-graphx" % sparkVersion
    )
Now you can simply run sbt console, and sbt will download all the packages you need to run Spark from your command line.
A minimal setup that’s a bit like the Spark shell itself can be achieved as follows:
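    // In the console, enter :paste first so the multi-line builder chain is read as one expression
    import org.apache.spark._
    import org.apache.spark.sql._

    // Run Spark locally, using all available cores
    val conf = new SparkConf()
    val sc = new SparkContext("local[*]", "sandbox", conf)

    // The session picks up the SparkContext created above
    val ss = SparkSession
      .builder
      .config("spark.sql.warehouse.dir", "/path/to/sandbox/warehouse")
      .getOrCreate()

    import ss.implicits._

    // Add your code here!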
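To give an idea of what "your code" might look like, here is a minimal word-count sketch. The object name WordCountSketch and the helper wordCounts are made up for illustration, and the warehouse path is the same placeholder as above. It assumes the Spark 2.1.0 / Scala 2.11 dependencies from the build.sbt shown earlier.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the names are illustrative only.
object WordCountSketch {

  // Counts how often each lower-cased, whitespace-separated word occurs.
  def wordCounts(spark: SparkSession, lines: Seq[String]): Map[String, Long] = {
    import spark.implicits._ // encoders for String and (String, Long)

    lines.toDS()
      .flatMap((line: String) => line.toLowerCase.split("\\s+").toSeq)
      .filter((word: String) => word.nonEmpty)
      .groupByKey(identity)
      .count()
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Reuses an existing session if one is already running in this JVM
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("sandbox")
      .config("spark.sql.warehouse.dir", "/path/to/sandbox/warehouse")
      .getOrCreate()

    val counts = wordCounts(spark, Seq("to be or not to be"))
    counts.toSeq.sortBy(-_._2).foreach(println)

    spark.stop()
  }
}
```

Note that main stops the session when it is done. If you are already inside a console with a session called ss and want to keep working, it is probably more convenient to call WordCountSketch.wordCounts(ss, Seq("to be or not to be")) directly.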
It is important that you add the Spark warehouse directory option on Windows machines because otherwise Java will complain about relative paths in absolute URIs, which is caused by this bug.
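If you get tired of typing the setup each time, one possible shortcut is to let sbt run it whenever the console starts. The sketch below is an addition to build.sbt, not part of the original setup. It uses sbt's initialCommands and cleanupCommands settings in the sbt 0.13-style syntax, and the warehouse path is again only a placeholder.

```scala
// build.sbt (optional addition): run the Spark setup when `sbt console` starts
initialCommands in console := """
  import org.apache.spark.sql.SparkSession
  val ss = SparkSession.builder.master("local[*]").appName("sandbox").config("spark.sql.warehouse.dir", "/path/to/sandbox/warehouse").getOrCreate()
  val sc = ss.sparkContext
  import ss.implicits._
"""

// Stop the session when the console exits
cleanupCommands in console := "ss.stop()"
```

Here the SparkContext is taken from the session rather than created separately, which should be equivalent for sandbox purposes. The builder chain is kept on a single line so that it does not depend on how the console handles continuation lines.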
Of course, this does not provide a complete development environment replete with Hadoop or any other software. It does, however, provide a very simple way to tinker with Spark in the comfort of a command line or your favourite IDE.