Here are the steps to get Apache Spark running. In short: install Python, a JDK, and then the Spark package. Note that I am using Arch Linux on my UDOO.
Get Python 2.7, not 3. The easiest way is to install it via pacman:
$ sudo pacman -Sy python2
Apache Spark is written in Scala, which in turn runs on a JVM. By default, most Linux distributions install OpenJDK. However, the OpenJDK Zero VM performs far worse than Oracle's HotSpot engine, so we are going to get Oracle's VM running on the UDOO first.
Go ahead and follow the ArchWiki page to install OpenJDK 8. Although we don't want to use it, we do want the Java environment to be set up properly. When done, you will find the OpenJDK VM installed in a subfolder of /usr/lib/jvm.
Then go to Oracle's Java site, accept the license, and download the JDK for ARM (I downloaded version 1.8.0_51). Unpack the tar.gz file and place the whole package under /usr/lib/jvm in its own folder (e.g. jdk1.8.0_51). The Arch Linux Java configuration should pick it up:
$ sudo archlinux-java status
Available Java environments:
  java-8-openjdk (default)
  jdk1.8.0_51
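For reference, the unpack step above can be sketched as a small shell function. This is only a sketch: the function name unpack_jdk is mine, and it assumes the Oracle archive unpacks to a single top-level folder (such as jdk1.8.0_51), so extracting it straight into the JVM directory gives the package its own folder.

```shell
# Unpack a JDK tarball into a JVM directory.
# $1: path to the downloaded tar.gz
# $2: target JVM directory (defaults to /usr/lib/jvm)
# Assumption: the archive contains one top-level folder, e.g. jdk1.8.0_51.
unpack_jdk() {
    tarball=$1
    jvm_dir=${2:-/usr/lib/jvm}
    mkdir -p "$jvm_dir" && tar -xzf "$tarball" -C "$jvm_dir"
}
```

Writing to /usr/lib/jvm needs root, so either call the function from a root shell or run the equivalent tar command directly with sudo.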
Run the following commands to switch to Oracle's VM:
$ sudo archlinux-java set jdk1.8.0_51
$ sudo archlinux-java status
Available Java environments:
  java-8-openjdk
  jdk1.8.0_51 (default)
$ java -version
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b07)
Java HotSpot(TM) Client VM (build 25.51-b07, mixed mode)
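If you want to check the result in a script rather than by eye, something like the following might do. It is a minimal sketch: is_hotspot is a hypothetical helper name, and it simply looks for the word "HotSpot" in the version text, which matches the output shown above.

```shell
# Succeeds if the "java -version" text on stdin mentions HotSpot.
is_hotspot() {
    grep -q 'HotSpot'
}

# Usage (java -version writes to stderr, hence the 2>&1):
#   java -version 2>&1 | is_hotspot && echo "HotSpot VM is active"
```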
The preparation is done! Now go to the Apache Spark website and download a pre-built package (I downloaded version 1.4.0, pre-built for Hadoop 2.6 and later; some say it is better to use the Hadoop 2.4 pre-built package).
Unpack the file. Try to run the SparkPi example:
$ cd spark-1.4.0-bin-hadoop2.6
$ bin/run-example SparkPi 10
Among all the log messages, you should see an *estimate* of Pi:
Pi is roughly 3.14258
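The value is only an estimate because, as far as I understand the example, SparkPi uses a Monte Carlo method: it throws random points into a square and counts how many land inside the inscribed circle. The idea can be sketched without Spark at all. The estimate_pi function below is my own illustration in awk, not the Spark code, and the sample count and seed are arbitrary.

```shell
# Monte Carlo estimate of Pi: sample points uniformly in the square
# [-1, 1] x [-1, 1] and count how many fall inside the unit circle.
# The fraction inside approaches Pi/4 as the sample count grows.
# $1: number of samples (default 100000), $2: random seed (default 1)
estimate_pi() {
    awk -v n="${1:-100000}" -v seed="${2:-1}" 'BEGIN {
        srand(seed)
        inside = 0
        for (i = 0; i < n; i++) {
            x = rand() * 2 - 1
            y = rand() * 2 - 1
            if (x * x + y * y < 1) inside++
        }
        printf "Pi is roughly %.5f\n", 4 * inside / n
    }'
}
```

Calling estimate_pi with a larger sample count should give a value closer to Pi, and a different seed gives a slightly different result, which is why your number will not match mine exactly.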
Apache Spark should be fully functional by now. You can go ahead and try the Spark shell, the PySpark shell, and so on.
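One thing to watch for with the PySpark shell: on Arch Linux the plain python command is Python 3, while the package installed earlier provides python2. Spark reads the PYSPARK_PYTHON environment variable to decide which interpreter to launch, so a setup along these lines may help. Treat it as a sketch; I am assuming python2 is on your PATH.

```shell
# Tell PySpark which Python interpreter to use.
# Assumption: python2 is the binary installed by the pacman step above.
export PYSPARK_PYTHON=python2

# Then, from the unpacked Spark directory, start either shell:
#   bin/pyspark        (Python shell)
#   bin/spark-shell    (Scala shell)
```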