As an example of progress in society, I will show you how to compute pi with distributed computing, using Apache Spark and a few Raspberry Pis as your own little local computing cluster.

Install Raspbian on your Raspberry Pis.

Install Java on your Raspberry Pi thus:

sudo apt-get update && sudo apt-get install oracle-java7-jdk

Install ssh on your Raspberry Pi thus:

sudo apt-get install ssh

Download Apache Spark to each of your Raspberry Pis and unpack it:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1-bin-hadoop2.tgz
tar xzf spark-1.0.1-bin-hadoop2.tgz

Also install Spark on your master machine, in my case my MacBook Pro. The current version of Spark (1.0.1) expects Spark to be installed at the same path on every machine, so I put it in /usr/local/spark. To be precise, I copied the unpacked Spark folder to /usr/local and then made a symbolic link to it named “spark”.

This is done on all workers (Raspberry Pis) and on the master machine (MacBook Pro):


sudo mv spark-1.0.1-bin-hadoop2 /usr/local/
cd /usr/local
sudo ln -s spark-1.0.1-bin-hadoop2 spark

I then tell the master which IP address to use, so that the workers can connect to it. This is done by:

export SPARK_MASTER_IP=10.0.1.10

I can now run the master start script on the master (MacBook Pro), from the /usr/local/spark folder:


./sbin/start-master.sh

and then start a worker on each of the Raspberry Pis:


./bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.0.1.10:7077
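
If a worker refuses to register, it can help to first check from the Raspberry Pi that the master's port is reachable at all. The following is only a minimal sketch of such a check; the helper name can_connect is my own and is not part of Spark.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect("10.0.1.10", 7077) on a worker should return True once the master is running. This only shows that the port accepts connections, not that a Spark master is what answers on it.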

I can now submit jobs to the cluster from my master machine, for instance the Java example that calculates pi:


./bin/spark-submit --master spark://10.0.1.10:7077 --class org.apache.spark.examples.JavaSparkPi lib/spark-examples-1.0.1-hadoop2.2.0.jar

Or, using the Python version of the same example (the trailing 10 is the number of slices the job is split into):

./bin/spark-submit --master spark://10.0.1.10:7077 examples/src/main/python/pi.py 10
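
To see what this job actually computes, here is a rough single-machine sketch of the same idea in plain Python, without Spark. The bundled example estimates pi by the Monte Carlo method: it throws random points at a square and counts how many land inside the inscribed circle. The function names, the seed parameter and the loop over slices are my own simplification. On the cluster, each slice would be handed to a worker instead of being run one after another.

```python
import random


def count_hits(samples, rng):
    """Count random points in the square [-1, 1] x [-1, 1] that fall inside the unit circle."""
    hits = 0
    for _ in range(samples):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y < 1:
            hits += 1
    return hits


def estimate_pi(slices, samples_per_slice=100000, seed=None):
    """Estimate pi as 4 * (points inside circle) / (points thrown)."""
    # Each slice stands in for one task that Spark would send to a worker;
    # here the slices simply run sequentially on one machine.
    rng = random.Random(seed)
    total_hits = sum(count_hits(samples_per_slice, rng) for _ in range(slices))
    return 4.0 * total_hits / (slices * samples_per_slice)
```

Calling estimate_pi(10) mirrors the 10 slices passed on the command line above. The result changes from run to run unless a seed is given.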

To monitor my cluster, I can browse to:

http://localhost:8080

Raspberry Pi Spark Cluster

We can now calculate pi. The job reports:

“Pi is roughly 3.131820”

and it only takes:

116.324641 seconds. Now that is progress! ;)
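
For the curious, the reported value is off by about 0.0098. The sketch below shows how one might compare that with the error to expect from a Monte Carlo estimate of this kind. It assumes the estimate is 4 * hits / samples with independent random points. The function name standard_error is my own, and the sample count you pass in is up to you, since it depends on which example and how many slices were run.

```python
import math


def standard_error(total_samples):
    """Standard error of the estimate 4 * hits / total_samples."""
    p = math.pi / 4  # chance that a random point in the square lands inside the circle
    return 4.0 * math.sqrt(p * (1.0 - p) / total_samples)


reported = 3.131820  # the value printed by the job above
error = abs(reported - math.pi)  # how far the cluster's answer is from pi
```

The expected error shrinks only with the square root of the sample count, so four times as many samples buys just half the error.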