Calculating PI on a Raspberry Pi Spark Cluster

As an example of progress in society, I will show you how to compute pi using distributed computing with Spark and Raspberry Pis on your own little local computing cluster.

Install Raspbian on your Raspberry Pis.

Install Java on your Raspberry Pi thus:

sudo apt-get update && sudo apt-get install oracle-java7-jdk

Install ssh on your Raspberry Pi thus:

sudo apt-get install ssh

Fetch Apache Spark to each of your Raspberry Pis:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1-bin-hadoop2.tgz

Also install Spark on your master machine, in my case my MacBook Pro. The current version of Spark (1.0.1) wants Spark to be installed in the same folder on all of the machines, so I put it in /usr/local/spark. To be precise, I copied the unpacked Spark folder structure to /usr/local and then made a symbolic link to this folder, calling it “spark”.

This is done on all workers (Raspberry Pis) and on the master machine (MacBook Pro):


sudo mv spark-1.0.1-bin-hadoop2 /usr/local/
cd /usr/local
sudo ln -s spark-1.0.1-bin-hadoop2 spark

I then tell the master machine which IP address to advertise, so that the workers can connect to it:

export SPARK_MASTER_IP=10.0.1.10

I can now run the “Master Start” script on the master (MacBook Pro):


./sbin/start-master.sh

and then start the workers on each of the Raspberry Pis:


./bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.0.1.10:7077

I can now submit jobs (for instance, to calculate pi using Java) to the cluster from my master machine:


./bin/spark-submit --master spark://10.0.1.10:7077 --class org.apache.spark.examples.JavaSparkPi lib/spark-examples-1.0.1-hadoop2.2.0.jar

Or using the Python Spark version:

./bin/spark-submit --master spark://10.0.1.10:7077 examples/src/main/python/pi.py 10

To monitor my cluster, I can browse to:

http://localhost:8080

Raspberry Pi Spark Cluster

We can now calculate pi to:

“Pi is roughly 3.131820”

and it only takes:

116.324641 seconds, now that is progress! ;)
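
For readers wondering why the answer is only "roughly" pi: as far as I can tell, the bundled Spark Pi examples estimate pi by random sampling rather than computing digits. Below is a minimal single-machine sketch of that idea in plain Python, with no Spark involved. The names estimate_pi, num_samples and seed are my own and are not taken from the Spark sources.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by throwing random points at the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the point if it lands inside the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # The circle covers pi/4 of the square, so scale the hit ratio by 4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100000))
```

On a cluster the sampling loop is what gets split across the workers, with the hit counts added up at the end. The estimate only improves slowly with more samples, which is why the result above is off in the second decimal.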

Convert DataArray taken from a DataFrame to an Array / Vector in Julia


julia> DataFrame(CCnt=1:10,Alpha=21:30)
10x2 DataFrame:
CCnt Alpha
[1,] 1 21
[2,] 2 22
[3,] 3 23
[4,] 4 24
[5,] 5 25
[6,] 6 26
[7,] 7 27
[8,] 8 28
[9,] 9 29
[10,] 10 30

julia> samples = DataFrame(CCnt=1:10,Alpha=21:30)
10x2 DataFrame:
CCnt Alpha
[1,] 1 21
[2,] 2 22
[3,] 3 23
[4,] 4 24
[5,] 5 25
[6,] 6 26
[7,] 7 27
[8,] 8 28
[9,] 9 29
[10,] 10 30

julia> samples[:CCnt]
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10

julia> vector(samples[:CCnt])
10-element Array{Int64,1}:
1
2
3
4
5
6
7
8
9
10

Add / Concat / append / rbind row to Julia DataFrame

In Julia you use vcat to add or append or concatenate a row of data to a Julia DataFrame.

Example:

julia> mydf = DataFrame(X=[0:10],Y=[100:110])
11x2 DataFrame:
X Y
[1,] 0 100
[2,] 1 101
[3,] 2 102
[4,] 3 103
[5,] 4 104
[6,] 5 105
[7,] 6 106
[8,] 7 107
[9,] 8 108
[10,] 9 109
[11,] 10 110

julia> mydf = vcat(mydf,DataFrame(X=12,Y=15))
12x2 DataFrame:
X Y
[1,] 0 100
[2,] 1 101
[3,] 2 102
[4,] 3 103
[5,] 4 104
[6,] 5 105
[7,] 6 106
[8,] 7 107
[9,] 8 108
[10,] 9 109
[11,] 10 110
[12,] 12 15

Assign a value in Perl only if a regex matches

Sometimes (especially in one-liners) you want to assign a value only if the regex (regular expression) that picks out the value matches. That is, once it has matched, you don’t want the value overwritten with undef when the regex fails on a later line of your file.

This can be solved thusly:


$var = $1 if (/Correct (\d+) %/);

The above snippet assigns $var only if the regex on the right-hand side matches, using the value picked out by the capturing parentheses, and otherwise leaves it unchanged. Note that the + sits inside the parentheses, so that $1 holds the whole number rather than just its last digit.
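
For comparison, here is the same "only overwrite on a match" idiom as a small Python sketch. The function name last_matched_value and the sample pattern are my own illustration, modelled on the Perl snippet above, and not part of any library.

```python
import re

# Hypothetical pattern mirroring the Perl snippet: capture the digits before " %".
PATTERN = re.compile(r"Correct (\d+) %")


def last_matched_value(lines):
    """Return the most recently captured value, ignoring lines that do not match."""
    value = None
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Only overwrite when the regex actually matched this line.
            value = match.group(1)
    return value
```

Calling last_matched_value on a list of lines keeps the value from the last matching line, even if non-matching lines follow it.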

Perl one-liner to calculate an average of some value in a bunch of files

A quick and dirty one-liner (depending on the length of your lines ;)) to calculate the average of a value in a bunch of files in a directory structure.

The one-liner below picks out a value from each file whose name matches “Logfile*.txt” in the underlying directory structure.

In the below case, the line was in the form of:

Correctly Classified Instances 37 60.6557 %

or

Correctly Classified Instances 37 60 %

The code traverses the directory structure from the current directory, picks out the “60.6557” from each matching line, sums the values, and then divides the sum by the number of values found (one per file, assuming each file contains a single such line).


find . -name "Logfile*.txt" -exec perl -ne '($var) = (/^Correctly.*\s+((\.|\d)+)\s+%/); print "$var\n" if $var;' '{}' \; | xargs perl -e 'use List::Util qw(sum); print(sum(@ARGV)/scalar(@ARGV)); print "\n";'

Note: Not very robust!! But it IS a one-liner! ;)
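
If the one-liner gets too unwieldy, the same idea can be sketched as a short Python script. This is a rough equivalent under my own assumptions rather than a drop-in replacement: the helper names extract_value and average_accuracy are made up for illustration, and the pattern is a slightly stricter variant of the one above.

```python
import fnmatch
import os
import re

# Matches e.g. "Correctly Classified Instances 37 60.6557 %" and captures "60.6557".
LINE_RE = re.compile(r"^Correctly.*\s+(\d+(?:\.\d+)?)\s+%")


def extract_value(line):
    """Return the percentage on a matching line as a float, else None."""
    match = LINE_RE.match(line)
    return float(match.group(1)) if match else None


def average_accuracy(root="."):
    """Average the captured values over all Logfile*.txt files below root."""
    values = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, "Logfile*.txt"):
            with open(os.path.join(dirpath, name)) as handle:
                for line in handle:
                    value = extract_value(line)
                    if value is not None:
                        values.append(value)
    # Return None instead of dividing by zero when nothing matched.
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    print(average_accuracy("."))
```

Like the one-liner, this averages over every matching line it finds, so a file with several such lines contributes several values.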

Index a DataFrame subset on string column name in Julia


julia> using RDatasets

julia> iris = dataset("datasets", "iris")

julia> iris[iris[:Species] .== "setosa", :]
50x5 DataFrame
|-------|-------------|------------|-------------|------------|----------|
| Row # | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | "setosa" |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | "setosa" |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | "setosa" |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | "setosa" |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | "setosa" |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | "setosa" |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | "setosa" |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | "setosa" |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | "setosa" |

Nice post on Julia meta operations

Great post about basic Julia stuff @ Julia Helps

Convert a matrix (Array) of Any to a matrix (Array) of floats in Julia


convert(Array{Float64},array_of_Anys)

Print an array in Julia with four decimals

The general form for limiting the number of decimals is:

println(round(Array, number_of_decimals))

so to get four decimals:

println(round(Array,4))

Adding missing Keyboard Shortcuts to Mac OS X and OS X applications

There is a really cool feature in Mac OS X that lets you add missing keyboard shortcuts to OS X applications. For instance, I was missing a shortcut for clearing the working environment in RStudio, which removes the variables in the current workspace. I added one in the App Shortcuts section under System Preferences -> Keyboard -> Shortcuts.

Screen Shot 2014-01-24 at 19.31.14

If you want to add shortcuts for sub-menu items, you do this by putting a “->” between the menu item names (see the File->New File->Text File example in the above screenshot).

I also very frequently select text and want to search for that on Google in a new Safari window, so I added a new shortcut for this in the Services section. The preferences for this look like this:

Screen Shot 2014-01-24 at 19.40.26