Convert a DataArray subset taken from a DataFrame to a multidimensional Array in Julia

julia> using RDatasets

julia> iris = dataset("datasets","iris")
julia> X = array(iris[:,1:4])
150×4 Array{Float64,2}:
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5.0 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2
4.8 3.4 1.6 0.2
4.8 3.0 1.4 0.1
4.3 3.0 1.1 0.1
⋮
6.3 3.4 5.6 2.4
6.4 3.1 5.5 1.8
6.0 3.0 4.8 1.8
6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4
6.9 3.1 5.1 2.3
5.8 2.7 5.1 1.9
6.8 3.2 5.9 2.3
6.7 3.3 5.7 2.5
6.7 3.0 5.2 2.3
6.3 2.5 5.0 1.9
6.5 3.0 5.2 2.0
6.2 3.4 5.4 2.3
5.9 3.0 5.1 1.8

iExplorer is great!

For some stupid reason, none of the contacts on my old iPhone had been synced to my computer or iCloud, but of course I backed up the old iPhone before wiping it, so I had a fresh backup. With iExplorer (http://www.macroplant.com/iexplorer/) I could easily extract my contacts from the backup and sync them to my new iPhone and iCloud! Thanks, Macroplant, for a great product!


Released Tellstick OpenHAB hacks on Github

I have done some small hacks to get Tellstick events that are not picked up by the openHAB Tellstick binding into openHAB. This is done by a Python script that sends REST requests to openHAB to notify it of changes in these devices. It works for any device that is not picked up by the Tellstick binding but is picked up by the Tellstick Duo. It’s an ugly hack, but it does the job for me at the moment; hopefully one day it will not be needed. Anyway, it is here:

Tellstick Openhab hack on Github
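
For readers curious what such a notification might look like, here is a minimal sketch. It is not the script from the repository. It assumes that openHAB accepts a plain-text state update via an HTTP PUT to /rest/items/<item>/state; the base URL, the item name Hall_Motion and the helper name build_state_update are made up for illustration.

```python
import urllib.request


def build_state_update(base_url, item_name, state):
    """Build (but do not send) a PUT request that updates an item's state.

    Assumption: the openHAB REST interface exposes items under
    <base_url>/rest/items/<item_name>/state and takes the new state
    as a plain-text request body.
    """
    url = "%s/rest/items/%s/state" % (base_url.rstrip("/"), item_name)
    return urllib.request.Request(
        url,
        data=str(state).encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="PUT",
    )


# Hypothetical usage: report that a motion sensor switched on.
request = build_state_update("http://localhost:8080", "Hall_Motion", "ON")
# urllib.request.urlopen(request)  # uncomment to actually send it
```

Building the request separately from sending it makes it easy to check the URL and body before pointing the script at a live openHAB instance.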

Enjoy!

Calculating PI on a Raspberry Pi Spark Cluster

As an example of progress in society, I will show how you can compute pi using distributed computing with Spark on your own little local cluster of Raspberry Pis.

Install Raspbian on your Raspberry Pis.

Install Java on your Raspberry Pi thus:

sudo apt-get update && sudo apt-get install oracle-java7-jdk

Install ssh on your Raspberry Pi thus:

sudo apt-get install ssh

Fetch Apache Spark to each of your Raspberry Pis:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1-bin-hadoop2.tgz

Also install Spark on your master machine, in my case my MacBook Pro. The current version of Spark (1.0.1) wants all installations of Spark to be in the same folder on all of the machines, so I put them in /usr/local/spark. To be precise, I copied the unpacked Spark folder structure to /usr/local and then made a symbolic link to this folder, calling it “spark”.

This is done on all workers (Raspberry Pis) and on the master machine (MacBook Pro):


sudo mv spark-1.0.1-bin-hadoop2 /usr/local/
cd /usr/local
ln -s spark-1.0.1-bin-hadoop2 spark

I then tell the master machine which IP address to advertise so that the workers can connect to it. This is done by:

export SPARK_MASTER_IP=10.0.1.10

I can now run the “Master Start” script on the master (Macbook Pro):


./sbin/start-master.sh

and then start the workers on each of the Raspberry Pis:


./bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.0.1.10:7077

I can now submit jobs (for instance to calculate PI using Java) to the cluster on my master machine by:


./bin/spark-submit --master spark://10.0.1.10:7077 --class org.apache.spark.examples.JavaSparkPi lib/spark-examples-1.0.1-hadoop2.2.0.jar

Or using the Python Spark version:

./bin/spark-submit --master spark://10.0.1.10:7077 examples/src/main/python/pi.py 10

I can surf to:

http://localhost:8080

To monitor my cluster.

Raspberry Pi Spark Cluster


We can now calculate pi to:

“Pi is roughly 3.131820”

and it only takes:

116.324641 seconds, now that is progress! ;)
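
The answer is only "roughly" pi because, as far as I can tell, the bundled Spark Pi examples use a Monte Carlo estimate: throw random points at a square and count how many land inside the inscribed circle. Below is a minimal single-machine sketch of that idea in plain Python, without Spark. The function name estimate_pi and the sample counts are my own choices, not taken from the Spark examples.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1].

    The fraction of points that fall inside the unit circle approaches
    pi / 4, so multiplying that fraction by 4 gives an estimate of pi.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(200_000, seed=42))
```

On a cluster the samples are simply split between the workers and the counts are added up at the end, which is why the job distributes so easily.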


A Bayesian analysis of the Monty Hall problem

The famous Monty Hall show, where a contestant has a chance to win a car that is placed behind one of three closed doors, has given rise to the so-called Monty Hall problem. What is the problem? Let us discuss the setup before delving into the problem statement. The show, Let's Make a Deal, works in short as follows. The contestant is led in to the show, where there are three closed doors; behind one of them is a luxury car and behind each of the other two a goat. The contestant gets to pick one door. The door is not yet opened. Monty Hall then walks in and opens one of the other two doors; of course Monty knows where the car is placed, so he opens a door which he knows has a goat behind it. Monty then offers the contestant a deal: the contestant is allowed to switch to the other unopened door if he/she wants to.

The Monty Hall problem is: should the contestant switch doors? Most people intuitively think it does not matter, that it is a fifty-fifty chance now (there are two closed doors), and most (probably due to the endowment effect) do not want to switch. There are many intricate twists, variations and additional considerations to this problem, but here we will only discuss the simple setup as stated above.

We model the problem in the following probabilistic way:

The random variable C can take on three values C=1 or C=2 or C=3 and means respectively, the car is behind door 1, door 2 or door 3.

The random variable X can take on three values X=1 or X=2 or X=3 and means respectively, the contestant picked door 1, door 2 or door 3.

The random variable Y can take on three values Y=1 or Y=2 or Y=3 and means respectively, Monty opened door 1, door 2 or door 3.

Monty Hall example

Let's now assume that the car is behind door 3 and that the contestant (of course not knowing the car is behind door 3) randomly picks door 1. This leaves Monty no choice but to open door 2 (as in the image above, courtesy of Wikipedia). Now the question is: should the contestant switch to door 3? To solve the Monty Hall problem we would like to calculate the probability that the car is behind door 1 and door 3 respectively, given that Monty opened door 2!

Let's start by calculating the probability that the car is behind door three. If we know this probability, then by the laws of probability we also know the probability that the car is behind door one, since this is just 1 minus the probability that the car is behind door three (Monty has opened door two, so we know the car is not there).

With the above information we can formulate the above problem mathematically according to the following formula:

$$p(C=3|X=1,Y=2)$$ We call this equation Eq 1.

Which means:

(What is) the probability that the car is behind door three (C=3) given that the contestant picked door one (X=1) and that Monty opened door two (Y=2)?

Eq 1 can be re-expressed by Bayes' rule as:

$$p(C=3|X=1,Y=2) = \frac{p(X=1,Y=2|C=3)p(C=3)}{p(X=1,Y=2)}$$

We call this equation Eq 2. Which should be read as:

the probability that the car is behind door 3 given that the contestant picked door 1 and that Monty opened door 2, is the same as, the likelihood that the contestant picked door 1 and Monty opened door 2 given that the car was behind door 3, times the prior probability that the car was behind door 3, all of this divided by the marginal probability that contestant picked door 1 and Monty opened door 2

This mathematical equivalence was proven by Reverend Thomas Bayes and later also by Laplace.

Eq 2 has three components on the right side of the equation. Let’s look at them separately:

$p(X=1,Y=2|C=3)$ This is the part that says:

the probability that the contestant picked door 1 and Monty opened door 2 given that the car was behind door 3

What is this likelihood? Well, if the car is behind door three (C=3), which was given (we know this, but the contestant does not), Monty will definitely not open that door (the contestant would obviously switch to that door then :) ). Monty will also not open the door that the contestant picked, so Y is in this case completely controlled by which door the contestant picks (the variables are not independent, i.e. they are dependent). If the contestant picks door 1, Monty will definitely open door 2, since he knows (it was given in the setup of the example) that the car is behind door 3. So what is the probability that the contestant picks door 1, given that we know, but the contestant does not, that the car is behind door 3? Well, the contestant does not know anything, so he/she supposedly just picks a door at random, and picking door 1 has a one in three chance (1/3). So $p(X=1,Y=2|C=3) = 1/3$.

The equation:

$$p(X=1,Y=2|C=3)$$

can further be expanded via the rules of probability as follows:

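$$p(X=1,Y=2|C=3) = p(Y=2|C=3,X=1) * p(X=1|C=3) = p(Y=2|C=3,X=1) * p(X=1)$$

The first equality is the product rule of probability, a pure mathematical fact. The second holds because the contestant picks a door without knowing where the car is, so p(X=1|C=3) = p(X=1). This reformulation may actually make the case clearer. In words, it reads: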

the probability that Monty opens door 2 given that the contestant picked door 1 and that the car is behind door 3, times the prior probability that the contestant picks door 1

And what is this probability? Well if the contestant picked door 1 and the car is behind door 3, Monty has no choice but to open door 2, so this probability is one (1). The prior probability that the contestant picks door 1 is again one in three (1/3). This is another way to show that part one of the right hand side of Eq 2 equals 1/3.

$$p(X=1,Y=2|C=3) = p(Y=2|C=3,X=1) * p(X=1) = 1 * 1/3 = 1/3$$

Let’s then look at the next part of Eq 2, p(C=3). What is this probability? Well, we must assume that the game show randomly picks a position for the car, so this probability is one in three (1/3).

$$p(C=3) = 1/3$$

Now there is only one part of Eq 2 left and that is the denominator:

$$p(X=1,Y=2)$$

Which is perhaps a bit tricky to think about. This says:

The probability that the contestant picks door 1 and that Monty opens door 2, irrespective of where the car is! That is, in this part it is not given that the car is behind door 3!

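What is this probability then? Well, the contestant has a one in three chance of picking door 1, and in that case Monty can only open door 2 or door 3. Remember, in this part we do not condition on where the car is: Monty of course still knows, but we have to average over the three possible positions of the car. If the car is behind door 1, we assume Monty opens door 2 or door 3 with equal probability; if it is behind door 2, he must open door 3; if it is behind door 3, he must open door 2. Averaged over the three equally likely positions, the probability of Monty opening door 2 is one in two (1/2). This gives $1/3 * 1/2 = 1/6$. This part can also be expanded by the rules of probability as: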

$$p(X=1,Y=2) = p(Y=2|X=1)p(X=1)$$

Here p(Y=2|X=1) = 1/2: if the contestant picks door 1, there is a fifty-fifty chance that Monty opens door 2 (again, since the position of the car is not given in this part of the formula). And p(X=1) is one in three (1/3) as usual, so the product is $1/2 * 1/3 = 1/6$. So now we have all the parts we need to calculate our final answer: the probability that the car is behind door 3, given that the contestant picked door 1 and Monty opened door 2.
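
For those who want to see where the 1/2 comes from, it can be written out as a sum over the three possible positions of the car. This is a sketch that assumes Monty chooses uniformly between doors 2 and 3 when the car is behind the contestant's own door 1:

$$p(Y=2|X=1) = \sum_{c=1}^{3} p(Y=2|X=1,C=c)\,p(C=c) = \frac{1}{2} * \frac{1}{3} + 0 * \frac{1}{3} + 1 * \frac{1}{3} = \frac{1}{2}$$

The three terms are the cases C=1 (Monty has a free choice), C=2 (he cannot open the car door) and C=3 (he is forced to open door 2).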

$$p(X=1,Y=2|C=3) = 1/3$$

$$p(C=3) = 1/3$$

$$p(X=1,Y=2) = 1/6$$

Which means that:

$$p(C=3|X=1,Y=2) = \frac{p(X=1,Y=2|C=3)p(C=3)}{p(X=1,Y=2)} = \frac{1/3 * 1/3}{1/6} = \frac{1/9}{1/6} = \frac{6}{9} = \frac{2}{3}$$

So the probability that the car is behind door 3 is 2/3, which in turn means that the probability that the car is behind door 1 is 1 - 2/3 = 1/3. To make the discussion concrete we fixed the position of the car and the contestant's choice, but the calculations are the same for any placement. So hopefully you should now be convinced (but I’m sure some of you are not) that the contestant should always switch doors, since there is a 2/3 chance that the car is behind the door that he/she did not originally pick!

I hasten to add that this of course means that sometimes (in 1/3 of the cases) the contestant will be switching to a losing door!
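
If you prefer to let the computer do the bookkeeping, the same Bayes calculation can be sketched in a few lines of Python with exact fractions. The function names are my own, and the code assumes that Monty picks uniformly between the two goat doors whenever the contestant happens to have picked the car.

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def p_monty_opens(y, x, c):
    """p(Y=y | X=x, C=c): Monty never opens the picked door or the car door.

    Assumption: when two doors are allowed, he picks either with
    probability 1/2.
    """
    allowed = [d for d in DOORS if d != x and d != c]
    return Fraction(1, len(allowed)) if y in allowed else Fraction(0)


def posterior_car(x, y):
    """Return p(C=c | X=x, Y=y) for every door c, using Bayes' rule."""
    prior_c = Fraction(1, 3)  # the car is placed at random
    prior_x = Fraction(1, 3)  # the contestant picks at random
    # Joint probability p(X=x, Y=y, C=c) for each possible car position.
    joint = {c: p_monty_opens(y, x, c) * prior_x * prior_c for c in DOORS}
    evidence = sum(joint.values())  # the denominator p(X=x, Y=y)
    return {c: joint[c] / evidence for c in DOORS}


posterior = posterior_car(x=1, y=2)
for door in DOORS:
    print("p(C=%d | X=1, Y=2) = %s" % (door, posterior[door]))
```

For X=1 and Y=2 the posterior is 1/3 for door 1, 0 for door 2 and 2/3 for door 3, in agreement with the calculation by hand.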
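
For the readers who are still not convinced, a simulation may help. This is a minimal sketch that plays the game many times with and without switching; the function names, the number of rounds and the seed are arbitrary choices of mine.

```python
import random

DOORS = (1, 2, 3)


def play(switch, rng):
    """Play one round and return True if the contestant wins the car."""
    car = rng.randint(1, 3)
    pick = rng.randint(1, 3)
    # Monty opens a goat door that is not the contestant's pick.
    opened = rng.choice([d for d in DOORS if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in DOORS if d != pick and d != opened)
    return pick == car


def win_rate(switch, rounds=100_000, seed=0):
    """Fraction of rounds won when always switching (or always staying)."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds


if __name__ == "__main__":
    print("switching wins: %.3f" % win_rate(switch=True))
    print("staying wins:   %.3f" % win_rate(switch=False))
```

The switching strategy should win in roughly two thirds of the rounds and lose in the remaining third, which is exactly the losing case mentioned above.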


Convert DataArray taken from a DataFrame to an Array / Vector in Julia


julia> DataFrame(CCnt=1:10,Alpha=21:30)
10x2 DataFrame:
CCnt Alpha
[1,] 1 21
[2,] 2 22
[3,] 3 23
[4,] 4 24
[5,] 5 25
[6,] 6 26
[7,] 7 27
[8,] 8 28
[9,] 9 29
[10,] 10 30

julia> samples = DataFrame(CCnt=1:10,Alpha=21:30)
10x2 DataFrame:
CCnt Alpha
[1,] 1 21
[2,] 2 22
[3,] 3 23
[4,] 4 24
[5,] 5 25
[6,] 6 26
[7,] 7 27
[8,] 8 28
[9,] 9 29
[10,] 10 30

julia> samples[:CCnt]
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10

julia> vector(samples[:CCnt])
10-element Array{Int64,1}:
1
2
3
4
5
6
7
8
9
10


Add / Concat / append / rbind row to Julia DataFrame

In Julia you use vcat to add or append or concatenate a row of data to a Julia DataFrame.

Example:

julia> mydf = DataFrame(X=[0:10],Y=[100:110])
11x2 DataFrame:
X Y
[1,] 0 100
[2,] 1 101
[3,] 2 102
[4,] 3 103
[5,] 4 104
[6,] 5 105
[7,] 6 106
[8,] 7 107
[9,] 8 108
[10,] 9 109
[11,] 10 110

julia> mydf = vcat(mydf,DataFrame(X=12,Y=15))
12x2 DataFrame:
X Y
[1,] 0 100
[2,] 1 101
[3,] 2 102
[4,] 3 103
[5,] 4 104
[6,] 5 105
[7,] 6 106
[8,] 7 107
[9,] 8 108
[10,] 9 109
[11,] 10 110
[12,] 12 15


Assign a value in Perl only if a regex matches

Sometimes (especially in one-liners) you want to assign a value only if a corresponding regex (regular expression) that picks out the value matches. That is, once it has matched, you don’t want the value overwritten with undef if the regex fails on a subsequent line of your file.

This can be solved thusly:


$var = $1 if (/Correct (\d+) %/);

The above snippet assigns $var if the regex on the right hand side matches and picks out a value (via the capturing parentheses), and otherwise leaves it unchanged.


Perl one-liner to calculate an average of some value in a bunch of files

A quick and dirty one-liner (depending on the length of your lines ;)) to calculate the average of a value in a bunch of files in a directory structure.

The one-liner below picks out a value from each file whose name matches “Logfile*.txt” in the underlying directory structure.

In the below case, the line was in the form of:

Correctly Classified Instances 37 60.6557 %

or

Correctly Classified Instances 37 60 %

The code traverses the directory structure from the current dir, picks out the “60.6557” from each matching file, sums these values, and then divides the sum by the number of values found.


find . -name "Logfile*.txt" -exec perl -ne '($var) = (/^Correctly.*\s+((\.|\d)+)\s+%/); print "$var\n" if $var;' '{}' \; | xargs perl -e 'use List::Util qw(sum); print(sum(@ARGV)/scalar(@ARGV)); print "\n";'

Note: Not very robust!! But it IS a one-liner! ;)
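
If you need something a bit sturdier than the one-liner, the same traversal and averaging can be sketched in Python. This is only a sketch under the same assumptions about the log format; the function name average_accuracy is my own, and unlike the one-liner it accepts a value of 0 and returns None instead of dividing by zero when nothing matches.

```python
import fnmatch
import os
import re

# Same idea as the Perl regex: the last number before the trailing "%".
PATTERN = re.compile(r"^Correctly.*\s+(\d+(?:\.\d+)?)\s+%")


def average_accuracy(root="."):
    """Average the percentages found in all Logfile*.txt files below root."""
    values = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, "Logfile*.txt"):
            with open(os.path.join(dirpath, name)) as handle:
                for line in handle:
                    match = PATTERN.match(line)
                    if match:
                        values.append(float(match.group(1)))
    if not values:
        return None  # nothing matched, so there is no average to report
    return sum(values) / len(values)


if __name__ == "__main__":
    print(average_accuracy("."))
```

Like the one-liner, this averages over every matching line, so a file that contains several such lines contributes several values.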


Index a DataFrame subset on string column name in Julia


julia> using RDatasets

julia> iris = dataset("datasets", "iris")

julia> iris[iris[:Species] .== "setosa", :]
50x5 DataFrame
|-------|-------------|------------|-------------|------------|----------|
| Row # | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | "setosa" |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | "setosa" |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | "setosa" |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | "setosa" |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | "setosa" |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | "setosa" |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | "setosa" |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | "setosa" |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | "setosa" |
