julia> iris = dataset("datasets", "iris")

julia> X = array(iris[:, 1:4])

150×4 Array{Float64,2}:
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2
 4.6  3.1  1.5  0.2
 5.0  3.6  1.4  0.2
 5.4  3.9  1.7  0.4
 4.6  3.4  1.4  0.3
 5.0  3.4  1.5  0.2
 4.4  2.9  1.4  0.2
 4.9  3.1  1.5  0.1
 5.4  3.7  1.5  0.2
 4.8  3.4  1.6  0.2
 4.8  3.0  1.4  0.1
 4.3  3.0  1.1  0.1
 ⋮
 6.3  3.4  5.6  2.4
 6.4  3.1  5.5  1.8
 6.0  3.0  4.8  1.8
 6.9  3.1  5.4  2.1
 6.7  3.1  5.6  2.4
 6.9  3.1  5.1  2.3
 5.8  2.7  5.1  1.9
 6.8  3.2  5.9  2.3
 6.7  3.3  5.7  2.5
 6.7  3.0  5.2  2.3
 6.3  2.5  5.0  1.9
 6.5  3.0  5.2  2.0
 6.2  3.4  5.4  2.3
 5.9  3.0  5.1  1.8

Tellstick Openhab hack on Github

Enjoy!

Install Raspbian on each of your Raspberry Pis.

Install Java on your Raspberry Pi thus:

`sudo apt-get update && sudo apt-get install oracle-java7-jdk`

Install ssh on your Raspberry Pi thus:

`sudo apt-get install ssh`

Fetch Apache Spark on each of your Raspberry Pis:

`wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1-bin-hadoop2.tgz`

Also install Spark on your master machine, in my case my Macbook Pro. The current version of Spark (1.0.1) wants Spark installed in the same folder on all machines, so I put it in `/usr/local/spark`. To be precise, I copied the unpacked Spark folder structure to /usr/local and then made a symbolic link to this folder called "spark".

This is done on all Workers (Raspberry Pi’s) and the Master Machine (Macbook Pro):

`sudo mv spark-1.0.1-bin-hadoop2 /usr/local/`

`cd /usr/local`

`ln -s spark-1.0.1-bin-hadoop2 spark`

I then tell the master machine which IP to export so the workers can connect to it: `export SPARK_MASTER_IP=10.0.1.10`

I can now run the "Master Start" script on the master (Macbook Pro):

`./sbin/start-master.sh`

and then start the workers on each of the Raspberry Pi’s:

`./bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.0.1.10:7077`

I can now submit jobs (for instance to calculate PI using Java) to the cluster on my master machine by:

`./bin/spark-submit --master spark://10.0.1.10:7077 --class org.apache.spark.examples.JavaSparkPi lib/spark-examples-1.0.1-hadoop2.2.0.jar`

Or using the Python Spark version:

`./bin/spark-submit --master spark://10.0.1.10:7077 examples/src/main/python/pi.py 10`

I can surf to `http://localhost:8080` to monitor my cluster.

We can now calculate pi:

"Pi is roughly 3.131820"

and it only takes 116.324641 seconds. Now that is progress!

We model the problem in the following probabilistic way:

The random variable C can take on the values C=1, C=2, or C=3, meaning, respectively, that the car is behind door 1, door 2, or door 3.

The random variable X can take on the values X=1, X=2, or X=3, meaning, respectively, that the contestant picked door 1, door 2, or door 3.

The random variable Y can take on the values Y=1, Y=2, or Y=3, meaning, respectively, that Monty opened door 1, door 2, or door 3.

Let's now assume that the car is behind door 3 and that the contestant (of course not knowing the car is behind door 3) randomly picks door 1. This leaves Monty no choice but to open door 2 (according to the image above, courtesy of Wikipedia). Now the question is: should the contestant switch to door 3? To solve the Monty Hall problem we would like to calculate the probability that the car is behind door 1 and door 3 respectively, **given that Monty opened door 2!**

Let's start by calculating the probability that the car is behind door three. If we know this probability then, by the laws of probability, we also know the probability that the car is behind door one, since it is just 1 minus the probability that the car is behind door three (Monty has opened door two, so we know the car is not there).

With the above information we can formulate the above problem mathematically according to the following formula:

$$p(C=3|X=1,Y=2)$$ We call this equation Eq 1.

Which means:

(What is) the probability that the car is behind door three (C=3) given that the contestant picked door one (X=1) and that Monty opened door two (Y=2)?

Eq 1 can be re-expressed by Bayes rule as:

$$p(C=3|X=1,Y=2) = \frac{p(X=1,Y=2|C=3)p(C=3)}{p(X=1,Y=2)}$$

We call this equation Eq 2. Which should be read as:

the probability that the car is behind door 3 given that the contestant picked door 1 and that Monty opened door 2,

is the same as the likelihood that the contestant picked door 1 and Monty opened door 2 given that the car was behind door 3, times the prior probability that the car was behind door 3, all of this divided by the marginal probability that the contestant picked door 1 and Monty opened door 2.

This mathematical equivalence was proven by the Reverend Thomas Bayes and later also by Laplace.

Eq 2 has three components on the right side of the equation, let’s look at them separately:

$p(X=1,Y=2|C=3)$ This is the part that says:

the probability that the contestant picked door 1 and Monty opened door 2 given that the car was behind door 3

What is this likelihood? Well, if the car is behind door three (C=3), which was given (we know this, but the contestant does not), Monty will definitely not open that door (the contestant would obviously switch to it). Monty will also not open the door that the contestant picked, so Y is in this case completely controlled by which door the contestant picks (the variables are not independent, i.e., they are dependent). If the contestant picks door 1, Monty will definitely open door 2, since he knows (it was given in the setup of the example) that the car is behind door 3. So what is the probability that the contestant picks door 1 given that we, but not the contestant, know that the car is behind door 3? Well, the contestant does not know anything, so he/she just picks one door at random; picking door 1 thus has a one in three chance (1/3). So $p(X=1,Y=2|C=3) = 1/3$.

The equation:

$$p(X=1,Y=2|C=3)$$

can further be expanded via the rules of probability as follows:

$$p(X=1,Y=2|C=3) = p(Y=2|C=3,X=1) * p(X=1)$$

Which is a pure mathematical fact. This reformulation may actually make the case clearer. In words it reads:

the probability that Monty opens door 2 given that the contestant picked door 1 and that the car is behind door 3, times the prior probability that the contestant picks door 1

And what is this probability? Well if the contestant picked door 1 and the car is behind door 3, Monty has no choice but to open door 2, so this probability is one (1). The prior probability that the contestant picks door 1 is again one in three (1/3). This is another way to show that part one of the right hand side of Eq 2 equals 1/3.

$$p(X=1,Y=2|C=3) = p(Y=2|C=3,X=1) * p(X=1) = 1 * 1/3 = 1/3$$

Let's then look at the next part of Eq 2, $p(C=3)$. What is this probability? Well, we must assume that the game show randomly picks a position for the car, so this probability is one in three (1/3).

$$p(C=3) = 1/3$$

Now there is only one part of Eq 2 left and that is the denominator:

$$p(X=1,Y=2)$$

Which is perhaps a bit tricky to think about. This says:

The probability that the contestant picks door 1 and that Monty opens door 2,

irrespective of where the car is! That is, in this part it is *not* given that the car is behind door 3!

What is this probability then? Well, the contestant has a one in three chance of picking door 1. Given that pick, Monty's choice depends on where the car is: if the car is behind door 1 he opens door 2 or door 3 with equal probability (1/2), if it is behind door 2 he must open door 3, and if it is behind door 3 he must open door 2. Averaging over the three equally likely car positions, the probability that Monty opens door 2 is $1/2 \cdot 1/3 + 0 \cdot 1/3 + 1 \cdot 1/3 = 1/2$. This gives $1/3 * 1/2 = 1/6$. This part can also be further expanded by the rules of probability as:

$$p(X=1,Y=2) = p(Y=2|X=1)p(X=1)$$

Where $p(Y=2|X=1) = 1/2$ (as computed above) and $p(X=1)$ is one in three (1/3) as usual, which gives $1/2 * 1/3 = 1/6$. So now we have all the parts we need to calculate our final answer: the probability that the car is behind door 3, given that the contestant picked door 1 and Monty opened door 2.
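For completeness, the 1/2 in the denominator is exactly what the law of total probability gives when we sum over the three equally likely positions of the car (assuming Monty chooses uniformly whenever he has a choice of doors):

$$p(Y=2|X=1) = \sum_{c=1}^{3} p(Y=2|X=1,C=c)\,p(C=c) = \frac{1}{2}\cdot\frac{1}{3} + 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3} = \frac{1}{2}$$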

$$p(X=1,Y=2|C=3) = 1/3$$

$$p(C=3) = 1/3$$

$$p(X=1,Y=2) = 1/6$$

Which means that:

$$p(C=3|X=1,Y=2) = \frac{p(X=1,Y=2|C=3)p(C=3)}{p(X=1,Y=2)} = \frac{1/3 * 1/3}{1/6} = \frac{1/9}{1/6} = \frac{6}{9} = \frac{2}{3}$$

So the probability that the car is behind door 3 is 2/3, which in turn means that the probability that the car is behind door 1 is $1 - 2/3 = 1/3$. For concreteness we picked particular doors for the car and the contestant's choice, but the calculation is the same for any placement, so hopefully you are now convinced (though I'm sure some of you are not) that the contestant should always switch doors: there is a 2/3 chance that the car is behind the door that he/she did not originally pick!

I hasten to add that, of course, this means that sometimes (in 1/3 of the cases) the contestant will be switching to a losing door!
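As a sanity check on the 2/3 result, here is a small Monte Carlo sketch (in Python, not part of the original derivation; the function names `play_round` and `win_rate` are mine) that simulates many rounds with a host who always opens a goat door the contestant did not pick:

```python
import random

def play_round(switch, rng):
    """Simulate one round of the game; return True if the contestant wins the car."""
    doors = range(3)
    car = rng.randrange(3)    # door hiding the car
    pick = rng.randrange(3)   # contestant's initial (uniform) pick
    # Monty opens a door that hides a goat and was not picked;
    # when he has a choice (pick == car) he chooses uniformly.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the single remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, n=100_000, seed=7):
    """Fraction of n simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play_round(switch, rng) for _ in range(n)) / n
```

With enough rounds, `win_rate(True)` hovers near 2/3 and `win_rate(False)` near 1/3, matching the Bayes calculation above.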


julia> DataFrame(CCnt=1:10,Alpha=21:30)
10x2 DataFrame:
        CCnt  Alpha
[1,]       1     21
[2,]       2     22
[3,]       3     23
[4,]       4     24
[5,]       5     25
[6,]       6     26
[7,]       7     27
[8,]       8     28
[9,]       9     29
[10,]     10     30
julia> samples = DataFrame(CCnt=1:10,Alpha=21:30)
10x2 DataFrame:
        CCnt  Alpha
[1,]       1     21
[2,]       2     22
[3,]       3     23
[4,]       4     24
[5,]       5     25
[6,]       6     26
[7,]       7     27
[8,]       8     28
[9,]       9     29
[10,]     10     30

julia> samples[:CCnt]
10-element DataArray{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

julia> vector(samples[:CCnt])
10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

Example:

julia> mydf = DataFrame(X=[0:10],Y=[100:110])
11x2 DataFrame:
         X    Y
[1,]     0  100
[2,]     1  101
[3,]     2  102
[4,]     3  103
[5,]     4  104
[6,]     5  105
[7,]     6  106
[8,]     7  107
[9,]     8  108
[10,]    9  109
[11,]   10  110


julia> mydf = vcat(mydf,DataFrame(X=12,Y=15))
12x2 DataFrame:
         X    Y
[1,]     0  100
[2,]     1  101
[3,]     2  102
[4,]     3  103
[5,]     4  104
[6,]     5  105
[7,]     6  106
[8,]     7  107
[9,]     8  108
[10,]    9  109
[11,]   10  110
[12,]   12   15

This can be solved thusly:

$var = $1 if (/Correct (\d+) %/);

The above snippet will assign $var if the regex on the right-hand side matches (picking out the value via the capturing parentheses) and will otherwise leave it unchanged.

The one-liner below picks out a value from each file matching the name "Logfile*.txt" in the underlying directory structure.

In the below case, the line was in the form of:

Correctly Classified Instances 37 60.6557 %

or

Correctly Classified Instances 37 60 %

The code traverses the directory structure from the current dir, picks out the "60.6557", sums those values over all matching files, and then divides by the number of files that matched.

find . -name "Logfile*.txt" -exec perl -ne '($var) = (/^Correctly.*\s+((\.|\d)+)\s+%/); print "$var\n" if $var;' '{}' \; | xargs perl -e 'use List::Util qw(sum); print(sum(@ARGV)/scalar(@ARGV)); print "\n";'

OBS: Not very robust!! But it IS a one-liner!


julia> using RDatasets
julia> iris = dataset("datasets", "iris")

`julia> iris[iris[:Species] .== "setosa", :]`

50x5 DataFrame

| Row # | SepalLength | SepalWidth | PetalLength | PetalWidth | Species  |
|-------|-------------|------------|-------------|------------|----------|
| 1     | 5.1         | 3.5        | 1.4         | 0.2        | "setosa" |
| 2     | 4.9         | 3.0        | 1.4         | 0.2        | "setosa" |
| 3     | 4.7         | 3.2        | 1.3         | 0.2        | "setosa" |
| 4     | 4.6         | 3.1        | 1.5         | 0.2        | "setosa" |
| 5     | 5.0         | 3.6        | 1.4         | 0.2        | "setosa" |
| 6     | 5.4         | 3.9        | 1.7         | 0.4        | "setosa" |
| 7     | 4.6         | 3.4        | 1.4         | 0.3        | "setosa" |
| 8     | 5.0         | 3.4        | 1.5         | 0.2        | "setosa" |
| 9     | 4.4         | 2.9        | 1.4         | 0.2        | "setosa" |