Forty-Two, and now?: May 2013

Friday, May 31, 2013

Reasons behind the proposed puzzle and its solution

Hello, I am referring to the previous posts about the puzzle I proposed, along with some possible algorithmic solution.

I also drafted a proposal on how to split the problem across multiple concurrent processes.

At this point maybe I should write a few words on why we would want to do this and when it could become useful.
There are different paradigms that allow to split a problem into multiple sub problems with the purpose of leveraging on multiple available processors.

The Seti@home project efficiently does exactly this.

In my view we can classify these kind of problems in two main categories :
1) Problems that can be "hashed" into subsets (or "buckets") of data
2) Problems that cannot be immediately hashed since the state f(n+1) depends on f(n), therefore I could not ask a process to compute f(n+1) unless it already computed f(n)

Let's take some simple examples :

The equation that allows us to represent a line can be expressed as y = a +bx
This means that we can calculate directly y(x) for any value of x (being x a real number), so such problem falls easily into my category 1)
However, when we split the workload on different processors, we normally consider finite increments (this is common in real life problems), so that the variable x is a Natural Number (an integer)

Some other problems, which are common when we run simulations, require an iterative approach.
By that I mean that in order to calculate a function (or better a series) y(x) we need to apply a set of rules that take into account previous values of the serie itself.
A typical, well known example, could be the Fibonacci number series.
These fall in my category 2) and may be important in computing simulated scenarios.

Still there is a distinction to be made within category 2 :
Series like the Fibonacci one have one possible values at each i step, such as f(i) is defined and has a single solution.
When exploring simulated scenarios, normally f(i) presents several possible solutions, as visible in the description of the proposed puzzle.
While this is indeed making the problem harder to tackle, it also opens (eventually) up to parallel processing.
In fact, since f(i) has multiple values, then f(i+1) is also going to have multiple values (in general more than f(i) unless f is converging) meaning that each value f(i) can be considered the start of a new problem and then individually assigned to a separate processor.

What could be a practical example of a problem benefiting from this approach?

I am thinking about models used to simulate a pandemic spread, or the evolution of a meteorological event. Any model that aims to test all the possible evolution of something which is not directly predictable using a mathematical function.

I don't think I invented this approach, I am sure there is a lot of great work done already, I am just curious to try it myself

Saturday, May 25, 2013

Towel Day 2013

Any Vogon spaceship passing by lately?

... seriously, we are waiting here!

Friday, May 24, 2013

Puzzle, moving the solution to parallel

The puzzle problem illustrated here and for which I proposed a simple algorithm here, can probably be assigned to multiple CPUs at the same time in a way that the solution is found in parallel.

In my original post I already explained that the solution search process can be "initialized" using the "step" array which encodes the list of "moves" executed in the first "n" steps.

This means that the problem can be visualized as follows :

Each level of this tree represents one step.
So, for step 1 (after root) there are 8 possible positions to be tested, each one of which has 8 possible destinations in step 2 etc.

"Slicing"the problem means, at any level, to restrict the search from move n>=1 to move m <=8.
Then slices of the problem may be distributed to different CPUs to be computed at the same time.

If we try to implement this concept in the example algorithm, we need to account for three things :

1) The initial part of the solution, consisting of a step array from 1 to i that locates the beginning of the search for the given process
2) The initial move to be tested at step i+1
3) The final move to be tested at step i+1

Technically this could be implemented by making step a 2 dimensional array initially set as :

{{0,8},{0.8},{0,8}....} being 0 the last move attempted (none) and 8 the last to be attempted for that particular step.

When initializing the solution for a giving CPU, by transmitting the step array, we could use the last element in the array as the boundary of the solution to be searched.

Let's imagine a process P1 started to attempt a solution for the (5x5) puzzle and after 3 steps, it decides to call for reinforcements, asking a second cpu to take over part of the problem.

In our example the current situation is represented by the image below

this also means that the step array contains the following data :

Where the Last Move is the move that allowed us to move from A1 to C2 (Step 1), then from C2 to E1 (Step 2) etc according to the numbering visible below.

We know that the Max Move is 8 because the current process (P1) is asked to investigate the FULL solution (so, from 1 to 8 for every step).
At this point P1 wants to ask a new process P2 to address part of the problem starting from step 4.
Let's imagine that P1 decides to split the work from step 4 onward into two sets of moves S1= [1..4] and S2=[5..8].
It will run itself S1 and ask P2 to take care of S2.

P1 needs to prepare the initial step array for P2, basically telling it which part of the problem has already been taken care of.
The first 3 steps will need to be "locked", so the Max Move will be set equal to Last Move, plus the additional new step 4 will have Last Move = 4 (this si the last move defined in S1) and Max Move = 8

-> Step(P2) = {{3,3},{2,2},{5,5},{4,8}}

At the same time P1 needs to set a constraining Max Move for step 4

-> Step(P1) = {{3,8},{2,8},{5,8},{0,4}}

The example used only two different processes, but the concept stands for any number of processes that could be engaged at any step.

Thursday, May 9, 2013

Puzzle algorithm example

For those that have seen my previous post and are eventually interested to see a simple example of the described algorithm, here you can find a bare-bone implementation in Java.
You can configure the size of the square by setting the values of sizeX and sizeY.
Be warned that solutions for squares up to 7x7 come out pretty quickly on an i5 PC, from 8x8 on the magnitude of the problem requires LONG times of execution before the first solution is found.

This is an example of the solution found for a 5x5 problem :

As it is designed now, the algorithm does not implement any kind of optimization and it stops when it finds the first solution (but it can be easily modified to find ALL the solutions) or when it runs out of options.

The reason why I am proposing this java implementation is because it is easily readable and implements a basic execution logic which allows me to easily experiment trying to add some parallelism.

The final solution is not intended to be in Java

public class LatinPuzzle {

private int stepCounter;
private int step[];
private int cell[][];
static final int moves[][] =
{{1,-2},{2,-1},{2,1},{1,2},
{-1,2},{-2,1},{-2,-1},{-1,-2}};
static final int sizeX = 6;
static final int sizeY = 6;
private int pSize= sizeX * sizeY;
// debugging/control info
private long needUpdate =0;
private int low = 100;
private int high=0;


void init()
{
step = new int[pSize];
cell = new int[pSize][2];
stepCounter = 0;
cell[0][0]=1;
cell[0][1]=1;
step[0]=0;
}

private boolean isValid(int x,int y)
{
if ((x<1)||(y<1)||(x>sizeX)||(y>sizeY))
  return false;
int scan =0;
while (scan<stepCounter)
{
  if ((cell[scan][0]==x)&&(cell[scan][1]==y))
return false;
  scan++;
}
return true;
}

private void move(int n)
{
step[stepCounter]=n;
if (n>=8) return;
int x = cell[stepCounter][0]+moves[n][0];
int y = cell[stepCounter][1]+moves[n][1];
if (isValid(x,y))
{
stepCounter++;
cell[stepCounter][0] = x;
cell[stepCounter][1] = y;
}
}

private boolean backOne()
{
step[stepCounter--]=0;
if (stepCounter<1) return false; else return true;
}

public void tryPuzzle()
{
boolean running = true;
while ((stepCounter<(pSize-1))&&(running))
{
if (step[stepCounter]>=8) running = backOne();
else
move(step[stepCounter]+1);
// debugging/control start
if (stepCounter>high) high = stepCounter;
if (stepCounter<low) low = stepCounter;

if (needUpdate++ % 10000000 == 0)
{
System.out.println(""+needUpdate+","+stepCounter+" "+low+"-"+high);
low = pSize;
high = 0;
}
// debugging/control end

}
// final output
if (stepCounter>=(pSize-1))
{
System.out.println(""+needUpdate+", Success !! "+stepCounter+".");
for (int i=0;i<stepCounter;i++)
System.out.print(""+step[i]+".");
System.out.println();
for (int i=0;i<=stepCounter;i++)
System.out.print(""+cell[i][0]+","+cell[i][1]+" ");
} else
  System.out.println(""+needUpdate+",failed "+stepCounter+".");
}

public static void main(String[] args)
{
LatinPuzzle lp = new LatinPuzzle();
lp.init();
lp.tryPuzzle();
}
}

Wednesday, May 8, 2013

BeagleBone Black - Unboxing

I just received my Beaglebone Black.
It looks cool, it's black...

Besides the color it looks like the previous (white) version, but you can immediately spot the presence of an HDMI connector.
This is probably a nice addition, being able, out of the box, to connect to an HDMI monitor, not sure I will use it since up to now I used my bone as a remote terminal only.
But still I think that keyboard + mouse + monitor might eventually get you out of the problem in the case you mess up with your network configuration.

The box it comes in is (I got mine from Adafruit, because I think LadyAda is really cool) a neat pretty soft box which also contains a MINI usb cable.

Checking the side of the board, you can see the micro HDMI connector (close to the SD slot, below the pcb) with the standard usb and mini SD.

No SD is included because the black has an internal flash that can store the SO, the SD connector is still available so you can use another SD for storage.

I think this is a neat feature because, besides lowering the total cost (no need for the SD card) it also allows you to store your stuff separately from the OS which in the end reduces the chances you mess up the OS filesystem (that happened to me, was logging data on the SD of the bone, power went off... and the file system was broken, had to recreate the SD.

The other side remains pretty much the same as the previous version : 5V connector, a mini usb and ethernet.

The main differences with the white bone are :
1) Price is about half (45$ vs 89$)
2) Added HDMI
3) Added internal flash 2Gb (with performance boost x8 vs x4 bits)
4) Memory was 256MB DDR2, now 512MB DDR3 (!!)
5) CPU (AM3359) clock was 720MHz, now 1GHz ( thanks to the Sitara XAM3359AZCZ100)

Also, the white bone CPU was scaling down to 500MHz when powered via USB instead of the 5V connector, apparently (from the specs) this is not happening for the black, which should stay at 1000MHz.

I will start soon playing with this new toy, which apparently comes configured to start X immediately (not the way I like it, but it is normally easy to set it up).

One thing that really triggers my curiosity is that it is supposed to support StarterWare, a set of free TI libraries (kudos to TI, you manufactured my first computer -TI99/4A- 20+ years ago and still surprise me every month. Count me in your fan club :) ) supposed to give you API level access to the hardware without the need of an OS.

Will keep you posted

Tuesday, May 7, 2013

Latin, Puzzles and Parallel Computing

First of all I have to apologize for being absent for so long, I was actually busy with some new interests, which obviously are now generating the urge to post about them.

No, this is not about Latin, sorry for those that arrived here with some wicked Google search about some dead language.

In a galaxy far, far away and long time ago, I attended a "scientific high school" (Liceo Scientifico), that happened in Milano, Italy.
When you attend that kind of school, you also have to study Latin... yeah, I know...

As you can easily imagine, me and most of my friends did not like the subject as we believed that it had nothing to do with science, which actually was our main interest.
Latin for us was good to come out of situations where you did not have a real answer, in the end it did not make any sense, but it sounded pretty damn cool.
Our teacher said it helped us to exercise our brain and would have provided us a good forma mentis, I honestly thought that if she had to use Latin to bullshit us, she was actually proving our point.
Still, we had no other choice than suck it up for five long years.
We also had to study religion and the teacher tried to convince us that exorcism is a true thing, but I was too busy playing fetch across the rainbow with my unicorn to pay attention to that.

The point is that in order to survive the immense boredom generated by those studies, we started playing puzzles during the lectures.
Our preferred one was this one :

The rules are quite simple : in a 10x10 square, starting top left, you move from cell to cell using the chess horse move and you record the progressive number of your move.

You cannot go outside the square and cannot enter twice in the same cell.

Winning condition is to fill in all the 100 numbers.

It is NOT easy, but eventually one of us solved the puzzle after about 2 or 3 years of Latin.

If you think this is a boring activity, you probably never studied Latin irregular verbs.

Unfortunately, the guy who solved the puzzle, failed Latin at the same time... but hey, it was TOTALLY worth it.

Still, after a while -maybe 1 month- it became "not so interesting" to me and I thought it would have been more interesting to write an algorithm to solve it.
The first prototype was written in basic on a Sinclair QL, it never worked tho because I did not know the QL at all, and did not own one, I just did a few tests on the computer of a schoolmate brother.
I was 15 at that time and my (self taught) programming skills were still unfortunately limited to TI basic and some Z80 assembly.

Nonetheless that exercise helped me to formulate an algorithmic solution based on the following ideas (trying to remember it) :

1) "Moves" can be encoded in a way that, starting from a position X, the next position can be one of the 8 shown below

This means that we can create a function move(n) [with n between 1 and 8] that returns a 2 element array with dX and dY being the increments on the X and Y axis,

2) The new coordinates are calculated and an isValid(x,y) function returns true if the the move lands in a valid location.
Non valid locations are those outside the square (X<1 , X>10 , Y<1, Y>10) or those already occupied by previous steps

3) An array step[100] of integers will record at each step the move that brought me there and a second array cell[100][2] will store the X,Y coordinates of the positions

4) a stepCounter keeps track of current step

5) I verify if I am in a winning position (stepCounter == 100)

Logic :

When I start I immediately try the move(1), if isValid returns false, I try the move 2, until I reach the move 8.
In order to check if a cell is taken, I simply scan the cell[] array (there are some possible optimizations here, but the concept stands).
When I reach move 8 and still cannot place the move, I need to decrement stepCounter and try the next available move step[--stepCounter]

This is clearly a brute force algorithm, not particularly smart, but the beauty is that it can be easily generalized to solve different puzzles, I successfully used it to solve "find the exit of the maze" puzzles too.

Unfortunately the first working version of it came some time later (still before my friend did solve the puzzle manually tho :) ) running on a Ast Bravo 286 , which was my first MS-DOS PC running a 16 bit processor at an incredible speed of 10MHz.
A better version was coded in Turbo Pascal 3, later on improved with TP5.5 and finally refined with BP7 including some parts in assembly.

I also implemented part of it in Scheme Lisp, using a functional approach with recursive calls instead of the imperative style of the first versions, however I came out with the impression that these kind of problems should be solved with low level languages since code execution speed matters a lot.

By the time I had the BP7 version of the code, I had a second computer, a 486 DX 50MHz which proved to be way faster than my old 286, but the idea was : can I have both two computers work at the same time, on the SAME solution?

I did not have a network connecting the two PCs, just a serial cable, and fantasized on how to have them working in parallel exchanging information to balance dynamically the load.
I implemented instead a simpler approach : the BP7 version was able to save it's state to disk and eventually resume the execution.
If you think, saving the state is pretty easy : you just need the step[] array, anything else can be reconstructed.
This also meant that feeding a pre-filled step array to the program would make it take care of a PORTION of the executions.
I know it was not much, but it was a first step (my first step) into parallelism.
After all Latin was useful for me as I would have never faced this challenge without the boredom that "Rosa, Rosae, Rosae..." can generate, plus, as a bonus, now I can say basic profanities in the same language that the Pope speaks!

Recently I have been fiddling around parallel computing (for data mining and similar stuff), played a bit with map-reduce and some basic implementations in Python.

Map reduce is a great solution for some problems, provided you can pre- partition the data in a way that a master node sends packages of data to mapper (and finally reducer) client processes.
The issue I see is that you cannot (I believe) use it for a generic calculation where the step n depends on the steps [0..n-1], situation that can be easily represented by the puzzle I illustrated.

Now, I have two questions in my mind :
1) Do we have real problems to be solved that can be eventually attacked with an approach similar to a generalized version of my puzzle algorithm?
2) Which kind of balancing mechanism can be implemented to dynamically manage executions in parallel?

I have hard time figuring out point 1 (damn, I should have payed more attention to those Latin lectures, I am sure they would have been useful here!!), but I have some ideas for point 2.

The idea is :
what about experimenting with real hardware devices without an OS on them, extremely flexible in a way that we could eventually implement a low level communication protocol between them... and extremely cheap?
Are you thinking what I am thinking?
Cheap microcontrollers scaling up to ARM Sitara microprocessors (MSP430G2 up to BeagleBone Black with a Sitara @1GHz: 45$ )
For the actual balancing mechanism, I think it should be possible to work with a topology with no master and having peer clients exchanging directly information (maybe using SPI), I have some basic ideas, but I need to run some simulations first.

OK, this post is too long now, so I will follow up maybe next week in another post, meanwhile feel free to send your ideas, suggestions...

... and I will refrain from closing with a Latin quote.

[update : Example algorithm (in Java) here]

Search Microcontrollers