Traversing INT array in two ways

This is a fun little robotics program (in C) that traverses an int array in two directions.
I have an array of positions like this: int pos[] = {0, 45, 90, 135, 180, 135, 90, 45};
These positions are used to move a servo motor.
   45  90  135
    \  |  /
     \ | /
      \|/
0 ----------- 180
In the main loop() I check the distance to an obstacle, and if it is < xx cm my servo must rotate to the next step (the next array position) until it finds a free way (> xx cm).
My main is easy:
int main (int argc, const char * argv[]) { for (;;) find(); }
and my core function (find) is this:
void find() {
for ( i=0; i<sizeof(pos); i++ ) // Traversing position array
{
distance = rand() % 7; // Simulate obstacle distance
move( pos[i] ); // Simulate movements
if (i==sizeof(pos)) { i=1; } // Try to reset the "i" counter. PROBLEM!
if ( distance<=5 ) continue; // Is there an obstacle?
sleep(2); // Debug sleep
find(); // Similar recursion
}
}
I don't know what is wrong with this code, but I need the servo to keep moving until there is no obstacle.
Example:
At position 90 I find an obstacle. I want to loop over the array from left to right and vice versa, checking the distance at every step. If I don't find a free way, print("ko"), else print("ok").
How do I fix this code to work correctly?

You really want i < sizeof(pos) / sizeof(*pos) rather than i < sizeof(pos). Applied to an array, sizeof yields the total number of bytes it occupies in memory, not the number of its elements.
sizeof(pos) yields 8 * sizeof(int). If an int is 4 bytes, you are looping 32 times instead of 8 and reading past the end of the array.
Also, i == sizeof(pos) will never be true in the body of the loop because the condition of the for statement limits i to sizeof(pos) - 1.
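The element-count idiom described above can be sketched as follows. The macro name ARRAY_LEN and the helper angle_at are illustrative names, not part of the original code, and the macro only works on a real array that is in scope, not on a pointer passed to a function.

```c
#include <stddef.h>

/* Element count of a true array: total bytes divided by bytes per element. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const int pos[] = {0, 45, 90, 135, 180, 135, 90, 45};

/* Hypothetical helper: angle for step i, wrapping around at the end. */
int angle_at(size_t i)
{
    return pos[i % ARRAY_LEN(pos)];
}
```

With this, a loop bound of i < ARRAY_LEN(pos) visits exactly the 8 elements, and angle_at(8) wraps back to the first angle instead of reading past the array.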

If I understand your question correctly, you want the servo to make a sweeping movement from left to right and then back from right to left, measuring at each angle the distance to any object in front of the robot. If there is a free way ahead of the robot, the find function returns.
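int pos[] = {0, 45, 90, 135, 180, -1};   /* -1 is the end sentinel */
void find()
{
    int i = 0;
    int direction = 1;
    do {
        move(pos[i]);
        i += direction;
        /* Test pos[i+1], not pos[i+direction]: the latter reads pos[-1] at i == 0. */
        if (pos[i+1] == -1) direction = -1;   /* reached the right end */
        if (i==0) direction = 1;              /* reached the left end */
    } while(measure_distance() <= 5);
}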
Instead of recursion, there is a do-while loop that only exits when the measured distance is greater than 5.
The 'pos' array has a sentinel at the end (-1). This is an invalid angle and can be used to find the end of the array. There is no need to calculate the number of elements.
The left-right, right-left movement comes from using the 'direction' variable. It is rather easy to detect either the beginning (i==0) or the end of the 'pos' array (pos[i+1] == -1), at which point we reverse the direction.
There is also no need to repeat the angles after 180 degrees. The sequence we get is:
0 45 90 135 180 135 90 45 0 45 90 ...
We can even reduce the code with one line...
...
if (pos[i+1] == -1 || i == 0) direction *= -1;
...
Cheers,
Johan
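The question also asks for "ko" when no free way exists, which an endless sweep never reports. Below is a minimal sketch of a bounded sweep, under assumptions that are not in the original posts: find_free_angle, the measure callback (which is assumed to move the servo and return a distance in cm), the threshold FREE_DISTANCE_CM and the two simulated sensors are all hypothetical names.

```c
#define FREE_DISTANCE_CM 5   /* assumed threshold, the question's "xx" */

static const int pos[] = {0, 45, 90, 135, 180};
#define POS_LEN ((int)(sizeof(pos) / sizeof(pos[0])))

/*
 * Sweep right from index `start`, then back to the left end.
 * Returns the first angle whose measured distance is free,
 * or -1 if every angle is blocked (the "ko" case).
 */
int find_free_angle(int start, int (*measure)(int angle))
{
    int i = start;
    int direction = 1;
    /* 2 * POS_LEN steps are enough to visit every position at least once. */
    for (int step = 0; step < 2 * POS_LEN; step++) {
        if (measure(pos[i]) > FREE_DISTANCE_CM)
            return pos[i];
        if (i == POS_LEN - 1) direction = -1;   /* right end: turn back */
        else if (i == 0)      direction = 1;    /* left end: go right */
        i += direction;
    }
    return -1;
}

/* Simulated sensors for trying the function without hardware. */
int all_blocked(int angle) { (void)angle; return 3; }
int free_at_45(int angle)  { return angle == 45 ? 20 : 3; }
```

A caller would print "ok" when the result is not -1 and "ko" otherwise; for example, find_free_angle(2, free_at_45) starts at 90, sweeps to 180 and back, and stops at 45.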

Don't forget to seed the random number generator (once, at program start) using
/* initialize random seed: */
srand ( time(NULL) );
distance = rand() % 7;

It's probably good practise to say:
#define POSLENGTH 8
and then iterate using i<POSLENGTH: as others have pointed out, sizeof(pos) gives the array's size in bytes, not its number of elements, so it will not work as a loop bound.
Also, arrays in C are 0 based: elements go 0,1,2,3,...n-1.
So, you need to say:
if (i==POSLENGTH-1) i=0;

Try using a while loop instead of a for loop. Increment the value when there is no obstacle and break when you find an obstruction:
{
......
......
i = rand()%7;
move( pos[i]);
if (i<5)
break;
else
continue;
.......
.......
}
This will randomly choose the position until you get an obstacle and also there will be no need to reset it as the loop will automatically break on encountering an obstacle.

Related

implementing object detection like openCV

I'm trying to implement the Viola-Jones algorithm for object detection using Haar cascades (like openCV's implementation) in C, to detect faces. I am writing the C code in a Vivado HLS compatible way, so I can port the implementation to an FPGA. My main goal is to learn as much as possible, rather than just getting it to work. I would also appreciate any help with improving my question.
I basically started reading G. Bradski's Learning openCV, watched some online tutorials and got started writing the code. Sure enough, it's not detecting faces and I don't know why. At this point I care more about understanding my mistakes than about being able to detect faces.
My Implementation Steps
I'm not sure how much detail is appropriate, but to keep it short:
Extracting Haar cascade data from haarcascade_frontalface_default.xml to C readable structures (huge arrays)
Writing a function to create an integral image of any given 8bit greyscale image of size 24x24 (same size as listed in the cascade)
Applying knowledge from this great post to make the necessary calculations
My Testing Scheme
Implementing a python script to detect faces using the openCV library with the same Haar cascade as mentioned above to create golden data, a detected face is cut out (ensuring 24x24 size) from the image and stored.
Stored images are converted to one dimensional C arrays, containing pixel values row-wise: img = {row0col0, row0col1, row1col0, row1col1, ... }
integral image is calculated and face detection applied
Result
Faces pass only 6 of the 25 stages of the Haar cascade and are therefore not detected by my implementation, although I know they should have been detected, since the python script with openCV and the same Haar cascade did detect them.
My Code
/*
* This is detectFace.c
*/
#include <stdio.h>
#include "detectFace.h"
// define constants based on Haar cascade in use
// Each feature is made of max 3 rects
//#define FEAT_NO 1 // max no. of features (= 2912 for face_default.xml)
#define RECTS_IN_FEAT 3 // max no. of rect's per feature
//#define INTS_IN_RECT 5 // no. of int's needed to describe a rect
// each node has one feature (bijective relation) and three doubles
#define STAGE_NO 25 // no. of stages
#define NODE_NO 211 // no of nodes per stage, corresponds to FEAT_NO since each Node has always one feature in haarcascade_frontalface_default.xml
//#define ELMNT_IN_NODE 3 // no. of doubles needed to describe a node
// constants for frame size
#define WIN_WIDTH 24 // width = height =24
//int detectFace(int features[FEAT_NO][RECTS_IN_FEAT][INTS_IN_RECT], double stages[STAGE_NO][NODE_NO][ELMNT_IN_NODE], double stageThresh[STAGE_NO], int ii[24][24]){
int detectFace(
int ii[576],
int stageNum,
int stageOrga[25],
float stageThresholds[25],
float nodes[8739],
int featOrga[2913],
int rectangles[6383][5])
{
int passedStages = 0; // number of stages passed in this run
int faceDetected = 0; // turns to 1 if face is detected and to 0 if its not detected
// Debug:
int nodesUsed = 0; // number of floats out of nodes[] processed, use to skip to the unprocessed floats
int rectsUsed = 0; // number of rects processed
int droppedInStage0 = 0;
// loop through all stages
int i;
detectFace_label1:
for (i = 0; i < STAGE_NO; i++)
{
double tmp = 0.0; //variable to accumulate node-values, to then compare to stage threshold
int nodeNum = stageOrga[i]; // get number of nodes for this stage from stageOrga using stage index i
// loop through nodes inside each stage
// NOTE: it is assumed that each node maps to one corresponding feature. Ex: node[0] has feat[0] and node[1] has feat[1]
// because this is how it is written in the haarcascade_frontalface_default.xml
int j;
detectFace_label0:
for (j = 0; j < NODE_NO; j++)
{
// a node is defined by 3 values:
double nodeThresh = nodes[nodesUsed]; // the first value is the node threshold
double lValue = nodes[nodesUsed + 1]; // the second value is the left value
double rValue = nodes[nodesUsed + 2]; // the third value is the right value
int sum = 0; // contains the weighted value of rectangles in one Haar feature
// loop through rect's in a feature, some have 2 and some have 3 rect's.
// Each node always refers to one feature in a way that node0 maps to feature0 and node1 to feature1 (The XML file is build like that)
//int rectNum = featOrga[j]; // get number of rects for current feature using current node index j
int k;
detectFace_label2:
for (k = 0; k < RECTS_IN_FEAT; k++)
{
int x = 0, y = 0, width = 0, height = 0, weight = 0, coordUpL = 0, coordUpR = 0, coordDownL = 0, coordDownR = 0;
// a rect is defined by 5 values:
x = rectangles[rectsUsed][0]; // the first value is the x coordinate of the top left corner pixel
y = rectangles[rectsUsed][1]; // the second value is the y coordinate of the top left corner pixel
width = rectangles[rectsUsed][2]; // the third value is the width of the current rectangle
height = rectangles[rectsUsed][3]; // the fourth value is the height of this rectangle
weight = rectangles[rectsUsed][4]; // the fifth value is the weight of this rectangle
// calculating 1-Dim index for points of interest. Formula: index = width * row + column, assuming values are stored in row order
coordUpL = ((WIN_WIDTH * y) - WIN_WIDTH) + (x - 1);
coordUpR = coordUpL + width;
coordDownL = coordUpL + (height * WIN_WIDTH);
coordDownR = coordDownL + width;
// calculate the area sum according to Viola-Jones
//sum += (ii[x][y] + ii[x+width][y+height] - ii[x][y+height] - ii[x+width][y]) * weight;
sum += (ii[coordUpL] + ii[coordDownR] - ii[coordUpR] - ii[coordDownL]) * weight;
// Debug: counting the number of actual rectangles used
rectsUsed++; //
}
// decide whether the result of the feature calculation reaches the node threshold
if (sum < nodeThresh)
{
tmp += lValue; // add left value to tmp if node threshold was not reached
}
else
{
tmp += rValue; // add right value to tmp if node threshold was reached
}
nodesUsed = nodesUsed + 3; // one node is processed, increase nodesUsed by number of floats needed to represent a node (3)
}
//######## at this point we went through each node in the current stage #######
// check if threshold of current stage was reached
if (tmp < stageThresholds[i])
{
faceDetected = 0; // if any stage threshold is not reached the operation is done and no face is present
// Debug: show in which stage the frame was dropped
printf("Face detection failed in stage %d \n", i);
//i = stageNum; // breaks out this loop, because i is supposed to stay smaller than STAGE_NO
}
else
{
passedStages++; // stage threshold is reached, therefore passedStages will count up
}
}
//######## at this point we went through all stages ###############################
//----------------------------------------------------------------------------------
// if the number of passed stages reaches the total number of stages, a face is detected
if (passedStages == stageNum)
{
faceDetected = 1; // one symbolizes that the input is a face
}
else
{
faceDetected = 0; // zero symbolizes that the input is not a face
};
return faceDetected;
}
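The question does not show its integral-image function, so the following is only a sketch of one common layout, not the asker's code: a table with one extra zero row and column, so that a rectangle touching the top or left edge needs no x - 1 or y - 1 index. The names integral_image, rect_sum, WIN and II_DIM are assumptions. For comparison, the posted expression (WIN_WIDTH * y) - WIN_WIDTH + (x - 1) becomes negative when x or y is 0; whether that explains the failing stages is not confirmed here.

```c
#include <stdint.h>

#define WIN 24             /* window width = height, as in the question */
#define II_DIM (WIN + 1)   /* one extra zero row and column */

/* ii[r][c] = sum of all pixels in rows < r and columns < c (img is row-major). */
void integral_image(const uint8_t img[WIN * WIN], uint32_t ii[II_DIM][II_DIM])
{
    for (int c = 0; c < II_DIM; c++)
        ii[0][c] = 0;
    for (int r = 1; r < II_DIM; r++) {
        uint32_t row_sum = 0;
        ii[r][0] = 0;
        for (int c = 1; c < II_DIM; c++) {
            row_sum += img[(r - 1) * WIN + (c - 1)];
            ii[r][c] = ii[r - 1][c] + row_sum;
        }
    }
}

/* Sum of the pixels in the rectangle with top-left corner (x, y) and size w x h. */
uint32_t rect_sum(uint32_t ii[II_DIM][II_DIM], int x, int y, int w, int h)
{
    return ii[y + h][x + w] + ii[y][x] - ii[y][x + w] - ii[y + h][x];
}
```

A quick sanity check is an image filled with ones: every rect_sum must then equal w * h, including rectangles that start at x = 0 or y = 0.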

Arrays and for loops trouble

I am doing an array exercise and have almost finished it, but I have trouble with the last part. I create two arrays that store coursework points and exam points, and then use a third array to calculate the module result (it is determined by both the exam and the coursework points). I got this part working; assuming I have 5 modules, the output is 5 numbers. However, I want to calculate my stage mark, so if I have 5 modules I take their marks, add them together and divide the sum by 5. Here is my problem: I am using a for loop because that is the only way it will work (as far as I know), so given that I already have my module results I use this for loop to calculate the stage result:
for(int i = 0; i < module_result.length; i++)
{
sum = sum + module_result[i];
System.out.println(sum/5);
}
I saw a similar question on this site and used the code from its answers. I can use an enhanced for loop as well.
So given that coursework array={45,70,60,55,80} and exam array={83,72,45,25,89}, my module results are 64,71,60,87. Using the above for loop I get the anticipated outcome:
10
22
32
37
52
So I get my result: it is 52. But I don't want the rest of the numbers.
My question is how I can get just that number (52). I guess it is not possible with a for loop because it will inevitably loop 5 times, not once. I thought about using a while loop, but I don't see how I would get a much different outcome.

I'm not sure if I totally understand the question, but I think this is what you're going for:
for(int i = 0; i < module_result.length; i++)
{
sum = sum + module_result[i];
}
System.out.println(sum/5);
All you have to do is simply move the println statement outside of the loop (if I understand the question correctly).

If you just want to print out the last number, just do a condition in the for loop that would print out at the index just before the length like this:
for (int i = 0; i < module_result.length; i++) {
if (i == module_result.length - 1) {
// print results
}
else {
// Do calculations
}
}
OK here is my code.
public int[] computeResult(int []courseWork,int[] examMarks ){
int[] module_result = new int[6];
for(int i=0;i<module_result.length;i++){//CALCULATE EACH MODULE
module_result[i]= ((courseWork[i] * cw[i]) + (examMarks[i] * (100 - cw[i]))) / 100;//THIS LINE IS SIMPLY A CONDITION HOW TO CALCULATE A MODULE YOU DON'T NEED TO KNOW WHAT IS HAPPENING INSIDE
}
int sum = 0;
for(int i = 0; i < module_result.length; i++)//USE THIS FOR LOOP TO ADD THE MODULES TOGETHER.
{
sum = sum + module_result[i];
// Add this extra line
// This allows you to only print out the value when you reach the end
if (i == module_result.length - 1) {
System.out.println(sum/6);
}
}
return module_result;
}
However, here is what happens: the first for loop calculates the module results. Say they are as follows in the output console:
64
71
60
31
87
33
Next, the second for loop adds them together: first is 64, the next iteration adds 71 to 64 to give 135, the next adds the next module result, 60, to 135, and so on until I get the total sum of all 6 modules, which in this example is 346, and then divide it by 6 to get my stage result. So I need just 346/6 in my output console. Nothing else, no zeros, nothing.
What my current code does is this: the second loop starts running, it already knows my module results (they have been calculated), and so it starts. The first one is 64; dividing it by 6 gives 10. Then the loop adds 71 to 64, gets 135 and divides it by 6, and so on until it reaches 346 and divides it by 6. So I get this output:
10
22
32
37
52
57
I don't need 10, 22, 32, 37 and 52; they hold no meaning. I just need 57. What your solution will give me is this outcome:
0
0
0
0
0
57
It still gives unnecessary numbers.

Logic Help: Calculating primes of a range of numbers with multiple threads

I've been tasked to calculate prime numbers using multiple threads. The user provides a range of numbers, and how many threads to use to find the prime numbers.
So for example, the range is 2 to 100 inclusive and we want to use 3 threads.
What I want to happen is the following:
Thread 1 found 2 3 5 7 11 13 17 19 23 29 31
Thread 2 found 37 41 43 47 53 59 61 67
Thread 3 found 71 73 79 83 89 97
Each thread is going to need to be supplied a start range and an end range. I have determined that a data structure would be the most appropriate for this, such as:
typedef struct Data{
int start;
int end;
}Data;
I can't figure out how to create these sub-ranges. I know I'll have to divide the max end range by how many threads we want, and getting the first threads range is easy but I'm lost after that. This seems like a simple issue that I am making entirely too complicated.
This is where I'm at:
int start, end, nthreads, div;
start = atoi(argv[1]);
end = atoi (argv[2]);
nthreads = atoi (argv[3]);
div = end/nthreads;
for (int i = 0; i < nthreads; i++){
if (i == 0){
//first iteration, start is start and end is the div
data.start = start;
data.end = div;
}
else{
//this is where I am lost
data.start = ??
data.end = div*i;
}
//create the thread with our data
//calculations is handled in threadCreate()
pthread_create(&tid, NULL, threadCreate, (void*)data);
}

From your code, you want to remove the if (i==0) and simply set this:
data.start = start + div * i;
data.end = start + div * (i + 1) - 1;
Be careful, though: you will want to redefine div as (end-start) / nthreads, and let the last thread run up to end so that the remainder of the division is not lost.
You will then quickly realize that finding primes takes longer in some ranges than in others.
I would recommend using OpenMP, which is made exactly for this kind of usage. It would simplify your threading by automatically splitting your for loop, and it would let you use a dynamic scheduling scheme that balances the easier and harder ranges across threads so that the computation finishes sooner.
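A sketch of the block split that also handles a range whose length is not a multiple of the thread count. This is one possible scheme rather than the answer's exact formula: the function name sub_range is hypothetical, and the Data struct is copied from the question with end treated as inclusive.

```c
typedef struct Data {
    int start;
    int end;   /* inclusive */
} Data;

/*
 * Sub-range for thread i (0-based) of nthreads over the inclusive range
 * [start, end]. The first (count % nthreads) threads get one extra number,
 * so sizes differ by at most one and no value at the end is lost.
 */
Data sub_range(int start, int end, int nthreads, int i)
{
    int count = end - start + 1;
    int base  = count / nthreads;
    int extra = count % nthreads;
    Data d;
    d.start = start + i * base + (i < extra ? i : extra);
    d.end   = d.start + base + (i < extra ? 1 : 0) - 1;
    return d;
}
```

For the range 2 to 100 with 3 threads this yields 2..34, 35..67 and 68..100. If there are more threads than numbers, the surplus threads receive an empty range (end < start), which the thread function would need to tolerate.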

Note that in your else branch you have
data.end=div*i;
but i starts at 0, so the first time the else branch is executed (i == 1) data.end will be equal to div, the same end as the first range. Also take
div = (end-start)/nthreads;
Try something like
else{
data.start = (div*i)+1;
data.end = data.start+div-1;
}

Depends on how you want to go about it:
a) split into even-sized blocks
b) split into interleaved blocks
e.g. start = 10, end = 90, n = 3 threads
you could either go:
thread1 = 10,11,12,13,...
thread2 = 40,41,42,43,...
thread3 = 70,71,72,73,...
or
thread1 = 10, 13, 16, 19, ...
thread2 = 11, 14, 17, 20, ...
thread3 = 12, 15, 18, 21, ...
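The interleaved variant (b) needs no range arithmetic at all: each thread starts at a different offset and steps by the thread count. A minimal sketch, with is_prime and count_primes_interleaved as hypothetical names and plain trial division standing in for whatever primality test is actually used:

```c
#include <stdbool.h>

/* Trial division; adequate for small ranges. */
static bool is_prime(int v)
{
    if (v < 2) return false;
    for (int d = 2; d * d <= v; d++)
        if (v % d == 0) return false;
    return true;
}

/*
 * Number of primes found by thread t (0-based) of n when the candidates in
 * [start, end] are dealt out round-robin: start + t, start + t + n, ...
 */
int count_primes_interleaved(int start, int end, int n, int t)
{
    int found = 0;
    for (int v = start + t; v <= end; v += n)
        if (is_prime(v)) found++;
    return found;
}
```

Each thread would call this with its own t. Note that the stride can interact with the numbers themselves: for start = 2 and n = 3, one thread only ever sees multiples of 3.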

My OpenCL code changes the output based on a seemingly noop

I'm running the same OpenCL kernel code on an Intel CPU and on a NVIDIA GPU and the results are wrong on the first but right on the latter; the strange thing is that if I do some seemingly irrelevant changes the output works as expected in both cases.
The goal of the function is to calculate the matrix multiplication between A (triangular) and B (regular), where the position of A in the operation is determined by the value of the variable left. The bug only appears when left is true and when the for loop iterates at least twice.
Here is a fragment of the code omitting some bits that shouldn't affect for the sake of clarity.
__kernel void blas_strmm(int left, int upper, int nota, int unit, int row, int dim, int m, int n,
float alpha, __global const float *a, __global const float *b, __global float *c) {
/* [...] */
int ty = get_local_id(1);
int y = ty + BLOCK_SIZE * get_group_id(1);
int by = y;
__local float Bs[BLOCK_SIZE][BLOCK_SIZE];
/* [...] */
for(int i=start; i<end; i+=BLOCK_SIZE) {
if(left) {
ay = i+ty;
bx = i+tx;
}
else {
ax = i+tx;
by = i+ty;
}
barrier(CLK_LOCAL_MEM_FENCE);
/* [...] (Load As) */
if(bx >= m || by >= n)
Bs[tx][ty] = 0;
else
Bs[tx][ty] = b[bx*n+by];
barrier(CLK_LOCAL_MEM_FENCE);
/* [...] (Calculate Csub) */
}
if(y < n && x < (left ? row : m)) // In bounds
c[x*n+y] = alpha*Csub;
}
Now it gets weird.
As you can see, by always equals y if left is true. I checked (with some printfs, mind you) and left is always true, and the code on the else branch inside the loop is never executed. Nevertheless, if I remove or comment out the by = i+ty line there, the code works. Why? I don't know yet, but I thought it might be something related to by not having the expected value assigned.
My train of thought took me to check if there was ever a discrepancy between by and y, as they should have the same value always; I added a line that checked if by != y but that comparison always returned false, as expected. So I went on and changed the appearance of by for y so the line
if(bx >= m || by >= n)
transformed into
if(bx >= m || y >= n)
and it worked again, even though I'm still using the variable by properly three lines below.
With an open mind I tried some other things and I got to the point that the code works if I add the following line inside the loop, as long as it is situated at any point after the initial if/else and before the if condition that I mentioned just before.
if(y >= n) left = 1;
The code inside (left = 1) can be substituted for anything (a printf, another useless assignation, etc.), but the condition is a bit more restrictive. Here are some examples that make the code output the correct values:
if(y >= n) left = 1;
if(y < n) left = 1;
if(y+1 < n+1) left = 1;
if(n > y) left = 1;
And some that don't work, note that m = n in the particular example that I'm testing:
if(y >= n+1) left = 1;
if(y > n) left = 1;
if(y >= m) left = 1;
/* etc. */
That's the point where I am now. I have added a line that shouldn't affect the program at all but it makes it work. This magic solution is not satisfactory to me and I would like to know what's happening inside my CPU and why.
Just to be sure I'm not forgetting anything, here is the full function code and a gist with example inputs and outputs.
Thank you very much.
Solution
Both users DarkZeros and sharpneli were right about their assumptions: the barriers inside the for loop weren't being hit the right amount of times. In particular, there was a bug involving the very first element of each local group that made it run one iteration less than the rest, provoking an undefined behaviour. It was painfully obvious to see in hindsight.
Thank you all for your answers and time.

Have you checked that the get_local_size always returns the correct value?
You said "In short, the full length of the matrix is divided in local blocks of BLOCK_SIZE and run in parallel; ". Remember that OpenCL allows any concurrency only within a workgroup. So if you call enqueueNDrange with global size of [32,32] and local size of [16,16] it is possible that the first thread block runs from start to finish, then the second one, then third etc. You cannot synchronize between workgroups.
What are your EnqueueNDRange call(s)? Example of the calls required to get your example output would be heavily appreciated (mostly interested in the global and local size arguments).
(I'd ask this in a comment but I am a new user).
Edit (had an answer; upon verification did not have it; still need more info):
http://multicore.doc.ic.ac.uk/tools/GPUVerify/
By using that I got a complaint that a barrier could be reached by a nonuniform control flow.
It all depends on what values dim, nota and upper get. Could you provide some examples?
I did some testing. Assuming left = 1. nota != upper and dim = 32, row as 16 or 32 or whatnot, still worked and got the following result:
...
gid0: 2 gid1: 0 lid0: 14 lid1: 13 start: 0 end: 32
gid0: 2 gid1: 0 lid0: 14 lid1: 14 start: 0 end: 32
gid0: 2 gid1: 0 lid0: 14 lid1: 15 start: 0 end: 32
gid0: 2 gid1: 0 lid0: 15 lid1: 0 start: 0 end: 48
gid0: 2 gid1: 0 lid0: 15 lid1: 1 start: 0 end: 48
gid0: 2 gid1: 0 lid0: 15 lid1: 2 start: 0 end: 48
...
So if my assumptions about the variable values are even close to correct, you have a barrier divergence issue there. Some threads encounter a barrier which other threads never will. I'm surprised it did not deadlock.

The first thing I see that can fail badly is that you are using barriers inside a for loop.
If the threads of a work-group do not all enter the for loop the same number of times, the results are completely undefined. And you clearly state that the problem only occurs if the for loop runs more than once.
Do you ensure this condition?

How to choose a (valid) random adjacent point in an array of integers in C?

Assume that we have an array of integers (3x3) depicted as follows:
+-+-+-+
| |1| |
+-+-+-+
|0|x|1|
+-+-+-+
| |0| |
+-+-+-+
(0,1) above is set to 1 and (1,0) is 0 etc.
Now assume that I find myself at (1,1) (at x), what would be the easiest method for me to come up with all the directions I can take (say all that have the value 0) and then among those choose one?
What I'm having trouble with is actually the step between choosing all valid directions and then choosing among those. I can do the two steps separately fairly easily, but I don't have a solution that feels elegant and combines the two.
E.g. I can multiply the value of each cell by a value representing 1, 2, 4 and 8 and OR them all together. This would give me what directions I can take, but how to choose between them? Also, I can easily randomize a number between 1 and 4 to choose a direction, but if that direction is "taken" then I have to randomize again, excluding the direction that failed.
Any ideas?

The fastest solution is likely the last one you posted: choose directions randomly, repeating until you get a valid one. If you exclude directions that have already failed, that will take at most four tries (the worst case is when there is only one valid neighbor). Something more elegant is to iterate through all possible directions, updating a variable randomly at each valid neighbor, such as this pseudocode (rand() here returns a uniform value in [0, 1)):
c = 1
r = invalid
for i in neighbors:
    if (valid[i]):
        if (rand() <= 1. / c): r = i
        ++c
and then r is the answer (c is one more than the number of valid neighbors found so far).

Here's a very neat trick in pseudocode
Initialise your "current result" to nil
Initialise a "number found" to 0
Loop through all the possible directions. If it is valid then:
increment "number found"
set "current result" to the direction with probability 1/"number found"
At the end of this, you will have a valid direction (or nil if not found). If there are multiple valid directions, they will all be chosen with equal probability.
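The trick described in the two answers above is single-pass reservoir sampling with a reservoir of size one. A minimal C sketch, assuming the caller has already filled an array of validity flags; the function name pick_valid_direction and the flag array are illustrative, and rand() % found carries a slight modulo bias that is negligible for four directions.

```c
#include <stdlib.h>

/*
 * valid[i] is nonzero if direction i (0..n-1) may be taken.
 * Returns the index of one valid direction, each with equal probability,
 * or -1 if no direction is valid.
 */
int pick_valid_direction(const int valid[], int n)
{
    int chosen = -1;
    int found = 0;
    for (int i = 0; i < n; i++) {
        if (!valid[i])
            continue;
        found++;
        /* Replace the current choice with probability 1/found. */
        if (rand() % found == 0)
            chosen = i;
    }
    return chosen;
}
```

For the grid in the question, with directions ordered up, right, down, left, the flags would be {0, 0, 1, 1} and the function returns 2 or 3.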

Assumed:
Each location has a set number of valid target locations (some location may have fewer valid targets, a chess-knight has fewer valid targets when placed in a corner than when in the middle of the board.)
You want to pick a random target from all available, valid moves.
Algorithm:
Create a bit-array with one bit representing each valid target. (In the original example you would create a four bit array.)
For each valid target determine if the location is empty; set the corresponding bit in the bit-array to 1 if empty.
If bit-array > 0 then number_of_targets = SUM(bit-array), else return(No Valid Moves).
Pick random number between 1 and number_of_targets.
return(the location associated with the nth set bit in the bit-array)
Example using the original question:
X has four valid moves. We create a 4-bit array and fill it in with '1' for each empty location; starting with the cell directly above and moving in a clockwise direction we end up with :0:0:1:1:
The sum of bits tells us we have two places we can move. Our random selection will choose either '1' or '2'. We move through the bit-array until we find the nth set bit and move to that location.
This algorithm will work for any system with any number of valid targets (not limited to 2-D). You can replace the Random number selector with a function that recursively returns the best move (MIN-MAX algorithm.)
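The count-then-select steps above can be sketched in C as follows. The name pick_set_bit and the bit numbering (bit 0 = first target) are assumptions for illustration; the random rank is 0-based here rather than 1-based as in the prose.

```c
#include <stdlib.h>

/*
 * mask has bit i set when target i (0..nbits-1) is open.
 * Returns the index of a randomly chosen set bit, or -1 if none is set.
 */
int pick_set_bit(unsigned mask, int nbits)
{
    int count = 0;
    for (int i = 0; i < nbits; i++)
        count += (int)((mask >> i) & 1u);
    if (count == 0)
        return -1;                 /* no valid moves */

    int n = rand() % count;        /* 0-based rank of the set bit to take */
    for (int i = 0; i < nbits; i++) {
        if ((mask >> i) & 1u) {
            if (n == 0)
                return i;
            n--;
        }
    }
    return -1;                     /* not reached */
}
```

With bit 0 as the cell above and bits assigned clockwise, the example mask :0:0:1:1: is 0xC, so pick_set_bit(0xC, 4) returns 2 (below) or 3 (left).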

A slightly contrived way might be this (pseudo-code):
Build the bit-mask as you describe, based on which neighbors are open.
Use that bit-mask as the index into an array of:
struct RandomData
{
size_t num_directions;
struct { signed int dx, dy; } deltas[4];
} random_data[16];
where num_directions is the number of open neighbors, and deltas[] tells you how to get to each neighbor.
This has a lot of fiddly data, but it does away with the looping and branching.
UPDATE: Okay, for some reason I had problems letting this idea go. I blame a certain amount of indoctrination about "data-driven programming" at work, since this very simple problem made me "get" the thought of data-driven-ness a bit better. Which is always nice.
Anyway, here's a complete, tested and working implementation of the random-stepping function using the above ideas:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
/* Directions are ordered from north and going clockwise, and assigned to bits:
*
* 3 2 1 0
* WEST | SOUTH | EAST | NORTH
* 8 4 2 1
*/
static void random_walk(unsigned int *px, unsigned int *py, unsigned max_x, unsigned int max_y)
{
const unsigned int x = *px, y = *py;
const unsigned int dirs = ((x > 0) << 3) | ((y < max_y) << 2) |
((x < max_x) << 1) | (y > 0);
static const struct
{
size_t num_dirs;
struct { int dx, dy; } deltas[4];
} step_info[] = {
#define STEP_NORTH { 0, -1 }
#define STEP_EAST { 1, 0 }
#define STEP_SOUTH { 0, 1 }
#define STEP_WEST { -1, 0 }
{ 0 },
{ 1, { STEP_NORTH } },
{ 1, { STEP_EAST } },
{ 2, { STEP_NORTH, STEP_EAST } },
{ 1, { STEP_SOUTH } },
{ 2, { STEP_NORTH, STEP_SOUTH } },
{ 2, { STEP_EAST, STEP_SOUTH } },
{ 3, { STEP_NORTH, STEP_EAST, STEP_SOUTH } },
{ 1, { STEP_WEST } },
{ 2, { STEP_NORTH, STEP_WEST } },
{ 2, { STEP_EAST, STEP_WEST } },
{ 3, { STEP_NORTH, STEP_EAST, STEP_WEST } },
{ 2, { STEP_SOUTH, STEP_WEST } },
{ 3, { STEP_NORTH, STEP_SOUTH, STEP_WEST } },
{ 3, { STEP_EAST, STEP_SOUTH, STEP_WEST } },
{ 4, { STEP_NORTH, STEP_EAST, STEP_SOUTH, STEP_WEST } }
};
const unsigned int step = rand() % step_info[dirs].num_dirs;
*px = x + step_info[dirs].deltas[step].dx;
*py = y + step_info[dirs].deltas[step].dy;
}
int main(void)
{
unsigned int w = 16, h = 16, x = w / 2, y = h / 2, i;
struct timeval t1, t2;
double seconds;
srand(time(NULL));
gettimeofday(&t1, NULL);
for(i = 0; i < 100000000; i++)
{
random_walk(&x, &y, w - 1, h - 1);
}
gettimeofday(&t2, NULL);
seconds = (t2.tv_sec - t1.tv_sec) + 1e-6 * (t2.tv_usec - t1.tv_usec);
printf("Took %u steps, final position is (%u,%u) after %.2g seconds => %.1f Msteps/second\n", i, x, y, seconds, (i / 1e6) / seconds);
return EXIT_SUCCESS;
}
Some explanations might be in order, the above is pretty opaque until you "get" it, I guess:
The interface to the function itself should be clear. Note that width and height of the grid are represented as "max_x" and "max_y", to save on some constant-subtractions when checking if the current position is on the border or not.
The variable dirs is set to a bit-mask of the "open" directions to walk in. For an empty grid, this is always 0x0f unless you're on a border. This could be made to handle walls by testing the map, of course.
The step_info array collects information about which steps are available to take from each of the 16 possible combinations of open directions. When reading the initializations (one per line) of each struct, think of that struct's index in binary, and convert that to bits in dirs.
The STEP_NORTH macro (and friends) cut down on the typing, and make it way clearer what's going on.
I like how the "meat" of random_walk() is just four almost-clear expressions, it's refreshing to not see a single if in there.
When compiled with gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 on my 2.4 GHz x86_64 system, using optimization level -O3, the performance seems to be just short of 36 million steps per second. Reading the assembly the core logic is branch-free. Of course there's a call to rand(), I didn't feel like going all the way and implementing a local random number generator to have inlined.
NOTE: This doesn't solve the exact question asked, but I felt the technique was worth expanding on.