Manhattan distance is over estimating and making me crazy - c

I'm implementing the A* algorithm with the Manhattan distance heuristic to solve the 8-puzzle (in C). It seems to work very well and passes a lot of unit tests, but in one case it fails to find the shortest path (it finds 27 steps instead of 25).
When I change the heuristic function to Hamming distance, it finds a solution in 25 steps.
It also finds 25 steps when I make the Manhattan distance function return half of the actual value.
That's why I believe the problem lies somewhere in the Manhattan distance function and that it is overestimating the cost (hence inadmissible). I thought maybe something else was going wrong in the C program, so I wrote a little Python script to test and verify the output of the Manhattan distance function alone, and they both produce exactly the same result.
I'm really confused because the heuristic function seems to be the only point of failure, and it seems to be correct at the same time.
You can try this solver and put the tile order like "2,6,1,0,7,8,3,5,4"
Choose the algorithm Manhattan distance and it finds in 25 steps.
Now change it to Manhattan distance + linear conflict and it finds 27 steps.
But my Manhattan distance (without linear conflict) finds in 27 steps.
Here's my general algorithm:
manhattan_distance = 0
iterate over all tiles
    if the tile is not the blank tile:
        find the coordinates of this tile on the goal board
        manhattan_distance += abs(x - goal_x) + abs(y - goal_y)
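For comparison, here is a minimal self-contained sketch of the pseudocode above. It assumes a flat row-major board with 0 as the blank and a caller-supplied goal array; the function name `manhattan` and this layout are illustrative choices, not the `Puzzle`/`State` API used in the question.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of Manhattan distances of all non-blank tiles on a size x size board.
 * board and goal are row-major arrays; the value 0 marks the blank
 * (an assumption of this sketch, the question's code compares against 'B'). */
int manhattan(const int *board, const int *goal, int size)
{
    int cells = size * size;
    int distance = 0;

    assert(size > 0);
    for (int pos = 0; pos < cells; pos++) {
        int tile = board[pos];
        if (tile == 0)
            continue;                   /* the blank does not count */
        for (int g = 0; g < cells; g++) {
            if (goal[g] == tile) {      /* where this tile belongs */
                distance += abs(pos / size - g / size)
                          + abs(pos % size - g % size);
                break;
            }
        }
    }
    return distance;
}
```

With the board 2,6,1,0,7,8,3,5,4 and an assumed goal of 1,2,3,4,5,6,7,8,0 this returns 17, which stays below the 25-move optimum reported above; a different goal layout would of course give a different value.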
I think if there was something very badly wrong with some important part it wouldn't pass all 25+ previous tests so this might be some sort of edge case.
Here's commented Manhattan distance function in C:
int ManhattanDistance(Puzzle p, State b){
    State goal = getFinalState(p);
    int size = getSize(b);
    int distance = 0;
    if (getSize(goal) == size){ // both states are the same size
        int i, j;
        for(i=0; i<size; i++){
            for(j=0; j<size; j++){ // iterate over all tiles
                int a = getStateValue(b, i, j); // what is the number on this tile?
                if (a != 'B'){ // if it's not the blank tile
                    int final_cordinates[2];
                    getTileCoords(goal, a, final_cordinates); // find the coordinates on the other board
                    int final_i = final_cordinates[0];
                    int final_j = final_cordinates[1];
                    distance += abs(i - final_i) + abs(j - final_j);
                }
            }
        }
    }
    return distance;
}
Please help me.
EDIT: As discussed in comments, the code provided for opening nodes can be found here

The problem seems to be not in your heuristic function but in the algorithm itself. From your description of the problem, and the fact that it occurs only in some specific cases, I believe it has to do with re-opening a closed vertex once you find a better path to it.
While reading the code you have provided [in comments], I think I understood where the problem lies, in line 20:
if(getG(current) + 1 < getG(children[i])){
This is wrong! You are checking if g(current) + 1 < g(children[i]); you actually want to check f(current) + 1 + h(children[i]) < g(children[i]), since you want to compare this value using the heuristic of children[i], not of current! (Here f denotes the cost of the path found so far and g the total, f + h.)
Note that this is identical to setting f(children[i]) = min{f(children[i]), f(current) + 1} and then adding h(children[i]) to get the g value.
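The idea in that last note (keep the cheaper path cost, then add the child's own heuristic) might be sketched as below. This sketch uses the common textbook naming, where g is the cost of the path so far and f = g + h, which may not match the letters used in the answer. The `Node` layout and the `relax` function are hypothetical and are not taken from the asker's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout for this sketch. */
typedef struct Node {
    int g;                /* cost of the best known path from the start */
    int h;                /* heuristic estimate from this node to the goal */
    int f;                /* g + h, the priority used by the open list */
    int closed;           /* 1 if the node currently sits in the closed set */
    struct Node *parent;
} Node;

/* Try to improve child by going through current over an edge of cost 1.
 * Returns 1 if the child was updated and must be (re)inserted into the
 * open list, 0 if the known path to it is already at least as good. */
int relax(Node *current, Node *child)
{
    assert(current != NULL && child != NULL);
    int tentative_g = current->g + 1;

    if (tentative_g >= child->g)
        return 0;

    child->g = tentative_g;
    child->f = tentative_g + child->h;  /* uses the child's heuristic, not current's */
    child->parent = current;
    child->closed = 0;                  /* re-open a previously closed node */
    return 1;
}
```

The point of keeping the path cost and the heuristic in separate fields is that path costs are only ever compared with path costs, and the heuristic of the child is added exactly once.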

Related

Neural network for linear regression

I found this great source that matched the exact model I needed: http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
The important bits go like this.
You have a plot x->y. Each x-value is the sum of "features" or how I'll denote them, z.
So a regression line for the x->y plot would be h(SUM(z_i)), where h(x) is the regression line (function).
In this NN the idea is that each z-value gets assigned a weight in a way that minimizes the least squared error.
The gradient function is used to update weights to minimize error. I believe I may be back propagating incorrectly -- where I update the weights.
So I wrote some code, but my weights aren't being correctly updated.
I may have simply misunderstood a spec from that Stanford post, so that's where I need your help. Can anyone verify I have correctly implemented this NN?
My h(x) function was a simple linear regression on the initial data. In other words, the idea is that the NN will adjust weights so that all data points shift closer to this linear regression.
for (epoch = 0; epoch < 10000; epoch++){
//loop number of games
for (game = 1; game < 39; game++){
sum = 0;
int temp1 = 0;
int temp2 = 0;
//loop number of inputs
for (i = 0; i < 10; i++){
//compute sum = x
temp1 += inputs[game][i] * weights[i];
}
for (i = 10; i < 20; i++){
temp2 += inputs[game][i] * weights[i];
}
sum = temp1 - temp2;
//compute error
error += .5 * (5.1136 * (sum) + 1.7238 - targets[game]) * (5.1136 * (sum) + 1.7238 - targets[game]);
printf("error = %G\n", error);
//backpropogate
for (i = 0; i < 20; i++){
weights[i] = sum * (5.1136 * (sum) + 1.7238 - targets[game]); //POSSIBLE ERROR HERE
}
}
printf("Epoch = %d\n", epoch);
printf("Error = %G\n", error);
}
Please check out Andrew Ng's Coursera course. He is the professor of Machine Learning at Stanford and can explain the concept of Linear Regression to you better than pretty much anyone else. You can learn the essentials of linear regression in the first lesson.
For linear regression, you are trying to minimize the cost function, which in this case is the sum of squared errors (predicted value - actual value)^2 and is achieved by gradient descent. Solving a problem like this does not require a Neural Network and using one would be rather inefficient.
For this problem, only two values are needed. If you think back to the equation for a line, y = mx + b, there are really only two aspects of a line that you need: The slope and the y-intercept. In linear regression you are looking for the slope and y-intercept that best fits the data.
In this problem, the two values can be represented by theta0 and theta1. theta0 is the y-intercept and theta1 is the slope.
This is the update function for Linear Regression (the standard batch gradient descent rule, where m is the number of data points and each x_i is taken as the vector (1, x)):
theta := theta - alpha * (1/m) * SUM_i((h(x_i) - y_i) * x_i)
Here, theta is a 2 x 1 dimensional vector with theta0 and theta1 inside of it. What you are doing is taking theta and subtracting the mean of the sum of errors multiplied by a learning rate alpha (usually small, like 0.1).
Let's say the real perfect fit for the line is y = 2x + 3, but our current slope and y-intercept are both 0. Then the errors (predicted minus actual) will be negative for positive x, and when a negative number is subtracted from theta, theta will increase, moving your prediction closer to the correct value. And vice versa for positive numbers. This is a basic example of gradient descent, where you are descending down a slope to minimize the cost (or error) of the model.
This is the type of model you should be trying to implement in your model instead of a Neural Network, which is more complex. Try to gain an understanding of linear and logistic regression with gradient descent before moving on to Neural Networks.
Implementing a linear regression algorithm in C can be rather challenging, especially without vectorization. If you are looking to learn how a linear regression algorithm works and aren't specifically set on using C, I recommend using something like MATLAB or Octave (a free alternative) to implement it instead. After all, the examples from the post you found use the same format.
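For readers who do want to stay in C, the theta0/theta1 update described in this answer might be sketched roughly as follows. The function name `fit_line`, the parameter order and the fixed iteration count are assumptions made for illustration; nothing here comes from the Stanford tutorial or from the asker's data.

```c
#include <assert.h>
#include <stddef.h>

/* Batch gradient descent for the line y = theta[0] + theta[1] * x.
 * alpha is the learning rate; theta is updated in place. */
void fit_line(const double *x, const double *y, size_t m,
              double alpha, int iterations, double theta[2])
{
    assert(m > 0);
    for (int it = 0; it < iterations; it++) {
        double grad0 = 0.0, grad1 = 0.0;

        for (size_t i = 0; i < m; i++) {
            /* error = predicted value - actual value */
            double error = theta[0] + theta[1] * x[i] - y[i];
            grad0 += error;          /* the intercept's feature is 1 */
            grad1 += error * x[i];   /* the slope's feature is x */
        }
        /* subtract the mean gradient scaled by alpha, both thetas at once */
        theta[0] -= alpha * grad0 / (double)m;
        theta[1] -= alpha * grad1 / (double)m;
    }
}
```

Starting from theta = (0, 0) with points taken from y = 2x + 3 at x = 0..4, alpha = 0.1 and a few thousand iterations, theta approaches (3, 2). A learning rate that is too large for the data makes the updates diverge instead.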

Generating a connected graph and checking if it has eulerian cycle

So, I wanted to have some fun with graphs and now it's driving me crazy.
First, I generate a connected graph with a given number of edges. This is the easy part, which became my curse. Basically, it works as intended, but the results I'm getting are quite bizarre (well, maybe they're not, and I'm the issue here). The algorithm for generating the graph is fairly simple.
I have two arrays, one of them is filled with numbers from 0 to n - 1, and the other is empty.
At the beginning I shuffle the first one and move its last element to the empty one.
Then, in a loop, I create an edge between the last element of the first array and a random element from the second one, and after that I again move the last element from the first array to the other one.
After that part is done, I have to create random edges between the vertices until I get as many as I need. This is, again, very easy. I just pick two random numbers in the range from 0 to n - 1, and if there is no edge between these vertices, I create one.
This is the code:
void generate(int n, double d) {
initMatrix(n); // <- creates an adjacency matrix n x n, filled with 0s
int *array1 = malloc(n * sizeof(int));
int *array2 = malloc(n * sizeof(int));
int j = n - 1, k = 0;
for (int i = 0; i < n; ++i) {
array1[i] = i;
array2[i] = 0;
}
shuffle(array1, 0, n); // <- Fisher-Yates shuffle
array2[k++] = array1[j--];
int edges = d * n * (n - 1) * .5;
if (edges % 2) {
++edges;
}
while (j >= 0) {
int r = rand() % k;
createEdge(array1[j], array2[r]);
array2[k++] = array1[j--];
--edges;
}
free(array1);
free(array2);
while (edges) {
int a = rand() % n;
int b = rand() % n;
if (a == b || checkEdge(a, b)) {
continue;
}
createEdge(a, b);
--edges;
}
}
Now, if I print it out, it's a fine graph. Then I want to find a Hamiltonian cycle. This part works. Then I get to my bane: the Eulerian cycle. What's the problem?
Well, first I check whether all vertices have even degree. And they do not. Always. Every single time, unless I choose to generate a complete graph.
I now feel destroyed by my own code. Is something wrong? Or is it supposed to be like this? I knew that Eulerian circuits would be rare, but not that rare. Please, help.
Let's analyze the probability of having an Eulerian cycle and, for simplicity, let's do it for all graphs with n vertices, regardless of the number of edges.
Given a graph G of size n, choose an arbitrary vertex v. The probability of its degree being even is roughly 1/2 (assuming that for each u1, u2, P((v,u1) exists) = P((v,u2) exists)).
Now, remove v from G, and create a new graph G' with n-1 vertices, and without all edges connected to v.
Similarly, for any arbitrary vertex v' in G': if (v,v') was an edge in G, we need d(v') to be odd; otherwise, we need d(v') to be even (both in G'). Either way, the probability of that is still roughly 1/2 (independent of the degree of v).
....
For the ith round, let #(v) be the number of discarded edges until reaching the current graph that are connected to v. If #(v) is odd, the probability of its current degree being odd is ~1/2, and if #(v) is even, the probability of its current degree being even is also ~1/2, and we remain with current probability of ~1/2
We can now understand how it works and write a recurrence for the probability of the graph having an Eulerian cycle:
P(n) ~= 1/2*P(n-1)
P(1) = 1
This is going to give us P(n) ~= 2^-n, which is very unlikely for reasonable n.
Note, 1/2 is just a rough estimation (and is correct when n->infinity), probability is in fact a bit higher, but it is still exponential in -n - which makes it very unlikely for reasonable size graphs.
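To experiment with this estimate, the even-degree test itself can be sketched as below. A connected graph has an Eulerian cycle exactly when every vertex has even degree, and the generator in the question builds a connected graph first. The flat row-major matrix and the name `all_degrees_even` are assumptions of this sketch, not the asker's `checkEdge`/`createEdge` interface.

```c
#include <assert.h>

/* Returns 1 if every vertex of the graph has even degree, 0 otherwise.
 * adj is an n x n adjacency matrix stored row by row with 0/1 entries. */
int all_degrees_even(const int *adj, int n)
{
    assert(n > 0);
    for (int v = 0; v < n; v++) {
        int degree = 0;

        for (int u = 0; u < n; u++)
            degree += adj[v * n + u];
        if (degree % 2 != 0)
            return 0;       /* one odd vertex is enough to rule it out */
    }
    return 1;
}
```

Running the generator many times for a small n and counting how often this returns 1 gives an empirical feel for how quickly the probability drops as n grows.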

Feasibility of non-self-intersecting path according to array constraints

I have two arrays, each containing a different ordering of the same set of integers. Each integer is a label for a point in which two closed paths intersect in the plane. The two arrays are interpreted as giving the circular ordering (in clockwise order) of points along each of two closed paths in the plane, with no particular starting point. The two paths intersect with each other as many times as there are points in the arrays, but a path may not self-intersect at all. How do I determine, from these two arrays, whether it is possible to draw the two paths in the plane without self-crossings? (The integer labels have no inherent meaning.)
Example 1: A = {3,4,2,1,10,7} and B = {1,2,4,10,7,3}: it is possible
Example 2: A = {2,3,0,10,8,11} and B = {10,2,3,8,11,0}: it is not possible.
Try it by drawing a circle, with 6 points labelled around it according to A, then attempt to connect the 6 points in a second closed path, according to the ordering in B, without crossing the new line you are drawing. (I believe it makes no difference to the possibility/impossibility of drawing the line whether you start by exiting or entering the first loop.) You will be able to do it for example 1, but not for example 2.
I am currently using a very elaborate method where I look at adjacent pairs in one array, e.g. in Example 1, array A is divided into {3,4}, {2,1}, {10,7}, then I find the groupings in the array B as partitioned by the two members listed in each case:
{3,4} --> {{1,2}, {10,7}}
{2,1} --> {{4,10,7,3}, {}}
{10,7} --> {{3,1,2,4}, {}}
and check that each pair on the left-hand-side finds itself in the same grouping of the right-hand-side partition in each of the other 2 rows. Then I do the same, offset by one position:
{4,2} --> {{10,7,3,1}, {}}
{1,10} --> {{2,4}, {7,3}}
{7,3} --> {{1,2,4,10}, {}}
Everything checks out here.
In Example 2, though, the method shows that it is impossible to draw the path. Among the "offset by 1" pairs from array A we find {10,8} causes a partition of array B into {{2,3}, {11,0}}. But we need 11 and 2 to be in the same grouping, as they are the next pair of points in array A.
This idea is unwieldy, and my implementation is even more unwieldy. I'm not even 100% convinced it always works. Could anyone suggest an algorithm for deciding? Target language is C, if that matters.
EDIT: I've added an illustration here: http://imgur.com/TS8xDIk. Here the paths to be reconciled share points 0, 1, 2 and 3. On the black path they are visited in order (A = {0,1,2,3}). On the blue path we have B = {0,2,1,3}. You can see on the left-hand side that this is impossible--the blue path will have to self-intersect in order to do it (or have additional intersections with the black path, which is also not allowed).
On the right-hand side is an illustration of the same problem interpreted as a graph with edges, responding to the suggestion that the problem boils down to a check for planarity. Well, as you can see, it's quite possible to form a planar graph from this collection of edges, but we cannot read the graph as two closed paths with n intersections--the blue path has "intersections" with the other path that don't actually cross. The paths are required to cross from inside to outside or vice-versa at each node, they cannot simply kiss and turn back.
I hope this clarifies the problem and I apologise for any lack of clarity the first time around.
By the way introducing coordinates would be a complete red herring: any point can be given any coordinates, and the problem remains the same. In a sense it is topological more than geometrical. Thanks for any additional suggestions on how to accomplish this feasibility check.
SECOND EDIT to show my current code. Like in the suggestion below by svinja, I first reduced the two arrays to a permutation of 0..2n-1. The input to the function is two arrays (which contain different orderings of the same 2n integers) and the length of these arrays. I am a hobbyist with no training in programming so I expect you will find several infelicities in the approach to coding. The idea is to return 1 if the arrays A and B are in a permutational relationship that allows the path to be drawn, and 0 if not.
int isGoodPerm(int A[], int B[], int len)
{
int i,j,a,b;
int P[max_len];
for (i=0; i<len; i++)
for (j=0; j<len; j++)
if (B[j] == A[i])
{
P[i] = j;
break;
}
for (i=0; i<len; i++)
{
if (P[i] < P[(i+1)%len])
{
a = P[i];
b = P[(i+1)%len];
}
else
{
a = P[(i+1)%len];
b = P[i];
}
for (j=i+2; j<i+len; j+=2)
if ((P[j%len] > a && P[j%len] < b) != (P[(j+1)%len] > a && P[(j+1)%len] < b))
return 0;
}
return 1;
}
I'm actually still testing another part of this project, and have only tested this part in isolation. I tweaked a couple of things when pasting it into the larger codebase and have copied that version--I hope I didn't introduce any errors.
I think the long question is hiding the true intent. I might be missing something, but it looks like the only thing you really need to check is whether the points in an array can be drawn without self-intersecting. I'm assuming you can map the integers to the actual coordinates. If so, you might find the solution posed on the related math.stackexchange site here helpful; it describes either the determinant-based method or the Bentley-Ottmann algorithm for crossings.
I am not sure if this is correct, but as nobody is posting an answer, here it is:
We can convert any instance of this problem to one where the first path is (0, 1, 2, ... N). In your example 2, this would be (0, 1, 2, 3, 4, 5) and (3, 0, 1, 4, 5, 2). I only mention this because I do this conversion in my code to simplify further code.
Now, imagine the first path are points on a circle. I think we can assume this without loss of generality. I also assume we can start the second path either inside or outside of the circle, if one works the other should, too. If I am wrong about either, the algorithm is certainly wrong.
So we always start by connecting the first and second point of the second path on the, let's say, outside. If we connect 2 points X and Y which are not right next to each other on the circle, we divide the remaining points into group A - the ones from X to Y clockwise, and group B - the ones from Y to X clockwise. Now we remember that points from group A can no longer be connected to points from group B on the outside part.
After this, we continue connecting the second and third point of the second path, but we are now on the inside. So we check "can we connect X and Y on the inside?" if we can't, we return false. If we can, we again find groups A and B and remember that none of them can be connected to each other, but now on the inside.
Now we're back on the outside, and we connect the third and fourth point of the second path... And so on.
Here is an image that shows how it works, for your examples 1 and 2:
And here is the code (in C#, but should be easy to translate):
static bool Check(List<int> path1, List<int> path2)
{
    // Translate into a problem where the first path is (0, 1, 2, ... N-1)
    var path = new List<int>();
    foreach (var path2Element in path2)
        path.Add(path1.IndexOf(path2Element));
    var N = path.Count;
    var blocked = new bool[N, N, 2];
    var subspace = 0;
    var currentElementIndex = 0;
    var nextElementIndex = 1;
    for (int step = 1; step <= N; step++)
    {
        var currentElement = path[currentElementIndex];
        var nextElement = path[nextElementIndex];
        // If we're blocked before finishing, return false
        if (blocked[currentElement, nextElement, subspace])
            return false;
        // Mark appropriate pairs as blocked
        for (int i = (currentElement + 1) % N; i != nextElement; i = (i + 1) % N)
            for (int j = (nextElement + 1) % N; j != currentElement; j = (j + 1) % N)
                blocked[i, j, subspace] = blocked[j, i, subspace] = true;
        // Move to the next edge
        currentElementIndex = (currentElementIndex + 1) % N;
        nextElementIndex = (nextElementIndex + 1) % N;
        // Outside -> Inside, or Inside -> Outside
        subspace = (2 - subspace) / 2;
    }
    return true;
}
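Since the question's target language is C, here is a rough C port of the same idea. It inherits every assumption of the answer above (including the author's own doubt about correctness); the name `check_paths` and the `MAX_POINTS` limit are choices made only for this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_POINTS 64   /* arbitrary limit chosen for this sketch */

/* Returns 1 if the second path can be drawn under the inside/outside
 * blocking rule described above, 0 otherwise. Both arrays must contain
 * the same n labels. */
int check_paths(const int *path1, const int *path2, int n)
{
    int path[MAX_POINTS];
    unsigned char blocked[MAX_POINTS][MAX_POINTS][2];
    int subspace = 0;   /* 0 = outside, 1 = inside */

    assert(n > 0 && n <= MAX_POINTS);
    memset(blocked, 0, sizeof blocked);

    /* Relabel so that the first path becomes 0, 1, ..., n-1 */
    for (int i = 0; i < n; i++) {
        path[i] = -1;
        for (int j = 0; j < n; j++)
            if (path1[j] == path2[i]) { path[i] = j; break; }
        assert(path[i] >= 0);
    }

    for (int step = 0; step < n; step++) {
        int cur = path[step];
        int next = path[(step + 1) % n];

        /* This edge would cross an earlier edge on the same side */
        if (blocked[cur][next][subspace])
            return 0;

        /* Points strictly between cur and next can no longer reach
         * points strictly between next and cur on this side */
        for (int i = (cur + 1) % n; i != next; i = (i + 1) % n)
            for (int j = (next + 1) % n; j != cur; j = (j + 1) % n)
                blocked[i][j][subspace] = blocked[j][i][subspace] = 1;

        subspace = 1 - subspace;    /* outside <-> inside */
    }
    return 1;
}
```

On the question's inputs this returns 1 for Example 1 and 0 for Example 2, and 0 for the illustrated case A = {0,1,2,3}, B = {0,2,1,3}.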
Old answer:
I am not sure I understood this problem correctly, but if I have, I think this can be reduced to planarity testing. I will use your example 2 for the numbers:
Create graph G1 from the first array; it has edges 2-3, 3-0, 0-10, 10-8, 8-11, 11-2
Create graph G2 from the second array; 10-2, 2-3, 3-8, 8-11, 11-0, 0-10
Create graph G whose set of edges is the union of the sets of edges of G1 and G2: 2-3, 3-0, 10-8, 8-11, 11-2, 10-2, 3-8, 11-0, 0-10
Check if G is planar.
This is if I correctly interpreted the question in the sense that the second path must not cross itself but must not cross the first path either (except for the unavoidable 1 intersection per vertex due to shared vertices). If this is not the case, then Example 2 does have solutions (note how the 11-2 and 8-10 edges are crossed by the second path).

Divide times in two boxes and find the minimum difference

Started to learn recursion and I am stuck with this simple problem. I believe that there are more optimized ways to do this but first I'm trying to learn the bruteforce approach.
I have bag A and bag B and have n items each one with some time (a float with two decimal places). The idea is to distribute the items by the two bags and obtain the minimum difference in the two bags. The idea is to try all possible outcomes.
I thought only in one bag (lets say bag A) since the other bag will contain all the items that are not in the bag A and therefore the difference will be the absolute value of total times sum - 2 * sum of the items time that are in the bag A.
I'm calling my recursive function like this:
min = total_time;
recursive(0, items_number - 1, 0);
And the code for the function is this:
void recursive(int index, int step, float sum) {
sum += items_time[index];
float difference = fabs(total_time - 2 * sum);
if (min > difference) {
min = difference;
}
if (!(min == 0.00 || step == 1 || sum > middle_time)) {
int i;
for (i = 0; i < items_number; i++) {
if (i != index) {
recursive(i, step - 1, sum);
}
}
}
}
Imagine I have 4 items with the times 1.23, 2.17 , 2.95 , 2.31
I'm getting the result 0.30. I believe that this is the correct result, but I'm almost certain that if it is, it is pure chance, because if I try bigger cases the program stops after a while. Probably because the recursion tree gets too big.
Can someone point me in some direction?
Okay, after the clarification, let me (hopefully) point you to a direction:
Let's assume that you know what n is, as mentioned in "n items". In your example there were 4 items, that is, 2n = 4 and n = 2. Let's pick another n, let it be 3 this time, and our times shall be:
1.00
2.00
3.00
4.00
5.00
6.00
Now, we can already tell what the answer is; what you had said is all correct, optimally each of the bags will have their n = 3 times summed up to middle_time, which is 21 / 2 = 10.5 in this case. Since integers may never sum up to numbers with decimal points, 10.5 : 10.5 may never be achieved in this example, but 10 : 11 can, and you can have 10 through 6.00 + 3.00 + 1.00 (3 elements), so... yeah, the answer is simply 1.
How would you let a computer calculate it? Well; recall what I said at the beginning:
Let us assume that you know what n is.
In that case a naive programmer would probably simply put all those inside 2 or 3 nested for loops. 2 if he/she knew that the other half will be determined when you pick a half (by simply fixing the very first element in our group, since that element is to be included in one of the groups), like you also know; 3 if he/she didn't know that. Let's make it with 2:
...
float difference;
int i;
for ( i = 1; i < items_number; i++ ) {
    int j;
    for ( j = i + 1; j < items_number; j++ ) {
        /* the group is item 0, item i and item j */
        sum = items_time[0] + items_time[i] + items_time[j];
        difference = fabs( total_time - 2 * sum );
        if ( min > difference ) {
            min = difference;
        }
    }
}
...
Let me comment on the code a little for faster understanding: on the first cycle, it will add up the 0th time, the 1st time and then the 2nd time, as you can see; then it will do the same check you had made (calculate the difference and compare it with min). Let us call this the 012 group. The next group to be checked will be 013, then 014, then 015; then 023, and so on... Each possible combination that splits the 6 into two 3s will be checked.
This operation shouldn't be tiresome for the computer at all. Even with this simple approach, the maximum number of tries will be the number of combinations of 3 you can have with 6 unique elements, divided by 2. In maths, people denote this as C(6, 3), which evaluates to (6 * 5 * 4) / (3 * 2 * 1) = 20; divided by 2, so it's 10.
My guess is that the computer wouldn't make it a problem even if n was 10, making the amount of combinations as high as C(20, 10) / 2 = 92 378. It would, however, be a problem for you to write down 9 nested for loops by hand...
Anyway, the good thing is, you can recursively nest these loops. Here I will end my guidance. Since you apparently are studying for the recursion already, it wouldn't be good for me to offer a solution at this point. I can assure you that it is do-able.
Also the version I have made on my end can do it within a second for up to items_number = 22, without having made any optimizations; simply with brute force. That makes 352 716 combinations, and my machine is just a simple Windows tablet...
Your problem is called the Partition Problem. It is NP-hard and after some point, it will take a very long time to complete: the tree gets exponentially bigger as the number of cases to test grows.
The partition problem is well known and well documented over the internet. There exist some optimized solutions.
Your approach is not the naive brute-force approach, which would just walk through the list of items and put each one into bag A and bag B recursively, choosing the case with the minimum difference, for example:
double recurse(double arr[], int n, double l, double r)
{
double ll, rr;
if (n == 0) return fabs(l - r);
ll = recurse(arr + 1, n - 1, l + *arr, r);
rr = recurse(arr + 1, n - 1, l, r + *arr);
if (ll > rr) return rr;
return ll;
}
(This code is very naive: it doesn't quit early on clearly non-optimal cases, and it also wastes time by calculating every case twice with bags A and B swapped. It is brute force, however.)
Your maximum recursion depth is the number of items n, and the recursive function is called 2^(n+1) - 1 times.
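As one possible refinement of that brute force, the sketch below fixes the first item in bag A so that each split is not tried a second time with the bags swapped. The names `best_split` and `min_difference` are made up for this example; it is still exponential and makes no attempt at pruning.

```c
#include <assert.h>

/* Smallest |a - b| reachable by putting each of the remaining n items
 * into bag A (running total a) or bag B (running total b). */
static double best_split(const double *items, int n, double a, double b)
{
    if (n == 0) {
        double d = a - b;
        return d < 0 ? -d : d;
    }
    double with_a = best_split(items + 1, n - 1, a + items[0], b);
    double with_b = best_split(items + 1, n - 1, a, b + items[0]);
    return with_a < with_b ? with_a : with_b;
}

/* Puts the first item into bag A up front, which halves the work
 * because mirrored splits are never generated. */
double min_difference(const double *items, int n)
{
    assert(n > 0);
    return best_split(items + 1, n - 1, items[0], 0.0);
}
```

For the four times in the question (1.23, 2.17, 2.95, 2.31) this gives 0.30 up to floating-point rounding, and for {8.0, 2.4, 2.4, 2.8} it gives 0.4.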
In your code, you can put the same item into a bag over and over:
for (i = 0; i < items_number; i++) {
    if (i != index) {
        recursive(i, step - 1, sum);
    }
}
This loop prevents you from treating the current item, but will happily treat items that have been put into the bag in earlier recursions for a second (or third) time. If you want to use that approach, you must keep a state of which item is in which bag.
Also, I don't understand your step. You start with step - 1 and stop recursion when step == 1. That means you are considering n - 2 items. I understand that the other items are in the other bag, but that's a weird condition that won't let you find the solution to, say, {8.0, 2.4, 2.4, 2.8}.

Generating a random cubic graph with uniform probability (or less)

While this may look like homework, I assure you it's not. It stems from some homework assignment I did, though.
Let's call an undirected graph without self-edges "cubic" if every vertex has degree exactly three. Given a positive integer N I'd like to generate a random cubic graph on N vertices. I'd like for it to have uniform probability, that is, if there are M cubic graphs on N vertices the probability of generating each one is 1/M. A weaker condition that is still fine is that every cubic graph has non-zero probability.
I feel there's a quick and smart way to do this, but so far I've been unsuccessful.
I am a bad coder, please bear with this awful code:
PRE: edges = (3*nodes)/2, nodes is even, the constants are selected in such a way that the hash works (BIG_PRIME is bigger than edges, SMALL_PRIME is bigger than nodes, LOAD_FACTOR is small).
void random_cubic_graph() {
int i, j, k, count;
int *degree;
char guard;
count = 0;
degree = (int*) calloc(nodes, sizeof(int));
while (count < edges) {
/* Try a new edge at random */
guard = 0;
i = rand() % nodes;
j = rand() % nodes;
/* Checks if it is a self-edge */
if (i == j)
guard = 1;
/* Checks that the degrees are 3 or less */
if (degree[i] > 2 || degree[j] > 2)
guard = 1;
/* Checks that the edge was not already selected with an hash */
k = 0;
while(A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] != 0) {
if (A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] % SMALL_PRIME == j)
if ((A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] - j) / SMALL_PRIME == i)
guard = 1;
k++;
}
if (guard == 0)
A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] = hash(i,j);
k = 0;
while(A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] != 0) {
if (A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] % SMALL_PRIME == i)
if ((A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] - i) / SMALL_PRIME == j)
guard = 1;
k++;
}
if (guard == 0)
A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] = hash(j,i);
/* If all checks were passed, increment the count, print the edge, increment the degrees. */
if (guard == 0) {
count++;
printf("%d\t%d\n", i, j);
degree[i]++;
degree[j]++;
}
}
}
The problem is that its final edge that has to be selected might be a self-edge. That happens when N - 1 vertices have already degree 3, only 1 has degree 1. Thus the algorithm might not terminate. Moreover, I'm not entirely convinced that the probability is uniform.
There's probably much to improve in my code, but can you suggest a better algorithm to implement?
Assume N is even. (Otherwise there cannot be a cubic graph on N vertices).
You can do the following:
Take 3N points and divide them into N groups of 3 points each.
Now pair up these 3N points randomly (note: 3N is even), i.e. marry the points off two at a time at random to form 3N/2 marriages.
If there is a pairing between group i and group j, create an edge between i and j. This gives a graph on N vertices.
If this random pairing does not create any multiple edges or loops, you have a cubic graph.
If not try again. This runs in expected linear time and generates a uniform distribution.
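The steps above can be sketched in C roughly as follows. The function name `random_cubic_graph`, the `MAX_N` limit and the adjacency-matrix output are assumptions of this sketch, and `rand() % k` has a slight modulo bias, so the result is only approximately uniform unless a better random source is substituted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_N 64   /* arbitrary limit chosen for this sketch */

/* Fills adj (n x n, row-major, 0/1) with a random simple cubic graph.
 * n must be even and at least 4. Returns the number of pairings tried. */
int random_cubic_graph(int n, unsigned char *adj)
{
    int points[3 * MAX_N];
    int tries = 0;

    assert(n >= 4 && n <= MAX_N && n % 2 == 0);
    for (;;) {
        int ok = 1;

        tries++;
        memset(adj, 0, (size_t)n * n);
        for (int i = 0; i < 3 * n; i++)
            points[i] = i;      /* point p belongs to group p / 3 */

        /* Fisher-Yates shuffle; consecutive entries form a random pairing */
        for (int i = 3 * n - 1; i > 0; i--) {
            int r = rand() % (i + 1);
            int tmp = points[i];
            points[i] = points[r];
            points[r] = tmp;
        }

        for (int k = 0; k < 3 * n && ok; k += 2) {
            int u = points[k] / 3;
            int v = points[k + 1] / 3;

            if (u == v || adj[u * n + v])
                ok = 0;         /* loop or multiple edge: reject and retry */
            else
                adj[u * n + v] = adj[v * n + u] = 1;
        }
        if (ok)
            return tries;
    }
}
```

Rejecting the whole pairing, rather than redrawing only the offending pair, is what keeps the accepted graphs equally likely; repairing a pairing locally would skew the distribution.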
Note: all cubic graphs on N vertices are generated by this method (responding to Hamish's comments).
To see this:
Let G be a cubic graph on N vertices.
Let the vertices be, 1, 2, ...N.
Let the three neighbours of j be A(j), B(j) and C(j).
For each j, construct the group of ordered pairs { (j, A(j)), (j, B(j)), (j, C(j)) }.
This gives us 3N ordered pairs. We pair them up: (u,v) is paired with (v,u).
Thus any cubic graph corresponds to a pairing and vice versa...
More information on this algorithm and faster algorithms can be found here: Generating Random Regular Graphs Quickly.
Warning: I make a lot of intuitive-but-maybe-wrong claims in this answer. You should definitely prove them if you intend to use this idea.
Enumerating Cubic Graphs
When dealing with a random choice, a good starting point is to figure out how to enumerate over all of your possible elements. This might reveal some of the structure, and lead you to an algorithm.
Here is my strategy for enumerating cubic graphs: pick the first vertex, and iterate over all possible choices of three adjacent vertices. During those iterations, recurse on the next vertex, with the caveat that you keep track of how many edges are needed for each vertex degree to reach 3. Continue in that fashion until the lowest level is reached. Now you have your first cubic graph. Undo the recently added edges and continue to the next possibility until there are none left. There are a few implementation details you need to consider, but generally straight-forward.
Generalize Enumeration into Choice
Once you can enumerate all the elements, it is trivial to make a random choice. For example, you can scan the list once to compute its size, then pick a random number in [0, size), then scan the sequence again to get the element at that offset. This is incredibly inefficient, taking at LEAST time proportional to the number of cubic graphs (which grows faster than exponentially in n), but it works.
Sacrifice Uniform Probability for Efficiency
The obvious speed-up here is to make random edge choices at each level, instead of iterating over each possibility. Unfortunately, this will favor some graphs because of how your early choices affect the availability of later choices. Taking into account the need to track the remaining free vertices, you should be able to achieve O(n log n) time and O(n) space. Significantly better than the enumerating algorithm.
...
It's probably possible to do better. Probably a lot better. But this should get you started.
Another term for cubic graph is 3-regular graph or trivalent graph.
Your problem needs a little more clarification because "the number of cubic graphs" could mean the number of cubic graphs on 2n nodes that are non-isomorphic to one another (unlabelled graphs) or the number of cubic graphs on 2n labelled nodes. The former is given by integer sequence A005638, and it is likely a non-trivial problem to uniformly pick a random isomorphism class of cubic graphs efficiently (i.e. not listing them all out and then picking one class). The latter is given by A002829.
There is an article on Wikipedia about random regular graphs that you should take a look at.
