DFS: Existence of Spanning Tree - c

Can a depth-first search determine whether a graph is connected, given access to arrays of edges and vertices of unknown size, with only the starting vertex as input?
struct node {
int parent, rank;
};
typedef struct node node;
struct edge {
int fromvertex, tovertex;
float weight;
};
typedef struct edge edge;
node* nodes;
edge* edges;
int hasspantree(int startvertex)
{
//find spanning tree?
}
Nodes and edges are allocated in a function that runs before the depth-first search, like so:
scanf("%d", nodecount);
scanf("%d", edgecount);
if ((nodes = malloc(*nodecount * sizeof(node))) == NULL) {
printf("nodes malloc failed"); exit(1);
}
if((edges = malloc(*edgecount * sizeof(edge))) == NULL) {
printf("edges malloc failed"); exit(1);
}
I can do it given this function declaration:
int hasspantree(int startvertex, int edgecount, int nodecount)
But I'd like to be able to do it with the previous declaration.

The short answer is no. Given only the starting node and two arrays of unknown size, you cannot perform the DFS, because you do not know where the graph data ends and garbage memory begins, so the traversal has no way to know when to stop. The real question is whether the size of the graph (here, the sizes of the two arrays that define it) is explicit or implicit. It is not explicit, since your function takes no size parameters. It can be made implicit if the data structure itself stores the size or marks its own end: for example, you can end each array with a terminator element (a sentinel value, much as a NUL character ends a string). You could also keep the counts in global variables, but that is generally considered poor practice.
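A minimal sketch of the terminator idea, under some loud assumptions: each array gets one extra element whose first field holds a hypothetical `END_MARK` value, vertices are numbered from 0, and edges are treated as undirected. None of this is in the original code; it only shows how `hasspantree(int startvertex)` could recover the sizes on its own.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct node { int parent, rank; } node;
typedef struct edge { int fromvertex, tovertex; float weight; } edge;

/* Hypothetical sentinel (an assumption of this sketch): one extra element
   after the real data whose first field holds END_MARK. */
#define END_MARK INT_MIN

node *nodes;   /* terminated by an element with parent == END_MARK */
edge *edges;   /* terminated by an element with fromvertex == END_MARK */

/* Visit every vertex reachable from v, treating edges as undirected. */
static void dfs(int v, char *visited)
{
    visited[v] = 1;
    for (int e = 0; edges[e].fromvertex != END_MARK; e++) {
        if (edges[e].fromvertex == v && !visited[edges[e].tovertex])
            dfs(edges[e].tovertex, visited);
        else if (edges[e].tovertex == v && !visited[edges[e].fromvertex])
            dfs(edges[e].fromvertex, visited);
    }
}

/* Returns 1 if every vertex is reachable from startvertex, else 0. */
int hasspantree(int startvertex)
{
    int nodecount = 0;
    while (nodes[nodecount].parent != END_MARK)
        nodecount++;                 /* recover the size from the sentinel */
    if (nodecount == 0)
        return 0;

    char *visited = calloc(nodecount, sizeof *visited);
    assert(visited != NULL);
    dfs(startvertex, visited);

    int connected = 1;
    for (int v = 0; v < nodecount; v++)
        if (!visited[v])
            connected = 0;
    free(visited);
    return connected;
}
```

The allocation code would then need to reserve `*nodecount + 1` and `*edgecount + 1` elements and write the sentinel into the last slot. Scanning the whole edge array at every vertex costs O(V·E), so this is meant to illustrate the sentinel, not to be an efficient DFS.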

Related

Managing duplicates in a binary tree with memory efficiency

I have a self balancing key-value binary tree (similar to Tarjan's Zip Tree) where there will be duplication of keys. To ensure O(log N) performance the only thing I can come up with is to maintain three pointers per node; a less than, a greater than, and an "equals". The equals pointer is a pointer to a linked-list of members having the same key.
This seems memory inefficient to me because I'll have an extra 8 bytes per node in the whole tree to handle the infrequent duplicate occurrences. Is there a better way that doesn't involve "cheats" like bit banging the left or right pointers for use as a flag?
When an insertion collides with an existing key, allocate a new buffer and copy the new data into it.
Hash the new data pointer down to one or two bytes. You'll need a hash that only returns zero on zero input!
Store the hash value in your node. This field is zero if there is no collision data, so you stay at O(log KeyCount) for all keys without extra data elements. Your worst case is log KeyCount plus whatever your hashing algorithm costs on lookups, which might be a constant close to one additional step until your table has to be resized.
Obviously, choice of hashing algorithm is critical here. Look for one that is good with pointer values on whatever architecture you are targeting. You may need different hashes for different architectures.
You can carry this even further by using one-byte hash values to select a hash table, and then using the key hash (which can be a larger integer) within that table to find the pointer to the additional data. When a hash table fills up, insert a new one into the parent table. I'll leave the math to you.
Regarding data locality: since the node data are large, you already don't have good locality between the node record and the actual data anyway. This scheme doesn't change that, except where you have multiple data nodes for a particular key, in which case you'd likely take a cache miss getting to the correct index of a variable array embedded in the node. This scheme avoids having to reallocate the nodes on collisions, and probably won't have a severe impact on your cache miss rate.
I usually use this setup when I build a binary search tree; it skips duplicate values from the input array:
#include <stdio.h>
#include <stdlib.h>
#define SIZE 13
typedef struct Node
{
    struct Node * right;
    struct Node * left;
    int value;
}TNode;
typedef TNode * Nodo;
void bst(int data, Nodo * p )
{
    Nodo pp = *p;
    if(pp == NULL)
    {
        pp = (Nodo)malloc(sizeof(struct Node));
        pp->right = NULL;
        pp->left = NULL;
        pp->value = data;
        *p = pp;
    }
    else if(data == pp->value)
    {
        return;
    }
    else if(data > pp->value)
    {
        bst(data, &pp->right);
    }
    else
    {
        bst(data, &pp->left);
    }
}
void displayDesc(Nodo p)
{
    if(p != NULL)
    {
        displayDesc(p->right);
        printf("%d\n", p->value);
        displayDesc(p->left);
    }
}
void displayAsc(Nodo p)
{
    if(p != NULL)
    {
        displayAsc(p->left);
        printf("%d\n", p->value);
        displayAsc(p->right);
    }
}
int main()
{
    int arr[SIZE] = {4,1,0,7,5,88,8,9,55,42,0,5,6};
    Nodo head = NULL;
    for(int i = 0; i < SIZE; i++)
    {
        bst(arr[i], &head);
    }
    displayAsc(head);
    exit(0);
}

Creating a graph by using adjacency matrix

Here is my structure:
struct node{
int V,E;
int **adj;
};
Here is my code to create a graph:
struct node* create()
{
int i,j,x,y;
struct node *G=malloc(sizeof(struct node));
printf("Write the number of vertex and edges\n");
scanf("%d%d",&G->V,&G->E);
G->adj=malloc(sizeof(int)*(G->V * G->V));
if(!G->adj){
printf("Out of memory\n");
return;
}
for(i=0;i<G->V;i++)
for(j=0;j<G->V;j++)
G->adj[i][j]=0;
printf("\nWrite the source node and destination: ");
for(i=0;i<G->E;i++){
scanf("%d%d",&x,&y);
G->adj[x][y]=1;
G->adj[y][x]=1;
}
return(G);
}
and I am storing the pointer returned by this function in another pointer like this:
int main()
{
struct node *G=create();
}
When I run the program, I'm asked for the number of vertices and edges, but as soon as I enter the values, my program crashes. I want to know the reason. Is this because of a memory allocation failure?
C99 style variable length arrays are only useful with local variables or arguments. So, you have the two classic C methods available to implement a 2D array.
Array of pointers:
That's what your struct looks like. With that, you need separate allocations for the pointer array and data:
G->adj = calloc(G->V, sizeof (int*));
assert(G->adj != NULL); /* need assert.h for this */
for (i=0; i<G->V; ++i)
{
    G->adj[i] = calloc(G->V, sizeof (int));
    assert(G->adj[i] != NULL);
}
It's a bit easier on memory to do one bulk allocation of the array data and then use pointer arithmetic to set the G->adj[] pointers, but the above gives the idea that, as far as C is concerned, each row is a separate array.
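The bulk-allocation variant mentioned above could look roughly like this. It is a sketch, not the answerer's code: `alloc_matrix` and `free_matrix` are hypothetical helper names, and `V > 0` is assumed.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a zeroed V x V matrix as two blocks: one array of row pointers
   and one contiguous block of cells. Each row pointer is aimed into the
   cell block with pointer arithmetic. Assumes V > 0. */
int **alloc_matrix(int V)
{
    assert(V > 0);
    int **rows = calloc(V, sizeof *rows);
    assert(rows != NULL);
    int *cells = calloc((size_t)V * V, sizeof *cells);
    assert(cells != NULL);
    for (int i = 0; i < V; ++i)
        rows[i] = cells + (size_t)i * V;
    return rows;
}

/* Release both blocks; rows[0] is the start of the cell block. */
void free_matrix(int **rows)
{
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```

With this in place, `G->adj = alloc_matrix(G->V);` keeps the `G->adj[i][j]` syntax while needing only two allocations instead of `V + 1`.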
Single bulk array:
Just one bulk array, with the cell location computed explicitly on each access. This is what C does internally with nested arrays.
Change the type of adj to just int* and then:
G->adj = calloc(G->V * G->V, sizeof (int));
assert(G->adj != NULL);
That's it. Now when you access an element, use G->adj[i*G->V + j], instead of G->adj[i][j]. Macros may help with readability.
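As a sketch of the macro suggestion, assuming `adj` has been changed to `int*` as described: the `ADJ` macro and the `graph_new` / `graph_add_edge` helpers below are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int V, E;
    int *adj;            /* flat V*V array, row-major */
};

/* Hypothetical helper macro: cell (i, j) of the flat array. */
#define ADJ(G, i, j) ((G)->adj[(size_t)(i) * (G)->V + (j)])

/* Create a graph with V vertices and no edges. Assumes V > 0. */
struct node *graph_new(int V)
{
    assert(V > 0);
    struct node *G = malloc(sizeof *G);
    assert(G != NULL);
    G->V = V;
    G->E = 0;
    G->adj = calloc((size_t)V * V, sizeof *G->adj);
    assert(G->adj != NULL);
    return G;
}

/* Record an undirected edge between x and y (both in 0..V-1). */
void graph_add_edge(struct node *G, int x, int y)
{
    ADJ(G, x, y) = 1;
    ADJ(G, y, x) = 1;
    G->E++;
}
```

Because the macro expands to an lvalue, it can be read and assigned just like `G->adj[i][j]` was in the original code.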

C Function returning pointer to garbage memory [duplicate]

I am writing a program that, given a set of inputs and outputs, figures out what the equation is. The way the program works is by randomly generating binary trees and putting them through a genetic algorithm to see which is the best.
All the functions I have written work individually, but there is either one or two that do not.
In the program I use two structs, one for a node in the binary tree and the other to keep track of how accurate each tree is given the data (its fitness):
struct node {
char value;
struct node *left, *right;
};
struct individual {
struct node *genome;
double fitness;
};
One function I use to randomly create trees is a subtree crossover function, which randomly merges two trees, returning two trees that are sort of a mixture of each other. The function is as follows:
struct node **subtree_crossover(struct node parent1, struct node parent2) {
struct node *xo_nodes[2];
for (int i = 0; i < 2; i++) {
struct node *parent = (i ? &parent2 : &parent1);
// Find the subtree at the crossover point
xo_nodes[i] = get_node_at_index(&parent, random_index);
}
else {
// Swap the nodes
struct node tmp = *xo_nodes[0];
*xo_nodes[0] = *xo_nodes[1];
*xo_nodes[1] = tmp;
}
struct node **parents = malloc(sizeof(struct node *) * 2);
parents[0] = &parent1;
parents[1] = &parent2;
return parents;
}
Another function takes two populations (lists of individuals) and selects the best from both, returning the next population. It is as follows:
struct individual *generational_replacement(struct individual *new_population,
int size, struct individual *old_population) {
int elite_size = 3;
struct individual *population = malloc(sizeof(struct individual) * (elite_size + size));
int i;
for (i = 0; i < size; i++) {
population[i] = new_population[i];
}
for (i; i < elite_size; i++) {
population[i] = old_population[i];
}
sort_population(population);
population = realloc(population, sizeof(struct individual) * size);
return population;
}
Then there is the function that is essentially the main part of the program. This function loops through a population, randomly modifies its individuals, and chooses the best among them across multiple generations. From this, it selects the best individual (the one with the highest fitness) and returns it. It is as follows:
struct individual *search_loop(struct individual *population) {
int pop_size = 10;
int tourn_size = 3;
int new_pop_i = 0;
int generation = 1;
struct individual *new_population = malloc(sizeof(struct individual) * pop_size);
while (generation < 10) {
while (new_pop_i < pop_size) {
// Insert code where random subtrees are chosen
struct node **nodes = subtree_crossover(random_subtree_1, random_subtree_2);
// Insert code to add the trees to new_population
}
population = generational_replacement(new_population, pop_size, population);
// Insert code to sort population by fitness value
}
return &population[0];
}
The issue I am having is that the search_loop function returns a pointer to an individual that is filled with garbage values. To narrow down the causes, I began to comment out code. By commenting out either subtree_crossover() or generational_replacement() the function returns a valid individual. Based on this, my guess is that the error is caused by either subtree_crossover() or generational_replacement().
Obviously, this is a heavily reduced version of the code I am using, but I believe it still will show the error that I am getting. If you would like to view the full source code, look in the development branch of this project: https://github.com/dyingpie1/pony_gp_c/tree/Development
Any help would be greatly appreciated. I have been trying to figure this out for multiple days.
Your subtree_crossover() function is taking two nodes as values. The function will receive copies, which will then live on the stack until the function exits, at which point they will become invalid. Unfortunately, the function later sticks their addresses into an array that it returns. Therefore, the result of subtree_crossover() is going to contain two invalid pointers to garbage data.
You could initialize parents as a struct node * instead of a struct node **, and make it twice the size of a struct node. Then, you could just copy the nodes into the array. This would avoid the issue. Alternatively, you could copy the nodes onto the heap, so that you could return a struct node **. You'd then have to remember to eventually free the copies, though.
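A minimal sketch of the second option (copying the nodes onto the heap), with the crossover logic itself left out. `copy_node`, `make_parents` and `free_parents` are hypothetical names, and the copies are shallow: the child pointers still refer to the original subtrees.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    char value;
    struct node *left, *right;
};

/* Copy one node onto the heap (shallow: children are shared). */
static struct node *copy_node(struct node n)
{
    struct node *p = malloc(sizeof *p);
    assert(p != NULL);
    *p = n;
    return p;
}

/* Return a heap array of two heap-allocated nodes, so the pointers
   remain valid after this function returns. */
struct node **make_parents(struct node parent1, struct node parent2)
{
    struct node **parents = malloc(2 * sizeof *parents);
    assert(parents != NULL);
    parents[0] = copy_node(parent1);
    parents[1] = copy_node(parent2);
    return parents;
}

/* The caller owns the result and must release it eventually. */
void free_parents(struct node **parents)
{
    free(parents[0]);
    free(parents[1]);
    free(parents);
}
```

In `subtree_crossover()`, the last four lines would become `return make_parents(parent1, parent2);`. Passing the parents by pointer instead of by value is another way to avoid the dangling addresses, at the cost of modifying the caller's trees in place.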

how to check if the pointer of a particular data structure is pointing to another node of the same data structure

This is the structure I defined for my B+ tree. The function shown below returns the number of non-NULL pointers in a node. The problem I'm encountering is that, when checked in gdb, one of the pointers holds:
7:i = 4
8: node->pointer[i] = (struct bptree *) 0x1
which is neither a NULL pointer nor a pointer to a Bptree node, when the answer should actually be 3. Is there a way to check whether a pointer is pointing to a Bptree structure, or to whatever data structure it is supposed to point to? N is the order of the B+ tree.
struct bptree
{
char **key;
int nokeys;
struct bptree* pointer[N];
int root;
int leaf;
struct bptree* parent;
};
typedef struct bptree Bptree;
int noofpointers(Bptree *node)
{
int i = 0;
if(node == NULL)
return i;
while(node->pointer[i] != NULL)
i++;
return i;
}
Not in general, no. Pointers don't carry any additional meta information besides their value, at run-time. You cannot inspect memory and figure out what it is holding (you can try by decorating your data with magic numbers).
Also, your code is scary since if all the pointers are used, it will run out of bounds. You must make sure i is less than N in the loop, and exit when all pointers have been checked.
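A bounded version of the loop might look like the sketch below. The value `N = 4` is arbitrary, and `bptree_new` is a hypothetical constructor added on the assumption that the stray `0x1` comes from a slot that was never initialized, which the question does not confirm.

```c
#include <assert.h>
#include <stdlib.h>

#define N 4   /* order of the B+ tree; 4 is an arbitrary value for this sketch */

struct bptree {
    char **key;
    int nokeys;
    struct bptree *pointer[N];
    int root;
    int leaf;
    struct bptree *parent;
};
typedef struct bptree Bptree;

/* Hypothetical constructor: every pointer slot starts out as NULL. */
Bptree *bptree_new(void)
{
    Bptree *node = malloc(sizeof *node);
    assert(node != NULL);
    node->key = NULL;
    node->nokeys = 0;
    node->root = 0;
    node->leaf = 1;
    node->parent = NULL;
    for (int i = 0; i < N; i++)
        node->pointer[i] = NULL;
    return node;
}

/* Count leading non-NULL child pointers without reading past pointer[N-1]. */
int noofpointers(const Bptree *node)
{
    int i = 0;
    if (node == NULL)
        return 0;
    while (i < N && node->pointer[i] != NULL)
        i++;
    return i;
}
```

Testing `i < N` first matters: the short-circuit `&&` guarantees `pointer[i]` is never evaluated once `i` reaches `N`.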

Dijkstra's algorithm (updating the heap)

I am implementing Dijkstra's algorithm using a heap data structure. I also use an array that keeps track of the "probable minimum distances" of the nodes. The problem is when I am updating the array, how to update the corresponding values in the heap?
ok here's the code
typedef struct temp
{
int nodeTag;
int weight;
struct temp *next;
}myStruct; //this structure corresponds to the elements of the linked list
typedef struct temp *link;
typedef struct
{
int nodeTag; //each node has an integer nodeTag associated with it
link l;
}head; //the head of the elements of the adjacency list
typedef struct {
head *adjList;
int numNodes;
int numEdges;
} Graph;
typedef struct {
int possibleMinWeight;
int minFound; //minFound==1 if true min is found
} dummy;
dummy *dijkstraSSSP(Graph G, int nodeTag)
{
    minHeap H=createEmptyHeap(G.numNodes);
    for(int i=0;i<G.numNodes;i++)
    {
        if(i!=nodeTag)
            H.A[i].priority=INFINITY;
        else
            H.A[i].priority=0;
        H.A[i].nodeTag=i;
    }
    convertIntoHeap(H);
    dummy *A=malloc(sizeof(dummy)*G.numNodes);
    A[nodeTag-1].possibleMinWeight=0;
    A[nodeTag-1].minFound=1;
    while(!isEmpty(H))
    {
        element e=findMin(H); H=deleteMin(H);
        A[e.nodeTag-1].minFound=1;
        link l=G.adjList[e.nodeTag-1].l;
        while(l!=NULL)
        {
            if(A[l->nodeTag-1].minFound==0) //its true minimum distance is yet to be found
            {
                if(A[l->nodeTag-1].possibleMinWeight>A[e.nodeTag-1].possibleMinWeight+(l->weight))
                    A[l->nodeTag-1].possibleMinWeight=A[e.nodeTag-1].possibleMinWeight+(l->weight);
            }
            l=l->next;
        }
    }
    return A;
}
To write DecreaseKey, you need the priority-queue implementation to maintain a map from nodeTags to locations in the queue. That means updating this map whenever the binary-heap data structure calls for a swap or perhaps choosing a pointer-based implementation like pairing heaps that never moves nodes in memory.
Unless you have a large, somewhat dense graph, DecreaseKey isn't worth it; just insert a node multiple times and ignore duplicate results from ExtractMin. (To detect duplicates: every time I've implemented Dijkstra, I've needed either the distances or the tree. In my programming languages of choice, it's easy enough to shake loose a bit from either array to remember whether each node has been visited.)
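The "insert multiple times and ignore duplicates" approach can be sketched as follows. This is not the asker's heap API: `struct adj`, `heap_push`, `heap_pop` and `dijkstra` are hypothetical names, node tags are 0-based, weights are assumed non-negative and small enough not to overflow `int`, and `numEdges` is assumed to count every adjacency-list entry.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

struct adj {                 /* one adjacency-list entry, like the question's list element */
    int nodeTag;             /* neighbour, 0-based in this sketch */
    int weight;              /* non-negative edge weight */
    struct adj *next;
};

typedef struct { int priority, nodeTag; } element;
typedef struct { element *A; int size; } minHeap;   /* minimal binary heap */

static void heap_push(minHeap *H, element e)
{
    int i = H->size++;
    while (i > 0 && H->A[(i - 1) / 2].priority > e.priority) {
        H->A[i] = H->A[(i - 1) / 2];        /* sift the hole up */
        i = (i - 1) / 2;
    }
    H->A[i] = e;
}

static element heap_pop(minHeap *H)
{
    element top = H->A[0], last = H->A[--H->size];
    int i = 0;
    for (;;) {
        int c = 2 * i + 1;
        if (c >= H->size) break;
        if (c + 1 < H->size && H->A[c + 1].priority < H->A[c].priority) c++;
        if (H->A[c].priority >= last.priority) break;
        H->A[i] = H->A[c];                  /* sift the hole down */
        i = c;
    }
    H->A[i] = last;
    return top;
}

/* adj[v] heads v's neighbour list. Returns a malloc'd array of distances
   from source, with INT_MAX marking unreachable nodes. Assumes numNodes > 0. */
int *dijkstra(struct adj **adj, int numNodes, int numEdges, int source)
{
    int *dist = malloc(numNodes * sizeof *dist);
    char *done = calloc(numNodes, 1);
    /* Each list entry causes at most one push, plus one for the source. */
    minHeap H = { malloc((numEdges + 1) * sizeof(element)), 0 };
    assert(dist != NULL && done != NULL && H.A != NULL);

    for (int v = 0; v < numNodes; v++)
        dist[v] = INT_MAX;
    dist[source] = 0;
    heap_push(&H, (element){ 0, source });

    while (H.size > 0) {
        element e = heap_pop(&H);
        if (done[e.nodeTag])
            continue;                       /* stale duplicate: ignore it */
        done[e.nodeTag] = 1;
        for (struct adj *l = adj[e.nodeTag]; l != NULL; l = l->next) {
            int d = e.priority + l->weight;
            if (!done[l->nodeTag] && d < dist[l->nodeTag]) {
                dist[l->nodeTag] = d;
                heap_push(&H, (element){ d, l->nodeTag });  /* instead of DecreaseKey */
            }
        }
    }
    free(done);
    free(H.A);
    return dist;
}
```

The `done` array plays the role of the "visited" bit mentioned above, and the heap never needs to locate an existing entry, so no position map is required. The trade-off is a heap that can grow to one entry per edge rather than one per node.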
