I have a graph structure in C and want to make a deep copy of it (including nodes and edges).
The structure looks like this:
struct li_node {
struct li_node *next, *prev;
};
struct li_list {
struct li_node n;
};
struct gr_graph {
struct li_list nodes;
int nodecount;
};
struct gr_node {
struct li_node node;
struct gr_graph *graph;
int pred_count, succ_count;
struct li_list pred, succ;
};
struct gr_edge {
struct li_node succ, pred;
struct gr_node *from, *to;
unsigned long marks;
};
These structs are never used on their own; they are always embedded ("inherited") in another struct, like this:
struct ex_node {
struct gr_node _; // "Superclass"
int id;
struct ex_node *union_find_parent;
...
};
Is there an elegant way of creating a deep copy of such a structure, including updating references to the copies?
Note: Members of nested structs do not point to the root struct that contains their target, but to the related nested struct (for instance, ex_node._.pred.n.next points to an ex_edge._.pred). This implies tedious pointer arithmetic when these must be updated.
My solution up to now is:
1. Memcopy all structs
2. Iterate through all copies
3. Call a bunch of macros for all fields that contain references (due to missing RTTI in C, I probably won't get around this)
The macros
a. use offsetof to calculate the address of the root struct
b. retrieve the address of the copied equivalent
c. use offsetof to make the pointer point to the correct nested struct
Is there any easier way to do this? I am also afraid that I will forget to add a macro call when I add more fields.
I don't think you can do a deep copy per se, because the copied pointers would still hold the addresses of the original objects. The best way I can think of is to allocate a new graph structure, copy the data (not the pointers), and build the copy up from there by mallocing the new nodes and adjusting the pointers in the ex_node structures. That would be a more thorough solution...
Hope this helps,
Best regards,
Tom.
Sounds ok. My $0.02:
Not sure why you need both li_list and li_node. Further, don't you need a data member for li_node?
The overall structure looks a bit complex (of course, I don't know your requirements) and smells of C++ style design (pardon me, if I am wrong)
memcpy is not required. A simple assignment suffices.
Define a function pointer member for each structure with pointer members, so that you can do:
struct foo;
typedef void (*foo_copy)(const struct foo *src, struct foo *dst);
struct foo {
int datum;
int *p;
foo_copy pfoo;
};
void foo_cp(const struct foo *src, struct foo *dst)
{
*dst = *src; // copy non-pointer data (and the pfoo function pointer)
dst->p = malloc(sizeof *dst->p); // give dst its own storage
if (dst->p)
*dst->p = *src->p; // copy the pointed-to value, not the pointer
}
// somewhere else
struct foo s;
// initialize s, including s.pfoo = foo_cp
struct foo *t = malloc(sizeof *t);
s.pfoo(&s, t);
and nested types call appropriate member copy methods ...
Memcpy all structs and create a sorted list in which each entry contains the address of an original struct and the address of its copy.
Now iterate through all copies. For each pointer member in each copied struct, search for the pointer in the sorted list and replace it with the address of its copy.
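The two steps above can be sketched as follows. This is only an illustration under simplifying assumptions: `struct item`, `struct map_entry`, `translate` and `copy_items` are hypothetical names, each item has a single pointer member that points at the start of another item (no nested-struct offsets), and `n` is greater than zero. The sorted table is built with the standard `qsort` and searched with `bsearch`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified node: one payload field and one reference. */
struct item {
    int id;
    struct item *peer;            /* NULL or any item in the copied set */
};

/* One row of the translation table: original address -> copy address. */
struct map_entry {
    const struct item *orig;
    struct item *copy;
};

static int cmp_entry(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)((const struct map_entry *)a)->orig;
    uintptr_t y = (uintptr_t)((const struct map_entry *)b)->orig;
    return (x > y) - (x < y);
}

/* Finds the copy that corresponds to an original pointer. */
static struct item *translate(const struct map_entry *map, size_t n,
                              const struct item *orig)
{
    struct map_entry key = { orig, NULL };
    const struct map_entry *hit;

    if (orig == NULL)
        return NULL;
    hit = bsearch(&key, map, n, sizeof *map, cmp_entry);
    return hit ? hit->copy : NULL;
}

/* Copies n items; returns a malloc'ed array of copies, or NULL on failure. */
struct item **copy_items(struct item *const *src, size_t n)
{
    struct item **dst = malloc(n * sizeof *dst);
    struct map_entry *map = malloc(n * sizeof *map);
    size_t i;

    assert(src != NULL && n > 0);
    if (dst == NULL || map == NULL) {
        free(dst);
        free(map);
        return NULL;
    }
    for (i = 0; i < n; i++) {             /* step 1: memcpy and record */
        dst[i] = malloc(sizeof *dst[i]);
        if (dst[i] == NULL) {
            while (i--)
                free(dst[i]);
            free(dst);
            free(map);
            return NULL;
        }
        memcpy(dst[i], src[i], sizeof *dst[i]);
        map[i].orig = src[i];
        map[i].copy = dst[i];
    }
    qsort(map, n, sizeof *map, cmp_entry);
    for (i = 0; i < n; i++)               /* step 2: redirect pointers */
        dst[i]->peer = translate(map, n, dst[i]->peer);
    free(map);
    return dst;
}
```

For pointers into nested structs, as in the question, the lookup would additionally have to find the containing original and re-apply the member offset to the copy.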
Yes, there is an elegant solution using spanning trees and the decorator pattern.
-First, build a spanning tree of the graph. You can use a DFS (Depth First Search)
or a BFS (Breadth First Search) to achieve this. Use the decorator pattern to give
each visited node a unique identifier.
-Next, (or at the same time) traverse the spanning tree from start to finish
and begin building your second tree by allocating new nodes and connecting
the edges that form the spanning tree.
-Finally, take one more pass through the spanning tree, and using the synchronized
identifiers, connect the remaining missing edges in the new graph, so that they match the
connectivity of the old graph.
(e.g. If node5 in graph1 has edges connecting to node7 and node 11, then
use the ordering of graph2 to connect its node5 to its node7 and 11.)
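A compact variation of this idea might look like the sketch below. It is not the exact three-pass procedure: a depth-first pass numbers the nodes, and a single second pass then allocates the copies and reconnects all edges by identifier. The names `struct vnode`, `number_nodes` and `copy_graph` are hypothetical, the identifier is stored directly in the node (with -1 meaning "not visited") instead of in a separate decorator, and `cap` is assumed to be at least the number of reachable nodes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: successors are stored as an array of pointers. */
struct vnode {
    int id;                 /* -1 while the node is not numbered */
    int nsucc;
    struct vnode **succ;
};

/* Depth-first numbering; records every newly visited node in order[]. */
static int number_nodes(struct vnode *v, struct vnode **order, int next, int cap)
{
    int i;

    if (v->id >= 0)
        return next;        /* already numbered */
    assert(next < cap);
    v->id = next;
    order[next++] = v;
    for (i = 0; i < v->nsucc; i++)
        next = number_nodes(v->succ[i], order, next, cap);
    return next;
}

/* Returns an array of copies (element 0 is the copy of root), or NULL.
   All successor arrays live in one block that starts at result[0].succ. */
struct vnode *copy_graph(struct vnode *root, int cap)
{
    struct vnode **order, *copies, **pool;
    int n, i, j, edges = 0, used = 0;

    assert(root != NULL && cap > 0);
    order = malloc((size_t)cap * sizeof *order);
    if (order == NULL)
        return NULL;
    n = number_nodes(root, order, 0, cap);      /* pass 1: assign ids */
    for (i = 0; i < n; i++)
        edges += order[i]->nsucc;

    copies = malloc((size_t)n * sizeof *copies);
    pool = malloc((size_t)(edges ? edges : 1) * sizeof *pool);
    if (copies == NULL || pool == NULL) {
        free(copies);
        free(pool);
        copies = NULL;
    } else {
        for (i = 0; i < n; i++) {               /* pass 2: rebuild by id */
            copies[i].id = -1;
            copies[i].nsucc = order[i]->nsucc;
            copies[i].succ = pool + used;
            for (j = 0; j < order[i]->nsucc; j++)
                pool[used++] = &copies[order[i]->succ[j]->id];
        }
    }
    for (i = 0; i < n; i++)
        order[i]->id = -1;                      /* undo the numbering */
    free(order);
    return copies;
}
```

With this layout the caller releases the copy with `free(result[0].succ)` followed by `free(result)`.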
Related
I'd like to understand the difference between using a pointer and a value when it comes to referencing a struct inside another struct.
By that I mean, I can have those two declarations:
struct foo {
int bar;
};
struct fred {
struct foo barney;
struct foo *wilma;
};
It appears I can get the same behavior from both barney and wilma entries, as long as I de-reference accordingly when I access them. The barney case intuitively feels “wrong” but I cannot say why.
Am I just relying on some C undefined behavior? If not, what would be the reason(s) to opt for one style over the other?
The following code shows how I come to the conclusion both use cases are equivalent; neither clang nor gcc complain about anything.
#include <stdio.h>
#include <stdlib.h>
struct a_number {
int i;
};
struct s_w_ptr {
struct a_number *n;
};
struct s_w_val {
struct a_number n;
};
void store_via_ptr(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number *i = malloc(sizeof(i));
i->i = 1;
swp->n = i;
swv->n = *i;
}
void store_via_val(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number j;
j.i = 2;
swp->n = &j;
swv->n = j;
}
int main(void) {
struct s_w_ptr *swp = malloc(sizeof(swp));
struct s_w_val *swv = malloc(sizeof(swv));
store_via_ptr(swp, swv);
printf("p: %d | v: %d\n", swp->n->i, swv->n.i);
store_via_val(swp, swv);
printf("p: %d | v: %d\n", swp->n->i, swv->n.i);
}
It's perfectly valid to have both struct members in a struct and have pointers to struct in a struct. They must be used differently but both are legal.
Why have a struct in a struct ?
One reason is to group things together. For instance:
struct car
{
struct motor motor; // a struct with several members describing the motor
struct wheel wheel; // a struct with several members describing the wheels
...
};
struct car myCar = {....initializer...};
myCar.wheel = SomeOtherWheelModel; // Replace wheels in a single assign
myCar.wheel.pressure = 2.1; // Change a single wheel member
Why have a struct pointer in a struct?
One very obvious reason is that it can be used as an array of N structs by dynamically allocating N times the struct size.
Another typical example is linked lists where you have a pointer to a struct of the same type as the struct containing the pointer.
There are several advantages of having a struct in a struct instead of having a pointer to struct in a struct:
It requires less memory allocation. With a pointer to a struct in a struct, memory is needed for the pointer inside the parent struct, and the memory for the child struct has to be allocated separately.
Fewer instructions are typically needed to access the contents of the child struct. For example, consider a program reading the contents of the child struct. If a struct within a struct is used, the program applies an offset to the address of the variable and reads the contents of that memory location. With a pointer to a struct in a struct, the program applies an offset to the parent struct's address, fetches the address of the child struct, and only then reads the contents of the child struct from memory.
Only one variable must be declared and a single initializer is used. With a pointer, separate variables need to be declared for the parent and the child struct, and if initializers are used, each needs its own.
Where dynamic memory allocation is used, memory must be freed for only one object. With a pointer, the developer must remember to deallocate both the child and the parent objects before the variables fall out of scope.
Lastly, if a pointer is used, as in the example, null checking may be necessary to ensure that the pointer to the child struct has been initialized.
The primary advantages of having a pointer to a struct in a struct would be if you needed to replace the child struct with another struct within the program, such as a linked list. A less common case might be if the child struct can be of more than one type. In this case you might use a void * type for the child. I may also use a pointer within a struct to point to an array in case where the array pointed to may vary in size between instances.
Based on my understanding of the case shown in the example above, I would be inclined to use a struct in a struct, since both objects are of fixed size and type and since it appears that they would not need to be separated.
C structures can be used to group related data, such as the title of a book, its author, its assigned book number, and so on. But much of what we use structures for is creating data structures (in a different sense of the word “structure”) in memory.
Consider that the book’s author has a name, a date of birth, other biographical information, a list of books they have written, and more. We could include in the struct book a struct author that would contain all this information. But, if the author has written a hundred books, we could have 100 copies of all that information, one copy in each struct book. Further, we cannot continue the “contain the data inside the structure directly” model with the struct author, because it cannot contain a struct book for each book the author publishes if those struct book members also have to contain the struct author for the author—every object would have to contain itself.
It is more efficient to create one struct author and have each struct book for that author link to that struct author.
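As a small illustration of that layout, the sketch below lets several books share one author record through a pointer. The member names and the helper `count_books_by` are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

struct author {
    const char *name;
    int birth_year;
};

struct book {
    const char *title;
    const struct author *author;   /* shared record, not an embedded copy */
};

/* Counts the books in an array that link to the given author record. */
size_t count_books_by(const struct book *books, size_t n,
                      const struct author *who)
{
    size_t i, count = 0;

    assert(books != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (books[i].author == who)
            count++;
    return count;
}
```

Updating the author's data in one place is then visible through every book that points to it.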
Another example is that we use pointers to create data structures for efficient access to data. If we are reading data for thousands of items and want to keep them sorted by name, one option is to allocate memory for some number of structures, read the data, and sort the data. When new data is read and we have used all the memory we allocated, we allocate new memory, copy all the old data to the new memory if necessary, and move some of the data so we can insert the new data in its proper place. However, we have many better options than that. We can use linked lists, binary trees, other kinds of trees, and hash tables.
These data structures effectively require using pointers. A binary tree will have a root node, and each node contains two pointers, one to a subtree of nodes that are earlier than it in the sorting order and another to a subtree of nodes that are later than it. We can look up items in the tree by following pointers to earlier or later nodes to find the right position. And we can insert items by changing a few pointers. If the tree happens to become unbalanced, we can rearrange nodes in the tree by changing pointers. The bulk of the data in the nodes does not have to be changed or copied, just some pointers.
We can also use pointers to have multiple structures for the same data. All the data about books could be stored in one place, and a tree ordered by name could contain nodes in which each node contained a pointer to the book structure and two pointers to subtrees. We could have one tree like this ordered by title of the book and another tree ordered by the name of the author and another tree ordered by the assigned book number. Then we can efficiently look up a book by title or author or number, but there is only one master copy of the complete book data, in the struct book objects. The look-up data is in the tree, which contains only pointers. That is much more efficient than copying all of the struct book data for each tree.
So the reason we choose between structures and pointers as members is not whether the C syntax allows us to refer to the data; we can get to the data in both cases. The reason is that one method embeds data, which is inflexible and requires copying, while the other is flexible and efficient.
Let's first consider this function
void store_via_ptr(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number *i = malloc(sizeof(i));
i->i = 1;
swp->n = i;
swv->n = *i;
}
This declaration
struct a_number *i = malloc(sizeof(i));
is equivalent to the following declaration
struct a_number *i = malloc(sizeof( struct a_number * ));
So in general the function can invoke undefined behavior when sizeof( struct a_number ) is greater than sizeof( struct a_number * ).
It seems you mean
struct a_number *i = malloc(sizeof( *i ) );
^^^
If you split the function into two functions, one for each of its parameters, like
void store_via_ptr1( struct s_w_ptr *swp ) {
struct a_number *i = malloc(sizeof( *i ) );
i->i = 1;
swp->n = i;
}
and
void store_via_ptr2( struct s_w_val *swv ) {
struct a_number *i = malloc(sizeof( *i));
i->i = 1;
swv->n = *i;
}
then after the first function returns, the code that owns the object pointed to by swp must remember to free the allocated memory. Otherwise there will be a memory leak.
The second function already produces a memory leak because the allocated memory was not freed.
Now let's consider the second function
void store_via_val(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number j;
j.i = 2;
swp->n = &j;
swv->n = j;
}
Here the pointer swp->n will point to a local object j. So after exiting the function this pointer will be invalid because the pointed object will not be alive.
So both functions are incorrect. Instead you could write the following functions
int store_via_ptr(struct s_w_ptr *swp ) {
swp->n = malloc( sizeof( *swp->n ) );
int success = swp->n != NULL;
if ( success ) swp->n->i = 1;
return success;
}
and
void store_via_val( struct s_w_val *swv ) {
swv->n.i = 2;
}
Whether to embed a whole object of a structure type in another structure, or to use a pointer to such an object, depends on the design and on the context in which the objects are used.
For example consider a structure struct Point
struct Point
{
int x;
int y;
};
In this case if you want to declare a structure struct Rectangle then it is natural to define it like
struct Rectangle
{
struct Point top_left;
struct Point bottom_right;
};
On the other hand, if you have a two-sided singly-linked list then it can look like
struct Node
{
int value;
struct Node *next;
};
struct List
{
struct Node *head;
struct Node *tail;
};
Two problems:
In store_via_ptr you allocate memory for i dynamically. For s_w_val you copy the structure and then discard the pointer, which means the pointer is lost and can't be passed to free later.
In store_via_val you make swp->n point to the local variable j. A variable whose life-time will end when the function returns, leaving you with an invalid pointer.
The first problem might lead to a memory leak (something your simple example program never deals with).
The second problem is worse, since it will lead to undefined behavior when you dereference the pointer swp->n.
Unrelated to that, in the main function you don't need to allocate memory dynamically for the structures. You could just define them as plain structure objects and use the address-of operator & when calling the functions.
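Putting these points together, a corrected version of the program could look roughly like the sketch below. It uses plain structure objects passed with `&`, allocates with `sizeof *ptr`, never stores the address of a local variable, and frees what it allocated. The helper names `release_ptr` and `run_example` are hypothetical; a real program would call `run_example` from `main`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct a_number { int i; };
struct s_w_ptr  { struct a_number *n; };
struct s_w_val  { struct a_number n; };

/* Allocates the pointed-to number; returns 1 on success, 0 on failure. */
int store_via_ptr(struct s_w_ptr *swp)
{
    assert(swp != NULL);
    swp->n = malloc(sizeof *swp->n);    /* size of the object, not the pointer */
    if (swp->n == NULL)
        return 0;
    swp->n->i = 1;
    return 1;
}

/* The embedded member needs no allocation at all. */
void store_via_val(struct s_w_val *swv)
{
    assert(swv != NULL);
    swv->n.i = 2;
}

/* Releases what store_via_ptr allocated. */
void release_ptr(struct s_w_ptr *swp)
{
    free(swp->n);
    swp->n = NULL;
}

/* Returns 0 on success, 1 if the allocation failed. */
int run_example(void)
{
    struct s_w_ptr swp = { NULL };      /* plain objects, no malloc needed */
    struct s_w_val swv = { { 0 } };

    if (!store_via_ptr(&swp))
        return 1;
    store_via_val(&swv);
    printf("p: %d | v: %d\n", swp.n->i, swv.n.i);
    release_ptr(&swp);
    return 0;
}
```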
I'm making an RPN Calculator that uses stacks and queues implemented as a linked list structure. The problem is that the structure needs to be able to handle both symbols and numbers (including a data type I created for mixed numbers). For this my teacher recommended using a void pointer type to store the data for each node as shown here:
typedef struct node {
void* data;
struct node* next;
} Node;
I'm not sure how to
dereference the pointer to use its address value or
add a node with different data types (int, char, or mixedNum)
For this my teacher recommended using a void pointer type to store the data for each node
Don't use void* unless you really need to. It's a bare, untyped pointer to an arbitrary block of memory, and there are many pitfalls and potential bugs when working that way.
the structure needs to be able to handle both symbols and numbers (including a data type I created for mixed numbers).
That sounds like a job for a union, where you have another member tagging the type. Try something like this:
/* A tag for the union type */
typedef enum
{
Number,
Symbol
} NodeType;
/* A Node can contain a symbol or numeric value */
typedef union
{
char* symbol;
float number;
} NodeData;
/* A linked list */
typedef struct Node
{
NodeType type;
NodeData data;
struct Node* next;
} Node;
I suggest you create functions to do things like create a new symbol node, create a numeric node, setting the tag accordingly. You will probably want to strdup the symbol name so you own a copy of the string. A function to free a node (including freeing the string if it is of that type), etc, etc.
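A minimal sketch of such helper functions is shown below. The names `new_number`, `new_symbol` and `free_node` are made up for this illustration, and the symbol string is duplicated with `malloc` and `memcpy` rather than `strdup`, which is not available in every C standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { Number, Symbol } NodeType;

typedef union {
    char *symbol;
    float number;
} NodeData;

typedef struct Node {
    NodeType type;
    NodeData data;
    struct Node *next;
} Node;

/* Creates a numeric node in front of `next`; returns NULL on failure. */
Node *new_number(float value, Node *next)
{
    Node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->type = Number;
    n->data.number = value;
    n->next = next;
    return n;
}

/* Creates a symbol node that owns a private copy of `name`. */
Node *new_symbol(const char *name, Node *next)
{
    Node *n;
    size_t len;

    assert(name != NULL);
    len = strlen(name) + 1;
    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data.symbol = malloc(len);
    if (n->data.symbol == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->data.symbol, name, len);
    n->type = Symbol;
    n->next = next;
    return n;
}

/* Frees one node; the tag decides whether a string must be freed too. */
void free_node(Node *n)
{
    if (n == NULL)
        return;
    if (n->type == Symbol)
        free(n->data.symbol);
    free(n);
}
```

Code that reads a node checks `type` first and only then touches the matching union member.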
As others have already mentioned, a union is a better choice for this situation where you only need a few data types.
Although this is true, void* is more versatile and is a must if you want to create a more "general" linked list that can handle even custom types without you declaring it explicitly.
So, first of all, when implementing it with void*, it is quite common to use some typedefs that makes the code more readable. These are:
typedef void* Pointer;
typedef struct node* ListNode;
typedef ListNode* List;
This will allow you to create a list simply by:
List list;
So, to answer to the question:
1) Let's assume you have a ListNode (call it node) with an int* inside. To use it, you have to cast: (int*)(node->data). That casts Pointer to int*. If you want to get the actual value, you can add an asterisk before it: *(int*)(node->data).
2) The insert function would look something like this:
void insert(List list, Pointer data);
Since the data you give it is of type Pointer, you can put whatever you want there. For instance, if you want to put an int, you could do so by:
int a = 5;
insert(my_list, &a);
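One caveat with storing `&a` directly is that the node then depends on the lifetime of `a`. The sketch below shows one possible alternative that stores a private copy of the data instead. It uses the `Node` type from the question; `push_copy`, `node_as_int` and `free_list` are hypothetical helpers, not part of the answer's interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct node {
    void *data;
    struct node *next;
} Node;

/* Pushes a private copy of `size` bytes at `data` onto the front of the list.
   Returns the new head, or NULL if allocation fails (the list is unchanged). */
Node *push_copy(Node *head, const void *data, size_t size)
{
    Node *n;

    assert(data != NULL && size > 0);
    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = malloc(size);
    if (n->data == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->data, data, size);
    n->next = head;
    return n;
}

/* Reads the payload as an int; the caller must know the node holds an int. */
int node_as_int(const Node *n)
{
    return *(const int *)n->data;
}

/* Frees every node together with its payload copy. */
void free_list(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        free(head->data);
        free(head);
        head = next;
    }
}
```

Because a `void*` carries no type information, the list still needs some convention (for example a tag stored next to the pointer) that tells the reader which cast to apply.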
The standard struct for list node is:
struct node {
int x;
struct node *next;
};
But, what would happen if we defined a node without a pointer, like this:
struct node {
int x;
struct node next;
};
?
I assume that the main problem would be not knowing where the list ends, since there wouldn't be a NULL pointer. But apart from that, are there any other effects to be taken into consideration?
What would happen if we defined a node without a pointer, like this:
struct node {
int x;
struct node next;
};
This declares a structure with unterminated recursion. Hence the declaration is invalid and is rejected by the compiler.
Let's calculate this:
sizeof(struct node)
Well, we have an int, possibly some padding and sizeof(struct node). Putting it into one formula:
sizeof(struct node) = sizeof(int) + padding + sizeof(struct node)
This cannot be solved.
Thinking about it less theoretically, it would be a structure containing an infinite number of itself.
Languages that don't have value semantics but use reference semantics instead, like Haskell, allow this kind of data structure (type). I'm oversimplifying a lot here, but think of every structure member (record field) as a pointer, then it's probably clear why it works there:
data List = EndOfList | Node Int List
A struct type may not contain an instance of itself as a member for two reasons.
First, the type definition isn't complete until the closing } of the struct type; until the type definition is complete, the compiler won't know how much space to allocate for that member. Secondly, a struct type that contains an instance of itself would be infinitely large.
A struct type may contain a pointer to itself as a member since pointers to incomplete types are allowed, and all struct pointer types have the same size and representation.
You can create linked lists without using pointers; I did it in Fortran 77, which didn't have pointer types. You simply use an array as your storage, and use array indices as your "pointers".
I'm trying to program a network in C. I have nodes which are linked to each other, and I'd like to do that by making the struct member point to another member (not to another node, because I want to preserve the identity of the links).
The code I made to do that is something like:
struct node{
int k; //number of links
struct node.link **link; //<- wrong
};
but this is not right as node is not a variable but a type of variable (this is already discussed as an error in another QA: first you have to define a variable of node type and then apply the .link, but this doesn't help here). There's also a QA called "Struct member point at another struct member" but they don't do it from definition and it is not so clear how to generalize it (at least for me).
Is there a correct way to do this?
The problem is that the C language doesn't let you create the type you want. You need a type T with the property *T has the same type as T. You can't do that. (Well, function pointers have that property, but that's an irrelevant technicality.)
You have to introduce a new name. C only lets you do this with structs or similar constructions.
struct link {
struct link *ptr;
};
struct node {
int k;
struct link *link;
};
This will get you what you want. Now, in order to go from a struct link * to a struct node *, you'll have to do some pointer math:
struct node *node_from_link(struct link *link) {
return (struct node *) ((char *) link - offsetof(struct node, link));
}
This is also provided by the container_of macro, which is not part of the C standard, but you can find a definition for it online.
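For reference, a common way to write such a macro with only standard facilities is sketched below. Note the assumption that differs from the struct shown above: here the `struct link` is embedded in the node by value, because subtracting `offsetof` only recovers the node when the pointer actually points inside it. The Linux kernel's version adds type checking that this simplified macro lacks.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: steps back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct link {
    struct link *ptr;        /* points at the link member of another node */
};

struct node {
    int k;
    struct link link;        /* assumption: embedded by value, not a pointer */
};

/* Recovers the node that contains the given link member. */
struct node *node_from_link(struct link *l)
{
    assert(l != NULL);
    return container_of(l, struct node, link);
}
```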
Or, you could just go the traditional route.
// Usually easier to do it this way...
struct node {
int k;
struct node *link;
};
Is this what you are after?
struct Node
{
int k; //number of links
void* link;
};
struct Node* create()
{
struct Node* node = malloc(sizeof(struct Node));
node->k = 0;
node->link = 0;
return node;
}
void link(struct Node* from, struct Node* to)
{
from->link = &(to->link);
}
int main()
{
struct Node* child = create();
struct Node* parent = create();
link(parent, child);
return 0;
}
I've used void* for the link for the reason expressed by Dietrich: you want a pointer to the link to be the same type as the link. This effectively means a cast, so why not just use a generic pointer?
Membership in a structure, generalized or specific, is not an attribute of C data types. There is therefore no way to declare a pointer that can only point to a structure member, and not to any other variable of compatible type.
On the other hand, you don't need to do anything special to declare a pointer that can point to a member of another structure. You just need a pointer to that member's data type, and structure membership is irrelevant to that data type.
For example, you can have
struct node {
int k; /* number of links */
struct node **links; /* points to a dynamic array of node pointers */
struct node **one_link; /* points to a node pointer from another node */
};
In that case, it might make sense to do something like this:
struct node *n1 = /* ... */;
struct node *n2 = /* ... */;
n2->one_link = &(n1->links[3]);
Overall, though, I think this is kind of convoluted. There is probably a better way to structure your data.
Update:
Based on your description of what you're after:
[...] links are bidirectional, if I destroy one link (say the one that links node 1 to node 3) I'll need to destroy the node 1 link AND the corresponding link from node 3. Then I need to know more than just who is link to who. I need to know which link they are using.
there are at least two possible solutions, depending on details of how your nodes are structured. If they are structured like I show above, with an array (dynamic or not) of pointers to other nodes, then your general idea simply won't work. That's because the position of each link within an array of links will change as you delete other links (supposing that you close the gaps). Instead, you can just scan:
struct node {
int k; /* number of links */
struct node **links; /* points to a dynamic array of node pointers */
struct node *parent; /* points to a node that links to this one */
};
void delete_node(struct node *n) {
if (n->parent) {
int i;
for (i = 0; i < n->parent->k; i += 1) {
if (n->parent->links[i] == n) {
/* ... delete the ith element of n->parent->links ... */
break;
}
}
}
/* ... clean up node n ... */
}
If one node's links to others are stored in separate members, on the other hand, then you could indeed provide a double-pointer by which to remove links from the parent, but the presence of member k in your original structure tells me that's not your situation.
Ok, this is how I finally solved it in my program:
typedef struct node{
int k; //connectivity
struct link **enlace; //vector of LINKs
}NODE;
typedef struct link{
NODE *node1;
NODE *node2;
}LINK;
Basically, I defined two structures: one is the NODE type, which contains the information on how connected the node is and a vector of LINKs, and the other is the structure LINK, which contains the information about the link itself, that is, which nodes the link connects.
With these two I'm able to create the network of nodes with a connectivity following a Poisson distribution, and then destroy each link one by one, choosing one link at random from a list and then redirecting the pointers of each node to NULL.
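Destroying one link with these two types might be sketched as follows. This is a guess at the shape of the operation, not the poster's actual code: `detach` and `destroy_link` are hypothetical names, and the sketch assumes that `k` is the length of `enlace` and that a destroyed link is marked by a NULL entry.

```c
#include <assert.h>
#include <stddef.h>

typedef struct link LINK;

typedef struct node {
    int k;              /* connectivity: number of slots in enlace */
    LINK **enlace;      /* vector of pointers to LINKs */
} NODE;

struct link {
    NODE *node1;
    NODE *node2;
};

/* Sets every slot of n that refers to l to NULL. */
static void detach(NODE *n, const LINK *l)
{
    int i;

    if (n == NULL)
        return;
    for (i = 0; i < n->k; i++)
        if (n->enlace[i] == l)
            n->enlace[i] = NULL;
}

/* Removes the link from both of its endpoints and clears the link itself. */
void destroy_link(LINK *l)
{
    assert(l != NULL);
    detach(l->node1, l);
    detach(l->node2, l);
    l->node1 = NULL;
    l->node2 = NULL;
}
```

Because both nodes refer to the same LINK object, removing it from one side cannot leave a stale reference on the other.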
I am a C beginner with quite a lot of OOP experience (C#) and I am having trouble understanding how some notion of "polymorphism" can be achieved in C.
Right now, I am thinking about how to capture the logical structure of a file system using structs. I have a folder that contains both folders and files. Folders in this folder can contain other files and folders, etc.
My approach:
typedef enum { file, folder } node_type;
typedef struct node {
node_type type;
char *name;
struct node *next;
struct node *children;
} node;
Is this the best I can do? I have found a lot of posts on "polymorphism in C", but I would like to see how a polymorphic data structure like this can be built cleanly and efficiently (in terms of memory wasted on unused members of those structures).
Thanks.
I hope I understand what you want - I'm unsure but I guess you want to do something like that:
typedef struct
{
int type; // file or folder?
} Item;
typedef struct
{
struct A;
// data related to a file
} File;
typedef struct
{
struct A;
// data related to a folder - like pointer to list of Item
} Folder;
As long as both structures share the same memory layout for their leading members, and each only adds its own members after them, you'll be able to use a pointer to the common part with both structs.
Check this one out as well: How can I simulate OO-style polymorphism in C?
Edit: I'm not sure about the syntax above (took it from the link above). I'm used to writing it this way instead:
typedef struct
{
int type;
// data for file
} File;
typedef struct
{
int type;
// data for folder - list, etc
} Folder;
C has no intrinsic notion of polymorphism.
You will end up implementing the mechanisms that you want from scratch. That's not a bad thing. It gives you a lot more flexibility. For example, C++ virtual methods are hard-wired per class; you can't change method pointers per instance.
Here are a few ideas:
Your node_type field provides a way to do a runtime type query. Going further, you can pack multiple types into one struct using a discriminated (or tagged) union: http://en.wikipedia.org/wiki/Tagged_union. I'm not sure whether a variant type qualifies as OO though.
Polymorphism is usually about behavior. You could store function pointers ("methods") in the struct, with pointers to different functions providing different behavior for different object instances. The C++ way of doing things is for each class to have a table of function pointers, then each object instance references the table for its class (incidentally the table pointers can also play the role of your node_type for RTTI). This is called a virtual method table.
Data inheritance means that subclasses contain all of the base class' data members plus some extra stuff. In C the easiest way to do this is by embedding the base class struct at the head of the derived class struct. That way a pointer to derived is a pointer to base.
typedef struct BaseClass {
int baseMember;
} BaseClass;
typedef struct DerivedClass {
BaseClass base;
int derivedMember;
} DerivedClass;
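The last two ideas (a per-class table of function pointers plus a base struct embedded at the head) can be combined in a small sketch. Everything here is illustrative: the names `Base`, `Derived`, `BaseVtbl`, `value_of` and the init functions are invented for the example, and this is a simplification, not how any particular C++ compiler lays out its objects.

```c
#include <assert.h>
#include <stddef.h>

struct Base;

/* One table of "virtual methods" per class. */
struct BaseVtbl {
    int (*value)(const struct Base *self);
};

struct Base {
    const struct BaseVtbl *vtbl;    /* also identifies the dynamic type */
    int baseMember;
};

struct Derived {
    struct Base base;               /* first member: same address as Derived */
    int derivedMember;
};

static int base_value(const struct Base *self)
{
    return self->baseMember;
}

static int derived_value(const struct Base *self)
{
    /* Valid only because this function is installed for Derived objects. */
    const struct Derived *d = (const struct Derived *)self;
    return d->base.baseMember + d->derivedMember;
}

static const struct BaseVtbl base_vtbl = { base_value };
static const struct BaseVtbl derived_vtbl = { derived_value };

void base_init(struct Base *b, int baseMember)
{
    assert(b != NULL);
    b->vtbl = &base_vtbl;
    b->baseMember = baseMember;
}

void derived_init(struct Derived *d, int baseMember, int derivedMember)
{
    assert(d != NULL);
    base_init(&d->base, baseMember);
    d->base.vtbl = &derived_vtbl;   /* override the base class behavior */
    d->derivedMember = derivedMember;
}

/* Dynamic dispatch: the caller only knows it has some kind of Base. */
int value_of(const struct Base *obj)
{
    return obj->vtbl->value(obj);
}
```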
You could do worse than read "Inside the C++ Object Model" by Stanley B. Lippman. For example, this will help if you want to get an idea of how to implement multiple inheritance.
Here's an illustration of old-school C polymorphism, based on ancient memories of X/Motif.
If you just want a discriminated union (or even just a typed structure with a child pointer that may be null), it's probably simpler in your case.
enum NodeType { TFile, TFolder };
struct Node {
enum NodeType type;
const char *name;
struct Node *next;
};
struct FileNode {
struct Node base_;
};
struct FolderNode {
struct Node base_;
struct Node *children;
/* assuming children are linked with their next pointers ... */
};
Here are the constructors - I'll leave populating the linked lists as an exercise for the reader ...
struct Node* create_file(const char *name) {
struct FileNode *file = malloc(sizeof(*file));
file->base_.type = TFile;
file->base_.name = name; /* strdup? */
file->base_.next = NULL;
return &file->base_;
}
struct Node* create_folder(const char *name) {
struct FolderNode *folder = malloc(sizeof(*folder));
folder->base_.type = TFolder;
folder->base_.name = name;
folder->base_.next = NULL;
folder->children = NULL;
return &folder->base_;
}
Now we can walk a hierarchy, checking the type of each node and responding appropriately. This relies on the first member subobject having zero offset to the parent - if that doesn't hold (or you need multiple inheritance), you have to use offsetof to convert between base and "derived" types.
void walk(struct Node *root,
void (*on_file)(struct FileNode *),
void (*on_folder)(struct FolderNode *))
{
struct Node *cur = root;
struct FileNode *file;
struct FolderNode *folder;
for (; cur != NULL; cur = cur->next) {
switch (cur->type) {
case TFile:
file = (struct FileNode *)cur;
on_file(file);
break;
case TFolder:
folder = (struct FolderNode *)cur;
on_folder(folder);
walk(folder->children, on_file, on_folder);
break;
}
}
}
Note that we have a sort-of-polymorphic base type, but instead of switching on the type enumeration we could have a more completely polymorphic setup with virtual functions. Just add a function pointer to Node, something like:
void (*visit)(struct Node *self,
void (*on_file)(struct FileNode *),
void (*on_folder)(struct FolderNode *));
and have create_file and create_folder set it to an appropriate function (say, visit_file or visit_folder). Then, instead of switching on the enumerated type, walk would just call
cur->visit(cur, on_file, on_folder);
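Filled in, that function-pointer variant might look like the sketch below. It keeps the `Node`, `FileNode`, `FolderNode`, `walk`, `visit_file` and `visit_folder` names from above but drops the type enumeration. The callback typedefs, the `init_` functions (which initialize caller-provided objects instead of allocating) and the counting callbacks are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Node;
struct FileNode;
struct FolderNode;
typedef void (*file_cb)(struct FileNode *);
typedef void (*folder_cb)(struct FolderNode *);

struct Node {
    const char *name;
    struct Node *next;
    void (*visit)(struct Node *self, file_cb on_file, folder_cb on_folder);
};
struct FileNode   { struct Node base_; };
struct FolderNode { struct Node base_; struct Node *children; };

/* No switch on a type tag: every node knows how to visit itself. */
void walk(struct Node *root, file_cb on_file, folder_cb on_folder)
{
    struct Node *cur;

    for (cur = root; cur != NULL; cur = cur->next)
        cur->visit(cur, on_file, on_folder);
}

static void visit_file(struct Node *self, file_cb on_file, folder_cb on_folder)
{
    (void)on_folder;
    on_file((struct FileNode *)self);
}

static void visit_folder(struct Node *self, file_cb on_file, folder_cb on_folder)
{
    struct FolderNode *folder = (struct FolderNode *)self;

    on_folder(folder);
    walk(folder->children, on_file, on_folder);
}

void init_file(struct FileNode *f, const char *name)
{
    assert(f != NULL);
    f->base_.name = name;
    f->base_.next = NULL;
    f->base_.visit = visit_file;
}

void init_folder(struct FolderNode *d, const char *name)
{
    assert(d != NULL);
    d->base_.name = name;
    d->base_.next = NULL;
    d->base_.visit = visit_folder;
    d->children = NULL;
}

/* Example callbacks (hypothetical): count what the walk encounters. */
int files_seen, folders_seen;
void count_file(struct FileNode *f)     { (void)f; files_seen++; }
void count_folder(struct FolderNode *d) { (void)d; folders_seen++; }
```

Adding a new node kind then means writing one more visit function and constructor, without touching `walk`.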