How would you find out whether one of the pointers in a linked list is corrupted?
Introduce a magic value in your node structures. Initialize it when a new node is allocated. Before every access, check that the node the pointer points to contains the valid magic value. If the pointer points at an unreadable data block, your program will crash when it tries to read the magic value. To guard against that on Windows, call the VirtualQuery() API before reading and make sure the pointer points at readable data.
There are several possibilities.
If the list is doubly linked, it's possible to check the back pointer from what a front pointer points to, or vice versa.
If you have some idea as to the range of expected memory addresses, you can check against it. This is particularly true if the linked list is allocated from a limited number of chunks of memory, rather than having each node allocated independently.
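A minimal sketch of the address range check, assuming every node comes from one fixed pool. The names `struct node`, `pool`, `POOL_SIZE` and `node_ptr_plausible` are made up for illustration, and the sketch assumes the platform provides `uintptr_t`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type and pool; all names are illustrative. */
struct node {
    struct node *next;
    int value;
};

#define POOL_SIZE 64
static struct node pool[POOL_SIZE];

/* Returns true if p is NULL (a valid end marker) or points exactly at a
 * slot of pool. Addresses are compared as integers because relational
 * comparison of pointers into unrelated objects is undefined in C. */
bool node_ptr_plausible(const struct node *p)
{
    if (p == NULL)
        return true;

    uintptr_t lo = (uintptr_t)&pool[0];
    uintptr_t hi = (uintptr_t)&pool[POOL_SIZE];   /* one past the end */
    uintptr_t a  = (uintptr_t)p;

    return a >= lo && a < hi && (a - lo) % sizeof(struct node) == 0;
}
```

A pointer that passes this check can still be wrong (it may point at the wrong slot), so this only catches values that land outside the pool or between slots.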
If the nodes have some recognizable data in them, you can run down the list and check for recognizable data.
This looks to me like one of those questions where the interviewer isn't expecting a snappy answer, but rather an analysis of the question including further questions from you.
It's sort of a pain, but you can record the value of each pointer as you come across it in your debugger and verify that it's consistent with what you'd expect to find. If you'd expect a pointer to be NULL, make sure it's NULL. If you'd expect a pointer to refer to an already existing object, verify that it holds that object's address, and so on.
You could keep a doubly linked list. Then you can check that node->child->parent == node (although if node->child has become corrupt, this has a reasonable chance of causing an exception).
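The back-pointer check could look something like this. It is only a sketch: `struct dnode` and `dlist_links_consistent` are hypothetical names, and `prev`/`next` stand in for the `parent`/`child` fields mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical doubly linked node. */
struct dnode {
    struct dnode *prev;
    struct dnode *next;
    int value;
};

/* Walks the list from head and checks that every forward link is mirrored
 * by the matching back link. max_nodes bounds the walk so that a corrupted
 * link that forms a cycle cannot loop forever.
 * Note: a wild pointer can still fault when it is dereferenced here. */
bool dlist_links_consistent(const struct dnode *head, size_t max_nodes)
{
    const struct dnode *prev = NULL;

    for (const struct dnode *n = head; n != NULL; n = n->next) {
        if (max_nodes-- == 0)
            return false;          /* longer than expected: likely a cycle */
        if (n->prev != prev)
            return false;          /* back link does not match */
        prev = n;
    }
    return true;
}
```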
Several debuggers and bounds checkers will do this for you, but a cheap and quick solution to this question is to:
1. Alter the structure of the list's nodes to include one additional char[n] field (or, more typically, two: one as the first field and the other as the last field in the structure, which allows bounds checking in addition to detecting pointer corruption).
2. Initialize these fields with a short (but long enough...) constant string such as "VaL1D-LiST-NODE 1234" when the nodes are created.
3. Check that the values read from these fields match the expected text each time a node is dereferenced, before using the node in earnest.
When the field values do not match, this indicates one of two things:
the pointer is invalid (it never pointed to a list node);
something else is overwriting the node structure (the pointer is "valid" but the data it points to has been corrupted).
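Here is roughly what the guard fields look like in code. This is a sketch with made-up names (`struct gnode`, `gnode_init`, `gnode_valid`); only the guard string is taken from the text above.

```c
#include <stdbool.h>
#include <string.h>

#define NODE_MAGIC "VaL1D-LiST-NODE 1234"

/* Hypothetical node with a guard string as its first and last field. */
struct gnode {
    char head_guard[sizeof NODE_MAGIC];
    struct gnode *next;
    int value;
    char tail_guard[sizeof NODE_MAGIC];
};

/* Fills in both guards when a node is created. */
void gnode_init(struct gnode *n, int value)
{
    memcpy(n->head_guard, NODE_MAGIC, sizeof NODE_MAGIC);
    memcpy(n->tail_guard, NODE_MAGIC, sizeof NODE_MAGIC);
    n->next = NULL;
    n->value = value;
}

/* Returns true only if both guards still hold the expected text.
 * This assumes n is NULL or points at readable memory; a wild pointer
 * can still fault inside memcmp. */
bool gnode_valid(const struct gnode *n)
{
    return n != NULL
        && memcmp(n->head_guard, NODE_MAGIC, sizeof NODE_MAGIC) == 0
        && memcmp(n->tail_guard, NODE_MAGIC, sizeof NODE_MAGIC) == 0;
}
```

Calling `gnode_valid(p)` before each `p->next` costs two short comparisons per node, which is usually acceptable in a debug build.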
(Context: The system I am working on already maintains a form of garbage collection. I'm working on compaction.)
Most compaction algorithms follow a basic structure:
Find first object
Move object to beginning of heap
Find second object
Move second object to address right after first object
Rinse and repeat
This algorithm is followed in section 2.2 of this paper except using two pointers, denoted "from" and "to". Essentially the FROM pointer traverses the heap until it finds live objects. Then it moves said object to the TO pointer. Then TO is incremented accordingly.
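To make the from/to idea concrete, here is a small sketch under strong simplifying assumptions: the heap is an array of fixed-size slots, and a mark phase has already set a `live` flag in each slot. The names `struct slot` and `compact` are hypothetical, and the sketch does not update any pointers that referred to the moved slots.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size heap slot; live is assumed to be set by an
 * earlier mark phase. */
struct slot {
    bool live;
    int payload;
};

/* Slides live slots toward index 0, preserving their order.
 * "from" scans the whole heap, "to" is the next free destination.
 * Returns the number of live slots, i.e. the new end of the heap. */
size_t compact(struct slot *heap, size_t n)
{
    size_t to = 0;

    for (size_t from = 0; from < n; from++) {
        if (!heap[from].live)
            continue;
        if (from != to) {
            heap[to] = heap[from];      /* move the object down */
            heap[from].live = false;    /* old location is now free */
        }
        to++;
    }
    return to;
}
```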
The algorithm is simple, but I have yet to find much information on how these pointers determine what is a "live object". This article discusses the creation of a basic mark-and-sweep garbage collector that runs through the stack, recursively going to each reference and marking them as live. The article however requires a linked list of ALL objects ever allocated. However, this is because the author is more or less creating their own VM.
My question is, is there a way of traversing a heap in C and identifying whether the current object is a live object? Is there a similar linked list of all allocated objects already in C that I could use? Or will I require more overhead?
My question is, is there a way of traversing a heap in C and identifying whether the current object is a live object?
At a high level, the process is looking at all active pointers and determining whether or not each piece of allocated memory is accessible. (Please note that this is very complicated in C, not least because a pointer could be stored in an int or another data type.) If the memory is accessible via a pointer, then it is "live" in your terms. If not, then a garbage collector would consider it safe to free that memory.
If you're asking whether or not C has a native function for determining whether or not some allocated memory can be reached, then the answer is no.
Is there a similar linked list of all allocated objects already in C that I could use? Or will I require more overhead?
Again, if you're looking for a linked list that C natively provides and you can access, then the answer is no. You'd need to implement these things.
Forgive me if you've already seen this, but there are garbage collectors that you can download if you want to see how others have done it.
TL;DR: It's impossible.
To make that work, you need to solve some non-trivial problems:
1. Be able to name the live objects of the heap. That means to find and follow recursively all pointers in global variables and on the stack.
2. Move the live objects downwards to create a compact heap.
3. Adjust pointers in your program to reflect the new locations of the moved objects.
Regarding 1.: At runtime, the C language doesn't help you to identify where you have pointer-type global variables. And on the stack, you find a mixture of e.g. integers, function-call return addresses or data pointers. For both memory areas, you have to find a way to enumerate all potential pointer values.
To make things worse, a pointer can not only point to the beginning of your data structure, but also to some inside element. And this pointer also makes the whole object "live".
Regarding 2.: That's the easy part, using the algorithm you mentioned.
Regarding 3.: Now your objects live at new addresses, so your old pointer values are no longer correct (they point to the old locations), and you have to adjust them. So once again, you have to follow all root references (as in 1.) and adjust all pointers that are affected by your moves. But as you can't tell for sure whether e.g. 0x12345678 was meant as a numeric integer or as an (old-location) address, changing it to the new-location address might break some computation.
I ran into a design problem when using memcpy while building a generic HashTable in C. The HashTable maps unsigned int keys to void * data that I memcpy over.
// Random example
void foo() {
// Suppose `a` is a struct that contains LinkedLists, char arrays, etc
// within it.
struct *a = malloc(sizeof(a));
HashTable ht = ht_create(sizeof(a));
// Insert the (key, value) pair (0, a) into the hash table ht
ht_insert(ht, 0, a);
// Prevent memory leak
destroy_struct(a);
// Do stuff...
// ... eventually destroy ht
ht_destroy(ht);
}
Now, given struct a has LinkedLists and pointers within it, and the HashTable is using memcpy, my understanding is that it copies over shallow copies of these pointers. Thus, ht_insert mallocs space for a new entry, shallowly copies over data from a, and inserts the new entry into its table.
Consequently, unless I free struct a completely with some function destroy_struct, I am leaking memory. However, given I'm shallowly copying data in ht_insert, when I call destroy_struct(a), I will have accidentally freed the data pointed to within the hash table's entry as well!
Is the logic above correct, and if so, should I use some recursive memcpy function that makes sure to deep copy all data from struct a to the HashTable?
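The difference between the two kinds of copy can be sketched as follows. This is not the hash table from the question: `struct record` and both copy functions are hypothetical, and a single heap-allocated string stands in for the linked lists and other pointers inside the real struct.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct that owns one heap allocation. */
struct record {
    char *name;   /* heap-allocated, owned by the record */
    int id;
};

/* Shallow copy: duplicates the struct's bytes, so dst->name ends up
 * pointing at the same string as src->name. Freeing one record's name
 * leaves the other with a dangling pointer. */
void record_copy_shallow(struct record *dst, const struct record *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Deep copy: duplicates the string as well, so each record owns its own
 * allocation. Returns 0 on success, -1 if malloc fails. */
int record_copy_deep(struct record *dst, const struct record *src)
{
    size_t len = strlen(src->name) + 1;
    char *copy = malloc(len);

    if (copy == NULL)
        return -1;
    memcpy(copy, src->name, len);
    dst->name = copy;
    dst->id = src->id;
    return 0;
}
```

With the deep copy, the original and the stored entry can be destroyed independently. With the shallow copy, exactly one of them may free the shared data.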
Firstly, if your code doesn't reproduce the problem you are explaining, you shouldn't include it. The problem your code produces is compiler errors. This doesn't help your question, does it?
Now, given struct a has LinkedLists and pointers within it, and the HashTable is using memcpy, my understanding is that it copies over shallow copies of these pointers.
If you are simply copying the internal representation of a struct whatever * into the internal representation of a void *, then you are asking for trouble. There is no guarantee that the two representations are identical. It's possible that one pointer type might be larger than the other, that they use different endianness (if they're implemented as typical quasi-integers) or other internal differences might exist. You should convert one pointer to the other type, and then you could simply assign it... In fact, because one of the types is void * that conversion will happen implicitly when you assign.
Consequently, unless I free struct a completely with some function destroy_struct, I am leaking memory.
From what you have described, you should only call free on that pointer value once (and only once) you are done with it, and your program no longer has any use for it (e.g. after you have removed it from the hashtable). This goes for all non-null pointers that are returned by malloc, realloc or calloc. To clarify: if x and y store the same pointer returned by one of those functions, free should only be called ONCE on ONE OF THEM because they contain the same value.
Is the logic above correct, and if so, should I use some recursive memcpy function that makes sure to deep copy all data from struct a to the HashTable?
I highly recommend breaking this question up into two or more separate questions, because it's double-barreled. I could simply answer "yes" (or "no"). Would that give you any meaningful information?
This brings me back to what I first wrote. I can only guide you based on what you've written here, which might not be reflective of the code that you use (especially given the influences of the erroneous code you've given). In order to guide you better, I would need to see all of the gaps filled in. I would need to see a testcase that creates a hashtable, inserts into the hashtable, uses the hashtable, removes from the hashtable and cleans up the hashtable to determine whether or not your operations are leaking anywhere... but most importantly, this testcase would need to be COMPILABLE! Otherwise it can't do any of those things, because it can't compile.
I'm working with hazard pointers in order to implement a lock-free linked list in C.
I couldn't find any example code other than very basic queues and stacks.
The problem is that I need to traverse the list, so my question is whether I can change the value of a hazard pointer once it has been assigned.
For example:
t←Top
while(true) {
if t=null then
return null
*hp←t
if Top!=t then
continue
...
t←(t→next) //after this instruction pointer t will be still protected?
}
In the end I implemented my own version of Hazard Pointers (HP) according to the original paper. The answer to my question is NO, t is no longer safe to use. The reason is that, the way HP works, *hp protects the node that t pointed to at the moment you declared it as a hazardous pointer, so when t moves to the next node, the HP mechanism keeps protecting the previous node. I have to assign the new value to *hp before I can use it safely.
Also, the example in the paper does not make it explicit, but when you finish using a hazard pointer you have to release it. That means returning *hp to its original state (NULL). This way, if another thread wants to delete (retire) this node, it won't be seen as being in use.
In my example above, I have to release *hp before leaving the method. Inside the loop it is not necessary because I am overwriting the same *hp position (*hp ← t), so the previous node is no longer protected.
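The publish-then-validate step and the release step might be sketched like this with C11 atomics. Treat it as an illustration only: `struct hnode`, `hp_protect` and `hp_release` are hypothetical names, each hazard slot is assumed to be written by a single thread, and the scan that retiring threads run over all slots is not shown.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node. */
struct hnode {
    _Atomic(struct hnode *) next;
    int value;
};

/* Publishes the pointer currently stored in *src through the hazard
 * slot *hp, then re-reads *src to confirm it did not change in between.
 * Loops until the published value and the source agree, and returns it
 * (possibly NULL). */
struct hnode *hp_protect(_Atomic(struct hnode *) *src,
                         _Atomic(struct hnode *) *hp)
{
    struct hnode *p;

    do {
        p = atomic_load(src);
        atomic_store(hp, p);
    } while (p != atomic_load(src));
    return p;
}

/* Clears the slot so the node it named is no longer reported as in use. */
void hp_release(_Atomic(struct hnode *) *hp)
{
    atomic_store(hp, NULL);
}
```

Stepping to the next node means calling something like `hp_protect(&t->next, hp2)` with a second slot while the first still covers `t`, since overwriting the only slot would leave `t` unprotected while its `next` field is being read. Whether the validation is sufficient also depends on how the list's delete operation unlinks and retires nodes.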
You do not need hazard pointers when you are only traversing the list. A hazard arises when different threads are reading from and writing to the same resource. (In particular, hazard pointers are meant to overcome the ABA problem, where a resource's value is changed to something else and then back to its original value, which makes the change difficult to notice.) With traversing, you are only reading, hence no need for hazard pointers.
By the way, it seems to me that you have to change if Top=t to if Top!=t, so that you can proceed with your code if there is no hazard. Note that continue returns to the beginning of the loop. Also, your whole code should be in a while(true) loop.
You can read more about hazard pointers here http://www.drdobbs.com/lock-free-data-structures-with-hazard-po/184401890 , or just by googling!
EDIT You need to provide the code for insert and delete functions. In short, the part of the code that you've mentioned ends up being an infinite loop after execution of t←(t→next), since Top!=t will hold true afterwards.
What you need to do, instead of checking t against Top, is to check it against its previously captured value. Again, it depends on your implementation of the other methods, but you probably want to implement something similar to Tim Harris's algorithm, which uses a two-phase deletion (1. marking and 2. freeing the node). Then, when you traverse the list, you need to check for marked nodes as well. There is also an implementation of a doubly linked list, with a find method which you can use as a base for your implementation, in Fig. 9 of http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf. Hope this helps.
In a program I'm writing, I am implementing binary tree and linked list structures; because I don't know how many nodes I will need, I am putting them on the heap and having the program use realloc() if they need more room.
The problem is that such structures include pointers to other locations in the same structure, and because the realloc() moves the structure, I need to redo all those pointers (unless I change them to offsets, but that increases the complexity of the code and the cost of using the structure, which is far more common than the reallocations).
Now, this might not be a problem; I could just take the old pointer, subtract it from the new pointer, and add the result to each of the pointers I need to change. However, this only works if it is possible to subtract two pointers and get the difference in their addresses (and then add that difference to another pointer to get the pointer that many bytes ahead); because I'm working on the heap, I can't guarantee that the difference of addresses will be divisible by the size of the entries, so normal pointer subtraction (which gives the number of objects in between) will introduce errors. So how do I make it give me the difference in bytes, and work even when they are in two different sections of the heap?
To get the difference between two pointers in bytes, cast them to char *:
(char *) ptrA - (char *) ptrB;
However, if you'd like to implement a binary tree or linked list with all nodes sharing the same block of memory, consider using an array of structs instead, with the pointers being replaced by array indices. The primary advantage of using pointers for a linked list or tree, rather than an array of structs, is that you can add or remove nodes individually without reallocating memory or moving other nodes around, but by making the nodes share the same array, you're negating this advantage.
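An index-based list along those lines could look like the sketch below. The names (`struct inode`, `struct ilist`, `ilist_push`, `NIL`) are invented for the example. Because links are array indices rather than addresses, nothing needs fixing up when realloc() moves the array.

```c
#include <stdlib.h>

#define NIL ((size_t)-1)   /* index value meaning "no node" */

/* Hypothetical node: the link is an index into the array, not a pointer. */
struct inode {
    size_t next;
    int value;
};

/* All nodes share one growable array. Start with { NULL, 0, 0, NIL }. */
struct ilist {
    struct inode *nodes;
    size_t len;
    size_t cap;
    size_t head;
};

/* Pushes value at the front, doubling the array when it is full.
 * Returns 0 on success, -1 if realloc fails (the list is left intact). */
int ilist_push(struct ilist *l, int value)
{
    if (l->len == l->cap) {
        size_t cap = l->cap ? l->cap * 2 : 4;
        struct inode *p = realloc(l->nodes, cap * sizeof *p);

        if (p == NULL)
            return -1;
        l->nodes = p;
        l->cap = cap;
    }
    l->nodes[l->len].value = value;
    l->nodes[l->len].next = l->head;
    l->head = l->len++;
    return 0;
}
```

Traversal is `for (size_t i = l.head; i != NIL; i = l.nodes[i].next)`, which costs one extra addition per step compared with following a raw pointer.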
The best way would indeed be to malloc() a new chunk for every node you have. But this might have some overhead for the internal management of the memory, so if you have lots of nodes, it might be useful to allocate space for several nodes at once.
If you do need to realloc, you should take a different approach:
1. Calculate the offset within your memory block: `ofs = ptrX - start`
2. Add this offset to the new address returned by `realloc()`.
This way, you always stay inside the area you allocated and don't have strange heap pointer differences with nearly no meaning.
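A sketch of the two steps above, with the hypothetical names `struct pnode` and `grow_nodes`. It assumes every non-NULL `next` points into the same block, and that `new_cap` is nonzero and at least `used`. The offsets are recorded before calling realloc(), because the old pointer values must not be used once the block has moved.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node whose next pointer refers to another node in the
 * same allocated block (or is NULL). */
struct pnode {
    struct pnode *next;
    int value;
};

/* Grows block to new_cap nodes and rebases the next pointers of the
 * first `used` nodes. Returns the new block, or NULL on failure, in
 * which case the old block is untouched and still valid. */
struct pnode *grow_nodes(struct pnode *block, size_t used, size_t new_cap)
{
    ptrdiff_t *ofs = malloc(used * sizeof *ofs);

    if (ofs == NULL && used > 0)
        return NULL;

    /* Step 1: offset of each target within the old block (-1 for NULL). */
    for (size_t i = 0; i < used; i++)
        ofs[i] = block[i].next ? block[i].next - block : -1;

    struct pnode *p = realloc(block, new_cap * sizeof *p);

    /* Step 2: add each offset to the address realloc returned. */
    if (p != NULL)
        for (size_t i = 0; i < used; i++)
            p[i].next = ofs[i] >= 0 ? p + ofs[i] : NULL;

    free(ofs);
    return p;
}
```

The temporary offset array costs extra memory during the resize. Storing offsets or indices in the nodes permanently avoids that, at the price the question mentions.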
In fact, you can use malloc or calloc to get memory for each node.
Then you only need to remember the address of the tree's root node.
This way, you never need to realloc memory for the whole tree, and the address of each node never changes. :)
For example, in Linux, I have a pointer pointing to a task_struct. Later, the task_struct might migrate or be deleted. How do I know whether the pointer still points to a task_struct or not?
It's not possible.
Pointers only contain addresses, and generally it's not possible to determine whether or not a given address is "valid".
Sometimes you can ask the entity that gave you the pointer to begin with if it's still valid, but that of course depends on the exact details of the entity. The language itself cannot do this.
You don't know, because:
a pointer just contains the address of the object it points to;
the type information is lost at compile time.
So C provides no facilities for dealing with this kind of problem; you have to track what happens to the things you point to on your own.
The most you can ask (and this is already OS-specific) is whether the memory page where the structure would reside is still accessible, but that is usually not particularly useful information.
Depending on your allocation pattern/luck, you might get a segmentation fault (which of course kills your program)...but that at least would tell you the reference is no longer valid.
However, as previously stated, the best way is to track the validity yourself.
If you need to keep moving a struct around in memory (rather than just blanking it and reinitializing it at its current location), you could consider using a pointer to a pointer to make the tracking easier.
That is, everything gets a reference to the pointer to the struct, and then when you move or delete the struct you just set that pointer to the new memory location or to NULL.
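A sketch of the pointer-to-pointer idea. The names `struct task`, `struct task_handle`, `task_move` and `task_delete` are hypothetical (this is not the kernel's task_struct), and the sketch assumes the handle itself outlives every user and is not accessed concurrently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload. */
struct task {
    int id;
};

/* The single slot that holds the real pointer. Users keep a pointer to
 * this handle and re-read h->ptr on every use. */
struct task_handle {
    struct task *ptr;
};

/* Moves the task to a fresh allocation and updates the slot, so every
 * user sees the new location. Returns 0 on success, -1 on failure or
 * if the task was already deleted. */
int task_move(struct task_handle *h)
{
    if (h->ptr == NULL)
        return -1;

    struct task *moved = malloc(sizeof *moved);

    if (moved == NULL)
        return -1;
    memcpy(moved, h->ptr, sizeof *moved);
    free(h->ptr);
    h->ptr = moved;
    return 0;
}

/* Frees the task and marks the slot empty, so users can test for NULL. */
void task_delete(struct task_handle *h)
{
    free(h->ptr);
    h->ptr = NULL;
}
```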
Also, in general, if you want to do checks on your program for this kind of weirdness, I would recommend looking into valgrind.
It is your responsibility in C to write your code so that you keep track of it. You can use the special value NULL (meaning "not pointing to anything"): set the pointer to NULL when you remove (or haven't yet set) whatever it was pointing to, and test for NULL before using it. You might also design your code in such a way that the question never comes up.
There is no way to query a random pointer value to see if it represents something, just like there is no way to query an int variable to check if the value in it is uninitialized, junk, or the correct result of a computation.
It is all a matter of software design and, when necessary, using the value of NULL to designate not set.