Traversing a list with hazard pointers - c

I'm working with hazard pointers to implement a lock-free linked list in C.
I couldn't find any example code other than very basic queues and stacks.
The problem is that I need to traverse the list, so my question is whether I can change the value of a hazard pointer once it has been assigned.
For example:
t←Top
while(true) {
    if t=null then
        return null
    *hp←t
    if Top!=t then
        continue
    ...
    t←(t→next) //after this instruction pointer t will be still protected?
}

In the end I implemented my own version of Hazard Pointers (HP) according to the original paper. The answer to my question is NO, t is no longer safe to use. The reason is that, the way HP works, *hp protects the node that t pointed to at the moment you declared it hazardous, so when t moves to the next node, the HP mechanism keeps protecting the previous node. I have to assign the new value to *hp before I can use it safely.
Also, in the example of the paper it is not explicit, but when you finish using a hazard pointer you have to release it. That means, return *hp to its original state (NULL). This way, if another thread wants to delete (retire) this node, it won't be seen as being used.
In my example above, I have to release *hp before leaving the method. Inside the loop it is not necessary because I am overwriting the same *hp position (*hp ← t), so the previous node is no longer protected.
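A minimal sketch of what that looks like with C11 atomics, for anyone who wants something concrete. Everything named here (node_t, hp, list_contains) is made up for illustration, and the low "deleted" bit in next is an assumed convention borrowed from Harris/Michael style lists, not something from my actual code. It uses two slots instead of one, because following t->next reads from t, so t has to stay protected while its successor is being published. The reclaiming side (scanning the slots before freeing a retired node) is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    int value;
    /* Address of the successor; low bit set means this node is logically
     * deleted (assumed convention for this sketch). */
    _Atomic(uintptr_t) next;
} node_t;

/* Hypothetical hazard slots for ONE thread. A real implementation keeps such
 * a pair per thread in a registry that reclaimers scan before calling free(). */
static _Atomic(uintptr_t) hp[2];

/* Returns 1 if key is in the list starting at *top, 0 otherwise. */
int list_contains(_Atomic(uintptr_t) *top, int key)
{
    assert(top != NULL);
    int found = 0;
    int slot = 0;
    _Atomic(uintptr_t) *src = top;      /* the link we are about to follow */

    for (;;) {
        uintptr_t raw = atomic_load(src);
        if (raw & 1u) {                 /* the node owning src was deleted */
            src = top;                  /* restart from the head */
            continue;
        }
        node_t *t = (node_t *)raw;
        if (t == NULL)
            break;
        atomic_store(&hp[slot], raw);   /* *hp <- t */
        if (atomic_load(src) != raw) {  /* link changed before hp was visible */
            src = top;
            continue;
        }
        /* t is protected from here until hp[slot] is overwritten. */
        if (t->value == key) {
            found = 1;
            break;
        }
        src = &t->next;                 /* t stays protected in hp[slot] ... */
        slot = 1 - slot;                /* ... the other slot takes its successor */
    }

    atomic_store(&hp[0], (uintptr_t)0); /* release both slots before returning */
    atomic_store(&hp[1], (uintptr_t)0);
    return found;
}
```

Note that this version simply restarts when it runs into a marked node instead of helping to unlink it, so it is simpler than the find routine in the paper.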

You do not need hazard pointers when you are only traversing the list. A hazard arises when different threads read from and write to the same resource. (In particular, hazard pointers are meant to overcome the ABA problem, where a resource's value is changed to something else and then back to its original value, which makes the change difficult to notice.) With traversing you are only reading, hence no need for hazard pointers.
By the way, it seems to me that you have to change if Top=t to if Top!=t, so that you can proceed with your code if there is no hazard. Note that continue returns to the beginning of the loop. Also, your whole code should be in a while(true) loop.
You can read more about hazard pointers here http://www.drdobbs.com/lock-free-data-structures-with-hazard-po/184401890 , or just by googling!
EDIT: You need to provide the code for the insert and delete functions. In short, the part of the code that you've mentioned ends up in an infinite loop after t←(t→next) executes, since Top!=t will hold true afterwards.
What you need to do, instead of checking t against Top, is to check it against its previously captured value. Again, it depends on your implementation of the other methods, but you probably want to implement something similar to Tim Harris's algorithm, which uses two-phase deletion (1. marking and 2. freeing the node). Then, when you traverse the list, you need to check for marked nodes as well. There is also an implementation of a doubly linked list, with a find method which you can use as a base for your implementation, in Fig 9 of http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf. Hope this helps.
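To make the two phases a bit more concrete, here is a rough sketch. It assumes the mark is kept in the low bit of the node's next field, which is the usual trick but an assumption about your layout; hnode_t, mark_deleted and unlink_marked are names invented for this example. After a successful unlink the node still has to be retired through whatever reclamation scheme you use, it cannot be freed on the spot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct hnode {
    int key;
    _Atomic(uintptr_t) next;   /* successor address, low bit = "deleted" mark */
} hnode_t;

/* Phase 1: logically delete n by setting the mark bit in n->next.
 * Returns true if this call set the mark, false if n was already marked. */
bool mark_deleted(hnode_t *n)
{
    assert(n != NULL);
    uintptr_t old = atomic_load(&n->next);
    while (!(old & 1u)) {
        if (atomic_compare_exchange_weak(&n->next, &old, old | 1u))
            return true;
        /* A failed CAS reloaded old; the loop condition re-checks the mark. */
    }
    return false;
}

/* Phase 2: physically unlink a marked node n from its predecessor's link.
 * Fails (returns false) if *prev_link no longer points to n unmarked. */
bool unlink_marked(_Atomic(uintptr_t) *prev_link, hnode_t *n)
{
    assert(prev_link != NULL && n != NULL);
    uintptr_t expected = (uintptr_t)n;
    uintptr_t succ = atomic_load(&n->next) & ~(uintptr_t)1;
    return atomic_compare_exchange_strong(prev_link, &expected, succ);
}
```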

Related

How to identify live objects while traversing the heap?

(Context: The system I am working on already maintains a form of garbage collection. I'm working on compaction.)
Most compaction algorithms follow a basic structure:
Find first object
Move object to beginning of heap
Find second object
Move second object to address right after first object
Rinse and repeat
This algorithm is followed in section 2.2 of this paper except using two pointers, denoted "from" and "to". Essentially the FROM pointer traverses the heap until it finds live objects. Then it moves said object to the TO pointer. Then TO is incremented accordingly.
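As a rough sketch of that FROM/TO loop, assuming every object starts with a small header that records its size and a live flag set by an earlier mark phase (the obj_header layout and the compact name are my own, not taken from the paper):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object header: every object in the heap starts with one. */
typedef struct {
    size_t size;   /* total size in bytes, header included */
    int    live;   /* set by an earlier mark phase */
} obj_header;

/* Slide live objects toward the start of the heap.
 * Returns the number of bytes in use after compaction. */
size_t compact(unsigned char *heap, size_t used)
{
    assert(heap != NULL);
    size_t from = 0, to = 0;
    while (from < used) {
        obj_header h;
        memcpy(&h, heap + from, sizeof h);   /* copy out: the move may overwrite it */
        assert(h.size >= sizeof h);
        if (h.live) {
            if (to != from)
                memmove(heap + to, heap + from, h.size);
            to += h.size;                    /* TO advances past the moved object */
        }
        from += h.size;                      /* FROM always advances */
    }
    return to;
}
```

This only moves bytes. Any pointer that referred to a moved object is stale afterwards, and how live gets set is exactly the part I am asking about.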
The algorithm is simple, but I have yet to find much information on how these pointers determine what a "live object" is. This article discusses the creation of a basic mark-and-sweep garbage collector that runs through the stack, recursively going to each reference and marking them as live. The article, however, requires a linked list of ALL objects ever allocated, because the author is more or less creating their own VM.
My question is, is there a way of traversing a heap in C and identifying whether the current object is a live object? Is there a similar linked list of all allocated objects already in C that I could use? Or will I require more overhead?
My question is, is there a way of traversing a heap in C and identifying whether the current object is a live object?
At a high level, the process is looking at all active pointers and determining whether or not each piece of allocated memory is accessible. (Please note that this is very complicated in C, not least because a pointer could be stored in an int or another data type.) If the memory is accessible via a pointer, then it is "live" in your terms. If not, then garbage collectors would consider it safe to free that memory.
If you're asking whether or not C has a native function for determining whether or not some allocated memory can be reached, then the answer is no.
Is there a similar linked list of all allocated objects already in C that I could use? Or will I require more overhead?
Again, if you're looking for a linked list that C natively provides and you can access, then the answer is no. You'd need to implement these things.
Forgive me if you've already seen this, but there are garbage collectors that you can download if you want to see how others have done it.
TL;DR: It's impossible.
To make that work, you need to solve some non-trivial problems:
1. Be able to name the live objects of the heap. That means to find and follow recursively all pointers in global variables and on the stack.
2. Move the live objects downwards to create a compact heap.
3. Adjust pointers in your program to reflect the new locations of the moved objects.
Regarding 1.: At runtime, the C language doesn't help you to identify where you have pointer-type global variables. And on the stack, you find a mixture of e.g. integers, function-call return addresses or data pointers. For both memory areas, you have to find a way to enumerate all potential pointer values.
To make things worse, a pointer can not only point to the beginning of your data structure, but also to some inside element. And this pointer also makes the whole object "live".
Regarding 2.: That's the easy part, using the algorithm you mentioned.
Regarding 3.: Now your objects live at new addresses, so your old pointer values are no longer correct (pointing to the old locations), and you have to adjust them. So once again, you have to follow all root references (like in 1.) and adjust all pointers that are affected by your moves. But as you can't tell for sure if e.g. 0x12345678 was meant as a numeric integer or as an (old-location) address, changing that to the new-location address might break some computation.
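To illustrate the "potential pointer values" problem from point 1, here is a small sketch of a conservative scan. It assumes your own allocator keeps a table of its blocks; block_info, find_block and mark_from_words are hypothetical names. Every word that falls anywhere inside a block, including interior addresses, marks that block as possibly live. It cannot tell an integer from a pointer, which is precisely why step 3 is unsafe. A full mark phase would also scan the contents of each newly marked block, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation record kept by a custom allocator. */
typedef struct {
    uintptr_t start;    /* first byte of the block */
    size_t    size;     /* length in bytes */
    int       marked;   /* set when some word might point into the block */
} block_info;

/* Return the block containing addr (interior addresses included), or NULL. */
block_info *find_block(block_info *blocks, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= blocks[i].start && addr - blocks[i].start < blocks[i].size)
            return &blocks[i];
    return NULL;
}

/* Conservatively scan a range of words (for example a copy of a stack
 * region or a global data area) and mark every block a word might refer to. */
void mark_from_words(const uintptr_t *words, size_t count,
                     block_info *blocks, size_t n)
{
    assert(words != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        block_info *b = find_block(blocks, n, words[i]);
        if (b != NULL)
            b->marked = 1;
    }
}
```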

Linked List - Stack Building - C - Is this tutorial correct

I am trying to work through a computer science course on coursebuffet.com, which referred me to saylor.org, which gave me this to learn about how to implement a stack with a linked list in C.
First, I think I understood the concept, but if you'd be so kind and scroll down in the link you will find at the end of it a link to a main file, with which you should test your implementation of it. And what has absolutely baffled me for the last two days (yeah, that's how much time I already sank in this one problem) is the following passage:
/*
 * Initialize the stack. Make it at least
 * big enough to hold the string we read in.
 */
StackInit(&stack, strlen(str));
I can't understand how to initialise a linked list. I mean, that's against its concept, isn't it? I would need to create struct Elements before filling them with push commands, but if I do that, I need to follow the stack in two directions: one direction for pushing and the opposite direction for popping. That would need two pointers. I thought the whole concept as described here would be one data element and one pointer per ADT-unit.
Can someone please explain this to me?
When you initialize the list to the length of the string you want to read, you still have the stack pointer pointing to the first element of the list, so basically nothing is lost. However, you are correct: there is no point in doing something like that.
There is no need for a doubly linked list. The stack pointer will always point to the first element. Whenever you want to push(), you add a new node at the beginning of the list, and whenever you want to pop(), you remove the first node of the list.
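A minimal sketch of that, with one data element and one pointer per node. Only StackInit comes from the tutorial's main file; the element_t and char_stack types and the StackPush / StackPop names and signatures are my guesses, so adapt them to whatever header the course provides. The size argument is simply ignored here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct element {
    char data;
    struct element *next;
} element_t;

typedef struct {
    element_t *top;   /* first node of the list, NULL when empty */
} char_stack;

/* The size hint is accepted to match the call in the test file, then ignored:
 * a linked list grows one node at a time. */
void StackInit(char_stack *s, size_t max_size_hint)
{
    (void)max_size_hint;
    s->top = NULL;
}

/* Returns 0 on success, -1 if out of memory. */
int StackPush(char_stack *s, char c)
{
    element_t *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->data = c;
    e->next = s->top;   /* new node goes in front of the old top */
    s->top = e;
    return 0;
}

/* The caller must make sure the stack is not empty. */
char StackPop(char_stack *s)
{
    assert(s->top != NULL);
    element_t *e = s->top;
    char c = e->data;
    s->top = e->next;   /* second node becomes the new top */
    free(e);
    return c;
}
```

Both operations touch only the head, so a single next pointer per node is enough.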
Assume a stack means LIFO operation only, i.e. the last element in is the first to come out.
Now let's think about how we would like to implement it.
First choice: Just use a fixed-size internal array. In that case you never have the option to resize on the fly, so the comment above StackInit is valid: you should allocate a size that you think will be safe for the purpose.
Second choice: Use an array backed by dynamic memory. In that case, even if you hit the limit of the existing stack, you can always expand it with realloc, so the comment does not make sense.
Third choice: Use a linked list. In theory, even if you initialize the stack with size 0, it grows with each node insertion, so the comment is misleading. Just to add, the top is always the head of the linked list, and with every insertion the new node becomes the new head.
So to answer your question, the comment above is confusing and makes sense only when the internal implementation is array based.
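For comparison, the second choice might look roughly like this. The array_stack type and function names are invented for the example, and doubling the capacity is just one common growth policy. Here the initial size is only a starting point, not a limit.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char  *items;
    size_t count;      /* number of elements currently stored */
    size_t capacity;   /* number of elements the buffer can hold */
} array_stack;

/* Returns 0 on success, -1 if out of memory. */
int array_stack_init(array_stack *s, size_t initial_capacity)
{
    s->count = 0;
    s->capacity = initial_capacity ? initial_capacity : 1;
    s->items = malloc(s->capacity);
    return s->items ? 0 : -1;
}

/* Returns 0 on success, -1 if the buffer could not be grown. */
int array_stack_push(array_stack *s, char c)
{
    if (s->count == s->capacity) {
        size_t new_capacity = s->capacity * 2;
        char *p = realloc(s->items, new_capacity);
        if (p == NULL)
            return -1;          /* old buffer is still valid */
        s->items = p;
        s->capacity = new_capacity;
    }
    s->items[s->count++] = c;
    return 0;
}

/* The caller must make sure the stack is not empty. */
char array_stack_pop(array_stack *s)
{
    assert(s->count > 0);
    return s->items[--s->count];
}
```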
That said, in the common understanding of the stack data structure, a stack is always associated with a stack depth. When implementing one, it is always safe to have a maximum number of elements that can be pushed, and I guess that may be what the comment meant.
To illustrate with a real example, you must have heard of the function call stack: in theory it expands, but it has a maximum possible size, and that is why we see a stack overflow error when we recurse infinitely.

How do I know whether a pointer points to a specific structure or not?

For example, in Linux, I have a pointer pointing to a task_struct. Later, the task_struct might migrate or be deleted. How do I know whether the pointer still points to a task_struct or not?
It's not possible.
Pointers only contain addresses, and generally it's not possible to determine whether or not a given address is "valid".
Sometimes you can ask the entity that gave you the pointer to begin with if it's still valid, but that of course depends on the exact details of the entity. The language itself cannot do this.
You don't know, because:
a pointer just contains the address of the object it points to;
the type information is lost at compile time.
So C provides no facilities for dealing with this kind of problem; you have to track what happens to the things you point to on your own.
The most you can ask (and it is already OS-specific) is whether the memory page where the structure would reside is still accessible, but that is usually not particularly useful information.
Depending on your allocation pattern/luck, you might get a segmentation fault (which of course kills your program)...but that at least would tell you the reference is no longer valid.
However, as previously stated, the best way is to track the validity yourself.
If you need to keep moving a struct around in memory (rather than just blanking it and reinitializing it at its current location), you could consider using a pointer to a pointer to make the tracking easier.
That is, everything gets a reference to the pointer to the struct, and when you move or delete the struct you just set that pointer to the new memory location or to NULL.
Also, in general, if you want to do checks on your program for this kind of weirdness, I would recommend looking into valgrind.
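To make the pointer-to-pointer suggestion concrete, here is a minimal user-space sketch. The record and record_handle types and both functions are made up for illustration, and this says nothing about how the kernel itself manages task_struct. It also assumes the handle outlives everyone who holds it and that only one thread touches it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int id; } record;

/* The handle is the one place that knows where the record currently lives.
 * Users keep a record_handle *, never a record *. */
typedef struct { record *ptr; } record_handle;

/* Move the record to fresh storage and update the handle.
 * Returns 0 on success, -1 if out of memory (record stays where it was). */
int record_move(record_handle *h)
{
    assert(h != NULL && h->ptr != NULL);
    record *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, h->ptr, sizeof *fresh);
    free(h->ptr);
    h->ptr = fresh;
    return 0;
}

/* Delete the record; every holder of the handle now sees NULL. */
void record_delete(record_handle *h)
{
    assert(h != NULL);
    free(h->ptr);
    h->ptr = NULL;
}
```

Users then test h->ptr against NULL before each access instead of caching the raw address.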
It is your responsibility in C to write your code so that you keep track of it. You can use the special value of NULL (representing not pointing to anything), setting the pointer to NULL when you remove (or haven't yet set) whatever it was pointing to & testing for NULL before using it. You might also design your code in a way that the question never comes up.
There is no way to query a random pointer value to see if it represents something, just like there is no way to query an int variable to check if the value in it is uninitialized, junk, or the correct result of a computation.
It is all a matter of software design and, when necessary, using the value of NULL to designate not set.

How to implement Reference counting in C?

read about it here.
I need to implement a variation of such an interface. Say we are given a large memory space to manage: there should be getmem(size) and free(pointer to block) functions, and free(pointer to block) must actually free the memory if and only if all processes using that block are done using it.
What I was thinking of doing is to define a Collectable struct holding a pointer to the block, its size, and a count of the processes using it. Whenever a process uses a Collectable struct instance for the first time it has to explicitly increment the count, and whenever the process free()'s it, the count is decremented.
The problem with this approach is that all processes must follow that interface and make it work explicitly: whenever a collectable pointer is assigned to an instance, the process must explicitly increment that counter. That does not satisfy me. I was thinking maybe there is a way to create a macro so this happens implicitly on every assignment?
I've been looking for ways to approach this problem for a while, so other approaches and ideas would be great...
EDIT: the above approach doesn't satisfy me not only because it doesn't look nice but mostly because I can't assume a running process's code would take care of updating my count. I need a way to make sure it's done without changing the process's code...
An early problem with reference counting is that it is relatively easy to count the initial reference by putting code in a custom malloc / free implementation, but it is quite a bit harder to determine if the initial recipient passes that address around to others.
Since C lacks the ability to override the assignment operator (to count the new reference), basically you are left with a limited number of options. The only one that can possibly override the assignment is macrodef, as it has the ability to rewrite the assignment into something that inlines the increment of the reference count value.
So you need to "expand" a macro that looks like
a = b;
into
if (b is a pointer) { // this might be optional, if lookupReference does this work
    struct ref_record* ref_r = lookupReference(b);
    if (ref_r) {
        ref_r->count++;
    } else {
        // error
    }
}
a = b;
The real trick will be in writing a macro that can identify the assignment, and insert the code cleanly without introducing other unwanted side-effects. Since macrodef is not a complete language, you might run into issues where the matching becomes impossible.
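For what it is worth, here is a sketch of the most a macro can realistically do: the caller has to spell the assignment as a macro call, because the preprocessor only expands names and never sees a bare a = b;. All names (ref_header, ref_alloc, ref_retain, ref_release, ref_count, REF_ASSIGN) are invented for this example, the count lives in a header in front of the payload instead of a lookup table, and nothing here is thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical header stored directly in front of every counted block. */
typedef struct {
    size_t refs;
    size_t size;
} ref_header;

static ref_header *header_of(void *p) { return (ref_header *)p - 1; }

/* Allocate a counted block; the caller owns the first reference. */
void *ref_alloc(size_t size)
{
    ref_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->refs = 1;
    h->size = size;
    return h + 1;               /* payload starts right after the header */
}

void *ref_retain(void *p)
{
    if (p != NULL)
        header_of(p)->refs++;
    return p;
}

void ref_release(void *p)
{
    if (p == NULL)
        return;
    ref_header *h = header_of(p);
    assert(h->refs > 0);
    if (--h->refs == 0)
        free(h);
}

size_t ref_count(void *p) { return header_of(p)->refs; }

/* Counted assignment: retain the new value first, then release the old one,
 * so that REF_ASSIGN(a, a) is harmless. */
#define REF_ASSIGN(dst, src)               \
    do {                                   \
        void *ref_new_ = ref_retain(src);  \
        ref_release(dst);                  \
        (dst) = ref_new_;                  \
    } while (0)
```

Any code that writes a plain a = b; bypasses the count entirely, which is the limitation the question is worried about.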
(jokes about seeing nails where you learn how to use a hammer have an interesting parallel here, except that when you only have a hammer, you had better learn how to make everything a nail).
Another option (perhaps more sane, perhaps not) is to keep track of all address values returned by malloc, and then scan the program's stack and heap for matching addresses. If you find a match, you might have found a valid pointer, or you might have found a string with a lucky encoding. If you find no match, you can certainly free the address, provided nothing is storing an address + offset calculated from the original address. (Perhaps you can use macrodef to detect such offsets, and add the offsets as additional addresses in the scan for the same block.)
In the end, there isn't going to be a foolproof solution without building a referencing system, where you pass back references (pretend addresses); hiding the real addresses. The down side to such a solution is that you must use the library interface every time you want to deal with an address. This includes the "next" element in the array, etc. Not very C-like, but a pretty good approximation of what Java does with its references.
Semi-serious answer
#include "Python.h"
Python has a great reference counting memory manager. If I had to do this for real in production code, not homework, I'd consider embedding the Python object system in my C program, which would then make my C program scriptable in Python too. See the Python C API documentation if you are interested!
Such a system in C requires some discipline on the part of the programmer but ...
You need to think in terms of ownership. All things that hold references are owners and must keep track of the objects to which they hold references, e.g. through lists. When a reference-holding thing is destroyed, it must loop over its list of referred objects, decrement their reference counters, and destroy in turn any whose counter reaches zero.
Functions are also owners and should keep track of referenced objects, e.g. by setting up a list at the start of the function and looping through it when returning.
So you need to determine in which situations objects should be transferred or shared with new owners and wrap the corresponding situations in macros/functions that add or remove owned objects to owning objects' lists of referenced objects (and adjust the reference counter accordingly).
Finally you need to deal with circular references somehow by checking for objects that are no longer reachable from objects/pointers on the stack. That could be done with some mark and sweep garbage collection mechanism.
I don't think you can do it automatically without overridable destructors/constructors.
You can look at HDF5 ref counting but those require explicit calls in C:
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5I.html

Corrupt pointer in a linked list

How would you find if one of the pointers in a linked list is corrupted or not ?
Introduce a magic value in your node structures. Initialize it upon new node allocation. Before every access, check if the node structure that the pointer points to contains the valid magic. If the pointer points at an unreadable data block, your program will crash. For that, on Windows there's API VirtualQuery() - call that before reading, and make sure the pointer points at readable data.
There are several possibilities.
If the list is doubly linked, it's possible to check the back pointer from what a front pointer points to, or vice versa.
If you have some idea as to the range of expected memory addresses, you can check. This is particularly true if the linked list is allocated from a limited number of chunks of memory, rather than having each node allocated independently.
If the nodes have some recognizable data in them, you can run down the list and check for recognizable data.
This looks to me like one of those questions where the interviewer isn't expecting a snappy answer, but rather an analysis of the question including further questions from you.
It's sort of a pain, but you can record the values of each pointer as you come across them with your debugger and verify that it's consistent with what you'd expect to find (if you'd expect a pointer to be NULL, make sure it's NULL. if you'd expect a pointer to refer to an already existing object, verify that that object's address has that value, etc.).
You could keep a doubly linked list. Then you can check that node->child->parent == node (although if node->child has become corrupt, this has a reasonable chance of causing an exception).
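A quick sketch of that check over a whole list, using prev/next as the field names (the dnode type and the function name are just for illustration). As noted, it can only catch links that still point at readable memory; a wild pointer may crash before the comparison happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct dnode {
    struct dnode *prev;
    struct dnode *next;
    int value;
} dnode;

/* Walk from head and check that every forward link is mirrored by a back
 * link. Returns false at the first inconsistency. */
bool dlist_links_consistent(const dnode *head)
{
    if (head != NULL && head->prev != NULL)
        return false;               /* the head must have no predecessor */
    for (const dnode *n = head; n != NULL; n = n->next)
        if (n->next != NULL && n->next->prev != n)
            return false;
    return true;
}
```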
Several debuggers / bound-checkers will do this for you, but a cheap and quick solution to this question is to
Alter the structure of the list's nodes to include one additional char[n] field (or more typically two, one as the first the other as the last fields in the structure, hence allowing bounds-checking in addition to pointer corruption).
Initialize these fields with a short (but long enough...) constant string such as "VaL1D-LiST-NODE 1234" when the nodes are created.
Check that the values read in this field (or these fields) match the expected text each time a node is dereferenced, and before using the node in earnest.
When the field values do not match, this indicates that either:
the pointer is invalid (it never pointed to a list node)
something else is overwriting the node structure (the pointer is "valid" but the data it points to has been corrupted).
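A small sketch of the idea, using a 32-bit constant at both ends of the node instead of a string to keep it short. NODE_MAGIC, mnode and the helper names are arbitrary choices for this example. The check still reads through the pointer, so it only helps when the pointer lands in readable memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NODE_MAGIC 0x4C4E4F44u   /* arbitrary constant chosen for this sketch */

typedef struct mnode {
    uint32_t magic_head;         /* guards the front of the node */
    int value;
    struct mnode *next;
    uint32_t magic_tail;         /* guards the end of the node */
} mnode;

mnode *mnode_new(int value)
{
    mnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->magic_head = NODE_MAGIC;
    n->value = value;
    n->next = NULL;
    n->magic_tail = NODE_MAGIC;
    return n;
}

/* True only if both guards hold the expected value. */
bool mnode_valid(const mnode *n)
{
    return n != NULL
        && n->magic_head == NODE_MAGIC
        && n->magic_tail == NODE_MAGIC;
}

void mnode_free(mnode *n)
{
    assert(mnode_valid(n));
    n->magic_head = 0;           /* make accidental reuse more likely to fail the check */
    n->magic_tail = 0;
    free(n);
}
```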
