I have implemented a lock-free queue in C using compare-and-swap, based on http://www.boyet.com/articles/LockfreeQueue.html.
It's working great, but I'm trying to integrate this queue into a lock-free skip list that I have implemented. I'm using the skip list as a priority queue, and I would like to use the lock-free queue inside each node to store multiple values when there is a priority collision. However, due to the way nodes are managed in the skip list, when I detect a priority collision I need to be able to add the item to the queue only if the queue is not empty.
Due to the lock-free nature of the queue, I'm not sure how to actually perform this operation.
So basically: how would I write an atomic enqueue_if_not_empty operation?
EDIT: As was noticed, I wrote the function with quite the opposite semantics: enqueuing only into an empty queue. I fixed the name to reflect that and decided to leave the answer as is, in case someone is interested. So this is not the right answer to the question, but please do not downvote unless you find another reason :)
Below is an attempt to add EnqueueIfEmpty() to the queue implementation in the referenced article. I have not verified that it works or even compiles.
The basic idea is that you insert a new node right after the head (and not the tail), provided that the head's next pointer is currently null (which is a necessary condition for an empty queue). I left in additional checks for head being equal to tail, which can possibly be removed.
public bool EnqueueIfEmpty(T item) {
    // Return immediately if the queue is not empty.
    // Possibly the first condition is redundant.
    if (head != tail || head.Next != null)
        return false;

    SingleLinkNode<T> oldHead = null;

    // create and initialize the new node
    SingleLinkNode<T> node = new SingleLinkNode<T>();
    node.Item = item;

    // loop until we have managed to update the head's Next link
    // to point to our new node
    bool Succeeded = false;
    while (head == tail && !Succeeded) {
        // save the current value of the head
        oldHead = head;
        // provided that the tail still equals the head...
        if (tail == oldHead) {
            // ...and its Next field is null...
            if (oldHead.Next == null) {
                // ...try inserting the new node right after the head.
                // Do not insert at the tail, because that might succeed
                // with a non-empty queue as well.
                Succeeded = SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, null, node);
            }
            // if the head's Next field was non-null, another thread is
            // in the middle of enqueuing a new node, so the queue becomes
            // non-empty
            else {
                return false;
            }
        }
    }

    if (Succeeded) {
        // try and update the tail field to point to our node; don't
        // worry if we can't, another thread will update it for us on
        // the next call to Enqueue()
        SyncMethods.CAS<SingleLinkNode<T>>(ref tail, oldHead, node);
    }
    return Succeeded;
}
Well, Enqueue-If-Not-Empty appears to be relatively straightforward, but with a limitation: other threads may concurrently remove all previous items from the queue, so that after the insertion at the tail is done, the new item might turn out to be the first in the queue. Since the atomic compare-and-swap operations work on different fields (enqueuing changes tail.Next while dequeuing advances head), stronger guarantees would require additional complexity not only in this function but at least in Dequeue() as well.
The following changes to the normal Enqueue() method are sufficient:
1) At the start of the function, check whether head.Next is null, and if so, return immediately because the queue is empty.
2) Add head.Next != null to the loop condition, so that enqueuing attempts stop if the initially non-empty queue becomes empty before the insertion succeeds. This does not prevent the situation I described above (because there is a time window between the check for emptiness and the node insertion), but it reduces the chance of it happening.
3) At the end of the function, only try advancing the tail if the new node was successfully enqueued (like I did in the Enqueue-If-Empty answer).
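Putting those three changes together, here is a hedged C sketch using C11 atomics. The queue layout, the field names, and the Michael-Scott-style "helping" step are assumptions for illustration, not the article's exact code, and the window described above remains: a concurrent dequeue can still empty the queue between the emptiness check and the final CAS.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

// Michael-Scott-style queue with a dummy head node (assumed layout).
typedef struct qnode {
    int item;
    _Atomic(struct qnode *) next;
} qnode;

typedef struct {
    _Atomic(qnode *) head;  // dummy node; head->next is the first real item
    _Atomic(qnode *) tail;
} lfqueue;

void lfqueue_init(lfqueue *q, qnode *dummy) {
    dummy->item = 0;
    atomic_store(&dummy->next, NULL);
    atomic_store(&q->head, dummy);
    atomic_store(&q->tail, dummy);
}

// Enqueue only if the queue is not empty.
bool enqueue_if_not_empty(lfqueue *q, int item) {
    qnode *node = malloc(sizeof *node);
    if (node == NULL)
        return false;                   // allocation failure also reports false
    node->item = item;
    atomic_store(&node->next, NULL);

    for (;;) {
        // Changes 1 and 2: bail out whenever the queue looks empty.
        qnode *head = atomic_load(&q->head);
        if (atomic_load(&head->next) == NULL) {
            free(node);
            return false;
        }
        qnode *tail = atomic_load(&q->tail);
        qnode *next = atomic_load(&tail->next);
        if (tail != atomic_load(&q->tail))
            continue;                   // tail moved under us; retry
        if (next != NULL) {             // tail is lagging; help advance it
            atomic_compare_exchange_weak(&q->tail, &tail, next);
            continue;
        }
        qnode *expected = NULL;
        if (atomic_compare_exchange_weak(&tail->next, &expected, node)) {
            // Change 3: only swing the tail after a successful link-in.
            atomic_compare_exchange_weak(&q->tail, &tail, node);
            return true;
        }
    }
}
```

Again, this only guarantees the queue was non-empty at the moment of the check, not at the moment the linking CAS lands.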
I'm a bit confused about how to check whether a memory allocation failed, in order to prevent undefined behaviour caused by dereferencing a NULL pointer.
I know that malloc (and similar functions) can fail and return NULL, and that for this reason the returned address should always be checked before proceeding with the rest of the program. What I don't get is the best way to handle these kinds of cases. In other words: what is a program supposed to do when a malloc call returns NULL?
I was working on this implementation of a doubly linked list when this doubt arose.
struct ListNode {
    struct ListNode* previous;
    struct ListNode* next;
    void* object;
};

struct ListNode* newListNode(void* object) {
    struct ListNode* self = malloc(sizeof(*self));
    if(self != NULL) {
        self->next = NULL;
        self->previous = NULL;
        self->object = object;
    }
    return self;
}
The initialization of a node happens only if its pointer was correctly allocated. If the allocation failed, this constructor function returns NULL.
I've also written a function that creates a new node (calling the newListNode function) starting from an already existing node, and then returns it.
struct ListNode* createNextNode(struct ListNode* self, void* object) {
    struct ListNode* newNext = newListNode(object);
    if(newNext != NULL) {
        newNext->previous = self;
        struct ListNode* oldNext = self->next;
        self->next = newNext;
        if(oldNext != NULL) {
            newNext->next = oldNext;
            oldNext->previous = self->next;
        }
    }
    return newNext;
}
If newListNode returns NULL, createNextNode returns NULL as well, and the node passed to the function is not touched.
Then the ListNode struct is used to implement the actual linked list.
struct LinkedList {
    struct ListNode* first;
    struct ListNode* last;
    unsigned int length;
};

_Bool addToLinkedList(struct LinkedList* self, void* object) {
    struct ListNode* newNode;
    if(self->length == 0) {
        newNode = newListNode(object);
        self->first = newNode;
    }
    else {
        newNode = createNextNode(self->last, object);
    }
    if(newNode != NULL) {
        self->last = newNode;
        self->length++;
    }
    return newNode != NULL;
}
If the creation of a new node fails, the addToLinkedList function returns 0 and the linked list itself is left untouched.
Finally, let's consider this last function which adds all the elements of a linked list to another linked list.
void addAllToLinkedList(struct LinkedList* self, const struct LinkedList* other) {
    struct ListNode* node = other->first;
    while(node != NULL) {
        addToLinkedList(self, node->object);
        node = node->next;
    }
}
How should I handle the possibility that addToLinkedList might return 0? From what I've gathered, malloc fails when it's no longer possible to allocate memory, so I assume that subsequent calls after an allocation failure would fail as well; am I right? So, if 0 is returned, should the loop stop immediately, since it won't be possible to add any new elements to the list anyway?
Also, is it correct to stack all of these checks on top of one another the way I did? Isn't it redundant? Would it be wrong to just terminate the program as soon as malloc fails? I read that this would be problematic for multi-threaded programs, and also that in some instances a program might be able to continue running without any further memory allocation, so it would be wrong to treat this as a fatal error in every possible case. Is this right?
Sorry for the really long post and thank you for your help!
It depends on the broader circumstances. For some programs, simply aborting is the right thing to do.
For some applications, the right thing to do is to shrink caches and try the malloc again. For some multithreaded programs, just waiting (to give other threads a chance to free memory) and retrying will work.
For applications that need to be highly reliable, you need an application level solution. One solution that I've used and battle tested is this:
Have an emergency pool of memory allocated at startup.
If malloc fails, free some of the emergency pool.
For calls that can't sanely handle a NULL response, sleep and retry.
Have a service thread that tries to refill the emergency pool.
Have code that uses caching respond to a non-full emergency pool by reducing memory consumption.
If you have the ability to shed load, for example, by shifting load to other instances, do so if the emergency pool isn't full.
For discretionary actions that require allocating a lot of memory, check the level of the emergency pool and don't do the action if it's not full or close to it.
If the emergency pool gets empty, abort.
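A minimal, single-threaded sketch of the emergency-pool idea described above (the names and sizes are made up for illustration; a real version would add locking, the refill service thread, and sleep-and-retry logic):

```c
#include <stdlib.h>

// Hypothetical emergency reserve, allocated at startup.
static void *emergency_pool = NULL;
static size_t emergency_pool_size = 0;

void pool_init(size_t size) {
    emergency_pool = malloc(size);
    emergency_pool_size = emergency_pool ? size : 0;
}

// malloc wrapper: on failure, release the reserve and retry;
// if the reserve is already gone, give up (here: abort).
void *robust_malloc(size_t size) {
    for (;;) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        if (emergency_pool != NULL) {
            free(emergency_pool);      // hand the reserve back to the allocator
            emergency_pool = NULL;
            emergency_pool_size = 0;
            continue;                  // a service thread would refill it later
        }
        abort();                       // pool empty and malloc still failing
    }
}
```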
How to handle malloc failing and returning NULL?
Consider whether the code is a set of helper functions/a library, or an application.
The decision to terminate is best handled by higher-level code.
Example: aside from exit(), abort() and friends, the Standard C library does not exit.
Likewise, returning error codes/values is a reasonable solution for OP's low-level function set too. Even for addAllToLinkedList(), I'd consider propagating the error in the return code (non-zero meaning some error).
// void addAllToLinkedList(struct LinkedList* self, const struct LinkedList* other) {
int addAllToLinkedList(struct LinkedList* self, const struct LinkedList* other) {
    ...
    if (!addToLinkedList(self, node->object)) {
        // Do some house-keeping (undo prior allocations)
        return -1;
    }
For the higher-level application, follow your design. For now, it may be simple enough to exit with a failure message.
if (addAllToLinkedList(self, ptrs)) {
    fprintf(stderr, "Linked List failure in %s %u\n", __func__, __LINE__);
    exit(EXIT_FAILURE);
}
Example of not exiting:
Consider a routine that reads a file into a data structure with many uses of LinkedList, where the file was somehow corrupted, leading to excessive memory allocations. The code may want to simply free everything for that file (but just for that file), report "invalid file/out-of-memory" to the user, and continue running.
if (addAllToLinkedList(self, ptrs)) {
    free_file_to_struct_resources(handle);
    return oops;
}
...
return success;
Takeaway
Low-level routines indicate an error somehow. Higher-level routines can exit if desired.
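To make the propagation concrete, here is a hedged, self-contained sketch of the int-returning variant. It reuses OP's structures, but the insertion helper is simplified here; the early return relies on the assumption above that further allocations are likely to fail too, and it does not undo prior insertions.

```c
#include <stdlib.h>

struct ListNode {
    struct ListNode *previous;
    struct ListNode *next;
    void *object;
};

struct LinkedList {
    struct ListNode *first;
    struct ListNode *last;
    unsigned int length;
};

static struct ListNode *newListNode(void *object) {
    struct ListNode *self = malloc(sizeof *self);
    if (self != NULL) {
        self->previous = NULL;
        self->next = NULL;
        self->object = object;
    }
    return self;
}

static int addToLinkedList(struct LinkedList *self, void *object) {
    struct ListNode *node = newListNode(object);
    if (node == NULL)
        return 0;                      // allocation failed; list untouched
    node->previous = self->last;
    if (self->last != NULL)
        self->last->next = node;
    else
        self->first = node;            // list was empty
    self->last = node;
    self->length++;
    return 1;
}

// Returns 0 on success, -1 on allocation failure. Elements added before
// the failure stay in `self`; an all-or-nothing caller must roll back.
int addAllToLinkedList(struct LinkedList *self, const struct LinkedList *other) {
    for (const struct ListNode *node = other->first; node != NULL; node = node->next) {
        if (!addToLinkedList(self, node->object))
            return -1;                 // stop early: the next malloc will likely fail too
    }
    return 0;
}
```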
This is my function for adding an element at the end of my list, but I can't find a way to fix the loop in the while. Can you give me some tips/rules to make this function work?
void insCoda(t_lista *l, TipoElemLista elem){
    t_lista ultimo;
    t_lista temp;
    temp = (node *)malloc(sizeof(node));
    temp->contenuto = elem;
    temp->next = NULL;
    if(*l == NULL)
    {
        *l = temp;
        printf("Dentro if");
    }else{
        ultimo = *l;
        while(ultimo->next != NULL)
        {
            ultimo = ultimo->next;
            ultimo->next = temp;
        }
    }
}
This is thoroughly broken:
ultimo=*l;
while(ultimo->next!=NULL)
{
ultimo=ultimo->next;
ultimo->next=temp;
}
On entry, you set ultimo to (presumably) the head of the list. Then you advance past it to the next node (ultimo=ultimo->next), and immediately set the next pointer of that node to your newly allocated node (ultimo->next=temp). Except oops, your very next action is to test if the thing you just set is NULL or not (and it isn't, unless malloc failed). So you process your new node, and set its next to itself. And now you're in an infinite loop. If you don't enter the loop (because your head is the only node, so the loop condition fails immediately), you never insert the new node at all (which is nice, because this is saving you from the infinite loop).
A hint: Don't set next inside the loop. While I haven't tested, simply moving the set outside the loop should work:
ultimo = *l;
while(ultimo->next != NULL)
{
    ultimo = ultimo->next;
}
ultimo->next = temp;
so now you traverse to the final node, then make your new node the final node.
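For completeness, here is a corrected insCoda with the traversal fixed and a malloc check added. The question doesn't show the type definitions, so TipoElemLista is assumed to be int and t_lista a pointer to the node type.

```c
#include <stdlib.h>

// Assumed definitions (not shown in the question).
typedef int TipoElemLista;
typedef struct node {
    TipoElemLista contenuto;
    struct node *next;
} node;
typedef node *t_lista;

// Corrected insCoda: the append happens once, after the traversal,
// and the malloc result is checked.
void insCoda(t_lista *l, TipoElemLista elem) {
    t_lista temp = malloc(sizeof(node));
    if (temp == NULL)
        return;                       // allocation failed: leave the list unchanged
    temp->contenuto = elem;
    temp->next = NULL;
    if (*l == NULL) {
        *l = temp;                    // empty list: the new node becomes the head
    } else {
        t_lista ultimo = *l;
        while (ultimo->next != NULL)  // walk to the last node...
            ultimo = ultimo->next;
        ultimo->next = temp;          // ...then append exactly once
    }
}
```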
I've spent the last few weeks doing a lot of reading on memory models, compiler reordering, CPU reordering, memory barriers, and lock-free programming and I think I've driven myself into confusion now. I've written a single producer single consumer queue and am trying to figure out where I need memory barriers for things to work and if some operations need to be atomic. My single producer single consumer queue is as follows:
typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
} queue_node_t;

// Empty Queue looks like this:
//  HEAD  TAIL
//   |     |
//   dummy_node
//
// Queue: insert at TAIL, remove from HEAD
//  HEAD            TAIL
//   |               |
//   dummy_node -> 1 -> 2 -> NULL
typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    queue_node_t *tail; // producer adds at tail
} queue_t;

queue_node_t *alloc_node(int data) {
    queue_node_t *new_node = (queue_node_t *)malloc(sizeof(queue_node_t));
    new_node->data = data;
    new_node->next = NULL;
    return new_node;
}

queue_t *create_queue() {
    queue_t *new_queue = (queue_t *)malloc(sizeof(queue_t));
    queue_node_t *dummy_node = alloc_node(0);
    dummy_node->next = NULL;
    new_queue->head = dummy_node;
    new_queue->tail = dummy_node;
    // 1. Do we need any kind of barrier to make sure that if the
    //    thread that didn't call this performs a queue operation
    //    and happens to run on a different CPU, the queue structure
    //    is fully observed by it? i.e. the head and tail are properly
    //    initialized
    return new_queue;
}

// Enqueue modifies tail
void enqueue(queue_t *the_queue, int data) {
    queue_node_t *new_node = alloc_node(data);
    // insert at tail
    new_node->next = NULL;
    // Let us save off the existing tail
    queue_node_t *old_tail = the_queue->tail;
    // Make the new node the new tail
    the_queue->tail = new_node;
    // 2. Store/Store barrier needed here?
    // Link in the new node last so that a concurrent dequeue doesn't see
    // the node until we're done with it.
    // I don't know that this needs to be atomic but it does need to have
    // release semantics so that this isn't visible until prior writes are done
    old_tail->next = the_queue->tail;
    return;
}

// Dequeue modifies head
bool dequeue(queue_t *the_queue, int *item) {
    // 3. Do I need any barrier here to make sure if an enqueue already happened
    //    I can observe it? i.e., if an enqueue was called on
    //    an empty queue by thread 0 on CPU0 and dequeue is called
    //    by thread 1 on CPU1
    // dequeue the oldest item (FIFO) which will be at the head
    if (the_queue->head->next == NULL) {
        return false;
    }
    *item = the_queue->head->next->data;
    queue_node_t *old_head = the_queue->head;
    the_queue->head = the_queue->head->next;
    free(old_head);
    return true;
}
Here are my questions corresponding to the comments in my code above:
In create_queue() do I need some kind of barrier before I return? I'm wondering if I call this function from thread 0 running on CPU0 and then use the pointer returned in thread 1 which happens to run on CPU1 is it possible thread 1 sees a queue_t structure that isn't fully initialized?
Do I need a barrier in enqueue() to make sure the new node isn't linked in to the queue until all of the new node's fields are initialized?
Do I need a barrier in dequeue()? I feel like it would be correct without one but I may need one if I want to make sure I see any completed enqueue.
Update: I tried to make it clear with the comments in the code but the HEAD of this queue always points to a dummy node. This is a common technique that makes it so that the producer only ever needs to access the TAIL and the consumer only ever accesses the HEAD. An empty queue will contain a dummy node and dequeue() always returns the node after the HEAD, if there is one. As nodes are dequeued the dummy node advances and the previous "dummy" is freed.
First of all, it depends on your specific hardware architecture, OS, language, etc.
1.)
No, because you need an additional barrier to pass the pointer to the other thread anyway.
2.)
Yes: old_tail->next = the_queue->tail needs to be executed after the_queue->tail = new_node.
3.)
It would have no effect, since there is nothing before the barrier; but theoretically you could need a barrier after old_tail->next = the_queue->tail in enqueue(). The compiler won't reorder across the function boundary, but the CPU maybe could do something like that. (Very unlikely, but not 100% certain.)
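To make points 2 and 3 concrete, here is a hedged sketch of the producer/consumer pair using C11 atomics instead of explicit barriers. It keeps the question's layout; only the next link is shared between the two threads, so it carries a release/acquire pair. This is one valid placement under the stated single-producer/single-consumer assumption, not the only one.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct queue_node_t {
    int data;
    _Atomic(struct queue_node_t *) next;   // the only producer/consumer shared field
} queue_node_t;

typedef struct queue_t {
    queue_node_t *head;   // touched only by the consumer
    queue_node_t *tail;   // touched only by the producer
} queue_t;

queue_t *create_queue(void) {
    queue_t *q = malloc(sizeof *q);
    queue_node_t *dummy = malloc(sizeof *dummy);
    dummy->data = 0;
    atomic_store(&dummy->next, NULL);
    q->head = q->tail = dummy;
    return q;   // publishing q to another thread still needs its own synchronization
}

void enqueue(queue_t *q, int data) {
    queue_node_t *n = malloc(sizeof *n);
    n->data = data;
    atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
    queue_node_t *old_tail = q->tail;   // producer-private, no atomics needed
    q->tail = n;
    // Release store: a consumer that sees this link also sees n->data.
    atomic_store_explicit(&old_tail->next, n, memory_order_release);
}

bool dequeue(queue_t *q, int *item) {
    // Acquire load: pairs with the release store above.
    queue_node_t *next = atomic_load_explicit(&q->head->next, memory_order_acquire);
    if (next == NULL)
        return false;                   // queue empty
    *item = next->data;
    queue_node_t *old_head = q->head;   // consumer-private
    q->head = next;                     // the dequeued node becomes the new dummy
    free(old_head);
    return true;
}
```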
OT: since you are already doing some micro-optimization, you could add some padding so head and tail don't share a cache line:
typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    char cache_pad[64]; // head and tail shouldn't be in the same cache line (64 bytes)
    queue_node_t *tail; // producer adds at tail
} queue_t;
and if you really have enough memory to waste, you could do something like this:
typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
    char cache_pad[56]; // sizeof(queue_node_t) == 64; only with 32-bit pointers
} queue_node_t;
I'm now implementing the Barnes-Hut algorithm for simulating the N-body problem, and I only want to ask about the tree-building part.
There are two functions I made to build the tree.
I build the tree recursively and print the data of each node while building, and everything seems correct. But when the program returns to the main function, only the root of the tree and the children of the root store their values. The other nodes' values are not stored, which is weird, since I printed them during the recursion and they should have been stored.
Here's part of the code, with modifications, where I think the problem might be:
#include <...>

typedef struct node{
    int data;
    struct node *child1, *child2;
} Node;

Node root; // a global variable

int main(){
    ...
    set_root_and_build(); // called more than once; it's actually in a loop
    traverse(&root);
    ...
}
Here's the function set_root_and_build() (I've set the child pointers to NULL, but didn't show it at first):
void set_root_and_build(){
    root.data = ...;
    ... // set child1 and child2 to NULL
    build(&root,...); // ... are the data values for its children
}

And build:
And build:
void build(Node *n,...){
    Node *new1, *new2;
    new1 = (Node*)malloc(sizeof(Node));
    new2 = (Node*)malloc(sizeof(Node));
    ... // (set data of new1 and new2; also their children are set to NULL)
    if(some condition holds for child1){ // else no link, so n->child1 should be NULL
        build(new1,...);
        n->child1 = new1;
        // for debugging, print data of n->child1 and n->child2
    }
    if(some condition holds for child2){ // else no link, so n->child2 should be NULL
        build(new2,...);
        n->child1 = new2;
        // for debugging, print data of n->child1 and n->child2
    }
}
Nodes in the tree may have one or two children; not all have two children.
The program prints out the correct data during the build() recursion, but when it gets back to the main function and calls traverse(), it fails with a segmentation fault.
I tried to print everything in traverse() and found that only root, root.child1, and root.child2 store values, just as I mentioned.
Since I have to call build() several times, and even in parallel, new1 and new2 can't be defined as global variables (but I don't think they cause the problem here).
Does anyone know where it goes wrong?
The traverse part with debugging info:
void traverse(Node n){
    ... // print out data of n
    if(n.child1 != NULL)
        traverse(*(n.child1));
    ... // same for child2
}
You may not be properly setting the children of n when the condition does not hold. You might want this instead:
void set_root_and_build()
{
    root.data = ...;
    build(&root,...); // ... part are values of data for its children
}

void build(Node *n,...)
{
    n->child1 = n->child2 = NULL;
    Node *new1, *new2;
    new1 = (Node*) malloc(sizeof(Node));
    new2 = (Node*) malloc(sizeof(Node));
    // set data of new1 and new2 somehow (read from stdin?)
    if (some condition holds for new1)
    {
        n->child1 = new1;
        build(n->child1,...);
        // for debugging, print data of n->child1
    }
    else
        free(new1); // or whatever else you need to do to reclaim new1
    if (some condition holds for new2)
    {
        n->child2 = new2;
        build(n->child2,...);
        // for debugging, print data of n->child2
    }
    else
        free(new2); // or whatever else you need to do to reclaim new2
}
Of course, you should be checking the return values of malloc() and handling errors too.
Also, your traversal is a bit strange as it recurses by copy rather than reference. Do you have a good reason for doing that? If not, then maybe you want:
void traverse(Node *n)
{
    ... // print out data of n
    if (n->child1 != NULL)
        traverse(n->child1);
    ... // same for child2
}
The problem in your tree traversal is that you keep recursing until you find a node pointer that is NULL.
Unfortunately, when you create the nodes they are not initialized, neither by malloc() nor by new (calloc() would zero them, but that practice is as bad in C++ code as malloc() itself). So your traversal recurses into the neverland of random pointers.
I propose you take advantage of C++ and change your structure slightly:
struct Node { // that's C++: no need for typedef
    int data;
    Node *child1, *child2;
    Node() : data(0), child1(nullptr), child2(nullptr) {} // makes sure every Node created is initialized first
};
Then get rid of your old mallocs, and structure the code to avoid unnecessary allocations:
if(some condition holds for child1){ // else no link, so n->child1 should be NULL
    new1 = new Node; // if you create it here, there's no need to free in an else !!
    build(new1,...);
    n->child1 = new1;
    ...
}
if (... child2) { ... }
Be aware, however, that pointers allocated with new should be released with delete and not with free().
Edit: there is a mismatch in your code snippet:
traverse(&root); // you pass a Node* here
void traverse(Node n){ // but your function takes its argument by value!
    ...
}
Check that you didn't overlook some warnings from the compiler, and that you have no abusive casts in your code.
I have lockless queues written in C in the form of a linked list that contains requests from several threads, posted to and handled in a single thread. After a few hours of stress I end up with the last request's next pointer pointing to itself, which creates an endless loop and locks up the handling thread.
The application runs (and fails) on both Linux and Windows. I'm debugging on Windows, where my COMPARE_EXCHANGE_PTR maps to InterlockedCompareExchangePointer.
This is the code that pushes a request to the head of the list, and is called from several threads:
void push_request(struct request * volatile * root, struct request * request)
{
    assert(request);
    do {
        request->next = *root;
    } while(COMPARE_EXCHANGE_PTR(root, request, request->next) != request->next);
}
This is the code that gets a request from the end of the list, and is only called by a single thread that is handling them:
struct request * pop_request(struct request * volatile * root)
{
    struct request * volatile * p;
    struct request * request;
    do {
        p = root;
        while(*p && (*p)->next) p = &(*p)->next; // <- loops here
        request = *p;
    } while(COMPARE_EXCHANGE_PTR(p, NULL, request) != request);
    assert(request->next == NULL);
    return request;
}
Note that I'm not using a tail pointer because I wanted to avoid the complication of having to deal with the tail pointer in push_request. However I suspect that the problem might be in the way I find the end of the list.
There are several places that push a request into the queue, but they all look generally like this:
// device->requests is defined as struct request * volatile requests;
struct request * request = malloc(sizeof(struct request));
if(request) {
    // fill out request fields
    push_request(&device->requests, request);
    sem_post(device->request_sem);
}
The code that handles the request is doing more than that, but in essence does this in a loop:
if(sem_wait_timeout(device->request_sem, timeout) == sem_success) {
    struct request * request = pop_request(&device->requests);
    // handle request
    free(request);
}
I also just added a function that is checking the list for duplicates before and after each operation, but I'm afraid that this check will change the timing so that I will never encounter the point where it fails. (I'm waiting for it to break as I'm writing this.)
When I break into the hanging program, the handler thread is looping in pop_request at the marked position. I have a valid list of one or more requests, and the last one's next pointer points to itself. The request queues are usually short; I've never seen more than 10, and only 1 and 3 the two times I could look at this failure in the debugger.
I thought this through as much as I could and I came to the conclusion that I should never be able to end up with a loop in my list unless I push the same request twice. I'm quite sure that this never happens. I'm also fairly sure (although not completely) that it's not the ABA problem.
I know that I might pop more than one request at the same time, but I believe this is irrelevant here, and I've never seen it happening. (I'll fix this as well)
I thought long and hard about how I can break my function, but I don't see a way to end up with a loop.
So the question is: Can someone see a way how this can break? Can someone prove that this can not?
Eventually I will solve this (maybe by using a tail pointer or some other solution - locking would be a problem because the threads that post should not be locked, I do have a RW lock at hand though) but I would like to make sure that changing the list actually solves my problem (as opposed to makes it just less likely because of different timing).
It's subtle but you do have a race condition there.
Start with a list with one element in it, req1. So we have:
device->requests == req1;
req1->next == NULL;
Now, we push a new element req2, and simultaneously try to pop the queue. The push for req2 starts first. The while loop body runs, so we now have:
device->requests == req1;
req1->next == NULL;
req2->next == req1;
Then the COMPARE_EXCHANGE_PTR runs, so we have:
device->requests == req2;
req1->next == NULL;
req2->next == req1;
...and the COMPARE_EXCHANGE_PTR() returns req1. Now, at this point, before the comparison in the while condition, the push gets interrupted and the pop starts running.
The pop runs correctly to completion, popping off req1 - which means that we have:
device->requests == req2;
req2->next == NULL;
The push restarts. It now fetches request->next for the comparison, and it fetches the new value of req2->next, which is NULL. The CAS returned req1, and req1 != NULL, so the loop condition is true and the body runs again; now we have:
device->requests == req2;
req2->next == req2;
This time the test fails, the while loop exits, and you have your loop.
This should fix it:
void push_request(struct request * volatile * root, struct request * request)
{
    struct request *oldroot;
    assert(request);
    do {
        request->next = oldroot = *root;
    } while(COMPARE_EXCHANGE_PTR(root, request, oldroot) != oldroot);
}