I've spent the last few weeks reading a lot about memory models, compiler reordering, CPU reordering, memory barriers, and lock-free programming, and I think I've driven myself into confusion. I've written a single-producer, single-consumer queue and am trying to figure out where I need memory barriers for it to work, and whether some operations need to be atomic. My single-producer, single-consumer queue is as follows:
typedef struct queue_node_t {
int data;
struct queue_node_t *next;
} queue_node_t;
// Empty Queue looks like this:
// HEAD TAIL
// | |
// dummy_node
// Queue: insert at TAIL, remove from HEAD
// HEAD TAIL
// | |
// dummy_node -> 1 -> 2 -> NULL
typedef struct queue_t {
queue_node_t *head; // consumer consumes from head
queue_node_t *tail; // producer adds at tail
} queue_t;
queue_node_t *alloc_node(int data) {
queue_node_t *new_node = (queue_node_t *)malloc(sizeof(queue_node_t));
new_node->data = data;
new_node->next = NULL;
return new_node;
}
queue_t *create_queue() {
queue_t *new_queue = (queue_t *)malloc(sizeof(queue_t));
queue_node_t *dummy_node = alloc_node(0);
dummy_node->next = NULL;
new_queue->head = dummy_node;
new_queue->tail = dummy_node;
// 1. Do we need any kind of barrier to make sure that if the
// thread that didn't call this performs a queue operation
// and happens to run on a different CPU that queue structure
// is fully observed by it? i.e. the head and tail are properly
// initialized
return new_queue;
}
// Enqueue modifies tail
void enqueue(queue_t *the_queue, int data) {
queue_node_t *new_node = alloc_node(data);
// insert at tail
new_node->next = NULL;
// Let us save off the existing tail
queue_node_t *old_tail = the_queue->tail;
// Make the new node the new tail
the_queue->tail = new_node;
// 2. Store/Store barrier needed here?
// Link in the new node last so that a concurrent dequeue doesn't see
// the node until we're done with it
// I don't know that this needs to be atomic but it does need to have
// release semantics so that this isn't visible until prior writes are done
old_tail->next = the_queue->tail;
return;
}
// Dequeue modifies head
bool dequeue(queue_t *the_queue, int *item) {
// 3. Do I need any barrier here to make sure if an enqueue already happened
// I can observe it? i.e., if an enqueue was called on
// an empty queue by thread 0 on CPU0 and dequeue is called
// by thread 1 on CPU1
// dequeue the oldest item (FIFO) which will be at the head
if (the_queue->head->next == NULL) {
return false;
}
*item = the_queue->head->next->data;
queue_node_t *old_head = the_queue->head;
the_queue->head = the_queue->head->next;
free(old_head);
return true;
}
Here are my questions corresponding to the comments in my code above:
In create_queue(), do I need some kind of barrier before I return? If I call this function from thread 0 running on CPU0 and then use the returned pointer in thread 1, which happens to run on CPU1, is it possible that thread 1 sees a queue_t structure that isn't fully initialized?
Do I need a barrier in enqueue() to make sure the new node isn't linked in to the queue until all of the new node's fields are initialized?
Do I need a barrier in dequeue()? I feel like it would be correct without one but I may need one if I want to make sure I see any completed enqueue.
Update: I tried to make it clear with the comments in the code but the HEAD of this queue always points to a dummy node. This is a common technique that makes it so that the producer only ever needs to access the TAIL and the consumer only ever accesses the HEAD. An empty queue will contain a dummy node and dequeue() always returns the node after the HEAD, if there is one. As nodes are dequeued the dummy node advances and the previous "dummy" is freed.
First of all, it depends on your specific hardware architecture, OS, language, etc.
1.)
No, because you need an additional barrier to pass the pointer to the other thread anyway.
2.)
Yes. old_tail->next = the_queue->tail must not become visible before the writes that initialize the new node (and before the_queue->tail = new_node), so that store needs release semantics.
3.)
A barrier at the start of dequeue() would have no effect, since there is nothing before it to order, but in theory you could need a barrier after old_tail->next = the_queue->tail in enqueue(). The compiler generally won't reorder across a non-inlined function call, but inlining or link-time optimization can remove that boundary, and the CPU could also reorder (very unlikely, but I'm not 100% sure).
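One way to express the ordering discussed in points 2 and 3 is with C11 <stdatomic.h>. The following is a minimal sketch under that assumption (a C11 compiler with atomics support), not a drop-in replacement: next becomes an _Atomic pointer, enqueue() publishes the new node with a release store, and dequeue() reads the link with an acquire load, so a consumer that sees the link is guaranteed to also see the node's data. head and tail stay plain pointers because each is touched by only one thread. Handing the queue_t pointer to the other thread (question 1) still needs its own synchronization, for example creating the consumer thread after create_queue() returns.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct queue_node_t {
    int data;
    _Atomic(struct queue_node_t *) next;  /* written by producer, read by consumer */
} queue_node_t;

typedef struct queue_t {
    queue_node_t *head;  /* touched only by the consumer */
    queue_node_t *tail;  /* touched only by the producer */
} queue_t;

static queue_node_t *alloc_node(int data) {
    queue_node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = data;
    atomic_init(&n->next, NULL);
    return n;
}

queue_t *create_queue(void) {
    queue_t *q = malloc(sizeof *q);
    queue_node_t *dummy = alloc_node(0);
    if (q == NULL || dummy == NULL) {
        free(q);
        free(dummy);
        return NULL;
    }
    q->head = q->tail = dummy;
    return q;
}

bool enqueue(queue_t *q, int data) {
    queue_node_t *n = alloc_node(data);
    if (n == NULL)
        return false;
    queue_node_t *old_tail = q->tail;
    q->tail = n;
    /* release: the writes to n->data and n->next happen-before the link */
    atomic_store_explicit(&old_tail->next, n, memory_order_release);
    return true;
}

bool dequeue(queue_t *q, int *item) {
    /* acquire: pairs with the release store in enqueue() */
    queue_node_t *next = atomic_load_explicit(&q->head->next, memory_order_acquire);
    if (next == NULL)
        return false;
    *item = next->data;           /* safe: ordered after the acquire load */
    queue_node_t *old_head = q->head;
    q->head = next;               /* next becomes the new dummy */
    free(old_head);
    return true;
}
```

The producer calls only enqueue() and the consumer only dequeue(). The last dummy node is never freed here, so a real program would still need a destroy function.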
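Off-topic: since you are already doing some micro-optimization, you could add some padding so that head and tail don't share a cache line (avoiding false sharing between the producer and the consumer):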
typedef struct queue_t {
queue_node_t *head; // consumer consumes from head
char cache_pad[64]; // head and tail shouldn't be in the same cache line (64 bytes)
queue_node_t *tail; // producer adds at tail
} queue_t;
And if you really have memory to waste, you could pad each node to a full cache line as well:
typedef struct queue_node_t {
int data;
struct queue_node_t *next;
char cache_pad[56]; // sizeof(queue_node_t) == 64 only on 32-bit targets
} queue_node_t;
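Because the pad size above depends on pointer width, a sketch using C11 _Alignas can put head and tail on separate cache lines without hard-coding it. The 64-byte line size is an assumption (common on x86-64 and many ARM cores), and CACHE_LINE is a hypothetical macro name.

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache-line size; check your target CPU */

typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
} queue_node_t;

typedef struct queue_t {
    _Alignas(CACHE_LINE) queue_node_t *head;  /* consumer's cache line */
    _Alignas(CACHE_LINE) queue_node_t *tail;  /* producer's cache line */
} queue_t;

/* Each field starts its own cache line, whatever the pointer size. */
_Static_assert(offsetof(queue_t, tail) - offsetof(queue_t, head) >= CACHE_LINE,
               "head and tail share a cache line");
```

Since plain malloc only guarantees alignment suitable for max_align_t, a heap-allocated queue_t would be allocated with aligned_alloc(CACHE_LINE, sizeof(queue_t)); sizeof(queue_t) is a multiple of CACHE_LINE here, as C11 aligned_alloc requires.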
I'm a bit confused about how to check whether a memory allocation failed, in order to prevent undefined behaviour caused by dereferencing a NULL pointer.
I know that malloc (and similar functions) can fail and return NULL, and that for this reason the returned address should always be checked before proceeding with the rest of the program. What I don't get is the best way to handle these cases. In other words: what is a program supposed to do when a malloc call returns NULL?
I was working on this implementation of a doubly linked list when this question came up.
struct ListNode {
struct ListNode* previous;
struct ListNode* next;
void* object;
};
struct ListNode* newListNode(void* object) {
struct ListNode* self = malloc(sizeof(*self));
if(self != NULL) {
self->next = NULL;
self->previous = NULL;
self->object = object;
}
return self;
}
The initialization of a node happens only if its pointer was correctly allocated. If this didn't happen, this constructor function returns NULL.
I've also written a function that creates a new node (calling the newListNode function) starting from an already existing node and then returns it.
struct ListNode* createNextNode(struct ListNode* self, void* object) {
struct ListNode* newNext = newListNode(object);
if(newNext != NULL) {
newNext->previous = self;
struct ListNode* oldNext = self->next;
self->next = newNext;
if(oldNext != NULL) {
newNext->next = oldNext;
oldNext->previous = self->next;
}
}
return newNext;
}
If newListNode returns NULL, createNextNode as well returns NULL and the node passed to the function doesn't get touched.
Then the ListNode struct is used to implement the actual linked list.
struct LinkedList {
struct ListNode* first;
struct ListNode* last;
unsigned int length;
};
_Bool addToLinkedList(struct LinkedList* self, void* object) {
struct ListNode* newNode;
if(self->length == 0) {
newNode = newListNode(object);
self->first = newNode;
}
else {
newNode = createNextNode(self->last, object);
}
if(newNode != NULL) {
self->last = newNode;
self->length++;
}
return newNode != NULL;
}
If the creation of a new node fails, the addToLinkedList function returns 0 and the linked list itself is left untouched.
Finally, let's consider this last function which adds all the elements of a linked list to another linked list.
void addAllToLinkedList(struct LinkedList* self, const struct LinkedList* other) {
struct ListNode* node = other->first;
while(node != NULL) {
addToLinkedList(self, node->object);
node = node->next;
}
}
How should I handle the possibility that addToLinkedList might return 0? From what I've gathered, malloc fails when it's no longer possible to allocate memory, so I assume that subsequent calls after an allocation failure would fail as well, am I right? So, if 0 is returned, should the loop stop immediately, since it won't be possible to add any new elements to the list anyway?
Also, is it correct to stack all of these checks on top of one another the way I did? Isn't it redundant? Would it be wrong to just terminate the program as soon as malloc fails? I read that this would be problematic for multi-threaded programs, and also that in some instances a program might be able to keep running without any further memory allocation, so it would be wrong to treat this as a fatal error in every case. Is this right?
Sorry for the really long post and thank you for your help!
It depends on the broader circumstances. For some programs, simply aborting is the right thing to do.
For some applications, the right thing to do is to shrink caches and try the malloc again. For some multithreaded programs, just waiting (to give other threads a chance to free memory) and retrying will work.
For applications that need to be highly reliable, you need an application level solution. One solution that I've used and battle tested is this:
Have an emergency pool of memory allocated at startup.
If malloc fails, free some of the emergency pool.
For calls that can't sanely handle a NULL response, sleep and retry.
Have a service thread that tries to refill the emergency pool.
Have code that uses caching respond to a non-full emergency pool by reducing memory consumption.
If you have the ability to shed load, for example, by shifting load to other instances, do so if the emergency pool isn't full.
For discretionary actions that require allocating a lot of memory, check the level of the emergency pool and don't do the action if it's not full or close to it.
If the emergency pool gets empty, abort.
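The emergency-pool idea can be sketched roughly as follows. This is a single-threaded illustration under stated assumptions: the chunk count and size are arbitrary, and reserve_refill(), reserve_is_full() and xmalloc() are hypothetical names. The sleep-and-retry path and the refill service thread from the list above are left out, and in a multithreaded program the reserve would need a lock.

```c
#include <stdio.h>
#include <stdlib.h>

#define RESERVE_CHUNKS 8
#define RESERVE_CHUNK_SIZE (1024 * 1024)  /* 1 MiB per chunk; tune for your app */

static void *reserve[RESERVE_CHUNKS];
static size_t reserve_count;  /* number of chunks currently held */

/* Call at startup, and later to top the reserve back up.
 * Returns the number of chunks held afterwards. */
size_t reserve_refill(void) {
    while (reserve_count < RESERVE_CHUNKS) {
        void *p = malloc(RESERVE_CHUNK_SIZE);
        if (p == NULL)
            break;  /* still under memory pressure; try again later */
        reserve[reserve_count++] = p;
    }
    return reserve_count;
}

/* Check this before discretionary, memory-hungry work (or to shed load). */
int reserve_is_full(void) {
    return reserve_count == RESERVE_CHUNKS;
}

/* malloc that gives back reserve memory on failure and aborts only
 * once the reserve is exhausted. */
void *xmalloc(size_t size) {
    for (;;) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        if (reserve_count == 0) {
            fputs("out of memory: emergency reserve exhausted\n", stderr);
            abort();
        }
        free(reserve[--reserve_count]);  /* release one chunk, then retry */
    }
}
```

Code that keeps caches would poll reserve_is_full() and shrink when it returns 0, which corresponds to the "reduce memory consumption" step in the list.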
How to handle malloc failing and returning NULL?
Consider whether the code is a set of helper functions/a library, or an application.
The decision to terminate is best handled by higher level code.
Example: Aside from exit(), abort() and friends, the Standard C library does not exit.
Likewise returning error codes/values is a reasonable solution for OP's low-level function sets too. Even for addAllToLinkedList(), I'd consider propagating the error in the return code. (Non-zero is some error.)
// void addAllToLinkedList(struct LinkedList* self, const struct LinkedList* other) {
int addAllToLinkedList(struct LinkedList* self, const struct LinkedList* other) {
...
if (!addToLinkedList(self, node->object)) { // addToLinkedList() returns _Bool, not a pointer
// Do some housekeeping (undo prior allocations)
return -1;
}
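A fuller version of this idea, assuming the caller wants self left exactly as it was on failure: remember the old last node and length, and on failure free whatever this call appended. addToLinkedList() is re-implemented compactly here so the sketch compiles on its own; it is assumed to have the same contract as the question's version (returns false and leaves the list untouched if malloc fails).

```c
#include <stdbool.h>
#include <stdlib.h>

struct ListNode {
    struct ListNode* previous;
    struct ListNode* next;
    void* object;
};

struct LinkedList {
    struct ListNode* first;
    struct ListNode* last;
    unsigned int length;
};

/* Appends object; returns false (list untouched) if malloc fails. */
bool addToLinkedList(struct LinkedList* self, void* object) {
    struct ListNode* node = malloc(sizeof(*node));
    if (node == NULL)
        return false;
    node->object = object;
    node->next = NULL;
    node->previous = self->last;
    if (self->last != NULL)
        self->last->next = node;
    else
        self->first = node;
    self->last = node;
    self->length++;
    return true;
}

/* Appends every element of other. On failure, frees what this call
 * already appended and returns -1, leaving self as it was. */
int addAllToLinkedList(struct LinkedList* self, const struct LinkedList* other) {
    struct ListNode* oldLast = self->last;
    unsigned int oldLength = self->length;
    unsigned int remaining = other->length;  /* fixed up front, so self == other is safe */
    struct ListNode* node = other->first;

    for (; remaining > 0 && node != NULL; remaining--, node = node->next) {
        if (!addToLinkedList(self, node->object)) {
            /* undo: free the nodes appended after oldLast */
            struct ListNode* n = (oldLast != NULL) ? oldLast->next : self->first;
            while (n != NULL) {
                struct ListNode* next = n->next;
                free(n);
                n = next;
            }
            if (oldLast != NULL)
                oldLast->next = NULL;
            else
                self->first = NULL;
            self->last = oldLast;
            self->length = oldLength;
            return -1;
        }
    }
    return 0;
}
```

The rollback makes the function all-or-nothing, so a caller that gets -1 can still use self as it was before the call.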
For the higher-level application, follow your design. For now, it may be simple enough to exit with a failure message.
if (addAllToLinkedList(self, ptrs)) {
fprintf(stderr, "Linked List failure in %s %d\n", __func__, __LINE__); // __LINE__ is an int
exit(EXIT_FAILURE);
}
Example of not exiting:
Consider a routine that reads a file into a data structure with many uses of LinkedList, where the file was somehow corrupted, leading to excessive memory allocations. Code may want to simply free everything for that file (but just for that file), report "invalid file/out-of-memory" to the user, and continue running.
if (addAllToLinkedList(self, ptrs)) {
free_file_to_struct_resouces(handle);
return oops;
}
...
return success;
Takeaway
Low-level routines indicate an error somehow. Higher-level routines can exit if desired.
I'm implementing the Barnes-Hut algorithm for simulating the N-body problem. I only want to ask about the tree-building part.
I wrote two functions to build the tree.
I build the tree recursively and print the data of each node while building, and everything seems correct, but when the program returns to main, only the root of the tree and the root's children hold values. The other nodes' values are not stored, which is weird, since I printed them during the recursion and they should have been stored.
Here is the relevant part of the code (simplified), where I think the problem might be:
#include<...>
typedef struct node{
int data;
struct node *child1,*child2;
}Node;
Node root; // a global variable
int main(){
.
set_root_and_build(); // called more than once, because it's actually in a loop
traverse(&root);
.
}
Here's the function set_root_and_build():
I've set the child pointers to NULL, but didn't show it at first.
void set_root_and_build(){
root.data = ...;
..// set child1 and child2 =NULL;
build(&root,...); // ... are values of data for its children
}
And build:
void build(Node *n,...){
Node *new1, *new2 ;
new1 = (Node*)malloc(sizeof(Node));
new2 = (Node*)malloc(sizeof(Node));
... // set data of new1 and new2; their children are also set to NULL
if(some condition holds for child1){ // else no link, so n->child1 should be NULL
build(new1,...);
n->child1 = new1;
//for debugging, print data of n->child1 and n->child2
}
if(some condition holds for child2){ // else no link, so n->child2 should be NULL
build(new2,...);
n->child1 = new2;
//for debugging, print data of n->child1 and n->child2
}
}
Nodes in the tree may have one or two children; not all of them have two.
The program prints the correct data while it is in the build() recursion, but when it returns to main and calls traverse(), it fails with a segmentation fault.
I tried printing everything in traverse() and found that only root, root.child1 and root.child2 hold values, just as I mentioned.
Since I have to call build() several times, even in parallel, new1 and new2 can't be global variables (but I don't think they cause the problem here).
Does anyone know where it goes wrong?
The traverse part with debugging info:
void traverse(Node n){
...//print out data of n
if(n.child1!=NULL)
traverse(*(n.child1));
...//same for child2
}
You may not be properly setting the children of n when the condition does not hold. You might want this instead:
void set_root_and_build()
{
root.data = ...;
build(&root,...); // ... are values of data for its children
}
void build(Node *n,...)
{
n->child1 = n->child2 = NULL;
Node *new1, *new2;
new1 = (Node*) malloc(sizeof(Node));
new2 = (Node*) malloc(sizeof(Node));
// set data of new1 and new2 somehow (read from stdin?)
if (some condition holds for new1)
{
n->child1 = new1;
build(n->child1,...);
//for debugging, print data of n->child1
}
else
free(new1); // or whatever else you need to do to reclaim new1
if (some condition holds for new2)
{
n->child2 = new2;
build(n->child2,...);
//for debugging, print data of n->child2
}
else
free(new2); // or whatever else you need to do to reclaim new2
}
Of course, you should be checking the return values of malloc() and handling errors too.
Also, your traversal is a bit strange, as it recurses by passing nodes by value (copies) rather than by pointer. Do you have a good reason for doing that? If not, then maybe you want:
void traverse(Node *n)
{
...//print out data of n
if (n->child1 != NULL)
traverse(n->child1);
...//same for child2
}
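To make the fixes concrete, here is a small, self-contained sketch of a corrected build/traverse pair. The conditions are hypothetical stand-ins (needs_child1() and needs_child2() split by depth, where the real code would test Barnes-Hut criteria), and new_node() and free_tree() are illustrative helpers. Every child pointer starts as NULL, children are allocated only when they are linked in, the second child is stored in n->child2, and traverse() takes a pointer.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *child1, *child2;
} Node;

/* Hypothetical stand-ins for the question's elided conditions. */
static int needs_child1(int depth) { return depth < 3; }
static int needs_child2(int depth) { return depth < 2; }

static Node *new_node(int data) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    n->data = data;
    n->child1 = n->child2 = NULL;  /* never leave child pointers uninitialized */
    return n;
}

void build(Node *n, int depth) {
    n->child1 = n->child2 = NULL;  /* start with no children, as in the answer */
    /* allocate a child only when it will be linked in, so nothing leaks */
    if (needs_child1(depth)) {
        n->child1 = new_node(n->data * 2);
        build(n->child1, depth + 1);
    }
    if (needs_child2(depth)) {
        n->child2 = new_node(n->data * 2 + 1);  /* child2, not child1 */
        build(n->child2, depth + 1);
    }
}

/* Takes a pointer, so traverse(&root) type-checks; returns the node count. */
int traverse(const Node *n) {
    int count = 1;
    if (n->child1 != NULL)
        count += traverse(n->child1);
    if (n->child2 != NULL)
        count += traverse(n->child2);
    return count;
}

void free_tree(Node *n) {
    if (n == NULL)
        return;
    free_tree(n->child1);
    free_tree(n->child2);
    free(n);
}
```

With root initialized as {1, NULL, NULL}, build(&root, 0) creates 10 descendants and traverse(&root) returns 11. Because set_root_and_build() runs in a loop, calling free_tree() on root.child1 and root.child2 before each rebuild avoids leaking the previous tree.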
The problem is that your traversal keeps walking the tree until it reaches a NULL child pointer.
Unfortunately, when you create the nodes, they are initialized neither by malloc() nor by new (calloc() would zero them, but using it in C++ code is as bad a practice as malloc()). So your traversal keeps looping/recursing into the neverland of random pointers.
I suggest taking advantage of C++ and changing your structure slightly to:
struct Node { // that's C++: no need for typedef
int data;
Node *child1, *child2;
Node() : data(0), child1(nullptr), child2(nullptr) {} // makes sure every new Node starts initialized
};
Then get rid of your old mallocs, and structure the code to avoid unnecessary allocations:
if(some condition holds for child1){ // else no link, so n->child1 should be NULL
new1=new Node; // if you init it here, no need to free in an else !!
build(new1,...);
n->child1 = new1;
...
}
if (... child2) { ... }
Be aware, however, that pointers allocated with new must be released with delete, not with free().
Edit: There is a mismatch in your code snippet:
traverse(&root); // you send here a Node*
void traverse(Node n){ // but your function defines an argument by value !
...
}
Check that you didn't overlook any compiler warnings, and that there are no abusive casts in your code.
I have implemented a lock free queue in C using compare and swap based on http://www.boyet.com/articles/LockfreeQueue.html.
It's working great, but I'm trying to integrate this queue into a lock-free skip list that I have implemented. I'm using the skip list as a priority queue and would like to use the lock-free queue inside each node to store multiple values when there is a priority collision. However, due to the way nodes are managed in the skip list, when I detect a priority collision I need to be able to add the item to the queue only if the queue is not empty.
Due to the lock-free nature of the queue, I'm not sure how to actually perform this operation.
So basically, how would I write an atomic enqueue_if_not_empty operation?
EDIT: As was pointed out, I wrote the function with exactly the opposite semantics: it enqueues only into an empty queue. I renamed it to reflect that and decided to leave it here in case someone is interested. So this is not the right answer to the question, but please do not downvote unless you find another reason :)
Below is an attempt to add EnqueueIfEmpty() to the queue implementation in the referenced paper. I did not verify that it works or even compiles.
The basic idea is to insert the new node right after the head (not the tail), provided that head's next is currently null (the necessary condition for an empty queue). I left additional checks for head being equal to tail, which could possibly be removed.
public bool EnqueueIfEmpty(T item) {
// Return immediately if the queue is not empty.
// Possibly the first condition is redundant.
if (head!=tail || head.Next!=null)
return false;
SingleLinkNode<T> oldHead = null;
// create and initialize the new node
SingleLinkNode<T> node = new SingleLinkNode<T>();
node.Item = item;
// loop until we have managed to update the head's Next link
// to point to our new node
bool Succeeded = false;
while (head==tail && !Succeeded) {
// save the current value of the head
oldHead = head;
// providing that the tail still equals to head...
if (tail == oldHead) {
// ...and its Next field is null...
if (oldHead.Next == null) {
// ...try inserting new node right after the head.
// Do not insert at the tail, because that might succeed
// with a non-empty queue as well.
Succeeded = SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, null, node);
}
// if the head's Next field was non-null, another thread is
// in the middle of enqueuing a new node, so the queue becomes non-empty
else {
return false;
}
}
}
if (Succeeded) {
// try and update the tail field to point to our node; don't
// worry if we can't, another thread will update it for us on
// the next call to Enqueue()
SyncMethods.CAS<SingleLinkNode<T>>(ref tail, oldHead, node);
}
return Succeeded;
}
Well, Enqueue-If-Not-Empty appears to be relatively straightforward, but with a limitation: other threads may concurrently remove all previous items from the queue, so that after insertion at the tail is done, the new item might happen to be the first in the queue. Since atomic compare-and-swap operations are done with different fields (enqueuing changes tail.Next while dequeuing advances head), stronger guarantees would require additional complexity not only in this function but at least in Dequeue() as well.
The following changes to the normal Enqueue() method are sufficient:
1) At the start of the function, check whether head.Next is null, and if so, return immediately, since the queue is empty.
2) Add head.Next != null to the loop condition, in case enqueuing attempts should stop if the initially non-empty queue becomes empty before the insertion succeeds. This does not prevent the situation I described above (there is still a time window between the emptiness check and the node insertion), but it reduces the chance of it happening.
3) at the end of the function, only try advancing the tail if the new node was successfully enqueued (like I did in the Enqueue-If-Empty answer).
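Since the question's queue is in C, here is a C11 <stdatomic.h> sketch of those three changes applied to a Michael-Scott style enqueue like the one in the referenced article. Assumptions: the queue keeps a dummy node (head->next == NULL means empty), nodes are never freed by this code (safe memory reclamation and ABA protection, such as hazard pointers, are out of scope), and ms_queue, ms_node, ms_enqueue and ms_enqueue_if_not_empty are hypothetical names. As described above, it only guarantees that the queue looked non-empty shortly before the insertion, not at the instant of linking.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct ms_node {
    int item;
    _Atomic(struct ms_node *) next;
} ms_node;

typedef struct {
    _Atomic(ms_node *) head;  /* dummy node; head->next is the first item */
    _Atomic(ms_node *) tail;  /* last node, or one that lags behind it */
} ms_queue;

static ms_node *ms_node_new(int item) {
    ms_node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->item = item;
        atomic_init(&n->next, NULL);
    }
    return n;
}

bool ms_init(ms_queue *q) {
    ms_node *dummy = ms_node_new(0);
    if (dummy == NULL)
        return false;
    atomic_init(&q->head, dummy);
    atomic_init(&q->tail, dummy);
    return true;
}

/* Empty means the dummy node has no successor. */
static bool ms_empty(ms_queue *q) {
    ms_node *head = atomic_load(&q->head);
    return atomic_load(&head->next) == NULL;
}

static bool enqueue_impl(ms_queue *q, int item, bool only_if_not_empty) {
    /* change 1: return immediately if the queue is empty */
    if (only_if_not_empty && ms_empty(q))
        return false;
    ms_node *node = ms_node_new(item);
    if (node == NULL)
        return false;
    ms_node *tail = NULL;
    bool linked = false;
    /* change 2: give up if the queue becomes empty before we link in */
    while (!linked && !(only_if_not_empty && ms_empty(q))) {
        tail = atomic_load(&q->tail);
        ms_node *next = atomic_load(&tail->next);
        if (tail != atomic_load(&q->tail))
            continue;  /* tail moved under us; retry */
        if (next != NULL) {  /* tail lags: help advance it, then retry */
            atomic_compare_exchange_strong(&q->tail, &tail, next);
            continue;
        }
        ms_node *expected = NULL;
        linked = atomic_compare_exchange_strong(&tail->next, &expected, node);
    }
    if (!linked) {
        free(node);  /* never published, so it is safe to free */
        return false;
    }
    /* change 3: only advance tail after a successful link */
    atomic_compare_exchange_strong(&q->tail, &tail, node);
    return true;
}

bool ms_enqueue(ms_queue *q, int item) {
    return enqueue_impl(q, item, false);
}

bool ms_enqueue_if_not_empty(ms_queue *q, int item) {
    return enqueue_impl(q, item, true);
}
```

Sharing one loop between the normal and conditional variants keeps the helping logic (advancing a lagging tail) identical in both paths.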
I have a list in C that is something like this:
typedef struct _node
{
int number;
DWORD threadID;
HANDLE threadH;
struct _node *next;
} *node;
And I have something like this:
node new_node = malloc(sizeof(node));
As you may have guessed, this list stores information about threads, including their handles and IDs. Still, I am having trouble when I try to do this:
free(new_node);
Every time I try this I get an unexpected error, with VS reporting data corruption. I've narrowed it down as much as possible, and the problem appears when I try to free the handle.
I've searched MSDN for how to do this, but the only thing I can find is the function that closes the thread (which is not what I want, since I want the thread to keep running; I just want to delete its record from the list).
The question is: how am I supposed to free a handle from memory? (Considering that this is only a copy of the handle's value, the active handle itself is not being deleted.)
EDIT: This is the function that inserts nodes into the list:
int insereVisitanteLista(node* lista, DWORD threadID, HANDLE threadH, int num_visitante)
{
node visitanteAnterior;
node novoVisitante = (node)malloc(sizeof(node));
if(novoVisitante == NULL)
return 0;
novoVisitante->threadID = threadID;
novoVisitante->threadH = threadH;
novoVisitante->number = num_visitante;
novoVisitante->next = NULL;
if(*lista == NULL)
{
*lista = novoVisitante;
return 1;
}
visitanteAnterior = *lista;
while(visitanteAnterior->next != NULL)
visitanteAnterior = visitanteAnterior->next;
visitanteAnterior->next =novoVisitante;
return 1;
}
And this is the function to delete nodes:
int removeVisitanteLista(node * lista, DWORD threadID)
{
node visitanteAnterior = NULL, visitanteActual;
if(*lista == NULL)
return 0;
visitanteActual = *lista;
if((*lista)->threadID == threadID)
{
*lista = visitanteActual->next;
visitanteActual->next = NULL;
free(visitanteActual);
return 1;
}
while(visitanteActual != NULL && visitanteActual->threadID != threadID)
{
visitanteAnterior = visitanteActual;
visitanteActual = visitanteActual->next;
}
if (visitanteActual == NULL)
return 0;
visitanteAnterior->next = visitanteActual->next;
free(visitanteActual);
return 1;
}
What exactly is the node you are trying to free? Is it a pointer to a struct _node? If yes, did you allocate it previously? If not, free is not needed; otherwise, check that it is not NULL and make sure you do not free it multiple times. It is hard to guess what you are doing and where the error is without a minimal working example that reproduces the problem. The only thing I can suggest is to read about memory management in C. This resource might help.
UPDATE:
node in your code is a pointer to struct _node, so sizeof(node) is the size of a pointer, which is either 4 or 8 bytes depending on the architecture. So you allocate, for example, 8 bytes but treat the result as a pointer to a much larger structure. As a result you corrupt memory, and the behavior of the program becomes undefined. Changing node novoVisitante = (node)malloc(sizeof(node)) to node novoVisitante = malloc(sizeof(struct _node)) (or malloc(sizeof *novoVisitante)) should fix the problem; the same applies to new_node in the question. Note that sizeof(_node) on its own does not compile in C, because _node is a struct tag, not a type name.
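A minimal sketch of the corrected allocation. Because <windows.h> is not portable, DWORD_T and HANDLE_T are hypothetical stand-ins for DWORD and HANDLE, and new_visitor() is an illustrative helper. The key line is malloc(sizeof *n), which sizes the allocation from the pointed-to struct rather than from the pointer type. Freeing the record with free() only releases the list node; it does not affect the thread or its handle (closing the handle is a separate call).

```c
#include <stdlib.h>

/* Portable stand-ins for the Windows types (assumption: the real code
 * uses DWORD and HANDLE from <windows.h>). */
typedef unsigned long DWORD_T;
typedef void *HANDLE_T;

typedef struct _node {
    int number;
    DWORD_T threadID;
    HANDLE_T threadH;
    struct _node *next;
} *node;  /* node is a POINTER type, so sizeof(node) is a pointer's size */

/* Allocates a whole struct, not just a pointer's worth of bytes. */
node new_visitor(int number, DWORD_T id, HANDLE_T h) {
    node n = malloc(sizeof *n);  /* sizeof *n == sizeof(struct _node) */
    if (n == NULL)
        return NULL;
    n->number = number;
    n->threadID = id;
    n->threadH = h;  /* copies the handle value; the thread keeps running */
    n->next = NULL;
    return n;
}
```

Writing sizeof *n instead of naming a type also keeps the allocation correct if the pointer's type is changed later.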
You haven't shown us the context of your call to free() so I need to speculate a little but my first concern is that you didn't mention removing the node from the list before deleting it.
Start by unlinking the node by modifying the next field of the previous (or head) node. If you still get the error, then you have corrupted memory somehow by writing past the end of one of your allocated memory structures or something similar.
Also, I assume node is a pointer. You really haven't provided much information about what you're doing.
How to traverse each node of a tree efficiently without recursion in C (no C++)?
Suppose I have the following node structure of that tree:
struct Node
{
struct Node* next; /* sibling node linked list */
struct Node* parent; /* parent of current node */
struct Node* child; /* first child node */
};
It's not homework.
I prefer depth first.
I prefer no additional data struct needed (such as stack).
I prefer the most efficient way in term of speed (not space).
You can change or add the member of Node struct to store additional information.
If you don't want to have to store anything, and are OK with a depth-first search:
bool process = true; /* needs <stdbool.h>; false while climbing back up */
while (pNode != NULL) {
if (process) {
/* visit pNode here */
}
if (pNode->child != NULL && process) {
pNode = pNode->child;
process = true;
} else if (pNode->next != NULL) {
pNode = pNode->next;
process = true;
} else {
pNode = pNode->parent;
process = false;
}
}
This traverses the tree; process keeps it from re-processing (and re-descending into) parent nodes when it travels back up.
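A self-contained version of that loop, as a sketch: it is wrapped in a function that takes a visitor callback, and it adds one assumption beyond the original, namely that the walk stops when it climbs back to the starting node, so it never wanders into the start node's siblings or ancestors. The id field, the record_id() visitor and MAX_VISITS are hypothetical, added only to show the visit order.

```c
#include <stdbool.h>
#include <stddef.h>

struct Node {
    struct Node* next;    /* sibling node linked list */
    struct Node* parent;  /* parent of current node */
    struct Node* child;   /* first child node */
    int id;               /* hypothetical payload for the example */
};

/* Pre-order, depth-first walk with no stack and no recursion. */
void traverse_preorder(struct Node* root, void (*visit)(struct Node*)) {
    struct Node* pNode = root;
    bool process = true;  /* false while climbing back up to a visited parent */
    while (pNode != NULL) {
        if (process)
            visit(pNode);
        if (process && pNode->child != NULL) {
            pNode = pNode->child;   /* descend first */
        } else if (pNode != root && pNode->next != NULL) {
            pNode = pNode->next;    /* then move to the next sibling */
            process = true;
        } else if (pNode != root) {
            pNode = pNode->parent;  /* subtree finished: climb */
            process = false;
        } else {
            break;                  /* back at the start node: done */
        }
    }
}

/* Example visitor: records ids in visit order. */
#define MAX_VISITS 32
static int visited_ids[MAX_VISITS];
static size_t visited_count;
static void record_id(struct Node* n) {
    if (visited_count < MAX_VISITS)
        visited_ids[visited_count++] = n->id;
}
```

For a root 1 with children 2 and 3, where 2 has children 4 and 5, the visit order is 1, 2, 4, 5, 3. Each edge is crossed at most twice (down and back up), and no extra memory is needed beyond the parent pointers.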
Generally you'll use your own stack data structure that stores a list of nodes (or a queue if you want a level-order traversal).
You start by pushing the starting node onto the stack. Then you enter your main loop, which continues until the stack is empty. After you pop each node from the stack, you push its next and child nodes if they are not NULL.
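A sketch of this stack-based approach in C, assuming a growable array as the stack (the initial capacity of 16 is arbitrary). Pushing the sibling before the child makes the child come off the stack first, which gives a pre-order depth-first walk. The start node's own next is pushed too, so starting from a node with siblings also walks those siblings. The id field and the record_id() visitor are hypothetical, added only to show the visit order.

```c
#include <stdbool.h>
#include <stdlib.h>

struct Node {
    struct Node* next;    /* sibling node linked list */
    struct Node* parent;  /* parent of current node (unused here) */
    struct Node* child;   /* first child node */
    int id;               /* hypothetical payload for the example */
};

/* Pre-order walk using an explicit, growable stack.
 * Returns false if the stack could not be allocated or grown. */
bool traverse_with_stack(struct Node* start, void (*visit)(struct Node*)) {
    size_t cap = 16, top = 0;
    struct Node** stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return false;
    if (start != NULL)
        stack[top++] = start;
    while (top > 0) {
        struct Node* n = stack[--top];
        visit(n);
        if (top + 2 > cap) {  /* make room for up to two pushes */
            struct Node** bigger = realloc(stack, 2 * cap * sizeof *stack);
            if (bigger == NULL) {
                free(stack);
                return false;
            }
            stack = bigger;
            cap *= 2;
        }
        /* push the sibling first so the child is popped (visited) first */
        if (n->next != NULL)
            stack[top++] = n->next;
        if (n->child != NULL)
            stack[top++] = n->child;
    }
    free(stack);
    return true;
}

/* Example visitor: records ids in visit order. */
#define MAX_VISITS 32
static int visited_ids[MAX_VISITS];
static size_t visited_count;
static void record_id(struct Node* n) {
    if (visited_count < MAX_VISITS)
        visited_ids[visited_count++] = n->id;
}
```

For a root 1 with children 2 and 3, where 2 has children 4 and 5, this also visits 1, 2, 4, 5, 3. Unlike the parent-pointer loop, it does not need parent links, at the cost of memory proportional to the tree's width along the current path.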
This looks like an exercise I did in Engineering school 25 years ago.
I think this is called the tree-envelope algorithm, since it plots the envelope of the tree.
I can't believe it is that simple. I must have made an obvious mistake somewhere.
Regardless of any mistakes, I believe the enveloping strategy is correct.
If code is erroneous, just treat it as pseudo-code.
while current node exists{
go down all the way until a leaf is reached;
set current node = leaf node;
visit the node (do whatever needs to be done with the node);
get the next sibling to the current node;
if no node next to the current{
ascend the parentage trail until a higher parent has a next sibling;
}
set current node = found sibling node;
}
The code:
typedef struct Node Node;
Node* getNextParent(Node* node);

void traverse(Node* node){
while (node != NULL) {
/* descend to the first leaf below node */
while (node->child != NULL) {
node = node->child;
}
visit(node); /* note: only leaf nodes reach visit() here */
node = getNextParent(node);
}
}

/* ascend until reaching a non-null uncle or
 * grand-uncle or ... grand-grand...uncle
 */
Node* getNextParent(Node* node){
/* See if a next node exists.
 * Otherwise, find an ancestor that has a next node.
 */
while (node->next == NULL) {
node = node->parent;
/* a NULL parent means the traversal is complete */
if (node == NULL)
return NULL; /* returning here avoids dereferencing NULL below */
}
return node->next;
}
You can use the Pointer Reversal method. The downside is that you need to save some information inside the node, so it can't be used on a const data structure.
You'd have to store it in an iterable list; a basic list with indexes will work. Then you just go from 0 to the end, looking at the data.
If you want to avoid recursion you need to hold onto a reference of each object within the tree.