In spite of there being so many efficient data structures, why are linked lists used so heavily in systems programming? Is it because they make minimal use of the heap, or because they lead to less buggy code?
Regards,
Pwn
A linked list is a very efficient data structure for something like a queue or a stack, where you want to add at the end or remove from the beginning. Systems programming deals a lot with queues and stacks.
I'm guessing because it's just extremely simple to implement and understand. And it has an extremely small overhead.
The kernel has access to copious amounts of memory (it's in charge of it, after all) and mainly has to control lists of things (rather than associative structures that connect one thing with another).
So it's just a good match. There is rarely any need to complicate it further.
Linked lists are quite efficient for many systems-level tasks:
Queues (for scheduling)
Collections where your main operation is to visit every single thing in the collection, e.g., gather information about every active process
Memory blocks (for first-fit allocation)
Collections where you add or remove one element at a time
It's quite rare in systems programming to have a collection where you have to look things up by key, or to have to take the union of two sets. Except of course in filesystems, where you will find that more sophisticated data structures are used.
Is it because it allows least usage of heap/less buggy code?
No. In many cases, a linked list is the most appropriate data structure for the job.
Linked lists provide an easy implementation for several important abstract data structures, including stacks, queues, hash tables, symbolic expressions, and skip lists.
It's partly because efficient data structures tend to have overheads that are relatively large when dealing with small numbers of entries, and partly because a lot of OS data structures are FIFOs, which linked lists are good for.
Usually it is because you just need a list of stuff: something you can easily add to or remove from at either end, insert or remove at a node pointer you already have, and iterate over.
There's plenty of areas where you don't need random access or search capabilities - or the space you need to search is so small that a linked list is anyway faster than more "fancy" data structures.
Sometimes, though, it's also because linked lists are easy to implement.
There's no good answer, because you're making wrong assumptions. It's not true that only linked lists are used heavily. They are used when many things need to be traversed in sequence, like a free-memory-segment list, queue-like structures, and so on.
There are many places where you need exactly such structure. There are other places where you don't need it.
Linked lists can easily be tied in with other data structures (hash tables, for example). They can also be converted to arrays easily, and there are different ways of implementing them, for example singly or doubly linked.
Related
I'm working on a self-resizing queue in C using arrays with structs and int pointers.
I wanted to be able to dynamically resize it (double it as needed) using realloc and of course still preserve the priority or "queueness". I'm struggling to do that last part cleanly. I've got it working as a queue of static size, which I know is the most common use of circular buffers, and I can realloc that block. I just can't find the best way to put humpty dumpty back together. I know I can implement a queue easily as a linked list. I just wanted to make a good attempt for learning purposes but I don't see many people asking this particular question.
There was one person who asked about it here with respect to Java, but I don't know that language. So anyway, my question is: is there any point or advantage to making queues with arrays if you know you'll have to resize them later? It seems overly complicated and wasteful regarding time and space complexity, so do people avoid them in the field and just go with linked lists or other methods?
Edit: When I say priority above, I'm not referring to priority queues, but general queue FIFO ordering.
I'm reading about read-copy-update (RCU). I'm not sure I understand it correctly in the SMP case. As far as I know, RCU ensures that updates are executed atomically. In the case of a singly linked list, for example, it is obvious that exchanging an old element for a new one can be done in one operation, because it is done by changing a single pointer. But how is the update kept atomic in the case of a doubly linked list? Two pointers point to a given element (next and prev), so every change to that element needs to change both pointers.
How to ensure that changing those two pointers will be done as atomic operation?
How it is done in Linux?
I was asking myself the same question, and a quick search turned up a reply to a comment on an introductory article on RCU by Paul McKenney (who, from what I gather, is one of the multiple concurrent inventors of the ideas behind RCU).
The question:
I'm wondering whether the omission of the backlinks in the examples is a
good thing. The omission makes the technique trivial, since publishing
only involves replacing one pointer.
What about the second, back, one? Without support for atomic two-pointer
updates, how can both the p->prev->next = q and p->next->prev = q updates
be performed without risking clients to see an inconsistent view of the
doubly linked list? Or is that not a problem in practice?
Thanks for the article, though. Looking forward to the next installment!
The answer:
Glad you liked the article, and thank you for the excellent question! I could give any number of answers, including:
In production systems, trivial techniques are a very good thing.
Show me an example where it is useful to traverse the ->prev pointers under RCU protection. Given several such examples, we could work out how best to support this.
Consistency is grossly overrated. (Not everyone agrees with me on this, though!)
Even with atomic two-pointer updates, consider the following sequence of events: (1) task 1 does p=p->next (2) task 2 inserts a new element between the two that task 1 just dealt with (3) task 1 does p=p->prev and fails to end up where it started! Even double-pointer atomic update fails to banish inconsistency! ;-)
If you need consistency, use locks.
Given the example above, we could support a level of consistency equivalent to the double-pointer atomic update simply by assigning the pointers in sequence -- just remove the prev-pointer poisoning from list_del_rcu(), for example. But doing this would sacrifice the ability to catch those bugs that pointer-poisoning currently catches.
So, there might well come a time when the Linux kernel permits RCU-protected traversal of linked lists in both directions, but we need to see a compelling need for this before implementing it.
So basically, Linux "disallows" backwards (->prev) traversal of RCU-protected doubly linked lists. As mentioned in the comment, you could use newer hardware mechanisms like double compare-and-swap, but they are not available everywhere, and as noted above, you can still observe inconsistencies even then.
I need a simple in-process (LRU) cache. I found memcached, which looks great, but there does not seem to be an easy way to host it in-process. I don't need a distributed cache, just a simple key/value store with some kind of LRU behaviour and a decent allocator to limit fragmentation, since the entry size varies a lot (a few bytes to a few kilobytes). Surely there must be an existing implementation of such a thing? It should be C or C++.
I hate to answer this way, but it would be fairly simple to implement yourself.
Allocator. Use malloc and free. They do work, and they work well. This also makes it easier to interface with the rest of your program.
A mutex protecting a hash table, tree, or trie, plus a linked list to track LRU order. Don't try to do fancy lockless stuff.
It should weigh in at less than a couple hundred lines; you could knock it out in a good solid day.
I've had success using commoncache but the project doesn't appear to have any activity and issues raised (with patches) by my colleague are still unaddressed...
I want to sort on the order of four million long longs in C. Normally I would just malloc() a buffer to use as an array and call qsort() but four million * 8 bytes is one huge chunk of contiguous memory.
What's the easiest way to do this? I rate ease over pure speed for this. I'd prefer not to use any libraries and the result will need to run on a modest netbook under both Windows and Linux.
Just allocate a buffer and call qsort. 32MB isn't so very big these days even on a modest netbook.
If you really must split it up: sort smaller chunks, write them to files, and merge them (a merge takes a single linear pass over each of the things being merged). But, really, don't. Just sort it.
(There's a good discussion of the sort-and-merge approach in volume 2 of Knuth, where it's called "external sorting". When Knuth was writing that, the external data would have been on magnetic tape, but the principles aren't very different with discs: you still want your I/O to be as sequential as possible. The tradeoffs are a bit different with SSDs.)
32 MB? That's not too big... quicksort should do the trick.
Your best option would be to avoid ending up with the data unordered in the first place, if possible. As has been mentioned, you'd be better off reading the data from disk (or the network, or whatever the source) directly into a self-organizing container (a tree; perhaps std::set will do).
That way, you'll never have to sort the whole lot or worry about memory management. If you know the required capacity up front, you might squeeze out additional performance by constructing the std::vector with an initial capacity or calling vector::reserve.
You'd then be best advised to use std::make_heap to heapify any existing elements, and then add elements one by one using push_heap (see also pop_heap). This is essentially the same paradigm as the self-ordering set, but:
duplicates are ok
the storage is 'optimized' as a flat array (which is perfect for e.g. shared memory maps or memory mapped files)
(Oh, minor detail: note that sort_heap takes at most 2N log N comparisons, where N is the number of elements.)
Let me know if you think this is an interesting approach. I'd really need a bit more info on the use case.
I am using C NOT C++!
I know the C++ collections, but I was wondering if Microsoft has a C based List structure of some type, like the linux kernel provides, that I can use in a user mode project?
I would prefer not rolling my own.
The only thing in the Windows API is the interlocked singly linked list, which is used via InterlockedPushEntrySList and InterlockedPopEntrySList.
For device drivers, there is LIST_ENTRY, but I am not sure if this can be pulled into user-mode.
Many algorithms books and websites contain implementations of linked lists that can easily be ported to C. Rolling your own is not too difficult.
Reusable collections are tough in C; the language just doesn't have the flexibility or metadata for it. (How do you know when you are overflowing an array-backed list and need to reallocate? How will reallocation work if the rest of the code is using a custom allocator?)
You can do it (you CAN do anything in C), but it gets abstract really fast.
On the other hand, creating a linked list in C yourself is downright fun. Arrays are already there, hash tables are annoying but not impossible, trees are fun, ...
Also, people who think in C tend to be constantly optimizing. Hiding every linked-list operation behind a function call instead of just writing p = p->next would probably disgust many of them (rightfully so).