I am working on Embedded C, Task related implementation in OS. I have implemented the Linked List. Now it needs to minimize the use of pointers to satisfy MISRA C, in my present implementation I am searching for the best alternative for the Linked List, in Embedded OS for task operation.
It'd be easy to use a static array of structures to completely avoid pointers (you'd just use array indexes and not pointers). This has both advantages and disadvantages.
The disadvantages are:
you have to implement your own allocator (to allocate and free "array elements" within the static array)
the memory used for the array can't be used for any other purpose when it's not being used for the linked list
you have to determine a "max. number of elements that could possibly be needed"
it has all the same problems as pointers. E.g. you can access an array element that was freed, free the same array element multiple times, use an index that's out of bounds (including the equivalent of NULL if you decide to do something like use -1 to represent NULL_ELEMENT), etc.
The advantages are:
by implementing your own allocator you can avoid the mistakes caused by malloc(), including (e.g.) checking something isn't already free when freeing it and returning an error instead of trashing your own metadata
allocation can typically be simpler/faster, because you're only allocating/freeing one "thing" (array element) at a time and don't need to worry about allocating/freeing a variable number of contiguous "things" (bytes) at a time
entries in your list are more likely to be closer (in memory) to each other (unlike for malloc() where your entries are scattered among everything else you allocate), and this can improve performance (cache locality)
you have a "max. number of elements that could possibly be needed" to make it far easier to track down problems like (e.g.) memory leaks; and (where memory is limited) make it easier to determine things like worst case memory footprint
it satisfies pointless requirements (like "no pointers") despite not avoiding anything these requirements are intended to avoid
Now it needs to minimize the use of pointers to satisfy MISRA C
I used to work with some embedded engineers. They built low-end (and high-end) routers and gateways. Rather than dynamically allocating memory, they used fixed buffers provisioned at boot. They then tracked indexes into the array of provisioned buffers.
Static arrays and indexes begs for a Cursor data structure. Your first search hit is Cursor Implementation of Linked Lists from
Data Structures and Algorithm Analysis in C++, 2nd ed. by Mark Weiss. (I actually used that book in college years ago).
Related
I was recently in an interview that required me to choose over the two data structures for a problem, and now I have the question of:
What is the reasoning for using a Stack over an array if the only operations needed are push and pop? An array provides constant time for appending and popping the last element from it and it takes up generally less memory than implementing a Stack with a LinkedList. It also provides random access should it be required. Is the only reasoning because an array is typically of fixed size, so we need to dynamically resize the array for each element we put in? This still is in constant time though isn't it unless the penalty is disproportionate?
There are several aspects to consider here...
First, a Stack is an abstract data type. It doesn't define how to implement itself.
An array is (generally) a well defined concrete implementation, and might even be fixed size unless explicitly defined to be dynamic.
A dynamic array can be implemented such that it automatically grows by some factor when exhausted and also might shrink when fill rate drops. These operations are not constant time, but are actually amortized to constant time because the array doesn't grow or shrink in each operation. In terms of memory usage it's hard to imagine an array being more expensive then a linked list unless extremely under used.
The main problem with an array is large allocation size. This is both a problem of maximum limitation and memory fragmentation. Using a linked list avoids both issues because every entry has a small memory footprint.
In some languages like C++, the underlying container that the 'stack' class uses can actually be changed between a dynamic array (vector), linked list (list), or even a double ended queue (deque). I only mention this because its typically not fair to compare a stack vs an array (one is an interface, another is a data structure).
Most dynamic array implementations will allocate more space than is needed, and upon filling the array they will again resize to 2x the size and so on. This avoids allocations and keeps the performance of push generally constant time. However the occasional resize does require copying elements O(n), though this is usually said to amortized to constant time. So in general, you are correct in that this is efficient.
Linked lists on the other hand typically require allocations for every push, which can be somewhat expensive, and the node's they create are larger in size than a single element in the array.
One possible advantage of linked lists, however, is that they do not require contiguous memory. If you have many many elements, its possible that you can fail to allocate a large enough block of memory for an array. Having said that, linked lists take up more memory... so its a bit of a wash.
In C++ for example, the stack by default uses the deque container. The deque is typically implemented as a dynamic array of 'pages' of memory. Each page of memory is fixed in size, which allows the container to actually have random access properties. Moreover, since each page is separate, then the entire container does not require contiguous memory meaning that it can store many many elements. Resizing is also cheap for a deque because it simply allocates another page, making it a great choice for a large stack.
I was reviewing an interview question and comparing notes with a friend, and we have different ideas on one with respect to CPU caching.
Consider a large set of data, such as a large array of double, ie:
double data[1024];
Consider using a dynamically allocated on-the-fly linked list to store the same number of elements. The question asked for a description of a number of trade-offs:
Which allows for quicker random access: we both agreed the array was quicker, since you didn't have to traverse the list in a linear fashion (O(n)), just provide an index (O(1)).
Which is quicker for comparing two lists of the same length: we both decided that if it was just primitive data types, the array would allow for a memcmp(), while the linked list required element-wise comparison plus dereferencing overhead.
Which allowed for more efficient caching if you were accessing the same element several times?
In point 3, this is where our opinions differed. I contended that that the CPU is going to try and cache the entire array, and if the array is obscenely large, it can't be stored in cache, and therefore there will be no caching benefit. With the linked list, individual elements can be cached. Therefore, linked lists lend themselves to cache "hits" more than static arrays do when dealing with a very large number of elements.
To the question: Which of the two is better for cache "hits", and can modern systems cache part of an array, or do they need the whole array or it won't try? Any sort of references to technical documents or standards I could also use to provide a definitive answer would help a lot.
Thanks!
The CPU doesn't know about your data structures. It caches more-or-less raw blocks of memory. Therefore, if you suppose you can access the same one element multiple times without traversing the list each time, then neither linked list nor array has a caching advantage over the other.
HOWEVER, arrays have a big advantage over dynamically-allocated linked lists for accessing multiple elements in sequence. Because CPU caches operate on blocks of memory rather larger than one double, when one array element is in the cache, it is likely that several others that reside at adjacent addresses are in the cache, too. Thus one (slow) read from main memory gives access to (fast) cached access to multiple adjacent array elements. The same is not true of linked lists, as nodes may be allocated anywhere in memory, and even a single node has at minimum the overhead of a next pointer to dilute the number of data elements that may be cached at the same time.
Caches don't know about arrays, they just see memory accesses and store a little bit of the memory near that address. Once you've accessed something at an address it should stick around in the cache a while, regardless of whether that address belongs to an array or a linked list. But the cache controllers don't really know what's being accessed.
When you traverse an array, the cache system may pre-fetch the next bit of an array. This is usually heuristically driven (maybe with some compiler hints).
Some hardware and toolchains offer intrinsics that let you control cache residency (through pre-fetches, explicit flushes and so forth). Normally you don't need this kind of control, but for things like DSP code, resource-constrained game consoles and OS-level stuff that needs to worry about cache coherency it's pretty common to see people use this functionality.
I am making a level editor for a simple game in lua and the tiles are represented by integers in a 2d array, when I read the level description in from a file, it may so happen that this 2d array is sparsely populated, how does lua manage memory ? will it kep those holes in the array or will it be smart about it and not waste any space?
The question itself is irrelevant in a practical sense. You have one of two cases:
Your tilemaps are reasonably small.
Your tilemaps are big enough such that compression is important for fitting in memory constraints.
If #1 is the case, then you shouldn't care. It doesn't matter how memory efficient Lua is or isn't, because your tilemaps aren't big enough for it to ever matter.
If #2 is the case, then you shouldn't care either. Why? Because if fitting in memory is important to you, and you're likely to run out, then you shouldn't leave it to the vagaries of how Lua happens to manage the memory for arrays.
If memory is important, you should build a specialized data structure that Lua can use, but is written in C. That way, you can have explicit control over memory management; your tilemaps will therefore take up as much or as little memory as you choose for them to.
As for the actual question, it rather depends on how you build your "array". Lua tables are associative arrays by nature, but their implementation is split between an "array part" and a "table part". In general though, if you store elements sparsely, then the elements will be sparsely stored in memory (for some definition of "sparse"). As long as you don't do something silly like:
for i = 1, max_table_size do
my_tilemap[i] = 0
end
Then again, you may want to do that for performance reasons. This ensures that you have a big array rather than a sparse table. Since the array elements are references rather than values, they only take up maybe 16 bytes per element. Once you decide to put something real in an entry (an actual tile), you can. Indexing into the array would be fast in this case, though since the table part is a hash-table, it's not exactly slow.
Many times, stack is implemented as a linked list, Is array representation not good enough, in array we can perform push pop easily, and linked list over array complicates the code, and has no advantage over array implementation.
Can you give any example where a linked list implementation is more beneficial, or we cant do without it.
I would say that many practical implementations of stacks are written using arrays. For example, the .NET Stack implementation uses an array as a backing store.
Arrays are typically more efficient because you can keep the stack nodes all nearby in contiguous memory that can fit nicely in your fast cache lines on the processor.
I imagine you see textbook implementations of stacks that use linked lists because they're easier to write and don't force you to write a little bit of extra code to manage the backing array store as well as come up with a growth/copy/reserve space heuristic.
In addition, if you're really pressed to use little memory, a linked list implementation might make sense since you don't "waste" space that's not currently used. However, on modern processors with plenty of memory, it's typically better to use arrays to gain the cache advantages they offer rather than worry about page faults with the linked list approach.
Size of array is limited and predefined. When you dont know how many of them are there then linked list is a perfect option.
More Elaborated comparison:-(+ for dominating linked list and - for array)
Size and type constraint:-
(+) Further members of array are aligned at equal distance and need contiguous memory while on the other side link list can provide non contiguous memory solution, so sometimes it is good for memory as well in case of huge data(avoids cpu polling for resource).
(+) Suppose in a case you are using an array as stack, and the array is of type int.Now how will you accommodate a double in it??
Portability
(+) Array can cause exceptions like index out of bound exceptions but you can increase the chain anytime in a linked list.
Speed and performance
(-)If its about performance, then obviously most of the complexity fall around O(1) for arrays.In case of a linked list you will have to select a starting node to start the tracing and this adds to performance penalty.
When the size of the stack can vary greatly you waste space if you have generalized routines which always allocate a huge array.
Obviously a fixed size array has limitation of knowing maximum size before hand.
If you consider dynamic array then Linked List vs. Arrays covers the details including complexities for performing operations.
Stack is implemented using Linked List because Push and Pop operations are of O(1) time complexities, compared to O(n) for arrays. (apart from flexible size advantage in Linked List)
While there are lots of different sophisticated implementations of malloc / free for C/C++, I'm looking for a really simple and (especially) small one that works on a fixed-size buffer and supports realloc. Thread-safety etc. are not needed and my objects are small and do not vary much in size. Is there any implementation that you could recommend?
EDIT:
I'll use that implementation for a communication buffer at the receiver to transport objects with variable size (unknown to the receiver). The allocated objects won't live long, but there are possibly several objects used at the same time.
As everyone seems to recommend the standard malloc, I should perhaps reformulate my question. What I need is the "simplest" implementation of malloc on top of a buffer that I can start to optimize for my own needs. Perhaps the original question was unclear because I'm not looking for an optimized malloc, only for a simple one. I don't want to start with a glibc-malloc and extend it, but with a light-weight one.
Kerninghan & Ritchie seem to have provided a small malloc / free in their C book - that's exactly what I was looking for (reimplementation found here). I'll only add a simple realloc.
I'd still be glad about suggestions for other implementations that are as simple and concise as this one (for example, using doubly-linked lists).
I recommend the one that came with standard library bundled with your compiler.
One should also note there is no legal way to redefine malloc/free
The malloc/free/realloc that come with your compiler are almost certainly better than some functions you're going to plug in.
It is possible to improve things for fixed-size objects, but that usually doesn't involve trying to replace the malloc but rather supplementing it with memory pools. Typically, you would use malloc to get a large chunk of memory that you can divide into discrete blocks of the appropriate size, and manage those blocks.
It sounds to me that you are looking for a memory pool. The Apache Runtime library has a pretty good one, and it is cross-platform too.
It may not be entirely light-weight, but the source is open and you can modify it.
There's a relatively simple memory pool implementation in CCAN:
http://ccodearchive.net/info/antithread/alloc.html
This looks like fits your bill. Sure, alloc.c is 1230 lines, but a good chunk of that is test code and list manipulation. It's a bit more complex than the code you implemented, but decent memory allocation is complicated.
I would generally not reinvent the wheel with allocation functions unless my memory-usage pattern either is not supported by malloc/etc. or memory can be partitioned into one or more pre-allocated zones, each containing one or two LIFO heaps (freeing any object releases all objects in the same heap that were allocated after it). In a common version of the latter scenario, the only time anything is freed, everything is freed; in such a case, malloc() may be usefully rewritten as:
char *malloc_ptr;
void *malloc(int size)
{
void *ret;
ret = (void*)malloc_ptr;
malloc_ptr += size;
return ret;
}
Zero bytes of overhead per allocated object. An example of a scenario where a custom memory manager was used for a scenario where malloc() was insufficient was an application where variable-length test records produced variable-length result records (which could be longer or shorter); the application needed to support fetching results and adding more tests mid-batch. Tests were stored at increasing addresses starting at the bottom of the buffer, while results were stored at decreasing addresses starting at the top. As a background task, tests after the current one would be copied to the start of the buffer (since there was only one pointer that was used to read tests for processing, the copy logic would update that pointer as required). Had the application used malloc/free, it's possible that the interleaving of allocations for tests and results could have fragmented memory, but with the system used there was no such risk.
Echoing advice to measure first and only specialize if performance sucks - should be easy to abstract your malloc/free/reallocs such that replacement is straightforward.
Given the specialized platform I can't comment on effectiveness of the runtimes. If you do investigate your own then object pooling (see other answers) or small object allocation a la Loki or this is worth a look. The second link has some interesting commentary on the issue as well.