Stack Implementations in C

I’m stuck trying to figure out what, exactly, are the definitions of the following stack implementations and the advantages and disadvantages associated with each.
1. Array-based implementation
2. Linked implementation
3. Blocked implementation
Any help would be much appreciated.

A user of a stack implementation expects a dynamic data structure. C does not provide that directly in language terms; you have to allocate and manage the memory yourself.
Allocating an array, i.e. a statically sized structure, is very easy. However, you then have the problem that the maximum number of entries is limited. If you make the array too small, this will cause errors; if you make it very big, you are wasting memory.
A solution for this is to dynamically reallocate the array when the number of entries exceeds its size. But this may move the whole contents to another place in memory, which has some disadvantages (i.e. you have to copy the whole contents, and you cannot hold a pointer to a specific entry across a reallocation).
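A minimal sketch of this array-based approach with doubling growth (the int_stack type and its function names are illustrative, not part of any standard API):

#include <stdlib.h>

typedef struct {
    int *data;        /* contiguous storage for the entries */
    size_t size;      /* number of entries currently on the stack */
    size_t capacity;  /* number of entries the allocation can hold */
} int_stack;

int int_stack_push(int_stack *s, int value) {
    if (s->size == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 16;
        int *p = realloc(s->data, new_cap * sizeof *p);
        if (p == NULL) return -1;   /* old block stays valid on failure */
        s->data = p;                /* p may point somewhere new in memory */
        s->capacity = new_cap;
    }
    s->data[s->size++] = value;
    return 0;
}

int int_stack_pop(int_stack *s, int *out) {
    if (s->size == 0) return -1;    /* stack is empty */
    *out = s->data[--s->size];
    return 0;
}

Initialize a stack with all fields zeroed (e.g. int_stack s = {0};) and the first push will allocate.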
A linked list is the contrary. You (dynamically) allocate memory for each entry, and you can free the memory for a single entry when it is removed from the stack. This sounds better, but has the caveat that you spend a pointer's worth of memory for each entry. Typically that is about the size of an entry itself, so you roughly double the memory consumption. Besides this, allocating many small pieces of memory over and over carries per-allocation overhead, which wastes memory too.
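A sketch of that linked variant, where every push allocates one node and every pop frees one (again, the names are just illustrative):

#include <stdlib.h>

struct node {
    int value;
    struct node *next;   /* the per-entry pointer overhead mentioned above */
};

int ll_push(struct node **top, int value) {
    struct node *n = malloc(sizeof *n);   /* one allocation per entry */
    if (n == NULL) return -1;
    n->value = value;
    n->next = *top;
    *top = n;
    return 0;
}

int ll_pop(struct node **top, int *out) {
    struct node *n = *top;
    if (n == NULL) return -1;   /* stack is empty */
    *out = n->value;
    *top = n->next;
    free(n);                    /* entry memory is released immediately */
    return 0;
}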
So you can implement a compromise, a linked list of arrays: you allocate a block for a number of entries, let's say 256, and fill that block with entries without allocating or reallocating any memory. When the number of entries exceeds that value, you allocate a new block for 256 more entries. The blocks are linked, so it is a linked list of arrays.
Especially for a stack, where you never remove from the middle of the structure, this is the best implementation in most cases.
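One possible shape for that blocked implementation (256 entries per block as above; the struct names are made up for illustration):

#include <stdlib.h>

#define BLOCK_ENTRIES 256

struct block {
    int entries[BLOCK_ENTRIES];
    struct block *prev;       /* link to the previous (full) block */
};

struct blocked_stack {
    struct block *top_block;  /* block holding the top of the stack */
    size_t used;              /* entries used in the top block */
};

int bs_push(struct blocked_stack *s, int value) {
    if (s->top_block == NULL || s->used == BLOCK_ENTRIES) {
        struct block *b = malloc(sizeof *b);  /* one malloc per 256 pushes */
        if (b == NULL) return -1;
        b->prev = s->top_block;
        s->top_block = b;
        s->used = 0;
    }
    s->top_block->entries[s->used++] = value;
    return 0;
}

int bs_pop(struct blocked_stack *s, int *out) {
    if (s->top_block == NULL) return -1;      /* stack is empty */
    *out = s->top_block->entries[--s->used];
    if (s->used == 0) {                       /* block drained: free it */
        struct block *b = s->top_block;
        s->top_block = b->prev;
        s->used = (b->prev != NULL) ? BLOCK_ENTRIES : 0;
        free(b);
    }
    return 0;
}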

Think about how much space each data structure takes, and whether you can do the stack operations on them efficiently; i.e., in O(1) time.
For a basic stack, you need to be able to push new elements onto the stack, and pop the most recently pushed (top) element off. You probably also want to peek at the top element, and check if the stack is empty.
A dynamically sized array (or block, if I understand the OP's comment correctly) is fine for a stack. It may be advantageous in certain situations if you will be accessing and changing the stack a lot, and want to avoid the small amount of extra work of allocating and destroying memory with each push or pop. It also gives you direct, indexed access to everything in the stack, for extended functionality. The disadvantage is that the stack will use some extra space.
You can use a singly linked list for a stack as well, pushing and popping at the head. This is probably the most commonly used structure if you don't need extended functionality like direct access to elements other than the head, and if you aren't trying to implement something on the bleeding edge of time efficiency.

Related

Why would you use a LIFO stack over an array?

I was recently in an interview that required me to choose between the two data structures for a problem, and now I have this question:
What is the reasoning for using a Stack over an array if the only operations needed are push and pop? An array provides constant time for appending and popping the last element, and it generally takes up less memory than a Stack implemented with a LinkedList. It also provides random access, should that be required. Is the only reasoning that an array is typically of fixed size, so we need to dynamically resize it as we put elements in? Even that is constant time, isn't it, unless the resizing penalty is disproportionate?
There are several aspects to consider here...
First, a Stack is an abstract data type. It doesn't define how to implement itself.
An array is (generally) a well defined concrete implementation, and might even be fixed size unless explicitly defined to be dynamic.
A dynamic array can be implemented such that it automatically grows by some factor when exhausted, and might also shrink when the fill rate drops. These operations are not constant time individually, but amortize to constant time because the array doesn't grow or shrink on every operation. In terms of memory usage, it's hard to imagine an array being more expensive than a linked list unless it is extremely underused.
The main problem with an array is large allocation size. This is both a problem of maximum limitation and memory fragmentation. Using a linked list avoids both issues because every entry has a small memory footprint.
In some languages like C++, the underlying container that the 'stack' class uses can actually be changed between a dynamic array (vector), a linked list (list), or even a double-ended queue (deque). I only mention this because it's typically not fair to compare a stack vs. an array (one is an interface, the other is a data structure).
Most dynamic array implementations will allocate more space than is needed, and upon filling the array they will resize again to 2x the size, and so on. This avoids repeated allocations and keeps push generally constant time. However, the occasional resize does require copying elements, which is O(n), though this is usually said to amortize to constant time. So in general, you are correct that this is efficient.
Linked lists, on the other hand, typically require an allocation for every push, which can be somewhat expensive, and the nodes they create are larger than a single element in the array.
One possible advantage of linked lists, however, is that they do not require contiguous memory. If you have very many elements, it's possible that you fail to allocate a single block of memory large enough for an array. Having said that, linked lists take up more memory overall... so it's a bit of a wash.
In C++, for example, the stack by default uses the deque container. A deque is typically implemented as a dynamic array of 'pages' of memory. Each page is fixed in size, which allows the container to retain random-access properties. Moreover, since each page is separate, the container as a whole does not require contiguous memory, meaning it can store very many elements. Resizing is also cheap for a deque because it simply allocates another page, making it a great choice for a large stack.

Shrink memory of an array of pointers, possible?

I am having difficulties to find a possible solution so I decided to post my question. I am writing a program in C, and:
I am generating a huge array containing a lot of pointers to ints. It is allocated dynamically and filled at runtime, so beforehand I don't know which pointers will be added or how many. The problem is that there are just too many of them, so I need to shrink the space somehow.
Is there any package or tool available which could encode my entries or change their representation so that I save space?
Another question: I also thought about writing my information to a file. Is it then kept in memory the whole time, or only when I reopen the file?
It seems like you are looking for a simple dynamic array (the abstract data type dynamic array, that is). There should be many implementations of this out there. You can simply start with a small dynamic array and push new items to the back, just as you would with a vector in C++ or Java. One implementation is GArray. You will only allocate the memory you need.
If you have to (or want to) do it manually, the usual method is to store the capacity and the current size of the allocated array along with the pointer in a struct, and to call realloc() from within push_back() whenever you need more space. Usually you should grow the array by a factor of 1.3 to 1.4, but a factor of 2 will do if you're not expecting a HUGE array. If you remove an element and the size drops below a certain threshold (e.g. capacity/2), you shrink the array again with realloc().
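A sketch of that manual method (a growth factor of 2 is used here for brevity, and the names are illustrative):

#include <stdlib.h>

typedef struct {
    int **items;      /* the stored pointers to ints */
    size_t size;      /* elements in use */
    size_t capacity;  /* elements allocated */
} ptr_array;

int push_back(ptr_array *a, int *p) {
    if (a->size == a->capacity) {
        size_t cap = a->capacity ? a->capacity * 2 : 8;   /* grow */
        int **tmp = realloc(a->items, cap * sizeof *tmp);
        if (tmp == NULL) return -1;
        a->items = tmp;
        a->capacity = cap;
    }
    a->items[a->size++] = p;
    return 0;
}

void remove_back(ptr_array *a) {
    if (a->size == 0) return;
    a->size--;
    /* shrink below the capacity/2 threshold suggested above,
       keeping a small minimum capacity to avoid churn */
    if (a->capacity > 8 && a->size < a->capacity / 2) {
        int **tmp = realloc(a->items, (a->capacity / 2) * sizeof *tmp);
        if (tmp != NULL) {
            a->items = tmp;
            a->capacity /= 2;
        }
    }
}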

What's the relationship between those two “heaps”? [duplicate]

There are two concepts named "heap" in computer science. One is the memory pool used in memory management; the other is a tree-shaped data structure.
I know they are different, but what is the relationship between them? Or do they just happen to have the same name?
As far as I know, they just happen to have the same name.
One 'heap' is the data structure that contains, at its head, the greatest element of a collection. Since its children just have to be smaller than it, it is a semi-sorted collection. This is more efficient than maintaining a fully sorted list in certain circumstances, such as when you're commonly interested in only the largest element. http://en.wikipedia.org/wiki/Heap_(data_structure)
The other 'heap' is the place in memory, besides the stack, where data can be stored. The problem with data stored on the stack is that it is freed, and thus lost, when the function returns, and on top of that the stack can only hold so much before overflowing. The 'structure' of this heap is usually a linked list of free data segments: malloc looks for a segment that satisfies the size requirement, marks it as in use and returns it, whereas free looks up the header for that segment, marks it as unused and puts it back on the linked list of free segments. (Other optimizations include things like dedicated free lists for certain chunk sizes.)
As you can see - not related at all!
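To make the first kind of heap concrete, here is a minimal sketch of pushing into an array-backed max-heap; the caller is assumed to have reserved enough capacity in heap:

#include <stddef.h>

/* Max-heap stored in an array: the parent of index i is (i - 1) / 2,
   and heap[0] is always the greatest element. */
void heap_push(int *heap, size_t *n, int value) {
    size_t i = (*n)++;
    heap[i] = value;
    while (i > 0 && heap[(i - 1) / 2] < heap[i]) {   /* sift up */
        int tmp = heap[i];
        heap[i] = heap[(i - 1) / 2];
        heap[(i - 1) / 2] = tmp;
        i = (i - 1) / 2;
    }
}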

Array Performance very similar to LinkedList - What gives?

So the title is somewhat misleading... I'll keep this simple: I'm comparing these two data structures:
An array that starts at size 1; for each subsequent addition there is a realloc() call to expand the memory, and then the new (malloc'ed) element is appended at position n-1.
A linked list, for which I keep track of the head, tail, and size. Addition involves mallocing a new element and updating the tail pointer and size.
Don't worry about any of the other details of these data structures. This is the only functionality I'm concerned with for this testing.
In theory, the LL should perform better. However, they're nearly identical in timing tests involving 10, 100, 1000... up to 5,000,000 elements.
My gut feeling is that the heap is large. I think the data segment defaults to 10 MB on Red Hat? I could be wrong. Anyway, realloc() first checks whether space is available at the end of the already-allocated contiguous region (0 to n-1). If the n-th position is available, the elements are not relocated; realloc() just extends the old allocation into the immediately following space. I'm having a hard time finding evidence of this, and a harder time proving that this array should, in practice, perform worse than the LL.
Here is some further analysis, after reading posts below:
[Update #1]
I've modified the code to have a separate list that mallocs memory every 50th iteration, for both the LL and the array. For 1 million additions to the array, there are almost consistently 18 moves. There's no concept of moving for the LL. I've done a time comparison; they're still nearly identical. Here's some output for 10 million additions:
(Array)
time ./a.out a 10,000,000
real 0m31.266s
user 0m4.482s
sys 0m1.493s
(LL)
time ./a.out l 10,000,000
real 0m31.057s
user 0m4.696s
sys 0m1.297s
I would expect the times to be drastically different with 18 moves. Array addition requires one more assignment and one more comparison, to capture and check the return value of realloc and detect whether a move occurred.
[Update #2]
I ran ltrace on the test I posted above, and I think this is an interesting result... It looks like realloc (or some part of the memory manager) is preemptively moving the array to larger contiguous locations based on the current size.
For 500 iterations, a memory move was triggered on iterations:
1, 2, 4, 7, 11, 18, 28, 43, 66, 101, 154, 235, 358
Which is pretty close to a summation sequence. I find this to be pretty interesting - thought I'd post it.
You're right: realloc will just increase the size of the allocated block unless it is prevented from doing so. In a real-world scenario you will most likely have other objects allocated on the heap in between subsequent additions to the list; in that case realloc has to allocate a completely new chunk of memory and copy the elements already in the list.
Try allocating another object on the heap using malloc for every ten insertions or so, and see if they still perform the same.
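A sketch of that experiment; the interval and sizes are arbitrary, and the small allocations are deliberately leaked so they keep sitting between the array's reallocations:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *arr = NULL;
    size_t moves = 0;
    for (size_t i = 1; i <= 1000000; i++) {
        int *old = arr;
        arr = realloc(arr, i * sizeof *arr);   /* grow by one element */
        if (arr == NULL) { free(old); return 1; }
        arr[i - 1] = (int)i;
        if (arr != old)           /* comparing the stale pointer is formally
                                     undefined, but fine as an observation */
            moves++;
        if (i % 10 == 0)
            (void)malloc(64);     /* unrelated heap object, leaked on purpose */
    }
    printf("relocations: %zu\n", moves);
    free(arr);
    return 0;
}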
So you're testing how quickly you can expand an array versus a linked list?
In both cases you're calling a memory allocation function. Generally memory allocation functions grab a chunk of memory (perhaps a page) from the operating system, then divide that up into smaller pieces as required by your application.
The other assumption is that, from time to time, realloc() will give up and allocate a large chunk of memory elsewhere because it could not get contiguous space within the currently allocated region. If you're not making any other calls to memory allocation functions in between your list expansions, then this won't happen. And perhaps your operating system's use of virtual memory means that your program heap expands contiguously regardless of where the physical pages come from, in which case the performance will be identical to a bunch of malloc() calls.
Expect performance to change where you mix up malloc() and realloc() calls.
Assuming your linked list is a pointer to the first element, if you want to add an element to the end, you must first walk the list. This is an O(n) operation.
Assuming realloc has to move the array to a new location, it must traverse the array to copy it. This is an O(n) operation.
In terms of complexity, both operations are equal. However, as others have pointed out, realloc may be avoiding relocating the array, in which case adding the element to the array is O(1). Others have also pointed out that the vast majority of your program's time is probably spent in malloc/realloc, which both implementations call once per addition.
Finally, another reason the array is probably faster is cache locality and the generally high performance of linear copies. Jumping around to erratic addresses with significant gaps between them (both the larger elements and the malloc bookkeeping) is not usually as fast as doing a bulk copy of the same volume of data.
The performance of an array-based solution expanded with realloc() will depend on your strategy for creating more space.
If you increase the amount of space by adding a fixed amount of storage on each re-allocation, you'll end up with an expansion that, on average, depends on the number of elements you have stored in the array. This is on the assumption that realloc will need to (occasionally) allocate space elsewhere and copy the contents, rather than just expanding the existing allocation.
If you increase the amount of space by adding a proportion of your current number of elements (doubling is pretty standard), you'll end up with an expansion that, on average, takes constant time.
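To put rough numbers on those two strategies: with doubling from capacity 1, the copies performed over n pushes total at most 1 + 2 + 4 + ... + n/2 + n < 2n, i.e. fewer than two copies per push on average. With a fixed increment of k elements, the i-th re-allocation copies about i*k elements, so n pushes cost on the order of n^2/(2k) copies in total, which is O(n) per push on average.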
This is not a real life situation. Presumably, in real life, you are interested in looking at or even removing items from your data structures as well as adding them.
If you allow removal, but only from the head, the linked list becomes better than the array because removing an item is trivial and, if instead of freeing the removed item, you put it on a free list to be recycled, you can eliminate a lot of the mallocs needed when you add items to the list.
On the other hand, if you need random access to the structure, clearly an array beats the linked list.
(Updated.)
As others have noted, if there are no other allocations in between the reallocs, then no copying is needed. Also as others have noted, the risk of a memory copy lessens (but so does its impact, of course) for very small blocks, smaller than a page.
Also, if all you ever do in your test is to allocate new memory space, I am not very surprised you see little difference, since the syscalls to allocate memory are probably taking most of the time.
Instead, choose your data structures depending on how you want to actually use them. A framebuffer is for instance probably best represented by a contiguous array.
A linked list is probably better if you have to reorganise or sort data within the structure quickly.
Then these operations will be more or less efficient depending on what you want to do.
(Thanks for the comments below, I was initially confused myself about how these things work.)
What's the basis of your theory that the linked list should perform better for insertions at the end? I would not expect it to, for exactly the reason you stated. realloc will only copy when it has to in order to maintain contiguity; in other cases it may have to combine free chunks and/or increase the chunk size.
However, every linked list node requires a fresh allocation and (assuming a doubly linked list) two pointer writes. If you want evidence of how realloc works, you can just compare the pointer before and after the realloc. You should find that it usually doesn't change.
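A minimal sketch of that check (strictly speaking, inspecting the old pointer value after a successful realloc is undefined behaviour, but it works as an experiment on real systems):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *p = malloc(1000);
    if (p == NULL) return 1;
    char *q = realloc(p, 100000);          /* grow the block substantially */
    if (q == NULL) { free(p); return 1; }
    printf(q == p ? "extended in place\n" : "moved to a new address\n");
    free(q);
    return 0;
}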
I suspect that since you're calling realloc for every element (obviously not wise in production), the realloc/malloc call itself is the biggest bottleneck for both tests, even though realloc often doesn't provide a new pointer.
Also, you're confusing the heap and data segment. The heap is where malloced memory lives. The data segment is for global and static variables.

What is causing a stack overflow?

You may think that this is a coincidence that the topic of my question is similar to the name of the forum but I actually got here by googling the term "stack overflow".
I use the OPNET network simulator, in which I program using C. I think I am having a problem with big array sizes: it seems I am hitting some sort of memory allocation limit. It may have to do with OPNET, Windows, my laptop's memory, or most likely the C language. The problem occurs when I try to use nested arrays with a total number of elements coming to several thousand integers. I think I am exceeding an overall memory allocation limit and I am wondering if there is a way to increase this cap.
Here's the exact problem description:
I basically have a routing table. Let's call it routing_tbl[n], with n = 30, meaning I am supporting 30 nodes (routers). Now, for each node in this table, I keep information about many (hundreds of) available paths in an array called paths[p]. Again, for each path in this array, I keep the list of nodes that belong to it in an array called hops[h]. So I am using at least n*p*h integers' worth of memory, but this table contains other information as well. In the same function, I am also using another nested array that consumes almost 40,000 integers.
As soon as I run my simulation, it quits, complaining about a stack overflow. It works when I reduce the total size of the routing table.
What do you think causes the problem and how can it be solved?
Much appreciated
Ali
It may help if you post some code. Edit the question to include the problem function and the error.
Meanwhile, here's a very generic answer:
The two principal causes of a stack overflow are 1) a recursive function, or 2) the allocation of a large number of local variables.
Recursion
If your function calls itself, like this:
int recurse(int number) {
    /* no base case: every call pushes a new stack frame */
    return recurse(number);
}
then, since local variables and function arguments are stored on the stack, each call pushes a new frame; the recursion never terminates, so it fills the stack and causes a stack overflow.
Large local variables
If you try to allocate a large array of local variables then you can overflow the stack in one easy go. A function like this may cause the issue:
void hugeStack(void) {
    /* ~800 MB of automatic storage: far beyond any typical stack limit */
    unsigned long long reallyBig[100000000];
    ...
}
There is quite a detailed answer to this similar question.
Somehow you are using a lot of stack. Possible causes include that you're creating the routing table on the stack, you're passing it on the stack, or else you're generating lots of calls (eg by recursively processing the whole thing).
In the first two cases you should create it on the heap and pass around a pointer to it. In the third case you'll need to rewrite your algorithm in an iterative form.
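A hedged sketch of the first two fixes; the struct shapes and counts here are invented for illustration, since the actual OPNET code isn't shown:

#include <stdlib.h>

struct path { int hops[64]; int hop_count; };                    /* hops[h] */
struct route_entry { struct path paths[200]; int path_count; };  /* paths[p] */

void simulate(void) {
    /* One heap allocation for all 30 entries instead of a huge local array;
       pass routing_tbl (a pointer) to other functions instead of the table. */
    struct route_entry *routing_tbl = malloc(30 * sizeof *routing_tbl);
    if (routing_tbl == NULL)
        return;   /* allocation failed */
    /* ... fill and use routing_tbl[i].paths[j].hops[k] ... */
    free(routing_tbl);
}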
Stack overflows can happen in C when the number of embedded recursive calls is too high. Perhaps you are calling a function from itself too many times?
This error may also be due to declaring very large arrays as local variables. You can switch to dynamic allocation through malloc() to fix this type of problem.
Is there a reason why you cannot use the debugger on this program?
It depends on where you have declared the variable.
A local variable (i.e. one declared on the stack) is limited by the maximum stack frame size. This is a limit of the environment you are using (and can usually be adjusted with compiler or linker flags).
A dynamically allocated object (i.e. one that lives on the heap) is limited by the amount of available memory. This is a property of the OS (and can technically be larger than the physical memory if you have a smart OS).
Many operating systems dynamically expand the stack as you use more of it. When you start writing to a memory address just beyond the stack, the OS assumes your stack has grown a bit more and allocates it an extra page (usually 4096 bytes on x86 - exactly 1024 four-byte ints).
The problem is that on x86 (and some other architectures) the stack grows downwards but C array indices grow upwards. This means that if you access the start of a large array first, you'll be accessing memory that's more than a page away from the edge of the stack.
If you initialise your array to 0 starting from the end of the array (that's right, make a for loop to do it), the errors might go away. If they do, this is indeed the problem.
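A sketch of that suggestion (whether it actually helps is platform-specific, and the array size here is arbitrary):

#include <stddef.h>

void touch_from_end(void) {
    int big[500000];                  /* ~2 MB of automatic storage */
    size_t i = sizeof big / sizeof big[0];
    while (i-- > 0)
        big[i] = 0;   /* highest addresses first: closest to the existing stack */
    /* ... use big ... */
}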
You might be able to find some OS API functions to force stack allocation, or compiler pragmas/flags. I'm not sure about how this can be done portably, except of course for using malloc() and free()!
You are unlikely to run into a stack overflow with unthreaded compiled C unless you do something particularly egregious like runaway recursion or allocating enormous local arrays. However, your simulator probably has a threading package, which will impose stack size limits. When you start a new thread, it will allocate a chunk of memory for that thread's stack. There is likely a parameter you can set somewhere that establishes the default stack size, or there may be a way to grow the stack dynamically. For example, pthreads has a function pthread_attr_setstacksize() which you call prior to starting a new thread to set its stack size. Your simulator may or may not be using pthreads. Consult your simulator's reference documentation.
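If the simulator does use pthreads, requesting a bigger stack looks roughly like this (the 16 MiB figure is an arbitrary choice):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    /* ... simulation work that needs a deep stack ... */
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 16 * 1024 * 1024);  /* 16 MiB stack */
    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);
    return 0;
}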
