I'm studying Data Structures, and I'm not getting why stacks and queues need to be declared like:
struct stack *Stack;
(forget about the struct syntax)
I mean, why is it always declared as a pointer?
They are not always declared like that!
In general, declaring a variable as a pointer is useful for later allocating it dynamically. This can be due to a couple of reasons:
The variable is too big for the program stack
You want to return that variable from a function
In your case, let's think of two different implementations of stack:
struct stack
{
void *stuff[10000];
int size;
};
This is a terrible implementation, but assuming you have one like this, then you'd most probably not want to put it on the program stack.
Alternatively, if you have:
struct stack
{
void **stuff;
int size;
int mem_size;
};
You dynamically change the size of stuff anyway, so there is absolutely no harm in declaring a variable of type struct stack on the program stack, i.e. like this:
struct stack stack;
Unless you want to return it from a function. For example:
struct stack *make_stack(int initial_size)
{
struct stack *s;
s = malloc(sizeof(*s));
if (s == NULL)
goto exit_no_mem;
if (initial_size == 0)
initial_size = 1;
s->stuff = malloc(initial_size * sizeof(*s->stuff));
if (s->stuff == NULL)
goto exit_no_stuff_mem;
s->size = 0;
s->mem_size = initial_size;
return s;
exit_no_stuff_mem:
free(s);
exit_no_mem:
return NULL;
}
Personally, though, I would have declared the function like this:
int make_stack(struct stack *s, int initial_size);
and allocate the struct stack on the program stack.
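A minimal sketch of that caller-allocated variant, assuming the second struct stack layout above. The helpers stack_push and stack_destroy, the 0/-1 return convention, and the doubling growth policy are assumptions added for illustration; they are not part of the original answer.

```c
#include <stdlib.h>

struct stack
{
    void **stuff;   /* dynamically allocated array of element pointers */
    int size;       /* number of elements currently stored */
    int mem_size;   /* number of slots allocated in stuff */
};

/* Initialize a caller-provided stack; returns 0 on success, -1 on failure. */
int make_stack(struct stack *s, int initial_size)
{
    if (initial_size <= 0)
        initial_size = 1;
    s->stuff = malloc(initial_size * sizeof(*s->stuff));
    if (s->stuff == NULL)
        return -1;
    s->size = 0;
    s->mem_size = initial_size;
    return 0;
}

/* Hypothetical helper: push one element, doubling the array when it is full. */
int stack_push(struct stack *s, void *item)
{
    if (s->size == s->mem_size)
    {
        void **bigger = realloc(s->stuff, 2 * s->mem_size * sizeof(*s->stuff));
        if (bigger == NULL)
            return -1;          /* the old array is still valid */
        s->stuff = bigger;
        s->mem_size *= 2;
    }
    s->stuff[s->size++] = item;
    return 0;
}

/* Hypothetical helper: release the array; the struct itself is not freed. */
void stack_destroy(struct stack *s)
{
    free(s->stuff);
    s->stuff = NULL;
    s->size = s->mem_size = 0;
}
```

The caller would write struct stack st; followed by make_stack(&st, 16);, so only the element array lives on the heap while the small bookkeeping struct stays on the program stack.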
It depends on how your stack structure is defined (not just the layout of the struct, but the operations that manipulate it as well).
It's entirely possible to define a stack as a simple array and index, such as
struct stack_ {
    T data[N];       // for some type T and size N
    size_t stackptr; // index of the top element; N means the stack is empty
} stack;
stack.stackptr = N; // stack grows towards 0
// push operation
if (stack.stackptr)
    stack.data[--stack.stackptr] = some_data();
else
    ; // overflow: the stack is full
// pop operation
if (stack.stackptr < N)
    x = stack.data[stack.stackptr++];
else
    ; // underflow: the stack is empty
However, fixed-sized arrays are limiting. One easy method of implementing a stack is to use a list structure:
struct stack_elem {
T data;
struct stack_elem *next;
};
The idea is that the head of the list is the top of the stack. Pushing an item onto the stack adds an element at the head of the list; popping an item removes that element from the head of the list:
int push(struct stack_elem **stack, T data)
{
struct stack_elem *s = malloc(sizeof *s);
if (s)
{
s->data = data; // new element gets data
s->next = *stack; // set new element to point to current stack head
*stack = s; // new element becomes new stack head
}
return s != NULL;
}
int pop(struct stack_elem **stack, T *data)
{
int stackempty = (*stack == NULL);
if (!stackempty)
{
struct stack_elem *s = *stack; // retrieve the current stack head
*stack = (*stack)->next; // set stack head to point to next element
*data = s->data; // get the data
free(s); // deallocate the element
}
return !stackempty; // 1 if an element was popped, 0 if the stack was empty
}
int main(void)
{
struct stack_elem *mystack = NULL; // stack is initially empty
T value;
...
if (!push(&mystack, some_data()))
// handle overflow
...
if (!pop(&mystack, &value))
// handle underflow
...
}
Since push and pop need to be able to write new pointer values to mystack, we need to pass a pointer to it, hence the double indirection for stack in push and pop.
No, they don't have to be declared as pointers.
One can as well allocate stacks and queues as global variables:
struct myHash { int key; int next_idx; int data[4]; } mainTable[65536];
struct myHash duplicates[65536*10];
int stack[16384];
myHash also includes a linked list for duplicate entries using indices.
But as stated in the comments, if one has to add more elements to the structures than was initially planned, then pointers come in handy.
An additional reason to declare structures as pointers is that, with a pointer, one can typically access the complete structure as a whole, any individual element of the structure, or some subset of the elements. That makes the syntax more versatile. Also, when the structure is passed as a parameter to some external function, a pointer is usually the practical choice, since passing by value copies the whole structure and the function cannot modify the caller's copy.
There's really no need to implement stacks and queues with pointers; others have already stated this clearly. See John Bode's answer for how a stack can be implemented perfectly well using arrays. The thing is that modelling certain data structures (such as stacks, queues and linked lists) using pointers allows you to program them in a very efficient way in terms of both execution speed and memory consumption.
Usually an underlying array for holding a data structure is a very good implementation choice if your use cases require frequent random access to an element in the structure, given its positional index (this is FAST with an array). However, growing the structure past its initial capacity can be expensive AND you waste memory on the unused elements in the array. Insertion and deletion operations can also be very expensive, since you may need to rearrange elements to either compact the structure or make space for the new elements.
Since a queue and a stack don't have this random-access requirement and you don't insert or delete elements in the middle of them, it is a better implementation choice to dynamically allocate each individual element "on the fly", requesting memory when a new element is required (this is what malloc does) and freeing it when an element is deleted. This is fast and will consume no more memory than is actually needed by your data structure.
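The thread shows linked stacks but no queue, so here is a rough sketch of the same per-element allocation idea applied to a queue. The names queue, queue_elem, enqueue and dequeue, and the int payload, are assumptions chosen for illustration.

```c
#include <stdlib.h>

/* One queue element; allocated when enqueued, freed when dequeued. */
struct queue_elem {
    int data;
    struct queue_elem *next;
};

/* The queue tracks both ends so enqueue and dequeue take constant time. */
struct queue {
    struct queue_elem *head;   /* oldest element, removed first */
    struct queue_elem *tail;   /* newest element, added last */
};

/* Append a value at the tail; returns 1 on success, 0 if malloc fails. */
int enqueue(struct queue *q, int data)
{
    struct queue_elem *e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    e->data = data;
    e->next = NULL;
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;           /* the queue was empty */
    q->tail = e;
    return 1;
}

/* Remove the value at the head; returns 1 on success, 0 if the queue is empty. */
int dequeue(struct queue *q, int *data)
{
    struct queue_elem *e = q->head;
    if (e == NULL)
        return 0;
    *data = e->data;
    q->head = e->next;
    if (q->head == NULL)
        q->tail = NULL;        /* the queue became empty */
    free(e);
    return 1;
}
```

A queue declared as struct queue q = { NULL, NULL }; starts empty, and memory use then follows the number of queued elements (plus one pointer per element).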
As already pointed out, it depends on how big the struct is.
Another reason is encapsulation. The stack implementation might not expose the definition of struct stack in its header file. This hides the implementation detail from the user, with the downside that free store allocation is required.
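As a sketch of what that encapsulation might look like: the first part below is what a hypothetical header (say, stack.h) would expose, and the second part is what would stay private in the implementation file. Both are shown in one listing only so that it compiles on its own; the function names, the int payload and the growth policy are all assumptions.

```c
#include <stdlib.h>

/* ---- what the hypothetical header would expose ---- */
struct stack;                                /* incomplete type: layout hidden */
struct stack *stack_create(void);
int stack_push(struct stack *s, int value);  /* 1 on success, 0 on failure */
int stack_pop(struct stack *s, int *value);  /* 1 on success, 0 if empty */
void stack_destroy(struct stack *s);

/* ---- what the implementation file would contain ---- */
struct stack {
    int *items;
    size_t size;
    size_t capacity;
};

struct stack *stack_create(void)
{
    struct stack *s = malloc(sizeof *s);
    if (s != NULL) {
        s->items = NULL;
        s->size = 0;
        s->capacity = 0;
    }
    return s;
}

int stack_push(struct stack *s, int value)
{
    if (s->size == s->capacity) {
        size_t new_cap = s->capacity ? 2 * s->capacity : 4;
        int *bigger = realloc(s->items, new_cap * sizeof *s->items);
        if (bigger == NULL)
            return 0;
        s->items = bigger;
        s->capacity = new_cap;
    }
    s->items[s->size++] = value;
    return 1;
}

int stack_pop(struct stack *s, int *value)
{
    if (s->size == 0)
        return 0;
    *value = s->items[--s->size];
    return 1;
}

void stack_destroy(struct stack *s)
{
    if (s != NULL) {
        free(s->items);
        free(s);
    }
}
```

Code that only sees the header cannot write struct stack s; because the type is incomplete there, so it has to hold a struct stack * obtained from stack_create(). That is the free store allocation the answer mentions.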
I have the following tree node struct that holds pointers to other tree nodes:
struct node {
// ...
struct node* children[20];
};
The idea is that I want to check whether there is a node* inside children and, based on that, go deeper into the tree. So when I allocate the node I want to have children with 20 NULL values.
Currently I am not doin
How should I allocate this array in order to not get errors like Conditional jump or move depends on uninitialised value(s) (Valgrind)?
Would it be better to use struct node** children and allocate fixed size each time I allocate a new node?
EDIT: Example of one place where Valgrind complains:
for(int i=0;i<20;i++)
if(node->children[i] != NULL)
do_something_with_the_node(node->children[i]);
When you allocate a new instance of struct node, you must set the contained pointers to NULL to mark them as "not pointing anywhere". This will make the Valgrind warning go away, since the pointers will no longer be uninitialized.
Something like this:
struct node * node_new(void)
{
struct node *n = malloc(sizeof *n);
if(n != NULL)
{
for(size_t i = 0; i < sizeof n->children / sizeof *n->children; ++i)
n->children[i] = NULL;
}
return n;
}
You cannot portably use memset() on n->children or calloc() for this, since both give you "all bits zero", which the C standard does not guarantee to be the representation of a null pointer.
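One portable alternative to the explicit loop is to assign a zero-initialized compound literal (C99). The language initializes every pointer member of such an object to a null pointer, whatever the in-memory representation of a null pointer is. A sketch, assuming a hypothetical int value member stands in for the fields elided by // ... in the question, and using the made-up name node_new_zeroed:

```c
#include <stdlib.h>

struct node {
    int value;                    /* hypothetical payload member */
    struct node *children[20];
};

/* Allocate a node whose members are all zero-initialized. */
struct node *node_new_zeroed(void)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL)
        *n = (struct node){0};    /* value = 0, every children[i] = NULL */
    return n;
}
```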
Your struct definition is valid (although it's hard to tell without more context if it fits your requirements).
Valgrind doesn't complain about your struct definition, it probably complains about how you instantiate variables of that type. Ensure that all of the array members get initialized and the complaints will most likely go away.
The problem is that you are using an uninitialized value in an if condition.
When you instantiate a struct node, its member struct node* children[20]; is an array of 20 struct node *, all of which are uninitialized.
It would be no different from this:
char *x;
if (x == NULL) {
/* Stuff */
}
At this point, x may have literally any value. In your example, any element of an array may have any value.
To fix this, you need to initialize the elements of an array before using them, for example like this:
for (int i = 0; i < 20; ++i) {
node->children[i] = NULL;
}
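Or shorter (this relies on a null pointer being represented as all-bits-zero, which is true on mainstream platforms but not guaranteed by the C standard):
memset(node->children, 0, sizeof node->children);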
If you changed the member to, as you've suggested, node **children, the situation wouldn't be much different - you'll still need to initialize all the members, including array's elements. You could make it shorter by using calloc, which will initialize all bytes to 0; then again, you'll need some code for correct deallocation (and remember to do it), so I think the tradeoff's not worth it.
I'm a little unclear on this part of C, since it's a bit unlike other languages I've used, but this may just be a dumb question. I'm trying to implement a stack. I have the node struct, it has the information I want to pass:
struct position{
int square[2];
int counter;
struct position *prev;
};
so in main, I declare and initialize the bottom node of the stack, set *prev to NULL, then declare the rest. My question is, what happens when I try to pass it to function pop? I can create a position object that points to this one and return that, but will it be pushed off the stack when the function closes? Or should I return the position and set that equal to a new position object in main? What if I decide to create several of these nodes in a function? Will they remain once the function closes?
Edit: mah reminded me of my followup question which is, if they don't exist outside of the function, should I use malloc to create the space in the memory for them?
The lifetime of your objects depends on where they're created; if you declare, for example, a structure within a block of code (where a block is everything inside { and its matching }), that structure is no longer valid once execution leaves the block. Pointers to that structure are only valid as long as the structure is valid.
For what you're describing, you want to dynamically allocate your structures, using either malloc() or a similar function. Dynamically allocated data will remain valid (assuming you do not overwrite it) until you free() the memory, or until your program terminates. Pointers to these areas of memory will remain valid for that same period of time.
Consider:
static struct position *topOfStack = NULL;
/* Link node in as the new top of the stack. */
void push(struct position *node)
{
    node->prev = topOfStack;
    topOfStack = node;
}
/* Unlink and return the top node, or NULL if the stack is empty. */
struct position *pop(void)
{
    struct position *popped = topOfStack;
    if (topOfStack) topOfStack = topOfStack->prev;
    return popped;
}
To use this, you can:
void f(void) {
    struct position *node = malloc(sizeof(*node));
    if (node == NULL)
        return; /* allocation failed; nothing to push */
    /* ... fill in node details ... */
    push(node);
}
Notice that I allocated the node dynamically. Had I just declared a struct position node;, I could legally call push(&node); but once my function left scope, the stack would have an invalid item in it (which would likely cause havoc).
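To make the lifetime point concrete, here is a small self-contained sketch built on the same push/pop pair (with the prev member spelled as in the question's struct). The helpers push_new and drain are hypothetical additions: one creates nodes inside a function, the other pops and frees them later from somewhere else.

```c
#include <stdlib.h>

struct position {
    int square[2];
    int counter;
    struct position *prev;
};

static struct position *topOfStack = NULL;

void push(struct position *node)
{
    node->prev = topOfStack;
    topOfStack = node;
}

struct position *pop(void)
{
    struct position *popped = topOfStack;
    if (topOfStack)
        topOfStack = topOfStack->prev;
    return popped;
}

/* Hypothetical helper: create one node on the heap and push it.
   Returns 1 on success, 0 if malloc fails. */
int push_new(int x, int y, int counter)
{
    struct position *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->square[0] = x;
    node->square[1] = y;
    node->counter = counter;
    push(node);                 /* the node outlives this function call */
    return 1;
}

/* Hypothetical helper: pop and free every node; returns how many were freed. */
int drain(void)
{
    int freed = 0;
    struct position *node;
    while ((node = pop()) != NULL) {
        free(node);             /* whoever pops a node is responsible for it */
        freed++;
    }
    return freed;
}
```

Nodes created by push_new() remain valid after it returns because they live on the heap; they only go away when the code that pops them calls free().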
what happens when I try to pass it to function pop?
It depends on your pop() function's prototype. Presumably it would be something like:
struct position* pop(struct position* stack);
I can create a position object that points to this one and return that, but will it be pushed off the stack when the function closes?
Your question is quite unclear, and it suggests a misunderstanding of how object lifetime works in C. Basically, you have two ways to allocate variables: on the stack or on the heap. The scoping you're talking about applies to stack-allocated instances.
What if I decide to create several of these nodes in a function? Will they remain once the function closes?
Basically, if you use the stack, they will live as long as the scope they're declared in. In C, a scope is delimited by { and }. For example:
int main(void) {
    struct position pos1;
    struct position pos2;
    struct position pos3;
    pos3.prev = &pos2; // prev is a pointer, so store the address
    pos2.prev = &pos1;
    pos1.prev = NULL;
    pop(&pos3);
}
There you declare three variables and link them together, and the pop function just resets the .prev link. But for a stack that kind of architecture is not really useful, because it is quite limited.
There you definitely need to put your instances on the heap, using malloc() and free():
// push() pseudocode:
// take stack, iterate over each prev until prev is NULL
// allocate prev with malloc() the same way as for "stack" in main()
// insert values in prev
void push(struct position* stack, int* value);
// pop() pseudocode:
// take stack, iterate over each prev until prev->prev is NULL,
// then keep prev->prev in a temporary variable
// set prev to NULL
// return temporary variable (former prev->prev)
struct position* pop(struct position* stack);
int main(void) {
    int value[2];
    struct position* stack = malloc(sizeof(struct position));
    if (stack == NULL)
        return 1;
    stack->prev = NULL; // the bottom node has no predecessor
    // value is what you want to push to the stack
    value[0] = 42;
    value[1] = 42;
    push(stack, value);
    value[0] = 2;
    value[1] = 20;
    push(stack, value);
    struct position* pos;
    pos = pop(stack);
    // do something with pos->square
    free(pos);
}
There you create a pointer to a node for which you allocate some memory on the heap. The push() function allocates some new memory, assigns .prev for that new space to stack's address, and populates that memory with the value. pop() should get to the value before the last one, reset its pointer to that value, and return that value.
Of course, I'm just giving concepts and ideas here; I'm leaving the real implementation to you. One piece of advice, though: instead of using square, which contains an array, use two separate values in your struct. That will make it simpler for a first implementation.
I've implemented a stack with pointers that works like it's supposed to. Now I need it to push to the stack without pushing a duplicate. For example, if I push '2' onto the stack, pushing another '2' will still result in only one '2' in the stack, because it already exists.
Below is how I went about trying to create the new push function. I know that I'm supposed to traverse the stack and check it for the element I'm adding, but I guess I'm doing that wrong? Can anyone help me out?
typedef struct Node {
void *content;
struct Node *next;
} Node;
typedef struct Stack {
Node *head;
int count;
} Stack;
void push(Stack *stack, void *newElem) {
Node *newNode = (Node*) malloc(sizeof(Node));
if (stack->count > 0) {
int i;
for (i = 0, newNode = stack->head; i < stack->count; i++, newNode =
newNode->next) {
if (newNode->content == newElem) return;
}
} else {
newNode->next = stack->head;
newNode->content = newElem;
stack->head = newNode;
stack->count++;
}
}
if (newNode->content == newElem)
You are comparing two pointers. I guess you want to check whether their contents are equal:
#include <string.h>
if (memcmp(newNode->content, newElem, size) == 0)
The value size may be indicated by the caller. In your case, it should be sizeof(int).
Moreover, once you have traversed the stack, you don't add the element to your data structure.
The problem is that if your stack is non-empty and you don't find the element already in the stack, you don't do anything. You need to get rid of the else keyword and make that code unconditional. Also, you allocate space for the new Node before you know whether you need it and, even worse, overwrite the newly allocated pointer while iterating over the stack to see if you need to push. So move the malloc down, after the } that ends the if.
You already have a working
void push(Stack *stack, void *newElem);
right?
So, why not write a new function
int push_unique(Stack *stack, void *newElem) {
if (find_value(stack, newElem) != NULL) {
return 1; // indicate a collision
}
push(stack, newElem); // re-use old function
return 0; // indicate success
}
Now you've reduced the problem to writing
Node *find_value(Stack *stack, void *value);
... can you do that?
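For reference, one possible shape of find_value, sketched under the assumption that every content pointer refers to an int (as in the question's example of pushing '2'). The plain push shown here is also an assumed version of the "working" one, included only so the listing is self-contained.

```c
#include <stdlib.h>

typedef struct Node {
    void *content;
    struct Node *next;
} Node;

typedef struct Stack {
    Node *head;
    int count;
} Stack;

/* Return the first node whose content equals *value, or NULL if there is none.
   Assumption: every content pointer refers to an int. */
Node *find_value(Stack *stack, void *value)
{
    for (Node *n = stack->head; n != NULL; n = n->next) {
        if (*(int *)n->content == *(int *)value)
            return n;
    }
    return NULL;
}

/* Assumed plain push: always adds the element at the head. */
void push(Stack *stack, void *newElem)
{
    Node *newNode = malloc(sizeof *newNode);
    if (newNode == NULL)
        return;                 /* out of memory: stack left unchanged */
    newNode->content = newElem;
    newNode->next = stack->head;
    stack->head = newNode;
    stack->count++;
}

int push_unique(Stack *stack, void *newElem)
{
    if (find_value(stack, newElem) != NULL)
        return 1;               /* indicate a collision */
    push(stack, newElem);       /* re-use old function */
    return 0;                   /* indicate success */
}
```

For other element types, the comparison inside find_value would need to change, for example to memcmp with a caller-supplied size as suggested in the other answer.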
I'm not sure you realized it, but your proposed implementation is performing a linear search over a linked list. If you're pushing 2,000 elements on a stack with an average of 2 duplicates of each element value, that's 2,000 searches of a linked list averaging between 500 and 750 links (it depends on the order in which the duplicates are presented to the search function). This requires 1 million+ compares. Not pretty.
A MUCH more efficient duplicate detection in find_value() above could use a hash table, with search time O(1), or a tree, with search time O(log N). The former if you know how many values you're potentially pushing onto the stack, and the latter if the number is unknown, like when receiving data from a socket in real-time. (if the former you could implement your stack in an array instead of a much slower, and more verbose linked-list)
In either case, to properly maintain the hashtable, your pop() function would need to be paired with a hashtable hashpop() function, which would remove the matching value from the hashtable.
With a hash table, your stack could just point to the element's value sitting in its hash location, as returned from find_value(). With a self-balancing tree, however, the location of the node, and thus of the element value, would be changing all the time, so you'd need to store the element's value in both the stack and the tree. Unless you're writing for a very tight memory environment, the performance the second data structure would afford would be well worth the modest cost in memory.
If I have a snippet of my program like this:
struct Node *node;
while(...){
node = malloc(100);
//do stuff with node
}
This means that every time I loop through the while loop I newly allocate 100 bytes that is pointed to by the node pointer right?
If this is true, then how do I free up all the memory that I have made with all the loops if I only have a pointer left pointing to the last malloc that happened?
Thanks!
Please allocate exactly the size you need: malloc(sizeof *node); -- if you move to a 64-bit platform that doubles the size of all your members, your old 96-byte structure might take 192 bytes in the new environment.
If you don't have any pointers to any of the struct Nodes you have created, then I don't think you should be allocating them with malloc(3) in the first place. malloc(3) is best if your application requires the data to persist outside the calling scope of the current function. I expect that you could re-write your function like this:
struct Node node;
while(...){
//do stuff with node
}
or
while(...){
struct Node node;
//do stuff with node
}
depending if you want access to the last node (the first version) or not (the second version).
Of course, if you actually need those structures outside this piece of code, then you need to store references to them somewhere. Add them to a global list keeping track of struct Node objects, or add each one to the next pointer of the previous struct Node, or add each one to a corresponding struct User that refers to them, whatever is best for your application.
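As a sketch of the "add each one to the next pointer" option: the loop below links every new node to the ones allocated before it, so a single head pointer is enough to find and free them all afterwards. The value and next members and the build_list and free_list names are assumptions, since the question does not show struct Node.

```c
#include <stdlib.h>

struct Node {
    int value;            /* hypothetical payload */
    struct Node *next;    /* link that keeps every allocation reachable */
};

/* Allocate up to count nodes, linking each new node in front of the earlier
   ones. Stops early if malloc fails and returns whatever was built so far. */
struct Node *build_list(int count)
{
    struct Node *head = NULL;
    for (int i = 0; i < count; i++) {
        struct Node *node = malloc(sizeof *node);
        if (node == NULL)
            break;
        node->value = i;
        node->next = head;    /* remember the previous allocations */
        head = node;
    }
    return head;
}

/* Free every node in the list; returns how many nodes were freed. */
int free_list(struct Node *head)
{
    int freed = 0;
    while (head != NULL) {
        struct Node *next = head->next;  /* save the link before freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```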
If you set node = NULL before the loop and then use free(node) before node = malloc(100) you should be OK. You will also need to do a free(node) after the loop exits. But then again, it all depends on what "//do stuff with node" actually does. As others have pointed out, malloc(100) is not a good idea. What I would use is malloc(sizeof(*node)). That way, if the type of node changes, you don't have to change the malloc line.
If you don't need the malloc'ed space at the end of one iteration anymore, you should free it right away.
To keep track of the allocated nodes you could save them in a dynamically growing list:
#include <stdlib.h>
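int main(void) {
    int i;
    void *node;
    int ptr_len = 0;
    void **ptrs = NULL;
    for (i = 0; i < 10; i++) {
        void **grown;
        node = malloc(100);
        if (node == NULL)
            break;
        /* grow the pointer list by one slot; keep the old list on failure */
        grown = realloc(ptrs, sizeof(void*) * (ptr_len + 1));
        if (grown == NULL) {
            free(node);
            break;
        }
        ptrs = grown;
        ptrs[ptr_len++] = node;
        /* code */
    }
    /* release every tracked allocation, then the list itself */
    for (i = 0; i < ptr_len; i++) {
        free(ptrs[i]);
    }
    free(ptrs);
    return 0;
}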
Note: You should probably re-think your algorithm if you need to employ such methods!
Otherwise see sarnold's answer.
then how do I free up all the memory that I have made with all the loops if I only have a pointer left pointing to the last malloc that happened?
You can't. You just created a giant memory leak.
You have to keep track of every chunk of memory you malloc() and free() it when you're done using it.
You cannot. You need to store all the pointers in order to free the memory. Only if you are saving those pointers somewhere can you free the memory.
I'm just reading about malloc() in C.
The Wikipedia article provides an example, but it just allocates enough memory for an array of 10 ints, in comparison with int array[10]. Not very useful.
When would you decide to use malloc() over C handling the memory for you?
Dynamic data structures (lists, trees, etc.) use malloc to allocate their nodes on the heap. For example:
/* A singly-linked list node, holding data and pointer to next node */
struct slnode_t
{
struct slnode_t* next;
int data;
};
typedef struct slnode_t slnode;
/* Allocate a new node with the given data and next pointer;
** returns NULL if memory cannot be allocated
*/
slnode* sl_new_node(int data, slnode* next)
{
    slnode* node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->data = data;
    node->next = next;
    return node;
}
/* Insert the given data at the front of the list specified by a
** pointer to the head node; the list is left unchanged if the
** allocation fails
*/
void sl_insert_front(slnode** head, int data)
{
    slnode* node = sl_new_node(data, *head);
    if (node != NULL)
        *head = node;
}
Consider how new data is added to the list with sl_insert_front. You need to create a node that will hold the data and the pointer to the next node in the list. Where are you going to create it?
Maybe on the stack! - NO - where will that stack space be allocated? In which function? What happens to it when the function exits?
Maybe in static memory! - NO - you'll then have to know in advance how many list nodes you have because static memory is pre-allocated when the program loads.
On the heap? YES - because there you have all the required flexibility.
malloc is used in C to allocate stuff on the heap - memory space that can grow and shrink dynamically at runtime, and the ownership of which is completely under the programmer's control. There are many more examples where this is useful, but the one I'm showing here is a representative one. Eventually, in complex C programs you'll find that most of the program's data is on the heap, accessible through pointers. A correct program always knows which pointer "owns" the data and will carefully clean-up the allocated memory when it's no longer needed.
What if you don't know the size of the array when you write your program ?
As an example, imagine you want to load an image. At first you don't know its size, so you have to read the size from the file, allocate a buffer of that size, and then read the file into that buffer. Obviously you could not have used a statically sized array.
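A rough sketch of that pattern. To keep it self-contained, the "file" is a byte array already in memory, and the format is invented for the example: byte 0 is the width, byte 1 is the height, and width*height one-byte pixels follow. The function name load_pixels is hypothetical too.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical format: byte 0 = width, byte 1 = height, then width*height
   one-byte pixels. Returns a heap buffer holding the pixels, or NULL on error. */
unsigned char *load_pixels(const unsigned char *file, size_t file_len,
                           size_t *out_len)
{
    if (file_len < 2)
        return NULL;                       /* header is missing */
    size_t n = (size_t)file[0] * file[1];  /* size known only at run time */
    if (n == 0 || file_len - 2 < n)
        return NULL;                       /* empty or truncated image */
    unsigned char *pixels = malloc(n);
    if (pixels == NULL)
        return NULL;
    memcpy(pixels, file + 2, n);
    *out_len = n;
    return pixels;                         /* the caller must free() this */
}
```

The buffer size comes from the data itself, so it cannot be written into the source as an array dimension.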
EDIT:
Another point is: when you use dynamic allocation, memory is allocated on the heap, while local arrays are typically allocated on the stack. This is quite important when you are programming on an embedded device, as the stack can have a limited size compared to the heap.
I recommend that you google Stack and Heap.
int* heapArray = (int*)malloc(10 * sizeof(int));
int stackArray[10];
Both are very similar in the way you access the data. They are very different in the way that the data is stored behind the scenes. The heapArray is allocated on the heap and is only deallocated when the application dies or when free(heapArray) is called. The stackArray is allocated on the stack and is deallocated when the stack unwinds.
In the example you described int array[10] goes away when you leave your stack frame. If you would like the used memory to persist beyond local scope you have to use malloc();
Although you can do variable length arrays as of C99, there's still no decent substitute for the more dynamic data structures. A classic example is the linked list. To get an arbitrary size, you use malloc to allocate each node so that you can insert and delete without massive memory copying, as would be the case with a variable length array.
For example, an arbitrarily sized stack using a simple linked list:
#include <stdio.h>
#include <stdlib.h>
typedef struct sNode {
int payLoad;
struct sNode *next;
} tNode;
void stkPush (tNode **stk, int val) {
tNode *newNode = malloc (sizeof (tNode));
if (newNode == NULL) return;
newNode->payLoad = val;
newNode->next = *stk;
*stk = newNode;
}
int stkPop (tNode **stk) {
tNode *oldNode;
int val;
if (*stk == NULL) return 0;
oldNode = *stk;
*stk = oldNode->next;
val = oldNode->payLoad;
free (oldNode);
return val;
}
int main (void) {
tNode *top = NULL;
stkPush (&top, 42);
printf ("%d\n", stkPop (&top));
return 0;
}
Now, it's possible to do this with variable length arrays but, like writing an operating system in COBOL, there are better ways to do it.
malloc() is used whenever:
You need dynamic memory allocation
If you need to create an array of size n, where n is calculated during your program's execution, the usual way to do it is with malloc() (C99 variable-length arrays are the other option, but see the next point).
You need to allocate memory in heap
Variables defined in a function live only until the end of that function. So, if some "callstack-independent" data is needed, it must either be passed/returned as a function parameter (which is not always suitable) or stored on the heap. The way to store data on the heap is to use malloc() or one of its relatives (calloc(), realloc()). There are variable-size arrays, but they are allocated on the stack.