The dynamic array in C

Recently I have found it annoying to deal with arrays in C.
I have to call realloc() frequently to increase the size,
and there is no standard data structure like vector in C++ or ArrayList in Java.
I have learned that the Linux kernel has some data structures for this, such as kfifo,
which can be used through the kfifo_in() and kfifo_out() functions.
But this means the user defines a kfifo pointer to keep track of the array, and that variable carries no information about the type of the elements stored in the structure.
The user has to remember the element type whenever they use the dynamic array through the kfifo pointer.
I think that may be a little confusing.
Is there a better way to deal with this problem? What is the common solution in Linux C programming?

realloc is not that bad, as long as you do not spread it all over your code, and use a reasonable strategy to grow your dynamic array.
Rolling your own dynamic arrays in C is a matter of implementing a handful of easy functions. Numerous short articles walk you through this exercise - here is one for an example. The article defines a struct that represents your dynamic array, along with the currently used and the allocated size. It also provides functions for initializing, growing, and de-allocating the array represented by the structure. There is no explicit initialization function in the library - you initialize by passing NULL as the first parameter. This is a valid approach, but you could also opt for a more traditional separation of init and grow.
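As a rough illustration of the "handful of easy functions" mentioned above, here is a minimal sketch of a growable array of int. The names IntVec, intvec_init, intvec_push and intvec_free are hypothetical and not taken from the linked article; the doubling growth strategy and the starting capacity of 8 are assumptions, chosen only as one reasonable way to keep the number of realloc calls low.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type: a growable array of int with used/allocated sizes. */
typedef struct {
    int    *data;      /* heap buffer, NULL while nothing is allocated */
    size_t  used;      /* number of elements currently stored */
    size_t  capacity;  /* number of elements the buffer can hold */
} IntVec;

/* Puts the vector into a valid empty state. */
static void intvec_init(IntVec *v)
{
    v->data = NULL;
    v->used = 0;
    v->capacity = 0;
}

/* Appends value. Returns 0 on success, -1 if realloc fails
   (in which case the vector is left unchanged). */
static int intvec_push(IntVec *v, int value)
{
    if (v->used == v->capacity) {
        /* Assumed strategy: start at 8 elements, then double. */
        size_t new_cap = v->capacity ? v->capacity * 2 : 8;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->used++] = value;
    return 0;
}

/* Releases the buffer and resets the vector to the empty state. */
static void intvec_free(IntVec *v)
{
    free(v->data);
    intvec_init(v);
}
```

All calls to realloc live in intvec_push, so the rest of the program only ever sees init, push and free.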

I'd use GLib arrays. GLib is a very well-known library on Linux and other OSes, used in projects like GNOME.
There is no standard for dynamic arrays in C.

I mean they could define struct array{int len; int capacity; int each_element_size; void *data;} and copy the bytes of each element to the end of data. - bluesea Jun 29 at 3:04
This is already taken care of in the library under discussion. See the macros that it comes with and the examples in the main.c file. Depending on the macros being used, you would end up with either an array of pointers to the original data or an array of pointers to a copy of the data.
FWIW, I'm the author of the library, and I'll be the first to admit that it comes without airbags, so you have to be sure to use it safely (as with anything else in C).

Related

What is the best way to store integers with void pointers in C?

Hello, I am trying to learn and build data structures in C, and I want to store integers one after another in a stack.
My struct looks like this:
typedef struct STACK_NODE_s *STACK_NODE;
typedef struct STACK_NODE_s{
    STACK_NODE forward;
    void *storage;
} STACK_NODE_t;
typedef struct L_STACK_s{
    STACK_NODE top;
} L_STACK_t, *L_STACK;
In a while loop I want to read my chars and store them in integer form.
//assume that str is a proper string
//assume that we have a linked stack called LS
int i=0;
int tmp;
while(str[i]!='\0'){
    tmp=str[i]-'0';
    push(LS,(void *)&tmp); //pushes the address of the same variable every time
    i++;
}
I know this won't work properly, as we store the same variable's address over and over again.
Do I need to allocate an auxiliary array in order to store them one by one, or is there a better way to do this?
The answer must address two separate aspects of your question:
How to organize some collection of items, and where to get the memory from to do that.
First code snippet / Linked list format
The first code snippet is good the way it is.
It sets up a linked list, which has its pros and cons, but serves very well if you don't know the number of items in advance, if you want to be able to quickly remove or insert items somewhere in the middle of the list, and if you don't mind that looking up one certain entry inside the list costs you O(N) effort.
For a generic library-like implementation...
... void* is as good as it gets with ANSI C.
In C++, for example, you could make a template that leaves open the type that is stored in the list (or better yet, you would directly reuse the well-known standard library implementation, std::forward_list<int>).
Sadly, ANSI C doesn't have something comparable.
One solution is the one you picked: create int objects and hook their addresses into your list of void*.
Another solution for a generic library implementation is to use a preprocessor macro for the type, and to define this macro before including a header file that holds the generic implementation. This tries to resemble the clean C++ solution, but with the preprocessor it is not type-safe, so this approach is far from beautiful and comes with several risks.
Second code snippet / Memory allocation
Creating the list with void* instead of int (or whatever non-pointer type) requires you to allocate further memory beside the list.
I.e., you have to allocate not only every list item (= variable of type STACK_NODE_t) but also the actual entry value (e.g., *(int*)(LS->top->storage)).
This means you have to allocate and deallocate the data in some other way, so that it stays valid for as long as it is stored in the list.
On most systems, you can use malloc/free for that, and you only have to take into account the size of the heap available for malloc and the time de-/allocating takes.
If the list has to meet real-time requirements or runs on an embedded system, you may not have malloc or you may not be allowed to use it.
Then you have to allocate and implement your own heap (= memory pool of storage items) for your list.
How to implement such a memory pool with the desired properties is a separate question that would take us too far here.
In any case, you must not use the pointer to a stack variable (like a local variable inside a function) because the memory "behind" that variable will not be reserved for this purpose once the function exits, and the memory may be used for something different in the meantime.
This is, however, apparently what the second code snippet does.
As you noticed yourself, taking this path...
we store the same variable's address over and over again.
Reusing the memory position for another entry of the same list is an extreme case of the risk explained above.
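To make the malloc-based option concrete, here is a minimal sketch in which every pushed integer gets its own heap allocation, so each node's void* points at distinct, long-lived storage. The type and function names (Stack, StackNode, push, pop, push_digits) are hypothetical simplifications of the structs in the question, and the ownership rule (whoever pops an item frees it) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical versions of the question's node and stack. */
typedef struct StackNode {
    struct StackNode *forward;
    void *storage;
} StackNode;

typedef struct {
    StackNode *top;
} Stack;

/* Pushes a pointer. The stack stores the pointer only; it does not
   copy what the pointer refers to. Returns 0 on success, -1 on failure. */
static int push(Stack *s, void *item)
{
    StackNode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->storage = item;
    n->forward = s->top;
    s->top = n;
    return 0;
}

/* Pops the top pointer, or returns NULL if the stack is empty.
   Assumed rule: the caller now owns the item and must free it. */
static void *pop(Stack *s)
{
    StackNode *n = s->top;
    if (n == NULL)
        return NULL;
    void *item = n->storage;
    s->top = n->forward;
    free(n);
    return item;
}

/* Pushes each character of str, minus '0', as a separately allocated int. */
static int push_digits(Stack *s, const char *str)
{
    for (size_t i = 0; str[i] != '\0'; i++) {
        int *value = malloc(sizeof *value);  /* fresh storage per entry */
        if (value == NULL)
            return -1;
        *value = str[i] - '0';
        if (push(s, value) != 0) {
            free(value);
            return -1;
        }
    }
    return 0;
}
```

Because each int lives in its own allocation, popping "123" yields 3, 2, 1 rather than three copies of the last value.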
I solved the problem using an auxiliary array, as I anticipated. If someone comes up with a better solution, it is more than welcome.

Dynamically creating objects in C

Hello, guys!
I'm familiar with JavaScript and PHP, but new to C.
I am trying to play around with graphics in C and create a collision algorithm. Now I need to create objects dynamically, just like in more advanced languages. For example, I need to create a polygon via my own function and make it an object that is visible to the whole program. I assume a struct would be needed.
As far as I know, everything declared in a function stays in a function. How can I dynamically declare global structs?
C is a fairly static language. By static I mean, you can create memory during run-time, but you will need pointers to address that memory declared at compile time. That is if you are going to need memory during run-time and do not want to declare it during compile time, you will need to use malloc and free (when you've finished with the memory).
To create a global structure whose memory you would create at run time, you would minimally need a pointer to a structure at compile time. If you need several of those structures, you could create several structures' worth of memory, but traversing the structures would be tedious without having an array of those structures. You would need that array of pointers to structures at compile time. There are some ways to make this more dynamic, but in the decade or so that I used C and C++, we never ran into those other ways, including in device drivers.
When you say create objects in C, you really have no objects you can create other than those created by a function call to a library or creating memory from the heap, and then interpreting that memory by overlapping structure or array pointers over it.
Functions can alter parameters if those parameters are passed in by reference (a pointer to the parameter), and functions can return nothing or return a single value, such as a char, an integer, or a pointer (or a whole struct by value).
a. A function can return a value.
b. You can use global variables.
c. (Probably the most useful.) Dynamically allocate memory (using malloc etc.) and return a pointer to it. (And remember to free it after use.)
You need to have a struct or a more complex abstract data type (ADT) to hold your dynamically created variables. Once you have this, you can create any object you want via malloc() and store it in there.
As I mentioned earlier, it would be highly recommended to have a look at the ADTs and learn how to work with them. This will allow you to create any complex data structure like queues or linked lists in order to work a little more OOP oriented.
Declare global pointers (or an array of pointers) of the same type as the structure. Use functions like malloc to dynamically allocate memory and assign it to the pointers.
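The advice in these answers (a pointer declared at compile time, memory obtained at run time, and a function that returns the pointer) could be sketched roughly as follows. Polygon, Vertex, polygon_create, polygon_destroy and g_player_shape are hypothetical names invented for this illustration, and treating a zero vertex count as an error is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double x, y;
} Vertex;

/* Hypothetical polygon type: a vertex count plus a heap array of vertices. */
typedef struct {
    size_t  count;
    Vertex *vertices;
} Polygon;

/* Declared at file scope, so every function in this file can see it.
   The memory it points to is only created at run time. */
static Polygon *g_player_shape = NULL;

/* Allocates a polygon with count vertices, all set to (0, 0).
   Returns NULL if count is 0 or if an allocation fails. */
static Polygon *polygon_create(size_t count)
{
    if (count == 0)
        return NULL;
    Polygon *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->vertices = malloc(count * sizeof *p->vertices);
    if (p->vertices == NULL) {
        free(p);
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        p->vertices[i].x = 0.0;
        p->vertices[i].y = 0.0;
    }
    p->count = count;
    return p;
}

/* Frees a polygon made by polygon_create; NULL is accepted and ignored. */
static void polygon_destroy(Polygon *p)
{
    if (p != NULL) {
        free(p->vertices);
        free(p);
    }
}
```

A caller would write something like g_player_shape = polygon_create(3); once at startup and call polygon_destroy(g_player_shape); when the shape is no longer needed.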

C Language: Why does malloc() return a pointer, and not the value?

From my understanding of C it seems that you are supposed to use malloc(size) whenever you are trying to initialize, for instance, an array whose size you do not know of until runtime.
But I was wondering why the function malloc() returns a pointer to the location of the variable and why you even need that.
Basically, why doesn't C just hide it all from you, so that whenever you do something like this:
// 'n' gets stdin'ed from the user
...
int someArray[n];
for(int i = 0; i < n; i++)
someArray[i] = 5;
you can do it without ever having to call malloc() or some other function? Do other languages do it like this (by hiding the memory properties/location altogether)? I feel that, for a beginner, this whole process of dealing with the memory locations of the variables you use just confuses programmers (and since other languages don't require it, C seems to make a simple initialization like this overly complicated)...
Basically, what I'm trying to ask is why malloc() is even necessary: why doesn't the language take care of all that for you internally, without the programmer having to be concerned about or having to see memory? Thanks
*edit: OK, maybe there are some versions of C that I'm not aware of that allow you to forgo the use of malloc(), but let's try to ignore that for now...
C lets you manage every little bit of your program. You can manage when memory gets allocated; you can manage when it gets deallocated; you can manage how to grow a small allocation, etc.
If you prefer not to manage that and let the compiler do it for you, use another language.
Actually, C99 allows this (so you're not the only one thinking of it). The feature is called a VLA (variable-length array).
It's legal to read an int and then have an array of that size:
int n;
scanf("%d", &n);
int array[n];
Of course there are limitations, since malloc uses the heap and VLAs typically live on the stack (so VLAs can't be as big as malloced objects).
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore that for now...
So we can concentrate on the flame?
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory.
The very point of malloc(), its raison d'être, its function, if you will, is to allocate a block of memory. The way we refer to a block of memory in C is by its starting address, which is by definition a pointer.
C is close to 40 years old, and it's not nearly as "high level" as some more modern languages. Some languages, like Java, attempt to prevent mistakes and simplify programming by hiding pointers and explicit memory management from the programmer. C is not like that. Why? Because it just isn't.
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
One of the hallmarks of C is its simplicity (C compilers are relatively easy to implement); one way of making a language simple is to force the programmer to do all his own memory management. Clearly, other languages do manage objects on the heap for you - Java and C# are modern examples, but the concept isn't new at all; Lisp implementations have been doing it for decades. But that convenience comes at a cost in both compiler complexity and runtime performance.
The Java/C# approach helps eliminate whole classes of memory-management bugs endemic to C (memory leaks, invalid pointer dereferences, etc.). By the same token, C provides a level of control over memory management that allows the programmer to achieve high levels of performance that would be difficult (not impossible) to match in other languages.
If the only purpose of dynamic allocation were to allocate variable-length arrays, then malloc() might not be necessary. (But note that malloc() was around long before variable-length arrays were added to the language.)
But the size of a VLA is fixed (at run time) when the object is created. It can't be resized, and it's deallocated only when you leave the scope in which it's declared. (And VLAs, unlike malloc(), don't have a mechanism for reporting allocation failures.)
malloc() gives you a lot more flexibility.
Consider creating a linked list. Each node is a structure, containing some data and a pointer to the next node in the list. You might know the size of each node in advance, but you don't know how many nodes to allocate. For example, you might read lines from a text file, creating and appending a new node for each line.
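The linked-list case described above might look like the following minimal sketch, where each appended line gets its own node and its own copy of the text. LineNode, LineList, linelist_append and linelist_free are hypothetical names; reading the lines from a file is left out, so the caller is assumed to pass in each line as a string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node per line; the node owns a heap copy of the text. */
typedef struct LineNode {
    struct LineNode *next;
    char *text;
} LineNode;

typedef struct {
    LineNode *head;
    LineNode *tail;
    size_t count;
} LineList;

/* Appends a copy of text to the end of the list.
   Returns 0 on success, -1 if an allocation fails. */
static int linelist_append(LineList *list, const char *text)
{
    LineNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    size_t len = strlen(text) + 1;      /* include the terminating '\0' */
    node->text = malloc(len);
    if (node->text == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->text, text, len);
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
    list->count++;
    return 0;
}

/* Frees every node and its text, leaving an empty list. */
static void linelist_free(LineList *list)
{
    LineNode *n = list->head;
    while (n != NULL) {
        LineNode *next = n->next;
        free(n->text);
        free(n);
        n = next;
    }
    list->head = NULL;
    list->tail = NULL;
    list->count = 0;
}
```

The number of nodes is decided entirely at run time, one malloc per line, which is something a VLA cannot express.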
You can also use malloc() along with realloc() to create a buffer (say, an array of unsigned char) whose size can be changed after you created it.
Yes, there are languages that don't expose pointers, and that handle memory management for you.
A lot of them are implemented in C.
Maybe the question should be "why do you need something like int array[n] when you can use pointers?"
After all, pointers allow you to keep an object alive beyond the scope it was created in, you can use pointer to slice and dice arrays (for example strchr() returns a pointer to a string), pointers are light-weight objects, so it's cheap to pass them to functions and return them from functions, etc.
But the real answer is "that's how it is". Other options are possible, and the proof is that there are other languages that do other things (and even C99 allows different things).
C is regarded as a highly developed low-level language. malloc is mainly used for dynamic arrays, which are a key component of stacks and queues. Languages that hide pointers from the developer are not as well suited to hardware-related programming.
The short answer to your question is to ponder this question: What if you also need to control exactly when the memory is de-allocated?
C is a compiled language, not an interpreted one. If you don't know n at compile time, how is the compiler supposed to produce a binary?

Is there any case for which returning a structure directly is good practice?

IMO all code that returns structure directly can be modified to return pointer to structure.
When is returning a structure directly a good practice?
Modified how? Returning a pointer to a static instance of the structure within the function, thus making the function non-reentrant; or by returning a pointer to a heap allocated structure that the caller has to make sure to free and do so appropriately? I would consider returning a structure being the good practice in the general case.
The biggest advantage to returning a complete structure instead of a pointer is that you don't have to mess with pointers. By avoiding the risks inherent with pointers, especially if you're allocating and freeing your own memory, both coding and debugging can be significantly simplified.
In many cases, the advantages of passing the structure directly outweigh the downsides (time/memory) of copying the entire structure to the stack. Unless you know that optimization is necessary, no reason not to take the easier path.
I see the following cases as the ones I would most commonly opt for the passing structs directly approach:
"Functional programming" style code. Lots of stuff is passed around and having pointers would complicate the code a lot (and that is not even counting if you need to start using malloc+free)
Small structs, like for example
struct Point{ int x, y; };
aren't worth the trouble of passing stuff around by reference.
And lastly, let's not forget that pass-by-value and pass-by-reference are actually very different, so some classes of programs will be more suited to one style and will end up looking ugly if the other style is used instead.
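For the small-struct case in particular, a minimal sketch of passing and returning by value could look like this. The struct Point is the one from the answer above; point_add is a hypothetical helper added only for illustration.

```c
#include <assert.h>

struct Point { int x, y; };

/* Takes both points by value and returns a new one by value.
   No pointers, no malloc/free, and the arguments are never modified. */
static struct Point point_add(struct Point a, struct Point b)
{
    struct Point sum = { a.x + b.x, a.y + b.y };
    return sum;
}
```

A call such as struct Point r = point_add(p, q); leaves p and q untouched, which is what makes this style easy to reason about.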
These other answers are good, but I think missingno comes closest to "answering the question" by mentioning small structs. To be more concrete, if the struct itself is only a few machine words long, then both the "space" objection and the "time" objection are overcome. If a pointer is one word, and the struct is two words, how much slower is the struct copy operation vs the pointer copy? On a cached architecture, I suspect the answer is "none at all". And as for space, 2 words on stack < 1 word on stack + 2 words (+overhead) on heap.
But these considerations are only appropriate for specific cases: THIS portion of THIS program on THIS architecture.
For the level of writing C programs, you should use whichever is easier to read.
If you're trying to make your function side-effect free, returning a struct directly would help, because it would effectively be pass-by-value. Is it more efficient? No, passing by reference is quicker. But having no side effects can really simplify working with threads (a notoriously difficult task).
There are a few cases where returning a structure by value is contra-indicated:
1) A library function that returns 'token' data that is to be re-used later in other calls, eg. a file or socket stream descriptor. Returning a complete structure would break encapsulation of the library.
2) Structs containing data buffers of variable length where the struct has been sized to accommodate the absolute maximum size of the data but where the average data size is much less, eg. a network buffer struct that has a 'dataLen' int and a 'char data[65536]' at its end.
3) Large structs of any typedef where the cost of copying the data becomes significant, eg:
a) When the struct has to be returned through several function calls - multiple copying of the same data.
b) Where the struct is subsequently queued off to other threads - wide queues means longer lock times during the copy-in/copy-out and so increased chance of contention. That, and the size of the struct is inflicted on both producer and consumer thread stacks.
c) Where the struct is often moved around between layers, eg. protocol stack.
4) Where structs of varying definitions are to be stored in any array/list/queue/stack/whateverContainer.
I suspect that I am so corrupted by C++ and other OO languages that I tend to malloc/new almost anything that cannot be stored in a native type.
Rgds,
Martin

Is it a best practice to wrap arrays and their length variable in a struct in C?

I will begin to use C for an Operating Systems course soon and I'm reading up on best practices on using C so that headaches are reduced later on.
This has always been one of my first questions regarding arrays, as they are easy to screw up.
Is it a common practice out there to bundle an array and the associated variable containing its length in a struct?
I've never seen it in books; they usually keep the two separate or use something like the sizeof(array[]/array[1]) kind of deal.
But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length.
I am beginning to use C so the above could be horribly wrong, I am still a student.
Cheers,
Kai.
Yes this is a great practice to have in C. It's completely logical to wrap related values into a containing structure.
I would go even further. It would also serve your purpose to never modify these values directly. Instead, write functions that act on these values as a pair inside your struct to change the length and alter the data. That way you can add invariant checks and also make it very easy to test.
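One way to picture this advice is a small fixed-capacity buffer whose fields are only touched through functions that check the invariant. This is a minimal sketch under assumptions: IntBuf, INTBUF_CAPACITY, intbuf_append and intbuf_get are hypothetical names, and the capacity of 16 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INTBUF_CAPACITY 16   /* arbitrary size chosen for the sketch */

/* The array and its length travel together. */
typedef struct {
    int    items[INTBUF_CAPACITY];
    size_t len;   /* invariant: len <= INTBUF_CAPACITY */
} IntBuf;

/* Appends value; returns false (and changes nothing) when the buffer is full. */
static bool intbuf_append(IntBuf *b, int value)
{
    assert(b->len <= INTBUF_CAPACITY);   /* invariant check */
    if (b->len == INTBUF_CAPACITY)
        return false;
    b->items[b->len++] = value;
    return true;
}

/* Reads element index into *out; returns false (and leaves *out alone)
   when index is out of range. */
static bool intbuf_get(const IntBuf *b, size_t index, int *out)
{
    assert(b->len <= INTBUF_CAPACITY);   /* invariant check */
    if (index >= b->len)
        return false;
    *out = b->items[index];
    return true;
}
```

Since callers never write to items or len directly, a bounds error shows up as a false return value instead of a silent overrun.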
Sure, you can do that. Not sure if I'd call it a best practice, but it's certainly a good idea to make C's rather rudimentary arrays a bit more manageable. If you need dynamic arrays, it's almost a requirement to group the various fields needed to do the bookkeeping together.
Sometimes you have two sizes in that case: one current and one allocated. This is a tradeoff: you get fewer allocations, and so more speed, and pay for it with a bit of memory overhead.
Many times arrays are only used locally, and are of static size, which is why the sizeof operator is so handy to determine the number of elements. Your syntax is slightly off with that, by the way, here's how it usually looks:
int array[4711];
size_t i; /* sizeof yields a size_t, so use an unsigned index */
for(i = 0; i < sizeof array / sizeof *array; i++)
{
    /* Do stuff with each element. */
}
Remember that sizeof is an operator, not a function; the parentheses are needed only when the operand is a type name.
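The sizeof idiom above is often wrapped in a macro. Here is a minimal sketch; ARRAY_LEN and sum_table are hypothetical names, and the macro only works where the operand is a real array, not a pointer that an array has decayed to.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array. Do not use it on a pointer:
   the division would then use the pointer's size, not the array's. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Sums a local table without writing its length anywhere by hand. */
static long sum_table(void)
{
    static const int table[] = { 2, 3, 5, 7, 11 };
    long total = 0;
    for (size_t i = 0; i < ARRAY_LEN(table); i++)
        total += table[i];
    return total;
}
```

If an element is later added to table, the loop bound follows automatically.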
EDIT: One real-world example of a wrapping exactly as that which you describe is the GArray type provided by glib. The user-visible part of the declaration is exactly what you describe:
typedef struct {
    gchar *data;
    guint len;
} GArray;
Programs are expected to use the provided API to access the array whenever possible, not poke these fields directly.
There are three ways.
For a static array (not dynamically allocated and not passed as a pointer), the size is known at compile time, so you can use the sizeof operator, like this: sizeof(array)/sizeof(array[0])
Use a terminator (a special value for the last array element that cannot be used as a regular array value), as in null-terminated strings
Use a separate value, either as a struct member or as an independent variable. It doesn't really matter, because all the standard functions that work with arrays take a separate size variable; however, joining the array pointer and the size into one struct will increase code readability. I suggest using it to get a cleaner interface for your own functions. Please note that if you pass your struct by value, the called function will be able to change the array contents, but not the caller's size variable, so passing a pointer to the struct would be a better option.
For a public API I'd go with the array and the size value separated. That's how it is handled in most (if not all) C libraries I know. How you handle it internally is completely up to you. So using a structure plus some helper functions/macros that do the tricky parts for you is a good idea. It always gives me a headache to rethink how to insert or remove an item, so that is a good source of bugs. Solving it once, generically, helps you keep the bug count low from the beginning. A nice implementation of dynamic, generic arrays is kvec.
I'd say it's a good practice. In fact, it's good enough that in C++ they've put it into the standard library and called it vector. Whenever you talk about arrays in a C++ forum, you'll get inundated with responses that say to use vector instead.
I don't see anything wrong with doing that but I think the reason that that is not usually done is because of the overhead incurred by such a structure. Most C is bare-metal code for performance reasons and as such, abstractions are often avoided.
I haven't seen it done in books much either, but I've been doing the same thing for a while now. It just seems to make sense to "package" those things together. I find it especially useful if you need to return an allocated array from a function, for instance.
If you use static arrays, you have access to the size of the array through the sizeof operator. If you put the array into a struct, you can pass it to a function by value, by reference, or by pointer. (C itself has no reference parameters; that distinction comes from C++.) Passing an argument by reference and by pointer is the same at the assembly level (I'm almost sure of it).
But if you use dynamic arrays, you don't know the size of the array at compile time. So you can store this value in a struct, but you will also store only a pointer to the array in the structure:
struct Foo {
int *myarray;
int size;
};
So you can pass this structure by value, but what you really do is pass a pointer to int (the pointer to the array) and an int (the size of the array).
In my opinion it won't help you much. The only plus is that you store the size and the array in one place, so it is easy to get the size of the array. If you use a lot of dynamic arrays, you can do it this way. But if you use only a few arrays, it will be easier not to use structures.
I've never seen it done that way, but I haven't done OS level work in over a decade... :-) Seems like a reasonable approach at first glance. Only concern would be to make sure that the size somehow stays accurate... Calculating as needed doesn't have that concern.
Considering that you can calculate the length of the array (in bytes, that is, not the number of elements) and that the compiler will replace the sizeof expressions with the actual value (it's not a function; uses of it are replaced by the compiler with the value it 'returns'), the only reason you'd want to wrap it in a struct is readability.
It isn't a common practice, if you did it, someone looking at your code would assume the length field was some special value, and not just the array size. That's a danger lurking in your proposal, so you'd have to be careful to comment it properly.
I think this part of your question is backwards:
"But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length."
Using pointers when passing arrays around is the default behavior; that doesn't buy you anything in terms of passing the entire array by value. If you want to copy the entire array instead of having it decay to a pointer you NEED to wrap the array in a struct. See this answer for more information:
Pass array by value to recursive function possible?
And here is more on the special behavior of arrays:
http://denniskubes.com/2012/08/20/is-c-pass-by-value-or-reference/
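The difference between a decayed array and an array wrapped in a struct can be shown with a small sketch. Quad, sum_after_zeroing_first and zero_first are hypothetical names made up for this illustration.

```c
#include <assert.h>

/* Wrapping the array in a struct lets the whole array be copied. */
typedef struct {
    int values[4];
} Quad;

/* Receives a copy of the struct, array included.
   The write to q.values[0] never reaches the caller's data. */
static int sum_after_zeroing_first(Quad q)
{
    q.values[0] = 0;
    return q.values[0] + q.values[1] + q.values[2] + q.values[3];
}

/* Receives a pointer to the first element, which is what a bare
   array argument decays to. The write does reach the caller's data. */
static void zero_first(int *values)
{
    values[0] = 0;
}
```

Given Quad q = { { 1, 2, 3, 4 } };, calling sum_after_zeroing_first(q) leaves q.values[0] at 1, whereas zero_first(q.values) sets it to 0.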

Resources