I've been looking into using C over C++ as I find it cleaner and the main thing I find it to lack is a vector like array.
What is the best implementation of this?
I want to just be able to call something like vector_create, vector_at, vector_add, etc.
EDIT
This answer is from a million years ago, but at some point, I actually implemented a macro-based, efficient, type-safe vector work-alike in C that covers all the typical features and needs. You can find it here:
https://github.com/eteran/c-vector
Original answer below.
What about a vector are you looking to replicate? I mean in the end, it all boils down to something like this:
int *create_vector(size_t n) {
return malloc(n * sizeof(int));
}
void delete_vector(int *v) {
free(v);
}
int *resize_vector(int *v, size_t n) {
return realloc(v, n * sizeof(int));
/* returns NULL on failure here */
}
You could wrap this all up in a struct, so it "knows its size" too, but you'd have to do it for every type (macros here?), but that seems a little uneccessary... Perhaps something like this:
typedef struct {
size_t size;
int *data;
} int_vector;
int_vector *create_vector(size_t n) {
int_vector *p = malloc(sizeof(int_vector));
if(p) {
p->data = malloc(n * sizeof(int));
p->size = n;
}
return p;
}
void delete_vector(int_vector *v) {
if(v) {
free(v->data);
free(v);
}
}
size_t resize_vector(int_vector *v, size_t n) {
if(v) {
int *p = realloc(v->data, n * sizeof(int));
if(p) {
v->data = p;
v->size = n;
}
return v->size;
}
return 0;
}
int get_vector(int_vector *v, size_t n) {
if(v && n < v->size) {
return v->data[n];
}
/* return some error value, i'm doing -1 here,
* std::vector would throw an exception if using at()
* or have UB if using [] */
return -1;
}
void set_vector(int_vector *v, size_t n, int x) {
if(v) {
if(n >= v->size) {
resize_vector(v, n);
}
v->data[n] = x;
}
}
After which, you could do:
int_vector *v = create_vector(10);
set_vector(v, 0, 123);
I dunno, it just doesn't seem worth the effort.
The most complete effort I know of to create a comprehensive set of utility types in C is GLib. For your specific needs it provides g_array_new, g_array_append_val and so on. See GLib Array Documentation.
Rather than going off on a tangent in the comments to #EvanTeran's answer I figured I'd submit a longer reply here.
As various comments allude to there's really not much point in trying to replicate the exact behavior of std::vector since C lacks templates and RAII.
What can however be useful is a dynamic array implementation that just works with bytes. This can obviously be used directly for char* strings, but can also easily be adapted for usage with any other types as long as you're careful to multiply the size parameter by sizeof(the_type).
Apache Portable Runtime has a decent set of array functions and is all C.
See the tutorial for a quick intro.
If you can multiply, there's really no need for a vector_create() function when you have malloc() or even calloc(). You just have to keep track of two values, the pointer and the allocated size, and send two values instead of one to whatever function you pass the "vector" to (if the function actually needs both the pointer and the size, that is). malloc() guarantees that the memory chunk is addressable as any type, so assign it's void * return value to e.g. a struct car * and index it with []. Most processors access array[index] almost as fast as variable, while a vector_at() function can be many times slower. If you store the pointer and size together in a struct, only do it in non time-critical code, or you'll have to index with vector.ptr[index]. Delete the space with free().
Focus on writing a good wrapper around realloc() instead, that only reallocates on every power of e.g. 2 or 1.5. See user786653's Wikipedia link.
Of course, calloc(), malloc() and realloc() can fail if you run out memory, and that's another possible reason for wanting a vector type. C++ has exceptions that automatically terminate the program if you don't catch it, C doesn't. But that's another discussion.
Lack of template functionality in C makes it impossible to support a vector like structure. The best you can do is to define a 'generic' structure with some help of the preprocessor, and then 'instantiate' for each of the types you want to support.
Related
In C++ we have structures which have a constructor and the destructor. It makes life much easier especially when it going to have the pointers, therefore dynamically allocated memory in the structure. You can even use std::shared_pointer library to deal with the pointers.
class A{
private:
int size;
double* stack;
public:
A(int size) : this->size(size){}
~A(){free(stack);}
};
But my maths lecturer doesn't like C++ and prefers everything in C. So I had to use C instead and came up with the following structure:
typedef struct vectorOfDoubles{
double* stack;
int size;
} vector;
I made up function that calculate the median of the vector of doubles.
double median_(const vector* v) {
vector n; // creates vector class object
n.stack = (double*)malloc(sizeof(double)*(n.size = v->size)); // takes double ptr and allocates the right amount of memory for it
memcpy(n.stack, v->stack, sizeof(double)*n.size); // copies the array of doubles.
sort(&n); // sorts the array of doubles
if(v->size%2) // checks for odd size
return n.stack[(v->size/2+1)]; // return median for odd size
else
return (n.stack[(v->size/2)]+n.stack[(v->size/2+1)])/2; // return median for even size
}
As an example of the bad practices I didn't free the memory. When the function returns its value it destructs the local variables and structures. But my structure has a pointer that holds the allocated memory. Unlikely after some research on the internet I didn't find any good destruction method solution for these situations.
My question is how the old-school C programmers dealt with those situations when they want to free the pointers in the structure but they do not have the destructor for structure that would execute itself to do a certain job?
A simple pointer needs a *alloc() and lastly a free().
A structure with dynamic fields deserves a crafted vector_alloc() and vector_free().
An old school flavored result:
// Return non-0 on error
int vector_alloc(vector *ptr, size_t size) {
assert(ptr);
ptr->stack = calloc(size, sizeof *(ptr->stack));
if (ptr->stack) {
ptr->size = size;
return 0;
}
ptr->size = 0;
return 1;
}
void vector_free(vector *ptr) {
assert(ptr);
free(ptr->stack);
ptr->stack = NULL;
ptr->size = 0;
}
double median_(const vector* v) {
vector n;
if (vector_alloc(&n, v->size)) return 0.0/0.0;
memcpy(n, v->stack, sizeof *n->stack *n.size);
sort(&n);
double y;
if(v->size%2)
y = n.stack[(v->size/2+1)];
else
y = (n.stack[(v->size/2)]+n.stack[(v->size/2+1)])/2;
vector_free(&n);
return y;
}
If you want to free an object that uses some kind of hierarchical storage (i.e., internal pointers to other object) in C, I usually write a free function specific to that object. For instance:
void free_my_obj(my_obj_t *my_obj)
{
free(my_obj->a);
free_my_obj2(my_obj->my_obj2);
free(my_obj);
}
At beginning: n.stack=…malloc();
At end: free(n.stack);
Every malloc() near beginning of block, should have a matching free(); near end of block.
is it possible to 'dynamically' allocate file pointers in C?
What I mean is this :
FILE **fptr;
fptr = (FILE **)calloc(n, sizeof(FILE*));
where n is an integer value.
I need an array of pointer values, but I don't know how many before I get a user-input, so I can't hard-code it in.
Any help would be wonderful!
You're trying to implement what's sometimes called a flexible array (or flex array), that is, an array that changes size dynamically over the life of the program.) Such an entity doesn't exist among in C's native type system, so you have to implement it yourself. In the following, I'll assume that T is the type of element in the array, since the idea doesn't have anything to do with any specific type of content. (In your case, T is FILE *.)
More or less, you want a struct that looks like this:
struct flexarray {
T *array;
int size;
}
and a family of functions to initialize and manipulate this structure. First, let's look at the basic accessors:
T fa_get(struct flexarray *fa, int i) { return fa->array[i]; }
void fa_set(struct flexarray *fa, int i, T p) { fa->array[i] = p; }
int fa_size(struct flexarray *fa) { return fa->size; }
Note that in the interests of brevity these functions don't do any error checking. In real life, you should add bounds-checking to fa_get and fa_set. These functions assume that the flexarray is already initialized, but don't show how to do that:
void fa_init(struct flexarray *fa) {
fa->array = NULL;
fa->size = 0;
}
Note that this starts out the flexarray as empty. It's common to make such an initializer create an array of a fixed minimum size, but starting at size zero makes sure you exercise your array growth code (shown below) and costs almost nothing in most practical circumstances.
And finally, how do you make a flexarray bigger? It's actually very simple:
void fa_grow(struct flexarray *fa) {
int newsize = (fa->size + 1) * 2;
T *newarray = malloc(newsize * sizeof(T));
if (!newarray) {
// handle error
return;
}
memcpy(newaray, fa->array, fa->size * sizeof(T));
free(fa->array);
fa->array = newarray;
fa->size = newsize;
}
Note that the new elements in the flexarray are uninitialized, so you should arrange to store something to each new index i before fetching from it.
Growing flexarrays by some constant multiplier each time is generally speaking a good idea. If instead you increase it's size by a constant increment, you spend quadratic time copying elements of the array around.
I haven't showed the code to shrink an array, but it's very similar to the growth code,
Any way it's just pointers so you can allocate memory for them
but don't forget to fclose() each file pointer and then free() the memory
I'm writing a function which increases the size of a dynamic memory object created with malloc. The function should as arguments take a pointer to the memory block to be increased, the current size of the block and the amount the block is going to be increased.
Something like this:
int getMoreSpace(void **pnt, int size, int add) {
xxxxxx *tmp; /* a pointer to the same as pnt */
if (tmp = realloc(pnt, (size+add)*sizeof(xxxxxx))) { /* get size of what pnt points to */
*pnt=tmp;
return 1;
else return 0;
}
The problem is that I want the function to work no matter what pnt points to. How do I achieve that?
This type of function cannot possibly work, because pnt is local and the new pointer is lost as soon as the function returns. You could take an argument of type xxxxxx ** so that you could update the pointer, but then you're stuck with only supporting a single type.
The real problem is that you're writing an unnecessary and harmful wrapper for realloc. Simply use realloc directly as it was meant to be used. There is no way to make it simpler or more efficient by wrapping it; it's already as simple as possible.
You pass in the size as an argument. You can use a convenience macro to make it look the same as your function:
#define getMoreSpace(P, SZ, ADD) getMoreSpaceFunc(&(P), sizeof(*(P)), (SZ), (ADD))
int getMoreSpace(void **pnt, size_t elem_size, int size, int add) {
*pnt = ...
}
Edit to show that your convenience macro would also need to add call-by-reference semantics.
Pass in the element size as a separate parameter:
int getMoreSpace(void **pnt, int size, int add, size_t eltSize)
{
void *tmp;
if (tmp = realloc(pnt, (size+add)*eltSize))
{
*pnt=tmp;
return 1;
}
else
return 0;
}
...
int *p = malloc(100 * sizeof *p);
...
if (!getMoreSpace(&p, 100, 20, sizeof *p))
{
// panic
}
It's the most straightforward solution, if not the most elegant. C just doesn't lend itself to dynamic type information.
Edit
Changed the type of pnt in response to Steve's comment.
As caf points out, this won't work, even with the "fix" per Steve. R.'s right; don't do this.
I have a dynamic array of structures, so I thought I could store the information about the array in the first structure.
So one attribute will represent the amount of memory allocated for the array and another one representing number of the structures actually stored in the array.
The trouble is, that when I put it inside a function that fills it with these structures and tries to allocate more memory if needed, the original array gets somehow distorted.
Can someone explain why is this and how to get past it?
Here is my code
#define INIT 3
typedef struct point{
int x;
int y;
int c;
int d;
}Point;
Point empty(){
Point p;
p.x=1;
p.y=10;
p.c=100;
p.d=1000; //if you put different values it will act differently - weird
return p;
}
void printArray(Point * r){
int i;
int total = r[0].y+1;
for(i=0;i<total;i++){
printf("%2d | P [%2d,%2d][%4d,%4d]\n",i,r[i].x,r[i].y,r[i].c,r[i].d);
}
}
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
}
void enter(Point* r,int c){
int i;
for(i=1;i<c;i++){
r[r[0].y+1]=empty();
r[0].y++;
if( (r[0].y+2) >= r[0].x ){ /*when the amount of Points is near
*the end of allocated memory.
reallocate the array*/
reallocFunction(r);
}
}
}
int main(int argc, char** argv) {
Point * r=(Point *) malloc ( sizeof ( Point ) * INIT );
r[0]=empty();
r[0].x=INIT; /*so here I store for how many "Points" is there memory
//in r[0].y theres how many Points there are.*/
enter(r,5);
printArray(r);
return (0);
}
Your code does not look clean to me for other reasons, but...
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
}
The problem here is that r in this function is the parameter, hence any modifications to it are lost when the function returns. You need some way to change the caller's version of r. I suggest:
Point * // Note new return type...
reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
return r; // Note: now we return r back to the caller..
}
Then later:
r = reallocFunction(r);
Now... Another thing to consider is that realloc can fail. A common pattern for realloc that accounts for this is:
Point *reallocFunction(Point * r){
void *new_buffer = realloc(r, r[0].x*2*sizeof(Point));
if (!new_buffer)
{
// realloc failed, pass the error up to the caller..
return NULL;
}
r = new_buffer;
r[0].x*=2;
r[0].y++;
return r;
}
This ensures that you don't leak r when the memory allocation fails, and the caller then has to decide what happens when your function returns NULL...
But, some other things I'd point out about this code (I don't mean to sound like I'm nitpicking about things and trying to tear them apart; this is meant as constructive design feedback):
The names of variables and members don't make it very clear what you're doing.
You've got a lot of magic constants. There's no explanation for what they mean or why they exist.
reallocFunction doesn't seem to really make sense. Perhaps the name and interface can be clearer. When do you need to realloc? Why do you double the X member? Why do you increment Y? Can the caller make these decisions instead? I would make that clearer.
Similarly it's not clear what enter() is supposed to be doing. Maybe the names could be clearer.
It's a good thing to do your allocations and manipulation of member variables in a consistent place, so it's easy to spot (and later, potentially change) how you're supposed to create, destroy and manipulate one of these objects. Here it seems in particular like main() has a lot of knowledge of your structure's internals. That seems bad.
Use of the multiplication operator in parameters to realloc in the way that you do is sometimes a red flag... It's a corner case, but the multiplication can overflow and you can end up shrinking the buffer instead of growing it. This would make you crash and in writing production code it would be important to avoid this for security reasons.
You also do not seem to initialize r[0].y. As far as I understood, you should have a r[0].y=0 somewhere.
Anyway, you using the first element of the array to do something different is definitely a bad idea. It makes your code horribly complex to understand. Just create a new structure, holding the array size, the capacity, and the pointer.
Which is considered better style?
int set_int (int *source) {
*source = 5;
return 0;
}
int main(){
int x;
set_int (&x);
}
OR
int *set_int (void) {
int *temp = NULL;
temp = malloc(sizeof (int));
*temp = 5;
return temp;
}
int main (void) {
int *x = set_int ();
}
Coming for a higher level programming background I gotta say I like the second version more. Any, tips would be very helpful. Still learning C.
Neither.
// "best" style for a function which sets an integer taken by pointer
void set_int(int *p) { *p = 5; }
int i;
set_int(&i);
Or:
// then again, minimise indirection
int an_interesting_int() { return 5; /* well, in real life more work */ }
int i = an_interesting_int();
Just because higher-level programming languages do a lot of allocation under the covers, does not mean that your C code will become easier to write/read/debug if you keep adding more unnecessary allocation :-)
If you do actually need an int allocated with malloc, and to use a pointer to that int, then I'd go with the first one (but bugfixed):
void set_int(int *p) { *p = 5; }
int *x = malloc(sizeof(*x));
if (x == 0) { do something about the error }
set_int(x);
Note that the function set_int is the same either way. It doesn't care where the integer it's setting came from, whether it's on the stack or the heap, who owns it, whether it has existed for a long time or whether it's brand new. So it's flexible. If you then want to also write a function which does two things (allocates something and sets the value) then of course you can, using set_int as a building block, perhaps like this:
int *allocate_and_set_int() {
int *x = malloc(sizeof(*x));
if (x != 0) set_int(x);
return x;
}
In the context of a real app, you can probably think of a better name than allocate_and_set_int...
Some errors:
int main(){
int x*; //should be int* x; or int *x;
set_int(x);
}
Also, you are not allocating any memory in the first code example.
int *x = malloc(sizeof(int));
About the style:
I prefer the first one, because you have less chances of not freeing the memory held by the pointer.
The first one is incorrect (apart from the syntax error) - you're passing an uninitialised pointer to set_int(). The correct call would be:
int main()
{
int x;
set_int(&x);
}
If they're just ints, and it can't fail, then the usual answer would be "neither" - you would usually write that like:
int get_int(void)
{
return 5;
}
int main()
{
int x;
x = get_int();
}
If, however, it's a more complicated aggregate type, then the second version is quite common:
struct somestruct *new_somestruct(int p1, const char *p2)
{
struct somestruct *s = malloc(sizeof *s);
if (s)
{
s->x = 0;
s->j = p1;
s->abc = p2;
}
return s;
}
int main()
{
struct somestruct *foo = new_somestruct(10, "Phil Collins");
free(foo);
return 0;
}
This allows struct somestruct * to be an "opaque pointer", where the complete definition of type struct somestruct isn't known to the calling code. The standard library uses this convention - for example, FILE *.
Definitely go with the first version. Notice that this allowed you to omit a dynamic memory allocation, which is SLOW, and may be a source of bugs, if you forget to later free that memory.
Also, if you decide for some reason to use the second style, notice that you don't need to initialize the pointer to NULL. This value will either way be overwritten by whatever malloc() returns. And if you're out of memory, malloc() will return NULL by itself, without your help :-).
So int *temp = malloc(sizeof(int)); is sufficient.
Memory managing rules usually state that the allocator of a memory block should also deallocate it. This is impossible when you return allocated memory. Therefore, the second should be better.
For a more complex type like a struct, you'll usually end up with a function to initialize it and maybe a function to dispose of it. Allocation and deallocate should be done separately, by you.
C gives you the freedom to allocate memory dynamically or statically, and having a function work only with one of the two modes (which would be the case if you had a function that returned dynamically allocated memory) limits you.
typedef struct
{
int x;
float y;
} foo;
void foo_init(foo* object, int x, float y)
{
object->x = x;
object->y = y;
}
int main()
{
foo myFoo;
foo_init(&foo, 1, 3.1416);
}
In the second one you would need a pointer to a pointer for it to work, and in the first you are not using the return value, though you should.
I tend to prefer the first one, in C, but that depends on what you are actually doing, as I doubt you are doing something this simple.
Keep your code as simple as you need to get it done, the KISS principle is still valid.
It is best not to return a piece of allocated memory from a function if somebody does not know how it works they might not deallocate the memory.
The memory deallocation should be the responsibility of the code allocating the memory.
The first is preferred (assuming the simple syntax bugs are fixed) because it is how you simulate an Out Parameter. However, it's only usable where the caller can arrange for all the space to be allocated to write the value into before the call; when the caller lacks that information, you've got to return a pointer to memory (maybe malloced, maybe from a pool, etc.)
What you are asking more generally is how to return values from a function. It's a great question because it's so hard to get right. What you can learn are some rules of thumb that will stop you making horrid code. Then, read good code until you internalize the different patterns.
Here is my advice:
In general any function that returns a new value should do so via its return statement. This applies for structures, obviously, but also arrays, strings, and integers. Since integers are simple types (they fit into one machine word) you can pass them around directly, not with pointers.
Never pass pointers to integers, it's an anti-pattern. Always pass integers by value.
Learn to group functions by type so that you don't have to learn (or explain) every case separately. A good model is a simple OO one: a _new function that creates an opaque struct and returns a pointer to it; a set of functions that take the pointer to that struct and do stuff with it (set properties, do work); a set of functions that return properties of that struct; a destructor that takes a pointer to the struct and frees it. Hey presto, C becomes much nicer like this.
When you do modify arguments (only structs or arrays), stick to conventions, e.g. stdc libraries always copy from right to left; the OO model I explained would always put the structure pointer first.
Avoid modifying more than one argument in one function. Otherwise you get complex interfaces you can't remember and you eventually get wrong.
Return 0 for success, -1 for errors, when the function does something which might go wrong. In some cases you may have to return -1 for errors, 0 or greater for success.
The standard POSIX APIs are a good template but don't use any kind of class pattern.