I am working with an external library that is storing void* inside of a queue. For example,
void queue_insert(Queue* queue, void* data);
However, I want to store size_t data inside the queue instead. I can't pass the address of the size_t data (because it's locally scoped). I never need to access the data again, though.
In other words, I will be calling these functions
queue_insert(queue, 5);
bool exists = queue_contains(queue, 5);
but I will never be doing the following (because it doesn't make sense)
void* p = queue_pop(queue);
size_t s = *p;
With that being said, can I pass size_t variables to a function that takes a void*?
Typically, when a general-purpose library offers algorithms such as queue management, sorting, spawning threads, and so on, a void * parameter essentially means “I will take a handle to anything you want. Just put it in memory and give me a pointer to it. I will give you the pointer back when you need it.”
This is often used with a structure. Need to manage job information passed to a thread, like starting and ending indices, configuration parameters, and more? Define a structure type, allocate memory for it, and pass the address of the memory for the library’s void * parameter. But it can be used with a scalar item too.
To pass a size_t, use malloc to allocate space for the size_t, put the value in the allocated space, and pass the address to queue_insert.
When you are popping an element, use size_t *s = queue_pop(queue);. Then *s is the size_t you stored. When you are done with it, free the memory with free(s).
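A minimal sketch of that allocate, insert, pop, free cycle. The Queue type and the bodies of queue_insert and queue_pop below are hypothetical stand-ins for the external library (a small fixed-capacity FIFO that ignores inserts when full), and insert_size and pop_size are illustrative helper names, not part of any real API.

```c
#include <stddef.h>
#include <stdlib.h>

#define QUEUE_CAP 16

/* Hypothetical stand-in for the library's queue: a tiny fixed-capacity FIFO. */
typedef struct {
    void *items[QUEUE_CAP];
    size_t head, count;
} Queue;

void queue_insert(Queue *queue, void *data) {
    if (queue->count == QUEUE_CAP) return;   /* stand-in only: full queue ignores the insert */
    queue->items[(queue->head + queue->count) % QUEUE_CAP] = data;
    queue->count++;
}

void *queue_pop(Queue *queue) {
    if (queue->count == 0) return NULL;      /* empty */
    void *data = queue->items[queue->head];
    queue->head = (queue->head + 1) % QUEUE_CAP;
    queue->count--;
    return data;
}

/* Copy the value into heap storage so it outlives the caller's scope. */
int insert_size(Queue *queue, size_t value) {
    size_t *box = malloc(sizeof *box);
    if (box == NULL) return -1;
    *box = value;
    queue_insert(queue, box);
    return 0;
}

/* Pop one boxed value, copy it out, and release the box. Returns 0 on success. */
int pop_size(Queue *queue, size_t *out) {
    size_t *box = queue_pop(queue);
    if (box == NULL) return -1;
    *out = *box;
    free(box);
    return 0;
}
```

Whoever removes an element from the queue becomes responsible for freeing it, so any code path that discards elements without popping them needs the same free call.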
I am putting together a project in C where I must pass around a variable length byte sequence, but I'm trying to limit malloc calls due to potentially limited heap.
Say I have a struct, my_struct, that contains the variable length byte sequence, ptr, and a function, my_func, that creates an instance of my_struct. In my_func, my_struct.ptr is malloc'd and my_struct is returned by value. my_struct will then be used by other functions being passed by value: another_func. Code below.
Is this "safe" to do against memory leaks provided somewhere on the original or any copy of my_struct when passed by value, I call my_struct_destroy or free the malloc'd pointer? Specifically, is there any way that when another_func returns, that inst.ptr is open to being rewritten or dangling?
Since stackoverflow doesn't like opinion-based questions, are there any good references that discuss this behavior? I'm not sure what to search for.
typedef struct {
char * ptr;
} my_struct;
// allocates n bytes to pointer in structure and initializes.
my_struct my_func(size_t n) {
my_struct out = {(char *) malloc(n)};
/* initialization of out.ptr */
return out;
}
void another_func(my_struct inst) {
/*
do something using the passed-by-value inst
are there problems with inst.ptr here or after this function returns?
*/
}
void my_struct_destroy(my_struct * ms_ptr) {
free(ms_ptr->ptr);
ms_ptr->ptr = NULL;
}
int main() {
my_struct inst = my_func(20);
another_func(inst);
my_struct_destroy(&inst);
}
It's safe to pass and return a struct containing a pointer by value as you did. Each copy of the struct holds a copy of ptr, so every copy points at the same allocation. Nothing is changed in the calling function. There would, of course, be a big problem if another_func frees ptr and then the caller tries to use it or free it again.
Locality of alloc+free is a best practice. Wherever possible, make the function that allocates an object also responsible for freeing it. Where that's not feasible, malloc and free of the same object should be in the same source file. Where that's not possible (think complex graph data structure with deletes), the collection of files that manage objects of a given type should be clearly identified and conventions documented. There's a common technique useful for programs (like compilers) that work in stages where much of the memory allocated in one stage should be freed before the next starts. Here, memory is only malloced in big blocks by a manager. From these, the manager allocs objects of any size. But it knows only one way to free: all at once, presumably at the end of a stage. This is a gcc idea: obstacks. When allocation is more complex, bigger systems implement some kind of garbage collector. Beyond these ideas, there are as many ways to manage C storage as there are colors. Sorry I don't have any pointers to references (pun intended :)
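The "allocate from big blocks, free everything at once" idea can be sketched as a small bump allocator. This is not the GNU obstack API; the Arena type and the arena_* names are hypothetical, and the sketch assumes a C11 compiler (for max_align_t and _Alignof) and a single fixed-size block.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;  /* one big block obtained from malloc */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} Arena;

int arena_init(Arena *a, size_t size) {
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Hand out n bytes from the block, rounded up so every result stays aligned. */
void *arena_alloc(Arena *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t rounded = (n + align - 1) / align * align;
    if (rounded < n || rounded > a->size - a->used)
        return NULL;                      /* size overflow or out of space */
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* The only way to free: release every object at once, e.g. at the end of a stage. */
void arena_free_all(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Because individual objects are never freed, there is exactly one free call to get right per stage, which suits a limited heap with few malloc calls.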
If you only have one variable-length field and its size doesn't need to be dynamically updated, consider making the last field in the struct an array to hold it. This is okay with the C standard:
typedef struct {
... other fields
char a[1]; // variable length
} my_struct;
my_struct *my_func(size_t n) {
my_struct *p = malloc(sizeof *p + (n - 1) * sizeof p->a[0]);
... initialize fields of p
return p;
}
This avoids the need to separately free the variable length field. Unfortunately it only works for one.
If you're okay with gcc extensions, you can declare the array with size zero. In C99, you can get the same effect with a[] (a flexible array member). This avoids the - 1 in the size calculation.
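A minimal sketch of the C99 a[] form. The byte_seq type, its len field, and byte_seq_new are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len;   /* number of bytes stored in data */
    char data[];  /* C99 flexible array member; must be the last field */
} byte_seq;

/* One malloc covers both the header and the n payload bytes. */
byte_seq *byte_seq_new(const char *src, size_t n) {
    if (n > SIZE_MAX - sizeof(byte_seq)) return NULL;  /* size would overflow */
    byte_seq *p = malloc(sizeof *p + n);
    if (p == NULL) return NULL;
    p->len = n;
    memcpy(p->data, src, n);
    return p;
}
```

A single free(p) releases everything. Such a struct has to be passed around by pointer, because copying it by value does not copy the trailing array.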
If I need to write a function that returns an array: int*, which way is better?
int* f(..data..)
or: void f(..data..,int** arr)
and we call f like this: int* x; f(&x);. (maybe they are both the same but I am not sure. but if I need to return an ErrorCode(it's an enum) too, then in the first way f will get ErrorCode* and in the second way, f will return an ErrorCode).
Returning an array is just returning a variable amount of data.
That's a really old problem, and C programmers developed many answers for it:
Caller passes in buffer.
The necessary size is documented and not passed, too short buffers are Undefined Behavior: strcpy()
The necessary size is documented and passed, errors are signaled by the return value: strcpy_s()
The buffer size is passed by pointer, and the called function reallocates with the documented allocator as needed: POSIX getline()
The necessary size is unknown, but can be queried by calling the function with buffer-length 0: snprintf()
The necessary size is unknown and cannot be queried, as much as fits in a buffer of passed size is returned. If necessary, additional calls must be made to get the rest: fread()
⚠ The necessary size is unknown, cannot be queried, and passing too small a buffer is Undefined Behavior. This is a design defect, therefore the function is deprecated / removed in newer versions, and just mentioned here for completeness: gets().
Caller passes a callback:
The callback-function gets a context-parameter: qsort_s()
The callback-function gets no context-parameter. Getting the context requires magic: qsort()
Caller passes an allocator: Not found in the C standard library. All allocator-aware C++ containers support that though.
Callee contract specifies the deallocator. Calling the wrong one is Undefined Behavior: fopen()->fclose() strdup()->free()
Callee returns an object which contains the deallocator: COM-Objects
Callee uses an internal shared buffer: asctime()
Be aware that either the returned array must contain a sentinel object or other marker, you have to return the length separately, or you have to return a struct containing a pointer to the data and the length.
Pass-by-reference (pointer to size or such) helps there.
In general, whenever the user has to guess the size or look it up in the manual, he will sometimes get it wrong. If he does not get it wrong, a later revision might invalidate his careful work, so it doesn't matter he was once right. Anyway, this way lies madness (UB).
For the rest, choose the most comfortable and efficient one you can.
Regarding an error code: Remember there's errno.
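The "query the size with buffer-length 0" pattern from the list above can be sketched with snprintf() as follows. The format_pair helper and its "name=value" format are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Formats "<name>=<value>" into a freshly allocated string; the caller frees it. */
char *format_pair(const char *name, int value) {
    /* First call: a NULL buffer of size 0 only reports the length needed. */
    int needed = snprintf(NULL, 0, "%s=%d", name, value);
    if (needed < 0) return NULL;
    char *buf = malloc((size_t)needed + 1);   /* +1 for the terminating '\0' */
    if (buf == NULL) return NULL;
    /* Second call: the buffer is now known to be large enough. */
    snprintf(buf, (size_t)needed + 1, "%s=%d", name, value);
    return buf;
}
```

Here the deallocator is part of the contract: the result comes from malloc, so the caller releases it with free().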
Usually it's more convenient and semantic to return the array
int* f(..data..)
If you ever need more complex error handling (e.g., returning error values), you should return the error as an int and hand the array back through an output parameter.
There is no "better" here: you decide which approach fits the needs of the callers better.
Note that both functions are bound to give a user an array that they allocate internally, so deallocating the resultant array becomes a responsibility of the caller. In other words, somewhere inside f() you would have a malloc, and the user who receives the data must call free() on it.
You have another option here - let the caller pass the array into you, and return back a number that says how many items you put back into it:
size_t f(int *buffer, size_t max_length)
This approach lets the caller pass you a buffer in a static or in the automatic memory, thus improving flexibility.
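A small sketch of that signature in use. The collect_evens function and its filtering task are hypothetical; the point is only that the caller owns the buffer and the return value reports how much of it was filled.

```c
#include <stddef.h>

/* Copies the even values from src into buffer, writing at most max_length
   items. Returns how many items were written. */
size_t collect_evens(const int *src, size_t n, int *buffer, size_t max_length) {
    size_t written = 0;
    for (size_t i = 0; i < n && written < max_length; i++) {
        if (src[i] % 2 == 0)
            buffer[written++] = src[i];
    }
    return written;
}
```

The caller can pass an automatic array, a static array, or heap memory, and no free() is implied by the function itself.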
the classic model is (assuming you need to return error code too)
int f(...., int **arr)
even though it doesn't flow as nicely as a function returning the array
Note this is why the lovely Go language supports multiple return values.
It's also one of the reasons for exceptions - they get the error indicators out of the function's input/output space
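A minimal sketch of that classic model, assuming an ErrorCode enum like the one the question mentions. The enumerator names and the make_range function are invented for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { ERR_OK = 0, ERR_INVALID_ARG, ERR_NO_MEMORY } ErrorCode;

/* Allocates an array holding 0, 1, ..., n-1 and stores it in *arr.
   On success the caller owns the array and must free() it. */
ErrorCode make_range(size_t n, int **arr) {
    if (arr == NULL || n == 0) return ERR_INVALID_ARG;
    int *a = malloc(n * sizeof *a);
    if (a == NULL) return ERR_NO_MEMORY;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;
    *arr = a;            /* the caller's pointer is written only on success */
    return ERR_OK;
}
```

A call then reads int *x; if (make_range(4, &x) == ERR_OK) { ... free(x); }, keeping the return value free for the error code.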
The first one is better if there is no requirement to deal with an already existent pointer in the function.
The second one is used when you already have a defined pointer that points to an already allocated container (for example a list) and inside the function the value of the pointer can be changed.
If you must call f like int* x; f(&x);, you do not have much of a choice. You must use the second syntax, i.e., void f(..data..,int** arr). This is because you are not using return value anyways in your code.
The approach depends on a specific task and perhaps on your personal taste or a coding convention adopted in your project.
In general, I'd like to pass pointers as "output" parameters instead of return'ing an array for a number of reasons.
You likely want to return a number of elements in the array together with the array itself. But if you do this:
int f(const void* data, int** out_array);
Then if you see the signature first time, you can't quite tell what the function returns, the number of elements, or an error code, so I prefer to do this:
void f(const void* data, int** out_array, int* out_array_nelements);
Or even better:
void f(const void* data, int** out_array, size_t* out_array_nelements);
The function signature must be self-explanatory, and the parameter names help to achieve that.
The output array needs to be stored somewhere. You need to allocate some memory for the array. If you return a pointer to the array without passing the same pointer as argument, then you can't allocate memory on the stack. I mean, you cannot do this:
int *f(const void *data) {
int array[10];
return array; /* wrong: array's lifetime ends when the function returns, so the pointer dangles */
}
Instead, you have to do static int array[10] (which is not thread-safe) or int *array = malloc(...), which leaks memory whenever the caller forgets to free it.
So I suggest passing a pointer to an array that is already allocated before the function call, like this:
void f(const void *data, int* out_array, size_t* out_nelements, size_t max_nelements);
The benefit is you are free to choose where to allocate the array:
On the stack:
int array[10] = { 0 };
size_t max_nelements = sizeof(array)/sizeof(array[0]);
size_t nelements = 0;
f(data, array, &nelements, max_nelements);
Or in the heap:
size_t nelements = 0;
size_t max_nelements = 10;
int *array = malloc(max_nelements * sizeof(int));
f(data, array, &nelements, max_nelements);
See, with this approach you are free to choose how to allocate the memory.
I have an assignment where I need to read from and write to a memory block (pre-allocated), to do so, I need to implement two functions:
memory_read(base,offset,size);
memory_write(base,offset,size,buffer);
No problem so far, I successfully implemented the writing part. The problem is with the memory_read. I need it to return a chunk of data (perhaps void*), then I can cast it to whatever I am expecting outside and use it.
Just as an example. Let's say I have written a serialised structure to that memory. I would like to do something like this
void *variable;
variable = memory_read(pointer_to_where_it_is,offset,sizeof(some_structure));
(some_structure) variable // and use it for something
And this is what memory_read does
void *memory_read(void *base, int offset, int size){
void *buffer;
buffer = malloc(size);
memcpy(&buffer,base+offset,size);
return buffer;
}
Of course, since buffer lives in the function stack, I lose reference to it on return.
Any idea on how to do it? I am not allowed to modify the function parameters by the teacher, otherwise I would have passed &variable as a parameter.
Thanks!
In this line:
memcpy(&buffer,base+offset,size);
the problem is that you are trying to copy a block of memory into a stack-allocated-variable buffer instead of the heap-allocated block of memory that this variable is pointing to. The fix is to remove the &:
memcpy(buffer, base+offset, size);
Otherwise your code is fine.
UPDATE: Don't forget to free() the returned buffer so the allocated memory doesn't "leak" :)
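For reference, a sketch of the whole function with that fix applied, keeping the signature from the question. The argument checks and the cast to unsigned char * are additions beyond the original answer: arithmetic on a void * is a compiler extension rather than standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of size bytes starting at base + offset, or NULL on
   failure. The caller must free() the result. */
void *memory_read(void *base, int offset, int size) {
    if (base == NULL || offset < 0 || size <= 0) return NULL;
    void *buffer = malloc((size_t)size);
    if (buffer == NULL) return NULL;
    /* Cast before adding the offset so the pointer arithmetic is standard C. */
    memcpy(buffer, (unsigned char *)base + offset, (size_t)size);
    return buffer;
}
```

The caller assigns the result to a pointer of the expected type, for example some_structure *s = memory_read(base, offset, sizeof *s);, reads through s, and calls free(s) when finished.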
I'm creating a function that returns a string. The size of the string is known at runtime, so I'm planning to use malloc(), but I don't want to give the user the responsibility for calling free() after using my function's return value.
How can this be achieved? How do other functions that return strings (char *) work (such as getcwd(), _getcwd(), GetLastError(), SDL_GetError())?
Your challenge is that something needs to release the resources (i.e. cause the free() to happen).
Normally, the caller frees the allocated memory either by calling free() directly (see how strdup users work for instance), or by calling a function you provide that wraps free. You might, for instance, require callers to call a foo_destroy function. As another poster points out, you might choose to wrap that in an opaque struct, though that's not necessary, as having your own allocation and destroy functions is useful even without that (e.g. for resource tracking).
However, another way would be to use some form of clean-up function. For instance, when the string is allocated, you could attach it to a list of resources allocated in a pool, then simply free the pool when done. This is how apache2 works with its apr_pool structure. In general, you don't free() anything specifically under that model. See here and (easier to read) here.
What you can't do in C (as there is no reference counting of malloc()d structures) is directly determine when the last 'reference' to an object goes out of scope and free it then. That's because you don't have references, you have pointers.
Lastly, you asked how existing functions return char * variables:
Some (like strdup, get_current_dir_name and getcwd under some circumstances) expect the caller to free.
Some (like strerror_r and getcwd under other circumstances) expect the caller to pass in a buffer of sufficient size.
Some do both: from the getcwd man page:
As an extension to the POSIX.1-2001 standard, Linux (libc4, libc5, glibc) getcwd() allocates the buffer dynamically
using malloc(3) if buf is NULL. In this case, the allocated buffer has the length size unless size is zero, when
buf is allocated as big as necessary. The caller should free(3) the returned buffer.
Some use an internal static buffer and are thus not reentrant / threadsafe (yuck - do not do this). See strerror and why strerror_r was invented.
Some only return pointers to constants (so reentrancy is fine), and no free is required.
Some (like libxml) require you to use a separate free function (xmlFree() in this case)
Some (like apr_palloc) rely on the pool technique above.
Many libraries force the user to deal with memory allocation. This is a good idea because every application has its own patterns of object lifetime and reuse. It's good for the library to make as few assumptions about its users as possible.
Say a user wants to call your library function like this:
for (a lot of iterations)
{
params = get_totally_different_params();
char *str = your_function(params);
do_something(str);
// now we're done with this str forever
}
If your library mallocs the string every time, it is wasting a lot of effort calling malloc, and possibly showing poor cache behavior if malloc picks a different block each time.
Depending on the specifics of your library, you might do something like this:
int output_size(/*params*/);
void func(/*params*/, char *destination);
where destination is required to be at least output_size(params) size, or you could do something like the socket recv API:
int func(/*params*/, char *destination, int destination_size);
where the return value is:
< destination_size: this is the number of bytes we actually used
== destination_size: there may be more bytes waiting to output
These patterns both perform well when called repeatedly, because the caller can reuse the same block of memory over and over without any allocations at all.
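A sketch of the second, recv-style pattern. The read_chunk function, its message source, and the position cursor are hypothetical; a real library would keep that state wherever its data actually lives.

```c
#include <stddef.h>
#include <string.h>

/* Copies up to destination_size bytes of message, starting at *position, into
   destination and advances *position. Returns the number of bytes written. */
int read_chunk(const char *message, size_t *position,
               char *destination, int destination_size) {
    size_t length = strlen(message);
    if (destination_size <= 0 || *position >= length) return 0;
    size_t remaining = length - *position;
    size_t n = remaining < (size_t)destination_size
                   ? remaining
                   : (size_t)destination_size;
    memcpy(destination, message + *position, n);
    *position += n;
    return (int)n;
}
```

The caller keeps one buffer and calls read_chunk in a loop until the return value is smaller than destination_size, so no allocation happens per call.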
There is no way to do this in C. You have to either pass a parameter with size information, so that malloc() and free() can be called in the called function, or the calling function has to call free after malloc().
Many object oriented languages (eg. C++) handle memory in such a way as to do what you want to, but not C.
Edit
By size information as an argument, I mean something to let the called function know how many bytes of memory are owned by the pointer you are passing. This can be done by looking directly at the passed string if it has already been assigned a value, such as:
char test1[]="this is a test";
char *test2="this is a test";
when called like this:
readString(test1); // (or test2)
char * readString(char *abc)
{
int len = strlen(abc);
return abc;
}
Both of those arguments will result in len = 14
However if you create a non populated variable, such as:
char *test3;
And allocate the same amount of memory, but do not populate it, for example:
test3 = malloc(strlen("this is a test") +1);
There is no way for the called function to know how much memory has been allocated. The buffer is uninitialized, so strlen() in the first version of readString() reads indeterminate bytes and len is meaningless. However, if you change the prototype of readString() to:
void readString(char *abc, size_t sizeString);
then size information passed as an argument can be used to create memory:
void readString(char *abc, size_t sizeString)
{
char *in;
in = malloc(sizeString +1);
//do something with it
//then free it
free(in);
}
example call:
int main()
{
size_t len;
char *test3 = NULL; // initialized so that passing it is well-defined
len = strlen("this is a test") +1; //allow for '\0'
readString(test3, len);
// more code
return 0;
}
You cannot do this in C.
Return a pointer and it is up to the person calling the function to call free
Alternatively use C++. shared_ptr etc
You can wrap it in an opaque struct.
Give the user access to pointers to your struct but not its internal. Create a function to release resources.
void release_resources(struct opaque *ptr);
Of course the user needs to call the function.
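A single-file sketch of the opaque-struct idea. Only release_resources comes from the text above; acquire_resources and opaque_text are hypothetical companions, and in a real library the first part would live in a header and the second in a source file.

```c
#include <stdlib.h>
#include <string.h>

/* Public part (header): callers see only an incomplete type. */
struct opaque;
struct opaque *acquire_resources(const char *text);
const char *opaque_text(const struct opaque *ptr);
void release_resources(struct opaque *ptr);

/* Private part (source file): the layout is hidden from callers. */
struct opaque {
    char *text;
};

struct opaque *acquire_resources(const char *text) {
    struct opaque *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) return NULL;
    ptr->text = malloc(strlen(text) + 1);
    if (ptr->text == NULL) {
        free(ptr);
        return NULL;
    }
    strcpy(ptr->text, text);
    return ptr;
}

const char *opaque_text(const struct opaque *ptr) {
    return ptr->text;
}

/* Frees the string and the wrapper; accepts NULL like free() does. */
void release_resources(struct opaque *ptr) {
    if (ptr == NULL) return;
    free(ptr->text);
    free(ptr);
}
```

The user never calls free() directly, which leaves the library free to change how the string is stored later.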
You could keep track of the allocated strings and free them in an atexit routine (http://www.tutorialspoint.com/c_standard_library/c_function_atexit.htm). In the following, I have used a global variable but it could be a simple array or list if you have one handy.
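#include <stdlib.h>
#include <string.h>
char* freeme = NULL;
/* Registered with atexit(); frees the most recently returned string. */
void AStringRelease(void)
{
    free(freeme);   /* free(NULL) is a no-op */
    freeme = NULL;  /* avoids a double free if this handler was registered more than once */
}
char* AStringGet(void)
{
    freeme = malloc(20);
    if (freeme == NULL)
        return NULL;
    strcpy(freeme, "A String");
    atexit(AStringRelease);
    return freeme;
}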
Why would you do this:
void f(Struct** struct)
{
...
}
If I wish to operate on a list of structs, is it not enough to pass in a Struct*? This way I can do struct++ to address the next struct or am I very confused here? :)
Wouldn't it only be useful if I want to rearrange the list of structs in some way? However if I'm just reading I don't see the point.
It depends on what your data structure looks like. Assuming that p is a null-terminated array of pointers to struct s, you can run through it using a loop like this:
void f(struct s **p)
{
while (*p != NULL) {
/* some stuff */
p++; /* advance to the next pointer in the array */
}
}
Generally, using a pointer to a pointer is useful only if you intend to modify the pointer itself.
If you want to modify the pointer in caller which was passed to this function, you'd typically do this.
Because everything is passed by value in C, passing struct * will only pass a copy of the pointer and won't modify the pointer in the caller. Why passing struct ** is needed for that is explained in this C-FAQ.
If you don't intend to modify the pointer in the caller, it's not necessary to pass struct **.
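A short sketch of the difference. The struct s type and both function names are hypothetical.

```c
struct s { int value; };

/* Receives a copy of the caller's pointer: advancing it has no effect outside. */
void advance_copy(struct s *p) {
    p++;
    (void)p;  /* the modified copy is simply discarded on return */
}

/* Receives the address of the caller's pointer, so it can change that pointer. */
void advance_in_place(struct s **p) {
    (*p)++;
}
```

After advance_copy(cursor) the caller's cursor is unchanged, while advance_in_place(&cursor) leaves it pointing at the next element.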
There are a number of uses for this kind of parameter...
One already mentioned, and quite common is to allow the caller to use the function to modify a pointer. The obvious case here would be when getting some blob of data...
void getData( void** pData, int* size )
{
*pData = getMyDataPointer();
*size = getMyDataSize();
}
Another option is that perhaps the extra level of indirection allows for the list to behave in some way? e.g. by using indices to refer to specific elements they can be allocated and reallocated without having the risk of dangling pointers.
Yet another option is that the list is very large and lives in fragmented memory, or is rapidly accessed so that the list is actually several smaller lists grouped together. This sort of technique can also be used to 'lazily' allocate huge arrays, e.g. providing an interface to an array of a billion elements, but then allocating chunks of 100k on demand as they are read/written with struct** pointing at the whole thing, and each struct* being either null or pointing to 100k structs...
To be honest, the context is quite important... there are also plenty of uses for triple pointers as function parameters that follow similar reasoning (e.g. combine the first thing I mentioned with the second, or the second with the third, etc.).
You are correct, there is no reason to pass a pointer to pointer unless your function is intended to modify the pointer passed in. In case of accessing an array of structs, a single level of indirection is definitely sufficient.
The creator of the API probably thought that the argument list would be easier to memorize if the first argument of every function is the same.