Return a string allocated with malloc? - c

I'm creating a function that returns a string. The size of the string is known at runtime, so I'm planning to use malloc(), but I don't want to give the user the responsibility for calling free() after using my function's return value.
How can this be achieved? How do other functions that return strings (char *) work (such as getcwd(), _getcwd(), GetLastError(), SDL_GetError())?

Your challenge is that something needs to release the resources (i.e. cause the free() to happen).
Normally, the caller frees the allocated memory either by calling free() directly (see how strdup users work, for instance), or by calling a function you provide that wraps free(). You might, for instance, require callers to call a foo_destroy function. As another poster points out, you might choose to wrap that in an opaque struct, though that's not necessary: having your own allocation and destroy functions is useful even without it (e.g. for resource tracking).
However, another way would be to use some form of clean-up function. For instance, when the string is allocated, you could attach it to a list of resources allocated in a pool, then simply free the pool when done. This is how apache2 works with its apr_pool_t pools. In general, you don't free() anything specifically under that model.
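To make the pool idea concrete, here is a minimal sketch in C. It is not the APR API: `struct pool`, `pool_strdup()` and `pool_destroy()` are hypothetical names, and the sketch only pools strings, which sidesteps the alignment concerns a general-purpose pool would have to handle. Each string is recorded in a linked list so that one call releases everything.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool: each pooled string lives in a node of a singly linked list. */
struct pool_node {
    struct pool_node *next;
    char data[];              /* C99 flexible array member holding the string */
};

struct pool {
    struct pool_node *head;   /* most recently allocated node first */
};

/* Copies s into the pool; returns NULL if malloc() fails.
   The caller never frees the result individually. */
char *pool_strdup(struct pool *p, const char *s)
{
    size_t len = strlen(s) + 1;                 /* include the '\0' */
    struct pool_node *n = malloc(sizeof *n + len);
    if (n == NULL)
        return NULL;
    memcpy(n->data, s, len);
    n->next = p->head;
    p->head = n;
    return n->data;
}

/* Releases every string allocated from the pool in one go. */
void pool_destroy(struct pool *p)
{
    struct pool_node *n = p->head;
    while (n != NULL) {
        struct pool_node *next = n->next;
        free(n);
        n = next;
    }
    p->head = NULL;
}
```

A caller would create `struct pool p = { NULL };`, hand `&p` to every function that returns strings, and call `pool_destroy(&p)` once at the end of the request or task. Every pointer obtained from that pool becomes invalid at that point.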
What you can't do in C (as there is no reference counting of malloc()d structures) is directly determine when the last 'reference' to an object goes out of scope and free it then. That's because you don't have references, you have pointers.
Lastly, you asked how existing functions return char * variables:
Some (like strdup, get_current_dir_name and getcwd under some circumstances) expect the caller to free.
Some (like strerror_r and getcwd under other circumstances) expect the caller to pass in a buffer of sufficient size.
Some do both. From the getcwd man page:
"As an extension to the POSIX.1-2001 standard, Linux (libc4, libc5, glibc) getcwd() allocates the buffer dynamically using malloc(3) if buf is NULL. In this case, the allocated buffer has the length size unless size is zero, when buf is allocated as big as necessary. The caller should free(3) the returned buffer."
Some use an internal static buffer and are thus not reentrant / threadsafe (yuck - do not do this). See strerror and why strerror_r was invented.
Some only return pointers to constants (so reentrancy is fine), and no free is required.
Some (like libxml) require you to use a separate free function (xmlFree() in this case).
Some (like apr_palloc) rely on the pool technique above.
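As a sketch of the two getcwd() conventions quoted above (assuming a POSIX system; `cwd_alloc()` and `cwd_glibc()` are hypothetical wrapper names), the portable version supplies its own buffer and grows it while getcwd() fails with ERANGE, whereas the glibc extension lets getcwd() do the allocation itself. Either way the result comes from malloc(), so the caller must free() it.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getcwd() under strict -std=c99/c11 */
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Portable POSIX style: this wrapper supplies the buffer and doubles it
   until getcwd() stops failing with ERANGE.
   Returns a malloc()ed path that the caller must free(), or NULL. */
char *cwd_alloc(void)
{
    size_t size = 16;
    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;
        if (getcwd(buf, size) != NULL)
            return buf;
        free(buf);
        if (errno != ERANGE)      /* a real error, not "buffer too small" */
            return NULL;
        size *= 2;
    }
}

/* glibc extension quoted above: with buf == NULL and size == 0, getcwd()
   allocates a buffer as big as necessary. POSIX leaves this unspecified. */
char *cwd_glibc(void)
{
    return getcwd(NULL, 0);
}
```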

Many libraries force the user to deal with memory allocation. This is a good idea because every application has its own patterns of object lifetime and reuse. It's good for the library to make as few assumptions about its users as possible.
Say a user wants to call your library function like this:
for (a lot of iterations)
{
params = get_totally_different_params();
char *str = your_function(params);
do_something(str);
// now we're done with this str forever
}
If your library mallocs the string every time, it wastes a lot of effort calling malloc, and possibly shows poor cache behavior if malloc picks a different block each time.
Depending on the specifics of your library, you might do something like this:
int output_size(/*params*/);
void func(/*params*/, char *destination);
where destination is required to be at least output_size(params) bytes, or you could do something like the socket recv API:
int func(/*params*/, char *destination, int destination_size);
where the return value is:
< destination_size: this is the number of bytes we actually used
== destination_size: there may be more bytes waiting to output
These patterns both perform well when called repeatedly, because the caller can reuse the same block of memory over and over without any allocations at all.
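Here is a minimal sketch of both patterns, with a hypothetical "join two words with a space" operation standing in for `func`. `output_size()`, `join_words()` and `join_words_n()` are illustrative names, and the recv()-style variant, like recv() itself, does not write a terminating '\0'.

```c
#include <string.h>

/* Pattern 1: a size query plus a fill function.
   output_size() returns the bytes join_words() needs, including '\0'. */
size_t output_size(const char *a, const char *b)
{
    return strlen(a) + 1 + strlen(b) + 1;   /* "a b" plus terminator */
}

/* destination must hold at least output_size(a, b) bytes. */
void join_words(const char *a, const char *b, char *destination)
{
    size_t la = strlen(a);
    memcpy(destination, a, la);
    destination[la] = ' ';
    strcpy(destination + la + 1, b);         /* copies b and its '\0' */
}

/* Pattern 2, recv()-style: writes at most destination_size bytes of "a b"
   (no terminator) and returns how many bytes were written. A return value
   equal to destination_size means more output may have been waiting. */
int join_words_n(const char *a, const char *b, char *destination,
                 int destination_size)
{
    size_t la = strlen(a), lb = strlen(b);
    int used = 0;
    for (size_t i = 0; i < la + 1 + lb && used < destination_size; i++) {
        char c = i < la ? a[i] : (i == la ? ' ' : b[i - la - 1]);
        destination[used++] = c;
    }
    return used;
}
```

Because nothing here allocates, a loop like the one above can keep a single buffer and grow it with realloc() only when output_size() reports that it is too small.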

There is no way to do this in C. You have to either pass a parameter with size information, so that malloc() and free() can be called in the called function, or the calling function has to call free after malloc().
Many object-oriented languages (e.g. C++) manage memory in a way that lets you do what you want, but C does not.
Edit
By size information as an argument, I mean something that lets the called function know how many bytes of memory are owned by the pointer you are passing. If the string has already been assigned a value, the called function can find this out by looking at it directly, for example:
char test1[]="this is a test";
char *test2="this is a test";
when called like this:
readString(test1); // (or test2)
char * readString(char *abc)
{
int len = strlen(abc);
return abc;
}
Both of those arguments will result in len = 14
However, if you create an unpopulated variable, such as:
char *test3;
and allocate the same amount of memory but do not populate it, for example:
test3 = malloc(strlen("this is a test") +1);
then there is no way for the called function to know how much memory has been allocated. Inside the first version of readString(), strlen() would read uninitialized bytes, which is undefined behavior, so len could be 0 or anything else. However, if you change the prototype of readString() to:
readString(char *abc, size_t sizeString);
then the size information passed as an argument can be used to create memory:
void readString(char *abc, size_t sizeString)
{
char *in;
in = malloc(sizeString +1);
//do something with it
//then free it
free(in);
}
example call:
int main(void)
{
size_t len;
char *test3 = malloc(strlen("this is a test") + 1); // allocated but not populated
len = strlen("this is a test") +1; //allow for '\0'
readString(test3, len);
// more code
free(test3);
return 0;
}

You cannot do this in C.
Return a pointer, and it is up to the person calling the function to call free.
Alternatively, use C++ (std::shared_ptr etc.).

You can wrap it in an opaque struct.
Give the user access to pointers to your struct but not to its internals. Create a function to release resources:
void release_resources(struct opaque *ptr);
Of course the user needs to call the function.
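A minimal sketch of the opaque-struct approach, assuming hypothetical names `opaque_create()` and `opaque_text()` alongside the `release_resources()` function above. In a real library the first part would live in a public header and the struct definition only in the library's .c file, so users can hold a `struct opaque *` but cannot touch its fields.

```c
#include <stdlib.h>
#include <string.h>

/* --- what a public header would expose --- */
struct opaque;                                     /* incomplete type */
struct opaque *opaque_create(const char *text);
const char *opaque_text(const struct opaque *ptr);
void release_resources(struct opaque *ptr);

/* --- what only the library's .c file would see --- */
struct opaque {
    char *text;       /* heap copy owned by the object */
    size_t length;
};

struct opaque *opaque_create(const char *text)
{
    struct opaque *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    ptr->length = strlen(text);
    ptr->text = malloc(ptr->length + 1);
    if (ptr->text == NULL) {
        free(ptr);
        return NULL;
    }
    memcpy(ptr->text, text, ptr->length + 1);
    return ptr;
}

const char *opaque_text(const struct opaque *ptr)
{
    return ptr->text;
}

/* Frees everything the object owns, then the object itself. */
void release_resources(struct opaque *ptr)
{
    if (ptr == NULL)
        return;
    free(ptr->text);
    free(ptr);
}
```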

You could keep track of the allocated strings and free them in an atexit routine (http://www.tutorialspoint.com/c_standard_library/c_function_atexit.htm). In the following, I have used a global variable but it could be a simple array or list if you have one handy.
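#include <stdlib.h>
#include <string.h>
char *freeme = NULL; /* remembers only the most recent string */
void AStringRelease(void)
{
free(freeme); /* free(NULL) is a no-op, so no check is needed */
}
char *AStringGet(void)
{
static int registered = 0;
/* A second call overwrites freeme, so only the last string is freed at exit. */
freeme = malloc(20);
if (freeme == NULL)
return NULL;
strcpy(freeme, "A String");
if (!registered) { /* registering the handler twice would free twice */
atexit(AStringRelease);
registered = 1;
}
return freeme;
}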
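The same idea with a simple array instead of a single global, as suggested above: every string handed out is recorded, and one atexit() handler frees them all. This is a sketch under assumptions (hypothetical `tracked` array and `AStringReleaseAll()` handler, not thread-safe). Note that the memory is still held until the program exits, so it does not reduce memory use while the program runs.

```c
#include <stdlib.h>
#include <string.h>

/* Every string handed out is remembered here. Shared static state,
   so this sketch is not thread-safe. */
static char **tracked = NULL;
static size_t tracked_count = 0;

static void AStringReleaseAll(void)
{
    for (size_t i = 0; i < tracked_count; i++)
        free(tracked[i]);
    free(tracked);
    tracked = NULL;
    tracked_count = 0;
}

char *AStringGet(void)
{
    static int registered = 0;
    if (!registered) {                  /* register the handler exactly once */
        if (atexit(AStringReleaseAll) != 0)
            return NULL;
        registered = 1;
    }
    char **grown = realloc(tracked, (tracked_count + 1) * sizeof *tracked);
    if (grown == NULL)
        return NULL;
    tracked = grown;
    char *result = malloc(20);
    if (result == NULL)
        return NULL;
    strcpy(result, "A String");
    tracked[tracked_count++] = result;  /* freed later by AStringReleaseAll */
    return result;
}
```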

Related

Function that returns an array of strings C

Is there a way to return an array of strings from a function without using dynamic memory allocation? The function goes something like this:
char** modify(char original[1000][1000]){
char result[1000][1000];
// some operations are applied to the original
// the original is copied to the result
return result;
}
In C, an object has one of four storage durations (also called lifetimes): static, thread, automatic, and allocated (C 2018 6.2.4 1).
Objects with automatic storage duration are created automatically inside a function and cease to exist when execution of the function ends, so you cannot use such an object, created inside your function, to hold the data you return.
Objects with allocated storage duration persist until freed, but you have asked to exclude those.
Thread storage duration is either likely not applicable to your situation or is effectively equivalent to static storage duration, which I will discuss below.
This means your options are:
Let the caller pass you an object in which to return data. That object may have any storage duration—your function does not need to know since it will neither allocate nor release it. If you do this, the caller must provide an object large enough to return the data. If this size is not known in advance, you can either provide a separate function to calculate it (which the caller will then use to allocate the necessary space) or incorporate that into your function as a special mode in which it provides the size required without providing the data yet.
Use an object with static storage duration. Since this object is created when the program starts, you cannot adjust the size within your function. You must build a size limit into the program. A considerable problem with this approach is the function has only one object to return, so only one can be in use at a time. This means that, once the function is called, it should not be called again until the caller has finished using the data in the object. This is both a severe limitation in program design and an opportunity for bugs, so it is rarely used.
Thus, a typical solution looks like this:
size_t HowMuchSpaceIsNeeded(char original[1000][1000])
{
… Calculate size.
return SizeNeeded;
}
void modify(char destination[][1000], char original[1000][1000])
{
… Put results in destination.
}
A variation for safety is:
void modify(char destination[][1000], size_t size, char original[1000][1000])
{
if (size < amount needed)
… Report error (possibly by return value, or program abort).
… Put results in destination.
}
Then the caller does something like:
size_t size = HowMuchSpaceIsNeeded(original);
char (*results)[1000] = malloc(size);
if (!results)
… Report error.
modify(results, size, original);
… Work with results.
free(results);
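To make the pseudocode above concrete, here is a sketch with an assumed transformation, since the question does not say what modify does: it keeps the non-empty rows of `original` and upper-cases them. The row width follows the question; the extra row-count parameter `n` and the `-1` error return are assumptions of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define ROW_LEN 1000   /* row width from the question */

/* Bytes the caller must provide: one ROW_LEN row per non-empty input row. */
size_t HowMuchSpaceIsNeeded(char original[][ROW_LEN], size_t n)
{
    size_t rows = 0;
    for (size_t i = 0; i < n; i++)
        if (original[i][0] != '\0')
            rows++;
    return rows * sizeof(char[ROW_LEN]);
}

/* Copies the non-empty rows into destination, upper-cased.
   Returns the number of rows written, or -1 if size is too small. */
long modify(char destination[][ROW_LEN], size_t size,
            char original[][ROW_LEN], size_t n)
{
    if (size < HowMuchSpaceIsNeeded(original, n))
        return -1;
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (original[i][0] == '\0')
            continue;
        for (size_t j = 0; j < ROW_LEN; j++) {
            destination[out][j] = (char)toupper((unsigned char)original[i][j]);
            if (original[i][j] == '\0')    /* terminator copied, row done */
                break;
        }
        out++;
    }
    return (long)out;
}
```

The caller can obtain the space with malloc(size) as shown above, or use any other storage duration, since modify neither allocates nor frees.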
As Davistor notes, a function can return an array embedded in a structure. In terms of C semantics, this avoids the object lifetime problem by returning a value, not an object. (The entire contents of the structure is the value of the structure.) In terms of actual hardware implementation, it is largely equivalent to the caller-passes-an-object method above. (The reasoning here is based on the logic of how computers work, not on the C specification: In order for a function to return a value that requires a lot of space to represent, the caller must provide the required space to the called function.) Generally, the caller will allocate space on the stack and provide that to the called function. This may be faster than a malloc, but it may also use a considerable amount of stack space. Usually, we avoid using sizable amounts of stack space, to avoid overflowing the stack.
Although you cannot return an array type in C, you can return a struct containing one:
#include <string.h>
#define NSTRINGS 100
#define STR_LEN 100
typedef struct stringtable {
char table[NSTRINGS][STR_LEN];
} stringtable;
stringtable modify ( const stringtable* const input )
{
stringtable result;
memcpy( &result, input, sizeof(result) );
return result;
}
I would generally recommend that you use Eric Postpischil’s solution, however. One way this might not be efficient is if you need to write to a specific variable or location. In that case, you could pass in its address, but here, you would need to create a large temporary array and copy it.
You cannot return a pointer to a local (automatic) array and use it after the function returns. In your case, result[1000][1000] is allocated on the stack, in a region that is deallocated once the function returns. Besides dynamic allocation, you have the option of passing a buffer as an argument to your function:
void modify(char original[1000][1000], char result[][1000]) { ... }
Now the result matrix has to be allocated outside the modify function and its lifetime will not depend on the function's lifetime. Basically you pass the function an already allocated matrix where the result will be written.
You can't return pointers to the local variables, because lifetime of the memory to which they point is limited to the scope.
When you return it, result decays to a pointer to the first element of a stack-allocated array, so dereferencing that pointer after the function returns results in undefined behavior.
To bypass this issue, there are a few workarounds.
One of them, which I have seen in a couple of projects, I don't recommend, because it is unsafe:
char (*modify(char original[1000][1000]))[1000] {
// `result` is a static array whose lifetime is the lifetime of the program.
// Calling modify more than once overwrites the previous `result`.
static char result[1000][1000];
return result;
}
Another approach is to receive the result pointer as a function argument, so that the caller allocates the storage for it.
void modify(char original[1000][1000], char (*result)[1000]){
result[0][1] = 42;
//...
}
int main(void) {
static char someOriginal[1000][1000]; // static: about 1 MB, kept off the stack
char result[1000][1000];
modify(someOriginal, result);
return 0;
}
Anyway, I recommend reading a decent book about the C language and how computer memory works.
You could also use a linked list that starts with the first string and ends with the last one.

return substring of string

I have a large string, where I want to use pieces of it but I don't want to necessarily copy them, so I figured I can make a structure that marks the beginning and length of the useful chunk from the big string, and then create a function that reads it.
struct descriptor {
int start;
int length;
};
So far so good, but when I got to writing the function I realized that I can't really return the chunk without copying into memory...
char* getSegment(char* string, struct descriptor d) {
char* chunk = malloc(d.length + 1);
strncpy(chunk, string + d.start, d.length);
chunk[d.length] = '\0';
return chunk;
}
So the questions I have are:
Is there any way that I can return the piece of string without copying it
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Answering your two questions:
No
The caller should provide a buffer for the copied string.
I would personally pass a pointer to the descriptor:
char *getSegment(const char *string, char *buff, const struct descriptor *d)
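One way that suggested signature could look, as a sketch: the extra `buff_size` parameter is an assumption added here so the function can refuse to overflow the caller's buffer, and `NULL` signals a bad descriptor or a buffer that is too small.

```c
#include <stddef.h>
#include <string.h>

struct descriptor {
    int start;
    int length;
};

/* Copies the described chunk into the caller's buffer of buff_size bytes.
   Returns buff, or NULL if d is out of range or buff is too small. */
char *getSegment(const char *string, char *buff, size_t buff_size,
                 const struct descriptor *d)
{
    size_t len = strlen(string);
    if (d->start < 0 || d->length < 0 || (size_t)d->start > len ||
        (size_t)d->length > len - (size_t)d->start)
        return NULL;                          /* chunk outside the string */
    if ((size_t)d->length + 1 > buff_size)
        return NULL;                          /* no room for chunk + '\0' */
    memcpy(buff, string + d->start, (size_t)d->length);
    buff[d->length] = '\0';
    return buff;
}
```

No heap memory is involved, so nothing can leak: the caller decides whether buff is a local array, a static buffer, or something it malloc()ed itself.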
Is there any way that I can return the piece of string without copying it
A string includes its terminating null character, so unless the part the code wants is the tail of the original, returning a pointer to a "piece of string" that is still a string is not possible.
how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
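Create temporary space with a variable length array (available since C99, optional since C11). It is good until the end of the enclosing block, at which point the memory is released and must not be used any further.
char* getSegment(const char* string, struct descriptor d, char *dest) {
// form result in `dest`, which must hold at least d.length + 1 chars (memcpy is from <string.h>)
memcpy(dest, string + d.start, d.length);
dest[d.length] = '\0';
return dest;
}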
Usage
char *t;
{
struct descriptor des = bar();
char *large_string = foo();
char sub[des.length + 1u]; //VLA
t = getSegment(large_string, des, sub);
puts(t); // use sub or t;
}
// do not use `t` here, invalid pointer.
Keep size in mind: a VLA lives on the stack, so if code is returning large substrings, it is best to malloc() a buffer and oblige the calling code to free it when done.
Is there any way that I can return the piece of string without copying it
You're right that if you want to use the chunks in conjunction with any of the many C functions that expect to work with null-terminated character arrays, then you have to make copies. Otherwise, adding the terminators modifies the original string.
If you're prepared to handle the chunks as fixed-length, unterminated arrays, however, then you can represent them without copying as a combination of a pointer to the first character and a length. Some standard library functions work with user-specified string lengths, thus supporting operations on such segments without null termination. You would need to be very careful with them, however.
If you take that approach, I would recommend colocating the pointer and length in a structure. For example,
struct string_segment {
char *start;
size_t length;
};
You could declare variables of this type, pass and return objects of this type, and create compound literals of this type without any dynamic memory allocation, thus avoiding opening any avenue for memory leakage.
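A sketch of that approach. The answer's struct uses `char *start`; this version uses `const char *` because the segment never modifies the original, and `segment_of()`, `segment_print()` and `segment_equals()` are hypothetical helpers. Length-aware operations such as the `%.*s` precision in printf() and memcmp() work on the segment directly, without a terminator.

```c
#include <stdio.h>
#include <string.h>

struct string_segment {
    const char *start;   /* points into the original, unmodified string */
    size_t length;       /* number of chars; start[length] is NOT '\0' */
};

/* Builds a segment without copying, clamped to the end of s. */
struct string_segment segment_of(const char *s, size_t start, size_t length)
{
    size_t total = strlen(s);
    if (start > total)
        start = total;
    if (length > total - start)
        length = total - start;
    return (struct string_segment){ s + start, length };   /* compound literal */
}

/* "%.*s" prints at most seg.length chars, so no terminator is needed. */
void segment_print(struct string_segment seg)
{
    printf("%.*s\n", (int)seg.length, seg.start);
}

/* Compares a segment with an ordinary null-terminated string. */
int segment_equals(struct string_segment seg, const char *lit)
{
    size_t n = strlen(lit);
    return n == seg.length && memcmp(seg.start, lit, n) == 0;
}
```

The segment stays valid only as long as the original string does, which is the price of not copying.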
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Returning dynamically-allocated objects does not automatically create a memory leak -- it merely confers a responsibility on the caller to free the allocated memory. It is when the caller fails to either satisfy that responsibility or pass it on to other code that a memory leak occurs. Several standard library functions indeed do return dynamically-allocated objects, and it's not so unusual in third-party libraries. The canonical example (other than malloc() itself) would probably be the POSIX-standard strdup() function.
If your function returns a pointer to a dynamically-allocated object -- whether a copied string, or a chunk definition structure -- then it should document the responsibility to free that falls on callers. You must ensure that you satisfy your obligation when you call it from your own code, but having clearly documented the function's behavior, you cannot take responsibility for errors other callers may make by failing to fulfill their obligations.

Weird situation when returning char *

I am pretty new to C programming and I have several functions returning type char *
Say I declare char a[some_int];, and I fill it later on. When I attempt to return it at the end of the function, it will only return the char at the first index. One thing I noticed, however, is that it will return the entirety of a if I call any sort of function on it prior to returning it. For example, my function to check the size of a string (calling something along the lines of strLength(a);).
I'm very curious what the situation is with this exactly. Again, I'm new to C programming (as you probably can tell).
EDIT: Additionally, if you have any advice concerning the best method of returning this, please let me know. Thanks!
EDIT 2: For example:
I have char ret[my_strlen(a) + my_strlen(b)]; in which a and b are strings and my_strlen returns their length.
Then I loop through filling ret using ret[i] = a[i]; and incrementing.
When I call my function that prints an input string (as a test), it prints out how I want it, but when I do
return ret;
or even
char *ptr = ret;
return ptr;
it never supplies me with the full string, just the first char.
One way that does not work is to return a chunk of char data in memory temporarily allocated on the stack during the execution of your function; after the function returns, that memory has (most probably) already been reused for another purpose.
A working alternative is to allocate the chunk of memory on the heap. Make sure you read up on and understand the difference between stack and heap memory! The malloc() family of functions is your friend if you choose to return your data in a chunk of memory allocated on the heap (see man malloc).
char* a = (char*) malloc(some_int * sizeof(char)) should help in your case. Make sure you don't forget to free up memory once you don't need it any more.
char* ret = (char*) malloc((my_strlen(a) + my_strlen(b) + 1) * sizeof(char)) for the second example given; the + 1 makes room for the terminating '\0'. Again, don't forget to free it once the memory isn't used any more.
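A sketch of the second example done on the heap, with strlen() standing in for the question's my_strlen(); `concat()` is a hypothetical name. The important details are the `+ 1` for the terminator and the documented rule that the caller owns the result.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a new heap string holding a followed by b, or NULL if malloc()
   fails. The caller owns the result and must free() it. */
char *concat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *ret = malloc(la + lb + 1);     /* +1 for the terminating '\0' */
    if (ret == NULL)
        return NULL;
    memcpy(ret, a, la);
    memcpy(ret + la, b, lb + 1);         /* also copies b's '\0' */
    return ret;
}
```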
As MByD correctly pointed out, it is not forbidden in general to use stack memory to pass chunks of data into and out of functions. It works fine as long as the chunk is not allocated in the stack frame of the function that returns it.
In the scenario below, function b works on a chunk of memory allocated in the stack frame that was created when function a was entered, and that lives until a returns. So everything is fine even though no heap memory is involved.
void b(char input[]){
/* do something useful here */
}
void a(){
char buf[BUFFER_SIZE];
b(buf);
/* use data filled in by b here */
}
As yet another option, you may leave the memory allocation to the compiler by using a global variable (static storage, not the heap). I'd count this option in the last-resort category: if not handled properly, global variables are the main culprits behind problems with reentrancy and multithreaded applications.
Happy hacking and good luck on your learning C mission.

Return a pointer to a dynamically-allocated struct vs. requiring allocated memory from the calling function?

In C, it is possible for functions to return pointers to memory that that function dynamically-allocated and require the calling code to free it. It's also common to require that the calling code supplies a buffer to a second function, which then sets the contents of that buffer. For example:
struct mystruct {
int a;
char *b;
};
struct mystruct *get_a_struct(int a, char*b)
{
struct mystruct *m = malloc(sizeof(struct mystruct));
m->a = a;
m->b = b;
return m;
}
int init_a_struct(int a, char*b, struct mystruct *s)
{
int success = 0;
if (a < 10) {
success = 1;
s->a = a;
s->b = b;
}
return success;
}
Is one or the other method preferable? I can think of arguments for both: for the get_a_struct method the calling code is simplified because it only needs to free() the returned struct; for the init_a_struct method there is a very low likelihood that the calling code will fail to free() dynamically-allocated memory since the calling code itself probably allocated it.
It depends on the specific situation but in general supplying the allocated buffer seems to be preferable.
As mentioned by Jim, DLLs can cause problems if the called function allocates memory. That would be the case if you decide to distribute the code as a DLL and get_a_struct is exported to/is visible by the users of the DLL. Then the users have to figure out, hopefully from documentation, whether they should free the memory using free, delete or some other OS-specific function. Furthermore, even if they use the correct function to free the memory, they might be using a different version of the C/C++ runtime. This can lead to bugs that are rather hard to find. Check this Raymond Chen post or search for "memory allocation dll boundaries". The typical solution is to export your own free function from the DLL, so you will have the pair get_a_struct/release_a_struct.
On the other hand, sometimes only the called function knows the amount of memory that needs to be allocated. In this case it makes more sense for the called function to do the allocation. If that is not possible, say because of the DLL boundary issue, a typical albeit ugly solution is to provide a mechanism to query this information. For example, on Windows the GetCurrentDirectory function returns the required buffer size if you pass 0 and NULL as its parameters.
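Standard C has its own instance of this "ask for the size first" idiom: snprintf() with a NULL buffer and a size of 0 returns the number of characters the output would need, excluding the '\0'. A sketch using it, with the hypothetical `format_point()` as a function that allocates on the caller's behalf:

```c
#include <stdio.h>
#include <stdlib.h>

/* Two-call size query: the first snprintf() only measures, the second
   writes. Returns a malloc()ed string the caller must free(), or NULL. */
char *format_point(int x, int y)
{
    int needed = snprintf(NULL, 0, "(%d, %d)", x, y);
    if (needed < 0)
        return NULL;                           /* encoding error */
    char *buf = malloc((size_t)needed + 1);    /* +1 for '\0' */
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t)needed + 1, "(%d, %d)", x, y);
    return buf;
}
```

Across a DLL boundary, the same query could instead be exposed to the caller, who then allocates with its own runtime and passes the buffer in.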
I think that providing the already allocated struct as an argument is preferable, because in most cases you wouldn't need to call malloc/calloc in the calling code, and therefore wouldn't need to worry about freeing it. Example:
int init_struct(struct some_struct *ss, args...)
{
// init ss
}
int main()
{
struct some_struct foo;
init_struct(&foo, some_args...);
// free is not needed
}
Passing a pointer in is preferred, unless it's absolutely required that every object is a new object allocated from the heap for some logistical reason, e.g. it's going to be put into a linked list as a node and the linked-list handler will eventually destroy the elements by calling free, or some other situation where everything created here will be passed to free later on.
Note that "not calling malloc" is always the preferred solution if possible. Not only does calling malloc take some time, it also means that some place, you will have to call free on the allocated memory, and every allocated object takes several bytes (typically 12-40 bytes) of "overhead" - so allocating space for small objects is definitely wasteful.
I agree with other answers that passing the allocated struct is preferred, but there is one situation where returning a pointer may be preferred.
In case you need to explicitly release some resource at the end (close a file or socket, free some memory internal to the struct, join a thread, or anything else that would require a destructor in C++), I think it may be better to allocate internally and then return the pointer.
I think so because, in C, it implies a kind of contract: if you allocate your own struct, you shouldn't have to do anything to destroy it, and it will be cleaned up automatically at the end of the function. On the other hand, if you receive a dynamically allocated pointer, you feel compelled to call something to destroy it at the end, and that destroy_a_struct function is where you put the other cleanup tasks needed, alongside free.
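A sketch of that contract, assuming a hypothetical `struct logger` that owns an open FILE* in addition to its own memory. The destroy function is the single place where all cleanup happens; tmpfile() is used only to keep the example self-contained.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object that owns a resource besides its own memory. */
struct logger {
    FILE *fp;
    unsigned long lines;
};

struct logger *logger_create(void)
{
    struct logger *lg = malloc(sizeof *lg);
    if (lg == NULL)
        return NULL;
    lg->fp = tmpfile();          /* anonymous temporary file for the sketch */
    if (lg->fp == NULL) {
        free(lg);
        return NULL;
    }
    lg->lines = 0;
    return lg;
}

void logger_write(struct logger *lg, const char *msg)
{
    fprintf(lg->fp, "%s\n", msg);
    lg->lines++;
}

/* All cleanup lives here: close the file, then free the struct. */
void logger_destroy(struct logger *lg)
{
    if (lg == NULL)
        return;
    fclose(lg->fp);
    free(lg);
}
```

Because the caller received a pointer from logger_create(), the pairing with logger_destroy() is obvious, which is exactly the contract described above.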

C when to allocate and free memory - before function call, after function call...etc

I am working with my first straight C project, and it has been a while since I worked on C++ for that matter. So the whole memory management is a bit fuzzy.
I have a function that I created that will validate some input. In the simple sample below, it just ignores spaces:
int validate_input(const char *input_line, char** out_value){
int ret_val = 0; /*false*/
int length = strlen(input_line);
out_value =(char*) malloc(sizeof(char) * length + 1);
if (0 != length){
int number_found = 0;
for (int x = 0; x < length; x++){
if (input_line[x] != ' '){ /*ignore space*/
/*get the character*/
out_value[number_found] = input_line[x];
number_found++; /*increment counter*/
}
}
out_value[number_found + 1] = '\0';
ret_val = 1;
}
return ret_val;
}
Instead of allocating memory inside the function for out_value, should I do it before I call the function and always expect the caller to allocate memory before passing into the function? As a rule of thumb, should any memory allocated inside of a function be always freed before the function returns?
I follow two very simple rules which make my life easier.
1/ Allocate memory when you need it, as soon as you know what you need. This will allow you to capture out-of-memory errors before doing too much work.
2/ Every allocated block of memory has a responsibility property. It should be clear when responsibility passes through function interfaces, at which point responsibility for freeing that memory passes with the memory. This will guarantee that someone has a clearly specified requirement to free that memory.
In your particular case, you need to pass in a double char pointer if you want the value given back to the caller:
int validate_input (const char *input_line, char **out_value_ptr) {
: :
*out_value_ptr =(char*) malloc(length + 1); // sizeof(char) is always 1
: :
(*out_value_ptr)[number_found] = input_line[x];
: :
As long as you clearly state what's expected by the function, you could either allocate the memory in the caller or the function itself. I would prefer outside of the function since you know the size required.
But keep in mind you can allow for both options. In other words, if the function is passed a char** that points to NULL, have it allocate the memory. Otherwise it can assume the caller has done so:
if (*out_value_ptr == NULL)
*out_value_ptr =(char*) malloc(length + 1);
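Putting the pieces of this answer together, a corrected version of the question's function might look like the sketch below. It takes `char **`, allocates only when `*out_value_ptr` is NULL, and also fixes the terminator position (the original wrote it at `number_found + 1`). Returning 0 for empty input without allocating is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Copies input_line without spaces. If *out_value_ptr is NULL the function
   allocates the buffer and the caller must free() it; otherwise the caller's
   buffer must hold at least strlen(input_line) + 1 bytes.
   Returns 1 on success, 0 on empty input or allocation failure. */
int validate_input(const char *input_line, char **out_value_ptr)
{
    size_t length = strlen(input_line);
    if (length == 0)
        return 0;
    if (*out_value_ptr == NULL) {
        *out_value_ptr = malloc(length + 1);   /* sizeof(char) is always 1 */
        if (*out_value_ptr == NULL)
            return 0;
    }
    char *out = *out_value_ptr;
    size_t number_found = 0;
    for (size_t x = 0; x < length; x++)
        if (input_line[x] != ' ')              /* ignore spaces */
            out[number_found++] = input_line[x];
    out[number_found] = '\0';                  /* not number_found + 1 */
    return 1;
}
```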
You should free that memory before the function returns in your above example. As a rule of thumb you free/delete allocated memory before the scope that the variable was defined in ends. In your case the scope is your function so you need to free it before your function ends. Failure to do this will result in leaked memory.
As for your other question I think it should be allocated going in to the function since we want to be able to use it outside of the function. You allocate some memory, you call your function, and then you free your memory. If you try and mix it up where allocation is done in the function, and freeing is done outside it gets confusing.
The idea of whether the function/module/object that allocates memory should free it is somewhat of a design decision. In your example, I (personal opinion here) think it is valid for the function to allocate it and leave it up to the caller to free. It makes it more usable.
If you do this, you need to declare the output parameter differently (either as a reference in C++ style or as char** in C style). As defined, the pointer exists only locally, so the memory will be leaked.
A typical practice is to allocate memory outside for out_value and pass in the length of the block in octets to the function with the pointer. This allows the user to decide how they want to allocate that memory.
One example of this pattern is the recv function used in sockets:
ssize_t recv(int socket, void *buffer, size_t length, int flags);
Here are some guidelines for allocating memory:
Allocate only if necessary.
Huge objects should be dynamically allocated. Most implementations don't have enough local storage (stack, global / program memory).
Set up ownership rules for the allocated object. Owner should be responsible for deleting.
Guidelines for deallocating memory:
Delete if allocated; don't delete objects or variables that were not dynamically allocated.
Delete when not in use any more. See your object ownership rules.
Delete before program exits.
In this example you should be neither freeing nor allocating memory for out_value. It is typed as a char*, hence you cannot "return" the new memory to the caller of the function. In order to do that you need to take in a char**.
In this particular scenario the buffer length is unknown before the caller makes the call. Additionally making the same call twice will produce different values since you are processing user input. So you can't take the approach of call once get the length and call the second time with the allocated buffer. Hence the best approach is for the function to allocate the memory and pass the responsibility of freeing onto the caller.
First, this code example you give is not ANSI C. It looks more like C++. There is no "<<" operator in C that works as an output stream to something called "cout."
The next issue is that if you do not free() within this function, you will leak memory. You passed in a char * but once you assign that value to the return value of malloc() (avoid casting the return value of malloc() in the C programming language) the variable no longer points to whatever memory address you passed in to the function. If you want to achieve that functionality, pass a pointer to a char pointer char **, you can think of this as passing the pointer by reference in C++ (if you want to use that sort of language in C, which I wouldn't).
Next, whether you should allocate/free before or after a function call depends on the role of the function. You might have a function whose job it is to allocate and initialize some data and then return it to the caller, in which case it should malloc() and the caller should free(). However, if you are just doing some processing with a couple of buffers, you may prefer the caller to allocate and deallocate. In your case, since your "validate_input" function seems to do nothing more than copy a string without the spaces, you could just malloc() in the function and leave the free() to the caller. Then again, since this function simply allocates the same size as the whole input string, you might as well have the caller do all of it. It really depends on your usage.
Just make sure you do not lose pointers as you are doing in this example.
Some rough guidelines to consider:
Prefer letting the caller allocate the memory. This lets it control how/where that memory is allocated. Calling malloc() directly in your code means your function is dictating a memory policy.
If there's no way to tell how much memory may be needed in advance, your function may need to handle the allocation.
In cases where your function does need to allocate, consider letting the caller pass in an allocator callback that it uses instead of calling malloc directly. This lets your function allocate when it needs and as much as it needs, but lets the caller control how and where that memory is allocated.
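A sketch of the allocator-callback idea, with hypothetical names throughout (`alloc_fn`, `strip_spaces()`, `malloc_alloc()`, `struct arena`). The function decides how much memory it needs; the caller decides where it comes from: the heap via malloc_alloc(), or a fixed caller-owned buffer via arena_alloc(). The arena ignores alignment, which is acceptable here only because it hands out char buffers.

```c
#include <stdlib.h>
#include <string.h>

/* Caller-supplied allocator: returns size bytes or NULL; ctx is its state. */
typedef void *(*alloc_fn)(size_t size, void *ctx);

/* Copies input_line without spaces into memory obtained from alloc. */
char *strip_spaces(const char *input_line, alloc_fn alloc, void *ctx)
{
    size_t length = strlen(input_line);
    char *out = alloc(length + 1, ctx);
    if (out == NULL)
        return NULL;
    size_t n = 0;
    for (size_t i = 0; i < length; i++)
        if (input_line[i] != ' ')
            out[n++] = input_line[i];
    out[n] = '\0';
    return out;
}

/* Default allocator: forwards to malloc(); the caller later calls free(). */
void *malloc_alloc(size_t size, void *ctx)
{
    (void)ctx;
    return malloc(size);
}

/* Arena allocator: hands out slices of a caller-owned buffer; nothing to free. */
struct arena {
    char *base;
    size_t used, cap;
};

void *arena_alloc(size_t size, void *ctx)
{
    struct arena *a = ctx;
    if (size > a->cap - a->used)
        return NULL;                  /* arena exhausted */
    void *p = a->base + a->used;
    a->used += size;
    return p;
}
```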
