This doesn't seem to have any problems running but I'm thinking when it returns, buff might automagically be freed which would in turn free *string and cause problems the next time I allocate and that memory gets overwritten (or worse, etc etc). I don't want to test by trial and error because I may simply have been lucky in my tests so far. Am I doing this wrong?
void strCat1000(char *concatDest, char *format, ...)
{
char buff[1000];
va_list arg_ptr;
va_start(arg_ptr, format);
vsnprintf(buff, sizeof(buff), format, arg_ptr);
va_end(arg_ptr);
free(concatDest);
concatDest=buff;
}
More concisely, is concatDest pointing to freed memory after this function returns?
concatDest=buff;
}
This last statement will not change the observable behavior of the program and is very likely to be just optimized out by any decent compiler.
Remember C passes arguments by value and you are not modifying the original pointer but a local copy in your function.
concatDest=buff;
Not a good idea, because when the function returns, all its local variables are destroyed. So if you try to access data (outside this function) that was stored in buff, whose address was assigned to concatDest, the behavior is undefined and you may well get a segmentation fault.
You should be:
taking a length parameter
passing concatDest to vsnprintf
...but then you're just left with snprintf.
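Following that advice, here is a minimal sketch, assuming the caller owns the destination buffer and passes its size. The destSize parameter and the int return value are additions for illustration, not part of the original code. Note that the body really does collapse into a thin wrapper around vsnprintf, which is the point made above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch: the caller owns concatDest and tells us how big it is.
 * vsnprintf writes straight into the caller's buffer, truncating if needed.
 * Returns what vsnprintf returns: the length the full output would have had
 * (not counting the terminator), or a negative value on an encoding error. */
int strCat1000(char *concatDest, size_t destSize, const char *format, ...)
{
    va_list arg_ptr;
    int written;

    va_start(arg_ptr, format);
    written = vsnprintf(concatDest, destSize, format, arg_ptr);
    va_end(arg_ptr);
    return written;
}
```

For example, `char out[64]; strCat1000(out, sizeof out, "%d items", n);` leaves the text in a buffer the caller owns, with nothing to free() and nothing dangling.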
buff is allocated on the stack, and assigning concatDest to its address won't work since the memory it points to goes away as soon as the stack is popped (when you return).
More concisely: Yes, the memory has been freed. It may not have been overwritten (so it might appear to work for a little bit) but it will be very soon.
buff is a local variable, so it is invalid (popped off the stack) when the function returns. But then so is concatDest, so there's no problem -- you just have a local variable that points at a local variable and both go away at the same time.
Now what you might be thinking of is if you call this function like:
strCat1000(string, "some format", some other args);
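but in this case, string is not affected by changes to concatDest in the function, because arguments are passed by value in C. So the assignment concatDest=buff has no effect outside the function at all: it prints into a temp buffer and then destroys that buffer, without changing string. The one thing that does reach the caller is free(concatDest): it releases the memory that string points to, so after the call string is a dangling pointer and must not be used or freed again.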
Related
In this question I'm referring to another StackOverflow question because I don't have enough points to comment on it.
See: Valgrind Reports Invalid Realloc
In short: Why must the instruction buffer = trim(buffer); be written outside the function? Why can't it be written inside the function as phrase = (char*)realloc(phrase, strlen(phrase)+1);?
In depth: Assuming I pass any pointer to a function, such as const char *str, then by executing str++ I can create a side effect and change the starting position of the string.
But why can't I just assign another value to the pointer inside the function through dynamic memory management functions such as malloc and realloc?
Why can't I just set str = (char*) realloc(str, strlen(str) + 1 * sizeof(char));?
What makes this side effect different from the other one?
Couldn't I hypothetically just move str wherever I want to?
C passes all arguments by value. Look at the parameters of a called function like local variables of this function, initialized by copies of the arguments given by the caller.
We can "emulate" pass by reference, if we pass a pointer. By dereferencing the pointer, we can access the "referenced" object living outside the called function. But this pointer is still passed by value, meaning it is a copy of the argument, initializing the parameter.
Note: The references of C++ and other languages are nothing more than pointers under the hood, with additional semantics on top. You might want to look at the generated machine code.
So you can do anything you want with that pointer in the parameter: overwrite it, increment or decrement it, even set it to NULL. This has no effect on the source of the pointer in the caller.
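To make the pass-by-value point concrete, here is a small sketch with two hypothetical helpers, reseat_copy and reseat_caller: the first changes only its own copy of the pointer, while the second receives the address of the caller's pointer and therefore changes the caller's variable.

```c
#include <stddef.h>

/* Changes only the local copy: the caller's pointer is untouched. */
void reseat_copy(const char *str)
{
    str++;          /* moves this function's copy only */
    (void)str;
}

/* Receives the address of the caller's pointer, so it can move it. */
void reseat_caller(const char **str)
{
    (*str)++;       /* the caller's pointer now skips one character */
}
```

After `const char *s = "abc"; reseat_copy(s);` s still points at 'a'; after `reseat_caller(&s);` it points at 'b'.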
The problem of the question you link can be boiled down to:
char* called(char* pointer)
{
return realloc(pointer, /* some irrelevant value */);
}
void caller(void)
{
char* buffer = malloc(/* some irrelevant value */);
/* ignore returned pointer */ called(buffer);
free(buffer); /* here Valgrind reports an error */
}
We need to differentiate multiple cases for realloc() here.
realloc() returns NULL, because there is not enough memory to satisfy the request. The former address is still valid.
realloc() returns the same address, because it can satisfy the request this way. Since returned and former address are equal, both are valid.
realloc() returns a new address. The former address is now invalid.
Of these, the third case is the common one, and it leads to the documented issue.
Because buffer in caller() is not changed by called(), simply because called() cannot access it, it still holds the former address. Now when free() is called with this invalid address, the error is detected.
To correct this error, caller() needs to use the returned value. The Right Way™ to do this is:
void caller(void)
{
char* buffer = malloc(/* some irrelevant value */);
char* new_buffer = called(buffer);
if (new_buffer != NULL) {
buffer = new_buffer;
} else {
/* handle the re-allocation error, the address in buffer is still valid */
}
free(buffer);
}
An alternative is to pass the address of buffer to called() and let it update buffer correctly. But this kind of indirection often makes the code less readable. However, for a convenience function you might decide to go this route.
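If you do go the pointer-to-pointer route, a minimal sketch might look like this. resize_buffer is a hypothetical name; it assumes new_size is nonzero (realloc() with a size of 0 is best avoided, since its behavior varies between implementations), and it leaves the caller's pointer intact when realloc() fails.

```c
#include <stdlib.h>

/* Hypothetical convenience wrapper: resizes *buffer on the caller's behalf.
 * On success *buffer is updated and 0 is returned. On failure *buffer is
 * left unchanged (still valid, still owned by the caller) and -1 is returned. */
int resize_buffer(char **buffer, size_t new_size)
{
    char *new_buffer = realloc(*buffer, new_size);
    if (new_buffer == NULL) {
        return -1;          /* old block is still valid */
    }
    *buffer = new_buffer;   /* the caller's variable now holds the new address */
    return 0;
}
```

The caller then simply writes `if (resize_buffer(&buffer, size) != 0) { /* handle error */ }` and later calls free(buffer), which is always the current, valid address.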
I have a function like this:
void readString(char* str){
str="asd";
}
Can I know if str will be dealloced? Or must I free it?
Note: I can't use string library as I am programming a microprocessor.
free() must only be called if malloc(), calloc() or realloc() was used to allocate memory. This is not the case in the posted code, so calling free() is not required.
The "asd" is a string literal and exists for the lifetime of the program (has static storage duration).
Your function does nothing.
It doesn't "read" a string. All it does is assign the address of a string literal (a constant block of memory somewhere that is initialized to the text of the string) to the function's local variable str. The function then exits, causing that local variable to stop existing.
Nothing is returned, and the pointer is not de-referenced (which would in turn be wrong since it's only a char *, not a char * *), so nothing happens outside the function. The caller doesn't "get" any value, and thus has nothing to call free() on, so that problem can never even occur.
The string will not be deallocated, because it is stored in static memory. You didn't allocate it, so you don't free it.
No, there is no memory leak. In your case it is statically allocated.
In general you have to make up your own rules about who can or must free memory, and you should document your code so it is clear what the requirements are.
In the example given, readString() only overwrites its own private copy of the pointer, and when it returns the caller will not see that anything has changed. Consequently the caller will have the same duty to free() its pointer as it had before it called readString(), and there will be no leak.
However, if readString() instead accepted a char **, so that it could modify the caller's copy of the pointer, then the outcome would be that it would not be legal to call free() after calling readString(), as the pointer's new value is not part of the malloc heap.
If the previous value of that pointer variable had been a malloc()ed object, then the caller should have freed it before allowing the pointer to be overwritten. It would be truly horrible to have readString() call free() in that case, because it would turn a variable which must eventually be freed into one which must never be freed, and the program flow would be very hard to follow.
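For the char ** variant described above, here is a sketch of what such a readString might look like (the const qualifier is an addition, since string literals must not be modified). After the call, the caller's pointer refers to the literal, so it must never be passed to free().

```c
/* Hypothetical char ** variant: hands the caller a pointer to a literal.
 * The literal has static storage duration, so the pointer stays valid for
 * the whole program, but the caller must NOT free() it or write through it. */
void readString(const char **str)
{
    *str = "asd";
}
```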
This code is useless and meaningless as far as I am concerned. Here are different ways of calling your function definition, and why I say this!
int main (int argc, char *argv[], char *envp[])
{
char a, *b, *c;
b = malloc (10);
readString(&a); // Case-1, Valid calling.
readString(b); // Case-2, Valid calling.
readString(c); // Case-3, Invalid calling. Unallocated location.
}
Case-1: This is the only case where what you do in your function matters to the caller. You may use the passed character as you wish; the only meaningful assignment would be something like the one below. Copying a whole string into it instead (for example strcpy(str, "asd")) would write past that single char, which would probably dump core or corrupt the caller's stack or data segment (if the address of a global variable was passed), creating a complicated debugging nightmare!
void readString(char* str){
*str='a';
}
Case-2: There is no fatal error or syntax error in the code, but it is meaningless. The only meaningful thing would be to use whatever the caller passed to your function. What is the point of assigning to the passed parameter like this? Your definition could just use a local variable and avoid parameter passing completely. That function can be called as "readString();"...
void readString(void){
    const char *str = "asd";
}
Throughout the programs I inherited from my predecessors, there are functions of the following format:
somefunc(some_type some_parameter, char ** msg)
In other words, the last parameter is a char **, which is used to return messages.
That is: somefunc() will "change" msg.
In some cases the changing in question is of the form:
sprintf(txt,"some text. Not fixed but with a format and variables etc");
LogWar("%s",txt); //call to some logging function that uses txt
*msg = strdup(txt);
I know that each call to strdup() should have a related call to free() to release the memory it allocated.
Since that memory is used to return something, it should obviously not be freed at the end of somefunc().
But then where?
If somefunc() is called multiple times with the same msg, then that pointer will move around, I presume. So the space allocated by the previous call will be lost, right?
Somewhere before the end of the program I should certainly free(*msg). (In this case *msg is the version that is used as parameter in the calls to somefunc().)
But I think that call would only release the last allocated memory, not the memory allocated in earlier calls to somefunc(), right?
So, am I correct in saying that somefunc() should look like this:
sprintf(txt,"some text. Not fixed like here, but actually with variables etc");
LogWar("%s",txt); //call to some logging function that uses txt
free(*msg); //free up the memory that was previously assigned to msg, since we will be re-allocating it immediately hereafter
*msg = strdup(txt);
So with a free() before the strdup().
Am I correct?
Yes, you're correct. Any old pointer returned from strdup() must be free()d before you overwrite it, or you will leak memory.
I'm sure you were being simple for clarity, but I would of course vote for something like this:
const char * set_error(char **msg, const char *text)
{
    free(*msg);
    *msg = strdup(text);
    return *msg;
}
and then:
LogWar("%s",txt); //call to some logging function that uses txt
set_error(msg, txt);
See how I used encapsulation to make this pretty important sequence more well-defined, and even named?
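Here is a compilable sketch of the same encapsulation, with two assumptions worth stating: the message pointer must start out as NULL (free(NULL) is a no-op, but free() on an uninitialized pointer is undefined), and the copy is made with malloc() and memcpy() instead of strdup(), since strdup() is POSIX and only became standard C in C23. Making the copy before freeing also keeps a call like set_error(&msg, msg) safe.

```c
#include <stdlib.h>
#include <string.h>

/* Replaces *msg with a fresh heap copy of text.
 * Returns the new message, or NULL if allocation failed (in which case the
 * old message, if any, is left in place and still owned by the caller). */
const char *set_error(char **msg, const char *text)
{
    size_t len = strlen(text) + 1;   /* include the terminator */
    char *copy = malloc(len);

    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, len);         /* copy first: text may alias *msg */
    free(*msg);                      /* free(NULL) is a no-op */
    *msg = copy;
    return copy;
}
```

The caller then needs exactly one free(msg) when it is done with the last message, no matter how many times set_error() was called.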
I'm writing some functions that manipulate strings in C and return extracts from the string.
What are your thoughts on good styles for returning values from the functions.
Referring to Steve McConnell's Code Complete (section 5.8 in 1993 edition) he suggests I use the following format:
void my_function ( char *p_in_string, char *p_out_string, int *status )
The alternatives I'm considering are:
Return the result of the function (option 2) using:
char* my_function ( char *p_in_string, int *status )
Return the status of the function (option 3) using:
int my_function ( char *p_in_string, char *p_out_string )
In option 2 above I would be returning the address of a local variable from my_function but my calling function would be using the value immediately so I consider this to be OK and assume the memory location has not been reused (correct me on this if I'm wrong).
Is this down to personal style and preference, or should I be considering other issues?
Option 3 is pretty much the unspoken(?) industry standard. If an I/O-based C function that returns an integer returns a nonzero value, it almost always means that the I/O operation failed. You might want to refer to this Wikibook's section on return values in C/C++.
The reason that people use 0 for success is that there is only one condition of success. If it returns nonzero, you look up what the nonzero value means in terms of errors: perhaps 1 means it couldn't allocate memory, 2 means the argument was invalid, 3 means there was some kind of I/O error. In practice you wouldn't return a bare 1; you'd return a named constant such as XXX_ERR_COULD_NOT_MALLOC.
Also, never return addresses of local (automatic) variables. Unless the memory came from malloc() (or is static), it stops being valid as soon as the function returns. Read the link for more info.
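A sketch of that convention, where the enum constants and the xxx_prefix function are hypothetical names chosen only to illustrate zero-for-success status codes (here combined with a malloc()ed result that the caller frees):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes: exactly one success value, 0. */
enum xxx_status {
    XXX_OK = 0,
    XXX_ERR_COULD_NOT_MALLOC = 1,
    XXX_ERR_INVALID_ARGUMENT = 2
};

/* Hypothetical extract: copies the first n characters of in into a fresh
 * malloc()ed string stored in *out. On XXX_OK the caller must free(*out);
 * on any error *out is left untouched. */
enum xxx_status xxx_prefix(const char *in, size_t n, char **out)
{
    char *copy;

    if (in == NULL || out == NULL || n > strlen(in)) {
        return XXX_ERR_INVALID_ARGUMENT;
    }
    copy = malloc(n + 1);
    if (copy == NULL) {
        return XXX_ERR_COULD_NOT_MALLOC;
    }
    memcpy(copy, in, n);
    copy[n] = '\0';
    *out = copy;
    return XXX_OK;
}
```

A caller can then write `if (xxx_prefix(s, 3, &p) != XXX_OK) { /* report error */ }` and switch on the specific code only when it cares why.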
In option 2 above I would be returning the address of a local variable from my_function but my calling function would be using the value immediately so I consider this to be OK and assume the memory location has not been reused (correct me on this if I'm wrong).
I'm sorry, but you're wrong. Go with Steve McConnell's method, or the last method. (By the way, in the first method status must be an int *, as written above, not a plain int.)
You're forgiven for thinking you'd be right, and it could work for the first 99,999 times you run the program, but the 100,000th time is the kicker. Once my_function returns, its stack frame is free to be reused by the very next function call (or by a signal or interrupt handler), and since reading it is undefined behavior, nothing guarantees what you will find there. You can't rely on that memory being untouched when you get to it.
Better to be safe than sorry.
The second option is problematic because you have to get memory for the result string, so you either use a static buffer (which possibly causes several problems) or you allocate memory, which in turn can easily cause memory leaks since the calling function has the responsibility to free it after use, something that is easily forgotten.
There is also option 4,
char* my_function ( char *p_in_string, char* p_out_string )
which simply returns p_out_string for convenience.
A safer way would be:
int my_function(const char* p_in_string, char* p_out_string, unsigned int max_out_length);
The function returns a status, so that it can be checked immediately, as in
if( my_function(....) )
and the caller allocates the memory for the output, because:
the caller will have to free it, and that is best done at the same level
the caller knows how it handles memory allocation in general; the function does not
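A minimal sketch of that signature, assuming a hypothetical extract (the first whitespace-delimited word of the input) and assuming that 0 means success, so that if (my_function(...)) enters the error branch:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical extract: copies the first whitespace-delimited word of
 * p_in_string into the caller-supplied p_out_string.
 * Returns 0 on success, nonzero if the word plus its terminator does not
 * fit in max_out_length bytes (p_out_string is then left untouched). */
int my_function(const char *p_in_string, char *p_out_string, unsigned int max_out_length)
{
    size_t start = 0;
    size_t len = 0;

    /* skip leading whitespace */
    while (p_in_string[start] != '\0' && isspace((unsigned char)p_in_string[start])) {
        start++;
    }
    /* measure the word */
    while (p_in_string[start + len] != '\0' && !isspace((unsigned char)p_in_string[start + len])) {
        len++;
    }
    if (len + 1 > max_out_length) {
        return 1;   /* output buffer too small */
    }
    memcpy(p_out_string, p_in_string + start, len);
    p_out_string[len] = '\0';
    return 0;
}
```

Because the caller supplies the buffer, `char word[32]; if (my_function(line, word, sizeof word)) { /* too long */ }` needs no free() at all.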
void my_function ( char *p_in_string, char *p_out_string, int *status )
char* my_function ( char *p_in_string, int *status )
int my_function ( char *p_in_string, char *p_out_string )
In all cases, the input string should be const, unless my_function is explicitly being given permission to write, for example, temporary terminating zeros or markers into the input string.
The second form is only valid if my_function calls malloc or some variant to allocate the buffer. It's not safe in any C/C++ implementation to return pointers to local (stack-scoped) variables. Of course, when my_function calls malloc itself, there is a question of how the allocated buffer is freed.
In some cases, the caller is given the responsibility for releasing the buffer - by calling free(), or, to allow different layers to use different allocators, via a my_free_buffer(void*) that you publish. A further frequent pattern is to return a pointer to a static buffer maintained by my_function - with the proviso that the caller should not expect the buffer to remain valid after the next call to my_function.
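Here is a sketch of the published-release-function pattern, where the library allocates and also exports the matching release function. my_function_alloc and my_free_buffer are hypothetical names, and the plain copy stands in for whatever extract the real function computes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: returns a newly allocated result, or NULL on failure.
 * The caller must release it with my_free_buffer(), not with its own free(). */
char *my_function_alloc(const char *p_in_string)
{
    size_t len = strlen(p_in_string) + 1;
    char *result = malloc(len);

    if (result != NULL) {
        memcpy(result, p_in_string, len);
    }
    return result;
}

/* Published release function: always matches the allocator used above,
 * even if the library and the caller are built against different heaps. */
void my_free_buffer(void *buffer)
{
    free(buffer);
}
```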
In all the cases where a pointer to an output buffer is passed in, it should be paired with the size of the buffer.
The form I most prefer is
int my_function(char const* pInput, char* pOutput,int cchOutput);
This returns 0 on failure, or the number of characters copied into pOutput on success, with cchOutput being the size of pOutput to prevent my_function from overrunning the pOutput buffer. If pOutput is NULL, it returns exactly the number of characters pOutput needs, including the space for a null terminator of course.
// This is one easy way to call my_function if you know the output is <1024 characters
char szFixed[1024];
int cch1 = my_function(pInput,szFixed,sizeof(szFixed)/sizeof(char));
// Otherwise you can call it like this in two passes to find out how much to alloc
int cch2 = my_function(pInput,NULL,0);
char* pBuf = malloc(cch2);
if (pBuf != NULL) {
    my_function(pInput,pBuf,cch2);
    /* ... use pBuf ... */
    free(pBuf);
}
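The answer doesn't show the body of my_function, so here is one possible sketch of that calling convention. The upper-casing copy is a hypothetical stand-in for the real work, and the sketch assumes the counts include the null terminator in both modes, which is what the two-pass malloc(cch2) call relies on.

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Hypothetical body for the convention described above: copies pInput into
 * pOutput, upper-casing letters. Both modes count the terminator. */
int my_function(char const *pInput, char *pOutput, int cchOutput)
{
    size_t needed = strlen(pInput) + 1;   /* characters plus the null terminator */
    size_t i;

    if (needed > (size_t)INT_MAX) {
        return 0;                         /* result size would not fit in an int */
    }
    if (pOutput == NULL) {
        return (int)needed;               /* size query: how big pOutput must be */
    }
    if (cchOutput < 0 || (size_t)cchOutput < needed) {
        return 0;                         /* failure: buffer too small */
    }
    for (i = 0; i < needed; i++) {        /* also copies the terminator */
        pOutput[i] = (char)toupper((unsigned char)pInput[i]);
    }
    return (int)needed;                   /* characters written, terminator included */
}
```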
2nd Style:
Don't assume that memory will not be reused. The next function call (or a signal handler) can overwrite that stack space, and you are left with nothing but garbage.
I prefer option 3. This is so I can do error checking for the function inline, i.e. in if statements. Also, it gives me the scope to add an additional parameter for string length, should that be needed.
int my_function(char *p_in_string, char **p_out_string, int *p_out_string_len)
Regarding your option 2:
If you return a pointer to a local variable, that has been allocated on the stack, the behavior is undefined.
If you return a pointer to some piece of memory you allocated yourself (malloc, calloc, ...), this would be safe (but ugly, as you might forget free()).
I vote for option 3:
It allows you to manage memory outside of my_function(...) and you can also return some status code.
I would say option 3 is the best to avoid memory management issues. You can also do error checking using the status integer.
There's also a point to consider if your function is time-critical. On most architectures, it's faster to use the return value than to use a reference pointer.
I had the case when using the function return value I could avoid memory accesses in an inner loop, but using the parameter pointer, the value was always written out to memory (the compiler doesn't know if the value will be accessed via another pointer somewhere else).
With some compilers you can even apply attributes that describe the return value, which can't be expressed on pointer parameters.
With a function like strlen, for instance, some compilers know that between two calls of strlen, if neither the pointer nor the string was changed, the same value will be returned, and thus avoid calling the function again.
In GNU C you can give a function the attribute pure or even const (when appropriate), which tells the compiler how its return value may be reused; there is no equivalent for a value written through a reference parameter.
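A sketch of the GNU C attribute in question (GCC and Clang accept it; it is not standard C). count_char is a hypothetical function: declaring it pure tells the compiler the result depends only on the arguments and the memory they point to, with no side effects, so repeated calls with unchanged inputs may be merged.

```c
#include <stddef.h>

/* GCC/Clang extension: no side effects, result depends only on the
 * arguments and on memory reachable from them. */
size_t count_char(const char *s, char c) __attribute__((pure));

size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (*s == c) {
            n++;
        }
    }
    return n;
}
```

With this declaration, `count_char(p, 'a') + count_char(p, 'a')` may be compiled as a single call, as long as nothing in between could have modified the string; a version that wrote its result through a size_t * parameter could not carry the same promise.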
Say you have the following function:
char *getp()
{
char s[] = "hello";
return s;
}
Since the function is returning a pointer to a local variable in the function to be used outside, will it cause a memory leak?
P.S. I am still learning C so my question may be a bit naive...
[Update]
So, if you want to return a new char[] array (e.g. for a substring function), what exactly should you return? Should it be a pointer to an external variable, i.e. a char[] that is not local to the function?
It won't cause a memory leak. It'll cause a dangling reference. The local variable is allocated on the stack and will be freed as soon as it goes out of scope. As a result, when the function ends, the pointer you are returning no longer points to a memory you own. This is not a memory leak (memory leak is when you allocate some memory and don't free it).
[Update]:
To be able to return an array allocated in a function, you should allocate it outside stack (e.g. in the heap) like:
char *test() {
    char* arr = malloc(100);
    if (arr != NULL) {   /* malloc can fail */
        arr[0] = 'M';
    }
    return arr;
}
Now, if you don't free the memory in the calling function after you finished using it, you'll have a memory leak.
No, it won't leak, since it's destroyed after getp() ends.
It will result in undefined behaviour, because now you have a pointer to a memory area that no longer holds what you think it does, and that can be reused by anyone.
A memory leak would happen if you stored that array on the heap, without executing a call to free().
#include <stdlib.h>
#define N 100 /* example size */
char* getp(){
    char* p = malloc(N);
    //do stuff to p
    return p;
}
int main(){
    char* p = getp();
    //free(p) No leak if this line is uncommented
    return 0;
}
Here, the memory p points to is not destroyed when getp() returns, because it's not on the stack but on the heap. However, since free(p) is never called, the program never releases that memory, which is a memory leak (even though the operating system reclaims it once the process dies).
[UPDATE]
If you want to return a new c-string from a function, you have two options.
Store it on the heap (as in the example above, or like this real example that returns a duplicated string);
Pass a buffer parameter
for example:
//doesn't exactly answer your update question, but probably a better idea.
size_t foo (const char* str, size_t strleng, char* newstr);
Here, you'd have to allocate memory somewhere for newstr (could be stack OR heap) before calling the foo function. In this particular case, it would return the number of characters in newstr.
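A sketch of how foo might be implemented, assuming (this is not from the original answer) that it writes the first strleng characters of str into newstr in reverse order, and that the caller supplies at least strleng + 1 bytes:

```c
#include <stddef.h>

/* Hypothetical body for foo(): writes the first strleng characters of str
 * into the caller-supplied newstr in reverse order and terminates it.
 * newstr must have room for strleng + 1 bytes.
 * Returns the number of characters written, not counting the terminator. */
size_t foo(const char *str, size_t strleng, char *newstr)
{
    size_t i;

    for (i = 0; i < strleng; i++) {
        newstr[i] = str[strleng - 1 - i];
    }
    newstr[strleng] = '\0';
    return strleng;
}
```

Either storage works for newstr: `char buf[6]; foo("hello", 5, buf);` on the stack, or a malloc(6) block that the caller later frees.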
It's not a memory leak, because the memory is being released properly.
But it is a bug. You have a pointer to unallocated memory. It is called a dangling reference and is a common source of errors in C. The results are undefined. You won't see any problems until run-time, when you try to use that pointer.
Auto variables are destroyed at the end of the function call; you can't return a pointer to them. What you're doing could be described as "returning a pointer to the block of memory that used to hold s, but now is unused (but might still have something in it, at least for now) and that will rapidly be filled with something else entirely."
It will not cause a memory leak, but it will cause undefined behavior. This case is particularly dangerous because the pointer will point somewhere in the program's stack, and if you use it, you will be accessing random data. Such a pointer, when written through, can also be used to compromise program security and make it execute arbitrary code.
No-one else has yet mentioned another way that you can make this construct valid: tell the compiler that you want the array "s" to have "static storage duration" (this means it lives for the life of the program, like a global variable). You do this with the keyword "static":
char *getp()
{
static char s[] = "hello";
return s;
}
Now, the downside of this is that there is now only one instance of s, shared between every invocation of the getp() function. With the function as you've written it, that won't matter. In more complicated cases, it might not do what you want.
PS: The usual kind of local variables have what's called "automatic storage duration", which means that a new instance of the variable is brought into existence when the function is called, and disappears when the function returns. There's a corresponding keyword "auto", but it's implied anyway if you don't use "static", so you almost never see it in real world code.
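A sketch of that downside, using a hypothetical int_to_text function with a static buffer: two calls return the same address, so the second call overwrites the text the first caller is still holding.

```c
#include <stdio.h>

/* Hypothetical: formats n into a buffer with static storage duration.
 * Every call returns the SAME address, so each call overwrites the result
 * of the previous one (and the function is not thread-safe). */
char *int_to_text(int n)
{
    static char s[16];   /* one instance for the whole program */

    snprintf(s, sizeof s, "%d", n);
    return s;
}
```

So `printf("%s %s\n", int_to_text(1), int_to_text(2));` cannot print two different numbers: both arguments point at the same buffer.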
I've deleted my earlier answer after putting the code in a debugger and watching the disassembly and the memory window.
The code in the question is invalid and returns a reference to stack memory, which will be overwritten.
This slightly different version, however, returns a reference to fixed memory, and works fine:
char *getp()
{
char* s = "hello";
return s;
}
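Since modifying a string literal is undefined behavior, a slightly safer spelling of this version returns const char *, so the compiler rejects accidental writes through the result. A minimal sketch:

```c
/* The literal "hello" has static storage duration, so returning its
 * address is fine; const makes it explicit that callers must not write
 * through the returned pointer (or free() it). */
const char *getp(void)
{
    return "hello";
}
```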
s is a stack variable, so it is automatically deallocated at the end of the function. However, your pointer won't be valid and will refer to an area of memory that could be overwritten at any point.