Memory leak with strings? - c

I'm new to C, so this may be obvious, but I'm still not sure. Java took care of this for me ^^
I have a table of replacements, an input string, and a function str_replace which does some work on the string. str_replace internally calls malloc to get space for the new string (it returns a newly allocated char*).
char* color_tags(char* s) {
char* out = s;
// in real, the table is much longer
static char* table[4][2] = {
{"<b>", BOLD},
{"<u>", UNDERLINE},
{"</b>", BOLD_R},
{"</u>", UNDERLINE_R},
};
for(int r=0; r<4; r++) {
// here's what bothers me
out = str_replace(table[r][0], table[r][1], out);
}
return out;
}
As you can see, the char* out is replaced by pointer to the new string, so the old string apparently ends up as a memory leak - if I don't understand it totally wrong.
What would be a better way for this?

[This is more of a comment than an answer — abacabadabacaba has already posted the answer — but I hope it will clarify things a bit.]
I would argue that the memory leak is in this statement:
str_replace internally calls malloc to get space for the new string […]
[emphasis mine] Memory management is such a fundamental concern in C that if a function allocates memory that it itself doesn't de-allocate, then that is a major property of the function, and one that needs to be documented up-front, together with information about what the caller is supposed to do about it. It should not be considered "internal" to the function, and you shouldn't have to read the entirety of the function's source-code in order to determine it. It's enough to make me suspicious of the rest of the function (and indeed, a quick glance at that function is enough to notice a lot of issues: its parameter-types should be const char * rather than char *; it should check the return-value of malloc; it could be made more efficient by keeping track of the tail of new_subject, or cleaner by using strcat, instead of the current worst-of-both-worlds; etc.).
You didn't write str_replace originally, but you can modify your own version, so you should change its documentation from this:
Search and replace a string with another string , in a string
to something like this:
Creates and returns a copy of subject, but with all occurrences of the substring search replaced by replace. The returned string is newly allocated using malloc; the caller should use free.
(Your color_tags function will need similar documentation, since it too returns a newly-allocated string using malloc.)
That documentation in hand, there's a clear chain of "ownership": the caller of str_replace takes ownership of the string it returns. So color_tags has to call free for every string returned by str_replace, except the string that color_tags itself will return (which in turn will be "owned" by the caller of color_tags). Hence abacabadabacaba's answer.

The code leaks 3 strings in total: one after each of the iterations except the last one. The solution is to deallocate each of those strings after its use. The code may look like this:
for(int r=0; r<4; r++) {
char* new_out = str_replace(table[r][0], table[r][1], out);
if (r>0) {
// out is an intermediate value which will never be used again, free it
free(out);
}
out = new_out;
}
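To make the ownership chain concrete, here is a minimal sketch of the whole thing. The str_replace below is a hypothetical stand-in written for this example (the real one is not shown in the question), and the BOLD/UNDERLINE values are assumed ANSI escape sequences. It frees each intermediate string as soon as the next one exists, and never frees the caller's input:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed placeholder values; the question does not show the real macros. */
#define BOLD        "\x1b[1m"
#define BOLD_R      "\x1b[22m"
#define UNDERLINE   "\x1b[4m"
#define UNDERLINE_R "\x1b[24m"

/* Hypothetical stand-in for the asker's str_replace.
 * Creates and returns a copy of subject with all occurrences of search
 * replaced by replace. The result is allocated with malloc; the caller
 * must free it. Returns NULL if the allocation fails. */
char *str_replace(const char *search, const char *replace, const char *subject)
{
    assert(search != NULL && *search != '\0');
    size_t search_len = strlen(search);
    size_t replace_len = strlen(replace);

    /* First pass: count matches so the result can be sized exactly. */
    size_t count = 0;
    for (const char *p = strstr(subject, search); p != NULL;
         p = strstr(p + search_len, search)) {
        count++;
    }

    size_t result_len = strlen(subject) - count * search_len + count * replace_len;
    char *result = malloc(result_len + 1);
    if (result == NULL) {
        return NULL;
    }

    /* Second pass: copy the text before each match, then the replacement. */
    char *out = result;
    const char *p;
    while ((p = strstr(subject, search)) != NULL) {
        size_t prefix = (size_t)(p - subject);
        memcpy(out, subject, prefix);
        out += prefix;
        memcpy(out, replace, replace_len);
        out += replace_len;
        subject = p + search_len;
    }
    strcpy(out, subject); /* remaining tail, including the terminator */
    return result;
}

/* Returns a newly malloc'd string; the caller must free it.
 * The input s is never freed or modified. */
char *color_tags(const char *s)
{
    static const char *const table[][2] = {
        {"<b>", BOLD},
        {"<u>", UNDERLINE},
        {"</b>", BOLD_R},
        {"</u>", UNDERLINE_R},
    };
    const size_t rows = sizeof table / sizeof table[0];

    char *out = NULL;        /* the intermediate string we own, if any */
    const char *current = s; /* what the next replacement reads from */
    for (size_t r = 0; r < rows; r++) {
        char *next = str_replace(table[r][0], table[r][1], current);
        free(out);           /* free(NULL) is a no-op on the first pass */
        if (next == NULL) {
            return NULL;
        }
        out = next;
        current = out;
    }
    return out;
}
```

A caller would then write something like char *colored = color_tags(line); use it, and finish with free(colored);. Starting out as NULL avoids the special case for r > 0, because free(NULL) is defined to do nothing.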

Related

A C function that returns a char array Vs a function working with 2 char arrays

I'm a C beginner so my apologies if this doubt is too obvious.
What would be considered the most efficient way to solve this problem: Imagine that you have a char array ORIG and after working with it, you should have a new char array DEST. So, if I wanted to create a function for this goal, what would the best approach be:
A function that takes only one char array parameter ( argument ORIG ) and returning a char array DEST or
A void function that takes two char array arguments and does its job changing DEST as wished?
Thanks!
This very much depends on the nature of your function.
In your first case, the function has to allocate storage for the result (or return a pointer to some static object, but this wouldn't be thread-safe). This can be the right thing to do, e.g. for a function that duplicates a string, like POSIX' strdup(). This also means the caller must use free() on the result when it is no longer needed.
The second case requires the caller to provide the storage. This is often the idiomatic way to do these things in C, because in this case, the caller could just write
char result[256];
func(result, "bla bla");
and therefore use an automatic object to hold the result. It also has the benefit that you can use the actual return value to signal errors.
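As a rough sketch of that second style, assuming a made-up function to_upper_copy (the name, the size parameter and the 0/-1 return convention are all my own choices for illustration), the callee writes into the caller's buffer and uses the return value only to report errors:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example function: writes an upper-cased copy of src into
 * dest. Returns 0 on success, or -1 if dest (dest_size bytes) is too small
 * to hold the result and its terminator. Nothing is allocated. */
int to_upper_copy(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL);

    size_t len = strlen(src);
    if (len + 1 > dest_size) {
        return -1; /* refuse rather than overrun the caller's buffer */
    }
    for (size_t i = 0; i < len; i++) {
        dest[i] = (char)toupper((unsigned char)src[i]);
    }
    dest[len] = '\0';
    return 0;
}
```

The caller can then write char result[256]; followed by if (to_upper_copy(result, sizeof result, "bla bla") != 0) { /* handle error */ }. Passing the buffer size along is what keeps this style from turning into a buffer overrun.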
Both are valid ways of doing it, but I'd suggest using the latter, since it means you can load the contents into any block of memory, while a returned array will have to be on the heap, and be freed, by design.
Again, both are valid ways of doing things, and this is just a guideline. What should be done usually depends on the situation.
It depends,
If you know that the length of DEST will be the same as the length of ORIG, I would go for the second approach, because then you won't have to dynamically allocate memory for DEST inside the function (and remember to free it outside the function).
If the length is different, you have to dynamically allocate memory, and you can do so in two ways:
1. Like your first approach: to return an array from a function in C, you have to allocate a new array and return its address (a pointer).
2. The function can receive two arguments: ORIG, and a double pointer to RES. Because the function receives a double pointer, it can allocate an array inside and return it via the argument.
Option 1 is the "cleaner" way in terms of code, and easier to use in terms of user experience (the user being the caller).
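A small sketch of the double-pointer variant, in case it helps. The function reverse_into and its 0/-1 return convention are invented for the example; the point is only that the result comes back through the char ** argument, which leaves the return value free for an error code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: allocates a reversed copy of orig and stores its
 * address in *res. Returns 0 on success, -1 on allocation failure (in which
 * case *res is left untouched). The caller must free(*res) when done. */
int reverse_into(const char *orig, char **res)
{
    assert(orig != NULL && res != NULL);

    size_t len = strlen(orig);
    char *buf = malloc(len + 1);
    if (buf == NULL) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        buf[i] = orig[len - 1 - i];
    }
    buf[len] = '\0';

    *res = buf; /* hand ownership to the caller through the argument */
    return 0;
}
```

Usage would be along the lines of char *res = NULL; if (reverse_into("abc", &res) == 0) { puts(res); free(res); }.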
Good luck!
In option 1 you will have to dynamically allocate (malloc) the output array. Which means you have a potential for a memory leak. In option 2 the output array is provided for you, so there is no chance of a leak, but there is a chance that the output array is not of sufficient size and you will get a buffer overrun when writing to it.
Both methods are acceptable. There might be a small performance difference between them, but it's really down to your choice.
Personally, being a cautious programmer, I would go for option 3:
/* Returns 0 on success, 1 on failure
Requires : inputSize <= outputSize
input != output
input != null
output != null
*/
int DoStuff (char* output, size_t outputSize, const char* input, size_t inputSize);
(Sorry if that's not proper C, it's been decades :) )
(Edited in accordance with Felix Palmen's points.)

return substring of string

I have a large string, where I want to use pieces of it but I don't want to necessarily copy them, so I figured I can make a structure that marks the beginning and length of the useful chunk from the big string, and then create a function that reads it.
struct descriptor {
int start;
int length;
};
So far so good, but when I got to writing the function I realized that I can't really return the chunk without copying into memory...
char* getSegment(char* string, struct descriptor d) {
char* chunk = malloc(d.length + 1);
strncpy(chunk, string + d.start, d.length);
chunk[d.length] = '\0';
return chunk;
}
So the questions I have are:
Is there any way that I can return the piece of string without copying it
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Answering your two questions:
No
The caller should provide buffer for the copied string
I would personally pass a pointer to the descriptor:
char* getSegment(const char* string, char *buff, struct descriptor *d)
Is there any way that I can return the piece of string without copying it
A string includes its terminating null character, so a pointer into the middle of a larger string is itself a string only when the piece you want is the tail. For any other piece, you cannot return a pointer to it and still have a proper string without copying.
how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Create temporary space with a variable length array (available since C99, optionally supported in C11). It is good until the end of the block, at which point the memory is released and should not be used further.
char* getSegment(char* string, struct descriptor d, char *dest) {
// form result in `dest`
return dest;
}
Usage
char *t;
{
struct descriptor des = bar();
char *large_string = foo();
char sub[des.length + 1u]; //VLA
t = getSegment(large_string, des, sub);
puts(t); // use sub or t;
}
// do not use `t` here, invalid pointer.
Recall size is of concern. If code is returning large sub-strings, best to malloc() a buffer and oblige the calling code to free it when done.
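For completeness, here is one way the body of that getSegment could be filled in. This is only a sketch: it assumes the descriptor really does lie inside the source string and that dest has room for d.length + 1 bytes, and I have made the source parameter const, which the version above does not:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct descriptor {
    int start;
    int length;
};

/* Copies the segment described by d out of string into dest and
 * null-terminates it. dest must have room for d.length + 1 bytes, and the
 * segment must lie inside string; neither is checked beyond the asserts.
 * Returns dest, so nothing is allocated and nothing needs freeing. */
char *getSegment(const char *string, struct descriptor d, char *dest)
{
    assert(string != NULL && dest != NULL);
    assert(d.start >= 0 && d.length >= 0);

    memcpy(dest, string + d.start, (size_t)d.length);
    dest[d.length] = '\0';
    return dest;
}
```

With this, the caller decides where the copy lives (a VLA, a fixed array, or malloc'd memory), so the question of who frees it is answered at the call site.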
Is there any way that I can return the piece of string without copying it
You're right that if you want to use the chunks in conjunction with any of the many C functions that expect to work with null-terminated character arrays, then you have to make copies. Otherwise, adding the terminators modifies the original string.
If you're prepared to handle the chunks as fixed-length, unterminated arrays, however, then you can represent them without copying as a combination of a pointer to the first character and a length. Some standard library functions work with user-specified string lengths, thus supporting operations on such segments without null termination. You would need to be very careful with them, however.
If you take that approach, I would recommend colocating the pointer and length in a structure. For example,
struct string_segment {
char *start;
size_t length;
};
You could declare variables of this type, pass and return objects of this type, and create compound literals of this type without any dynamic memory allocation, thus avoiding opening any avenue for memory leakage.
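Here is a short sketch of working with such segments without ever copying or terminating them. The helper names (make_segment, segment_equals, print_segment) are made up for this example, and I have used const char * for the start pointer on the assumption that the segments are read-only views:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct string_segment {
    const char *start; /* assumption: read-only view into the big string */
    size_t length;
};

/* Builds a view of length bytes starting at offset start of s.
 * No memory is allocated; the view is only valid while s is. */
struct string_segment make_segment(const char *s, size_t start, size_t length)
{
    assert(s != NULL);
    struct string_segment seg = { s + start, length };
    return seg;
}

/* Compares a segment against an ordinary null-terminated string. */
int segment_equals(struct string_segment seg, const char *text)
{
    return strlen(text) == seg.length
        && memcmp(seg.start, text, seg.length) == 0;
}

/* Prints a segment using printf's precision to bound the output,
 * since there is no terminator to stop at. */
void print_segment(struct string_segment seg)
{
    printf("%.*s\n", (int)seg.length, seg.start);
}
```

The "%.*s" conversion and memcmp are examples of the length-aware standard functions mentioned above. Functions such as strlen or strcmp must not be handed seg.start directly, because they would run on past the end of the segment.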
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Returning dynamically-allocated objects does not automatically create a memory leak -- it merely confers a responsibility on the caller to free the allocated memory. It is when the caller fails to either satisfy that responsibility or pass it on to other code that a memory leak occurs. Several standard library functions indeed do return dynamically-allocated objects, and it's not so unusual in third-party libraries. The canonical example (other than malloc() itself) would probably be the POSIX-standard strdup() function.
If your function returns a pointer to a dynamically-allocated object -- whether a copied string, or a chunk definition structure -- then it should document the responsibility to free that falls on callers. You must ensure that you satisfy your obligation when you call it from your own code, but having clearly documented the function's behavior, you cannot take responsibility for errors other callers may make by failing to fulfill their obligations.

Weird situation when returning char *

I am pretty new to C programming and I have several functions returning type char *
Say I declare char a[some_int];, and I fill it later on. When I attempt to return it at the end of the function, it will only return the char at the first index. One thing I noticed, however, is that it will return the entirety of a if I call any sort of function on it prior to returning it. For example, my function to check the size of a string (calling something along the lines of strLength(a);).
I'm very curious what the situation is with this exactly. Again, I'm new to C programming (as you probably can tell).
EDIT: Additionally, if you have any advice concerning the best method of returning this, please let me know. Thanks!
EDIT 2: For example:
I have char ret[my_strlen(a) + my_strlen(b)]; in which a and b are strings and my_strlen returns their length.
Then I loop through filling ret using ret[i] = a[i]; and incrementing.
When I call my function that prints an input string (as a test), it prints out how I want it, but when I do
return ret;
or even
char *ptr = ret;
return ptr;
it never supplies me with the full string, just the first char.
One approach that does not work is returning the data in memory that was temporarily allocated on the stack during the execution of your function: after the function returns, that memory has (most probably) already been reused for another purpose.
A working alternative would be to allocate the chunk of memory on the heap. Make sure you read up about and understand the difference between stack and heap memory! The malloc() family of functions is your friend if you choose to return your data in a chunk of memory allocated on the heap (see man malloc).
char* a = (char*) malloc(some_int * sizeof(char)) should help in your case. Make sure you don't forget to free the memory once you don't need it any more.
char* ret = (char*) malloc((my_strlen(a) + my_strlen(b) + 1) * sizeof(char)) for the second example given; the extra byte leaves room for the terminating '\0'. Again, don't forget to free the memory once it isn't used any more.
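Put together, a heap-based version of the concatenation from the question might look like the sketch below. The name concat is my own, and I use the standard strlen where the question uses its own my_strlen:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc'd string holding a followed by b, or NULL if the
 * allocation fails. The caller must free the result. */
char *concat(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t len_a = strlen(a);
    size_t len_b = strlen(b);

    /* +1 for the terminating '\0' */
    char *ret = malloc(len_a + len_b + 1);
    if (ret == NULL) {
        return NULL;
    }
    memcpy(ret, a, len_a);
    memcpy(ret + len_a, b, len_b + 1); /* also copies b's terminator */
    return ret;
}
```

Because ret points to heap memory rather than to a local array, it stays valid after concat returns, until the caller passes it to free.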
As MByD correctly pointed out, it is not forbidden in general to use memory allocated on the stack to pass chunks of data in and out of functions. As long as the chunk is not allocated on the stack frame of the function that is returning, this works quite well.
In the scenario below function b will work on a chunk of memory allocated on the stackframe created, when function a entered and living until a returns. So everything will be pretty fine even though no memory allocated on the heap is involved.
void b(char input[]){
/* do something useful here */
}
void a(){
char buf[BUFFER_SIZE];
b(buf);
/* use data filled in by b here */
}
As still another option, you may leave the memory allocation to the compiler by using a global variable, which has static storage duration rather than living on the heap or the stack. I'd count this option as a last resort at best: when not handled properly, global variables are the main culprits in problems with reentrancy and multithreaded applications.
Happy hacking and good luck on your learning C mission.

How to free memory of temporary string?

I am playing around with my custom string library which is terrible by the way, but I am doing it for experience.
Anyways, I have some functions that allocate a block of memory for String* to use and it works fine. All of the memory used is freed up when the string_delete(string*) function is called.
But I came up with a new way of representing char*s as String*s but I am afraid the memory I reserve for it is not being freed down the road. Here is the function:
String* String_ToN(char* dest) {
String* temp = calloc(1, sizeof (struct String));
temp->length = strlen(dest);
temp->buffer = (char*) malloc(temp->length + 1);
// copy length + 1 bytes so that the terminating '\0' is copied too
strncpy(temp->buffer, dest, temp->length + 1);
return temp;
}
I don't like using strdup because it is not standard C, so I'll stick with malloc and strncpy.
This works and what I use it for is something like this:
String_GetLength(String*) takes in a String* parameter, so if I put a string literal in when calling it I would get an error.
So instead I go String_GetLength(String_ToN("hello")) and it returns 5 like I expected it to.
But again in String_ToN I use calloc and malloc, how would I free this memory and still be able to use ToN?
Unlike in C++, there is no automatic resource management in C (because there are no destructors). You would have to do something like:
String *hello = String_ToN("hello");
int len = String_GetLength(hello);
String_free(hello);
where String_free does all the necessary cleanup.
You need a function to delete or release your String-s, perhaps
void String_delete(String *s) {
if (!s) return;
free (s->buffer);
// you might want memset(s, 0, sizeof(*s)); to catch more bugs
free (s);
}
You might want to zero (as in the commented code) the memory before free-ing it. It might help catching dangling pointers bugs. But you could use tools like valgrind to catch them. Alternatively, using the Boehm's garbage collector is extremely useful: you can use GC_malloc instead of malloc (etc...) and don't bother calling free or GC_free. You'll find out by experience that releasing memory becomes a major issue in big programming projects (and no, RAII idiom is not a silver bullet).
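As a self-contained sketch of the create/use/delete cycle: the layout of struct String below is an assumption (the question only shows the length and buffer members), and String_ToN here checks its allocations, which the original does not:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout; the question only shows these two members. */
typedef struct String {
    size_t length;
    char *buffer;
} String;

/* Wraps a copy of src in a newly allocated String.
 * Returns NULL if either allocation fails. Release with String_delete. */
String *String_ToN(const char *src)
{
    assert(src != NULL);

    String *temp = calloc(1, sizeof *temp);
    if (temp == NULL) {
        return NULL;
    }
    temp->length = strlen(src);
    temp->buffer = malloc(temp->length + 1);
    if (temp->buffer == NULL) {
        free(temp); /* do not leak the struct when the buffer fails */
        return NULL;
    }
    memcpy(temp->buffer, src, temp->length + 1); /* includes the '\0' */
    return temp;
}

size_t String_GetLength(const String *s)
{
    assert(s != NULL);
    return s->length;
}

/* Frees both allocations made by String_ToN. Accepts NULL. */
void String_delete(String *s)
{
    if (s == NULL) {
        return;
    }
    free(s->buffer);
    free(s);
}
```

The nested call String_GetLength(String_ToN("hello")) has no way to reach the temporary afterwards, which is exactly why it leaks. Keeping the pointer in a variable, as in String *hello = String_ToN("hello"); ... String_delete(hello);, is what makes the cleanup possible.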
As pointed out by Oli Charlesworth, you must create a temporary object. However, you could also add a flag
int dispose;
to your String structure and then set it when passing the String to some function. Then every function that gets your String must check this flag and, if it is set, free the String structure. The code might look like this:
Process_String(String_ToN("Hello", 1));
then
void Process_String(String *str) {
/* do smth with str */
if(str->dispose)
String_Delete(str);
}
I agree that this design is more error-prone and not how things are normally done, so consider it just an educational example, no more, no less.

converting char** to char* or char

I have an old program that uses some library function, and I don't have that library.
So I am rewriting that program using C++ libraries.
In the old code there is a function that is called like this:
*string = newstrdup("Some string goes here");
The string variable is declared as char **string;
What might that function named "newstrdup" be doing?
I have tried many things, but I don't know what it does. Can anyone help?
The function is used to make a copy of C strings. That's often needed to get a writable version of a string literal. String literals are themselves not writable, so such a function copies them into an allocated writable buffer. You can then pass them to functions that modify their argument, like strtok, which writes into the string it has to tokenize.
I think you can come up with something like this, since it is called newstrdup:
char * newstrdup(char const* str) {
char *c = new char[std::strlen(str) + 1];
std::strcpy(c, str);
return c;
}
You would be supposed to free it once done using the string using
delete[] *string;
An alternative way of writing it is using malloc. If the library is old, it may have used that, which C++ inherited from C:
char * newstrdup(char const* str) {
char *c = (char*) malloc(std::strlen(str) + 1);
if(c != NULL) {
std::strcpy(c, str);
}
return c;
}
Now, you are supposed to free the string using free when done:
free(*string);
Prefer the first version if you are writing with C++. But if the existing code uses free to deallocate the memory again, use the second version. Beware that the second version returns NULL if no memory is available for dup'ing the string, while the first throws an exception in that case. Another note should be taken about behavior when you pass a NULL argument to your newstrdup. Depending on your library that may be allowed or may be not allowed. So insert appropriate checks into the above functions if necessary. There is a function called strdup available in POSIX systems, but that one allows neither NULL arguments nor does it use the C++ operator new to allocate memory.
Anyway, I've searched Google Code Search for newstrdup functions and found quite a few. Maybe your library is among the results:
Google CodeSearch, newstrdup
There has to be a reason that they wrote a "new" version of strdup, so there must be a corner case that it handles differently. Perhaps a null string returns an empty string, for example.
litb's answer is a replacement for strdup, but I would think there is a reason they did what they did.
If you want to use strdup directly, use a define to rename it, rather than write new code.
The line *string = newstrdup("Some string goes here"); is not showing any weirdness to newstrdup. If string has type char ** then newstrdup is just returning char * as expected. Presumably string was already set to point to a variable of type char * in which the result is to be placed. Otherwise the code is writing through an uninitialized pointer.
newstrdup is probably making a new string that is a duplicate of the passed string; it returns a pointer to the string (which is itself a pointer to the characters).
It looks like he's written a strdup() function to operate on an existing pointer, probably to re-allocate it to a new size and then fill its contents. Likely, he's doing this to re-use the same pointer in a loop where *string is going to change frequently while preventing a leak on every subsequent call to strdup().
I'd probably implement that like string = redup(&string, "new contents") .. but that's just me.
Edit:
Here's a snip of my 'redup' function which might be doing something similar to what you posted, just in a different way:
int redup(char **s1, const char *s2)
{
size_t len, size;
if (s2 == NULL)
return -1;
len = strlen(s2);
size = len + 1;
*s1 = realloc(*s1, size);
if (*s1 == NULL)
return -1;
memset(*s1, 0, size);
memcpy(*s1, s2, len);
return len;
}
Of course, I should probably save a copy of *s1 and restore it if realloc() fails, but I didn't need to get that paranoid.
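For what it's worth, the "paranoid" variant is not much longer. This is a sketch of how it could look, not the code I actually use: it keeps realloc's result in a temporary so that *s1 still points at the old, valid string if the allocation fails, and it returns 0 instead of the length to keep the return type simple. It assumes s2 does not point into *s1:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replaces the string at *s1 with a copy of s2, reusing or growing the
 * existing allocation. *s1 must be NULL or a pointer obtained from
 * malloc/realloc, and s2 must not point into *s1.
 * Returns 0 on success. Returns -1 if s2 is NULL or the allocation fails;
 * in both cases *s1 is left unchanged and still valid. */
int redup(char **s1, const char *s2)
{
    assert(s1 != NULL);

    if (s2 == NULL) {
        return -1;
    }

    size_t size = strlen(s2) + 1;
    char *tmp = realloc(*s1, size);
    if (tmp == NULL) {
        return -1; /* old block is untouched, so nothing leaks */
    }
    memcpy(tmp, s2, size); /* size includes the terminator */
    *s1 = tmp;
    return 0;
}
```

Start with char *s = NULL; and call redup(&s, "...") as often as needed in the loop, followed by a single free(s) at the end.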
I think you need to look at what is happening with the "string" variable within the code as the prototype for the newstrdup() function would appear to be identical to the library strdup() version.
Are there any free(*string) calls in the code?
It would appear to be a strange thing to do, unless it's internally keeping a copy of the duplicated string and returning a pointer back to the same string again.
Again, I would ask why?

Resources