Is frequently reallocating memory for a string good practice? - c

I have a format string that I'm parsing, replacing its format specifiers with input arguments. Now I'm considering how to allocate memory for the result string after the arguments are substituted. I could allocate the result to be as long as the format string, but then substituting a string of arbitrary length in place of %s would require reallocating the result in some nondeterministic fashion, which would make it necessary to do some inelegant calculations in the code.
So I thought I could build the result string char by char, reallocating it each time, like:
/*** for loop traversing next chars in format string ***/
// if new char
str = realloc(str, sizeof(*str) +1);
// if %s
str = realloc(str, sizeof(*str) + strlen(in_str));
// if %d
str = realloc(str, sizeof(*str) + strlen(d_str));

Usually, internal library code that deals with the length of mutable strings/arrays/lists/whatever else does it in 2^n steps, i.e. when you have 4 bytes of memory and need 5, it actually allocates 8. This reduces the number of realloc() calls, which are expensive, to ~log(n).
However there could be other optimisations, depending on library.
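A minimal sketch of the power-of-two idea described above, not taken from any particular library. Both next_capacity and growth_steps are hypothetical helper names introduced here for illustration: the first rounds a requested size up to a power of two, the second counts how many doublings are needed to reach a given size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two >= needed (minimum 1).
 * Returns 0 if that power of two would not fit in a size_t. */
size_t next_capacity(size_t needed)
{
    size_t cap = 1;
    while (cap < needed) {
        if (cap > SIZE_MAX / 2)
            return 0;               /* doubling would overflow */
        cap *= 2;
    }
    return cap;
}

/* Hypothetical helper: number of doublings needed to grow a buffer
 * from `start` bytes to at least `needed` bytes. */
unsigned growth_steps(size_t start, size_t needed)
{
    unsigned steps = 0;
    if (start == 0)
        start = 1;
    while (start < needed) {
        if (start > SIZE_MAX / 2)
            break;                  /* cannot double any further */
        start *= 2;
        steps++;
    }
    return steps;
}
```

With these definitions, a request for 5 bytes is rounded up to 8, and growing from 16 bytes to 1024 bytes takes 6 doublings instead of roughly a thousand one-byte extensions.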

I won't comment about issues with the code that are adequately addressed in other answers.
Calling realloc for every individual act of extension of string isn't necessarily a poor practice. It looks as if it might perform badly, and to fix that, you could implement a scheme which grows the string in larger increments, less frequently. However, how do you know that realloc doesn't already do something of that sort internally? For instance you might think you're being clever by growing a 128 byte string to 256 bytes and then to 512, rather than one character at a time. But if those happen to be the only available malloc internal block sizes, then realloc cannot help but also step through the same sizes. Okay, what about the saving in the raw number of realloc calls that are made? But those are just replaced by invocations of the more clever string growing logic.
If the performance of this format-string-building loop matters, then profile it. Make a version that reduces the realloc operations and profile that, too. Compare it on all the target platforms you write for and on which the performance matters.
If the performance isn't critical, then optimize for properties like good structure, maintainability and code reuse. A function which builds a format string doesn't have to know about string memory management. It needs an abstract interface for working with dynamic strings. That interface can provide a nice function for changing a string in-place by adding a copy of another string at its tail. Functions other than just the format-string producer can use these string operations.
The string management code can decide, in one place, whether it calls realloc every time the length of a string changes, or whether it tracks the storage size separately from the length, and consequently reduces the number of realloc calls.
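One possible shape for such an interface is sketched below. The dstr type and the dstr_append_n, dstr_append and dstr_free functions are hypothetical names, not an existing library. This version tracks capacity separately from length and doubles the capacity when it runs out, but that policy lives in one function and could be swapped for a realloc on every call without touching the callers. It assumes the appended data does not point into the string being extended.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic string: length and capacity are tracked separately. */
typedef struct {
    char  *data;   /* NUL-terminated once non-NULL */
    size_t len;    /* characters in use, excluding the terminator */
    size_t cap;    /* bytes currently allocated */
} dstr;

/* Appends n bytes from src. Returns 0 on success, -1 on failure;
 * on failure the string is left unchanged. */
int dstr_append_n(dstr *s, const char *src, size_t n)
{
    size_t needed;

    if (n > (size_t)-1 - s->len - 1)
        return -1;                          /* size would overflow */
    needed = s->len + n + 1;                /* +1 for the terminator */

    if (needed > s->cap) {
        size_t new_cap = s->cap ? s->cap : 16;
        while (new_cap < needed) {
            if (new_cap > (size_t)-1 / 2) {
                new_cap = needed;           /* cannot double; take exact size */
                break;
            }
            new_cap *= 2;
        }
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return -1;                      /* old block is still valid */
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Appends a NUL-terminated string. */
int dstr_append(dstr *s, const char *src)
{
    return dstr_append_n(s, src, strlen(src));
}

/* Releases the storage and resets the string to empty. */
void dstr_free(dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

A caller starts from a zero-initialized dstr (for example `dstr s = {0};`), appends literal chunks and substituted arguments as it walks the format string, and calls dstr_free when done.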

Code like this:
str = realloc(str, sizeof(*str) +1);
is bad. If realloc fails it will return NULL but it will not free(str). In other words - memory leak. You need to assign the result of realloc to another pointer, then check for NULL and act accordingly.
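A minimal sketch of that pattern, using a hypothetical helper named grow_buffer. Note also that sizeof(*str) is the size of one char, which is always 1, not the current length of the string, so the desired new size has to be tracked by the caller and passed in. The sketch assumes new_size is nonzero.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to new_size bytes (new_size > 0).
 * Returns 0 on success. On failure returns -1 and leaves *buf pointing
 * at the original, still valid block, which the caller still owns. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;      /* nothing leaked: *buf can still be used or freed */
    *buf = tmp;
    return 0;
}
```

On failure the caller can decide whether to free the old block and give up, or to carry on with the data it already has.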
Whether it is good or bad practice to use many realloc calls depends on what you are trying to achieve, i.e. performance, maintainability, clarity, etc. The best advice is: Write the code the way you like it. Then profile it. Performance issues? No -> Be happy. Yes -> Rewrite the code with a focus on performance.

Related

C: What size should you allocate to a string array to be passed to strcpy, to be copied into.?

If I need to copy a string, src into the array, dest, using strcpy or strncpy, should I allocate an arbitrarily large sized array (like char dest[1024] for example) or should I calculate the size of src using something like strlen and then dynamically allocate a block using malloc (strlen(src) * sizeof(char)).
The first approach seems "easier", but wouldn't I be consuming more space than I need? And in some cases it might even fall short. On the other hand, the second approach seems more precise but tedious to do every time, and I would have to deallocate the memory every time.
Is this a matter of personal taste or is one of the above preferable over the other?
The simplest way is to use the strdup() function, which effectively merges the strlen(), malloc() and strcpy() calls into one. You will still need to call free() to release the allocated data in the same way.
It depends on several factors.
If you know the maximum size of the source string in advance and it's not too big (i.e. less than 1K or so) and the destination doesn't need to be used after the current function returns, then you can use a fixed size buffer.
If the source string could be arbitrarily large, or if you need to return the destination string from the function, then you should allocate memory dynamically. Note that if you use malloc (strlen(src) * sizeof(char)) that 1) sizeof(char) is always 1 so you can omit it, and 2) you didn't allocate space for the terminating null byte. So you would need malloc (strlen(src) + 1). Also, you can do the allocation and copying in a single operation using strdup if your system has that function.
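For systems without strdup, the allocate-and-copy step can be sketched as follows. copy_string is a hypothetical name used here to avoid clashing with a library-provided strdup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): returns a heap copy of src,
 * or NULL if the allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t n = strlen(src) + 1;     /* +1 for the terminating null byte */
    char *dest = malloc(n);
    if (dest != NULL)
        memcpy(dest, src, n);       /* copies the terminator as well */
    return dest;
}
```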
The second approach is more secure than the first one. The first will fail if you are dealing with a large string.
You should also take into consideration the END OF STRING character '\0' when allocating memory for your new string.

How do I know how much memory to realloc?

I have one question regarding design of my application.
Here's the pseudo code:
char* buffer_to_be_filled = (char*) malloc(somesize);
fill_the_buffer(buffer_to_be_filled);
free(buffer_to_be_filled);
The problem is that I don't know how much size fill_the_buffer requires.
I was thinking about the solution inside the fill_the_buffer function.
I could maybe realloc the space inside when needed; the problem is, is there any way to find out how much space I have available?
How is this usually solved? I think that the one who allocates the buffer should also realloc the buffer as well, right?
NOTE: I'm filling the buffer using the fread function, so I don't know how much space I will need.
Your function can't realloc the pointer that was passed to it, because realloc isn't guaranteed to return the same pointer it was passed (there might not be room to expand the block in place), and the caller would be left holding the old pointer. The typical solution is for the function to take a second argument specifying the size of the buffer, and to return an error code if the buffer is too small. Ideally, the error code will tell the user how big the buffer needs to be, so they can reallocate it themselves and call the function again. For example, from the man page for snprintf (which has this problem):
The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated.
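The return value described in that excerpt allows a two-pass approach: ask snprintf how long the output would be, allocate exactly that much, then format for real. The sketch below assumes a C99-conforming snprintf; format_pair and its "%s=%d" format are hypothetical, chosen only to have something to format.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats "<name>=<value>" into a newly allocated
 * string. Returns NULL on failure. The caller must free() the result. */
char *format_pair(const char *name, int value)
{
    /* First pass: with a NULL buffer and size 0, snprintf writes nothing
     * and returns the length the full output would have. */
    int n = snprintf(NULL, 0, "%s=%d", name, value);
    if (n < 0)
        return NULL;                /* encoding or output error */

    char *out = malloc((size_t)n + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: the buffer is now known to be large enough. */
    snprintf(out, (size_t)n + 1, "%s=%d", name, value);
    return out;
}
```

The cost is formatting everything twice, which is usually cheaper than repeated reallocation and much simpler than computing the length by hand.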
It seems that fill_the_buffer() function is in a much better position to know...
- how to dimension the buffer initially and/or
- when to re-alloc the buffer and by how much.
Therefore it may be appropriate to change the API:
char * fill_the_buffer()
or maybe
char * fill_the_buffer(size_t max_amount_caller_wants)
The caller of fill_the_buffer() would still be responsible for disposing of the buffer returned by the function, but the allocation and dimensioning would be left to the function's logic.
This approach generally follows the idea of leaving implementation details to the lower level, making the upper levels more readable.
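A sketch of what such an API could look like when the data comes from fread, as in the question. The FILE * and out_len parameters, the 4096-byte starting size and the doubling policy are all assumptions made for this illustration; the original answer only proposes the return type.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of fill_the_buffer(): reads everything remaining in
 * `in` into a buffer that the function allocates and grows itself.
 * Returns the buffer (the caller must free() it) and stores the number of
 * bytes read in *out_len. Returns NULL on allocation or read failure.
 * The result is raw bytes and is not NUL-terminated. */
char *fill_the_buffer(FILE *in, size_t *out_len)
{
    size_t cap = 4096;              /* assumed initial guess */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, in);
        if (len < cap)
            break;                  /* short read: end of file or error */

        /* Buffer is full: double it, keeping the old pointer on failure. */
        if (cap > (size_t)-1 / 2) {
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

The caller then only needs `char *data = fill_the_buffer(fp, &n);`, a NULL check, and a matching free(data).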
You must pass the buffer size into the fill_the_buffer function. If the buffer is not big enough, your function must return an error value (for example -1). On success, your function can return the count of bytes written. This method is common practice in C.
I have a suggestion, if you have no problem with allocating spare memory:
Allocate an initial size at the beginning of the program using malloc (try to make a good guess for this initial allocation). Then, in fill_the_buffer, you may either need more memory or not need all of the allocated memory. In the first case, you can allocate additional memory in steps of a suitable size (depending on your application and your available RAM, e.g. 10MB each time the buffer runs out), then resume filling the buffer until you need more memory, and repeat this while the buffer fills.
In the second case you can simply realloc to decrease the size of your buffer.
But take care when using realloc, especially when you want to increase the size of the buffer, because it can cause a big overhead (it may have to find a big enough region of free memory, copy all of the old data into the new region, and free the old one).

Allocating memory in C Function

It's a memory wastage vs. CPU utilization question.
If I want to merge 3 strings:
Approach 1: Should I take all string lengths (strlen) and then allocate.
char *s = malloc(strlen(s1)+strlen(s2)+strlen(s3)+1);
OR
Approach 2: I should assume 1025 and allocate considering the fact that I know the strings will never go beyond 1025.
#define MAX 1025
char *s = malloc(MAX);
Please suggest.
Allocating memory for all 3 strings is better.
But if you are 100% sure (this is very important) that the strings will never exceed the fixed length, then go ahead. You have to foresee whether you may add things in the future that may exceed the max length, and you have to consider whether user input can in any way overflow the limit.
In case you may not need everything, you can also allocate a fixed buffer and truncate the rest of the string if it is too long.
If the string has the potential to grow very long (hundreds of MB), then you shouldn't use this approach. In this case, the solution depends on your application.
Your malloc line looks fine.
The problem with your other alternative, assuming a fixed length, is that you may change your mind later about the fixed length without changing all of the corresponding places in your code.
Approach 2: I should assume 1025 and allocate considering the fact that I know the strings will never go beyond 1025.
Yes, this one definitely. But you must be 100% sure. Always prefer automatic-storage allocation when you can.
Frankly, if you're going to opt for 2, then I'd suggest stack allocation (instead of malloc):
#define MAX 1025
char s[MAX];
I am assuming that the sum total of all of the strings will be <= 1024?
In that case, you had better allocate memory for 1024 (approach 2). You can reuse this memory over and over.
The problem with your approach 1 is that you have to allocate memory anew based on the total for that particular instance. This is going to cost you CPU cycles, if you are concerned about that.
If you really really know that the total size will never go above 1025 or some similar reasonably small value, if possible by all means allocate the string on the stack instead.
OTOH, if you're going to use malloc, you might as well do the little bit of extra work to figure out how much you need to allocate up front. In that case, if you go the strlen() route and are worried about efficiency, then at least store the results of the strlen() calls, and use memcpy() to build the combined result string instead of the str*() functions.
How often do you need to perform this operation, how many target strings do you need to hold in memory at once? Chances are it all doesn't matter. However, put it into a function char *my_strconcat3 (const char *s1, const char *s2, const char *s3); that returns the new string. Then you have exactly one location in your code where you need to change it, if the circumstances ever change.
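A minimal sketch of such a function, combining the suggestions above: the strlen() results are stored once and memcpy() does the copying. The body is an illustration written for this signature, not code from the original answers.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated concatenation of s1, s2 and s3,
 * or NULL if the allocation fails. The caller must free() the result. */
char *my_strconcat3(const char *s1, const char *s2, const char *s3)
{
    /* Measure each string once and reuse the lengths below. */
    size_t n1 = strlen(s1);
    size_t n2 = strlen(s2);
    size_t n3 = strlen(s3);

    char *s = malloc(n1 + n2 + n3 + 1);     /* +1 for the terminator */
    if (s == NULL)
        return NULL;

    memcpy(s, s1, n1);
    memcpy(s + n1, s2, n2);
    memcpy(s + n1 + n2, s3, n3 + 1);        /* also copies s3's '\0' */
    return s;
}
```

If the allocation strategy ever needs to change, for example to a fixed buffer, this is the only place that has to be edited.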
In C99, if the string is not too big, I like this:
len = strlen(s1)+strlen(s2)+strlen(s3)+1;
char s[len];
This is called a variable-length array. It is typically allocated on the stack, which is more efficient than malloc, which allocates memory from the heap.

Time- and memory-efficient way to allocate memory for a string

I'm reading a file into memory in C by copying the bytes to a dynamic array. Currently, I realloc() one byte larger each time a new byte comes in. This seems inefficient.
Some suggest (I can't remember where) that doubling the memory each time more is needed is good because it's O(log n) allocation time, with the only expense of a worst case of just under half of the memory being unused.
Any recommendations on memory allocation?
If you are loading the whole file into a string you could probably use the method outlined in this question. This way you can get the size of the file in bytes and allocate your string to hold that (Don't forget the extra byte for the null character).
However, if you are dynamically growing a string, it is better to increase its size by more than a single byte at a time (reallocating a string for each byte is going to be very slow, especially if the string has to be allocated in a new area of memory and then copied over). Since you are reading a file, doubling is probably very reasonable. I've seen people use other methods to do this as well, for example:
I've seen people round to the next power of 2, for example 2, 4, 8, then 16 bytes (which is essentially doubling the size of the buffer each time).
I've also seen people use a value that's more suited for the strings they intend to read, ie. 100 bytes at a time.
If you over allocate the string you could always get that memory back at the end with a final reallocation to the exact size you need.
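That final trimming step might look like the sketch below. shrink_to_fit is a hypothetical helper name; `used` is the number of bytes to keep, including the terminating null byte if the buffer holds a string.

```c
#include <stdlib.h>

/* Hypothetical helper: trims an over-allocated buffer down to `used` bytes.
 * If the shrinking realloc fails, the original (larger) block is still
 * valid, so it is returned unchanged rather than treated as an error. */
char *shrink_to_fit(char *buf, size_t used)
{
    if (used == 0)
        return buf;     /* avoid realloc(buf, 0), whose behavior varies */

    char *tmp = realloc(buf, used);
    return tmp != NULL ? tmp : buf;
}
```

Because the returned pointer may differ from the one passed in, the caller should always continue with the return value, as in `buf = shrink_to_fit(buf, len + 1);`.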
Do what some suggest (increase the size of the buffer by a multiplicative factor each time you need more room). I've done this many times and it works well. If you don't like the factor of two, you can use something else. I've used Phi (the golden ratio) to good effect.
I don't have a cite for this in front of me, and it is probably an implementation-specific detail, but I believe that power-of-2 resizing is what is used to grow C++ STL string objects as characters are continually added. (It should be easy to verify this by calling the string::capacity method as characters are added.)

Determining realloc() behaviour before calling it

As I understand it, when asked to reserve a larger block of memory, the realloc() function will do one of three different things:
if free contiguous block exists
    grow current block
else if sufficient memory
    allocate new memory
    copy old memory to new
    free old memory
else
    return null
Growing the current block is a very cheap operation, so this is behaviour I'd like to take advantage of. However, if I'm reallocating memory because I want to (for example) insert a char at the start of an existing string, I don't want realloc() to copy the memory. I'll end up copying the entire string with realloc(), then copying it again manually to free up the first array element.
Is it possible to determine what realloc() will do? If so, is it possible to achieve in a cross-platform way?
realloc()'s behavior is likely dependent on its specific implementation. And basing your code on that would be a terrible hack which, to say the least, violates encapsulation.
A better solution for your specific example is:
Find the size of the current buffer
Allocate a new buffer (with malloc()), greater than the previous one
Copy the prefix you want to the new buffer
Copy the string in the previous buffer to the new buffer, starting after the prefix
Release the previous buffer
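The steps above can be sketched as follows, assuming both the buffer and the prefix are NUL-terminated strings, so that strlen() gives the sizes. prepend is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new heap string holding prefix followed
 * by old, and frees old. On allocation failure returns NULL and leaves
 * old untouched, so the caller still owns it. */
char *prepend(char *old, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t olen = strlen(old);              /* size of the current contents */

    char *fresh = malloc(plen + olen + 1);  /* new, larger buffer */
    if (fresh == NULL)
        return NULL;

    memcpy(fresh, prefix, plen);            /* prefix goes first */
    memcpy(fresh + plen, old, olen + 1);    /* old string after it, with '\0' */
    free(old);                              /* release the previous buffer */
    return fresh;
}
```

Each byte of the old string is copied exactly once, compared with up to twice when realloc() moves the block and the contents then have to be shifted to make room at the front.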
As noted in the comments, case 3 in the question (no memory) is wrong; realloc() will return NULL if there is no memory available [question now fixed].
Steve McConnell in 'Code Complete' points out that if you save the return value from realloc() in the only copy of the original pointer when realloc() fails, you've just leaked memory. That is:
void *ptr = malloc(1024);
...
if ((ptr = realloc(ptr, 2048)) == 0)
{
    /* Oops - cannot free original memory allocation any more! */
}
Different implementations of realloc() will behave differently. The only safe thing to assume is that the data will always be moved - that you will always get a new address when you realloc() memory.
As someone else pointed out, if you are concerned about this, maybe it is time to look at your algorithms.
Would storing your string backwards help?
Otherwise...
just malloc() more space than you need, and when you run out of room, copy to a new buffer. A simple technique is to double the space each time; this works pretty well because the larger the string (i.e. the more time copying to a new buffer takes), the less often it needs to occur.
Using this method you can also right-justify your string in the buffer, so it's easy to add characters to the start.
If obstacks are a good match for your memory allocation needs, you can use their fast growing functionality. Obstacks are a feature of glibc, but they are also available in the libiberty library, which is fairly portable.
No - and if you think about it, it can't work. Between you checking what it's going to do and actually doing it, another process could allocate memory.
In a multi-threaded application this can't work. Between you checking what it's going to do and actually doing it, another thread could allocate memory.
If you're worried about this sort of thing, it might be time to look at the data structures you're using to see if you can fix the problem there. Depending on how these strings are constructed, you can do so quite efficiently with a well designed buffer.
Why not keep some empty buffer space in the left of the string, like so:
char* buf = malloc(1024);
char* start = buf + 1024 - 3;
start[0]='t';
start[1]='o';
start[2]='\0';
To add "on" to the beginning of your string to make it "onto\0":
start-=2;
if(start < buf)
    DO_MEMORY_STUFF(start, buf); //time to reallocate!
start[0]='o';
start[1]='n';
This way, you won't have to keep copying your buffer every single time you want to do an insertion at the beginning.
If you have to do insertions at both the beginning and end, just have some space allocated at both ends; insertions in the middle will still need you to shuffle elements around, obviously.
A better approach is to use a linked list. Have each of your data objects allocated on a page, and allocate another page and have a link to it, either from the previous page or from an index page. This way you know when the next alloc fails, and you never need to copy memory.
I don't think it's possible in a cross-platform way.
Here is the code for the uClibc implementation, which might give you a clue how to do it in a platform-dependent way. Actually, it would be better to find the glibc source, but this one was at the top of the Google search :)
