Allocating memory in a C function

It's a question of memory wastage vs. CPU utilization.
If I want to merge 3 strings:
Approach 1: Should I take all the string lengths (strlen) and then allocate?
char *s = malloc(strlen(s1)+strlen(s2)+strlen(s3)+1);
OR
Approach 2: I should assume 1025 and allocate considering the fact that I know the strings will never go beyond 1025.
#define MAX 1025
char *s = malloc(MAX);
Please suggest.

Allocating memory based on the lengths of all 3 strings is better.
But if you are 100% sure (this is very important) that the strings never exceed the fixed length, then go ahead. You have to foresee whether you may add things in the future that could exceed the maximum length, and you have to consider whether user input can in any way overflow the limit.
If you may not need the whole result, you can also allocate a fixed buffer and truncate the rest of the string if it is too long.
If the string has the potential to grow very long (hundreds of MB), then you shouldn't use this approach. In that case, the solution depends on your application.
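As a minimal sketch of the fixed-buffer-and-truncate option, assuming the three inputs are ordinary NUL-terminated strings, snprintf() can do the truncation and also report whether it happened. `concat3_truncate` is a hypothetical name used only for this example:

```c
#include <stdio.h>

/* Hypothetical helper: concatenates s1, s2 and s3 into the fixed-size
 * buffer dst (capacity dstsize bytes, including the terminating '\0').
 * snprintf never writes more than dstsize bytes and NUL-terminates when
 * dstsize > 0, so an over-long result is cut off rather than overflowing.
 * Returns 1 if the result was truncated, 0 if it fit, -1 on error. */
int concat3_truncate(char *dst, size_t dstsize,
                     const char *s1, const char *s2, const char *s3)
{
    int n = snprintf(dst, dstsize, "%s%s%s", s1, s2, s3);
    if (n < 0)
        return -1;                       /* encoding error */
    /* n is the length the full, untruncated result would have had */
    return (size_t)n >= dstsize ? 1 : 0;
}
```

For example, with `char buf[8]`, merging "abc", "def" and "ghi" leaves "abcdefg" in `buf` and returns 1.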

Your malloc line looks fine.
The problem with your other alternative, assuming a fixed length, is that you may later change your mind about the fixed length without changing all of the corresponding places in your code.

Approach 2: I should assume 1025 and allocate considering the fact that I know the strings will never go beyond 1025.
Yes, this one definitely. But you must be 100% sure. Always prefer automatic-storage allocation when you can.

Frankly, if you're going to opt for approach 2, then I'd suggest stack allocation (instead of malloc):
#define MAX 1025
char s[MAX];

I am assuming that the sum total of all of the strings will be <= 1024?
In this case, you had better allocate memory for 1024 characters (approach 2). You can reuse this memory over and over.
The problem with approach 1 is that you have to allocate new memory based on the total for each particular instance. This costs extra CPU cycles, if that is your concern.
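A minimal sketch of the reusable-buffer idea, assuming the 1024-character limit from the question: one caller-owned buffer is filled again on every merge, and input that would not fit is rejected instead of overflowing the buffer. `concat3_into` is a hypothetical helper name:

```c
#include <string.h>

#define MAX 1025 /* from the question: 1024 characters + '\0' */

/* Hypothetical helper: builds s1+s2+s3 into a caller-owned buffer of MAX
 * bytes that can be reused for every merge. Instead of silently
 * truncating, it refuses input that would not fit.
 * Returns 0 on success, -1 if the combined length exceeds MAX - 1. */
int concat3_into(char buf[MAX], const char *s1, const char *s2, const char *s3)
{
    size_t l1 = strlen(s1), l2 = strlen(s2), l3 = strlen(s3);
    if (l1 + l2 + l3 > MAX - 1)
        return -1;                     /* would overflow: leave buf alone */
    memcpy(buf, s1, l1);
    memcpy(buf + l1, s2, l2);
    memcpy(buf + l1 + l2, s3, l3);
    buf[l1 + l2 + l3] = '\0';
    return 0;
}
```

Declare `char buf[MAX];` once and call `concat3_into(buf, ...)` as often as needed; no allocation happens per merge.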

If you really really know that the total size will never go above 1025 or some similar reasonably small value, if possible by all means allocate the string on the stack instead.
OTOH, if you're going to use malloc, you might as well do the little bit of extra work to figure out up front how much you need to allocate. In this case, if you go the strlen() route and are worried about efficiency, then at least store the results of the strlen() calls, and use memcpy() to build the combined result string instead of the str*() functions.

How often do you need to perform this operation, and how many target strings do you need to hold in memory at once? Chances are it doesn't matter at all. However, put it into a function char *my_strconcat3 (const char *s1, const char *s2, const char *s3); that returns the new string. Then there is exactly one place in your code that you need to change if the circumstances ever change.
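One possible body for the `my_strconcat3()` function proposed above, combining it with the advice to store the strlen() results and copy with memcpy(). This is a sketch, not the answerer's own code:

```c
#include <stdlib.h>
#include <string.h>

/* Measures each string once, allocates exactly enough memory, and copies
 * with memcpy() so that no str*() call has to rescan the result.
 * The caller must free() the returned string. Returns NULL if malloc fails. */
char *my_strconcat3(const char *s1, const char *s2, const char *s3)
{
    size_t l1 = strlen(s1), l2 = strlen(s2), l3 = strlen(s3);
    char *s = malloc(l1 + l2 + l3 + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, s1, l1);
    memcpy(s + l1, s2, l2);
    memcpy(s + l1 + l2, s3, l3 + 1); /* +1 also copies s3's '\0' */
    return s;
}
```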

In C99, if the string is not too big, I like this:
size_t len = strlen(s1)+strlen(s2)+strlen(s3)+1;
char s[len];
This is called a variable-length array (VLA). It lives on the stack, which is more efficient than malloc, which allocates memory from the heap.

Related

Is frequent reallocating memory for string a good practice?

I have a format string that I'm parsing, replacing the format specifiers with input arguments. Now I'm considering how to allocate memory for the resulting string after the arguments are substituted. I could allocate the string as long as the format string, but then substituting another string of any length in place of %s would require reallocating the string in some nondeterministic fashion, which makes it necessary to do some inelegant calculations in the code.
So I thought I could build the result string from the format string char by char, reallocating it each time, like:
/*** for loop traversing next chars in format string ***/
// if new char
str = realloc(str, sizeof(*str) +1);
// if %s
str = realloc(str, sizeof(*str) + strlen(in_str));
// if %d
str = realloc(str, sizeof(*str) + strlen(d_str));
Library code that deals with the length of mutable strings/arrays/lists/whatever usually grows them in 2^n steps, i.e. when you have 4 bytes of memory and need 5, it actually allocates 8. This reduces the number of realloc() calls, which are expensive, to ~log(n).
However, there could be other optimisations, depending on the library.
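To make the ~log(n) claim concrete, this sketch only counts the realloc() calls two growth policies would make while a buffer grows one byte at a time; nothing is actually allocated, and `count_reallocs` is a hypothetical name:

```c
#include <stddef.h>

/* Counts how many realloc() calls a growth policy would make while a
 * buffer that starts at 1 byte grows one byte at a time up to n bytes.
 * Only capacities are tracked; no memory is allocated.
 *   doubling == 0: grow to exactly the needed size (one call per byte)
 *   doubling != 0: when full, double the capacity (~log2(n) calls) */
size_t count_reallocs(size_t n, int doubling)
{
    size_t capacity = 1, calls = 0;
    for (size_t needed = 2; needed <= n; needed++) {
        if (needed > capacity) {
            capacity = doubling ? capacity * 2 : needed;
            calls++;
        }
    }
    return calls;
}
```

For n = 1024, exact-size growth needs 1023 calls, while doubling needs 10.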
I won't comment about issues with the code that are adequately addressed in other answers.
Calling realloc for every individual act of extension of string isn't necessarily a poor practice. It looks as if it might perform badly, and to fix that, you could implement a scheme which grows the string in larger increments, less frequently. However, how do you know that realloc doesn't already do something of that sort internally? For instance you might think you're being clever by growing a 128 byte string to 256 bytes and then to 512, rather than one character at a time. But if those happen to be the only available malloc internal block sizes, then realloc cannot help but also step through the same sizes. Okay, what about the saving in the raw number of realloc calls that are made? But those are just replaced by invocations of the more clever string growing logic.
If the performance of this format-string-building loop matters, then profile it. Make a version that reduces the realloc operations and profile that, too. Compare it on all the target platforms you write for and on which the performance matters.
If the performance isn't critical, then optimize for properties like good structure, maintainability and code reuse. A function which builds a format string doesn't have to know about string memory management. It needs an abstract interface for working with dynamic strings. That interface can provide a nice function for changing a string in-place by adding a copy of another string at its tail. Functions other than just the format-string producer can use these string operations.
The string management code can decide, in one place, whether it calls realloc every time the length of a string changes, or whether it tracks the storage size separately from the length, and consequently reduces the number of realloc calls.
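A minimal sketch of such a string interface, assuming a simple doubling policy. The type `dstr` and the functions `dstr_append_n`, `dstr_append` and `dstr_free` are hypothetical names, not an existing library. A format-string builder would only call `dstr_append`, and the growth policy could be changed inside `dstr_append_n` without touching the caller:

```c
#include <stdlib.h>
#include <string.h>

/* Length and capacity are tracked separately, so realloc() is only called
 * when the capacity runs out; the capacity is then doubled. */
typedef struct {
    char  *data;  /* NUL-terminated once non-NULL */
    size_t len;   /* characters in use, excluding '\0' */
    size_t cap;   /* bytes allocated */
} dstr;

/* Appends n bytes from src. Returns 0 on success, -1 on allocation
 * failure, in which case the string is left unchanged.
 * (Overflow of the size arithmetic is not checked in this sketch.) */
int dstr_append_n(dstr *d, const char *src, size_t n)
{
    if (d->len + n + 1 > d->cap) {
        size_t newcap = d->cap ? d->cap : 16;
        while (newcap < d->len + n + 1)
            newcap *= 2;
        char *p = realloc(d->data, newcap); /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        d->data = p;
        d->cap = newcap;
    }
    memcpy(d->data + d->len, src, n);
    d->len += n;
    d->data[d->len] = '\0';
    return 0;
}

int dstr_append(dstr *d, const char *s) { return dstr_append_n(d, s, strlen(s)); }

void dstr_free(dstr *d) { free(d->data); d->data = NULL; d->len = d->cap = 0; }
```

Start from `dstr d = {0};` and release it with `dstr_free(&d)` when done.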
Code like this:
str = realloc(str, sizeof(*str) +1);
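is bad. If realloc fails, it returns NULL but does not free(str); since the only copy of the old pointer has just been overwritten with NULL, that memory leaks. You need to assign the result of realloc to another pointer, then check for NULL and act accordingly. (Note also that sizeof(*str) is the size of a single char, i.e. 1, not the current length of the string.)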
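A sketch of the pattern this answer describes, using a hypothetical helper `grow_buffer`: the result goes into a temporary pointer first, so the original block survives a failed call:

```c
#include <stdlib.h>

/* Keeps the old pointer until realloc() is known to have succeeded.
 * On failure realloc() returns NULL but leaves the original block
 * allocated, so *buf is still valid and can be used or freed.
 * new_size should be > 0 (realloc with size 0 is implementation-defined).
 * Returns 0 on success, -1 on failure. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;      /* *buf untouched: no leak */
    *buf = tmp;
    return 0;
}
```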
Whether or not it is good or bad practice to use many reallocs depends on what you are trying to achieve, i.e. performance, maintainability, clarity, etc. The best advice is: write the code the way you like it. Then profile it. Performance issues? No -> be happy. Yes -> rewrite the code with a focus on performance.

using malloc when I know the maximum size of the char array

In my program, I'm declaring a character array holding the location of a config file; it should be something like:
"/home/user/.config"
Now, I understand the longest username can be 32 bytes long (GNU/Linux), so I know that the array will not hold more than 46 characters. In this case, should I be using malloc or not?
Should I use:
char config_file_location[46];
strcpy (config_file_location, getenv("HOME"));
strcat(config_file_location,"/.config");
or:
char *config_file_location;
config_file_location = (char *) malloc(43);
strcpy (config_file_location, getenv("HOME"));
strcat(config_file_location,"/.config");
//code goes here
free(config_file_location);
Also, should I use realloc in the above example so that config_file_location uses exactly the amount of memory it needs?
I'm looking for best-practice info: if it is not worth doing in this case, I would like to know when it would be, and the reason why one approach is better than the other.
Thanks, I appreciate it.
There are two reasons why you would use dynamic allocation:
Either because the amount of memory needed isn't known at compile-time, or because the amount of memory needs to be reallocated in run-time.
Or because you need to allocate large amounts of data and don't want to burden the stack with it. Allocating too much memory on the stack can in the worst case lead to mysterious run-time crashes caused by stack overflow. To avoid this, large amounts of data should be allocated on the heap instead.
In your case, you have a fixed amount of data and 43 bytes is hardly a large amount. So there is no need to use dynamic allocation here.
Apart from the usual issues with memory leaks and heap fragmentation, you also have to consider that each call to malloc (and free) is quite time-consuming. On systems where dynamic allocation is feasible (such as Linux), it almost always makes more sense to optimize for speed instead of memory consumption.
Unless you are working in some really, really memory constrained environment, I wouldn't be worried about optimising how much memory your application uses. Just allocate a buffer on the stack that is "big enough" for the largest path you might encounter.
As to how big that is, no one is going to give you a definite answer. You could use PATH_MAX, although it has been noted even that has problems. In these situations I would just take a pragmatic approach and go for something like 256 bytes. Job done. Move on.
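A minimal sketch of the stack-buffer approach for the question's config path, assuming a 256-byte buffer as suggested above. It also avoids two problems in the question's code: getenv() can return NULL, and strcpy()/strcat() never check the buffer size. `config_path` is a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Builds "$HOME/.config" in a caller-provided buffer. snprintf() is used
 * instead of strcpy()/strcat(), and its return value is checked so that
 * an unexpectedly long HOME is reported rather than overflowing.
 * Returns 0 on success, -1 if HOME is unset or the path does not fit. */
int config_path(char *dst, size_t dstsize)
{
    const char *home = getenv("HOME");
    if (home == NULL)
        return -1;
    int n = snprintf(dst, dstsize, "%s/.config", home);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;   /* encoding error or path too long for the buffer */
    return 0;
}
```

Usage: `char path[256]; if (config_path(path, sizeof path) != 0) { /* handle missing or over-long HOME */ }`.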

How do I know how much memory to realloc?

I have one question regarding design of my application.
Here's the pseudo code:
char* buffer_to_be_filled = (char*) malloc(somesize);
fill_the_buffer(buffer_to_be_filled);
free(buffer_to_be_filled);
The problem is that I don't know how much size fill_the_buffer requires.
I was thinking about the solution inside the fill_the_buffer function.
I could realloc the space inside the function when needed; the problem is: is there any way to find out how much space is available?
How is this usually solved? I think that whoever allocates the buffer should also realloc it, right?
NOTE: I'm filling the buffer using the fread function, so I don't know how much space I will need.
Your function can't simply realloc the pointer that was passed to it: realloc isn't guaranteed to return the same pointer it was passed (the new buffer might be too big to expand in place), and the caller's copy of the old pointer would be left dangling. The typical solution is for the function to take a second argument specifying the size of the buffer, and to return an error code if the buffer is too small. Ideally, the error code tells the user how big the buffer needs to be, so they can reallocate it themselves and call the function again. For example, from the man page for snprintf (which has this problem):
The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated.
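The snprintf() convention quoted above can be used in two passes: call it with a size of 0 to learn the length, then allocate exactly that many bytes plus one. `format_greeting` is a hypothetical example function:

```c
#include <stdio.h>
#include <stdlib.h>

/* First pass: with size 0 nothing is written (the pointer may be NULL),
 * but the return value is still the full length of the output.
 * Second pass: format into a buffer of exactly that length + 1.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *format_greeting(const char *name, int count)
{
    int n = snprintf(NULL, 0, "Hello %s, you have %d new messages", name, count);
    if (n < 0)
        return NULL;
    char *buf = malloc((size_t)n + 1);
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t)n + 1, "Hello %s, you have %d new messages", name, count);
    return buf;
}
```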
It seems that fill_the_buffer() function is in a much better position to know...
- how to dimension the buffer initially and/or
- when to re-alloc the buffer and by how much.
Therefore it may be appropriate to change the API:
char * fill_the_buffer()
or maybe
char * fill_the_buffer(size_t max_amount_caller_wants)
The caller of fill_the_buffer() would still be responsible for disposing of the buffer returned by the function, but the allocation and dimensioning would be left to the function's logic.
This approach generally follows the idea of leaving implementation details to the lower level, making the upper levels more readable.
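A sketch of that API adapted to the fread() case from the question. The `FILE *` parameter and the `out_len` out-parameter are assumptions made for this example; the function grows its buffer by doubling and the caller frees the result:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole stream into a buffer that this function sizes itself.
 * Returns the malloc'd data (not NUL-terminated) and stores its length in
 * *out_len; returns NULL on allocation or read failure. */
char *fill_the_buffer(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap) {                    /* short read: EOF or error */
            if (ferror(fp)) { free(buf); return NULL; }
            break;
        }
        char *tmp = realloc(buf, cap * 2);  /* buffer full: double it */
        if (tmp == NULL) { free(buf); return NULL; }
        buf = tmp;
        cap *= 2;
    }
    *out_len = len;
    return buf;
}
```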
You must pass the buffer size into the fill_the_buffer function. If the buffer is not big enough, your function must return an error value (e.g. -1). On success, your function can return the number of bytes written. This method is common practice in C.
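A sketch of this size-in, error-out convention for the fread() case, with a hypothetical `fill_fixed` function. Returning `long` is a choice made for this example (`ssize_t` is POSIX, not standard C), and -1 covers both a read error and a buffer that is too small:

```c
#include <stdio.h>

/* Reads the stream into buf (size bytes, binary data, no '\0' added).
 * Returns the number of bytes written, or -1 on a read error or if the
 * stream holds more data than fits. To detect "more data", one extra
 * byte is read with fgetc(), so the stream position moves past it. */
long fill_fixed(char *buf, size_t size, FILE *fp)
{
    size_t n = fread(buf, 1, size, fp);
    if (ferror(fp))
        return -1;
    if (n == size && fgetc(fp) != EOF)
        return -1;   /* buffer full and there is still more input */
    return (long)n;
}
```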
I have a suggestion if you have no problem allocating free memory:
Allocate an initial size at the beginning of the program using malloc (try to make a good guess for this initial allocation). Then, in fill_the_buffer, you may need more memory, or you may not need all of the allocated memory. In the first case, you can allocate more memory in steps of a suitable size (depending on your application and your available RAM, e.g. 10 MB each time the buffer runs out), then resume filling the buffer until you need more memory, and repeat this until the buffer is filled.
In the second case, you can simply realloc to decrease the allocated size of your buffer.
But take care when using realloc, especially when you want to increase the size of the buffer, because it can cause a big overhead (it may have to find a large enough block of free memory, copy all of the old data into it, and free the old block).
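A sketch of the fixed-step growth plus final shrink described above. The 10 MB step comes from the answer's example; the function names `append_stepwise` and `shrink_to_fit` are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

#define STEP (10u * 1024u * 1024u)  /* 10 MB per step, as in the answer */

/* Appends n bytes to *buf (*len bytes used out of *cap allocated),
 * growing the allocation in fixed STEP increments rather than doubling.
 * Returns 0 on success, -1 if realloc fails (buffer left unchanged). */
int append_stepwise(char **buf, size_t *len, size_t *cap,
                    const char *src, size_t n)
{
    if (*len + n > *cap) {
        size_t newcap = *cap;
        while (newcap < *len + n)
            newcap += STEP;
        char *tmp = realloc(*buf, newcap);
        if (tmp == NULL)
            return -1;
        *buf = tmp;
        *cap = newcap;
    }
    memcpy(*buf + *len, src, n);
    *len += n;
    return 0;
}

/* Once filling is done, give back the unused tail with a shrinking realloc. */
int shrink_to_fit(char **buf, size_t len, size_t *cap)
{
    if (len == 0 || len == *cap)
        return 0;                 /* realloc(p, 0) is implementation-defined */
    char *tmp = realloc(*buf, len);
    if (tmp == NULL)
        return -1;                /* the old, larger block is still valid */
    *buf = tmp;
    *cap = len;
    return 0;
}
```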

Time- and memory-efficient way to allocate memory for a string

I'm reading a file into memory in C by copying the bytes to a dynamic array. Currently, I realloc() one byte larger each time a new byte comes in. This seems inefficient.
Some people suggest (I can't remember where) that doubling the memory each time more is needed is good because it needs only O(log n) allocations, at the cost of, in the worst case, just under half of the memory being unused.
Any recommendations on memory allocation?
If you are loading the whole file into a string you could probably use the method outlined in this question. This way you can get the size of the file in bytes and allocate your string to hold that (Don't forget the extra byte for the null character).
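A sketch of the size-first approach using only standard C calls, assuming a regular file opened in binary mode (ftell() gives no useful size for pipes or terminals, where growing a buffer is the only option). `read_file` is a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Seeks to the end to learn the file size, allocates size + 1 bytes
 * (the extra byte holds '\0'), and reads everything with one fread().
 * Stores the number of bytes read in *out_len. Returns the buffer, which
 * the caller must free(), or NULL on failure. */
char *read_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    char *buf = NULL;
    long size;
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0
        && fseek(fp, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);
        if (buf != NULL) {
            /* n can be smaller than size if a read error occurs */
            size_t n = fread(buf, 1, (size_t)size, fp);
            buf[n] = '\0';
            *out_len = n;
        }
    }
    fclose(fp);
    return buf;
}
```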
However, if you are dynamically growing a string, it is better to increase its size by some amount larger than a single byte (reallocating a string for each byte is going to be very slow, especially if the string has to be allocated in a new area of memory and then copied over). Since you are reading a file, doubling is probably very reasonable. I've seen people use other methods as well, for example:
I've seen people round up to the next power of 2, for example 2, 4, 8, then 16 bytes (which is essentially doubling the buffer size each time).
I've also seen people use a value that's better suited to the strings they intend to read, i.e. 100 bytes at a time.
If you over-allocate the string, you can always get that memory back at the end with a final reallocation to the exact size you need.
Do what some suggest (increase the size of the buffer by a multiplicative factor each time you need more room). I've done this many times and it works well. If you don't like the factor of two, you can use something else. I've used Phi (the golden ratio) to good effect.
I don't have a citation for this in front of me, and it is probably an implementation-specific detail, but I believe that power-of-2 resizing is used for C++ STL string objects as characters are continually added. (It should be easy to verify this by calling the string::capacity method as characters are added.)

char pointer overflow

I am new to C and I was wondering whether it is possible for a pointer to be overflowed by a vulnerable C function like strcpy(). I have seen it a lot in source code; is it a way to avoid buffer overflows?
Yes, it is. This is in fact the classic cause of buffer overflow vulnerabilities. The only way to avoid overflowing the buffer is to make sure you don't do anything that can cause the overflow. In the case of strcpy, the usual fix is strncpy, which takes the size of the buffer into which the string is being copied; note, however, that strncpy does not NUL-terminate the destination if the source is too long, so you must terminate it yourself (or use snprintf, which always does).
Sure, if you don't allocate enough space for a buffer, then you certainly can:
char* ptr = (char*)malloc(3);
strcpy(ptr, "this is very, very bad"); /* ptr only has 3 bytes allocated! */
However, what's really bad is that this code could run without giving you any errors while overwriting some memory somewhere, which could cause your program to blow up later, seemingly at random, and you could have no idea why. Bugs like this are the source of hours (sometimes even days) of frustration, as anyone who has spent any significant amount of time writing C will tell you.
That is why, with C, you have to be extremely careful with such things, and double-, triple-, nth-degree-check your code. After that, check it again.
Some other approaches are
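#include <stdio.h>
#include <string.h>

#define MAX_LENGTH_NAME 256

void foo(void)
{
    char a[MAX_LENGTH_NAME + 1];  /* You can also use malloc here */
    /* strncpy does not NUL-terminate if the source is too long, so do it by hand */
    strncpy(a, "Foxy", MAX_LENGTH_NAME);
    a[MAX_LENGTH_NAME] = '\0';
    /* snprintf takes the full buffer size and always NUL-terminates */
    snprintf(a, sizeof a, "%s", "Foxy");
}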
So it's good to know the size of the allocated memory and then use these calls to avoid buffer overflows.
Static analysis of existing code may point out these kinds of mistakes so that you can fix them.
