How do I know how much memory to realloc? - c

I have one question regarding design of my application.
Here's the pseudo code:
char* buffer_to_be_filled = (char*) malloc(somesize);
fill_the_buffer(buffer_to_be_filled);
free(buffer_to_be_filled);
The problem is that I don't know how much size fill_the_buffer requires.
I was thinking about the solution inside the fill_the_buffer function.
I could maybe realloc the space inside when needed; the problem is, is there any way to find out how much space I have available?
How is this usually solved? I think that the one who allocates the buffer should also realloc the buffer as well, right?
NOTE: I'm filling the buffer using the fread function, so I don't know how much space I will need.

Your function can't realloc the pointer that was passed to it, because realloc isn't guaranteed to return the same pointer it was passed (there might not be room to expand the block in place), and the caller would be left holding the old, now invalid pointer. The typical solution is for the function to take a second argument specifying the size of the buffer, and to return an error code if the buffer is too small. Ideally, the error code will tell the user how big the buffer needs to be, so they can reallocate it themselves and call the function again. For example, from the man page for snprintf (which has this problem):
The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated.
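The return value described above can be used to size a buffer exactly. Here is a minimal sketch of the two-call pattern; format_point is a made-up helper name, not part of any library:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats "(x, y)" into a freshly allocated string. */
char *format_point(int x, int y)
{
    /* First call: a NULL buffer with size 0 only measures the output. */
    int needed = snprintf(NULL, 0, "(%d, %d)", x, y);
    if (needed < 0)
        return NULL;                          /* encoding error */

    char *buf = malloc((size_t)needed + 1);   /* +1 for the '\0' */
    if (buf == NULL)
        return NULL;

    /* Second call: the buffer is now known to be big enough. */
    snprintf(buf, (size_t)needed + 1, "(%d, %d)", x, y);
    return buf;                               /* caller must free() */
}
```

The cost is that the formatting work is done twice, which is usually acceptable for short strings.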

It seems that fill_the_buffer() function is in a much better position to know...
- how to dimension the buffer initially and/or
- when to re-alloc the buffer and by how much.
Therefore it may be appropriate to change the API:
char * fill_the_buffer()
or maybe
char * fill_the_buffer(size_t max_amount_caller_wants)
The caller of fill_the_buffer() would still be responsible for disposing of the buffer returned by the function, but the allocation and dimensioning would be left to the function's logic.
This approach generally follows the idea of leaving implementation details to the lower level, making the upper levels more readable.
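As a rough sketch of that API, assuming the data comes from a FILE* read with fread as in the question: fill_the_buffer_alloc, its parameters and the 4096-byte starting size are all my own choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of fill_the_buffer(): reads the rest of `in`,
   returns a malloc'ed buffer and stores the byte count in *out_len.
   Returns NULL on allocation failure or read error. */
char *fill_the_buffer_alloc(FILE *in, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, in);
        if (len < cap)
            break;                        /* short read: EOF or error */

        /* Buffer is full: double it, keeping the old pointer on failure. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }

    /* Optionally give back the unused tail; ignore failure to shrink. */
    char *fit = realloc(buf, len ? len : 1);
    if (fit != NULL)
        buf = fit;

    *out_len = len;
    return buf;                           /* caller must free() */
}
```

The caller only sees a pointer and a length, and still calls free() when done.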

You must pass the buffer size into the fill_the_buffer function. If the buffer is not big enough, your function must return an error value (for example, -1). On success, your function can return the count of bytes written. This is common practice in C.

I have a suggestion, if you have no problem with allocating spare memory:
Allocate an initial size at the beginning of the program using malloc (try to make a good guess for this initial allocation). Then, in fill_the_buffer, you may either need more memory or not need all of the allocated memory. In the first case, you can grow the buffer in steps of a suitable size (depending on your application and your available RAM), e.g. 10 MB each time it runs short, then resume filling the buffer until you need more memory, and repeat this while the buffer fills.
In the second case, you can simply realloc to decrease the size of your buffer.
But take care when using realloc, especially to increase the size of a buffer, because it can cause significant overhead (it may have to find a big enough region of free memory, copy all of the old data into it, and free the old block).

Related

What's purpose of __linecapp in getline, when it'll auto resize?

It's all about the second parameter of getline in stdio.h,
I'll name it 'n' or '__linecapp' below.
According to the document:
If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *lineptr and *n as necessary.
It'll automatically update line capacity, then why should we input __linecapp?
P.S. Someone asked this before, but the discussion didn't explain when we need it, or how to make it useful.
Heap allocation is a relatively expensive operation so you want to minimize those to be efficient.
getline() will only allocate a new buffer if requested (by setting *lineptr to NULL and *n to 0). Otherwise it will reuse the buffer that *lineptr points to, and for that it needs to ensure the buffer is large enough. If the size of the buffer were not known, then realloc() would have to be called on every invocation to ensure it's big enough. getline() will also grow the buffer by more than 1 byte at a time (to make it amortized linear time instead of n^2); for example, when I read a 3-byte line (size) it allocated 120 bytes (capacity).
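To make the reuse concrete, here is a small sketch in which one buffer serves every line of a stream. longest_line is a hypothetical helper, and getline() itself is POSIX rather than ISO C:

```c
#define _POSIX_C_SOURCE 200809L   /* exposes getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: length of the longest line in `in`
   (newline included), or -1 if nothing could be read. */
long longest_line(FILE *in)
{
    char *line = NULL;    /* NULL and 0 ask getline() to allocate */
    size_t cap = 0;       /* getline() keeps this capacity up to date */
    ssize_t len;
    long longest = -1;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* The same buffer is reused on each pass; it is only
           reallocated when a line does not fit in `cap` bytes. */
        if (len > longest)
            longest = (long)len;
    }

    free(line);           /* a single free() for the whole loop */
    return longest;
}
```

Because cap is carried from one call to the next, getline() can tell whether the existing buffer is already large enough.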

C: What size should you allocate to a string array to be passed to strcpy, to be copied into?

If I need to copy a string, src into the array, dest, using strcpy or strncpy, should I allocate an arbitrarily large sized array (like char dest[1024] for example) or should I calculate the size of src using something like strlen and then dynamically allocate a block using malloc (strlen(src) * sizeof(char)).
The first approach seems "easier" but wouldn't I be consuming more space than I needed? And in some cases it might even fall short. On the other hand, the second approach seems more precise but tedious to do every time, and I would have to deallocate the memory every time.
Is this a matter of personal taste or is one of the above preferable over the other?
The simplest way is to use the strdup() function, which effectively merges the strlen(), malloc() and strcpy() calls into one. You will still need to call free() to release the allocated data in the same way.
It depends on several factors.
If you know the maximum size of the source string in advance and it's not too big (i.e. less than 1K or so) and the destination doesn't need to be used after the current function returns, then you can use a fixed size buffer.
If the source string could be arbitrarily large, or if you need to return the destination string from the function, then you should allocate memory dynamically. Note that if you use malloc (strlen(src) * sizeof(char)) that 1) sizeof(char) is always 1 so you can omit it, and 2) you didn't allocate space for the terminating null byte. So you would need malloc (strlen(src) + 1). Also, you can do the allocation and copying in a single operation using strdup if your system has that function.
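For reference, a minimal sketch of the dynamic version with the terminating null byte accounted for. copy_string is a made-up name; where strdup is available it does the same job:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of src, or NULL. */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *dest = malloc(size);       /* sizeof(char) is 1, so omitted */
    if (dest == NULL)
        return NULL;
    memcpy(dest, src, size);         /* copies the '\0' as well */
    return dest;                     /* caller must free() */
}
```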
The second approach is more secure than the first one: a fixed-size array will fail if you are dealing with a string larger than the array.
You should also take into account the end-of-string character '\0' when allocating memory for your new string.

Is it bad practice to hide memory allocations in functions?

Should I expect the user to provide a memory chunk of sufficient size, say, for copying a file into a buffer? Or should I allocate the memory myself, and expect the user to free it when they're done? For example, the function strdup() allocates memory itself, but the function fread() expects only a buffer of sufficient size.
It depends - I've seen C APIs use all kinds of patterns for this, such as:
- functions that require the buffer and buffer size to be provided, and return the required size (so that you can adjust the buffer size if it was truncated); many of these allow passing NULL as a buffer if you are just asking how big the buffer should be; this allows the caller to use an existing buffer or to allocate an appropriately sized one, although with two calls;
- separate functions to obtain needed size and to fill the buffer; same as above, but with a clearer interface;
- functions that require buffer and buffer size, but can allocate the buffer themselves if NULL is passed as buffer; maximum flexibility and terseness, but the function signature can get confusing;
- functions that just return a newly allocated string; simple to use and avoids bugs arising from unguarded truncation, but inflexible if performance is a concern; also, requires the caller to remember to free the returned value, which is avoided in the cases above if using a stack-allocated buffer;
- functions that return a pointer to a static buffer, and then the caller is responsible to do whatever with it; extremely easy to use, extremely easy to misuse; requires care in case of multithreading (needs thread local storage) and if reentrancy is a concern.
The last one is generally a bad idea - it poses problems with reentrancy and thread safety; the one before it can be used but may pose efficiency problems - I generally don't want to waste time in allocations if I have already a buffer big enough. All the others are generally pretty much OK.
But besides the specifics of the interface, the most important point if you allocate stuff and/or return pointers is to clearly document who owns the pointed memory - is it a static object in your library? Is it a pointer to some internal of an object provided by the caller? Is it dynamically allocated stuff? Is the caller responsible for freeing it? Is it just the buffer that was provided as argument?
Most importantly, in case you allocated stuff, always specify how to deallocate it; notice that, if you are building a library that may be compiled as a dll/so, it's a good idea to provide your own deallocation function (even if it's just a wrapper around free) to avoid mismatches between different versions of the C runtime running in the same process. Also, it avoids tying your code to the C library allocator - today it may be fine, tomorrow it may turn out that using a custom allocator may be a better idea.
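A tiny sketch of that last point, with invented "mylib" names: the library hands out memory through one function and takes it back through its own matching function, so the caller never has to know which allocator is behind it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library API; the "mylib_" names are illustrative only. */

/* Returns a newly allocated "Hello, <name>" string owned by the caller,
   to be released with mylib_free(). Returns NULL on failure. */
char *mylib_greeting(const char *name)
{
    static const char prefix[] = "Hello, ";
    size_t plen = sizeof prefix - 1;
    size_t nlen = strlen(name);

    char *out = malloc(plen + nlen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, name, nlen + 1);   /* includes the '\0' */
    return out;
}

/* Matching deallocator: today a thin wrapper around free(), but it can
   change together with the allocator without breaking callers. */
void mylib_free(void *p)
{
    free(p);
}
```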
Is it bad practice to hide memory allocations in functions?
Sometimes.
This answer details one of the pitfalls of allowing a function total freedom in memory allocation: the code can be abused.
A classic case occurs when the function itself determines the size needed, so the calling code lacks the information needed to provide the memory buffer beforehand.
This is the case with getline(), where the stream content dictates the size of the allocation. The problem with this, especially when the stream is stdin, is that control over memory allocation is given to external sources and is not limited by the calling code, the program. External input may overwhelm memory space: an exploit.
With a modified function, such as ssize_t getline_limit(char **lineptr, size_t *n, FILE *stream, size_t limit);, the function could still provide a right-size allocation, yet prevent such abuse.
#define LIMIT 1000000
char *line = NULL;
size_t len = 0;
ssize_t nread;
while ((nread = getline_limit(&line, &len, stdin, LIMIT)) != -1) {
    /* use line[0 .. nread-1] here */
}
free(line);
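getline_limit is not an existing library function. One possible sketch of it, assuming that "limit" means the maximum number of characters in a line (terminator excluded) and that an over-long line is simply reported as failure with the rest of it left unread:

```c
#include <stdint.h>      /* SIZE_MAX */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Hypothetical: like getline(), but never stores more than `limit`
   characters per line. Returns the line length, or -1 on EOF, error,
   allocation failure, or when the line exceeds `limit`. */
ssize_t getline_limit(char **lineptr, size_t *n, FILE *stream, size_t limit)
{
    if (lineptr == NULL || n == NULL || stream == NULL)
        return -1;
    if (*lineptr == NULL)
        *n = 0;

    size_t len = 0;
    int c;
    while ((c = fgetc(stream)) != EOF) {
        if (len >= limit)
            return -1;                    /* line longer than allowed */
        if (len + 2 > *n) {               /* room for c and the '\0' */
            size_t new_cap = (*n == 0) ? 64 : *n * 2;
            if (limit < SIZE_MAX && new_cap > limit + 1)
                new_cap = limit + 1;      /* never allocate past the cap */
            char *tmp = realloc(*lineptr, new_cap);
            if (tmp == NULL)
                return -1;                /* old buffer is still valid */
            *lineptr = tmp;
            *n = new_cap;
        }
        (*lineptr)[len++] = (char)c;
        if (c == '\n')
            break;
    }
    if (len == 0)
        return -1;                        /* EOF or error before any data */
    (*lineptr)[len] = '\0';
    return (ssize_t)len;
}
```

A real implementation would probably also want to distinguish "too long" from EOF, for example through errno.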
An example where this is not an issue would be an allocation with a well bounded use.
// Convert `double` to its decimal character representation allocating a right-size buffer
// At worst a few thousand characters
char *double_to_string_exact_alloc(double x);
Functions that perform memory allocation need some level of control to prevent unlimited memory allocation either with a specific parameter or by nature of the task.
Standard C library functions generally refrain from returning allocated memory. That's at least part of the reason why strdup was not part of the standard library before C23, along with a popular scanf extension for reading C strings of unlimited length.
Your library could choose either way. Using pre-allocated buffers is more flexible, because it lets users pass you statically allocated buffers. This flexibility comes at a cost, because user's code becomes more verbose.
If you choose to allocate memory for a custom struct dynamically, it is a good idea to make a matching function for deallocating the struct once it becomes unnecessary to the user.

malloc() in C not working as expected

I'm new to C. Sorry if this has already been answered, I couldn't find a straight answer, so here we go...
I'm trying to understand how malloc() works in C. I have this code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLINE 100
void readInput(char **s)
{
    char temp[MAXLINE];
    printf("Please enter a string: ");
    scanf("%99s", temp); // field width keeps the input within temp
    *s = (char *)malloc((strlen(temp)+1)*sizeof(char)); // works as expected
    //*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
    strcpy ((char *)*s, temp);
}
int main()
{
    char *str;
    readInput(&str);
    printf("Your string is %s\n", str);
    free(str);
    return 0;
}
The question is why doesn't the program crash (or at least strip the remaining characters) when I call malloc() like this:
*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
Won't this cause a buffer overflow if I enter a string with more than two characters? As I understood malloc(), it allocates a fixed space for data, so surely allocating the space for only two chars would allow the string to be a maximum of one usable character ('\0' being the second), but it is still printing out all the 10 chars entered.
P.S. I'm using Xcode if that makes any difference.
Thanks,
Simon
It works out fine because you're lucky! Usually, the allocator hands your program a block a little larger than just 2 bytes.
If malloc() actually gave you 16 bytes when you asked for 2, you could write 16 bytes without anything taking notice. However, if another malloc() in your program used the other 14 bytes, you would overwrite that variable's contents.
The OS doesn't care about you messing about inside your own program. Your program will only crash if you write outside the memory the OS has given it.
Try to write 200 bytes and see if it crashes.
Edit:
malloc() and free() use some of the heap space to maintain information about allocated memory. This information is usually stored in between the memory blocks. If you overflow a buffer, this information may get overwritten.
Yes, writing more data into an allocated buffer than it can hold is a buffer overflow. However, there is no buffer overflow check in C, and if there happens to be valid memory after your buffer then your code will appear to work correctly.
However what you have done is write into memory that you don't own and likely have corrupted the heap. Your next call to free or malloc will likely crash, or if not the next call, some later call could crash, or you could get lucky and malloc handed you a larger buffer than you requested, in which case you'll never see an issue.
Won't this cause a buffer overflow if I enter a string with more than two characters?
Absolutely. However, C does no bounds checking at runtime; it assumes you knew what you were doing when you allocated the memory, and that you know how much is available. If you go over the end of the buffer, you will clobber whatever was there before.
Whether that causes your code to crash or not depends on what was there before and what you clobbered it with. Not all overflows will kill your program, and overflow in the heap may not cause any (obvious) problems at all.
This is because the memory exists even if you did not allocate it.
You are accessing data that is not yours, and with a good debugger or static analyzer you would probably have seen the error.
Also, if you have a variable located just after the block you allocated, it will probably be overwritten by what you enter.
Simply put, this is a case of undefined behavior. You are unlucky to be getting the expected result, because it hides the bug.
It does cause a buffer overflow. But C doesn’t do anything to prevent a buffer overflow. Neither do most implementations of malloc.
In general, a crash from a buffer overflow only occurs when...
- It overflows a page, the unit of memory that malloc actually gets from the operating system. Malloc will fulfill many individual allocation requests from the same page of memory.
- The overflow corrupts the memory that follows the buffer. This doesn’t cause an immediate crash. It causes a crash later when other code runs that depends upon the contents of that memory.
(...but these things depend upon the specifics of the system involved.)
It is entirely possible, if you are lucky, that a buffer overflow will never cause a crash. Although it may create other, less noticeable problems.
malloc() is a function declared in the stdlib.h header file. If you are using arrays, you have to fix the length before using them. With malloc(), you can allocate memory when you need it and in the required size. When you allocate memory through malloc(), it searches the heap for a free block that is large enough and returns the address of a single contiguous block.
When you are finished with the block, you can free it. The allocated memory lives in RAM only; once you have processed your data and, if necessary, written it to the hard disk or other permanent storage, you can free the block so that it can be reused for other data.
If you work with pointers, you cannot create dynamically sized data blocks without malloc() (or a related function such as calloc() or realloc()).
new is the corresponding keyword in C++.
When you don't know at programming time how much memory you will need, you can use the function malloc:
void *malloc(size_t size);
The malloc() function shall allocate unused space for an object whose size in bytes is specified by size and whose value is unspecified.
How it works is the question.
In a typical implementation, the allocator keeps a free list of all the available memory blocks. malloc searches this list until it finds a block at least as big as you requested. It then splits this block in two, gives you the part you requested, and puts the remainder back in the list. Some allocators split blocks into sizes of 2^n so that there are no odd sizes in the list, which makes the pieces easy to fit together, just like Lego.
When you call free, your block goes back to the free list.

Determining realloc() behaviour before calling it

As I understand it, when asked to reserve a larger block of memory, the realloc() function will do one of three different things:
if free contiguous block exists
    grow current block
else if sufficient memory
    allocate new memory
    copy old memory to new
    free old memory
else
    return null
Growing the current block is a very cheap operation, so this is behaviour I'd like to take advantage of. However, if I'm reallocating memory because I want to (for example) insert a char at the start of an existing string, I don't want realloc() to copy the memory. I'll end up copying the entire string with realloc(), then copying it again manually to free up the first array element.
Is it possible to determine what realloc() will do? If so, is it possible to achieve in a cross-platform way?
realloc()'s behavior is likely dependent on its specific implementation. And basing your code on that would be a terrible hack which, to say the least, violates encapsulation.
A better solution for your specific example is:
Find the size of the current buffer
Allocate a new buffer (with malloc()), greater than the previous one
Copy the prefix you want to the new buffer
Copy the string in the previous buffer to the new buffer, starting after the prefix
Release the previous buffer
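The steps above can be sketched roughly as follows. prepend is a name I made up, and it assumes the old string was itself obtained from malloc():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new malloc'ed string "prefix" + "old"
   and frees `old`. On failure returns NULL and leaves `old` untouched. */
char *prepend(const char *prefix, char *old)
{
    size_t plen = strlen(prefix);
    size_t olen = strlen(old);               /* size of current contents */

    char *fresh = malloc(plen + olen + 1);   /* new, larger buffer */
    if (fresh == NULL)
        return NULL;

    memcpy(fresh, prefix, plen);             /* prefix first */
    memcpy(fresh + plen, old, olen + 1);     /* old string plus '\0' */
    free(old);                               /* release previous buffer */
    return fresh;
}
```

This copies the old string exactly once, which is the point of avoiding realloc() here.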
As noted in the comments, case 3 in the question (no memory) is wrong; realloc() will return NULL if there is no memory available [question now fixed].
Steve McConnell in 'Code Complete' points out that if you save the return value from realloc() in the only copy of the original pointer when realloc() fails, you've just leaked memory. That is:
void *ptr = malloc(1024);
...
if ((ptr = realloc(ptr, 2048)) == 0)
{
/* Oops - cannot free original memory allocation any more! */
}
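The usual way around that leak is to keep the result in a temporary pointer first. A minimal sketch, with resize_buffer as an invented wrapper and new_size assumed to be non-zero:

```c
#include <stdlib.h>

/* Hypothetical wrapper: resizes *buf to new_size bytes (new_size > 0).
   Returns 0 on success; on failure returns -1 and *buf is unchanged,
   so the caller can still use or free the original block. */
int resize_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;      /* original allocation is still owned by *buf */
    *buf = tmp;         /* only overwrite the pointer on success */
    return 0;
}
```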
Different implementations of realloc() will behave differently. The only safe thing to assume is that the data will always be moved - that you will always get a new address when you realloc() memory.
As someone else pointed out, if you are concerned about this, maybe it is time to look at your algorithms.
Would storing your string backwards help?
Otherwise...
just malloc() more space than you need, and when you run out of room, copy to a new buffer. A simple technique is to double the space each time; this works pretty well because the larger the string (i.e. the more time copying to a new buffer takes), the less often it needs to occur.
Using this method you can also right-justify your string in the buffer, so it's easy to add characters to the start.
If obstacks are a good match for your memory allocation needs, you can use their fast growing functionality. Obstacks are a feature of glibc, but they are also available in the libiberty library, which is fairly portable.
No - and if you think about it, it can't work in a multi-threaded application. Between you checking what it's going to do and actually doing it, another thread could allocate memory.
If you're worried about this sort of thing, it might be time to look at the data structures you're using to see if you can fix the problem there. Depending on how these strings are constructed, you can do so quite efficiently with a well designed buffer.
Why not keep some empty buffer space in the left of the string, like so:
char* buf = malloc(1024);
char* start = buf + 1024 - 3;
start[0]='t';
start[1]='o';
start[2]='\0';
To add "on" to the beginning of your string to make it "onto\0":
start-=2;
if(start < buf)
DO_MEMORY_STUFF(start, buf);//time to reallocate!
start[0]='o';
start[1]='n';
This way, you won't have to keep copying your buffer every single time you want to do an insertion at the beginning.
If you have to do insertions at both the beginning and end, just have some space allocated at both ends; insertions in the middle will still need you to shuffle elements around, obviously.
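Putting the pieces together, here is one way the right-justified buffer could look with the reallocation step filled in. The front_buf type and its functions are my own invention for illustration, and growth uses the doubling technique mentioned earlier:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string buffer that keeps its free space at the front. */
struct front_buf {
    char  *buf;     /* start of the allocation */
    size_t cap;     /* total bytes allocated */
    size_t start;   /* offset of the first character of the string */
};

int front_buf_init(struct front_buf *fb, size_t cap)
{
    if (cap == 0)
        cap = 1;                          /* room for the '\0' */
    fb->buf = malloc(cap);
    if (fb->buf == NULL)
        return -1;
    fb->cap = cap;
    fb->start = cap - 1;                  /* empty string at the far end */
    fb->buf[fb->start] = '\0';
    return 0;
}

int front_buf_prepend(struct front_buf *fb, const char *s)
{
    size_t add = strlen(s);
    if (add > fb->start) {                /* not enough room in front */
        size_t used = fb->cap - fb->start;        /* string plus '\0' */
        size_t new_cap = fb->cap * 2;
        while (new_cap < used + add)
            new_cap *= 2;
        char *nb = malloc(new_cap);
        if (nb == NULL)
            return -1;                    /* old buffer left intact */
        /* Right-justify the existing string in the new buffer. */
        memcpy(nb + new_cap - used, fb->buf + fb->start, used);
        free(fb->buf);
        fb->buf = nb;
        fb->start = new_cap - used;
        fb->cap = new_cap;
    }
    fb->start -= add;
    memcpy(fb->buf + fb->start, s, add);  /* no '\0': the old text follows */
    return 0;
}

const char *front_buf_str(const struct front_buf *fb)
{
    return fb->buf + fb->start;
}
```

Most prepends are then a single memcpy of the new characters; the whole string is only copied when the front gap runs out.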
A better approach is to use a linked list. Have each of your data objects allocated on a page, and allocate another page and have a link to it, either from the previous page or from an index page. This way you know when the next alloc fails, and you never need to copy memory.
I don't think it's possible in cross platform way.
Here is the code for the uClibc implementation, which might give you a clue how to do it in a platform-dependent way. It would actually be better to find the glibc source, but this one was at the top of the Google search :)

Resources