I have a program that reads chars into a dynamic string buffer. We do not know the size of the string, and it is a requirement that we do not simply set a fixed-size, "large-enough" buffer.
The relevant function works like this:
char* read_field(FILE* data)
{
    int size = 8;
    char *field = malloc(size);
    if (field == NULL)
        exit(1);
    char *tmp = NULL;
    int idx = 0;
    int ch = EOF;
    while (ch) {
        ch = fgetc(data);
        // Double size if full
        if (size <= idx) {
            size *= 2;
            tmp = realloc(field, size);
            if (!tmp)
                exit(1);
            field = tmp;
        }
        field[idx++] = ch;
        // Relevant termination in my use case
        if (ch == ';' || ch == '\n')
            ch = 0;
    }
    printf("field: %s\n", field); // value correct, but sometimes valgrind error
    return field; // field is free'd by the caller
}
Now the program seems to work, but when running it through Valgrind I get the errors "Uninitialised value was created by a heap allocation" and "Conditional jump or move depends on uninitialised value(s)". These errors appear intermittently when I call functions like printf or strlen, as seen in the code above.
This problem is sorted if I use calloc instead of malloc / realloc, but then the reallocation process becomes messier.
Is the Valgrind error something that could be ignored if the program works fine? What are the implications of not initializing the memory to zero? If this can't be ignored, what's the best design to sort it out?
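You should put a string terminator ('\0') at the end of the string. Without it, printf and strlen keep reading past the last character you stored, into bytes that malloc/realloc never initialised, which is exactly what Valgrind is reporting.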
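As a rough sketch of what terminating the string could look like, here is a variant of the question's function. The name read_field_terminated is made up for this example, and it differs from the original in three ways that are assumptions on my part: it does not store the ';' or '\n' delimiter in the result, it also stops at EOF, and it returns NULL on allocation failure instead of calling exit(1).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of read_field: reads up to ';', '\n' or EOF and
 * always NUL-terminates the result. The caller frees the returned buffer. */
char *read_field_terminated(FILE *data)
{
    size_t size = 8, idx = 0;
    char *field = malloc(size);
    if (field == NULL)
        return NULL;

    int ch;
    while ((ch = fgetc(data)) != EOF && ch != ';' && ch != '\n') {
        /* Keep one byte in reserve for the terminator. */
        if (idx + 1 >= size) {
            char *tmp = realloc(field, size * 2);
            if (tmp == NULL) {
                free(field);
                return NULL;
            }
            field = tmp;
            size *= 2;
        }
        field[idx++] = (char)ch;
    }
    field[idx] = '\0'; /* every byte up to and including this one is now initialised */
    return field;
}
```

Because every byte that printf or strlen will look at has been written, there is no need to zero the whole buffer with calloc.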
PS:
If you want to clear some memory, use memset; it's faster than a for loop.
Or use calloc, which allocates and zero-fills in one call instead of malloc followed by memset.
Example
char *string = calloc(100, sizeof(char));
// calloc zero-fills the allocated block
// It is often faster than malloc followed by memset
// Also, in C (unlike C++) you don't need to cast the result of the allocation functions
Related
I created a function designed to get user input. It requires that memory be allocated to the variable holding the user input; however, that variable is returned at the end of the function. What is the proper method to free the allocated memory/return the value of the variable?
Here is the code:
char *input = malloc(MAX_SIZE*sizeof(char*));
int i = 0;
char c;
while((c = getchar()) != '\n' && c != EOF) {
    input[i++] = c;
}
return input;
Should I return the address of input and free it after it is used?
Curious as to the most proper method to free the input variable.
It's quite simple: as long as you pass to free() the same pointer that malloc() returned, it's fine.
For example
#include <stdio.h>
#include <stdlib.h>

char *readInput(size_t size)
{
    char *input;
    int chr;
    size_t i = 0; /* number of characters stored so far */
    input = malloc(size + 1);
    if (input == NULL)
        return NULL;
    while ((i < size) && ((chr = getchar()) != '\n') && (chr != EOF))
        input[i++] = chr;
    input[i] = '\0'; /* nul terminate the array, so it can be a string */
    return input;
}
int main(void)
{
    char *input;
    input = readInput(100);
    if (input == NULL)
        return -1;
    printf("input: %s\n", input);
    /* now you can free it */
    free(input);
    return 0;
}
What you should never do is something like
free(input + n);
because input + n is not the pointer returned by malloc().
But your code has other issues you should take care of:
You are allocating space for MAX_SIZE chars, so you should multiply by sizeof(char), which is 1, instead of sizeof(char *), which allocates room for MAX_SIZE pointers. You could also make MAX_SIZE a function parameter instead. And if you are allocating a fixed-size buffer anyway, you could define an array in main(), such as char input[MAX_SIZE], and pass it to readInput() as a parameter, thus avoiding malloc() and free() altogether.
You are allocating that much space, but you don't prevent overflow in your while loop; you should verify that i < MAX_SIZE.
You could write a function with return type char*, return input, and ask the user to call free once they're done with the data.
You could also ask the user to pass in a properly sized buffer themselves, together with a buffer size limit, and return how many characters were written to the buffer.
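A minimal sketch of that second option, assuming a hypothetical helper named read_line_into. It takes the stream as a parameter (the question reads from stdin via getchar) so that it is easier to test; the name, the parameter order and the choice to leave an over-long remainder in the stream are all assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: stores at most cap - 1 characters of one line in buf,
 * NUL-terminates it, and returns the number of characters stored.
 * The newline is consumed but not stored. */
size_t read_line_into(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int ch;

    if (cap == 0)
        return 0; /* no room even for the terminator */

    /* The capacity check comes first, so no character is consumed when full. */
    while (n + 1 < cap && (ch = fgetc(in)) != EOF && ch != '\n')
        buf[n++] = (char)ch;

    buf[n] = '\0';
    return n;
}
```

With this interface the caller owns the buffer (for example a local char input[MAX_SIZE]), so there is nothing to free and nothing to leak.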
This is a classic C case. A function mallocs memory for its result, and the caller must free the returned value. You are now walking onto the thin ice of C memory leaks, for two reasons.
First, there is no way for you to communicate the free requirement in an enforceable way (i.e. the compiler or runtime can't help you; contrast this with specifying what the argument types are). You just have to document it somewhere and hope that the caller has read your docs.
Second, even if the caller knows to free the result, they might make a mistake: some error path gets taken that doesn't free the memory. This doesn't cause an immediate error and things seem to work, but after running for 3 weeks your app crashes after running out of memory.
This is why so many 'modern' languages focus on this topic: C++ smart pointers, garbage collection in Java, C#, etc.
I wrote a function to read a string with getchar that uses realloc() to make the buffer grow when needed:
char * read_string(char * message){
    printf("%s", message);
    size_t buffsize = MIN_BUFFER;
    char *buffer = malloc(buffsize);
    if (buffer == NULL) return NULL;
    char *p;
    for(p = buffer ; (*p = getchar()) != '\n' && *p != EOF ; ++p)
        if (p - buffer == buffsize - 1) {
            buffer = realloc(buffer, buffsize *= 2) ;
            if (buffer == NULL) return NULL;
        }
    *p = 0;
    p = malloc(p - buffer + 1);
    if (p == NULL) return NULL;
    strcpy(p, buffer);
    free(buffer);
    return p;
}
I compiled the program and tried it, and it worked as expected. But when I run it under valgrind, the function returns NULL when the string read is MIN_BUFFER characters or longer, and valgrind says:
(...)
==18076== Invalid write of size 1
==18076== at 0x8048895: read_string (programme.c:73)
==18076== by 0x804898E: main (programme.c:96)
==18076== Address 0x41fc02f is 0 bytes after a block of size 7 free'd
==18076== at 0x402BC70: realloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==18076== by 0x8048860: read_string (programme.c:76)
(...)
==18076== Warning: silly arg (-48) to malloc()
(...)
I added a printf statement between *p=0; and p=malloc... and it confirmed that the arg passed had a value of -48.
I didn't know that programs don't run the same way when launched alone and with valgrind. Is there something wrong in my code or is it just a valgrind bug?
When you realloc the buffer, your pointer 'p' still points at the old buffer.
That will stomp memory, and also cause future allocations to use bogus values.
realloc returns a pointer to a new buffer of the requested size with the same contents as the pointer passed in, assuming that the pointer passed in was previously returned by malloc or realloc. It does not guarantee that it's the same pointer. Valgrind very likely modifies the behavior of realloc, but keeps it within the specification.
Since you are resizing memory in a loop, you would be better served by tracking your position in buffer as an offset from the beginning of buffer rather than a pointer.
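As an illustration of the offset-based approach, here is a sketch rather than a drop-in replacement. The name read_string_offset is hypothetical, the stream is passed in as a parameter instead of using getchar, the prompt is left out, and MIN_BUFFER is given an assumed value because the question does not show its definition.

```c
#include <stdio.h>
#include <stdlib.h>

#define MIN_BUFFER 8 /* assumed value; not shown in the question */

/* Hypothetical rewrite: the position is kept as an index (len), which stays
 * valid even when realloc moves the block. The caller frees the result. */
char *read_string_offset(FILE *in)
{
    size_t cap = MIN_BUFFER, len = 0;
    char *buffer = malloc(cap);
    if (buffer == NULL)
        return NULL;

    int ch; /* int, so EOF is distinguishable from any char value */
    while ((ch = fgetc(in)) != EOF && ch != '\n') {
        if (len + 1 == cap) { /* keep one byte free for the terminator */
            char *tmp = realloc(buffer, cap * 2);
            if (tmp == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = tmp;
            cap *= 2;
        }
        buffer[len++] = (char)ch;
    }
    buffer[len] = '\0';

    /* Shrink to fit; if shrinking fails, the larger block is still valid. */
    char *fit = realloc(buffer, len + 1);
    return fit != NULL ? fit : buffer;
}
```

Shrinking with realloc at the end also replaces the malloc/strcpy/free sequence from the question.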
As man 3 realloc says
...The function may move the memory block to a new location.
What this means, is that
p = malloc(p - buffer + 1);
is the problem. If realloc() was called, buffer might be pointing to a new block of memory while p still points into the old one, so the expression
(p - buffer)
does not make any sense.
I am writing a program where the input will be taken from stdin. The first input will be an integer which says the number of strings to be read from stdin.
I just read the string character by character into dynamically allocated memory and display it once the string ends.
When the string is larger than the allocated size, I reallocate the memory using realloc. The program works whether or not I call memcpy afterwards. Is it undefined behavior to not use memcpy? The example in "Using Realloc in C" does not use memcpy. So which one is the correct way to do it? And is my program shown below correct?
/* ss.c
* Gets number of input strings to be read from the stdin and displays them.
* Realloc dynamically allocated memory to get strings from stdin depending on
* the string length.
*/
#include <stdio.h>
#include <stdlib.h>
int display_mem_alloc_error();
enum {
    CHUNK_SIZE = 31,
};
int display_mem_alloc_error() {
    fprintf(stderr, "\nError allocating memory");
    exit(1);
}
int main(int argc, char **argv) {
    int numStr; //number of input strings
    int curSize = CHUNK_SIZE; //currently allocated chunk size
    int i = 0; //counter
    int len = 0; //length of the current string
    int c; //will contain a character
    char *str = NULL; //will contain the input string
    char *str_cp = NULL; //will point to str
    char *str_tmp = NULL; //used for realloc
    str = malloc(sizeof(*str) * CHUNK_SIZE);
    if (str == NULL) {
        display_mem_alloc_error();
    }
    str_cp = str; //store the reference to the allocated memory
    scanf("%d\n", &numStr); //get the number of input strings
    while (i != numStr) {
        if (i >= 1) { //reset
            str = str_cp;
            len = 0;
        }
        c = getchar();
        while (c != '\n' && c != '\r') {
            *str = (char *) c;
            printf("\nlen: %d -> *str: %c", len, *str);
            str = str + 1;
            len = len + 1;
            *str = '\0';
            c = getchar();
            if (curSize/len == 1) {
                curSize = curSize + CHUNK_SIZE;
                str_tmp = realloc(str_cp, sizeof(*str_cp) * curSize);
                if (str_tmp == NULL) {
                    display_mem_alloc_error();
                }
                memcpy(str_tmp, str_cp, curSize); // NB: seems to work without memcpy
                printf("\nstr_tmp: %d", str_tmp);
                printf("\nstr: %d", str);
                printf("\nstr_cp: %d\n", str_cp);
            }
        }
        i = i + 1;
        printf("\nEntered string: %s\n", str_cp);
    }
    return 0;
}
/* -----------------
//input-output
gcc -o ss ss.c
./ss < in.txt
// in.txt
1
abcdefghijklmnopqrstuvwxyzabcdefghij
// output
// [..snip..]
Entered string:
abcdefghijklmnopqrstuvwxyzabcdefghij
-------------------- */
Thanks.
Your program is not quite correct. You need to remove the call to memcpy to avoid an occasional, hard to diagnose bug.
From the realloc man page
The realloc() function changes the size of the memory block pointed to
by ptr to size bytes. The contents will be unchanged in the range
from the start of the region up to the minimum of the old and new
sizes
So, you don't need to call memcpy after realloc. In fact, doing so is wrong because your previous heap cell may have been freed inside the realloc call. If it was freed, str_cp now points to memory with unpredictable content.
C11 standard (PDF), section 7.22.3.5 paragraph 2:
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
So in short, the memcpy is unnecessary and indeed wrong. Wrong for two reasons:
If realloc has freed your previous memory, then you are accessing memory that is not yours.
If realloc has just enlarged your previous memory, you are giving memcpy two pointers that point to the same area. memcpy has a restrict qualifier on both its input pointers which means it is undefined behavior if they point to the same object. (Side note: memmove doesn't have this restriction)
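To make the point concrete, here is a small sketch of growing a buffer with realloc alone and no memcpy. The helper name grow_buffer and its return convention are assumptions for illustration only.

```c
#include <stdlib.h>

/* Hypothetical helper: grows *buf to new_size bytes.
 * Returns 0 on success, -1 on failure. On failure *buf is left untouched
 * and still valid, so the caller can free it or keep using it. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;
    /* realloc has already preserved the old contents; the old pointer
     * value must not be dereferenced from here on. */
    *buf = tmp;
    return 0;
}
```

After a successful call, the bytes that were in the old block are already in the new one, whether or not the block moved.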
realloc enlarges the memory reserved for your string. If it can enlarge the block without moving the data, the data stays in place. If it cannot, it allocates a larger block, copies the data from the previous block into it itself, and frees the previous block.
In short, it is normal that you don't have to call memcpy after realloc.
From the man page:
The realloc() function tries to change the size of the allocation pointed
to by ptr to size, and returns ptr. If there is not enough room to
enlarge the memory allocation pointed to by ptr, realloc() creates a new
allocation, copies as much of the old data pointed to by ptr as will fit
to the new allocation, frees the old allocation, and returns a pointer to
the allocated memory. If ptr is NULL, realloc() is identical to a call
to malloc() for size bytes. If size is zero and ptr is not NULL, a new,
minimum sized object is allocated and the original object is freed. When
extending a region allocated with calloc(3), realloc(3) does not guarantee
that the additional memory is also zero-filled.
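If zero-filled memory is wanted after growing, the last sentence of that man page excerpt means the new tail has to be cleared by hand. A minimal sketch, assuming a hypothetical helper realloc_zeroed that is told the old size by the caller (realloc itself does not expose it):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: like realloc, but zero-fills any bytes added beyond
 * old_size. Returns NULL on failure, in which case ptr is still valid.
 * Intended for new_size > 0. */
void *realloc_zeroed(void *ptr, size_t old_size, size_t new_size)
{
    void *tmp = realloc(ptr, new_size);
    if (tmp == NULL)
        return NULL;
    if (new_size > old_size)
        memset((char *)tmp + old_size, 0, new_size - old_size);
    return tmp;
}
```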
I saw a function on this site a while ago, that I took and adapted a bit for my use.
It's a function that uses getc and stdin to retrieve a string and allocate precisely as much memory as it needs to contain the string. It then just returns a pointer to the allocated memory which is filled with said string.
My question is are there any downsides (besides having to manually free the allocated memory later) to this function? What would you do to improve it?
char *getstr(void)
{
    char *str = NULL, *tmp = NULL;
    int ch = -1, sz = 0, pt = 0;
    while(ch)
    {
        ch = getc(stdin);
        if (ch == EOF || ch == 0x0A || ch == 0x0D) ch = 0;
        if (sz <= pt)
        {
            sz++;
            tmp = realloc(str, sz * sizeof(char));
            if(!tmp) return NULL;
            str = tmp;
        }
        str[pt++] = ch;
    }
    return str;
}
After applying your suggestions, here is my updated code. I decided to just use 256 bytes for the buffer, since this function is used for user input.
char *getstr(void)
{
    char *str, *tmp = NULL;
    int ch = -1, bff = 256, pt = 0;
    str = malloc(bff);
    if(!str)
    {
        printf("\nError! Memory allocation failed!");
        return 0x00;
    }
    while(ch)
    {
        ch = getc(stdin);
        if (ch == EOF || ch == '\n' || ch == '\r') ch = 0;
        if (bff <= pt)
        {
            bff += 256;
            tmp = realloc(str, bff);
            if(!tmp)
            {
                free(str);
                printf("\nError! Memory allocation failed!");
                return 0x00;
            }
            str = tmp;
        }
        str[pt++] = ch;
    }
    tmp = realloc(str, pt);
    if(!tmp)
    {
        free(str);
        printf("\nError! Memory allocation failed!");
        return 0x00;
    }
    str = tmp;
    return str;
}
It's excessively frugal IMO, and makes the mistake of sacrificing performance in order to save infinitesimal amounts of memory, which is pointless in most settings, I think. Allocation calls like realloc are potentially laborious for the system, and here one is made for every byte.
It would be better to just have a local buffer, say 4 KB, to read into, then allocate the return string based on the length of what is actually read into that. Keep in mind that the stack* on a normal system is 4-8 MB anyway, whether you use it all or not. If the string read turns out to be longer than 4 KB, you could write a similar loop that allocates and copies into the return string. It is a similar idea, but heap allocation would occur every 4096 bytes rather than every byte. For example, you start with the initial buffer of 4096; when that is exhausted you malloc 4096 for the return string and copy in, then continue reading into the buffer (from the beginning), and if another 1000 bytes are read you realloc to 5097 and return that.
I think it is a common mistake of beginners to get obsessed with minimizing heap allocation by approaching it byte by byte. Even KB by KB is a little small; the system allocates in pages (4 KB) and you might as well align yourself with that.
*the memory provided for local storage inside a function.
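The local-buffer idea could be sketched roughly as follows. This is one possible reading of the description, not the answerer's own code: the name read_line_chunked, the CHUNK constant, the stream parameter and the choice to return an empty string for an empty line are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { CHUNK = 4096 }; /* size of the local (stack) buffer */

/* Hypothetical sketch: reads one line into a local buffer and touches the
 * heap only when the buffer fills up or the line ends.
 * Returns a malloc'd string the caller frees, or NULL on allocation failure. */
char *read_line_chunked(FILE *in)
{
    char chunk[CHUNK];
    char *result = NULL;
    size_t total = 0; /* bytes already copied into result */
    size_t used = 0;  /* bytes currently waiting in chunk */

    for (;;) {
        int ch = fgetc(in);
        int done = (ch == EOF || ch == '\n');

        if (!done)
            chunk[used++] = (char)ch;

        if (done || used == CHUNK) {
            /* One heap call per full chunk, plus one at the end of the line. */
            char *tmp = realloc(result, total + used + 1);
            if (tmp == NULL) {
                free(result);
                return NULL;
            }
            result = tmp;
            memcpy(result + total, chunk, used);
            total += used;
            used = 0;
            result[total] = '\0';
        }
        if (done)
            return result;
    }
}
```

For a 5000-character line this performs two heap calls instead of 5000.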
Yes, the main problem is that realloc is pretty slow and calling it repeatedly for each character is generally a bad idea.
Try allocating a fixed amount of memory to start with, say N=100 characters and when you need more, get something like 2*N, then 4*N and so on. You'll overspend only up to twice the memory but save a lot in running time.
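A minimal sketch of that doubling strategy, using a small hypothetical struct strbuf that tracks length and capacity separately. The struct, the function name strbuf_push and the initial capacity of 100 (taken from the N=100 suggestion) are illustrative assumptions.

```c
#include <stdlib.h>

/* Hypothetical growable string: len characters used out of cap bytes. */
struct strbuf {
    char *data;
    size_t len;
    size_t cap;
};

/* Appends c and keeps the buffer NUL-terminated.
 * Returns 0 on success, -1 if memory could not be allocated
 * (the existing contents stay valid in that case). */
int strbuf_push(struct strbuf *sb, char c)
{
    if (sb->len + 1 >= sb->cap) { /* need room for c and the terminator */
        size_t new_cap = sb->cap ? sb->cap * 2 : 100;
        char *tmp = realloc(sb->data, new_cap);
        if (tmp == NULL)
            return -1;
        sb->data = tmp;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 0;
}
```

Starting from a zero-initialised struct (struct strbuf sb = {0};), appending 1000 characters triggers only five allocations (100, 200, 400, 800, 1600 bytes) rather than one per character.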
1. It depends on '\n' == 0x0A and '\r' == 0x0D for no good reason. If you mean \r and \n, use them.
2. It may be unreasonably slow, reallocating for every character you read.
3. sizeof(char) is guaranteed to be 1, so multiplying by it is pointless.
4. If you've allocated a block of memory and realloc then fails, you're returning NULL without returning or freeing str, thus leaking the memory.
5. The interface provides no way to indicate partial failure, as in #4. All you can do is return a string or not. Given an immense input string, you have no way to indicate that you've read part but not all of it.
Here are the first few observations, other answers include some more:
It's growing the buffer by 1 byte at a time, thus doing needlessly many realloc() calls.
If realloc() fails, the previous buffer is lost.
It's not getline(), although it's more portable of course.
It's also not very portable to hardcode ASCII values for line feed and carriage return, use '\n' and '\r' instead.
I'm trying to read a line from a file character by character and place the characters in a string; here's my code:
char *str = "";
size_t len = 1; /* I also count the terminating character */
char temp;
while ((temp = getc(file)) != EOF)
{
    str = realloc(str, ++len * sizeof(char));
    str[len-2] = temp;
    str[len-1] = '\0';
}
The program crashes on the realloc line. If I move that line outside of the loop or comment it out, it doesn't crash. If I'm just reading the characters and then sending them to stdout, it all works fine (ie. the file is opened correctly). Where's the problem?
You can't realloc a pointer that wasn't generated with malloc in the first place.
You also have an off-by-one error that will give you some trouble.
Change your code to:
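char *str = NULL; // realloc can be called with NULL
size_t len = 1; /* I also count the terminating character */
int temp; /* int rather than char, so EOF can be told apart from a valid character */
while ((temp = getc(file)) != EOF)
{
    char *grown = realloc(str, ++len);
    if (grown == NULL)
        break; /* str is still valid here and must still be freed */
    str = grown;
    str[len-2] = temp;
    str[len-1] = '\0';
}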
Your issue is because you were calling realloc with a pointer to memory that was not allocated with either malloc or realloc which is not allowed.
From the realloc manpage:
realloc() changes the size of the memory block pointed to by ptr to size bytes.
The contents will be unchanged to the minimum of the old and new
sizes; newly allocated memory will be uninitialized. If ptr is NULL,
then the call is equivalent to malloc(size), for all values of size;
if size is equal to zero, and ptr is not NULL, then the call is
equivalent to free(ptr). Unless ptr is NULL, it must have been
returned by an earlier call to malloc(), calloc() or realloc(). If
the area pointed to was moved, a free(ptr) is done.
On a side note, you should really not grow the buffer one character at a time. Keep two counters instead, one for the buffer capacity and one for the number of characters used, and only increase the buffer when it is full. Otherwise, your algorithm will have really poor performance.
You can't realloc a string literal. Also, reallocating for every new char isn't a very efficient way of doing this. Look into getline, originally a GNU extension and part of POSIX since POSIX.1-2008.
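For reference, a short sketch of what using getline could look like. The wrapper name read_line_posix and the decision to strip the trailing newline are assumptions for this example; getline itself needs a POSIX.1-2008 (or GNU) C library, so this is not portable to every platform.

```c
#define _POSIX_C_SOURCE 200809L /* ask for the POSIX.1-2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical wrapper: returns the next line without its trailing '\n',
 * or NULL at end of file or on error. The caller frees the result. */
char *read_line_posix(FILE *file)
{
    char *line = NULL; /* getline allocates and grows this buffer itself */
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, file);

    if (n < 0) {
        free(line); /* the buffer may have been allocated even on failure */
        return NULL;
    }
    if (n > 0 && line[n - 1] == '\n')
        line[n - 1] = '\0';
    return line;
}
```

When reading many lines, the same line and cap variables can be passed to getline repeatedly so the buffer is reused instead of being allocated once per line.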