Any Downsides to this Method of String Retrieval? - c

I saw a function on this site a while ago that I took and adapted a bit for my use.
It's a function that uses getc and stdin to retrieve a string and allocate precisely as much memory as it needs to contain the string. It then just returns a pointer to the allocated memory, which is filled with said string.
My question is: are there any downsides (besides having to manually free the allocated memory later) to this function? What would you do to improve it?
char *getstr(void)
{
    char *str = NULL, *tmp = NULL;
    int ch = -1, sz = 0, pt = 0;
    while(ch)
    {
        ch = getc(stdin);
        if (ch == EOF || ch == 0x0A || ch == 0x0D) ch = 0;
        if (sz <= pt)
        {
            sz++;
            tmp = realloc(str, sz * sizeof(char));
            if(!tmp) return NULL;
            str = tmp;
        }
        str[pt++] = ch;
    }
    return str;
}
After applying your suggestions, here is my updated code. I decided to just use 256 bytes for the buffer, since this function is being used for user input.
char *getstr(void)
{
    char *str, *tmp = NULL;
    int ch = -1, bff = 256, pt = 0;
    str = malloc(bff);
    if(!str)
    {
        printf("\nError! Memory allocation failed!");
        return NULL;
    }
    while(ch)
    {
        ch = getc(stdin);
        if (ch == EOF || ch == '\n' || ch == '\r') ch = 0;
        if (bff <= pt)
        {
            bff += 256;
            tmp = realloc(str, bff);
            if(!tmp)
            {
                free(str);
                printf("\nError! Memory allocation failed!");
                return NULL;
            }
            str = tmp;
        }
        str[pt++] = ch;
    }
    tmp = realloc(str, pt);
    if(!tmp)
    {
        free(str);
        printf("\nError! Memory allocation failed!");
        return NULL;
    }
    str = tmp;
    return str;
}

It's excessively frugal, IMO, and makes the mistake of sacrificing performance to save infinitesimal amounts of memory, which is pointless in most settings, I think. Allocation calls like realloc are potentially laborious for the system, and here one is made for every byte.
It would be better to just have a local buffer, say 4 KB, to read into, then allocate the return string based on the length of what was actually read into it. Keep in mind that the stack* on a typical system is already megabytes in size (commonly 1 MB on Windows and 8 MB on Linux), whether you use it all or not. If the string turns out to be longer than 4 KB, you could write a similar loop that allocates and copies into the return string. It's the same idea, but heap allocation happens every 4096 bytes rather than every byte. For example: you have the initial buffer of 4096 bytes; when that is exhausted, you malloc 4096 bytes for the return string and copy the buffer in, then continue reading into the buffer (from the beginning); if another 1000 bytes are read, you realloc to 5097 (4096 + 1000 + 1 for the terminator) and return that.
I think it is a common beginner's mistake to get obsessed with minimizing heap usage by allocating byte by byte. Even KB by KB is a little small; the system typically hands out memory in pages (commonly 4 KB), and you might as well align yourself with that.
*the memory provided for local storage inside a function.
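A minimal sketch of the buffer-then-heap approach described above. The names `read_line_chunked()` and `CHUNK` are hypothetical, reading from a `FILE *` rather than only `stdin` is an assumption (it makes the function easier to test), and unlike the question's code it only treats '\n' as the end of the line:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 4096 /* size of the local (stack) buffer suggested above */

/* Hypothetical helper: reads one line from fp (the '\n' is dropped).
   Characters go into a stack buffer first; the heap string is only
   resized when that buffer fills up or the line ends, so there is one
   realloc() per 4096 bytes instead of one per byte.
   Returns a string the caller must free(), or NULL on allocation
   failure or when EOF is hit before any character is read. */
char *read_line_chunked(FILE *fp)
{
    char buf[CHUNK];      /* reused for every chunk */
    char *result = NULL;  /* heap string assembled so far */
    size_t total = 0;     /* characters already copied into result */
    size_t n = 0;         /* characters currently waiting in buf */
    int saw_any = 0;
    int ch;

    for (;;) {
        ch = getc(fp);
        if (ch != EOF)
            saw_any = 1;
        if (ch == EOF || ch == '\n' || n == CHUNK) {
            /* Move buf to the heap; +1 leaves room for '\0'. */
            char *tmp = realloc(result, total + n + 1);
            if (tmp == NULL) {
                free(result);
                return NULL;
            }
            result = tmp;
            memcpy(result + total, buf, n);
            total += n;
            result[total] = '\0';
            n = 0;
            if (ch == EOF || ch == '\n')
                break;
        }
        buf[n++] = (char)ch;  /* ch is an ordinary character here */
    }
    if (!saw_any) {           /* EOF before any input at all */
        free(result);
        return NULL;
    }
    return result;
}
```

Because each realloc() sizes the result exactly (total + 1 bytes), no final shrink step is needed; a 10,000-byte line costs three reallocations instead of ten thousand.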

Yes, the main problem is that realloc is pretty slow and calling it repeatedly for each character is generally a bad idea.
Try allocating a fixed amount of memory to start with, say N = 100 characters, and when you need more, grow to 2*N, then 4*N, and so on. You'll use at most about twice the memory you actually need, but save a lot of running time.
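A sketch of that growth policy, assuming a hypothetical `read_line_doubling()` that reads one line from a `FILE *` and starts at N = 100 bytes as suggested:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line (without the '\n') from fp.
   The buffer starts at 100 bytes and doubles whenever it is full,
   so the number of realloc() calls grows only logarithmically with
   the line length. Returns NULL on allocation failure or if EOF is
   reached before any character. The caller frees the result. */
char *read_line_doubling(FILE *fp)
{
    size_t cap = 100;
    size_t len = 0;
    char *str = malloc(cap);
    int ch;

    if (str == NULL)
        return NULL;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {            /* keep one byte for '\0' */
            char *tmp = realloc(str, cap * 2);
            if (tmp == NULL) {           /* the old block is still ours */
                free(str);
                return NULL;
            }
            str = tmp;
            cap *= 2;
        }
        str[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {         /* nothing was read */
        free(str);
        return NULL;
    }
    str[len] = '\0';
    return str;
}
```

With this policy a 1000-character line needs four reallocations (100 → 200 → 400 → 800 → 1600) instead of roughly a thousand.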

1. It depends on '\n' == 0x0A and '\r' == 0x0D for no good reason. If you mean '\r' and '\n', use them.
2. It may be unreasonably slow, reallocating for every character you read.
3. sizeof(char) is guaranteed to be 1, so multiplying by it is pointless.
4. If you've allocated a block of memory and then realloc fails, you return NULL without returning or freeing str, thus leaking the memory.
5. The interface provides no way to indicate partial failure, as in #4. All you can do is return a string or not. Given an immense input string, you have no way to indicate that you've read part but not all of it.
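Points 4 and 5 can both be addressed by keeping the old block when realloc() fails and returning a status instead of a bare pointer. A sketch, where the `read_status` enum and `read_line_status()` are hypothetical names and the 64-byte doubling buffer is an arbitrary choice:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes for a line reader that can report
   "I read part of it, but not all". */
enum read_status {
    READ_OK,       /* whole line read */
    READ_PARTIAL,  /* out of memory: *out holds what was read so far */
    READ_EOF,      /* end of input before any character */
    READ_NOMEM     /* could not allocate even the first block */
};

/* Reads one line from fp. On READ_OK or READ_PARTIAL, *out is a
   nul-terminated string of *len_out characters that the caller must
   free(); otherwise *out is NULL. On realloc() failure the existing
   block is handed back instead of being leaked. */
enum read_status read_line_status(FILE *fp, char **out, size_t *len_out)
{
    size_t cap = 64, len = 0;
    char *str = malloc(cap);
    int ch;

    *out = NULL;
    *len_out = 0;
    if (str == NULL)
        return READ_NOMEM;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {
            char *tmp = realloc(str, cap * 2);
            if (tmp == NULL) {
                str[len] = '\0';   /* str is still valid */
                *out = str;
                *len_out = len;
                return READ_PARTIAL;
            }
            str = tmp;
            cap *= 2;
        }
        str[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {
        free(str);
        return READ_EOF;
    }
    str[len] = '\0';
    *out = str;
    *len_out = len;
    return READ_OK;
}
```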

Here are the first few observations, other answers include some more:
It's growing the buffer by 1 byte at a time, thus doing needlessly many realloc() calls.
If realloc() fails, the previous buffer is lost.
It doesn't use getline(), although it is of course more portable, since getline() is POSIX rather than standard C.
It's also not very portable to hardcode ASCII values for line feed and carriage return; use '\n' and '\r' instead.
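On POSIX systems, getline() does the buffer growth for you. A sketch of a wrapper with roughly the same contract as the question's function, assuming POSIX.1-2008 is available (it is not on MSVC, for instance); `read_line_posix()` is a hypothetical name, and stripping '\r' mirrors the question's code:

```c
#define _POSIX_C_SOURCE 200809L /* ask for the POSIX.1-2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>          /* ssize_t */

/* Returns the next line without its trailing '\n' / '\r', or NULL at
   EOF or on error. The caller frees the result. */
char *read_line_posix(FILE *fp)
{
    char *line = NULL;   /* getline() allocates when *lineptr is NULL */
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, fp);

    if (n == -1) {       /* the buffer must be freed even on failure */
        free(line);
        return NULL;
    }
    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r'))
        line[--n] = '\0';
    return line;
}
```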

Related

Do I have to initialize a char* after malloc?

I have a program that reads chars into a dynamic string buffer. We do not know the size of the string, and it is a requirement that we do not simply set a fixed-size, "large-enough" buffer.
The relevant function works like this:
char* read_field(FILE* data)
{
    int size = 8;
    char *field = malloc(size);
    if (field == NULL)
        exit(1);
    char *tmp = NULL;
    int idx = 0;
    int ch = EOF;
    while (ch) {
        ch = fgetc(data);
        // Double size if full
        if (size <= idx) {
            size *= 2;
            tmp = realloc(field, size);
            if (!tmp)
                exit(1);
            field = tmp;
        }
        field[idx++] = ch;
        // Relevant termination in my use case
        if (ch == ';' || ch == '\n')
            ch = 0;
    }
    printf("field: %s\n", field); // value correct, but sometimes valgrind error
    return field; // field is free'd by the caller
}
Now the program seems to work, but when running it through Valgrind I get the errors Uninitialised value was created by a heap allocation and Conditional jump or move depends on uninitialised value(s). These errors appear arbitrarily (sometimes) when I call functions like printf or strlen, as seen in the code above.
This problem is sorted if I use calloc instead of malloc / realloc, but then the reallocation process becomes messier.
Is the Valgrind error something that could be ignored if the program works fine? What are the implications of not initializing the memory to zero? If this can't be ignored, what's the best design to sort it out?
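You should put a string terminator at the end of the string. Your loop stores the ';' or '\n' and only then sets ch to 0, but that 0 is never written into field, so printf and strlen keep reading past the last byte you wrote into uninitialized heap memory, which is exactly what Valgrind is reporting. calloc only hides the problem because the bytes after your data happen to be zero.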
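A sketch of read_field() with the terminator written explicitly. Beyond adding '\0', it makes a few assumptions that differ from the question: the ';' or '\n' delimiter is not kept in the field, EOF also ends the field (the original loop never stops at EOF, since EOF is not 0), and allocation failure returns NULL instead of calling exit(1):

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads characters up to ';', '\n' or EOF and returns them as a
   nul-terminated string (delimiter excluded). Caller frees.
   Returns NULL only on allocation failure. */
char *read_field(FILE *data)
{
    size_t size = 8, idx = 0;
    char *field = malloc(size);
    int ch;

    if (field == NULL)
        return NULL;
    while ((ch = fgetc(data)) != EOF && ch != ';' && ch != '\n') {
        if (idx + 1 >= size) {          /* always keep room for '\0' */
            char *tmp = realloc(field, size * 2);
            if (tmp == NULL) {
                free(field);
                return NULL;
            }
            field = tmp;
            size *= 2;
        }
        field[idx++] = (char)ch;
    }
    field[idx] = '\0';                  /* the missing terminator */
    return field;
}
```

At end of input this version returns an empty string rather than NULL, so a caller looping over fields should also check feof(data).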
PS:
If you want to clear some memory, use memset; it's usually faster than a hand-written for loop.
Or use calloc, which is better than malloc followed by memset.
Example
char *string = calloc(100, sizeof(char));
// calloc zero-fills the allocated block
// It can be faster than malloc + memset (e.g. memory fresh from the OS is already zeroed)
// Also, in C (unlike C++) you don't need to cast the result of memory allocators
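calloc() only zeroes the initial allocation; realloc() leaves the bytes it adds uninitialized, which is the "messier" part the question mentions. A small helper that keeps a growing buffer zero-filled; `realloc_zeroed()` is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Like realloc(), but the bytes between old_size and new_size are set
   to zero, so a calloc()'d buffer stays zeroed as it grows.
   On failure returns NULL and leaves p untouched (still owned by the
   caller), exactly like realloc(). */
void *realloc_zeroed(void *p, size_t old_size, size_t new_size)
{
    unsigned char *q = realloc(p, new_size);
    if (q != NULL && new_size > old_size)
        memset(q + old_size, 0, new_size - old_size);
    return q;
}
```

Zeroing is a workaround rather than a fix: writing the '\0' terminator yourself is what makes the data a valid string. Avoid passing a new_size of 0, since what realloc() does with a zero size is implementation-defined.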

Proper Way to Free Memory of a Returned Variable

I created a function designed to get user input. It requires that memory be allocated to the variable holding the user input; however, that variable is returned at the end of the function. What is the proper method to free the allocated memory/return the value of the variable?
Here is the code:
char *input = malloc(MAX_SIZE*sizeof(char*));
int i = 0;
char c;
while((c = getchar()) != '\n' && c != EOF) {
    input[i++] = c;
}
return input;
Should I return the address of input and free it after it is used?
Curious as to the most proper method to free the input variable.
It's quite simple: as long as you pass free() the same pointer that malloc() returned, it's fine.
For example
#include <stdio.h>
#include <stdlib.h>

char *readInput(size_t size)
{
    char *input;
    size_t i = 0;
    int chr;
    input = malloc(size + 1);
    if (input == NULL)
        return NULL;
    while ((i < size) && ((chr = getchar()) != '\n') && (chr != EOF))
        input[i++] = chr;
    input[i] = '\0'; /* nul terminate right after the last character read, so it is a string */
    return input;
}

int main(void)
{
    char *input;
    input = readInput(100);
    if (input == NULL)
        return -1;
    printf("input: %s\n", input);
    /* now you can free it */
    free(input);
    return 0;
}
What you should never do is something like
free(input + n);
because input + n is not the pointer returned by malloc().
But your code has other issues you should take care of:
You are allocating space for MAX_SIZE chars, so you should multiply by sizeof(char), which is 1, instead of sizeof(char *), which would allocate room for MAX_SIZE pointers. You could also make MAX_SIZE a function parameter instead. And if you are allocating a fixed-size buffer anyway, you could define an array in main() like char input[MAX_SIZE] and pass it to readInput() as a parameter, thus avoiding malloc() and free() altogether.
You are allocating that much space, but you don't prevent overflow in your while loop; you should check that i < MAX_SIZE - 1, leaving room for the '\0' terminator you also need to add. Note too that c should be an int rather than a char, so that it can reliably hold EOF.
You could write a function with return type char*, return input, and ask the user to call free once they're done with the data.
You could also ask the user to pass in a properly sized buffer themselves, together with a buffer size limit, and return how many characters were written to the buffer.
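A sketch of that second interface, where the caller owns the buffer; `read_into()` is a hypothetical name, and it reads one line from any `FILE *`:

```c
#include <stdio.h>

/* Reads at most cap - 1 characters of one line from fp into buf,
   always nul-terminates (when cap > 0) and returns the number of
   characters stored. No heap allocation, so nothing to free. */
size_t read_into(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int ch;

    if (cap == 0)
        return 0;
    /* n + 1 < cap is tested first, so no character is consumed
       from the stream once the buffer is full */
    while (n + 1 < cap && (ch = getc(fp)) != EOF && ch != '\n')
        buf[n++] = (char)ch;
    buf[n] = '\0';
    return n;
}
```

A return value of cap - 1 means the line may have been truncated, with the rest still waiting in the stream. The standard fgets() works along similar lines, but it keeps the '\n' and returns a pointer rather than a count.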
This is a classic C case. A function mallocs memory for its result, and the caller must free the returned value. You are now walking onto the thin ice of C memory leaks, for two reasons.
First: there is no way for you to communicate the free requirement in an enforceable way (i.e. the compiler or runtime can't help you, in contrast with specifying what the argument types are). You just have to document it somewhere and hope that the caller has read your docs.
Second: even if the caller knows to free the result, they might make a mistake, e.g. some error path gets taken that doesn't free the memory. This doesn't cause an immediate error; things seem to work, but after running for three weeks your app crashes because it has run out of memory.
This is why so many 'modern' languages focus on this topic: smart pointers in C++, garbage collection in Java, C#, etc.

Can't free an allocated string memory after appending character

I am trying to append a character to a string. That works fine, but unfortunately I can't free the string's memory afterwards, so the string gets longer and longer. As it reads a file, every line gets added to the string, which obviously shouldn't happen.
char* append_char(char* string, char character)
{
    int length = strlen(string);
    string[length] = character;
    string[length+1] = '\0';
    return string;
}
I allocated memory for the string like this:
char *read_string = (char *)malloc(sizeof(char)*500);
I call the function as append_char(read_string,buffer[0]); and free it with free(read_string); after the whole string is built.
I presume that once I call append_char(), the memory allocation changes, which means I can no longer get hold of it.
Edited:
here is the function which uses the append_char()
char *read_log_file_row(char *result,int t)
{
    filepath ="//home/,,,,,/mmm.txt";
    int max = sizeof(char)*2;
    char buffer[max];
    char *return_fgets;
    char *read_string = malloc(sizeof(char)*500);
    file_pointer = fopen(filepath,"r");
    if(file_pointer == NULL)
    {
        printf("Didn't work....");
        return NULL;
    }
    int i = 0;
    while(i<=t)
    {
        while(return_fgets = (fgets(buffer, max, file_pointer)))
        {
            if(buffer[0] == '\n')
            {
                ++i;
                break;
            }
            if(i==t)
            {
                append_char(read_string,buffer[0]);
            }
        }
        if(return_fgets == NULL)
        {
            free(read_string);
            return NULL;
            /* return "\0";*/
        }
        if(buffer[0] != '\n')
            append_char(read_string,buffer[0]);
    }
    fclose(file_pointer);
    strcpy(result,read_string);
    free(read_string);
    return result;
}
Don't cast the return value of malloc() in C.
Make sure you initialize read_string to an empty string before you try to append to it, by setting read_string[0] = '\0';.
Make sure you track the current length, so you don't try to build a string that won't fit in the buffer. 500 chars allocated means max string length is 499 characters.
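Following that advice, a sketch that keeps the length next to the buffer, so each append is O(1) instead of calling strlen() every time, and refuses to overflow; `append_char_n()` is a hypothetical name:

```c
#include <stddef.h>

/* Appends c to the nul-terminated string in buf, whose allocated size
   is cap bytes and whose current length is *len.
   Returns 0 on success, -1 if c (plus the '\0') would not fit. */
int append_char_n(char *buf, size_t cap, size_t *len, char c)
{
    if (*len + 1 >= cap)      /* need room for c and the terminator */
        return -1;
    buf[(*len)++] = c;
    buf[*len] = '\0';
    return 0;
}
```

With the question's 500-byte buffer you would set read_string[0] = '\0' and len = 0 once, then call append_char_n(read_string, 500, &len, buffer[0]); the longest string it will build is 499 characters.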
Not sure what you expect should happen when you do free(read_string). It sounds (from your comment to Steve Jessop's answer) like you do something like this:
char *read_string = malloc(500);
read_string[0] = '\0'; /* Let's assume you do this. */
append_char(read_string, 'a'); /* Or whatever, many of these. */
free(read_string);
printf("%c\n", *read_string); /* This invokes UNDEFINED BEHAVIOR. */
This might print an a, but that proves nothing: by accessing memory that has been freed, your program invokes undefined behavior, which means that anything could happen. You cannot draw conclusions from this, since the "test" is not valid. You can't free memory and then access it. If you do, and get some "reasonable"/"correct" result, you still cannot say that the freeing "didn't work".
No, the memory allocation is not changed in any way by append_char. All it does is change the contents of the allocation -- by moving the nul terminator one byte along, you now care about the contents of one more of your 500 bytes than you did before.
If the string gets longer than 500 bytes (including terminator), then you have undefined behavior. If you call strlen on something that isn't a nul-terminated string, for example if you pass it a pointer to uninitialized memory straight from malloc, then you have undefined behavior.
Undefined behavior is bad[*]: feel free to read up on it, but "X has undefined behavior" is in effect a way of saying "you must not do X".
[*] To be precise: it's not guaranteed not to be bad...
Have you initialized the string at all? Try *read_string = 0 after allocating it, or use calloc. Also, has your string grown beyond the allocated memory?

Allocating an array of an unknown size

Context: What I'm trying to do is make a program that takes text as input and stores it in a character array. Then I would print each element of the array as a decimal number, e.g. "Hello World" would be converted to 72, 101, etc. I would use this as a quick ASCII2DEC converter. I know there are online converters, but I'm trying to make this one on my own.
Problem: how can I allocate an array whose size is unknown at compile-time and make it the exact same size as the text I enter? So when I enter "Hello World" it would dynamically make an array with the exact size required to store just "Hello World". I have searched the web but couldn't find anything that I could make use of.
I see that you're using C. You could do something like this:
#define INC_SIZE 10
char *buf = malloc(INC_SIZE), *temp;
int size = INC_SIZE, len = 0;
int c; /* int, not char, so that EOF can be detected */
if (buf == NULL) {
    // allocation failed, handle it yourself
}
while ((c = getchar()) != '\n' && c != EOF) { // I assume you want to read a line of input
    if (len == size) {
        size += INC_SIZE;
        temp = realloc(buf, size);
        if (temp == NULL) {
            // not enough memory probably, handle it yourself (buf is still valid here)
        }
        buf = temp;
    }
    buf[len++] = c;
}
// done, note that the character array has no '\0' terminator and the length is represented by `len` variable
Typically, in environments like a PC where there are no great memory constraints, I would just dynamically allocate (language-dependent) an array/string/whatever of, say, 64K and keep an index/pointer/whatever to the current end point plus one, i.e. the next index/location to place any new data.
If you are using C++, you can use std::string to store the input characters and access each one with operator[], like the following code:
#include <iostream>
#include <string>
std::string input;
std::getline(std::cin, input); // reads the whole line; std::cin >> input would stop at the first space
I'm going to guess you mean C, as that's one of the commonest compiled languages where you would have this problem.
Variables that you declare in a function are stored on the stack. This is nice and efficient, gets cleaned up when your function exits, etc. The only problem is that the size of the stack slot for each function is fixed and cannot change while the function is running.
The second place you can allocate memory is the heap. This is a free-for-all that you can allocate and deallocate memory from at runtime. You allocate with malloc(), and when finished, you call free() on it (this is important to avoid memory leaks).
With heap allocations you must also know the size at allocation time, but unlike fixed stack space, a heap block can be grown later with realloc() if needed.
This is a simple and stupid function to decode a string to its ASCII codes using a dynamically-allocated buffer:
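char* str_to_ascii_codes(const char* str)
{
    size_t i;
    size_t str_length = strlen(str);
    /* 3 digits + 1 space per character, plus the terminator */
    char* ascii_codes = malloc(str_length*4+1);
    if (ascii_codes == NULL)
        return NULL;
    ascii_codes[0] = '\0'; /* in case str is empty */
    for(i = 0; i<str_length; i++)
        /* unsigned char keeps every code in 0..255, i.e. at most 3 digits */
        snprintf(ascii_codes+i*4, 5, "%03d ", (unsigned char)str[i]);
    return ascii_codes;
}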
Edit: You mentioned in a comment wanting to get the buffer just right. I cut corners with the above example by making each entry in the string a known length, and not trimming the result's extra space character. This is a smarter version that fixes both of those issues:
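char* str_to_ascii_codes(const char* str)
{
    size_t i;
    size_t str_length = strlen(str), ascii_codes_length = 0;
    char* ascii_codes = malloc(str_length*4+1);
    char* trimmed;
    if (ascii_codes == NULL)
        return NULL;
    ascii_codes[0] = '\0';
    for(i = 0; i<str_length; i++)
    {
        /* snprintf returns how many characters it wrote, so %n is not needed;
           unsigned char keeps each code in 0..255 (at most "255 " plus '\0' = 5 bytes) */
        int written = snprintf(ascii_codes+ascii_codes_length, 5, "%d ", (unsigned char)str[i]);
        ascii_codes_length += (size_t)written;
    }
    if (ascii_codes_length == 0)
        return ascii_codes; /* empty input: nothing to trim */
    /* This is intentionally one byte short, to trim the trailing space char */
    trimmed = realloc(ascii_codes, ascii_codes_length);
    if (trimmed != NULL) /* if shrinking fails, keep the larger block */
        ascii_codes = trimmed;
    /* Add new end-of-string marker */
    ascii_codes[ascii_codes_length-1] = '\0';
    return ascii_codes;
}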

Reading line from file causes crash

I'm trying to read a line from a file character by character and place the characters in a string; here's my code:
char *str = "";
size_t len = 1; /* I also count the terminating character */
char temp;
while ((temp = getc(file)) != EOF)
{
    str = realloc(str, ++len * sizeof(char));
    str[len-2] = temp;
    str[len-1] = '\0';
}
The program crashes on the realloc line. If I move that line outside of the loop or comment it out, it doesn't crash. If I'm just reading the characters and then sending them to stdout, it all works fine (ie. the file is opened correctly). Where's the problem?
You can't realloc a pointer that wasn't generated with malloc in the first place.
You also have an off-by-one error that will give you some trouble.
Change your code to:
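char *str = NULL; // realloc can be called with NULL
size_t len = 1; /* I also count the terminating character */
int temp; /* int, not char, so that EOF is detected reliably */
while ((temp = getc(file)) != EOF)
{
    char *tmp = realloc(str, ++len);
    if (tmp == NULL) { free(str); str = NULL; break; } /* don't leak on failure */
    str = tmp;
    str[len-2] = temp;
    str[len-1] = '\0';
}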
Your issue is because you were calling realloc with a pointer to memory that was not allocated with either malloc or realloc which is not allowed.
From the realloc manpage:
realloc() changes the size of the memory block pointed to by ptr to size bytes.
The contents will be unchanged to the minimum of the old and new
sizes; newly allocated memory will be uninitialized. If ptr is NULL,
then the call is equivalent to malloc(size), for all values of size;
if size is equal to zero, and ptr is not NULL, then the call is
equivalent to free(ptr). Unless ptr is NULL, it must have been
returned by an earlier call to malloc(), calloc() or realloc(). If
the area pointed to was moved, a free(ptr) is done.
On a side note, you should really not grow the buffer one character at a time. Instead, keep two counters, one for the buffer capacity and one for the number of characters used, and only grow the buffer when it is full. Otherwise, your algorithm will have really poor performance.
You can't realloc a string literal. Also, reallocating for every new char isn't a very efficient way of doing this. Look into getline(), originally a GNU extension and now part of POSIX.1-2008.
