C using malloc and realloc to dynamically increase string length - c

I am currently learning memory management in C, and I am running into issues increasing a string's length as a loop iterates.
The method I am trying to figure out logically works like this:
// return string with "X" removed
char * notX(char * string){
result = "";
if(for int = 0; i < strlen(string); i++){
if (string[i] != 'X') {
result += string[i];
}
}
return result;
}
This is simple enough to do in other languages, but managing the memory in C makes it a bit challenging. The difficulty I run into is using malloc and realloc to initialize and change the size of my string. In my code I have currently tried:
char * notX(char * string){
char* res = malloc(sizeof(char*)); // allocate memory for string of size 1;
res = ""; // attempted to initialize the string. Fairly certain this is incorrect
char tmp[2]; // temporary string to hold value to be concatenated
if(for int = 0; i < strlen(string); i++){
if (string[i] != 'X') {
res = realloc(res, sizeof(res) + sizeof(char*)); // reallocate res and increasing its size by 1 character
tmp[0] = string[i];
tmp[1] = '\0';
strcat(res, tmp);
}
}
return result;
}
Note, I have found success in initializing result to be some large array like:
char res[100];
However, I would like to learn how to address this issue without initializing an array with a fixed size, since that might waste memory or not provide enough of it.

realloc needs the number of bytes to allocate. size is incremented for each character added to res. size + 2 is used to provide for the current character being added and the terminating zero.
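Check the return value of realloc: NULL means failure, and in that case the original block is left untouched. Assigning the result to tmp first keeps res valid, so res can still be returned if realloc fails.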
char * notX(char * string){
char* res = NULL;//so realloc will work on first call
char* tmp = NULL;//temp pointer during realloc
size_t size = 0;
size_t index = 0;
while ( string[index]) {//not the terminating zero
if ( string[index] != 'X') {
if ( NULL == ( tmp = realloc(res, size + 2))) {//+ 2 for character and zero
fprintf ( stderr, "realloc problem\n");
if ( res) {//not NULL
res[size] = 0;//terminate
}
return res;
}
res = tmp;//assign realloc pointer back to res
res[size] = string[index];
++size;
}
++index;//next character
}
if ( res) {//not NULL
res[size] = 0;//terminate
}
return res;
}

There are two main errors in this code:
The malloc and realloc calls take a size computed with sizeof(char*). The result of sizeof(char*) is the size of a pointer, not of a char, so you have to replace char* with char in the sizeof expressions.
res = ""; is incorrect. First, you have a memory leak, because you lose the pointer to the memory just allocated by malloc. Second, and no less important, you have undefined behavior when you call realloc on res after it has been pointed at an empty string literal (a constant string): after that assignment, res no longer refers to dynamically managed memory. To replace this initialization, I think a memset to 0 is the best solution.
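Since removing characters can only make the string shorter, one alternative to growing the buffer one byte at a time is a single allocation sized for the worst case. The following is a minimal sketch of that idea, not the realloc approach from the answer above; the name not_x_single_alloc is made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `string`
 * with every 'X' removed, or NULL if the allocation fails.
 * The caller is responsible for calling free() on the result. */
char *not_x_single_alloc(const char *string) {
    size_t n = strlen(string);
    /* Worst case: nothing is removed, so n characters plus the terminator. */
    char *res = malloc(n + 1);
    if (res == NULL) {
        return NULL;
    }
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (string[i] != 'X') {
            res[out++] = string[i];
        }
    }
    res[out] = '\0'; /* terminate the result */
    return res;
}
```

This trades a possibly oversized buffer for a single allocation; if the slack matters, the result could be shrunk with one final realloc.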

Related

My own substring function | valgrind showing some malloc errors I do not understand

Task:
Allocate (with malloc(3)) and return a substring from the string s. The substring begins at index start and is of maximum size len.
Return value: The substring. NULL if the allocation fails.
Hello, after a few hours I decided to ask for some clarification. I have the following function and an error from Valgrind that I can't understand, which shows up even though everything seems correct. (I call ft_strlen(s) from my own library, which also includes the header for malloc.)
char *ft_substr(char const *s, unsigned int start, size_t len)
{
unsigned int x;
char *a;
unsigned int i;
i = 0;
if (s == NULL)
return (0);
if (start > ft_strlen(s))
{
if (!(a = (char *)malloc(0*sizeof(char))))
return (0);
return (a);
}
if ((start + len) < ft_strlen(s))
x = len;
else
x = ft_strlen(s) - start;
if (!(a = (char *)malloc((x + 1) * sizeof(char))))
return(0);
while (i < x)
{
a[i] = s[start + i];
i++;
}
a[i] = '\0';
return (a);
}
I left one error in there on purpose. If I am supposed to return NULL when the allocation fails, why should the line below use 1 instead of 0? In any case, it does not change the errors shown below.
if (!(a = (char *)malloc(0 * sizeof(char))))
ERRORS:
==4817== Invalid read of size 1
==4817== at 0x483FED4: strcmp (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4817== by 0x4039BC: main (ft_substr_test.cpp:28)
==4817== Address 0x4dad0d0 is 0 bytes after a block of size 0 alloc'd
==4817== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4817== by 0x403B58: ft_substr (in /home/tony/42cursus/0lvl_libft_1week/libftTester/a.out)
==4817== by 0x4039A4: main (ft_substr_test.cpp:27)
==4817==
Your function has multiple problems:
the type of start, x and i should be size_t.
malloc(0) has implementation defined behavior. You should allocate at least 1 byte for the null terminator and set it before returning the pointer to the empty string or return NULL if the specification says you should.
the function should call ft_strlen() just once.
the special case for start > ft_strlen(s) can be handled in the general case if an empty string should be returned.
Here is a modified version:
char *ft_substr(char const *s, size_t start, size_t len) {
size_t i, slen;
char *a;
if (s == NULL) {
return NULL;
}
slen = ft_strlen(s);
if (start > slen) {
start = slen;
}
if (len > slen - start) {
len = slen - start;
}
if (!(a = malloc((len + 1) * sizeof(char)))) {
return NULL;
}
for (i = 0; i < len; i++) {
a[i] = s[start + i];
}
a[i] = '\0';
return a;
}
PS: you may need to reformat the code to fit the local 42 norminette...
On Linux systems, calling malloc(0) will not necessarily return a NULL pointer. It could return a pointer that you can't write to but can pass to free.
So when you return the result of malloc(0) from the function the calling function sees a non-null pointer and attempts to dereference it. Since this pointer essentially points to a buffer of size 0, attempting to read it reads past the end of the buffer, which is what valgrind is complaining about.
You can fix this by either returning NULL:
if (start > ft_strlen(s))
{
return NULL;
}
Or by allocating space for an empty string and setting the null byte:
if (start > ft_strlen(s))
{
if (!(a = malloc(1)))
return NULL;
*a = 0;
return a;
}
A few other notes:
sizeof(char) is defined to be 1, so you can leave it out of size calculations
Don't cast the return value of malloc.
Use NULL instead of 0 for null pointers
Parentheses aren't required around the expression in a return statement.
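The edge cases discussed in the answers above (start past the end of the string, len longer than what remains) can be checked with a small sketch. It uses the standard strlen and memcpy in place of the asker's ft_strlen, and the name substr_dup is hypothetical; it assumes an empty string, not NULL, should be returned when start is out of range.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ft_substr built on the standard library.
 * Returns a newly allocated substring, or NULL if s is NULL or the
 * allocation fails. Always allocates at least 1 byte for the terminator,
 * so the caller can safely read the result as a string. */
char *substr_dup(const char *s, size_t start, size_t len) {
    if (s == NULL) {
        return NULL;
    }
    size_t slen = strlen(s);
    if (start > slen) {
        start = slen;           /* out of range: result is the empty string */
    }
    if (len > slen - start) {
        len = slen - start;     /* clamp to what is actually available */
    }
    char *a = malloc(len + 1);
    if (a == NULL) {
        return NULL;
    }
    memcpy(a, s + start, len);
    a[len] = '\0';
    return a;
}
```

With this shape, a call such as substr_dup("hello", 10, 3) yields a one-byte block holding only the terminator, so a later strcmp on the result reads valid memory.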

Splitting a string and store to the heap algorithm question

For the code below that I am writing, I was wondering: if I want to split the string but still retain the original string, is this the best method?
Should the caller provide the char ** or should the function "split" make an additional malloc call and manage the memory for the char ** itself?
Also, I was wondering whether this is the most optimized method, or could I optimize the code further?
I have not debugged the code yet, and I am still undecided whether the caller or the function should manage the char ** pointer.
#include <stdio.h>
#include <stdlib.h>
size_t split(const char * restrict string, const char splitChar, char ** restrict parts, const size_t maxParts){
size_t size = 100;
size_t partSize = 0;
size_t len = 0;
size_t newPart = 1;
char * tempMem;
/*
* We just reserve a large block of memory.
* Reaching the split character marks the boundary of a new part.
*/
char * mem = (char*) malloc( sizeof(char) * size );
if ( mem == NULL ) return 0;
for ( size_t i = 0; string[i] != 0; i++ ) {
// If it is a split char we at a new part
if ( string[i] == splitChar) {
// If the last character was not the split character
// Then mem[len] = 0 and increase the len by 1.
if (newPart == 0) mem[len++] = 0;
newPart = 1;
continue;
} else {
// If this is a new part
// and not a split character
// we make a new pointer
if ( newPart == 1 ){
// if reach maxpart we break.
// It is okay here, to not worry about memory
if ( partSize == maxParts ) break;
parts[partSize++] = &mem[len];
newPart = 0;
}
mem[len++] = string[i];
if ( len == size ){
// if ran out of memory realloc.
tempMem = (char*)realloc(mem, sizeof(char) * (size << 1) );
// if fail quit loop
if ( tempMem == NULL ) {
// If we can't get more memory the last part could be corrupted
// We have to return.
// Otherwise the code below can seg.
// There maybe a better way than this.
return partSize - 1; // drop the last, unterminated part
}
size = size << 1;
mem = tempMem;
}
}
}
// If we got here and still in a newPart that is fine no need
// an additional character.
if ( newPart != 1 ) mem[len++] = 0;
// realloc to give back the unneed memory
if ( len < size ) {
tempMem = (char*) realloc(mem, sizeof(char) * len );
// If the resizing did not fail but yielded a different
// memory block;
if ( tempMem != NULL && tempMem != mem ){
for ( size_t i = 0; i < partSize; i++ ){
parts[i] = tempMem + (parts[i] - mem);
}
}
}
return partSize;
}
int main(){
char * tStr = "This is a super long string just to test the str str adfasfas something split";
char * parts[10];
size_t len = split(tStr, ' ', parts, 10);
for (size_t i = 0; i < len; i++ ){
printf("%zu: %s\n", i, parts[i]);
}
}
What is "best" is very subjective, as well as use case dependent.
I personally would keep the parameters as input only, define a struct to contain the split result, and probably return it by value. The struct would probably contain pointers to allocated memory, so I would also create a helper function to free that memory. The parts might be stored as a list of strings (copying the string data) or as index & length pairs into the original string (no string copies needed, but the original string needs to remain valid).
But there are dozens of very different ways to do this in C, and all a bit klunky. You need to choose your flavor of klunkiness based on your use case.
About being "more optimized": unless you are coding for a very small embedded device or something, always choose a more robust, clear, easier to use, harder to use wrong over more micro-optimized. The useful kind of optimization turns, for example, O(n^2) to O(n log n). Turning O(3n) to O(2n) of a single function is almost always completely irrelevant (you are not going to do string splitting in a game engine inner rendering loop...).
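As one possible shape for the struct-based design described above, here is a minimal sketch using index & length pairs into the original string. All names (SplitPart, SplitResult, split_view, split_free) are invented for illustration, and runs of separators are treated as a single boundary, as in the question's code.

```c
#include <stdlib.h>

/* One part of the split: an offset and length into the original string. */
typedef struct {
    size_t start;
    size_t len;
} SplitPart;

/* The whole result, returned by value. parts is heap-allocated. */
typedef struct {
    SplitPart *parts;
    size_t count;
} SplitResult;

/* Splits s on sep without copying any string data. On allocation
 * failure the result is empty (parts == NULL, count == 0); a real
 * implementation might report that case separately. */
SplitResult split_view(const char *s, char sep) {
    SplitResult r = { NULL, 0 };
    size_t cap = 0;
    size_t i = 0;
    while (s[i] != '\0') {
        while (s[i] == sep) i++;            /* skip separators */
        if (s[i] == '\0') break;
        size_t start = i;
        while (s[i] != '\0' && s[i] != sep) i++;
        if (r.count == cap) {               /* grow the array geometrically */
            size_t ncap = cap ? cap * 2 : 8;
            SplitPart *tmp = realloc(r.parts, ncap * sizeof *tmp);
            if (tmp == NULL) {
                free(r.parts);
                r.parts = NULL;
                r.count = 0;
                return r;
            }
            r.parts = tmp;
            cap = ncap;
        }
        r.parts[r.count].start = start;
        r.parts[r.count].len = i - start;
        r.count++;
    }
    return r;
}

/* Helper that releases the memory owned by a SplitResult. */
void split_free(SplitResult *r) {
    free(r->parts);
    r->parts = NULL;
    r->count = 0;
}
```

A part can be printed with printf("%.*s\n", (int)p.len, s + p.start), which works only while the original string s remains valid.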

Buffer Overflow - Char Array not removed from stack after exiting function

I am trying to concatenate a few strings to a buffer. However, if I call the function repeatedly, the size of my buffer will keep growing.
void print_message(char *str) {
char message[8196];
sender *m = senderlist;
while(m) {
/* note: stricmp() is a case-insensitive version of strcmp() */
if(stricmp(m->sender,str)==0) {
strcat(message,m->sender);
strcat(message,", ");
}
m = m->next;
}
printf("strlen: %zu",strlen(message));
printf("Message: %s\n",message);
return;
}
The size of message keeps growing until the length reaches 3799.
Example:
1st. call: strlen = 211
2nd call: strlen = 514
3rd call: strlen = 844
...
nth call: strlen = 3799
nth +1 call: strlen = 3799
nth +2 call: strlen = 3799
My understanding was, that statically allocated variables like char[] will automatically be freed upon exiting the function, and I'm not dynamically allocating anything on the heap.
And why does it suddenly stop growing at 3799 bytes? Thanks for any pointers.
Add one more statement after the buffer definition
char message[8196];
message[0] = '\0';
Or initialize the buffer when it is defined
char message[8196] = { '\0' };
or
char message[8196] = "";
that is fully equivalent to the previous initialization.
The problem with your code is that the compiler does not initialize the buffer unless you specify an initializer explicitly. So the array message contains garbage, but strcat first searches for the terminating zero in the buffer in order to append the new string there. So your program has undefined behaviour.
What you are seeing is either the senderlist growing or, more likely, garbage left in message; fortunately it does not exceed 8196 bytes.
The message array must start out as the empty string. At the moment, strcat appends to garbage.
char message[8196];
sender *m = senderlist;
size_t len = 0;
*message = '\0';
while(m) {
/* note: stricmp() is a case-insensitive version of strcmp() */
if(stricmp(m->sender,str)==0) {
size_t sender_len = strlen(m->sender);
if (len + sender_len + 2 + 1 < sizeof(message)) {
strcpy(message + len, m->sender);
len += sender_len;
strcpy(message + len, ", ");
len += 2;
} else {
// Maybe appending "..." instead (+ 3 + 1 < ...).
break;
}
}
m = m->next;
}
printf("strlen: %zu",strlen(message));
printf("Message: %s\n",message);
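The bounds check in the loop above can be factored into a small helper that tracks the current length and refuses to write past the buffer. This is only a sketch under the assumption that the buffer already holds a valid string of length *len; the name append_bounded is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to buf, which has total capacity cap
 * and currently holds a string of length *len (so *len < cap).
 * Returns 1 on success, 0 if src plus the terminator would not fit,
 * in which case buf is left unchanged. */
int append_bounded(char *buf, size_t cap, size_t *len, const char *src) {
    size_t n = strlen(src);
    if (n >= cap - *len) {
        return 0;                    /* need n bytes plus the terminator */
    }
    memcpy(buf + *len, src, n + 1);  /* copies the terminating zero too */
    *len += n;
    return 1;
}
```

The buffer still has to start as an empty string (for example char message[8196] = ""; with len set to 0), otherwise the helper appends to garbage just as strcat does.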
"Deallocation" is not the same as wiping the data; in fact, C generally leaves the data unerased for performance reasons.

How to free a returned malloced string?

I have the following two functions. Function get_string_data(line) mallocs a string and returns it. Later I use it like this:
char *get_string_data(char *line) {
char *sec_tok, *result;
Split *split;
split = split_string(line, ' ');
sec_tok = split -> tail;
if (starts_with_char(sec_tok, '\"') && ends_with_char(sec_tok, '\"')) {
result = (char *) malloc(strlen(sec_tok) + 1);
strcpy(result, sec_tok);
free(split);
result++;
*(result + (strlen(result) - 1)) = '\0';
return result;
}
free(split);
return NULL;
}
void handle_string_instr(char *line) {
char* data = get_string_data(line);
...a few lines later, after I used the data...
free(data);
... end of the world happens here...
}
Now on attempt to free the string everything crashes (Program received signal SIGABRT, Aborted.). Why does this happen, and what is the correct way to free the memory?
Here is the problem code
result = (char *) malloc(strlen(sec_tok) + 1);
...
result++;
...
return result;
At this point the get_string_data function is no longer returning a pointer to the start of the memory that was allocated. It is instead returning a pointer into the middle of that memory. You can only pass to free a pointer exactly as it was returned by malloc. Here you don't, and this is why it crashes.
Also a simpler way of null terminating the string would be the following
size_t length = strlen(sec_tok);
result = (char*)malloc(length + 1);
...
result[length] = '\0';
free(data): get_string_data moves the pointer to a location inside the allocated block, so the pointer it returns is not the correct one to pass to free().
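One way to avoid returning an offset pointer is to copy only the characters between the quotes into a fresh allocation, so the pointer handed back is exactly the one malloc produced. A minimal sketch, assuming the token starts and ends with a double quote; the name unquote_copy is made up and does not come from the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if tok is wrapped in double quotes, returns a newly
 * allocated copy of the text between them; otherwise returns NULL.
 * The returned pointer is the one malloc returned, so free() accepts it. */
char *unquote_copy(const char *tok) {
    size_t len = strlen(tok);
    if (len < 2 || tok[0] != '"' || tok[len - 1] != '"') {
        return NULL;
    }
    size_t inner = len - 2;            /* characters between the quotes */
    char *result = malloc(inner + 1);
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, tok + 1, inner);    /* skip the opening quote */
    result[inner] = '\0';              /* replaces the closing quote */
    return result;
}
```

The source pointer is advanced instead of the destination pointer, which is what keeps the result safe to free.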

Return the contiguous block in c

I create an array (char *charheap;) of length 32 bytes in the heap, and initialize all the elements to be \0. Here is my main function:
int main(void) {
char *str1 = alloc_and_print(5, "hello");
char *str2 = alloc_and_print(5, "brian");
}
char *alloc_and_print(int s, const char *cpy) {
char *ncb = char_alloc(s);// allocate the next contiguous block
if (ncb == NULL) {
printf("Failed\n");
} else {
strcpy(ncb, cpy);
arr_print();// print the array
}
return ncb;
}
Here is what I implement:
/* char_alloc(s): find the FIRST contiguous block of s+1 NULL ('\0')
characters in charheap that does not contain the NULL terminator
of some previously allocated string. */
char *char_alloc(int s) {
int len = strlen(charheap);
for (int i = 0; i < len; i++) {
if (charheap[0] == '\0') {
char a = charheap[0];
return &a;
} else if (charheap[i] == '\0') {
char b = charheap[i+1];
return &b;
}
}
return NULL;
}
Expected Output: (\ means \0)
hello\\\\\\\\\\\\\\\\\\\\\\\\\\\
hello\brian\\\\\\\\\\\\\\\\\\\\\
This solution is completely wrong and I just print out two failed. :(
Actually, the char_alloc should return a pointer to the start of contiguous block but I don't know how to implement it properly. Can someone give me a hint or clue ?
Your function is returning a pointer to a local variable, therefore the caller receives a pointer to invalid memory. Just return the pointer into the charheap, which is what you want.
return &charheap[0]; /* was return &a; which is wrong */
return &charheap[i+1]; /* was return &b; which is wrong */
Your for loop uses i < len for the terminating condition, but, since charheap is \0 filled, strlen() will return a size of 0. You want to iterate through the whole charheap, so just use the size of that array (32 in this case).
int len = 32; /* or sizeof(charheap) if it is declared as an array */
The above two fixes should be enough to get your program to behave as you expect.
However, you do not check that there is enough room left in your heap to satisfy the allocation. Your allocation should fail if the distance between the start of the available memory and the end of the charheap is less than or equal to the desired size. You can enforce this easily enough by setting len to the last position you are willing to check before you know there will not be enough space.
int len = 32 - s;
Finally, when you try to allocate a third string, your loop will skip over the first allocated string, but will overwrite the second allocated string. Your loop logic needs to change to skip over each allocated string. You first check if the current location in your charheap is free or not. If it is not, you advance your position by the length of the string, plus one more to skip over the '\0' terminator for the string. If the current location is free, you return it. If you are not able to find a free location, you return NULL.
char *char_alloc(int s) {
int i = 0;
int len = 32 - s;
while (i < len) {
if (charheap[i] == '\0') return &charheap[i];
i += strlen(charheap+i) + 1;
}
return NULL;
}
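To see how the final loop behaves across several allocations, it can be tried in a self-contained sketch. The assumptions here are that a zero-filled global array stands in for the heap-allocated charheap from the question, and that HEAP_SIZE is a made-up name for the 32-byte size.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE 32  /* assumed name; 32 bytes as in the question */

/* Assumption: a zero-initialized global array replaces the
 * heap-allocated charheap pointer from the question. */
static char charheap[HEAP_SIZE];

/* Returns the first free position with room for s characters plus a
 * terminator, skipping over each previously stored string, or NULL. */
char *char_alloc(int s) {
    int i = 0;
    int len = HEAP_SIZE - s;  /* last start index that still fits is len - 1 */
    while (i < len) {
        if (charheap[i] == '\0') {
            return &charheap[i];                 /* free spot found */
        }
        i += (int)strlen(charheap + i) + 1;      /* skip string and its '\0' */
    }
    return NULL;
}
```

After "hello" is copied to offset 0, the next call lands on offset 6, leaving the terminator at offset 5 in place, which matches the expected output in the question. Note that the block is only considered taken once the caller has copied a non-empty string into it.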

Resources