I just finished putting this function together from some man pages. It takes a char* and appends a const char* to it; if the char* buffer is too small, it reallocates it to something a little bigger and then appends. It's been a long time since I used C, so I'm just checking in.
// append with realloc
int append(char *orig_str, const char *append_str) {
int result = 0; // fail by default
// is there enough space to append our data?
int req_space = strlen(orig_str) + strlen(append_str);
if (req_space > strlen(orig_str)) {
// just reallocate enough + 4096
int new_size = req_space;
char *new_str = realloc(orig_str, req_space * sizeof(char));
// resize success..
if(new_str != NULL) {
orig_str = new_str;
result = 1; // success
} else {
// the resize failed..
fprintf(stderr, "Couldn't reallocate memory\n");
}
} else {
result = 1;
}
// finally, append the data
if (result) {
strncat(orig_str, append_str, strlen(append_str));
}
// return 0 if Ok
return result;
}
This is not usable because you never tell the caller where the memory is that you got back from realloc.
You will need to either return a pointer, or pass orig_str by reference.
Also (as pointed out in comments) you need to do realloc(orig_str, req_space + 1); to allow space for the null terminator.
Your code also has some inefficient logic; compare it with this fixed version:
bool append(char **p_orig_str, const char *append_str)
{
// no action required if appending an empty string
if ( append_str[0] == 0 )
return true;
size_t orig_len = strlen(*p_orig_str);
size_t req_space = orig_len + strlen(append_str) + 1;
char *new_str = realloc(*p_orig_str, req_space);
// resize success..
if(new_str == NULL)
{
fprintf(stderr, "Couldn't reallocate memory\n");
return false;
}
*p_orig_str = new_str;
strcpy(new_str + orig_len, append_str);
return true;
}
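The other option mentioned above, returning the pointer instead of passing it by reference, could be sketched roughly as follows. The name append_ret is made up for illustration, and the sketch assumes orig_str came from malloc/realloc and is NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name. Returns the (possibly moved) buffer, or NULL on failure.
 * On failure the original buffer is untouched and still owned by the caller. */
char *append_ret(char *orig_str, const char *append_str)
{
    assert(orig_str != NULL && append_str != NULL);

    size_t orig_len = strlen(orig_str);
    size_t add_len = strlen(append_str);

    /* +1 for the null terminator */
    char *new_str = realloc(orig_str, orig_len + add_len + 1);
    if (new_str == NULL)
        return NULL;

    /* copy the appended text together with its terminator */
    memcpy(new_str + orig_len, append_str, add_len + 1);
    return new_str;
}
```

With this shape the caller should store the result in a temporary first (tmp = append_ret(s, "x"); if (tmp) s = tmp;), otherwise a NULL return would overwrite the only pointer to the original buffer.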
This logic doesn't make any sense:
// is there enough space to append our data?
int req_space = strlen(orig_str) + strlen(append_str);
if (req_space > strlen(orig_str)) {
As long as append_str has non-zero length, you're always going to have to re-allocate.
The main problem is that you're trying to track the size of your buffers with strlen. If your string is NUL-terminated (as it should be), your perceived buffer size is always going to be the exact length of the data in it, ignoring any extra.
If you want to work with buffers like this, you need to track the size in a separate size_t, or keep some sort of descriptor like this:
struct buffer {
void *buf;
size_t alloc_size;
size_t used_amt; /* Omit if strings are NUL-terminated */
};
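As a rough illustration of how such a descriptor might be used for appending, here is a minimal sketch. The names strbuf and strbuf_append and the 64-byte slack are assumptions for the example, not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same idea as the descriptor above, specialised to NUL-terminated text. */
struct strbuf {
    char  *buf;        /* heap block, or NULL before first use */
    size_t alloc_size; /* bytes currently allocated */
    size_t used_amt;   /* bytes of text, excluding the terminator */
};

/* Returns 1 on success, 0 on allocation failure (buffer left unchanged). */
int strbuf_append(struct strbuf *b, const char *s)
{
    assert(b != NULL && s != NULL);

    size_t add = strlen(s);
    size_t need = b->used_amt + add + 1; /* +1 for the terminator */

    if (need > b->alloc_size) {
        /* Reallocate only when the tracked capacity is too small. */
        size_t new_size = need + 64; /* arbitrary slack, an assumption */
        char *p = realloc(b->buf, new_size);
        if (p == NULL)
            return 0;
        b->buf = p;
        b->alloc_size = new_size;
    }
    memcpy(b->buf + b->used_amt, s, add + 1);
    b->used_amt += add;
    return 1;
}
```

A zero-initialised struct strbuf is a valid empty buffer here, because realloc(NULL, n) behaves like malloc(n). The capacity check uses alloc_size, not strlen, which is the point of keeping the descriptor.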
The following is a quote from an exam (at a top 1% university).
I failed because my answer differed from the "approved" answer.
I have a hunch that the professor's answer is not correct (he is a known expert in C).
Below is the question, followed by the "approved" answer.
There is a potential bug in the following function. What is it and how would I fix it?
Hint: this has something to do with the use of the realloc() function. Please identify the line numbers you would change and what you would replace them with.
BOOLEAN lengthen_string(char* string, const char newcontents[])
{
int newlen = strlen(string) + strlen(newcontents) + 1;
string = realloc(string, newlen);
if (!string) {
perror("malloc");
return FALSE;
}
strcat(string, newcontents);
return TRUE;
}
The "correct" answer provided by the professor was:
line 4: realloc returns a NULL pointer when it fails to allocate. This means that on failure the original data is lost.
To fix this, assign the result of realloc to a temporary variable and test that first.
Ie: line 4:
char * temp=realloc(string, newlen);
if(!temp) ... (all remains the same)
after old line 9, string = temp;
Any thoughts?
BTW, my answer was that string is a local variable, so the function should take char **string: the caller passes a pointer to its string pointer, and the callee assigns any realloc() return value to *string.
You are both correct.
The professor is correct, in that realloc() does not alter the passed-in memory on failure, thus leaving the input string pointer intact, but if the NULL return value on failure is assigned immediately to string then the original data is lost and leaked. So a check for failure first is needed before assigning the new pointer value to string.
You are correct, in that string needs to be passed by pointer so it can be re-assigned a new value if realloc() returns a different memory address.
The correct solution would look more like this:
BOOLEAN lengthen_string(char** string, const char newcontents[])
{
if (!string)
{
errno = EINVAL;
perror("bad input");
return FALSE;
}
size_t newsize = strlen(*string) + strlen(newcontents) + 1;
char *temp = realloc(*string, newsize);
if (!temp)
{
perror("realloc failed");
return FALSE;
}
strcat(temp, newcontents);
*string = temp;
return TRUE;
}
Alternatively, there is some room for optimization, eg:
BOOLEAN lengthen_string(char** string, const char newcontents[])
{
if (!string)
{
errno = EINVAL;
perror("bad input");
return FALSE;
}
char *temp;
if (!*string)
{
temp = strdup(newcontents);
if (!temp)
{
perror("strdup failed");
return FALSE;
}
}
else
{
size_t offset = strlen(*string);
size_t size = strlen(newcontents) + 1;
temp = realloc(*string, offset + size);
if (!temp)
{
perror("realloc failed");
return FALSE;
}
memcpy(temp + offset, newcontents, size);
}
*string = temp;
return TRUE;
}
For the code below, I was wondering: if I want to split a string but still retain the original, is this the best method?
Should the caller provide the char ** array, or should the function split make an additional malloc call and manage the char ** memory itself?
Also, is this the most efficient approach, or could the code be optimized further?
I have not debugged the code yet, and I am still undecided whether the caller or the function should manage the char ** pointer.
#include <stdio.h>
#include <stdlib.h>
size_t split(const char * restrict string, const char splitChar, char ** restrict parts, const size_t maxParts){
size_t size = 100;
size_t partSize = 0;
size_t len = 0;
size_t newPart = 1;
char * tempMem;
/*
* We just reserve a large block of memory.
* Reaching the split character marks the boundary of a new part.
*/
char * mem = (char*) malloc( sizeof(char) * size );
if ( mem == NULL ) return 0;
for ( size_t i = 0; string[i] != 0; i++ ) {
// If it is a split char we at a new part
if ( string[i] == splitChar) {
// If the last character was not the split character
// Then mem[len] = 0 and increase the len by 1.
if (newPart == 0) mem[len++] = 0;
newPart = 1;
continue;
} else {
// If this is a new part
// and not a split character
// we make a new pointer
if ( newPart == 1 ){
// if reach maxpart we break.
// It is okay here, to not worry about memory
if ( partSize == maxParts ) break;
parts[partSize++] = &mem[len];
newPart = 0;
}
mem[len++] = string[i];
if ( len == size ){
// if ran out of memory realloc.
tempMem = (char*)realloc(mem, sizeof(char) * (size << 1) );
// if fail quit loop
if ( tempMem == NULL ) {
// If we can't get more memory the last part could be corrupted
// We have to return.
// Otherwise the code below can seg.
// There maybe a better way than this.
return partSize--;
}
size = size << 1;
mem = tempMem;
}
}
}
// If we got here and still in a newPart that is fine no need
// an additional character.
if ( newPart != 1 ) mem[len++] = 0;
// realloc to give back the unneed memory
if ( len < size ) {
tempMem = (char*) realloc(mem, sizeof(char) * len );
// If the resizing did not fail but yielded a different
// memory block;
if ( tempMem != NULL && tempMem != mem ){
for ( size_t i = 0; i < partSize; i++ ){
parts[i] = tempMem + (parts[i] - mem);
}
}
}
return partSize;
}
int main(){
char * tStr = "This is a super long string just to test the str str adfasfas something split";
char * parts[10];
size_t len = split(tStr, ' ', parts, 10);
for (size_t i = 0; i < len; i++ ){
printf("%zu: %s\n", i, parts[i]);
}
}
What is "best" is very subjective, as well as use case dependent.
I personally would keep the parameters as input only, define a struct to contain the split result, and probably return it by value. The struct would probably contain pointers to allocated memory, so I would also create a helper function to free that memory. The parts might be stored as a list of strings (copying the string data) or as index and length pairs into the original string (no string copies needed, but the original string needs to remain valid).
But there are dozens of very different ways to do this in C, and all of them are a bit klunky. You need to choose your flavor of klunkiness based on your use case.
About being "more optimized": unless you are coding for a very small embedded device or something similar, prefer code that is robust, clear, easy to use, and hard to misuse over code that is micro-optimized. The useful kind of optimization turns, for example, O(n^2) into O(n log n). Turning 3n operations into 2n within a single function is almost always completely irrelevant (you are not going to do string splitting in the inner rendering loop of a game engine...).
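One of the shapes described above (input-only parameters, a result struct returned by value, index and length pairs into the original string, and a helper to free it) might look roughly like this. All names here (part, split_result, split_index, split_free) are made up for the sketch, and skipping empty parts is an assumption carried over from the question's code.

```c
#include <assert.h>
#include <stdlib.h>

/* One part: a slice of the original string (not NUL-terminated). */
struct part {
    size_t start;
    size_t len;
};

struct split_result {
    struct part *parts; /* heap array, release with split_free() */
    size_t count;
    int ok;             /* 0 if an allocation failed part-way */
};

/* Splits on sep, skipping empty parts. The input is never modified. */
struct split_result split_index(const char *s, char sep)
{
    struct split_result r = { NULL, 0, 1 };
    size_t cap = 0;
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        if (s[i] == sep) { i++; continue; }

        size_t start = i;
        while (s[i] != '\0' && s[i] != sep)
            i++;

        if (r.count == cap) {
            /* grow the array of slices geometrically */
            size_t new_cap = cap ? cap * 2 : 8;
            struct part *p = realloc(r.parts, new_cap * sizeof *p);
            if (p == NULL) { r.ok = 0; return r; }
            r.parts = p;
            cap = new_cap;
        }
        r.parts[r.count].start = start;
        r.parts[r.count].len = i - start;
        r.count++;
    }
    return r;
}

void split_free(struct split_result *r)
{
    free(r->parts);
    r->parts = NULL;
    r->count = 0;
}
```

Each part can be printed with printf("%.*s\n", (int)p.len, str + p.start). No string data is copied, so the original string has to stay valid for as long as the result is used.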
I am currently learning memory management in C, and I am running into issues increasing a string's length as a loop iterates.
The method I am trying to figure out logically works like this:
// return string with "X" removed
char * notX(char * string){
result = "";
if(for int = 0; i < strlen(string); i++){
if (string[i] != 'X') {
result += string[i];
}
}
return result;
}
This is simple enough to do in other languages, but managing the memory in C makes it a bit challenging. The difficulties start when I use malloc and realloc to initialize and resize my string. In my code I have currently tried:
char * notX(char * string){
char* res = malloc(sizeof(char*)); // allocate memory for string of size 1;
res = ""; // attempted to initialize the string. Fairly certain this is incorrect
char tmp[2]; // temporary string to hold value to be concatenated
if(for int = 0; i < strlen(string); i++){
if (string[i] != 'X') {
res = realloc(res, sizeof(res) + sizeof(char*)); // reallocate res and increasing its size by 1 character
tmp[0] = string[i];
tmp[1] = '\0';
strcat(res, tmp);
}
}
return result;
}
Note, I have found success in initializing result to be some large array like:
char res[100];
However, I would like to learn how to address this issue without initializing an array with a fixed size, since that might waste memory or not provide enough of it.
realloc needs the number of bytes to allocate. size is incremented for each character added to res. size + 2 is used to provide for the current character being added and the terminating zero.
Check the return of realloc. NULL means a failure. Using tmp allows the return of res if realloc fails.
char * notX(char * string){
char* res = NULL;//so realloc will work on first call
char* tmp = NULL;//temp pointer during realloc
size_t size = 0;
size_t index = 0;
while ( string[index]) {//not the terminating zero
if ( string[index] != 'X') {
if ( NULL == ( tmp = realloc(res, size + 2))) {//+ 2 for character and zero
fprintf ( stderr, "realloc problem\n");
if ( res) {//not NULL
res[size] = 0;//terminate
}
return res;
}
res = tmp;//assign realloc pointer back to res
res[size] = string[index];
++size;
}
++index;//next character
}
if ( res) {//not NULL
res[size] = 0;//terminate
}
return res;
}
There are two main errors in this code:
The malloc and realloc calls size their requests with sizeof(char*). That is the size of a pointer, not of a char, so use sizeof(char) instead. Likewise, sizeof(res) is the size of the pointer res, not the length of the string it points to, so the current length has to be tracked separately.
res = ""; is incorrect. First, it leaks memory, because you lose the pointer to the block that malloc just returned. Second, and no less important, calling realloc on res afterwards is undefined behavior: res now points to a string literal, not to dynamically allocated memory. Instead of this assignment, I think zeroing the allocated block with memset (or simply setting res[0] = '\0') is the best solution.
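Putting those two corrections together while keeping the one-character-at-a-time structure of the question, a minimal sketch could look like this. The name notX_fixed and the choice to free the partial result and return NULL on failure are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of string without 'X', or NULL on failure.
 * The caller frees the result. */
char *notX_fixed(const char *string)
{
    assert(string != NULL);

    size_t len = 0;        /* characters stored so far */
    char *res = malloc(1); /* room for the terminator only */
    if (res == NULL)
        return NULL;
    res[0] = '\0';         /* a real empty string in heap memory */

    for (size_t i = 0; string[i] != '\0'; i++) {
        if (string[i] == 'X')
            continue;
        /* len old chars + the new one + terminator; sizeof(char) is 1 */
        char *tmp = realloc(res, len + 2);
        if (tmp == NULL) {
            free(res);
            return NULL;
        }
        res = tmp;
        res[len++] = string[i];
        res[len] = '\0';
    }
    return res;
}
```

Since the result can never be longer than the input, a single malloc(strlen(string) + 1) up front would avoid the repeated realloc calls entirely; the loop above is kept only to mirror the original attempt.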
While working on a program which requires frequent memory allocation I came across behaviour I cannot explain. I've implemented a workaround, but I am curious as to why my previous implementation didn't work. Here's the situation:
Memory reallocation of a pointer works
This may not be best practice (and if so, please let me know), but I recall that realloc can allocate new memory if the pointer passed in is NULL. Below is an example where I read file data into a temporary buffer, then allocate the appropriate size for *data and memcpy the content.
I have a file structure like so
typedef struct _my_file {
int size;
char *data;
}
And the mem reallocation and copy code like so:
// cycle through decompressed file until end is reached
while ((read_size = gzread(fh, buf, sizeof(buf))) != 0 && read_size != -1) {
// allocate/reallocate memory to fit newly read buffer
if ((tmp_data = realloc(file->data, sizeof(char *)*(file->size+read_size))) == (char *)NULL) {
printf("Memory reallocation error for requested size %d.\n", file->size+read_size);
// if memory was previous allocated but realloc failed this time, free memory!
if (file->size > 0)
free(file->data);
return FH_REALLOC_ERROR;
}
// update pointer to potentially new address (man realloc)
file->data = tmp_data;
// copy data from temporary buffer
memcpy(file->data + file->size, buf, read_size);
// update total read file size
file->size += read_size;
}
Memory reallocation of pointer to pointer fails
However, here is where I'm confused. Using the same thought that reallocation of a NULL pointer will allocate new memory, I parse a string of arguments and for each argument I allocate a pointer to a pointer, then allocate a pointer that is pointed by that pointer to a pointer. Maybe code is easier to explain:
This is the structure:
typedef struct _arguments {
unsigned short int options; // options bitmap
char **regexes; // array of regexes
unsigned int nregexes; // number of regexes
char *logmatch; // log file match pattern
unsigned int limit; // log match limit
char *argv0; // executable name
} arguments;
And the memory allocation code:
int i = 0;
int len;
char **tmp;
while (strcmp(argv[i+regindex], "-logs") != 0) {
len = strlen(argv[i+regindex]);
if((tmp = realloc(args->regexes, sizeof(char **)*(i+1))) == (char **)NULL) {
printf("Cannot allocate memory for regex patterns array.\n");
return -1;
}
args->regexes = tmp;
tmp = NULL;
if((args->regexes[i] = (char *)malloc(sizeof(char *)*(len+1))) == (char *)NULL) {
printf("Cannot allocate memory for regex pattern.\n");
return -1;
}
strcpy(args->regexes[i], argv[i+regindex]);
i++;
}
When I compile and run this I get a run time error "realloc: invalid pointer "
I must be missing something obvious, but after five hours of debugging and searching online without much progress, I worked around it with two loops: the first counts the arguments and mallocs enough space for them, and the second allocates space for each argument and strcpys it.
Any explanation to this behaviour is much appreciated! I really am curious to know why.
First fragment:
// cycle through decompressed file until end is reached
while (1) {
char *tmp_data; // same type as file->data, so sizeof *tmp_data is sizeof(char)
read_size = gzread(fh, buf, sizeof buf);
if (read_size <= 0) break;
// allocate/reallocate memory to fit newly read buffer
tmp_data = realloc(file->data, (file->size+read_size) * sizeof *tmp_data );
if ( !tmp_data ) {
printf("Memory reallocation error for requested size %d.\n"
, file->size+read_size);
if (file->data) {
free(file->data);
file->data = NULL;
file->size = 0;
}
return FH_REALLOC_ERROR;
}
file->data = tmp_data;
// copy data from temporary buffer
memcpy(file->data + file->size, buf, read_size);
// update total read file size
file->size += read_size;
}
Second fragment:
unsigned i; // BTW this variable is already present as args->nregexes;
for(i =0; strcmp(argv[i+regindex], "-logs"); i++) {
char **tmp;
tmp = realloc(args->regexes, (i+1) * sizeof *tmp );
if (!tmp) {
printf("Cannot allocate memory for regex patterns array.\n");
return -1;
}
args->regexes = tmp;
args->regexes[i] = strdup( argv[i+regindex] );
if ( !args->regexes[i] ) {
printf("Cannot allocate memory for regex pattern.\n");
return -1;
}
} // end of the for loop
...
return 0;
}
A few notes:
the syntax ptr = malloc(CNT * sizeof *ptr); is more robust than the sizeof(type) variant, because it stays correct if the type of ptr changes.
strdup() does exactly the same as your malloc() + strcpy().
the for(;;) loop is less error prone than a while() loop with a loose i++; at the end of the loop body. (It also makes clear that the loop condition is never checked.)
to me, if ( !ptr ) {} is easier to read than if (ptr == NULL) {}.
the casts are not needed and sometimes unwanted.
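The sizeof *ptr idiom from the notes above can be illustrated with a small sketch. The helper name copy_strings is hypothetical, and the per-string malloc plus memcpy is written out by hand here only to keep the example within standard C; it does the job that strdup() does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: deep-copies the first n strings of src.
 * Returns NULL on allocation failure, with nothing leaked. */
char **copy_strings(char *const *src, size_t n)
{
    assert(src != NULL);

    /* sizeof *out is sizeof(char *), whatever type out is changed to later */
    char **out = malloc((n ? n : 1) * sizeof *out);
    if (!out)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1; /* include the terminator */
        out[i] = malloc(len * sizeof *out[i]); /* sizeof(char), no cast */
        if (!out[i]) {
            /* undo the copies made so far */
            while (i > 0)
                free(out[--i]);
            free(out);
            return NULL;
        }
        memcpy(out[i], src[i], len);
    }
    return out;
}
```

Had the array been allocated with sizeof(char **) or the strings with sizeof(char *), the code would still run but would request the wrong amount of memory; tying the size to the pointer being assigned removes that class of mistake.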
I'm looking for an efficient method for appending multiple strings.
It should work like C++ std::string::append or Java StringBuffer.append.
I wrote a function which reallocs the previous source pointer and then does a strcat.
I believe this is not an efficient method, as the library may implement realloc as a free and malloc.
Another way I could think of (like std::vector) is to allocate memory in bulk (1 KB, for example) and strcpy into it. Every append call would then check whether the total required size (say 1200 bytes) exceeds the amount allocated in bulk and, if so, realloc to 2 KB. But in that case some memory will be wasted.
I'm looking for a balance between the two, but the preference is performance.
What other approaches are possible? Please suggest.
I would add each string to a list, and add the length of each new string to a running total. Then, when you're done, allocate space for that total, walk the list and strcpy each string to the newly allocated space.
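A minimal sketch of that idea follows, with a plain array standing in for the list and the lengths summed in a first pass over it. The name concat_all is an assumption for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins n strings using a single allocation.
 * Returns NULL on allocation failure; the caller frees the result. */
char *concat_all(const char *const *pieces, size_t n)
{
    assert(pieces != NULL || n == 0);

    /* first pass: total length of all pieces */
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(pieces[i]);

    char *out = malloc(total + 1); /* +1 for the terminator */
    if (!out)
        return NULL;

    /* second pass: copy each piece to the current end */
    char *p = out;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(pieces[i]);
        memcpy(p, pieces[i], len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

This trades one allocation for having to keep every piece alive until the final copy. Storing each length alongside its string when it is added would avoid calling strlen twice.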
The classical approach is to double the buffer every time it is too small.
Start out with a "reasonable" buffer, so you don't need to do realloc()s for sizes 1, 2, 4, 8, 16 which are going to be hit by a large number of your strings.
Starting out at 1024 bytes means you will have one realloc() if you hit 2048, a second if you hit 4096, and so on. If rampant memory consumption scares you, cap the growth rate once it hits something suitably big, like 65536 bytes or whatever, it depends on your data and memory tolerance.
Also make sure you store the current length, so you can strcpy() to the end of the buffer without first walking the string to find it.
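The growth policy described above (start at a reasonable size, double while small, grow more slowly once large) could be sketched as a small helper. The name next_capacity and the two constants are assumptions taken from the example numbers in the text, and overflow of size_t is not handled.

```c
#include <assert.h>
#include <stddef.h>

#define INITIAL_CAP    1024u  /* assumed starting size */
#define DOUBLING_LIMIT 65536u /* assumed size at which doubling stops */

/* Returns a capacity >= needed: doubles up to DOUBLING_LIMIT,
 * then grows linearly in DOUBLING_LIMIT-sized steps. */
size_t next_capacity(size_t current, size_t needed)
{
    size_t cap = current ? current : INITIAL_CAP;

    while (cap < needed) {
        if (cap < DOUBLING_LIMIT)
            cap *= 2;
        else
            cap += DOUBLING_LIMIT;
    }
    return cap;
}
```

An append routine would call this only when the stored length plus the new data (plus the terminator) exceeds the stored capacity, and pass the result to realloc().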
Sample function to concatenate strings
void
addToBuffer(char **content, const char *buf) {
size_t textlen, oldtextlen;
char *tmp;
textlen = strlen(buf);
if (*content == NULL)
oldtextlen = 0;
else
oldtextlen = strlen(*content);
// keep the old pointer until realloc is known to have succeeded
tmp = realloc(*content, oldtextlen + textlen + 1);
if (tmp == NULL)
return; // *content is left unchanged
*content = tmp;
// copy buf, including its terminator, to the end of the old text
memcpy(*content + oldtextlen, buf, textlen + 1);
}
int main(void) {
char *content = NULL;
addToBuffer(&content, "test");
addToBuffer(&content, "test1");
}
I would do something like this:
typedef struct Stringbuffer {
int capacity; /* Maximum capacity. */
int length; /* Current length (excluding null terminator). */
char* characters; /* Pointer to characters. */
} Stringbuffer;
BOOL StringBuffer_init(Stringbuffer* buffer) {
buffer->capacity = 0;
buffer->length = 0;
buffer->characters = NULL;
return TRUE;
}
void StringBuffer_del(Stringbuffer* buffer) {
if (!buffer)
return;
free(buffer->characters);
buffer->capacity = 0;
buffer->length = 0;
buffer->characters = NULL;
}
BOOL StringBuffer_add(Stringbuffer* buffer, char* string) {
int len;
int new_length;
if (!buffer)
return FALSE;
len = string ? strlen(string) : 0;
if (len == 0)
return TRUE;
new_length = buffer->length + len;
if (new_length >= buffer->capacity) {
int new_capacity;
char* new_characters;
new_capacity = buffer->capacity;
if (new_capacity == 0)
new_capacity = 16;
while (new_length >= new_capacity)
new_capacity *= 2;
new_characters = (char*)realloc(buffer->characters, new_capacity);
if (!new_characters)
return FALSE;
buffer->capacity = new_capacity;
buffer->characters = new_characters;
}
memmove(buffer->characters + buffer->length, string, len);
buffer->length = new_length;
buffer->characters[buffer->length] = '\0';
return TRUE;
}