Splitting a string and storing the parts on the heap: algorithm question - C

For the code below, which I am writing: if I want to split the string but still retain the original string, is this the best method?
Should the caller provide the char ** array, or should the function split make an additional malloc call and manage the memory for the char ** itself?
Also, is this the most efficient approach, or could I optimize the code further?
I have not debugged the code yet, and I am still undecided whether the caller or the function should manage the char ** pointer.
#include <stdio.h>
#include <stdlib.h>
size_t split(const char * restrict string, const char splitChar, char ** restrict parts, const size_t maxParts){
size_t size = 100;
size_t partSize = 0;
size_t len = 0;
size_t newPart = 1;
char * tempMem;
/*
* We just reserve one large block of memory.
* Reaching the split character marks the boundary of a new part.
*/
char * mem = (char*) malloc( sizeof(char) * size );
if ( mem == NULL ) return 0;
for ( size_t i = 0; string[i] != 0; i++ ) {
// If it is a split char we at a new part
if ( string[i] == splitChar) {
// If the last character was not the split character
// Then mem[len] = 0 and increase the len by 1.
if (newPart == 0) mem[len++] = 0;
newPart = 1;
continue;
} else {
// If this is a new part
// and not a split character
// we make a new pointer
if ( newPart == 1 ){
// if reach maxpart we break.
// It is okay here, to not worry about memory
if ( partSize == maxParts ) break;
parts[partSize++] = &mem[len];
newPart = 0;
}
mem[len++] = string[i];
if ( len == size ){
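// if ran out of memory realloc.
tempMem = (char*)realloc(mem, sizeof(char) * (size << 1) );
// if fail quit loop
if ( tempMem == NULL ) {
// If we can't get more memory the last part has no terminator,
// so it must not be reported to the caller.
// There may be a better way than this.
return partSize - 1;
}
// realloc may have moved the block, so rebase the stored pointers
// (storing offsets instead of pointers would avoid this step)
if ( tempMem != mem ) {
for ( size_t j = 0; j < partSize; j++ ) parts[j] = tempMem + (parts[j] - mem);
}
size = size << 1;
mem = tempMem;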
}
}
}
// If we got here and still in a newPart that is fine no need
// an additional character.
if ( newPart != 1 ) mem[len++] = 0;
// realloc to give back the unneed memory
if ( len < size ) {
tempMem = (char*) realloc(mem, sizeof(char) * len );
// If the resizing did not fail but yielded a different
// memory block;
if ( tempMem != NULL && tempMem != mem ){
for ( size_t i = 0; i < partSize; i++ ){
parts[i] = tempMem + (parts[i] - mem);
}
}
}
return partSize;
}
int main(){
char * tStr = "This is a super long string just to test the str str adfasfas something split";
char * parts[10];
size_t len = split(tStr, ' ', parts, 10);
for (size_t i = 0; i < len; i++ ){
printf("%zu: %s\n", i, parts[i]);
}
}

What is "best" is very subjective, as well as use case dependent.
I personally would keep the parameters as input only, define a struct to contain the split result, and probably return it by value. The struct would probably contain pointers to allocated memory, so I would also create a helper function to free that memory. The parts might be stored as a list of strings (copying the string data) or as index and length pairs into the original string (no string copies needed, but the original string needs to remain valid).
But there are dozens of very different ways to do this in C, and all are a bit klunky. You need to choose your flavor of klunkiness based on your use case.
About being "more optimized": unless you are coding for a very small embedded device or something, always choose a more robust, clear, easier to use, harder to use wrong over more micro-optimized. The useful kind of optimization turns, for example, O(n^2) to O(n log n). Turning O(3n) to O(2n) of a single function is almost always completely irrelevant (you are not going to do string splitting in a game engine inner rendering loop...).
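As a rough illustration of the struct idea above, here is a minimal sketch that stores index and length pairs instead of copying the string. The names Span, SplitResult, split_spans and split_free are made up for this example, and empty parts are skipped; this is one possible layout, not the only reasonable one.

```c
#include <stdlib.h>
#include <string.h>

/* One part of the split: a view into the original string (hypothetical type). */
typedef struct {
    size_t start;  /* offset of the part in the original string */
    size_t len;    /* length of the part; no terminator is stored */
} Span;

/* Result of a split; release it with split_free(). */
typedef struct {
    Span  *spans;  /* NULL when nothing was found or allocation failed */
    size_t count;  /* number of valid entries in spans */
} SplitResult;

/* Splits s on sep, skipping empty parts. The input is neither modified nor copied. */
SplitResult split_spans(const char *s, char sep)
{
    SplitResult r = { NULL, 0 };
    size_t cap = 0;
    size_t i = 0;

    while (s[i] != '\0') {
        /* skip a run of separators */
        while (s[i] == sep && s[i] != '\0')
            i++;
        if (s[i] == '\0')
            break;

        size_t start = i;
        while (s[i] != '\0' && s[i] != sep)
            i++;

        if (r.count == cap) {
            /* grow geometrically so the number of reallocs stays small */
            size_t new_cap = cap ? cap * 2 : 8;
            Span *tmp = realloc(r.spans, new_cap * sizeof *tmp);
            if (tmp == NULL)
                break;  /* out of memory: keep the parts found so far */
            r.spans = tmp;
            cap = new_cap;
        }
        r.spans[r.count].start = start;
        r.spans[r.count].len = i - start;
        r.count++;
    }
    return r;
}

/* Frees the span array and resets the result to empty. */
void split_free(SplitResult *r)
{
    free(r->spans);
    r->spans = NULL;
    r->count = 0;
}
```

A part can be printed without copying it, for example with printf("%.*s\n", (int)r.spans[i].len, text + r.spans[i].start). The original string has to stay valid for as long as the spans are used.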

Related

C using malloc and realloc to dynamically increase string length

I am currently learning memory management in C, and I am running into issues increasing a string's length as a loop iterates.
The method I am trying to figure out logically works like this:
// return string with "X" removed
char * notX(char * string){
result = "";
for (int i = 0; i < strlen(string); i++) {
if (string[i] != 'X') {
result += string[i];
}
}
return result;
}
This is simple enough to do in other languages, but managing the memory in C makes it a bit challenging. The difficulty I run into is using malloc and realloc to initialize and resize my string. In my code I have currently tried:
char * notX(char * string){
char* res = malloc(sizeof(char*)); // allocate memory for string of size 1;
res = ""; // attempted to initialize the string. Fairly certain this is incorrect
char tmp[2]; // temporary string to hold value to be concatenated
for (int i = 0; i < strlen(string); i++) {
if (string[i] != 'X') {
res = realloc(res, sizeof(res) + sizeof(char*)); // reallocate res and increasing its size by 1 character
tmp[0] = string[i];
tmp[1] = '\0';
strcat(res, tmp);
}
}
return res;
}
Note, I have found success in initializing result to be some large array like:
char res[100];
However, I would like to learn how to address this issue without initializing an array with a fixed size, since that might waste memory or not provide enough.
realloc needs the number of bytes to allocate. size is incremented for each character added to res. size + 2 is used to provide for the current character being added and the terminating zero.
Check the return of realloc. NULL means a failure. Using tmp allows the return of res if realloc fails.
char * notX(char * string){
char* res = NULL;//so realloc will work on first call
char* tmp = NULL;//temp pointer during realloc
size_t size = 0;
size_t index = 0;
while ( string[index]) {//not the terminating zero
if ( string[index] != 'X') {
if ( NULL == ( tmp = realloc(res, size + 2))) {//+ 2 for character and zero
fprintf ( stderr, "realloc problem\n");
if ( res) {//not NULL
res[size] = 0;//terminate
}
return res;
}
res = tmp;//assign realloc pointer back to res
res[size] = string[index];
++size;
}
++index;//next character
}
if ( res) {//not NULL
res[size] = 0;//terminate
}
return res;
}
There are two main errors in this code:
The malloc and realloc calls use sizeof(char*). That is the size of a pointer, not of a char, so you have to replace char* with char inside sizeof. Note also that sizeof(res) is the size of the pointer, not the current length of the string, so the length has to be tracked separately.
res = ""; is incorrect. First, you have a memory leak, because you lose the pointer to the memory just allocated by malloc. Second, and no less important, you have undefined behavior when you call realloc on res after it has been pointed at an empty string literal: after that assignment the memory is no longer dynamically managed. To replace this initialization, I think a memset to 0 is the best solution.
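To make the two points concrete, here is a small sketch of my own. It deliberately swaps the per-character realloc for a single allocation: the result can never be longer than the input, so strlen(string) + 1 bytes are always enough. Sizes are counted in chars, and the buffer is written through the pointer instead of reassigning the pointer to a literal.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of string with every 'X' removed,
 * or NULL if the allocation fails. The caller must free() the result. */
char *notX(const char *string)
{
    /* the result is never longer than the input, so one allocation suffices */
    char *res = malloc(strlen(string) + 1);
    size_t out = 0;

    if (res == NULL)
        return NULL;

    for (size_t i = 0; string[i] != '\0'; i++) {
        if (string[i] != 'X')
            res[out++] = string[i];  /* write into the buffer, never reassign res */
    }
    res[out] = '\0';  /* terminate the result */
    return res;
}
```

If the unused tail matters, the result could be trimmed afterwards with one realloc(res, out + 1), checking the return value as in the answer above.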

Is this appender, with realloc function safe?

I just finished putting this function together from some man documentation. It takes a char* and appends a const char* to it; if the char* buffer is too small, it reallocates it to something a little bigger and then appends. It's been a long time since I used C, so I'm just checking in.
// append with realloc
int append(char *orig_str, const char *append_str) {
int result = 0; // fail by default
// is there enough space to append our data?
int req_space = strlen(orig_str) + strlen(append_str);
if (req_space > strlen(orig_str)) {
// just reallocate enough + 4096
int new_size = req_space;
char *new_str = realloc(orig_str, req_space * sizeof(char));
// resize success..
if(new_str != NULL) {
orig_str = new_str;
result = 1; // success
} else {
// the resize failed..
fprintf(stderr, "Couldn't reallocate memory\n");
}
} else {
result = 1;
}
// finally, append the data
if (result) {
strncat(orig_str, append_str, strlen(append_str));
}
// return 0 if Ok
return result;
}
This is not usable because you never tell the caller where the memory is that you got back from realloc.
You will need to either return a pointer, or pass orig_str by reference.
Also (as pointed out in comments) you need to do realloc(orig_str, req_space + 1); to allow space for the null terminator.
Your code has some inefficient logic; compare it with this fixed version:
bool append(char **p_orig_str, const char *append_str)
{
// no action required if appending an empty string
if ( append_str[0] == 0 )
return true;
size_t orig_len = strlen(*p_orig_str);
size_t req_space = orig_len + strlen(append_str) + 1;
char *new_str = realloc(*p_orig_str, req_space);
// resize success..
if(new_str == NULL)
{
fprintf(stderr, "Couldn't reallocate memory\n");
return false;
}
*p_orig_str = new_str;
strcpy(new_str + orig_len, append_str);
return true;
}
This logic doesn't make any sense:
// is there enough space to append our data?
int req_space = strlen(orig_str) + strlen(append_str);
if (req_space > strlen(orig_str)) {
As long as append_str has non-zero length, you're always going to have to re-allocate.
The main problem is that you're trying to track the size of your buffers with strlen. If your string is NUL-terminated (as it should be), your perceived buffer size is always going to be the exact length of the data in it, ignoring any extra.
If you want to work with buffers like this, you need to track the size in a separate size_t, or keep some sort of descriptor like this:
struct buffer {
void *buf;
size_t alloc_size;
size_t used_amt; /* Omit if strings are NUL-terminated */
};
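A minimal sketch of how an append could use such a descriptor follows. The names struct strbuf and strbuf_append are hypothetical, and I use char * instead of void * for convenience. The allocated size is tracked separately from the string length, so realloc is only called when the buffer really is too small.

```c
#include <stdlib.h>
#include <string.h>

/* Growable text buffer; alloc_size is tracked separately from the string length. */
struct strbuf {
    char  *buf;         /* NUL-terminated after the first append, NULL when empty */
    size_t alloc_size;  /* bytes currently allocated */
    size_t used_amt;    /* bytes used, excluding the terminator */
};

/* Appends s. Returns 1 on success, 0 if memory could not be obtained
 * (the buffer is left unchanged in that case). */
int strbuf_append(struct strbuf *b, const char *s)
{
    size_t add = strlen(s);
    size_t need = b->used_amt + add + 1;  /* + 1 for the terminator */

    if (need > b->alloc_size) {
        /* double until it fits, starting from a small default */
        size_t new_size = b->alloc_size ? b->alloc_size : 16;
        while (new_size < need)
            new_size *= 2;
        char *tmp = realloc(b->buf, new_size);
        if (tmp == NULL)
            return 0;
        b->buf = tmp;
        b->alloc_size = new_size;
    }
    memcpy(b->buf + b->used_amt, s, add + 1);  /* copies the terminator too */
    b->used_amt += add;
    return 1;
}
```

Start from struct strbuf b = { NULL, 0, 0 }; and call free(b.buf) when done.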

return a space-less string from a function

I have a function in which I want to return a string (i.e. an array of chars) with no spaces at all. This is my code, which as far as I understand is not right:
char *ignoreSpace( char helpArr[], int length ){
int i = 0; int j = 0;
char withoutSpace[length];
while ( i < length ){
/*if not a space*/
if ( isspace( helpArr[i] ) == FALSE )
withoutSpace[j] = helpArr[i];
i++;
}
return *withoutSpace;
}
My intention in the line:
return *withoutSpace;
Is to return the content of the array withoutSpace so I could parse a string with no spaces at all.
Can you please tell me how can I make it any better?
Your current solution will lose the contents of withoutSpace when the function returns, as the array only exists in that function's scope.
A better pattern would be to accept a third argument to the function, a pointer to a char[] to write the result into, in much the same way the standard functions do (e.g. strcpy).
char* ignoreSpace(char* src, char* dst, int length) {
// copy from src to dst, ignoring spaces
// ...
// ...
return dst;
}
Try this (assuming null terminated string)
void ignoreSpace(char *str) {
int write_pos = 0, read_pos = 0;
for (; str[read_pos]; ++read_pos) {
if (!isspace((unsigned char)str[read_pos])) {
str[write_pos++] = str[read_pos];
}
}
str[write_pos] = 0;
}
You cannot return a pointer to a local variable from a function, because as soon as you leave the function all local variables are destroyed and no longer valid.
You must either:
- allocate space with malloc in your function and return a pointer to that allocated memory, or
- not return a pointer from the function, but directly modify the original string.
First solution :
char *ignoreSpace(char helpArr[], int length)
{
int i=0; int j=0;
char *withoutSpace = malloc(length + 1); /* + 1: the loop below also copies the terminating zero */
while(i <= length)
{
/*if not a space*/
if(isspace(helpArr[i]) == FALSE)
withoutSpace[j++] = helpArr[i];
i++;
}
return withoutSpace;
}
Second solution:
char *ignoreSpace(char helpArr[], int length)
{
int i=0; int j=0;
while(i <= length)
{
/*if not a space*/
if(isspace(helpArr[i]) == FALSE)
helpArr[j++] = helpArr[i];
i++;
}
return helpArr;
}
There are some other small corrections in my code. Finding out which ones is left as an exercise to the reader.
You don't increment j, ever. In the case that the current character of the source string is not a space, you probably would like to store it in your output string and then also increment the j by one; so that you'd store the next possible character into the next slot instead of overwriting the 0th one again and again.
So change this:
...
withoutSpace[j] = helpArr[i];
...
into this:
...
withoutSpace[j++] = helpArr[i];
...
And then also append your withoutSpace with a 0 or '\0' (they are the same), so that any string processing function may know its end. Also return the pointer, since you should do that, not the *withoutSpace or withoutSpace[0] (they are the same):
char *ignoreSpace( char helpArr[], int length ){
int i = 0; int j = 0;
char * withoutSpace = malloc( ( length + 1 ) * sizeof * withoutSpace ); // <-- changed this; + 1 leaves room for the terminating 0
while ( i < length ){
/*if not a space*/
if ( isspace( helpArr[i] ) == FALSE )
withoutSpace[j++] = helpArr[i]; // <-- replaced j with j++
i++;
}
withoutSpace[j] = 0; // <-- added this
return withoutSpace;
}
And then you should be good to go, assuming that you can have variable-length arrays.
Edit: Well, variable-length arrays or not, you had better just use dynamic memory allocation with malloc or calloc or something similar, because otherwise, as per the comments, you'd be returning a pointer to a local variable. Of course, this requires you to manually free the allocated memory in the end.

Initializing an infinite number of char **

I'm making a raytracing engine in C using the minilibX library.
I want to be able to read in a .conf file the configuration for the scene to display:
For example:
(Az#Az 117)cat universe.conf
#randomcomment
obj:eye:x:y:z
light:sun:100
light:moon:test
The number of objects can vary between 1 and infinity.
For now, I'm reading the file, copying each line one by one into a char **tab, and mallocing according to the number of objects found, like this:
void open_file(int fd, struct s_img *m)
{
int i;
char *s;
int curs_obj;
int curs_light;
i = 0;
curs_light = 0;
curs_obj = 0;
while (s = get_next_line(fd))
{
i = i + 1;
if (s[0] == 'l')
{
m->lights[curs_light] = s;
curs_light = curs_light + 1;
}
else if (s[0] == 'o')
{
m->objs[curs_obj] = s;
curs_obj = curs_obj + 1;
}
else if (s[0] != '#')
{
show_error(i, s);
stop_parsing(m);
}
}
}
Now, I want to be able to store each piece of information from each tab[i] in a new char **tab, one for each object, using ':' as the separator.
So I need to initialize and malloc an undetermined number of char **tab. How can I do that?
(PS: I hope my code and my English are good enough for you to understand. I'm using only the very basic functions, like read, write, open, malloc..., and I'm rebuilding everything else, like printf, get_line, and so on.)
You can't allocate an indeterminate amount of memory; malloc doesn't support it. What you can do is to allocate enough memory for now and revise that later:
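size_t buffer = 10; // capacity, counted in pointers
char **tab = malloc(buffer * sizeof *tab); // malloc takes a size in bytes
//...
if (indexOfObjectToCreate >= buffer) { // index buffer would be out of bounds
buffer *= 2;
tab = realloc(tab, buffer * sizeof *tab); // real code should check for NULL first
}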
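A slightly fuller sketch of the same grow-on-demand idea, with the realloc result checked before the old pointer is overwritten. PtrList and ptrlist_push are hypothetical names used only for this illustration; the starting capacity of 10 is arbitrary.

```c
#include <stdlib.h>

/* Growable list of string pointers (hypothetical helper type). */
typedef struct {
    char **items;  /* stored pointers, e.g. the lines read from the file */
    size_t count;  /* entries in use */
    size_t cap;    /* entries allocated */
} PtrList;

/* Appends p. Returns 1 on success, 0 on allocation failure (list unchanged). */
int ptrlist_push(PtrList *l, char *p)
{
    if (l->count == l->cap) {
        size_t new_cap = l->cap ? l->cap * 2 : 10;
        /* realloc takes a size in bytes, so multiply by the element size */
        char **tmp = realloc(l->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return 0;  /* the old block is still valid and still owned by l */
        l->items = tmp;
        l->cap = new_cap;
    }
    l->items[l->count++] = p;
    return 1;
}
```

One such list per category (lights, objects) would start as PtrList l = { NULL, 0, 0 }; and be released with free(l.items) once the stored strings have been freed.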
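I'd use an alternative approach (as this is C, not C++) and simply allocate large buffers as we go:
#define BIG_N 65536 /* assumed block size; a single request must not exceed it */
char *my_malloc(size_t n) {
static size_t space_left = 0;
static char *base = NULL;
/* start a new block when the current one cannot hold n more bytes */
if (base==NULL || space_left < n) base=malloc(space_left=BIG_N);
base +=n; space_left -= n; return base-n;
}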
Disclaimer: I've omitted the garbage collection stuff and testing return values and all safety measures to keep the routine short.
Another way to think about this is to read the file into a large enough malloc'ed array (you can check the size with ftell), scan the buffer, replace delimiters, line feeds etc. with ASCII NUL characters, and remember the starting locations of the keywords.
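The scan-and-replace idea can be sketched for a single line as follows. The function name split_in_place and its parameters are my own invention, and empty fields are kept. It uses no library calls, which may suit the constraint of rebuilding everything by hand.

```c
#include <stddef.h>

/* Cuts buf into fields in place: every delim becomes '\0' and the start of
 * each field is stored in starts. At most max starts are recorded; the return
 * value is the number stored. buf must be writable and NUL-terminated. */
size_t split_in_place(char *buf, char delim, char **starts, size_t max)
{
    size_t n = 0;
    char *field = buf;

    if (max == 0)
        return 0;

    for (char *p = buf; ; p++) {
        if (*p == delim || *p == '\0') {
            int at_end = (*p == '\0');
            *p = '\0';            /* terminate the current field */
            starts[n++] = field;  /* remember where it began */
            if (at_end || n == max)
                break;
            field = p + 1;        /* the next field starts after the delimiter */
        }
    }
    return n;
}
```

For a line such as "obj:eye:x:y:z" held in a writable array, this yields five pointers into the same buffer, so no per-field allocation is needed.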

C string append

I'm looking for an efficient method for appending multiple strings.
It should work like C++ std::string::append or Java StringBuffer.append.
I wrote a function which actually reallocs previous source pointer and does strcat.
I believe this is not an efficient method, as the library may implement realloc as a free followed by a malloc.
The other way I could think of (like std::vector) is to allocate memory in bulk (1 KB, for example) and do strcpy. In that case every append call checks whether the total required size (say 1200 bytes) is more than the amount allocated in bulk and, if so, reallocs to 2 KB. But in that case some memory will be wasted.
I'm looking for a balance between the above but the preference is performance.
What other approaches are possible? Please suggest.
I would add each string to a list, and add the length of each new string to a running total. Then, when you're done, allocate space for that total, walk the list and strcpy each string to the newly allocated space.
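A minimal sketch of that idea, under the assumption that the strings are already collected in a plain array rather than a linked list; concat_all is a hypothetical name. The total is computed in one pass here, but it could equally be kept as a running total while the strings are added.

```c
#include <stdlib.h>
#include <string.h>

/* Joins n strings into one newly allocated string; returns NULL on failure.
 * The caller must free() the result. */
char *concat_all(const char *const *parts, size_t n)
{
    /* first pass: add up the lengths */
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(parts[i]);

    /* a single allocation for the whole result */
    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    /* second pass: copy each string right behind the previous one */
    char *dst = out;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        memcpy(dst, parts[i], len);
        dst += len;
    }
    *dst = '\0';
    return out;
}
```

This does exactly one allocation regardless of how many strings are joined, at the cost of keeping all the pieces around until the end.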
The classical approach is to double the buffer every time it is too small.
Start out with a "reasonable" buffer, so you don't need to do realloc()s for sizes 1, 2, 4, 8, 16 which are going to be hit by a large number of your strings.
Starting out at 1024 bytes means you will have one realloc() if you hit 2048, a second if you hit 4096, and so on. If rampant memory consumption scares you, cap the growth rate once it hits something suitably big, like 65536 bytes or whatever, it depends on your data and memory tolerance.
Also make sure you keep track of the current length, so you can strcpy() to the end of the buffer without first having to walk the string to find its length.
Sample function to concatenate strings
#include <stdlib.h>
#include <string.h>
void
addToBuffer(char **content, const char *buf) {
size_t textlen, oldtextlen;
char *tmp;
textlen = strlen(buf);
if (*content == NULL)
oldtextlen = 0;
else
oldtextlen = strlen(*content);
/* keep the old pointer until realloc is known to have succeeded */
tmp = realloc(*content, oldtextlen + textlen + 1);
if (tmp == NULL)
return; /* allocation failed: *content is left unchanged */
*content = tmp;
/* copy buf, including its terminating zero, after the old text */
memcpy(*content + oldtextlen, buf, textlen + 1);
}
int main(void) {
char *content = NULL;
addToBuffer(&content, "test");
addToBuffer(&content, "test1");
free(content); /* content held "testtest1" */
return 0;
}
I would do something like this:
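#include <stdlib.h>
#include <string.h>
/* BOOL, TRUE and FALSE are assumed to be defined elsewhere (e.g. typedef int BOOL;). */
typedef struct Stringbuffer {
int capacity; /* Maximum capacity. */
int length; /* Current length (excluding null terminator). */
char* characters; /* Pointer to characters. */
} Stringbuffer;
BOOL StringBuffer_init(Stringbuffer* buffer) {
if (!buffer)
return FALSE;
buffer->capacity = 0;
buffer->length = 0;
buffer->characters = NULL;
return TRUE;
}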
void StringBuffer_del(Stringbuffer* buffer) {
if (!buffer)
return;
free(buffer->characters);
buffer->capacity = 0;
buffer->length = 0;
buffer->characters = NULL;
}
BOOL StringBuffer_add(Stringbuffer* buffer, char* string) {
int len;
int new_length;
if (!buffer)
return FALSE;
len = string ? strlen(string) : 0;
if (len == 0)
return TRUE;
new_length = buffer->length + len;
if (new_length >= buffer->capacity) {
int new_capacity;
char* new_characters;
new_capacity = buffer->capacity;
if (new_capacity == 0)
new_capacity = 16;
/* double until the new text plus terminator fits */
while (new_length >= new_capacity)
new_capacity *= 2;
new_characters = (char*)realloc(buffer->characters, new_capacity);
if (!new_characters)
return FALSE;
buffer->capacity = new_capacity;
buffer->characters = new_characters;
}
memmove(buffer->characters + buffer->length, string, len);
buffer->length = new_length;
buffer->characters[buffer->length] = '\0';
return TRUE;
}

Resources