Function to split string sometimes gives segmentation fault - c

I have the following function to split a string. Most of the time it works fine, but sometimes it randomly causes a segmentation fault.
char** splitString(char* string, char* delim){
int count = 0;
char** split = NULL;
char* temp = strtok(string, delim);
while(temp){
split = realloc(split, sizeof(char*) * ++count);
split[count - 1] = temp;
temp = strtok(NULL, " ");
}
int i = 0;
while(split[i]){
printf("%s\n", split[i]);
i++;
}
split[count - 1][strlen(split[count - 1]) - 1] = '\0';
return split;
}

You have a number of subtle issues, not the least of which your function will segfault if you pass a string literal. You need to make a copy of the string you will be splitting as strtok modifies the string. If you pass a string literal (stored in read-only memory), your compiler has no way of warning unless you have declared string as const char *string;
To avoid these problems, simply make a copy of the string you will tokeninze. That way, regardless how the string you pass to the function was declared, you avoid the problem altogether.
You should also pass a pointer to size_t as a parameter to your function in order to make the number of token available back in the calling function. That way you do not have to leave a sentinel NULL as the final pointer in the pointer to pointer to char you return. Just pass a pointer and update it to reflect the number of tokens parsed in your function.
Putting those pieces together, and cleaning things up a bit, you could use the following to do what you are attempting to do:
char **splitstr (const char *str, char *delim, size_t *n)
{
char *cpy = strdup (str), *p = cpy; /* copy of str & pointer */
char **split = NULL; /* pointer to pointer to char */
*n = 0; /* zero 'n' */
for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
void *tmp = realloc (split, sizeof *split * (*n + 1));
if (!tmp) { /* validate realloc succeeded */
fprintf (stderr, "splitstr() error: memory exhausted.\n");
break;
}
split = tmp; /* assign tmp to split */
split[(*n)++] = strdup (p); /* allocate/copy to split[n] */
}
free (cpy); /* free cpy */
return split; /* return split */
}
Adding a short example program, you could do the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **splitstr (const char *str, char *delim, size_t *n)
{
char *cpy = strdup (str), *p = cpy; /* copy of str & pointer */
char **split = NULL; /* pointer to pointer to char */
*n = 0; /* zero 'n' */
for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
void *tmp = realloc (split, sizeof *split * (*n + 1));
if (!tmp) { /* validate realloc succeeded */
fprintf (stderr, "splitstr() error: memory exhausted.\n");
break;
}
split = tmp; /* assign tmp to split */
split[(*n)++] = strdup (p); /* allocate/copy to split[n] */
}
free (cpy); /* free cpy */
return split; /* return split */
}
int main (void) {
size_t n = 0; /* number of strings */
char *s = "My dog has fleas.", /* string to split */
*delim = " .\n", /* delims */
**strings = splitstr (s, delim, &n); /* split s */
for (size_t i = 0; i < n; i++) { /* output results */
printf ("strings[%zu] : %s\n", i, strings[i]);
free (strings[i]); /* free string */
}
free (strings); /* free pointers */
return 0;
}
Example Use/Output
$ ./bin/splitstrtok
strings[0] : My
strings[1] : dog
strings[2] : has
strings[3] : fleas
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to write beyond/outside the bounds of your allocated block of memory, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/splitstrtok
==14471== Memcheck, a memory error detector
==14471== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14471== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==14471== Command: ./bin/splitstrtok
==14471==
strings[0] : My
strings[1] : dog
strings[2] : has
strings[3] : fleas
==14471==
==14471== HEAP SUMMARY:
==14471== in use at exit: 0 bytes in 0 blocks
==14471== total heap usage: 9 allocs, 9 frees, 115 bytes allocated
==14471==
==14471== All heap blocks were freed -- no leaks are possible
==14471==
==14471== For counts of detected and suppressed errors, rerun with: -v
==14471== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.

split[count - 1][strlen(split[count - 1]) - 1] = '\0';
should look like
split[count - 1] = NULL;
You don't have anything allocated there so that you can access it and put '\0'.
After that put that line before while(split[i]) so that the while can stop when it reaches NULL.

The function strtok is not reentrant, use strtok_r() function this is a reentrant version strtok().

Related

Creating an dynamic array, but getting segmentation fault as error

i wanted to create a dynamic array, which will contain user input. But I keep getting segmentation fault as an error after my first input. I know that segmentation fault is caused due false memory access. Is there a way to locate the error in the code ?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
int length,i;
int size_arr = 1;
int size_word = 102;
char **arr;
char *word;
arr = malloc(size_arr * sizeof(char*));
word = malloc(size_word *sizeof(char));
if(arr == NULL)
{
fprintf(stderr, "No reallocation");
exit(EXIT_FAILURE);
}
else
{
while(!feof(stdin) && fgets(word, size_word , stdin) != NULL)
{
arr[i] = strndup(word,size_word);
length = strlen(word);
if(length > 102)
{
fprintf(stderr, "Word is too long");
}
if(size_arr-1 == i)
{
size_arr *= 2;
arr = realloc(arr, size_arr * sizeof(char*));
}
i++;
}
}
free(arr);
return 0;
}
Greeting,
Soreseth
You are close, you are just a bit confused on how to handle putting all the pieces together. To begin with, understand there are no arrays involved in your code. When you are building a collection using a pointer-to-pointer (e.g. char **arr), it is a two step process. arr itself is a single-pointer to an allocated block of pointers -- which you expand by one each time to add the next word by calling realloc() to reallocate an additional pointer.
The second step is to allocated storage for each word. You then assign the beginning address for that word's storage to the pointer you have allocated.
So you have one block of pointers which you expand (realloc()) to add a pointer to hold the address for the next word, you then allocate storage for that word, assigning the beginning address to your new pointer and then copy the word to that address.
(note: calling realloc() every iteration to add just one-pointer is inefficient. You solve that by adding another counter (e.g. allocated) that holds the number of pointers allocated, and a counter used, to keep track of the number of pointers you have used. You only reallocate when used == allocated. That way you can, e.g. double the number of pointers available each time -- or whatever growth scheme you choose)
Also note that strdup() and strndup() are handy, but are not part of the standard C library (they are POSIX). While most compilers will provide them, you may need the right compiler option to ensure they are available.
Let's look at your example, in the simple case, using only functions provided by the standard library. We will keep you reallocate by one scheme, and leave you to implement the used == allocated scheme to clean that up later.
When reading lines of data, you won't know how many characters you need to store until the line is read -- so just reused the same fixed-size buffer to read each line. Then you can trim the '\n' included by fgets() and get the length of the characters you need to allocate (+1 for the *nul-terminating character). Then simply allocate, assign to your new pointer and copy from the fixed buffer to the new block of storage for the word (or line). A 1K, 2K, etc.. fixed buffer is fine.
So let's collect the variables you need for your code, defining a constant for the fixed buffer size, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void)
{
char **arr = NULL, /* pointer to block of allocated pointers */
buf[MAXC]; /* fixed buffer for read-input */
int size_arr = 0; /* only 1 counter needed here */
Now let's read a line into buf and start by allocating your pointer:
while (fgets (buf, MAXC, stdin))
{
size_t len;
/* allocate pointer (one each time rather inefficient) */
void *tmp = realloc (arr, (size_arr + 1) * sizeof *arr);
if (!tmp) { /* VALIDATE */
perror ("realloc-arr");
break;
}
arr = tmp; /* assign on success */
(as noted in my comment, you never realloc() using the pointer itself because when (not if) realloc() fails, it will overwrite the pointer address (e.g. the address to your collection of pointers) with NULL.)
So above, you realloc() to the temporary pointer tmp, validate the reallocation succeeded, then assign the newly allocated block of pointers to arr.
Now trim the '\n' from buf and get the number of characters. (where strcspn() allows you to do this all in a single call):
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save len */
Now just allocated storage for len + 1 characters and copy from buf to arr[size_arr].
arr[size_arr] = malloc (len + 1); /* allocate for word */
if (!arr[size_arr]) { /* VALIDATE */
perror ("malloc-arr[i]");
break;
}
memcpy (arr[size_arr], buf, len + 1); /* copy buf to arr[i] */
size_arr += 1; /* increment counter */
}
(note: when reallocating 1 pointer per-iteration, only a single counter variable is needed, and note how it is not incremented until both the pointer reallocation, the allocation for your word storage is validated, and the word is copied from buf to arr[size_arr]. On failure of either allocation, the loop is broken and your size_arr will still hold the correct number of stored words)
That completes you read-loop.
Now you can use your stored collection of size_arr pointers, each pointing to an allocated and stored word as you wish. But remember, when it comes time to free the memory, that too is a 2-step process. You must free the allocated block for each word, before freeing the block of allocated pointers, e.g.
for (int i = 0; i < size_arr; i++) { /* output result */
puts (arr[i]);
free (arr[i]); /* free word storage */
}
free(arr); /* free pointers */
Done.
The complete program is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void)
{
char **arr = NULL, /* pointer to block of allocated pointers */
buf[MAXC]; /* fixed buffer for read-input */
int size_arr = 0; /* only 1 counter needed here */
while (fgets (buf, MAXC, stdin))
{
size_t len;
/* allocate pointer (one each time rather inefficient) */
void *tmp = realloc (arr, (size_arr + 1) * sizeof *arr);
if (!tmp) { /* VALIDATE */
perror ("realloc-arr");
break;
}
arr = tmp; /* assign on success */
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save len */
arr[size_arr] = malloc (len + 1); /* allocate for word */
if (!arr[size_arr]) { /* VALIDATE */
perror ("malloc-arr[i]");
break;
}
memcpy (arr[size_arr], buf, len + 1); /* copy buf to arr[i] */
size_arr += 1; /* increment counter */
}
for (int i = 0; i < size_arr; i++) { /* output result */
puts (arr[i]);
free (arr[i]); /* free word storage */
}
free(arr); /* free pointers */
}
Example Use/Output
Test it out. Do something creative like read, store and output your source file to make sure it works, e.g.
$ ./bin/dynarr < dynarr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void)
{
char **arr = NULL, /* pointer to block of allocated pointers */
buf[MAXC]; /* fixed buffer for read-input */
int size_arr = 0; /* only 1 counter needed here */
while (fgets (buf, MAXC, stdin))
{
size_t len;
/* allocate pointer (one each time rather inefficient) */
void *tmp = realloc (arr, (size_arr + 1) * sizeof *arr);
if (!tmp) { /* VALIDATE */
perror ("realloc-arr");
break;
}
arr = tmp; /* assign on success */
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save len */
arr[size_arr] = malloc (len + 1); /* allocate for word */
if (!arr[size_arr]) { /* VALIDATE */
perror ("malloc-arr[i]");
break;
}
memcpy (arr[size_arr], buf, len + 1); /* copy buf to arr[i] */
size_arr += 1; /* increment counter */
}
for (int i = 0; i < size_arr; i++) { /* output result */
puts (arr[i]);
free (arr[i]); /* free word storage */
}
free(arr); /* free pointers */
}
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/dynarr < dynarr.c
==30995== Memcheck, a memory error detector
==30995== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30995== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==30995== Command: ./bin/dynarr
==30995==
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
<snipped code>
}
==30995==
==30995== HEAP SUMMARY:
==30995== in use at exit: 0 bytes in 0 blocks
==30995== total heap usage: 84 allocs, 84 frees, 13,462 bytes allocated
==30995==
==30995== All heap blocks were freed -- no leaks are possible
==30995==
==30995== For counts of detected and suppressed errors, rerun with: -v
==30995== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.

Loading strings divided by delimiter to array

I have an array that needs to be filled with values from a string looking like this:
value0;value1;value2;value3;\n
I tried using strtok() but couldn't really figure out how to properly load more than 2 elements into table.
Desirable output is something like
arrayValues[0] = value0;
arrayValues[1] = value1;
etc.
You need to use strtok() and realloc(). Both are a bit difficult to use
char input[] = "value0;value1;value2;value3\n";
char **arrayValues = NULL;
int N = 0;
char *token = strtok(input, ";");
while(token != 0)
{
N++;
arrayValues = realloc(arrayValues, N * sizeof(char *));
if(!arrayValues)
/* out of memory - very unlikely to happen */
arrayValues[N-1] = strdup(token);
token = strtok(NULL, ";");
}
/* print out to check */
for(i=0;i<N;i++)
printf("***%s***\n", arrayValues[i]);
Note that the delimiter ';' is overwritten, if you retain it as you specified you'll have to add it to the end of the strings, which is fiddly and probably not what you really want.
At its simplest form, if the string you need to separate will remain in scope during the time you are making use of the individual tokens, then there is no need to allocate. Simply declare an array of pointers with a sufficient number of pointers for the tokens you have, and as you tokenize your string, just assign the address for the beginning of each token to the pointers in your array of pointers. That way, the pointers in your array simply point to the place within the original string where each of your tokens are found. Example:
#include <stdio.h>
#include <string.h>
#define MAXS 16
int main (void) {
char str[] = "value0;value1;value2;value3;\n",
*array[MAXS] = {NULL}, /* array of pointers */
*p = str, /* pointer to str */
*delim = ";\n"; /* delimiters */
int i, n = 0; /* loop var & index - n */
p = strtok (p, delim); /* get 1st token */
while (n < MAXS && p) { /* check bounds/validate token */
array[n++] = p; /* add pointer to array */
p = strtok (NULL, delim); /* get next token */
}
for (i = 0; i < n; i++) /* output tokens */
printf ("array[%2d] : %s\n", i, array[i]);
return 0;
}
(note: strtok modifies the original string by placing nul-terminating characters (e.g. '\0') in place of the delimiters. If you need to preserve the original string, make a copy before calling strtok)
Note above, you are limited to a fixed number of pointers, so while you are separating the tokens and assigning them to pointers in your array, you need to check the number against your array bounds to prevent writing beyond the end of your array.
Example Use/Output
$ ./bin/parsestrstrtok
array[ 0] : value0
array[ 1] : value1
array[ 2] : value2
array[ 3] : value3
Taking the parsing to the next step, where your original string may not remain in scope during the time your array values are needed, you simply need to allocate storage for each token and copy each token to your newly allocated memory and assign the starting address for each new block to the pointers in your array. That way, even if you pass your array of pointers and the string to a function for parsing, the array values remain available after the function completes until you free the memory you have allocated.
You are still limited to a fixed number of pointers, but your array is now usable wherever required in your program. The additions required for this are minimal. Note, malloc and strcpy are used below and can be replaced by a single call to strdup. However, since strdup is not part of all versions of C, malloc and strcpy are used instead. (but note, strdup does make for a very convenient replacement if your compiler supports it)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXS 16
int main (void) {
char str[] = "value0;value1;value2;value3;\n",
*array[MAXS] = {NULL}, /* array of pointers */
*p = str, /* pointer to str */
*delim = ";\n"; /* delimiters */
int i, n = 0; /* loop var & index - n */
p = strtok (p, delim); /* get 1st token */
while (p) { /* validate token */
/* allocate/validate storage for token */
if (!(array[n] = malloc (strlen (p) + 1))) {
perror ("malloc failed");
exit (EXIT_FAILURE);
}
strcpy (array[n++], p); /* copy token to array */
p = strtok (NULL, delim); /* get next token */
}
for (i = 0; i < n; i++) { /* output tokens */
printf ("array[%2d] : %s\n", i, array[i]);
free (array[i]); /* free memory for tokens */
}
return 0;
}
(output is the same)
Finally, you can eliminate your dependency on a fixed number of pointers by dynamically allocating the pointers and reallocating the pointers on an as needed basis. You can start with the same number, and then allocate twice the current number of pointers when your current supply is exhausted. It is simply one additional level of allocation before you start parsing, and a requirement to realloc when you have used all the pointers at hand. Example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXS 16
int main (void) {
char str[] = "value0;value1;value2;value3;\n",
**array = NULL, /* pointer to pointer to char */
*p = str, /* pointer to str */
*delim = ";\n"; /* delimiters */
int i, n = 0, /* loop var & index - n */
nptrs = MAXS; /* number allocated pointers */
/* allocate/validate an initial number of pointers for array */
if (!(array = malloc (nptrs * sizeof *array))) {
perror ("malloc pointers failed");
exit (EXIT_FAILURE);
}
p = strtok (p, delim); /* get 1st token */
while (p) { /* validate token */
/* allocate/validate storage for token */
if (!(array[n] = malloc (strlen (p) + 1))) {
perror ("malloc failed");
exit (EXIT_FAILURE);
}
strcpy (array[n++], p); /* copy token to array */
if (n == nptrs) { /* pointer limit reached */
/* realloc 2X number of pointers/validate */
void *tmp = realloc (array, nptrs * 2 * sizeof *array);
if (!tmp) {
perror ("realloc - pointers");
goto memfull; /* don't exit, array has original values */
}
array = tmp; /* assign new block to array */
nptrs *= 2; /* update no. allocated pointers */
}
p = strtok (NULL, delim); /* get next token */
}
memfull:;
for (i = 0; i < n; i++) { /* output tokens */
printf ("array[%2d] : %s\n", i, array[i]);
free (array[i]); /* free memory for tokens */
}
free (array); /* free memory for pointers */
return 0;
}
note: you should validate your memory use with a memory use and error checking program like valgrind on Linux. There are similar tools for every platform. Just run your code though the checker and validate there are no memory error and that all memory you have allocated has been properly freed.
Example:
$ valgrind ./bin/parsestrstrtokdbl
==15256== Memcheck, a memory error detector
==15256== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15256== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==15256== Command: ./bin/parsestrstrtokdbl
==15256==
array[ 0] : value0
array[ 1] : value1
array[ 2] : value2
array[ 3] : value3
==15256==
==15256== HEAP SUMMARY:
==15256== in use at exit: 0 bytes in 0 blocks
==15256== total heap usage: 5 allocs, 5 frees, 156 bytes allocated
==15256==
==15256== All heap blocks were freed -- no leaks are possible
==15256==
==15256== For counts of detected and suppressed errors, rerun with: -v
==15256== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
**note above you see 5 allocations (1 for the pointers and 1 for each token). All memory has been freed and there are no memory errors.
There are probably a dozen more approaches you can take to inch-worm down your string picking out tokens, but this is the general progression of how to expand on the approach using strtok. Let me know if you have any further questions.
You can just use good old strchr function to hunt for the substring ';' terminator and malloc and realloc for memory allocations.
Make note that input str is modified (reused). In that string ';' are replaced by '\0'.
(If you need str untouched than allocate another buffer, copy the str to it and point p1,p2 pointers to it.)
The arrayValues holds pointers to the substrings:
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
int main ()
{
char **arrayValues = malloc(sizeof(char *)); // allocate memory for string pointers
char str[] = "value1;value2;value3\n"; // input
char *p1 = str; // init pointer helpers
char *p2 = str;
int n = 0; // substring counter
// OPTIONAL if you want to get rid of ending '\n'
//--s
size_t len = strlen(str);
if(len>0)
if(str[len-1] == '\n')
str[len-1] = 0;
//--ee
while(p1 != NULL)
{
p1 = strchr(p1,';'); // find ';'
if(p1 != NULL)
{
arrayValues[n] = p2; // begining of the substring
*p1 = 0; // terminate the substring string; get rid of ';'
n++; // count the substrings
arrayValues = realloc( arrayValues, (n+1) * sizeof(char *)); // allocate more memory for next pointer
p2 = p1+1; // move the ponter after the ';'
p1 = p1+1; // we start the search for next ';'
}
else
{
arrayValues[n] = p2; // this is the last (or first) substring
n++;
}
} // while
// Output:
for (int j=0; j<n; j++)
{
printf("%s \n", arrayValues[j]);
}
printf("------");
free(arrayValues);
return 0;
}
Output:
value1
value2
value3
------

how to connect/link words in an string array using one loop?

i have an array with n words .. i want to attach the strings togther ..
for example if the array have the following strings: "hello" "world" "stack77"
i want the function to return :"helloworldstach7 " any help how i can do something like this without Recursion and with one loop and i can only use from the string library the two functions strcpy and strlen !!
any ideas ! thanks
I NEED TO USE ONE LOOP ONLY !
char *connect(char**words,int n){
int i=0;
while(words){
strcpy(words+i,
i saw many many solutions but they all use other string functions , where i only want to use strcpy and strlen .
If to use only the two mentioned standard string functions then the function can look as it is shown in the demonstrative program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char * connect( char **words, size_t n )
{
size_t length = 0;
for ( size_t i = 0; i < n; i++ ) length += strlen( words[i] );
char *s = malloc( length + 1 );
size_t pos = 0;
for ( size_t i = 0; i < n; i++ )
{
strcpy( s + pos, words[i] );
pos += strlen( words[i] );
}
s[pos] = '\0';
return s;
}
int main( void )
{
char * s[] = { "Hello", " ", "World" };
char *p = connect( s, sizeof( s ) / sizeof( *s ) );
puts( p );
free( p );
}
The program output is
Hello World
If to use only one loop then the function can look the following way
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char * connect( char **words, size_t n )
{
char *s = calloc( 1, sizeof( char ) );
if ( s != NULL )
{
size_t pos = 0;
for ( size_t i = 0; s != NULL && i < n; i++ )
{
size_t length = strlen( words[i] );
char *tmp = realloc( s, pos + length + 1 );
if ( tmp != NULL )
{
s = tmp;
strcpy( s + pos, words[i] );
pos += length;
}
else
{
free( s );
s = NULL;
}
}
}
return s;
}
int main( void )
{
char * s[] = { "Hello", " ", "World" };
char *p = connect( s, sizeof( s ) / sizeof( *s ) );
if ( p != NULL ) puts( p );
free( p );
}
In addition to using strcpy, you can also use sprintf. Each of the functions in the printf family returns the number of characters actually output allowing you to compute an offset in your final string without an additional function call. Now, there is nothing wrong with using a strcpy/strlen approach, and in fact, that is probably the preferred approach, but be aware that there are always multiple ways of doing things within the parameters you have given. Also note that the printf family offers a wealth of formatting benefits in the event you would need to include additional information along with the concatenation of strings.
For example, using sprintf to concatenate each string while saving the number of characters in each nc as the offset for writing the next string to the resulting buffer buf, while using a ternary operator to control the addition of a space between the words based on your loop counter, you could do something similar to the following:
char *compress (char **p, int n)
{
char *buf = NULL; /* buffer to hold concatendated string */
size_t total = 0; /* total number of characters required */
int nc = 0; /* number of chars added (counter) */
for (int i = 0; i < n; i++) /* get total required length */
total += strlen (p[i]) + 1; /* including spaces between */
if (!(buf = malloc (total + 1))) /* allocate/validate mem */
return buf; /* return NULL on error */
for (int i = 0; i < n; i++) /* add each word to buf, save nc */
nc += sprintf (buf + nc, i ? " %s" : "%s", p[i]);
*(buf + nc) = 0; /* affirmatively nul-terminate buf */
return buf;
}
note: each memory allocation with malloc, calloc or realloc should be validated to insure it succeeds, and the error handled in the event of failure. (here NULL is returned if allocation fails).
Putting that together in a short example, you could do something similar to the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *compress (char **p, int n)
{
char *buf = NULL; /* buffer to hold concatendated string */
size_t total = 0; /* total number of characters required */
int nc = 0; /* number of chars added (counter) */
for (int i = 0; i < n; i++) /* get total required length */
total += strlen (p[i]) + 1; /* including spaces between */
if (!(buf = malloc (total + 1))) /* allocate/validate mem */
return buf; /* return NULL on error */
for (int i = 0; i < n; i++) /* add each word to buf, save nc */
nc += sprintf (buf + nc, i ? " %s" : "%s", p[i]);
*(buf + nc) = 0; /* affirmatively nul-terminate buf */
return buf;
}
int main (void) {
char *sa[] = { "My", "dog", "has", "too many", "fleas." },
*result = compress (sa, sizeof sa/sizeof *sa);
if (result) { /* check return */
printf ("result: '%s'\n", result); /* print string */
free (result); /* free memory */
}
return 0;
}
Example Use/Output
$ ./bin/strcat_sprintf
result: 'My dog has too many fleas.'
Memory/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to write beyond/outside the bounds of your allocated block of memory, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/strcat_sprintf
==27595== Memcheck, a memory error detector
==27595== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==27595== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==27595== Command: ./bin/strcat_sprintf
==27595==
result: 'My dog has too many fleas.'
==27595==
==27595== HEAP SUMMARY:
==27595== in use at exit: 0 bytes in 0 blocks
==27595== total heap usage: 1 allocs, 1 frees, 28 bytes allocated
==27595==
==27595== All heap blocks were freed -- no leaks are possible
==27595==
==27595== For counts of detected and suppressed errors, rerun with: -v
==27595== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have any additional questions.
compress with a Single Loop
As mentioned in the comment to MarianD's answer, you can use a single loop, reallocating your buffer with the addition of each word to the final string, but that is less efficient than getting the total number of characters required and then allocating once. However, there are many occasions where that is exactly what you will be required to do. Basically, you will simply get the length of each word and then allocate memory for that word (and the space between it and the next and for the nul-byte) using realloc instead of malloc (or calloc). realloc acts just like malloc for the first allocation, thereafter it resizes the buffer maintaining its current contents.
note: never realloc the buffer directly (e.g. buf = realloc (buf, newsize);), instead, always use a temporary pointer. Why? IF realloc fails, NULL is returned by realloc which causes you to lose the reference to your original buf (e.g. it will result in buf = NULL;), meaning that the address for your original buf is lost (and you have created a memory leak).
Putting that together, you could do something like the following:
char *compress (char **p, int n)
{
char *buf = NULL; /* buffer to hold concatendated string */
size_t bufsz = 0; /* current allocation size for buffer */
int nc = 0; /* number of chars added (counter) */
for (int i = 0; i < n; i++) { /* add each word to buf */
size_t len = strlen (p[i]) + 1; /* get length of word */
void *tmp = realloc (buf, bufsz + len); /* realloc buf */
if (!tmp) /* validate reallocation */
return buf; /* return current buffer */
buf = tmp; /* assign reallocated block to buffer */
bufsz += len; /* increment bufsz to current size */
nc += sprintf (buf + nc, i ? " %s" : "%s", p[i]);
}
*(buf + nc) = 0; /* affirmatively nul-terminate buf */
return buf;
}
Memory/Error Check
$ valgrind ./bin/strcat_sprintf_realloc
==28175== Memcheck, a memory error detector
==28175== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28175== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28175== Command: ./bin/strcat_sprintf_realloc
==28175==
result: 'My dog has too many fleas.'
==28175==
==28175== HEAP SUMMARY:
==28175== in use at exit: 0 bytes in 0 blocks
==28175== total heap usage: 5 allocs, 5 frees, 68 bytes allocated
==28175==
==28175== All heap blocks were freed -- no leaks are possible
==28175==
==28175== For counts of detected and suppressed errors, rerun with: -v
==28175== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
note: now there are 5 allocations instead of 1.
Let me know if you have any questions.

Memory leak from malloc through tokenise function in C

I'm encountering a memory leak when I exit my program (found using valgrind) from the tokenize function I've written. Outside of this function, after I assign the tokens to other variables, I call free(tokens) where appropriate, but this doesn't fix the problem. Any help would be hugely appreciated!
Code:
/**
* Splits user input into an array of tokens.
**/
char ** tokenize(const char * s, int * n)
{
/* Sets array of strings and allocates memory, sized appropriately. */
int i;
char * token;
char ** tokens = malloc((BUF_LEN + EXTRA_SPACES) *sizeof(*token));
char buf[BUF_LEN];
strncpy(buf, s, BUF_LEN);
/* Defines first token by a whitespace. */
token = strtok(buf, " ");
i = 0;
/* While loop defines all consequent tokens also with a whitespace. */
while (token)
{
tokens[i] = malloc((strlen(token)+EXTRA_SPACES) *sizeof(*token));
strncpy(tokens[i], token, strlen(token));
i++;
token = strtok(NULL, " ");
}
* n = i;
return tokens;
}
I added a function to free your array and checked it with valgrind that there is no memory leak.
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <memory.h>
size_t BUF_LEN = 32;
int EXTRA_SPACES = 16;
int length = 0;
char ** tokenize(const char * s, int * n)
{
/* Sets array of strings and allocates memory, sized appropriately. */
int i;
char * token;
char ** tokens = malloc((BUF_LEN + EXTRA_SPACES) *sizeof(*token));
char buf[BUF_LEN];
strncpy(buf, s, BUF_LEN);
/* Defines first token by a whitespace. */
token = strtok(buf, " ");
i = 0;
/* While loop defines all consequent tokens also with a whitespace. */
while (token)
{
tokens[i] = malloc((strlen(token)+EXTRA_SPACES) *sizeof(*token));
strncpy(tokens[i], token, strlen(token));
i++;
token = strtok(NULL, " ");
length++;
}
* n = i;
return tokens;
}
/* deallocates an array of arrays of char*, calling free() on each */
void free_argv(char **argv, unsigned rows) {
for (unsigned row = 0; row < rows; row++) {
free(argv[row]);
}
free(argv);
}
int main ()
{
int i = 12;
char ** ch = tokenize("abc", &i);
free_argv(ch, (unsigned) length);
}
Output
valgrind ./a.out
==28962== Memcheck, a memory error detector
==28962== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28962== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28962== Command: ./a.out
==28962==
==28962==
==28962== HEAP SUMMARY:
==28962== in use at exit: 0 bytes in 0 blocks
==28962== total heap usage: 2 allocs, 2 frees, 67 bytes allocated
==28962==
==28962== All heap blocks were freed -- no leaks are possible
==28962==
==28962== For counts of detected and suppressed errors, rerun with: -v
==28962== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I call free(tokens) where appropriate
free(tokens); is not enough, you must call free for each allocated item:
for (i = 0; i < n; i++) {
free(tokens[i]);
}
free(tokens);

How to get a substring from a string in C?

I want to create a function in C that gets the a substring from a string. This is what I have so far:
char* substr(char* src, int start, int len){
char* sub = malloc(sizeof(char)*(len+1));
memcpy(sub, &src[start], len);
sub[len] = '\0';
return sub;
}
int main(){
char* test = malloc(sizeof(char)*5); // the reason I don't use char* = "test"; is because I wouldn't be able to use free() on it then
strcpy(test, "test");
char* sub = substr(test, 1, 2); // save the substr in a new char*
free(test); // just wanted the substr from test
printf("%s\n", sub); // prints "es"
// ... free when done with sub
free(sub);
}
Is there any way I can save the substring into test without having to create a new char*? If I do test = substr(test, 1, 2), the old value of test no longer has a pointer pointing to it, so it's leaked memory (I think. I'm a noob when it comes to C languages.)
void substr(char* str, char* sub , int start, int len){
memcpy(sub, &str[start], len);
sub[len] = '\0';
}
int main(void)
{
char *test = (char*)malloc(sizeof(char)*5);
char *sub = (char*)malloc(sizeof(char)*3);
strcpy(test, "test");
substr(test, sub, 1, 2);
printf("%s\n", sub); // prints "es"
free(test);
free(sub);
return 0;
}
Well you could always keep the address of the malloc'd memory is a separate pointer:
char* test = malloc(~~~)
char* toFree = test;
test = substr(test,1,2);
free(toFree);
But most of the features and capabilities of shuffling this sort of data around has already been done in string.h. One of those functions probably does the job you want get done. movemem() as others have pointed out, could move the substring to the start of your char pointer, viola!
If you specifically want to make a new dynamic string to play with while keeping the original separate and safe, and also want to be able to overlap these pointers.... that's tricky. You could probably do it if you passed in the source and destination and then range-checked the affected memory, and free'd the source if there was overlap... but that seems a little over-complicated.
I'm also loathe to malloc memory that I trust higher levels to free, but that's probably just me.
As an aside,
char* test = "test";
Is one of those niche cases in C. When you initialize a pointer to a string literal (stuff in quotes), it puts the data in a special section of memory just for text data. You can (rarely) edit it, but you shouldn't, and it can't grow.
There are a number of ways to do this, and the way you approached it is a good one, but there are several areas where you seemed a bit confused. First, there is no need to allocated test. Simply using a pointer is fine. You could simply do char *test = "test"; in your example. No need to free it then either.
Next, when you are beginning to allocate memory dynamically, you need to always check the return to make sure your allocation succeeded. Otherwise, you can easily segfault if you attempt to write to a memory location when there has been no memory allocated.
In your substr, you should also validate the range of start and len you send to the function to insure you are not attempting to read past the end of the string.
When dealing with only positive numbers, it is better to use type size_t or unsigned. There will never be a negative start or len in your code, so size_t fits the purpose nicely.
Lastly, it is good practice to always check that a pointer to a memory block to be freed actually holds a valid address to prevent freeing a block of memory twice, etc... (e.g. if (sub) free (sub);)
Take a look at the following and let me know if you have questions. I changed the code to accept command line arguments from string, start and len, so the use is:
./progname the_string_to_get_sub_from start len
I hope the following helps.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char* substr (char* src, size_t start, size_t len)
{
/* validate indexes */
if (start + len > strlen (src)) {
fprintf (stderr, "%s() error: invalid substring index (start+len > length).\n", __func__);
return NULL;
}
char* sub = calloc (1, len + 1);
/* validate allocation */
if (!sub) {
fprintf (stderr, "%s() error: memory allocation failed.\n", __func__);
return NULL;
}
memcpy (sub, src + start, len);
// sub[len] = '\0'; /* by using calloc, sub is filled with 0 (null) */
return sub;
}
int main (int argc, char **argv) {
if (argc < 4 ) {
fprintf (stderr, "error: insufficient input, usage: %s string ss_start ss_length\n", argv[0]);
return 1;
}
char* test = argv[1]; /* no need to allocate test, a pointer is fine */
size_t ss_start = (size_t)atoi (argv[2]); /* convert start & length from */
size_t ss_lenght = (size_t)atoi (argv[3]); /* the command line arguments */
char* sub = substr (test, ss_start, ss_lenght);
if (sub) /* validate sub before use */
printf("\n sub: %s\n\n", sub);
if (sub) /* validate sub before free */
free(sub);
return 0;
}
Output
$ ./bin/str_substr test 1 2
sub: es
If you choose an invalid start / len combination:
$ ./bin/str_substr test 1 4
substr() error: invalid substring index (start+len > length).
Verify All Memory Freed
$ valgrind ./bin/str_substr test 1 2
==13515== Memcheck, a memory error detector
==13515== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==13515== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==13515== Command: ./bin/str_substr test 1 2
==13515==
sub: es
==13515==
==13515== HEAP SUMMARY:
==13515== in use at exit: 0 bytes in 0 blocks
==13515== total heap usage: 1 allocs, 1 frees, 4 bytes allocated
==13515==
==13515== All heap blocks were freed -- no leaks are possible
==13515==
==13515== For counts of detected and suppressed errors, rerun with: -v
==13515== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
Let's break down what is being talked about:
You allocate some memory, and you create the variable test to point to it.
You allocate some more memory, and you'd like to store that pointer in the variable named test as well.
You have 2 pieces of information that you claim you'd like to store in the same pointer. You can't do this!
Solution 1
Use two variables. I don't know why this isn't acceptable...
char *input = "hello";
char *output = substr(input, 2, 3);
Solution 2
Have your input parameter not be heap memory. There's a number of ways we could do this:
// Use a string literal
char *test = substr("test", 2, 2);
// Use a stack allocated string
char s[] = "test";
char *test = substr(s, 2, 2);
Personally...
If you're already passing the length of the substring to the function, I'd personally rather see that function just get passed the piece of memory that it will push the data into. Something like:
char *substr(char *dst, char *src, size_t offset, size_t length) {
memcpy(dst, src + offset, length);
dst[length] = '\0';
return dst;
}
int main() {
char s[5] = "test";
char d[3] = "";
substr(d, s, 2, 2);
}
In C, string functions quickly run into memory management. So somehow the space for the sub-string needs to exist and passed to the function or the function can allocate it.
const char source[] = "Test";
size_t start, length;
char sub1[sizeof source];
substring1(source, sub1, start, length);
// or
char *sub2 = substring2(source, start, length);
...
free(sub2);
Code needs to specify what happens when 1) the start index is greater than other original string's length and 2) the length similarly exceeds the original string. These are 2 important steps not done OP's code.
void substring1(const char *source, char *dest, size_t start, size_t length) {
size_t source_len = strlen(source);
if (start > source_len) start = source_len;
if (start + length > source_len) length = source_len - start;
memmove(dest, &source[start], length);
dest[length] = 0;
}
char *substring2(const char *source, size_t start, size_t length) {
size_t source_len = strlen(source);
if (start > source_len) start = source_len;
if (start + length > source_len) length = source_len - start;
char *dest = malloc(length + 1);
if (dest == NULL) {
return NULL;
}
memcpy(dest, &source[start], length);
dest[length] = 0;
return dest;
}
By using memmove() vs. memcpy() in substring1(), code could use the same destination buffer as the source buffer. memmove() is well defined, even if buffers overlap.
substring1(source, source, start, length);

Resources