C split string function


I am trying to implement a function to split strings, but I keep getting segmentation faults. I am working on Windows XP, so I also had to implement strdup() myself because the Windows API doesn't provide it. Can anyone tell me what's wrong with the following piece of code?
char** strspl(char* str, char* del)
{
int size = 1;
for(int i = 0; i < strlen(str);) {
if(strncmp(str + i, del, strlen(del)) == 0) {
size++;
i += strlen(del);
}
else {
i++;
}
}
char** res = (char**)malloc(size * sizeof(char*));
res[0] = strdup(strtok(str, del));
for(int i = 0; res[i] != NULL; i++) {
res[i] = strdup(strtok(NULL, del));
}
return res;
}
char* strdup(char* str) {
char* res = (char*)malloc(strlen(str));
strncpy(res, str, sizeof(str));
return res;
}
EDIT: Using a debugger, I found that the program crashes after the following line:
res[0] = strdup(strtok(str,del));
I also fixed strdup(), but there is still no progress.

You're not counting the null terminator and you are copying the wrong number of bytes
char* strdup(char* str) {
char* res = (char*)malloc(strlen(str)); /* what about the null terminator? */
strncpy(res, str, sizeof(str)); /* sizeof(str) is the same as sizeof (char*) */
return res;
}

Your strdup() function is not correct. The sizeof(str) in there is the size of the str pointer (probably 4 or 8 bytes), not the length of the string. Use the library provided _strdup() instead.
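For comparison, here is a minimal sketch of what a corrected duplicate function could look like. The name my_strdup is hypothetical, picked only so it does not collide with a strdup or _strdup that your C library may already declare; the point is the strlen + 1 allocation and copying the terminator along with the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* my_strdup is a hypothetical name, chosen so it cannot clash with a
   strdup/_strdup that the C library may already declare. */
char *my_strdup(const char *str)
{
    assert(str != NULL);            /* strlen(NULL) is undefined behavior */
    size_t len = strlen(str) + 1;   /* +1 for the terminating '\0' */
    char *res = malloc(len);
    if (res == NULL)
        return NULL;                /* caller must check for failure */
    memcpy(res, str, len);          /* copies the terminator too */
    return res;
}
```

The caller owns the returned buffer and releases it with free().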

malloc does not initialize the allocated memory to \0. Have you tried using calloc instead? I suspect the seg faults are due to the res[i] != NULL comparison.
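To illustrate the res[i] != NULL problem: strtok returns NULL once the tokens run out, and the loop in the question both passes that NULL to strdup and relies on a terminator that was never stored in the array. Below is a rough sketch of one way to build a properly NULL-terminated array. split_tokens is a hypothetical helper, not a fixed version of strspl; it assumes the input may be modified in place, and since strtok treats del as a set of separator characters, empty fields are skipped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* split_tokens is a hypothetical helper, not the asker's strspl().
   It modifies `str` in place (strtok writes '\0' over separators) and
   returns a NULL-terminated array of pointers into `str`. */
char **split_tokens(char *str, const char *del)
{
    assert(str != NULL && del != NULL);
    size_t cap = 4, n = 0;
    char **res = malloc(cap * sizeof *res);
    if (res == NULL)
        return NULL;
    for (char *tok = strtok(str, del); tok != NULL; tok = strtok(NULL, del)) {
        if (n + 1 == cap) {                 /* keep one slot for the NULL */
            char **tmp = realloc(res, 2 * cap * sizeof *res);
            if (tmp == NULL) {
                free(res);
                return NULL;
            }
            res = tmp;
            cap *= 2;
        }
        res[n++] = tok;                     /* tok is never NULL here */
    }
    res[n] = NULL;                          /* explicit sentinel */
    return res;
}
```

Only the array itself needs to be freed, because the tokens point into the caller's buffer.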

There are many things wrong with this code, but the main flaw is trying to implement this function. It will only provide you with marginal benefit, if any.
Compare the following two code snippets:
/* Using strtok... */
char *tok = strtok(str, del);
while (tok != NULL) {
/* Do something with tok. */
tok = strtok(NULL, del);
}
-
/* Using your function... */
char **res = strspl(str, del, &size);
size_t i;
for (i = 0; i < size; i++) {
/* Do something with *(res + i). */
}

Related

free allocated memory in function

I have a function written in C in which I allocate two strings, tmp and found, but I can't free tmp.
I think it may be because tmp is used to fill the result array.
Can anyone help me?
Here is the function:
void split(char* input, char* delim, char** result,int size) {
char* tmp = malloc((strlen(input)) * sizeof(char));
char* found = malloc((strlen(input)) * sizeof(char));
tmp=strcpy(tmp, input);
// #pragma omp parallel for
for (int i=0; i<size; i++) {
found = strstr(tmp, delim);
if (found != NULL) {
int length = found - tmp;
result[i]=malloc((length+1) * sizeof(char));
result[i] = strncpy(result[i], tmp, length);
*(result[i] + length) = '\0';
tmp = found + strlen(delim);
} else {
result[i]=malloc(strlen(tmp) * sizeof(char));
result[i] =strncpy(result[i], tmp, strlen(tmp));
}
}
// free(tmp);
free(found);
}
Here, size is the number of substrings after the split.
When I uncomment this line:
// free(tmp);
this error occurs:
munmap_chunk(): invalid pointer
Aborted (core dumped)
Could you help me write a correct split function?

You do assignments to tmp. That means the pointer tmp might no longer point to the same location that malloc returned.
You need to pass the exact same pointer to free that was returned by malloc.
You have the same problem with found: you assign to it and possibly change where it points.
Passing an invalid pointer to free leads to undefined behavior.
You also have another problem: You go out of bounds of the original memory allocated and pointed to by tmp. That's because you seem to have forgotten that strings in C are really called null-terminated strings.
When you allocate memory for a string, you need to include space for the null-terminator at the end. And it's not counted by strlen.
Going out of bounds of allocated memory also leads to undefined behavior.
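A sketch of how both points could be handled together, assuming a non-empty delimiter: keep the pointer returned by malloc in one variable (base), walk the text with a separate cursor, and allocate length + 1 bytes for every string. split_by and its count parameter are hypothetical names, not the asker's interface. The working copy only mirrors the asker's tmp; scanning input directly would also work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* split_by is a hypothetical replacement for the asker's split().
   Returns a malloc'd array of malloc'd strings; *count gets its length. */
char **split_by(const char *input, const char *delim, size_t *count)
{
    assert(input != NULL && delim != NULL && count != NULL);
    assert(delim[0] != '\0');               /* empty delimiter never advances */

    size_t dlen = strlen(delim);
    char *base = malloc(strlen(input) + 1); /* +1 for the '\0' */
    if (base == NULL)
        return NULL;
    strcpy(base, input);

    size_t n = 1;                           /* pieces = delimiters + 1 */
    for (char *p = strstr(base, delim); p != NULL; p = strstr(p + dlen, delim))
        n++;

    char **result = malloc(n * sizeof *result);
    if (result == NULL) {
        free(base);
        return NULL;
    }

    char *cur = base;                       /* the cursor moves, base does not */
    for (size_t i = 0; i < n; i++) {
        char *found = strstr(cur, delim);   /* points into base; never freed */
        size_t len = found ? (size_t)(found - cur) : strlen(cur);
        result[i] = malloc(len + 1);
        if (result[i] == NULL) {            /* undo everything on failure */
            while (i > 0)
                free(result[--i]);
            free(result);
            free(base);
            return NULL;
        }
        memcpy(result[i], cur, len);
        result[i][len] = '\0';
        if (found != NULL)
            cur = found + dlen;
    }
    free(base);                             /* same pointer malloc returned */
    *count = n;
    return result;
}
```

The caller frees each result[i] and then result itself.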

The function does not make sense.
For starters, it invokes undefined behavior:
char* tmp = malloc((strlen(input)) * sizeof(char));
char* found = malloc((strlen(input)) * sizeof(char));
tmp=strcpy(tmp, input);
//...
because you did not allocate enough memory to store the terminating zero character '\0' of the string input in the character array tmp.
Secondly the function has a memory leak because at first memory was allocated and its address was assigned to the pointer found and then the pointer found was reassigned in the call of strstr in the for loop.
char* found = malloc((strlen(input)) * sizeof(char));
//...
// #pragma omp parallel for
for (int i=0; i<size; i++) {
found = strstr(tmp, delim);
//...
So the address of the early allocated memory is lost and the memory can not be freed.
And this for loop
for (int i=0; i<size; i++) {
is just senseless.
You may call free for neither tmp nor found. The pointer found does not point to dynamically allocated memory, and the pointer tmp is changed within the for loop.

Here is my new function. I wrote it recursively.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int get_number_of_occurence(char* string, char* delim) {
char* found = strstr(string, delim);
if(found == NULL || strcmp(found,delim) ==0){
return 0;
}else{
return 1+get_number_of_occurence(found+strlen(delim), delim);
}
}
int split(char* input, char* delim, char** result,int array_idx) {
char* found= strstr(input, delim);
if (strlen(input)==0){
return 0;
}else if(found == NULL){
result[array_idx]=malloc((strlen(input)+1) * sizeof(char));
strncpy(result[array_idx], input, strlen(input));
*(result[array_idx] + strlen(input)) = '\0';
return 0;
}else{
int length = found - input;
result[array_idx]=malloc((length+1) * sizeof(char));
strncpy(result[array_idx], input, length);
*(result[array_idx] + length) = '\0';
return split(found+strlen(delim),delim,result,array_idx+1);
}
}
int main () {
char * delim = "ooo";
char * input="ssooonn";
int size = get_number_of_occurence(input, delim);
printf("size is : %d \n",size);
char *splitted_values[size+1];
split(input, delim,splitted_values,0);
for (int i=0; i<(size+1); i++) {
printf("%s\n",splitted_values[i]);
free(splitted_values[i]);
}
}
In this code, I first count the number of occurrences of the delimiter.
Then I create an array of that size and fill it with the help of the split function.
Thanks for helping.

Why does my string_split implementation not work?

My str_split function returns (or at least I think it does) a char** - so a list of strings essentially. It takes a string parameter, a char delimiter to split the string on, and a pointer to an int to place the number of strings detected.
The way I did it, which may be highly inefficient, is to make a buffer of x length (x = length of string), then copy element of string until we reach delimiter, or '\0' character. Then it copies the buffer to the char**, which is what we are returning (and has been malloced earlier, and can be freed from main()), then clears the buffer and repeats.
Although the algorithm may be iffy, the logic is definitely sound as my debug code (the _D) shows it's being copied correctly. The part I'm stuck on is when I make a char** in main and set it equal to the result of my function. It doesn't return null, crash the program, or throw any errors, but it doesn't quite seem to work either. I'm assuming this is what is meant by the term Undefined Behavior.
Anyhow, after a lot of thinking (I'm new to all this) I tried something else, which you will see in the code, currently commented out. When I use malloc to copy the buffer to a new string, and pass that copy to aforementioned char**, it seems to work perfectly. HOWEVER, this creates an obvious memory leak as I can't free it later... so I'm lost.
When I did some research I found this post, which follows the idea of my code almost exactly and works, meaning there isn't an inherent problem with the format (return value, parameters, etc) of my str_split function. YET his only has 1 malloc, for the char**, and works just fine.
Below is my code. I've been trying to figure this out and it's scrambling my brain, so I'd really appreciate help!! Sorry in advance for the 'i', 'b', 'c' it's a bit convoluted I know.
Edit: should mention that with the following code,
ret[c] = buffer;
printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
it does indeed print correctly. It's only when I call the function from main that it gets weird. I'm guessing it's because it's out of scope ?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define DEBUG
#ifdef DEBUG
#define _D if (1)
#else
#define _D if (0)
#endif
char **str_split(char[], char, int*);
int count_char(char[], char);
int main(void) {
int num_strings = 0;
char **result = str_split("Helo_World_poopy_pants", '_', &num_strings);
if (result == NULL) {
printf("result is NULL\n");
return 0;
}
if (num_strings > 0) {
for (int i = 0; i < num_strings; i++) {
printf("\"%s\" \n", result[i]);
}
}
free(result);
return 0;
}
char **str_split(char string[], char delim, int *num_strings) {
int num_delim = count_char(string, delim);
*num_strings = num_delim + 1;
if (*num_strings < 2) {
return NULL;
}
//return value
char **ret = malloc((*num_strings) * sizeof(char*));
if (ret == NULL) {
_D printf("ret is null.\n");
return NULL;
}
int slen = strlen(string);
char buffer[slen];
/* b is the buffer index, c is the index for **ret */
int b = 0, c = 0;
for (int i = 0; i < slen + 1; i++) {
char cur = string[i];
if (cur == delim || cur == '\0') {
_D printf("Copying content of buffer to ret[%i]\n", c);
//char *tmp = malloc(sizeof(char) * slen + 1);
//strcpy(tmp, buffer);
//ret[c] = tmp;
ret[c] = buffer;
_D printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
//free(tmp);
c++;
b = 0;
continue;
}
//otherwise
_D printf("{%i} Copying char[%c] to index [%i] of buffer\n", c, cur, b);
buffer[b] = cur;
buffer[b+1] = '\0'; /* extend the null char */
b++;
_D printf("Buffer is now equal to: \"%s\"\n", buffer);
}
return ret;
}
int count_char(char base[], char c) {
int count = 0;
int i = 0;
while (base[i] != '\0') {
if (base[i++] == c) {
count++;
}
}
_D printf("Found %i occurence(s) of '%c'\n", count, c);
return count;
}

You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior.
To get around this requires one of the following:
Allow the function to modify the input string (i.e. replace delimiters with null-terminator characters) and return pointers into it. The caller must be aware that this can happen. Note that modifying a string literal, which is what would happen with the call shown here, is undefined behavior in C, so you would instead need to do:
char my_string[] = "Helo_World_poopy_pants";
char **result = str_split(my_string, '_', &num_strings);
In this case, the function should also make it clear that a string literal is not acceptable input. Keeping the first parameter non-const (char *string rather than const char *string) signals that the function writes to it.
Allow the function to make a copy of the string and then modify the copy. You have expressed concerns about leaking this memory, but that concern is mostly to do with your program's design rather than a necessity.
It's perfectly valid to duplicate each string individually and then clean them all up later. The main issue is that it's inconvenient, and also slightly pointless.
Let's address the second point. You have several options, but if you insist that the result be easily cleaned-up with a call to free, then try this strategy:
When you allocate the pointer array, also make it large enough to hold a copy of the string:
// Allocate storage for `num_strings` pointers, plus a copy of the original string,
// then copy the string into memory immediately following the pointer storage.
char **ret = malloc((*num_strings) * sizeof(char*) + strlen(string) + 1);
char *buffer = (char*)&ret[*num_strings];
strcpy(buffer, string);
Now, do all your string operations on buffer. For example:
// Extract all delimited substrings. Here, buffer will always point at the
// current substring, and p will search for the delimiter. Once found,
// the substring is terminated, its pointer appended to the substring array,
// and then buffer is pointed at the next substring, if any.
int c = 0;
for(char *p = buffer; *buffer; ++p)
{
if (*p == delim || !*p) {
char *next = p;
if (*p) {
*p = '\0';
++next;
}
ret[c++] = buffer;
buffer = next;
}
}
When you need to clean up, it's just a single call to free, because everything was stored together.
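Putting the two fragments together, the whole strategy might look roughly like the sketch below. The name str_split_one_block is hypothetical, and the scan loop is a simplified variant rather than the exact loop above: it stores a pointer after every delimiter, so a trailing delimiter yields a final empty string and the number of stored pointers always matches *num_strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* str_split_one_block is a hypothetical name for this sketch.
   The pointer array and the copied text share one allocation. */
char **str_split_one_block(const char *string, char delim, int *num_strings)
{
    assert(string != NULL && num_strings != NULL);
    assert(delim != '\0');

    int n = 1;                              /* substrings = delimiters + 1 */
    for (const char *p = string; *p; p++)
        n += (*p == delim);

    size_t len = strlen(string);
    char **ret = malloc((size_t)n * sizeof *ret + len + 1);
    if (ret == NULL)
        return NULL;

    char *buffer = (char *)&ret[n];         /* text sits right after pointers */
    memcpy(buffer, string, len + 1);        /* includes the '\0' */

    int c = 0;
    ret[c++] = buffer;                      /* first substring starts here */
    for (char *p = buffer; *p; p++) {
        if (*p == delim) {
            *p = '\0';                      /* terminate current substring */
            ret[c++] = p + 1;               /* next one starts after it */
        }
    }
    *num_strings = n;
    return ret;
}
```

Because the input is only read, a string literal is acceptable here, and a single free(ret) releases both the pointers and the text.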

The string pointers you store into the ret array with ret[c] = buffer; point to an automatic array that goes out of scope when the function returns. The code subsequently has undefined behavior. You should allocate these strings with strdup().
Note also that it might not be appropriate to return NULL when the string does not contain a separator. Why not return an array with a single string?
Here is a simpler implementation:
#include <stdlib.h>
#include <string.h> /* for memcpy */
char **str_split(const char *string, char delim, int *num_strings) {
int i, n, from, to;
char **res;
for (n = 1, i = 0; string[i]; i++)
n += (string[i] == delim);
*num_strings = 0;
res = malloc(sizeof(*res) * n);
if (res == NULL)
return NULL;
for (i = from = to = 0;; from = to + 1) {
for (to = from; string[to] != delim && string[to] != '\0'; to++)
continue;
res[i] = malloc(to - from + 1);
if (res[i] == NULL) {
/* allocation failure: free memory allocated so far */
while (i > 0)
free(res[--i]);
free(res);
return NULL;
}
memcpy(res[i], string + from, to - from);
res[i][to - from] = '\0';
i++;
if (string[to] == '\0')
break;
}
*num_strings = n;
return res;
}

Comparing char* in C using strcasecmp

I'm reading bytes from a socket and copying it into a char array.
char usrInputStr[256];
if ((rbytes = read(STDIN_FILENO, usrInputStr, 256)) < 0) {
perror("Read error: ");
exit(-1);
}
char finalStr[rbytes + 1];
memcpy(finalStr, usrInputStr, rbytes);
Now I allocate an array on the heap, split the string into words, and put each word in an array of char arrays. This is the code that does that:
char** currentTokens = (char**)malloc(sizeof(char*) * 256);
for(int i = 0; i < 256; i++) {
currentTokens[i] = (char*)malloc(sizeof(char) * 256);
}
int sz = splitStrToArray(finalStr, currentTokens);
The definition of the splitStrToArray function is here; this works fine.
int splitStrToArray(char* str, char** arr) {
int count = 0;
char* buffer;
int len = strlen(str);
for (int i = 0; i < len ; ++i) {
if(isspace(str[i])) {
count++;
}
}
int index = 0;
buffer = strtok(str, " ");
while(buffer != NULL) {
memcpy(arr[index], buffer, strlen(buffer));
index++;
buffer = strtok(NULL, " ");
}
return count;
}
However, when I compare this with user input it doesn't return zero, and thus the two strings don't match.
if(strncasecmp(currentTokens[0], "quit") == 0) {
printf("quit" );
breakTrigger = 1;
} else if(strcasecmp(currentTokens[0], "q") == 0) {
printf("q");
breakTrigger = 1;
} else {
callback(currentTokens, sz, port);
}
I've checked currentTokens[0] and the word is correct.
When I try to store the return value of strcasecmp in an int and print it, I get a segmentation fault.
I'm new to C, any help appreciated.

None of your strings are null-terminated so you have undefined behaviour throughout. Using memcpy to copy strings is almost never what you want.
You should consider using strdup, if available. Otherwise malloc and then strcpy.
In the particular case of finalStr, I see no good reason to perform the copy at all. Just read directly into it (and don't forget to null-terminate). Alternatively, use the standard C library instead of the underlying POSIX layer.
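As a rough sketch of that advice: terminate the buffer at the byte count read() reported, tokenize it in place, and only then compare. tokenize_input and is_quit_command are hypothetical helpers, not part of the asker's program, and the buffer is assumed to have room for one byte more than was read.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* tokenize_input is a hypothetical helper. `buf` must have room for
   nbytes + 1 characters; `nbytes` is the count returned by read().
   The tokens point into `buf`, so nothing is copied or allocated. */
int tokenize_input(char *buf, size_t nbytes, char **tokens, int max_tokens)
{
    assert(buf != NULL && tokens != NULL && max_tokens > 0);
    buf[nbytes] = '\0';                 /* read() does not terminate data */
    int count = 0;
    for (char *tok = strtok(buf, " \t\r\n");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, " \t\r\n"))
        tokens[count++] = tok;          /* each token is '\0'-terminated */
    return count;
}

/* is_quit_command is a hypothetical helper: case-insensitive match. */
int is_quit_command(const char *word)
{
    return strcasecmp(word, "quit") == 0 || strcasecmp(word, "q") == 0;
}
```

With a 256-byte buffer this means asking read() for at most 255 bytes, so the terminator always fits. Splitting on the newline as well keeps it from ending up inside the last word.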

Memory issue in C where data is being overwritten

I have the following program:
// required include statements...
static char ***out;
char* get_parameter(char* key)
{
char *querystr = /*getenv("QUERY_STRING")*/ "abcdefg=abcdefghi";
if (querystr == NULL)
return (void*)0;
char s[strlen(querystr)] ;
strcpy(s, querystr);
const char delim = '&';
const char delim2 = '=';
static size_t size = 0;
if (out == 0)
{
out = (char*) malloc(sizeof(char*));
size = split(s, &delim, out);
}
int i=0;
for (; i<size; i++)
{
if ((*out)[i] != NULL)
{
char ***iout = NULL;
iout = (char*) malloc(sizeof(char*));
int isize = split((*out)[i], &delim2, iout);
if (isize > 1 && ((*iout)[1]) != NULL && strcmp(key, (*iout)[0]) == 0)
{
size_t _size = strlen((*iout)[1]);
char* value = (char*) malloc(_size*sizeof(char));
strcpy(value, (*iout)[1]);
free(iout);
return value;
}
}
}
return (void*) 0;
}
static size_t count(const char *str, char ch)
{
if (str == NULL) return 0;
size_t count = 1;
while (*str)
if (*str++ == ch) count++;
return count;
}
size_t split(const char *const str, const char* delim, char ***out)
{
size_t size = count(str, *delim);
*out = calloc(size, sizeof(char));
char* token = NULL;
char* tmp = (char*) str;
int i=0;
while ((token = strtok(tmp, delim)) != NULL)
{
tmp = NULL;
(*out)[i] = (char*) malloc(sizeof strlen(token));
strcpy((*out)[i++], token);
}
return size;
}
main()
{
char* val = get_parameter("abcdefg");
printf("%s\n", val); // it should prints `abcdefghi`, but it prints `abcd?`
free(val);
}
As shown in main, get_parameter should return abcdefghi, but the program prints abcd?, where ? is a control character with value 17.
Why is the rest of the string not printed? I think I misused malloc when allocating the space.
Also, is there any tool I can use to inspect the in-memory representation of my pointers?

You're dealing with C strings here. You must allow 1 additional byte for the null terminator ('\0').
Therefore:
char s[strlen(querystr)] ;
strcpy(s, querystr);
Is incorrect.
strlen will return 4 for string "abcd" but what you want is to allocate space for "abcd\0"
So you need strlen + 1
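To show the length + 1 rule applied to the lookup itself, here is a small sketch that avoids strtok and the intermediate arrays altogether. find_param is a hypothetical stand-in for get_parameter, assuming the query string has the usual key=value&key=value shape; it only reads the input and allocates exactly value length + 1 bytes for the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* find_param is a hypothetical stand-in for the asker's get_parameter().
   Returns a malloc'd copy of the value for `key`, or NULL if absent. */
char *find_param(const char *query, const char *key)
{
    assert(query != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *p = query;
    while (*p != '\0') {
        const char *end = strchr(p, '&');       /* end of this pair */
        if (end == NULL)
            end = p + strlen(p);
        const char *eq = memchr(p, '=', (size_t)(end - p));
        if (eq != NULL && (size_t)(eq - p) == klen &&
            memcmp(p, key, klen) == 0) {
            size_t vlen = (size_t)(end - eq - 1);
            char *value = malloc(vlen + 1);     /* +1 for the '\0' */
            if (value == NULL)
                return NULL;
            memcpy(value, eq + 1, vlen);
            value[vlen] = '\0';
            return value;
        }
        p = (*end == '&') ? end + 1 : end;      /* move to the next pair */
    }
    return NULL;
}
```

The caller frees the returned string; a NULL result means either that the key is missing or that the allocation failed.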

The lines
out = (char*) malloc(sizeof(char*));
iout = (char*) malloc(sizeof(char*));
are a problem.
sizeof() returns the number of bytes required to store an object of the given type, in this case, the size of a pointer (to a char). malloc() then allocates that many bytes (apparently 4 bytes on your architecture). To fix this, you need to give malloc the desired string length instead of using sizeof.
Additionally, the line
char* value = (char*) malloc(_size*sizeof(char));
has a completely unnecessary use of sizeof(). sizeof(char) is guaranteed by the standard to be 1.

You should use gdb to step through your binary and see what goes wrong.
Valgrind is also a very good tool: it will tell you which line overwrites memory, and so on.

copy string one place to another

I want to copy the strings between the commas.
Input:
(aaa),(ddD),(sss),(ppp)
p=malloc(sizeof(char)*200);
gets(p);
I want to hold the input as:
p[0]="(aaa)"
p[1]="(ddD)"
p[2]="(sss)"
p[3]="(ppp)"

You may have to use strtok.
Here is the complete solution to all your problems:
// tokens.c
#include <stdio.h>
#include <string.h> /* for strtok, strlen and strcpy. */
#include <stdlib.h> /* for malloc, realloc and free. */
static char **tokens = NULL; /* Dynamic array of string tokens. */
static int token_count = 0; /* Number of tokens added. */
/* Grows the `tokens' array as needed and appends `tok' to it. */
static void
copy_token (char *tok)
{
if (token_count == 0)
tokens = malloc (sizeof (char*));
else
tokens = realloc (tokens, sizeof (char*) * (token_count + 1));
tokens[token_count] = malloc (strlen (tok) + 1);
strcpy (tokens[token_count], tok);
++token_count;
}
/* Extracts tokens from `s' and calls copy_token to add it to `tokens'. */
static void
tokenize_by_comma (char *s)
{
char *tok = strtok (s, ",");
while (tok != NULL)
{
copy_token (tok);
tok = strtok (NULL, ",");
}
}
/* If you run copy_after, the total length of all tokens
must not exceed BUFF_SIZE. */
#define BUFF_SIZE 1024
static char s_copy[BUFF_SIZE + 1];
/* Makes a string of all the tokens by moving `s' next to `after'. */
static char *
copy_after (const char *s, const char *after)
{
int i;
int appended = 0;
strcpy (s_copy, "");
for (i = 0; i < token_count; ++i)
{
int is_s = (strcmp (tokens[i], s) == 0);
int is_after = (strcmp (tokens[i], after) == 0);
if (is_after)
{
strcat (s_copy, after);
strcat (s_copy, ",");
strcat (s_copy, s);
appended = 1;
}
else if (!is_s)
{
strcat (s_copy, tokens[i]);
appended = 1;
}
if (i != (token_count - 1) && appended)
strcat (s_copy, ",");
appended = 0;
}
return s_copy;
}
/* Prints the `tokens'. */
static void
print_tokens ()
{
int i;
for (i = 0; i < token_count; ++i)
printf ("%s\n", tokens[i]);
}
/* Frees the memory allocated for `tokens'. */
static void
free_tokens ()
{
int i;
for (i = 0; i < token_count; ++i)
free (tokens[i]);
free (tokens);
token_count = 0;
tokens = NULL;
}
/* Test. Pass the tokens as a single command line argument. */
int
main (int argc, char **argv)
{
if (argc < 2)
{
fprintf (stderr, "usage: %s TOKENS [S AFTER]\n", argv[0]);
return 1;
}
tokenize_by_comma (argv[1]);
print_tokens ();
if (argc == 4)
{
printf ("%s\n", copy_after (argv[2], argv[3]));
}
free_tokens ();
return 0;
}
Test run:
$ ./tokens "(aaa),(ddD),(sss),(ppp)"
(aaa)
(ddD)
(sss)
(ppp)
$ ./tokens "(aaa),(ddD),(sss),(ppp)" "(ddD)" "(ppp)"
(aaa)
(ddD)
(sss)
(ppp)
(aaa),(sss),(ppp),(ddD)

Avoid using gets(3); it led to some notable security problems even in the early days of the Internet because it makes buffer overflows easy. Use fgets(3) instead.
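A minimal sketch of the fgets replacement, assuming the line should be stored without its trailing newline. read_line is a hypothetical wrapper; fgets never writes more than size bytes, which is the guarantee gets cannot give.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* read_line is a hypothetical wrapper around fgets.
   Returns 1 if a line was read, 0 on end of file or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline if one was read */
    return 1;
}
```

In the question this would replace gets(p) with read_line(stdin, p, 200). A line longer than the buffer is not lost; the remainder is returned by the next call.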

If you're sure you're always going to have four inputs, you can use something like:
scanf("%[^,],%[^,],%[^,],%[^,]", p[0], p[1], p[2], p[3]);
If you don't know the number of inputs, you'd probably do the reading in a loop instead:
for (i=0; i<limit; i++)
if (scanf("%[^,],", p[i]) != 1) /* scanf returns EOF (nonzero) at end of input */
break;
if (i<limit)
scanf("%[^\n]", p[i]);
or, if you prefer, you could write the loop like this:
for (i=0; i<limit && scanf("%[^,],", p[i]) == 1; i++)
;
Either way, this reads data that doesn't contain a comma followed by a comma (that is read to verify its presence) until that fails. Assuming the data is in the proper format, that will fail when there's data without a trailing comma. We then do one more read after the loop to read the remainder of the line into the last item.
Note that if your data can also contain a comma, something like:
(aaa,bbb),(ccc,ddd)
where the first data item should be "(aaa,bbb)" and the second "(ccc,ddd)", this would not work -- for something like that, you could rewrite the conversion for an individual input to something like: "%[^)])," to read up to the closing parenthesis, followed by a parenthesis followed by a comma.
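One caveat with the conversions above is that %[^,] has no width limit, so a long field can overflow its destination. Here is a hedged sketch of the same idea applied to a line already held in memory, using sscanf with a field width and %n to track how far the scan got. parse_fields and FIELD_MAX are hypothetical names, and the 64-byte field size is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 64   /* assumed size of one field, including the '\0' */

/* parse_fields is a hypothetical helper. It copies comma-separated
   fields from `line` into `out` and returns how many were stored.
   Parsing stops at an empty field or one longer than FIELD_MAX - 1. */
int parse_fields(const char *line, char out[][FIELD_MAX], int limit)
{
    assert(line != NULL && out != NULL);
    int count = 0;
    const char *p = line;
    while (count < limit) {
        int used = 0;
        /* the width 63 must stay equal to FIELD_MAX - 1 */
        if (sscanf(p, "%63[^,\n]%n", out[count], &used) != 1)
            break;
        count++;
        p += used;                 /* skip past the field just read */
        if (*p != ',')
            break;                 /* end of line or over-long field */
        p++;                       /* skip the comma */
    }
    return count;
}
```

For the sample input this would be called with something like char fields[8][FIELD_MAX] as the destination.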
