I have been trying to write a function that takes a line of text as a string and returns a pointer to an array of words. The function below does something similar.
How can I rewrite Code 1 so that it improves on Code 2 by letting me change the delimiter? Code 1 runs, but during memory allocation the same memory is reused for every entry in the words array, which causes word duplication.
Code 1:
char *split(const char *string) {
char *words[MAX_LENGTH / 2];
char *word = (char *)calloc(MAX_WORD, sizeof(char));
memset(word, ' ', sizeof(char));
static int index = 0;
int line_index = 0;
int word_index = 0;
while (string[line_index] != '\n') {
const char c = string[line_index];
if (c == ' ') {
word[word_index+ 1] = '\0';
memcpy(words + index, &word, sizeof(word));
index += 1;
if (word != NULL) {
free(word);
char *word = (char *)calloc(MAX_WORD, sizeof(char));
memset(word, ' ', sizeof(char));
}
++line_index;
word_index = 0;
continue;
}
if (c == '\t')
continue;
if (c == '.')
continue;
if (c == ',')
continue;
word[word_index] = c;
++word_index;
++line_index;
}
index = 0;
if (word != NULL) {
free(word);
}
return *words;
}
Code 2:
char **split(char *string) {
static char *words[MAX_LENGTH / 2];
static int index = 0;
// resetting words
for (int i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
words[i] = NULL;
}
const char *delimiter = " ";
char *ptr = strtok(string, delimiter);
while (ptr != NULL) {
words[index] = ptr;
ptr = strtok(NULL, delimiter);
++index;
}
index = 0;
return words;
}
However, I noticed that the memory at words + index is being reassigned to the same location, which causes word duplication.
strtok() always returns a different pointer into the initial string. This cannot produce duplicates, unless you call it twice with the same input string (maybe with new contents).
However, your function returns a pointer to a static array, which is overwritten on each call to split(), voiding the results of all previous calls. To prevent this,
either allocate new memory in each call (which must be freed by the caller):
char **words = calloc(MAX_LENGTH / 2, sizeof *words);
or return a struct instead (which is always copied by value):
struct wordlist { char *word[MAX_LENGTH / 2]; };
struct wordlist split(char *string)
{
struct wordlist list = {0};
/* ... */
list.word[index] = /* ... */;
/* ... */
return list;
}
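The first option (fresh memory on every call) can also take the delimiter set as a parameter, which is what the question asks for. The following is only a sketch under stated assumptions: the names split_words and free_words and the count out-parameter are hypothetical, and strspn()/strcspn() are used instead of strtok() so that the input string is left unmodified.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated array produced by split_words(). */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}

/*
 * Splits `line` on any character in `delims` (hypothetical signature).
 * Runs of delimiters are skipped, so no empty words are produced.
 * Returns a freshly allocated, NULL-terminated array, or NULL on
 * allocation failure. The input string is not modified.
 */
char **split_words(const char *line, const char *delims, size_t *count)
{
    assert(line != NULL && delims != NULL);

    size_t n = 0;
    char **words = malloc(sizeof *words); /* room for the terminator */
    if (words == NULL)
        return NULL;
    words[0] = NULL;

    const char *p = line + strspn(line, delims); /* skip leading delimiters */
    while (*p != '\0') {
        size_t len = strcspn(p, delims);         /* length of this word */

        char **tmp = realloc(words, (n + 2) * sizeof *words);
        char *copy = malloc(len + 1);
        if (tmp != NULL)
            words = tmp;
        if (tmp == NULL || copy == NULL) {
            free(copy);
            free_words(words);                   /* words[n] is still NULL */
            return NULL;
        }
        memcpy(copy, p, len);
        copy[len] = '\0';
        words[n++] = copy;
        words[n] = NULL;

        p += len;
        p += strspn(p, delims);                  /* skip to the next word */
    }
    if (count != NULL)
        *count = n;
    return words;
}
```

Each call returns its own array, so earlier results stay valid until the caller passes them to free_words(). Passing " ,.\t\n" as delims would cover the characters that Code 1 filters by hand.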
Related
I'm pretty new to C and can't figure out why this function doesn't work consistently:
char **splitString(char *string) {
char *token = strtok(string, ","), **finalValue = NULL, **temp = NULL;
size_t wordIndex = 0;
while (token != NULL) {
temp = realloc(finalValue, sizeof(char *));
if (!temp) {
freeArray(finalValue);
finalValue = NULL;
break;
}
temp[wordIndex] = malloc((strlen(token)+1)*sizeof(char));
if (temp[wordIndex] == NULL) {
freeArray(finalValue);
finalValue = NULL;
break;
}
strcpy(temp[wordIndex], token);
printf("%s\n", temp[wordIndex]);
finalValue = temp;
printf("%s\n", finalValue[wordIndex]);
wordIndex++;
token = strtok(NULL, ",");
}
return finalValue;
}
It receives a string of comma-separated values and is supposed to split it into separate strings, all of which are created via malloc/realloc.
The problem is here: temp = realloc(finalValue, sizeof(char *)); reallocates for a single pointer. You should write:
temp = realloc(finalValue, (wordIndex + 2) * sizeof(char *));
You should also set a NULL pointer at the end of the finalValue array to mark the end of this array as the number of entries is not returned by the function in any other way.
Also note that the allocated strings are not freed when realloc() or malloc() fails.
Furthermore, you should not use strtok() because it modifies the source string. An alternative approach with strspn(), strcspn() or manual testing and strndup() is recommended.
Finally, strtok() has another shortcoming which may be counterproductive: it considers any sequence of separators as a single separator and does not produce empty tokens. This is fine if you use whitespace as separator but probably incorrect for "," where you might expect "a,,c" to produce 3 tokens: "a", "" and "c".
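The difference in token counts is easy to check. The two helpers below are a hypothetical sketch (the names count_strtok_tokens and count_fields are not from the question): the first counts what strtok() produces, working on a copy because strtok() writes into its input, and the second counts fields the way a CSV reader would.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Counts the tokens strtok() yields for `s`. Works on a private copy,
 * since strtok() overwrites separators. Returns 0 if the copy cannot
 * be allocated (indistinguishable from "no tokens" in this sketch).
 */
size_t count_strtok_tokens(const char *s, const char *delims)
{
    assert(s != NULL && delims != NULL);
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, s, len + 1);

    size_t n = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    free(copy);
    return n;
}

/* Counts fields separated by `sep`, keeping empty ones: separators + 1. */
size_t count_fields(const char *s, char sep)
{
    assert(s != NULL);
    size_t n = 1;
    for (; *s != '\0'; s++)
        if (*s == sep)
            n++;
    return n;
}
```

For "a,,c" the first helper reports 2 tokens and the second reports 3 fields.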
Here is a modified version that can handle empty tokens:
char **splitString(const char *string) {
const char *p, *p0;
size_t i = 0, n = 1;
char **array;
for (p = string; *p; p++) {
if (*p == ',')
n++;
}
array = calloc(n + 1, sizeof(*array));
if (array != NULL) {
array[n] = NULL; /* set a null pointer at the end of the array */
for (p = p0 = string, i = 0; i < n;) {
if (*p == ',' || *p == '\0') {
if ((array[i++] = strndup(p0, p - p0)) == NULL) {
/* allocation failure: free allocated strings and array */
while (i --> 0)
free(array[i]);
free(array);
array = NULL;
break;
}
if (*p == ',')
p0 = ++p;
else
p0 = p;
} else {
p++;
}
}
}
return array;
}
strndup() is a POSIX function available on many systems, and it is part of the C23 Standard. If it is not available on your target, here is a simple implementation:
char *strndup(const char *s, size_t n) {
char *p;
size_t i;
for (i = 0; i < n && s[i]; i++)
continue;
p = malloc(i + 1);
if (p) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
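The question calls freeArray() without showing it. With the NULL terminator in place, the caller can both count and release the entries without a separate length. A minimal sketch, assuming the array was built as above (the helper names are hypothetical):

```c
#include <stdlib.h>

/* Number of entries before the terminating NULL pointer. */
size_t string_array_length(char *const *array)
{
    size_t n = 0;
    if (array != NULL)
        while (array[n] != NULL)
            n++;
    return n;
}

/* Frees every string, then the array itself. Accepts NULL, like free(). */
void free_string_array(char **array)
{
    if (array == NULL)
        return;
    for (size_t i = 0; array[i] != NULL; i++)
        free(array[i]);
    free(array);
}
```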
I have been searching through Stack Overflow for a little over an hour and I don't understand why this function is giving me a segmentation fault. I want to create a string array, scan strings in through scanf, dynamically size each string and return the string array. Can anyone help? Thank you.
char** readScores(int* count) {
int c = 0;
char** arr =(char**)malloc(100 * sizeof(char*));
char* in;
while(scanf("%s", in) != EOF) {
arr[c] = (char*)malloc(strlen(in)+1);
strcpy(arr[c], in);
}
*count = c;
return arr;
}
char* in;
while(scanf("%s", in) != EOF) {
This tells the computer to read from standard input into the char buffer that in points to.
Which does not exist, because in is not initialised to anything (let alone a valid buffer).
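If you want to keep the %s conversion, give it a real buffer and a field width. The sketch below is hypothetical (the name next_word and the 64-byte buffer are assumptions) and uses sscanf() on a string so it can be tried without stdin; the same "%63s" format works with scanf().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Reads the next whitespace-delimited word starting at *cursor into a
 * real 64-byte buffer, advances *cursor past it, and returns a heap
 * copy sized to fit. Returns NULL when no word is left or when the
 * allocation fails. The width 63 leaves room for the terminator, so
 * longer words are split across calls instead of overflowing.
 */
char *next_word(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    char in[64];
    int used = 0;

    if (sscanf(*cursor, "%63s%n", in, &used) != 1)
        return NULL;
    *cursor += used;

    size_t len = strlen(in);
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, in, len + 1); /* includes the terminator */
    return copy;
}
```

Comparing the return value with 1 rather than EOF also stops the loop on a matching failure, not only at end of input.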
I would not use scanf, only fgets.
You need to allocate memory for arr and for every line referenced by the elements of arr.
char** readScores(size_t *count) {
size_t lines = 0;
char** arr = NULL, **tmp;
char* in = malloc(MAXLINE), *result;
size_t len;
if(in)
{
do{
result = fgets(in, MAXLINE, stdin);
if(result)
{
len = strlen(in);
tmp = realloc(arr, sizeof(*tmp) * (lines + 1));
if(tmp)
{
arr = tmp;
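len = strlen(in);
if(len && in[len - 1] == '\n') in[--len] = 0; // drop the trailing newline, if any
arr[lines] = malloc(len + 1); // room for the terminator
if(arr[lines])
{
memcpy(arr[lines], in, len + 1); // copies the terminator too
lines++;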
}
else
{
// error handling
}
}
else
{
// error handling
}
}
}while(result);
free(in);
}
*count = lines;
return arr;
}
I have a variable-length string that I am trying to split at the plus signs so I can work on each part:
char string[] = "var1+vari2+varia3";
for (int i = 0; i != sizeof(string); i++) {
memcpy(buf, string[0], 4);
buf[9] = '\0';
}
Since the variables differ in size, I am trying to write a loop that walks the string and extracts each variable. Any suggestions? I am expecting a result such as:
var1
vari2
varia3
You can use strtok() to break the string by delimiter
char string[]="var1+vari2+varia3";
const char delim[] = "+";
char *token;
/* get the first token */
token = strtok(string, delim);
/* walk through other tokens */
while( token != NULL ) {
printf( " %s\n", token );
token = strtok(NULL, delim);
}
More info about the strtok() here: https://man7.org/linux/man-pages/man3/strtok.3.html
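Keep in mind that strtok() overwrites each delimiter with '\0' and returns pointers into the original buffer, which is why string must be a writable array and not a string literal. A small sketch that makes this visible (the function name tokenize_in_place and the fixed-size pointer array are assumptions for illustration):

```c
#include <assert.h>
#include <string.h>

/*
 * Tokenizes `s` in place and stores up to `max_tokens` pointers in
 * `tokens`. Returns the number stored. The pointers refer to bytes
 * inside `s`, so they are valid only as long as `s` is, and `s` must
 * be writable.
 */
size_t tokenize_in_place(char *s, const char *delim, char **tokens, size_t max_tokens)
{
    assert(s != NULL && delim != NULL && tokens != NULL);
    size_t n = 0;
    for (char *tok = strtok(s, delim); tok != NULL && n < max_tokens;
         tok = strtok(NULL, delim))
        tokens[n++] = tok;
    return n;
}
```

After tokenizing "var1+vari2+varia3", strlen() of the original array is 4, because the first '+' has been replaced by a terminator. Copy the tokens if they need to outlive the buffer.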
It seems to me that you don't just want to print the individual strings but want to save them in some buffer.
Since you can't know the number of strings or the length of each individual string in advance, you should allocate memory dynamically, i.e. use functions like realloc, calloc and malloc.
It can be implemented in several ways. Below is one example. To keep the example simple, it is not optimized for performance in any way.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
char** split_string(const char* string, const char* token, int* num)
{
assert(string != NULL);
assert(token != NULL);
assert(num != NULL);
assert(strlen(token) != 0);
char** data = NULL;
int num_strings = 0;
while(*string)
{
// Allocate memory for one more string pointer
char** ptemp = realloc(data, (num_strings + 1) * sizeof *data);
if (ptemp == NULL) exit(1);
data = ptemp;
// Look for token
char* tmp = strstr(string, token);
if (tmp == NULL)
{
// Last string
// Allocate memory for one more string and copy it
int len = strlen(string);
data[num_strings] = calloc(len + 1, 1);
if (data[num_strings] == NULL) exit(1);
memcpy(data[num_strings], string, len);
++num_strings;
break;
}
// Allocate memory for one more string and copy it
int len = tmp - string;
data[num_strings] = calloc(len + 1, 1);
if (data[num_strings] == NULL) exit(1);
memcpy(data[num_strings], string, len);
// Prepare to search for next string
++num_strings;
string = tmp + strlen(token);
}
*num = num_strings;
return data;
}
int main()
{
char string[]="var1+vari2+varia3";
// Split the string into dynamic allocated memory
int num_strings;
char** data = split_string(string, "+", &num_strings);
// Now data can be used as an array-of-strings
// Example: Print the strings
printf("Found %d strings:\n", num_strings);
for(int i = 0; i < num_strings; ++i) printf("%s\n", data[i]);
// Free the memory
for(int i = 0; i < num_strings; ++i) free(data[i]);
free(data);
}
Output
Found 3 strings:
var1
vari2
varia3
You can use a simple loop scanning the string for + signs:
char string[] = "var1+vari2+varia3";
char buf[sizeof(string)];
int start = 0;
for (int i = 0;;) {
if (string[i] == '+' || string[i] == '\0') {
memcpy(buf, string + start, i - start);
buf[i - start] = '\0';
// buf contains the substring, use it as a C string
printf("%s\n", buf);
if (string[i] == '\0')
break;
start = ++i;
} else {
i++;
}
}
Your code does not make sense.
I wrote a function like this for you. Study it, since it is sometimes good to have some code as a base.
char *substr(const char *str, char *buff, const size_t start, const size_t len)
{
size_t srcLen;
char *result = buff;
if(str && buff)
{
if(*str)
{
srcLen = strlen(str);
if(srcLen < start + len)
{
if(start < srcLen) strcpy(buff, str + start);
else buff[0] = 0;
}
else
{
memcpy(buff, str + start, len);
buff[len] = 0;
}
}
else
{
buff[0] = 0;
}
}
return result;
}
https://godbolt.org/z/GjMEqx
I want to split a string by a delimiter and keep the delimiter in the token list.
I have a function that does the same thing as strtok but with a string delimiter (instead of a set of chars). However, it doesn't keep the delimiter and can't take an array of delimiters as an argument.
This is the function; it splits a string into tokens as strtok does, but takes a string delimiter:
static char *strtokstr(char *str, char *delimiter)
{
static char *string;
char *end;
char *ret;
if (str != NULL)
string = str;
if (string == NULL)
return string;
end = strstr(string, delimiter);
if (end == NULL) {
char *ret = string;
string = NULL;
return ret;
}
ret = string;
*end = '\0';
string = end + strlen(delimiter);
return ret;
}
I want to have a char **split(char *str, char **delimiters_list) that splits a string by a set of delimiters and keeps the delimiters in the token list.
I think I also need a function to count the number of tokens so I can malloc the return value of my split function.
// delimiters is an array containing ["&&", "||" and NULL]
split("ls > file&&foo || bar", delimiters) should return an array containing ["ls > file", "&&", "foo ", "||", " bar"]
How can that be achieved?
First, you have a memory error here :
static char *string;
if (str != NULL)
string = str;
if (string == NULL)
return string;
If str is NULL on the first call, string has not been assigned yet, so the comparison uses its initial value (a static pointer starts out as NULL).
If you want to copy a string, you must use the strdup function; = only copies the pointer, not the contents it points to.
And here is a way to do it:
#include <stdlib.h>
#include <string.h>
char *get_delimiters(char *str, char **delims)
{
for (int i = 0; delims[i]; i++)
if (!strncmp(str, delims[i], strlen(delims[i])))
return delims[i];
return NULL;
}
char **split(char *str, char **delimiters)
{
char *string = strdup(str);
char **result = NULL;
int n = 0;
char *delim = NULL;
for (int i = 0; string[i]; i++)
if (get_delimiters(string + i, delimiters))
n++;
result = malloc((n * 2 + 2) * sizeof(char *));
if (!result)
return NULL;
result[0] = string;
n = 1;
for (int i = 0; string[i]; i++) {
delim = get_delimiters(string + i, delimiters);
if (delim) {
string[i] = '\0';
result[n++] = delim;
result[n++] = string + i + strlen(delim);
}
}
result[n] = NULL;
return result;
}
Result:
[0] 'ls > file'
[1] '&&'
[2] 'foo '
[3] '||'
[4] ' bar'
Remember that result and string are both malloced, so you must free result[0] and then result.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
char **split(char *str, char **delimiters, int number_of_delimiters, int *number_of_rows_in_return_array);
int main()
{
char **split_str;
char *delimiters[] = {
"&&",
"||"
};
int rows_in_returned_array;
split_str = split("ls > file&&foo || bar && abc ||pqwe", delimiters, 2 , &rows_in_returned_array);
int i;
for (i = 0 ; i < rows_in_returned_array ; ++i)
{
printf("\n%s\n", split_str[i]);
}
return 0;
}
char **split(char *str, char **delimiters, int number_of_delimiters, int *number_of_rows_in_return_array)
{
//temporary storage for array to be returned
char temp_store[100][200];
int row = 0;//row size of array that will be returned
char **split_str;
int i, j, k, l, mark = 0;
char temp[100];
for (i = 0 ; str[i] != '\0' ; ++i)
{
//Iterating through all delimiters to check if any is str
for (j = 0 ; j < number_of_delimiters ; ++j )
{
l = i;
for (k = 0 ; delimiters[j][k] != '\0' ; ++k)
{
if (str[l] != delimiters[j][k]) //compare at l, which advances with k
{
break;
}
++l;
}
//This means delimiter is in string
if (delimiters[j][k] == '\0')
{
//store the string before delimiter
strcpy(temp_store[row], &str[mark]);
temp_store[row ++][i - mark] = '\0';
//store string after delimiter
strcpy(temp_store[row], &str[i]);
temp_store[row ++][k] = '\0';
//mark index where this delimiter ended
mark = l;
//Set i to where delimiter ends and break so that outermost loop
//can iterate from where delimiter ends
i = l - 1;
break;
}
}
}
//store the string remaining
strcpy(temp_store[row++], &str[mark]);
//Allocate the split_str and store temp_store into it
split_str = (char **)malloc(row * sizeof(char *));
for (i=0 ; i < row; i++)
{
split_str[i] = (char *)malloc(200 * sizeof(char));
strcpy(split_str[i], temp_store[i]);
}
*number_of_rows_in_return_array = row;
return split_str;
}
This should probably work. Note that number_of_rows_in_return_array is passed by pointer because the caller needs to know how many rows the returned array has.
I went into abstraction. First I created a "sentence" library that allows manipulating a NULL-terminated list of strings (char*). I wrote some initial accessors (sentence_init, sentence_size, sentence_free, sentence_add_str etc.).
Then I went on to split, which becomes really easy: if a delimiter is found, add the string up to the delimiter to the sentence, then add the delimiter itself to the sentence, then advance the string pointer. If no delimiter is found, add the remaining string to the sentence.
There is a real problem with double pointers though, because char ** is not implicitly convertible to const char * const *. For production code, I would probably aim to refactor the code and take const-correctness into account.
#define _GNU_SOURCE 1
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#include <stdbool.h>
/*
* sentence - list of words
*/
/* ----------------------------------------------------------- */
// if this would be production code, I think I would go with a
// struct word_t { char *word; }; struct sentence_t { struct word_t *words; };
// Note: when sentence_add_* fail - they free *EVERYTHING*, so it doesn't work like realloc
// shared_ptr? Never heard of it.
char **sentence_init(void) {
return NULL;
}
size_t sentence_size(char * const *t) {
if (t == NULL) return 0;
size_t i;
for (i = 0; t[i] != NULL; ++i) {
continue;
}
return i;
}
void sentence_free(char * const *t) {
if (t == NULL) return;
for (char * const *i = t; *i != NULL; ++i) {
free(*i);
}
free((void*)t);
}
void sentence_printex(char * const *t, const char *fmt1, const char *delim, const char *end) {
for (char * const *i = t; *i != NULL; ++i) {
printf(fmt1, *i);
if (*(i + 1) != NULL) {
fputs(delim, stdout); // not printf(delim): delim is data, not a format string
}
}
fputs(end, stdout);
}
void sentence_print(char * const *t) {
sentence_printex(t, "%s", " ", "\n");
}
void sentence_print_quote_words(char * const *t) {
sentence_printex(t, "'%s'", " ", "\n");
}
bool sentence_cmp_const(const char * const *t, const char * const *other) {
const char * const *t_i = t;
const char * const *o_i = other;
while (*t_i != NULL && *o_i != NULL) {
if (strcmp(*t_i, *o_i) != 0) {
return false;
}
++t_i;
++o_i;
}
return *t_i == NULL && *o_i == NULL;
}
// thet's always funny, because "dupa" in my language means "as*"
char **sentence_add_strdupped(char **t, char *strdupped) {
const size_t n = sentence_size(t);
const size_t add = 1 + 1;
const size_t new_n = n + add;
void * const pnt = realloc(t, new_n * sizeof(char*));
if (pnt == NULL) goto REALLOC_FAIL;
// we have to have place for terminating NULL pointer
assert(new_n >= 2);
t = pnt;
t[new_n - 2] = strdupped;
t[new_n - 1] = NULL;
// ownership of str goes to t
return t;
// ownership of str stays in the caller
REALLOC_FAIL:
sentence_free(t);
return NULL;
}
char **sentence_add_strlened(char **t, const char *str, size_t len) {
char *strdupped = malloc(len + 1);
if (strdupped == NULL) goto MALLOC_FAIL;
memcpy(strdupped, str, len);
strdupped[len] = '\0';
t = sentence_add_strdupped(t, strdupped);
if (t == NULL) goto SENTENCE_ADD_STRDUPPED_FAIL;
return t;
SENTENCE_ADD_STRDUPPED_FAIL:
free(strdupped);
MALLOC_FAIL:
sentence_free(t);
return NULL;
}
char **sentence_add_str(char **t, const char *str) {
const size_t str_len = strlen(str);
return sentence_add_strlened(t, str, str_len);
}
/* ----------------------------------------------------------- */
/**
* Puff. Run strstr for each of the elements inside the NULL-terminated delimiter list dellist.
* If any returns not NULL, return the pointer as returned by strstr
* And fill dellist_found with the pointer inside dellist (can be NULL).
* Finally! A 3 star award is mine!
*/
char *str_find_any_strings(const char *str,
const char * const *dellist,
const char * const * *dellist_found) {
assert(str != NULL);
assert(dellist != NULL);
for (const char * const *i = &dellist[0]; *i != NULL; ++i) {
const char *found = strstr(str, *i);
if (found != NULL) {
if (dellist_found != NULL) {
*dellist_found = i;
}
// __UNCONST(found)
return (char*)found;
}
}
return NULL;
}
/**
* Split the string str according to the list of delimiters dellist
* @param str
* @param dellist
* @return a NULL-terminated list of strings (a sentence), or NULL on allocation failure
*/
char **split(const char *str, const char * const *dellist) {
assert(str != NULL);
assert(dellist != NULL);
char **sen = sentence_init();
while (*str != '\0') {
const char * const *del_pnt = NULL;
const char *found = str_find_any_strings(str, dellist, &del_pnt);
if (found == NULL) {
// we don't want an empty string to be the last...
if (*str != '\0') {
sen = sentence_add_str(sen, str);
if (sen == NULL) return NULL;
}
break;
}
// Puff, so a delimiter is found at &str[found - str]
const size_t idx = found - str;
sen = sentence_add_strlened(sen, str, idx);
if (sen == NULL) return NULL;
assert(del_pnt != NULL);
const char *del = *del_pnt;
assert(del != NULL);
assert(*del != '\0');
const size_t del_len = strlen(del);
sen = sentence_add_strlened(sen, del, del_len);
if (sen == NULL) return NULL;
str += idx + del_len;
}
return sen;
}
int main()
{
char **sentence = split("ls > file&&foo || bar", (const char*[]){"&&", "||", NULL});
assert(sentence != NULL);
sentence_print_quote_words(sentence);
printf("cmp = %d\n", sentence_cmp_const((void*)sentence, (const char*[]){"ls > file", "&&", "foo ", "||", " bar", NULL}));
sentence_free(sentence);
return 0;
}
The program will output:
'ls > file' '&&' 'foo ' '||' ' bar'
cmp = 1
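One thing to watch in str_find_any_strings: it returns the first match of the first delimiter in list order that occurs anywhere in the string, which is not necessarily the leftmost delimiter. The test input happens to contain "&&" before "||", so the order matches. A sketch of a leftmost search that could be swapped in (the name find_leftmost_delim and the tie-breaking rule are my assumptions, not part of the code above):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer to the leftmost occurrence of any delimiter in the
 * NULL-terminated list `dellist`, or NULL if none occurs. On success,
 * `*which` (if not NULL) is set to the delimiter that matched. Ties at
 * the same position go to the longer delimiter.
 */
const char *find_leftmost_delim(const char *str,
                                const char *const *dellist,
                                const char **which)
{
    assert(str != NULL && dellist != NULL);
    const char *best = NULL;
    const char *best_del = NULL;

    for (const char *const *d = dellist; *d != NULL; ++d) {
        if (**d == '\0')
            continue;                 /* an empty delimiter would match everywhere */
        const char *found = strstr(str, *d);
        if (found == NULL)
            continue;
        if (best == NULL || found < best ||
            (found == best && strlen(*d) > strlen(best_del))) {
            best = found;
            best_del = *d;
        }
    }
    if (best != NULL && which != NULL)
        *which = best_del;
    return best;
}
```

With the delimiters {"&&", "||", NULL} and the input "a || b && c", this reports "||" at offset 2, whereas a list-order search would report "&&" first.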
Below is my code for the following issue. I'm trying to take a string of first names and string of last names that are separated by commas and transform them into a list of full names. For example, if firstnames = "John,Jane" and lastnames = "Smith,Doe", then the output should be ["John Smith", "Jane Doe"].
I believe my issue is arising in my use of strtok since first_names[i] = name is giving me an error. Any help on this would be much appreciated!
char **combine_names(char *firstnames, char *lastnames) {
char first_names[50][50];
char last_names[50][50];
int i = 0;
char *name = strtok(firstnames, ",");
while (name != NULL) {
first_names[i] = name;
i++;
name = strtok(NULL, ",");
}
i = 0;
name = strtok(lastnames, ",");
while (name != NULL) {
last_names[i] = name;
i++;
name = strtok(NULL, ",");
}
char **names;
names = malloc(strlen(first_names) * sizeof(char*));
for (int i = 0; i < strlen(first_names); i++) {
names[i] = malloc(51 * sizeof(char));
}
int i = 0;
int j = 0;
int k = 0;
while (first_names[i] != '\0') {
while (first_names[i][j] != '\0') {
names[i][j] = first_names[i][j];
j++;
}
names[i][j] = ' ';
j++;
while (second_names[i][k] != '\0') {
names[i][j] = second_names[i][k];
j++;
k++;
}
names[i][j] = '\0';
i++;
}
names[i] = '\0';
return names;
}
The following line is causing an incompatible pointer error with the first argument. Why is that?
names = malloc(strlen(first_names) * sizeof(char*));
Using strtok() does indeed pose some problems, but the main issue is that you are allocating names with an invalid expression: malloc(strlen(first_names) * sizeof(char*));. first_names is not a C string, and strlen(first_names) does not compute the number of entries in the first_names array.
Here is a simpler and safer approach:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **combine_names(const char *firstnames, const char *lastnames) {
int n = 0;
char **names = malloc(sizeof(*names) * (n + 1));
char *p;
if (names == NULL) {
perror("cannot allocate memory");
return NULL;
}
while (*firstnames && *lastnames) {
int len1 = strcspn(firstnames, ",");
int len2 = strcspn(lastnames, ",");
int size = len1 + 1 + len2 + 1;
p = malloc(size);
if (p == NULL) {
perror("cannot allocate memory");
while (n > 0) free(names[--n]);
free(names);
return NULL;
}
snprintf(p, size, "%.*s%s%.*s",
len1, firstnames,
len1 && len2 ? " " : "",
len2, lastnames);
char **tmp = realloc(names, sizeof(*names) * (n + 2));
if (tmp == NULL) {
perror("cannot allocate memory");
free(p);
while (n > 0) free(names[--n]);
free(names);
return NULL;
}
names = tmp;
names[n++] = p;
firstnames += len1 + (firstnames[len1] == ',');
lastnames += len2 + (lastnames[len2] == ',');
}
names[n] = NULL;
return names;
}
Remember that first_names is a two-dimensional array of characters. That means that first_names[i] is itself an array of characters holding a string. You can't assign directly to an array; instead you have to write into it character by character. The easiest way to do this is a string copy, strcpy(first_names[i], name), but strcpy doesn't protect against buffer overflows. One alternative is strncpy; just be careful, because it does not null-terminate the destination when the source string is at least as long as the size you pass. To fix this, use
strncpy(first_names[i], name, 50);
first_names[i][49] = '\0';
Considering the disadvantages of strncpy, it's probably best to use a solution similar to @chqrlie's.
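If you do want to keep the fixed-size arrays, snprintf is another way to get a bounded copy that is always terminated. This is only a sketch; copy_name is a hypothetical helper, not something from the question or the other answer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copies `src` into `dst`, truncating if needed, and always terminates
 * `dst` when `dstsize` is non-zero. Returns 1 if the whole string fit,
 * 0 if it was truncated or an error occurred.
 */
int copy_name(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);
    int needed = snprintf(dst, dstsize, "%s", src);
    return needed >= 0 && (size_t)needed < dstsize;
}
```

In the question's code this would be called as copy_name(first_names[i], sizeof first_names[i], name), and the return value tells you whether a name was cut short.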