Combining Strings into a List in C

Below is my code for the following issue. I'm trying to take a string of first names and string of last names that are separated by commas and transform them into a list of full names. For example, if firstnames = "John,Jane" and lastnames = "Smith,Doe", then the output should be ["John Smith", "Jane Doe"].
I believe my issue is arising in my use of strtok since first_names[i] = name is giving me an error. Any help on this would be much appreciated!
char **combine_names(char *firstnames, char *lastnames) {
char first_names[50][50];
char last_names[50][50];
int i = 0;
char *name = strtok(firstnames, ",");
while (name != NULL) {
first_names[i] = name;
i++;
name = strtok(NULL, ",");
}
i = 0;
name = strtok(lastnames, ",");
while (name != NULL) {
last_names[i] = name;
i++;
name = strtok(NULL, ",");
}
char **names;
names = malloc(strlen(first_names) * sizeof(char*));
for (int i = 0; i < strlen(first_names); i++) {
names[i] = malloc(51 * sizeof(char));
}
int i = 0;
int j = 0;
int k = 0;
while (first_names[i] != '\0') {
while (first_names[i][j] != '\0') {
names[i][j] = first_names[i][j];
j++;
}
names[i][j] = ' ';
j++;
while (second_names[i][k] != '\0') {
names[i][j] = second_names[i][k];
j++;
k++;
}
names[i][j] = '\0';
i++;
}
names[i] = '\0';
return names;
}
The following line is causing an incompatible pointer error with the first argument. Why is that?
names = malloc(strlen(first_names) * sizeof(char*));

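Using strtok() does indeed pose some problems, but the main issue is that you allocate names with an invalid expression: malloc(strlen(first_names) * sizeof(char*)). first_names is not a C string but a 2D array of char. When it is passed to strlen(), it decays to a char (*)[50], which is why the compiler reports an incompatible pointer type. Also, strlen(first_names) does not compute the number of entries in the first_names array.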
Here is a simpler and safer approach:
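#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Combine "John,Jane" and "Smith,Doe" into a NULL-terminated array
   {"John Smith", "Jane Doe", NULL}. Returns NULL on allocation failure. */
char **combine_names(const char *firstnames, const char *lastnames) {
    size_t n = 0;
    char **names = malloc(sizeof(*names) * (n + 1));
    if (names == NULL) {
        perror("cannot allocate memory");
        return NULL;
    }
    while (*firstnames && *lastnames) {
        /* lengths of the current first and last name, up to the next ',' */
        int len1 = (int)strcspn(firstnames, ",");
        int len2 = (int)strcspn(lastnames, ",");
        /* first name, optional space, last name, null terminator */
        size_t size = len1 + 1 + len2 + 1;
        char *p = malloc(size);
        if (p == NULL)
            goto fail;
        snprintf(p, size, "%.*s%s%.*s",
                 len1, firstnames,
                 len1 && len2 ? " " : "",
                 len2, lastnames);
        /* keep the old array if realloc() fails so it can still be freed */
        char **tmp = realloc(names, sizeof(*names) * (n + 2));
        if (tmp == NULL) {
            free(p);
            goto fail;
        }
        names = tmp;
        names[n++] = p;
        /* skip the name and its trailing ',' if any */
        firstnames += len1 + (firstnames[len1] == ',');
        lastnames += len2 + (lastnames[len2] == ',');
    }
    names[n] = NULL;
    return names;

fail:
    perror("cannot allocate memory");
    while (n > 0)
        free(names[--n]);
    free(names);
    return NULL;
}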

Remember that first_names is a two-dimensional array of characters. That means first_names[i] is itself an array of 50 characters, not a pointer. You can't assign to an array in C; instead, you have to copy the characters into it. The easiest way to do this is strcpy(first_names[i], name), but strcpy doesn't protect against buffer overflows. One alternative is strncpy, but be careful: it does not null-terminate the destination when the source string is as long as or longer than the size limit. To fix this, use
strncpy(first_names[i], name, 50);
first_names[i][49] = '\0';
Considering the disadvantages of strncpy, it's probably best to use a solution similar to chqrlie's.

Related

splitting a long dynamic string into an array of strings in c

I'm pretty new to C and can't figure out why this function doesn't work consistently:
char **splitString(char *string) {
char *token = strtok(string, ","), **finalValue = NULL, **temp = NULL;
size_t wordIndex = 0;
while (token != NULL) {
temp = realloc(finalValue, sizeof(char *));
if (!temp) {
freeArray(finalValue);
finalValue = NULL;
break;
}
temp[wordIndex] = malloc((strlen(token)+1)*sizeof(char));
if (temp[wordIndex] == NULL) {
freeArray(finalValue);
finalValue = NULL;
break;
}
strcpy(temp[wordIndex], token);
printf("%s\n", temp[wordIndex]);
finalValue = temp;
printf("%s\n", finalValue[wordIndex]);
wordIndex++;
token = strtok(NULL, ",");
}
return finalValue;
}
It receives a comma-separated string and is supposed to split it into separate strings, all of which are created via malloc/realloc.
The problem is here: temp = realloc(finalValue, sizeof(char *)); only reserves space for a single pointer. You should write:
temp = realloc(finalValue, (wordIndex + 2) * sizeof(char *));
You should also set a NULL pointer at the end of the finalValue array to mark the end of this array as the number of entries is not returned by the function in any other way.
Also note that the allocated strings are not freed when realloc() or malloc() fails.
Furthermore, you should not use strtok() because it modifies the source string. An alternative approach with strspn(), strcspn() or manual testing and strndup() is recommended.
Finally, strtok() has another shortcoming which may be counterproductive: it considers any sequence of separators as a single separator and does not produce empty tokens. This is fine if you use whitespace as separator but probably incorrect for "," where you might expect "a,,c" to produce 3 tokens: "a", "" and "c".
Here is a modified version that can handle empty tokens:
#include <stdlib.h>
#include <string.h>

/* Split string on ',' into a NULL-terminated array of newly allocated
   strings, keeping empty tokens: "a,,c" gives {"a", "", "c", NULL}. */
char **splitString(const char *string) {
    const char *p, *p0;
    size_t i = 0, n = 1;
    char **array;
    /* count the tokens: one more than the number of commas */
    for (p = string; *p; p++) {
        if (*p == ',')
            n++;
    }
    array = calloc(n + 1, sizeof(*array));
    if (array != NULL) {
        array[n] = NULL; /* set a null pointer at the end of the array */
        for (p = p0 = string, i = 0; i < n;) {
            if (*p == ',' || *p == '\0') {
                if ((array[i++] = strndup(p0, p - p0)) == NULL) {
                    /* allocation failure: free allocated strings and array */
                    while (i-- > 0)
                        free(array[i]);
                    free(array);
                    array = NULL;
                    break;
                }
                if (*p == ',')
                    p0 = ++p;
                else
                    p0 = p;
            } else {
                p++;
            }
        }
    }
    return array;
}
strndup() is a POSIX function that is available on many systems and was added to the C Standard in C23. If it is not available on your target, here is a simple implementation:
#include <stdlib.h>
#include <string.h>

/* Copy at most n characters of s into a new null-terminated string. */
char *strndup(const char *s, size_t n) {
    char *p;
    size_t i;
    /* find the length, stopping at n or at the end of s */
    for (i = 0; i < n && s[i]; i++)
        continue;
    p = malloc(i + 1);
    if (p) {
        memcpy(p, s, i);
        p[i] = '\0';
    }
    return p;
}

Copying specific number of characters from a string to another

I have a variable-length string that I am trying to split on plus signs and process:
char string[] = "var1+vari2+varia3";
for (int i = 0; i != sizeof(string); i++) {
memcpy(buf, string[0], 4);
buf[9] = '\0';
}
Since the variables differ in size, I am trying to write a loop that goes through the string and extracts (splits out) each variable. Any suggestions? I expect a result such as:
var1
vari2
varia3
You can use strtok() to break the string at each delimiter:
#include <stdio.h>
#include <string.h>

int main(void) {
    /* must be an array, not a pointer to a literal: strtok() writes into it */
    char string[] = "var1+vari2+varia3";
    const char delim[] = "+";
    /* get the first token */
    char *token = strtok(string, delim);
    /* walk through the other tokens */
    while (token != NULL) {
        printf(" %s\n", token);
        token = strtok(NULL, delim);
    }
    return 0;
}
More info about strtok() here: https://man7.org/linux/man-pages/man3/strtok.3.html
It seems to me that you don't just want to print the individual strings but want to save them in some buffer.
Since you can't know the number of strings nor the length of each string in advance, you should allocate memory dynamically, i.e. use functions like realloc, calloc and malloc.
It can be implemented in several ways. Below is one example. To keep the example simple, it's not optimized for performance in any way.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
char** split_string(const char* string, const char* token, int* num)
{
assert(string != NULL);
assert(token != NULL);
assert(num != NULL);
assert(strlen(token) != 0);
char** data = NULL;
int num_strings = 0;
while(*string)
{
// Allocate memory for one more string pointer
char** ptemp = realloc(data, (num_strings + 1) * sizeof *data);
if (ptemp == NULL) exit(1);
data = ptemp;
// Look for token
char* tmp = strstr(string, token);
if (tmp == NULL)
{
// Last string
// Allocate memory for one more string and copy it
int len = strlen(string);
data[num_strings] = calloc(len + 1, 1);
if (data[num_strings] == NULL) exit(1);
memcpy(data[num_strings], string, len);
++num_strings;
break;
}
// Allocate memory for one more string and copy it
int len = tmp - string;
data[num_strings] = calloc(len + 1, 1);
if (data[num_strings] == NULL) exit(1);
memcpy(data[num_strings], string, len);
// Prepare to search for next string
++num_strings;
string = tmp + strlen(token);
}
*num = num_strings;
return data;
}
int main()
{
char string[]="var1+vari2+varia3";
// Split the string into dynamic allocated memory
int num_strings;
char** data = split_string(string, "+", &num_strings);
// Now data can be used as an array-of-strings
// Example: Print the strings
printf("Found %d strings:\n", num_strings);
for(int i = 0; i < num_strings; ++i) printf("%s\n", data[i]);
// Free the memory
for(int i = 0; i < num_strings; ++i) free(data[i]);
free(data);
}
Output
Found 3 strings:
var1
vari2
varia3
You can use a simple loop scanning the string for + signs:
char string[] = "var1+vari2+varia3";
char buf[sizeof(string)];
int start = 0;
for (int i = 0;;) {
if (string[i] == '+' || string[i] == '\0') {
memcpy(buf, string + start, i - start);
buf[i - start] = '\0';
// buf contains the substring, use it as a C string
printf("%s\n", buf);
if (string[i] == '\0')
break;
start = ++i;
} else {
i++;
}
}
Your code does not make sense.
I wrote such a function for you. Analyse it; sometimes it is good to have some code as a base.
char *substr(const char *str, char *buff, const size_t start, const size_t len)
{
size_t srcLen;
char *result = buff;
if(str && buff)
{
if(*str)
{
srcLen = strlen(str);
if(srcLen < start + len)
{
if(start < srcLen) strcpy(buff, str + start);
else buff[0] = 0;
}
else
{
memcpy(buff, str + start, len);
buff[len] = 0;
}
}
else
{
buff[0] = 0;
}
}
return result;
}
https://godbolt.org/z/GjMEqx

Free, invalid pointer

I have a program that splits strings based on a delimiter. I also have 2 other functions: one that prints the returned array and another that frees the array.
My program prints the array and returns an error when the free array method is called. Below is the full code.
#include "stringsplit.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
/* Split string by another string, return split parts + NULL in array.
*
* Parameters:
* str: the string to split
* split: the string to split str with
*
* Returns:
* A dynamically reserved array of dynamically reserved string parts.
*
* For example called with "Test string split" and " ",
* returns ["Test", "string", "split", NULL].
* Or called with "Another - test" and " - ",
* returns ["Another", "test", NULL].
*/
unsigned long int getNofTokens(const char *string) {
char *stringCopy;
unsigned long int stringLength;
unsigned long int count = 0;
stringLength = (unsigned)strlen(string);
stringCopy = malloc((stringLength + 1) * sizeof(char));
strcpy(stringCopy, string);
if (strtok(stringCopy, " \t") != NULL) {
count++;
while (strtok(NULL, " \t") != NULL)
count++;
}
free(stringCopy);
return count;
}
char **split_string(const char *str, const char *split) {
unsigned long int count = getNofTokens(str);
char **result;
result = malloc(sizeof(char *) * count + 1);
char *tmp = malloc(sizeof(char) * strlen(str));
strcpy(tmp, str);
char *token = strtok(tmp, split);
int idx = 0;
while (token != NULL) {
result[idx++] = token;
token = strtok(NULL, split);
}
return result;
}
void print_split_string(char **split_string) {
for (int i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (int i = 0; split_string[i] != NULL; i++) {
char *currentPointer = split_string[i];
free(currentPointer);
}
free(split_string);
}
Also, do I need to explicitly add \0 at the end of the array or does strtok add it automatically?
There are some problems in your code:
[Major] the function getNofTokens() does not take the separator string as an argument, it counts the number of words separated by blanks, potentially returning an inconsistent count to its caller.
[Major] the size allocated in result = malloc(sizeof(char *) * count + 1); is incorrect: it should be:
result = malloc(sizeof(char *) * (count + 1));
Storing the trailing NULL pointer will write beyond the end of the allocated space.
[Major] storing the said NULL terminator at the end of the array is indeed necessary, as the block of memory returned by malloc() is uninitialized.
[Major] the copy of the string allocated and parsed by split_string cannot be safely freed because the pointer tmp is not saved anywhere. The pointer to the first token will be different from tmp in 2 cases: if the string contains only delimiters (no token found) or if the string starts with a delimiter (the initial delimiters will be skipped). In order to simplify the code and make it reliable, each token could be duplicated and tmp should be freed. In fact your free_split_string() function relies on this behavior. With the current implementation, the behavior is undefined.
[Minor] you use unsigned long and int inconsistently for strings lengths and array index variables. For consistency, you should use size_t for both.
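[Remark] you should allocate string copies with strdup(). If this POSIX standard function is not available on your system, write a simple implementation. Note also that tmp = malloc(sizeof(char) * strlen(str)) is one byte too short: strcpy(tmp, str) writes the null terminator out of bounds.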
[Major] you never test for memory allocation failure. This is OK for testing purposes and throw away code, but such potential failures should always be accounted for in production code.
[Remark] strtok() is a tricky function to use: it modifies the source string and keeps a hidden static state that makes it non-reentrant. You should avoid it. Although it performs correctly in this particular case, a caller of split_string or getNofTokens that relied on this hidden state being preserved would get unexpected behavior.
Here is a modified version:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "stringsplit.h"
/* Split string by another string, return split parts + NULL in array.
*
* Parameters:
* str: the string to split
* split: the string to split str with
*
* Returns:
* A dynamically reserved array of dynamically reserved string parts.
*
* For example called with "Test string split" and " ",
* returns ["Test", "string", "split", NULL].
* Or called with "Another - test" and " - ",
* returns ["Another", "test", NULL].
*/
size_t getNofTokens(const char *string, const char *split) {
char *tmp = strdup(string);
size_t count = 0;
if (strtok(tmp, split) != NULL) {
count++;
while (strtok(NULL, split) != NULL)
count++;
}
free(tmp);
return count;
}
char **split_string(const char *str, const char *split) {
size_t count = getNofTokens(str, split);
char **result = malloc(sizeof(*result) * (count + 1));
char *tmp = strdup(str);
char *token = strtok(tmp, split);
size_t idx = 0;
while (token != NULL && idx < count) {
result[idx++] = strdup(token);
token = strtok(NULL, split);
}
result[idx] = NULL;
free(tmp);
return result;
}
void print_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
free(split_string[i]);
}
free(split_string);
}
Here is an alternative without strtok() and without intermediary allocations:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "stringsplit.h"
size_t getNofTokens(const char *str, const char *split) {
size_t count = 0;
size_t pos = 0, len;
for (pos = 0;; pos += len) {
pos += strspn(str + pos, split); // skip delimiters
len = strcspn(str + pos, split); // parse token
if (len == 0) // no more tokens
break;
count++;
}
return count;
}
char **split_string(const char *str, const char *split) {
size_t count = getNofTokens(str, split);
char **result = malloc(sizeof(*result) * (count + 1));
size_t pos, len, idx;
for (pos = 0, idx = 0; idx < count; pos += len, idx++) {
pos += strspn(str + pos, split); // skip delimiters
len = strcspn(str + pos, split); // parse token
if (len == 0) // no more tokens
break;
result[idx] = strndup(str + pos, len);
}
result[idx] = NULL;
return result;
}
void print_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
free(split_string[i]);
}
free(split_string);
}
EDIT After re-reading the specification in your comment, there seems to be some potential confusion as to the semantics of the split argument:
if split is a set of delimiters, the above code does the job. And the examples will be split as expected.
if split is an actual string to match explicitly, the above code only works by coincidence on the examples given in the comment.
To implement the latter semantics, you should use strstr() to search for the split substring in both getNofTokens and split_string.
Here is an example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "stringsplit.h"
/* Split string by another string, return split parts + NULL in array.
*
* Parameters:
* str: the string to split
* split: the string to split str with
*
* Returns:
* A dynamically reserved array of dynamically reserved string parts.
*
* For example called with "Test string split" and " ",
* returns ["Test", "string", "split", NULL].
* Or called with "Another - test" and " - ",
* returns ["Another", "test", NULL].
*/
size_t getNofTokens(const char *str, const char *split) {
const char *p;
size_t count = 1;
size_t len = strlen(split);
if (len == 0)
return strlen(str);
for (p = str; (p = strstr(p, split)) != NULL; p += len)
count++;
return count;
}
char **split_string(const char *str, const char *split) {
size_t count = getNofTokens(str, split);
char **result = malloc(sizeof(*result) * (count + 1));
size_t len = strlen(split);
size_t idx;
const char *p = str;
for (idx = 0; idx < count; idx++) {
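const char *q = strstr(p, split);
if (q == NULL) {
/* no more separators: the last token runs to the end of the string */
q = p + strlen(p);
} else
if (len == 0 && *q != '\0') {
/* empty split string: strstr() returns p, so take one character per token */
q++;
}
result[idx] = strndup(p, q - p);
/* step over the separator, but never past the end of the string */
p = (*q != '\0') ? q + len : q;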
}
result[idx] = NULL;
return result;
}
void print_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
free(split_string[i]);
}
free(split_string);
}
When debugging, take note of values that you got from malloc, strdup, etc. Let's call these values "the active set". It's just a name, so that we can refer to them. You get a pointer from those functions, you mentally add it to the active set. When you call free, you can only pass values from the active set, and after free returns, you mentally remove them from the set. Any other use of free is invalid and a bug.
You can easily find this out by putting breakpoints after all memory allocations, so that you can write down the pointer values, and then breakpoints on all frees, so that you can see if one of those pointer values got passed to free - since, again, to do otherwise is to misuse free.
This can also be done using printf debugging, like this:
char *buf = malloc(...); // or strdup, or ...
fprintf(stderr, "+++ Alloc %8p\n", (void *)buf); // %p expects a void *
And then whenever you have free, do it again:
fprintf(stderr, "--- Free %8p\n", (void *)ptr);
free(ptr);
In the output of the program, you must be able to match every +++ with ---. If you see any --- with a value that wasn't earlier listed with a +++, there's your problem: that's the buggy invocation of free :)
I suggest using fprintf(stderr, ...) instead of printf(...), since stderr is typically unbuffered, so if your program crashes you won't miss any output. stdout, by contrast, is usually line-buffered when connected to a terminal and fully buffered when redirected to a file or pipe, so the last lines printed before a crash may be lost.

Reimplementing split function in C

I have been trying to write a function that takes a line as a string and returns a pointer to an array of words. The function below does something similar.
How can I rewrite Code 1 below so that it improves on Code 2, for example by letting the caller change the delimiter? Code 1 runs, but during memory allocation the same memory is reused for every entry of the words array, which causes word duplication.
Code 1:
char *split(const char *string) {
char *words[MAX_LENGTH / 2];
char *word = (char *)calloc(MAX_WORD, sizeof(char));
memset(word, ' ', sizeof(char));
static int index = 0;
int line_index = 0;
int word_index = 0;
while (string[line_index] != '\n') {
const char c = string[line_index];
if (c == ' ') {
word[word_index+ 1] = '\0';
memcpy(words + index, &word, sizeof(word));
index += 1;
if (word != NULL) {
free(word);
char *word = (char *)calloc(MAX_WORD, sizeof(char));
memset(word, ' ', sizeof(char));
}
++line_index;
word_index = 0;
continue;
}
if (c == '\t')
continue;
if (c == '.')
continue;
if (c == ',')
continue;
word[word_index] = c;
++word_index;
++line_index;
}
index = 0;
if (word != NULL) {
free(word);
}
return *words;
}
Code 2:
char **split(char *string) {
static char *words[MAX_LENGTH / 2];
static int index = 0;
// resetting words
for (int i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
words[i] = NULL;
}
const char *delimiter = " ";
char *ptr = strtok(string, delimiter);
while (ptr != NULL) {
words[index] = ptr;
ptr = strtok(NULL, delimiter);
++index;
}
index = 0;
return words;
}
However, I noticed that the memory at words + index is being reassigned to the same location, causing word duplication.
strtok() always returns a different pointer into the initial string. This cannot produce duplicates, unless you call it twice with the same input string (maybe with new contents).
However, your function returns a pointer to a static array, which is overwritten on each call to split(), voiding the results of all previous calls. To prevent this,
either allocate new memory in each call (which must be freed by the caller):
char **words = calloc(MAX_LENGTH / 2, sizeof *words);
or return a struct instead (which is always copied by value):
typedef struct wordlist { char *word[MAX_LENGTH / 2]; } wordlist;
wordlist split(char *string)
{
wordlist list = {0};
/* ... */
list.word[index] = /* ... */;
/* ... */
return list;
}

How to split a string into tokens in C?

How to split a string into tokens by '&' in C?
Use strtok(), or strtok_r(), its reentrant POSIX variant:
char *token;
char *state;
for (token = strtok_r(input, "&", &state);
token != NULL;
token = strtok_r(NULL, "&", &state))
{
...
}
I would do it something like this (using strchr()):
#include <string.h>
const char *data = "this&&that&other";
const char *next;
const char *curr = data;
while ((next = strchr(curr, '&')) != NULL) {
/* process curr to next-1 */
curr = next + 1;
}
/* process the remaining string (the last token) */
strchr(const char *s, int c) returns a pointer to the next location of c in s, or NULL if c isn't found in s.
You might be able to use strtok(); however, I don't like it, because:
it modifies the string being tokenized, so it doesn't work on string literals and is not very useful when you want to keep the string for other purposes. In that case, you must copy the string to a temporary buffer first.
it merges adjacent delimiters, so if your string was "a&&b&c", the returned tokens are "a", "b", and "c". Note that there is no empty token after "a".
it is not thread-safe.
You can use the strtok() function as shown in the example below.
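#include <stdio.h>
#include <string.h>

#define MAX_STRING_LENGTH 100 /* example sizes, not given in the original */
#define MAX_TOKENS 10

/// Function to parse a string into separate tokens; returns the token count
int parse_string(char pInputString[MAX_STRING_LENGTH], const char *Delimiter,
                 char *pToken[MAX_TOKENS])
{
    int i = 0;
    char *token = strtok(pInputString, Delimiter);
    while (token != NULL && i < MAX_TOKENS) {
        pToken[i++] = token;
        token = strtok(NULL, Delimiter);
    }
    return i;
}

int main(void)
{
    char String[MAX_STRING_LENGTH];
    char *pTokens[MAX_TOKENS];
    snprintf(String, sizeof String, "Token1&Token2");
    int NrOfParameters = parse_string(String, "&", pTokens);
    /// pTokens[] now contains pointers to the start of each token inside String,
    /// which strtok() has modified by overwriting each delimiter with '\0'.
    for (int i = 0; i < NrOfParameters; i++)
        printf("%s\n", pTokens[i]);
    return 0;
}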
For me, the strtok() function is unintuitive and too complicated, so I wrote my own. It accepts the string to split, the character that separates tokens, and a pointer that receives the number of tokens found (useful when printing the tokens in a loop). A disadvantage of this function is the fixed maximum length of each token.
#include <stdlib.h>
#include <string.h>
#define MAX_WORD_LEN 32
char **txtspt(const char *text, char split_char, int *w_count)
{
    size_t len = strlen(text);
    int i, j = 0, k = 0, words = 1;
    *w_count = 0;
    if (len == 0)
        return NULL;
    //Words counting: a trailing split_char does not start a new word
    for (size_t n = 0; n < len; ++n)
    {
        if (text[n] == split_char && text[n + 1] != '\0')
            ++words;
    }
    //Memory reservation: one pointer per word, MAX_WORD_LEN bytes per word
    char **cpy0 = malloc(sizeof(*cpy0) * words);
    if (cpy0 == NULL)
        return NULL;
    for (i = 0; i < words; ++i)
    {
        cpy0[i] = malloc(MAX_WORD_LEN);
        if (cpy0[i] == NULL)
        {
            while (i-- > 0)
                free(cpy0[i]);
            free(cpy0);
            return NULL;
        }
    }
    //Splitting: the loop also visits the terminating '\0' to close the last word
    for (size_t n = 0; n <= len && k < words; ++n)
    {
        if (text[n] == split_char || text[n] == '\0')
        {
            cpy0[k++][j] = '\0';
            j = 0;
        }
        else if (text[n] != '\n' && j < MAX_WORD_LEN - 1) //skip '\n' left by fgets(),
        {                                                 //truncate over-long words
            cpy0[k][j++] = text[n];
        }
    }
    *w_count = words;
    return cpy0;
}
