Split text to words with memory allocation - c

I have a task where I need to write a function that splits a given text into words and stores them in an array of strings.
The function must create and return a double pointer (a pointer to an array of string pointers) with a NULL pointer at the end of the array, or just NULL when there are no words in the text.
I have already written this function. It works and displays the words correctly when I test it in CLion, but when I run it against the testing program used in class, it reports an error saying that the function returns NULL instead of a pointer to allocated memory.
char** split_words(const char *text){
if (text == NULL) return NULL;
//fun() to count words in given text
int words_number = count_words(text);
//allocate memory for result array, size: words_number * pointer for each word
char **result = (char**)malloc((words_number + 1) * sizeof(char*));
if (result == NULL){
return NULL;
}
//temporary buffer used to copy each word into the result array
char *word = (char*)malloc(50 * sizeof(char));
if (word == NULL){
destroy(result);
return NULL;
}
int i = 0, next_word=0;
while (i < (int)strlen(text)){
while (i < (int)strlen(text)){
if ((*(text + i) >= 'a' && *(text + i) <= 'z') || (*(text + i) >= 'A' && *(text + i) <= 'Z')) break;
i++;
}
int j = 0;
while (i < (int)strlen(text)){
if (strchr(" '.,{}[]/\\\"-;:", *(text + i)) != NULL) break;
*(word + j) = *(text + i);
i++;
j++;
}
if (j > 0){
// add
*(word + j) = '\0';
// allocate memory for each word in result array
*(result + next_word) = (char*)malloc((strlen(word) + 1) * sizeof(char));
if(*(result + next_word) == NULL){
// destroy(result);
free(word);
return NULL;
}
strcpy(*(result + next_word), word);
next_word++;
}
}
// add NULL pointer at the end of words array
*(result + next_word) = NULL;
free(word);
return result;
}
The function passes the test in which the given string has no words and the function needs to return a NULL pointer.
I'm not sure whether this memory allocation is right. The test report mentions an out-of-bounds array access when I set NULL at the end (*(result + next_word) = NULL), but I'm confused about why it works properly in, for example, CLion.
Additionally, here is my destroy function, which deallocates the result:
void destroy(char **words){
if (words == NULL) return;
int i = 0;
while (*(words + i) != NULL){
free(*(words + i));
i++;
}
free(words);
}
Thanks for any suggestions about using the NULL pointer in a double-pointer array.
[Edit]
The count_words function:
int count_words(const char *text){
if (text == NULL) return 0;
int counter = 0, state = 1;
for (int i=0 ; *(text + i) != '\0'; i++){
if (strchr(" '.,{}[]/\\\"-;:", *(text + i)) != NULL){
state = 1;
} else if (state == 1){
state = 0;
counter++;
}
}
return counter;
}
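A minimal sketch of one way to avoid the fixed 50-byte temporary buffer: measure each word in place and allocate exactly what it needs. It assumes the same delimiter set as count_words() and treats any run of non-delimiter characters as a word (the posted split_words() additionally skips ahead to the first letter, which this sketch does not do). The names split_words_sketch, is_delim and free_words are hypothetical. Whether this addresses the failure reported by the test program is not certain, since the test environment is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Delimiter set copied from the question. */
static const char DELIMS[] = " '.,{}[]/\\\"-;:";

/* strchr() also matches the terminator, so '\0' is excluded explicitly. */
static int is_delim(char c) {
    return c != '\0' && strchr(DELIMS, c) != NULL;
}

/* Frees a NULL-terminated array of strings (same contract as destroy()). */
void free_words(char **words) {
    if (words == NULL) return;
    for (size_t i = 0; words[i] != NULL; i++) free(words[i]);
    free(words);
}

/* Hypothetical variant: returns a NULL-terminated array, or NULL when the
   text has no words or an allocation fails. */
char **split_words_sketch(const char *text) {
    if (text == NULL) return NULL;

    /* First pass: count the words. */
    size_t count = 0;
    for (const char *p = text; *p != '\0'; ) {
        while (is_delim(*p)) p++;
        if (*p == '\0') break;
        count++;
        while (*p != '\0' && !is_delim(*p)) p++;
    }
    if (count == 0) return NULL;

    char **result = malloc((count + 1) * sizeof *result);
    if (result == NULL) return NULL;

    /* Second pass: copy each word straight from the input. */
    size_t n = 0;
    for (const char *p = text; *p != '\0'; ) {
        while (is_delim(*p)) p++;
        if (*p == '\0') break;
        const char *start = p;
        while (*p != '\0' && !is_delim(*p)) p++;
        size_t len = (size_t)(p - start);

        result[n] = malloc(len + 1);
        if (result[n] == NULL) {
            /* result[n] is NULL here, so the array is already terminated. */
            free_words(result);
            return NULL;
        }
        memcpy(result[n], start, len);
        result[n][len] = '\0';
        n++;
    }
    result[n] = NULL;
    return result;
}
```

Because a failed allocation leaves a NULL in the slot being filled, the partially built array can be released with the ordinary cleanup routine, which is why no separate error path is needed.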

Related

My double pointer table isn't entirely returned. My function is supposed to split a string with a charset

char **ft_split(char *str, char *charset)
{
int i = 0;
int index = 0;
int len;
if (condition(str[0], charset) != 1)
while (condition(str[index], charset) != 1)
index++;
while (condition(str[index], charset) == 1)
index++;
int tab = nbword(str, charset);
printf("%d, len\n\n", tab);
char **tableau = malloc((tab + 1) * sizeof(char));
while (str[index] != '\0')
{
while (condition(str[index], charset) == 1)
index++;
len = wordlen(str, index, charset);
tableau[i++] = malloc(len + 1 * sizeof(char));
ft_strSPcpy(tableau[i - 1], str, len - 1, index);
printf("tableau[%d] = %s\n", i - 1, tableau[i - 1]);
index += len;
}
tableau[i] = 0;
return tableau;
free(tableau);
}
This is where something goes wrong.
before main
tableau[0] = bodeg
tableau[1] = c vr
tableau[2] = iment
tableau[3] = uper
in function main
tableau[0] = tableau[last] = (null)
c vr
iment
uper
And this is what I get when it is executed.
As you can see, in the main function the first string is empty. What happened?
I also checked after my last while loop, and all the strings were fine there.
Thank you.
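Judging only from the code shown, one thing that stands out is the size of the table: malloc((tab + 1) * sizeof(char)) reserves tab + 1 bytes, while the loop stores tab + 1 pointers in it. Writing past the end of an allocation is undefined behavior and could plausibly explain an entry being overwritten. The free(tableau) after return is also never reached. A minimal sketch of sizing by element type, with hypothetical helper names alloc_table and alloc_word:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a table with room for `count` strings plus
   the terminating NULL. Every slot starts out as NULL. */
char **alloc_table(size_t count) {
    /* sizeof *table is sizeof(char *), not sizeof(char). */
    char **table = malloc((count + 1) * sizeof *table);
    if (table != NULL) {
        for (size_t i = 0; i <= count; i++) {
            table[i] = NULL;
        }
    }
    return table;
}

/* Hypothetical helper: allocates a buffer for `len` characters plus '\0'.
   Note that len + 1 * sizeof(char) parses as len + (1 * sizeof(char)); it
   happens to give the same value only because sizeof(char) is 1. */
char *alloc_word(size_t len) {
    return malloc((len + 1) * sizeof(char));
}
```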

C - Split a string at the whitespaces

I need to split a string at the spaces (example string: "Hello this is an example string.") into an array of words. I'm not sure what I'm missing here. I'm also curious about the best way to test this function. The only library function allowed is malloc.
Any help is appreciated!
#include <stdlib.h>
char **ft_split(char *str) {
int wordlength;
int wordcount;
char **wordbank;
int i;
int current;
current = 0;
wordlength = 0;
//while sentence
while (str[wordlength] != '\0') {
//go till letters
while (str[current] == ' ')
current++;
//go till spaces
wordlength = 0;
while (str[wordlength] != ' ' && str[wordlength] != '\0')
wordlength++;
//make memory for word
wordbank[wordcount] = malloc(sizeof(char) * (wordlength - current + 1));
i = 0;
//fill wordbank current
while (i < wordlength - current) {
wordbank[wordcount][i] = str[current];
i++;
current++;
}
//end word with '\0'
wordbank[wordcount][i] = '\0';
wordcount++;
}
return wordbank;
}
There are multiple problems in your code:
You do not allocate an array for wordbank to point to: dereferencing an uninitialized pointer has undefined behavior.
Your approach to scanning the string is broken: you reset wordlength inside the loop so you keep re-scanning from the beginning of the string.
You should allocate an extra entry in the array for a trailing null pointer to indicate the end of the array to the caller.
Here is a modified version:
#include <stdlib.h>
#include <string.h>
char **ft_split(const char *str) {
size_t i, j, k, wordcount;
char **wordbank;
char *p;
// count the number of words:
wordcount = 0;
for (i = 0; str[i]; i++) {
if (str[i] != ' ' && (i == 0 || str[i - 1] == ' ')) {
wordcount++;
}
}
// allocate the word array
wordbank = malloc((wordcount + 1) * sizeof(*wordbank));
if (wordbank) {
for (i = k = 0;;) {
// skip spaces
while (str[i] == ' ')
i++;
// check for end of string
if (str[i] == '\0')
break;
// scan for end of word
for (j = i++; str[i] != '\0' && str[i] != ' '; i++)
continue;
// allocate space for word copy
wordbank[k] = p = malloc(i - j + 1);
if (p == NULL) {
// allocation failed: free and return NULL
while (k-- > 0) {
free(wordbank[k]);
}
free(wordbank);
return NULL;
}
// copy string contents and advance to the next slot
memcpy(p, str + j, i - j);
p[i - j] = '\0';
k++;
}
// set a null pointer at the end of the array
wordbank[k] = NULL;
}
return wordbank;
}
You need to malloc() wordbank too. You can count the number of words, and then allocate:
wordbank = malloc((count + 1) * sizeof(*wordbank));
if (wordbank == NULL)
return NULL;
Note: sizeof(char) is 1 by definition. And sizeof *pointer is always what you want.
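On the question of how to test such a function, one possible approach is to compare the returned array against a hand-written list of expected words. The sketch below assumes the function returns a NULL-terminated array of malloc'd strings, as in the modified version above; words_equal and free_words are hypothetical helper names, and the test code is free to use library functions even though ft_split itself is not.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the NULL-terminated array `got` holds exactly the strings
   listed in the NULL-terminated array `expected`, 0 otherwise. */
int words_equal(char **got, const char *const *expected) {
    size_t i;
    if (got == NULL) return 0;
    for (i = 0; got[i] != NULL && expected[i] != NULL; i++) {
        if (strcmp(got[i], expected[i]) != 0) return 0;
    }
    /* Both arrays must end at the same index. */
    return got[i] == NULL && expected[i] == NULL;
}

/* Releases every word and then the array itself. */
void free_words(char **words) {
    if (words == NULL) return;
    for (size_t i = 0; words[i] != NULL; i++) free(words[i]);
    free(words);
}

/* Possible use, assuming an ft_split() with the contract described above:
 *
 *     const char *const want[] = { "Hello", "this", "is", NULL };
 *     char **got = ft_split("  Hello this  is ");
 *     assert(words_equal(got, want));
 *     free_words(got);
 */
```

Inputs worth covering include the empty string, a string of only spaces, and leading, trailing, and repeated spaces.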

C split string function returns \377 at the end of string instead of \0. Why?

I tried to implement a small splitStringByString() function in C; this is how far I have come:
char* splitStringByString(char* string, char* delimiter){
int i = 0, j = 0, k = 0;
while(*(string + i) != '\0'){
j = i;
while((*(string + j) == *(delimiter + k)) && (*(string + j) != '\0')){
if(*(delimiter + k + 1) == '\0'){
// return string from here.
char result[(strlen(string) - strlen(delimiter) + 1)]; // + 1 for '\0'
i = 0;
j++;
while(*(string + j) != '\0'){
result[i] = *(string + j);
i += 1;
j++;
}
i = (int)strlen(result);
result[i - 1] = '\0';
return result;
}
k++;
j++;
}
i++;
}
return NULL;
}
So it works, more or less:
the function returns the string after the delimiter as wanted, but the last character of this string is always \377.
I already found a Stack Overflow post saying this is an octal number, but it is not very clear to me. Could you help me and give me some advice about what I did wrong?
Thanks a lot! :-)
I do not understand your code, but to do what you mention in the comment:
char *splitSstring(char *haystack, char *separator)
{
char *result = (haystack == NULL || separator == NULL || !strlen(haystack) || !strlen(separator)) ? NULL : haystack;
if (result != NULL)
{
result = strstr(haystack, separator);
if (result != NULL) result += strlen(separator);
}
return result;
}
or, if you want to have it in a separate string:
char *splitSstring(char *haystack, char *separator, char *res)
{
char *result = (haystack == NULL || separator == NULL || !strlen(haystack) || !strlen(separator)) ? NULL : haystack;
if (result != NULL)
{
result = strstr(haystack, separator);
if (result != NULL)
{
result = result + strlen(separator);
if(res == NULL) res = malloc(strlen(result) + 1);
if(res != NULL) strcpy(res, result);
result = res;
}
}
return result;
}
You can provide your own buffer to the function, or if you pass NULL one will be allocated for you. In that case you need to remember to free it.
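For context on the original symptom: in the question's code, result is a local array, so its storage stops being valid once the function returns, and strlen(result) is called before any terminator has been written to it. Either problem can produce stray bytes such as \377. A minimal sketch of a variant that always returns heap memory the caller owns; the name string_after is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the text that follows the
   first occurrence of `separator` in `haystack`. Returns NULL if either
   argument is NULL, the separator is empty or not found, or malloc fails.
   The caller must free() the result. */
char *string_after(const char *haystack, const char *separator) {
    if (haystack == NULL || separator == NULL || *separator == '\0') {
        return NULL;
    }
    const char *hit = strstr(haystack, separator);
    if (hit == NULL) {
        return NULL;
    }
    const char *tail = hit + strlen(separator);
    size_t len = strlen(tail);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, tail, len + 1); /* len + 1 copies the '\0' as well */
    return copy;
}
```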

How to choose types for a function prototype?

If I have a data structure like a matrix or a tree, and I want to factor out a for loop from a very large function in which that variable is used, what should the call look like? I tried the following but I get a segmentation fault.
void write_command(int w, char *argv[], char *string[]) {
char *dest;
for (int r = 0; argv[r] != NULL; r++) {
dest = malloc(sizeof(char *) * strlen(argv[r]) + 1);
*dest = '0';
strcpy(dest, argv[r]);
string[w][r] = *dest;
free(dest);
}
}
I think you see what I'm trying to do, but how should I declare the variables? I get a segfault at string[w][r] = *dest;.
I don't think you want to see what I'm refactoring, but it is the biggest and most unreadable function ever.
static int runCmd(const char *cmd) {
const char *cp;
pid_t pid;
int status;
struct command structcommand[15];
char **argv = 0;
int argc = 1;
bool pipe = false;
char *string[z][z];
char *pString3[40];
char *pString2[40];
int n = 0;
char **ptr1;
char string1[z];
bool keep = false;
char *pString1[z];
char *pString[z];
*pString1 = "\0";
*pString = "\0";
char *temp = {'\0'};
int w = 0;
bool b = false;
int j = 0;
int i;
int p = 0;
char **ptr;
char *tmpchar;
char *cmdtmp;
bool b1 = false;
char *dest;
int y = 0;
i = 0;
int h = 0;
nullterminate(string);
if (cmd) {
for (cp = cmd; *cp; cp++) {
if ((*cp >= 'a') && (*cp <= 'z')) {
continue;
}
if ((*cp >= 'A') && (*cp <= 'Z')) {
continue;
}
if (isDecimal(*cp)) {
continue;
}
if (isBlank(*cp)) {
continue;
}
if ((*cp == '.') || (*cp == '/') || (*cp == '-') ||
(*cp == '+') || (*cp == '=') || (*cp == '_') ||
(*cp == ':') || (*cp == ',') || (*cp == '\'') ||
(*cp == '"')) {
continue;
}
}
}
if (cmd) {
cmdtmp = malloc(sizeof(char *) * strlen(cmd) + 1);
strcpy(cmdtmp, cmd);
tmpchar = malloc(sizeof(char *) * strlen(cmd) + 1);
if (tmpchar == NULL) {
printf("Error allocating memory!\n"); /* print an error message */
return 1; /* return with failure */
}
strcpy(tmpchar, cmd);
ptr1 = str_split(pString3, cmdtmp, '|');
if (strstr(cmd, "|") == NULL) { /* not a pipeline */
makeArgs(cmd, &argc, (const char ***) &argv, pipe, 0, 0);
for (j = 0; j < argc; j++) {
string[0][j] = argv[j];
structcommand[i].argv = string[0]; /*process;*/
}
n++;
}
else {
for (i = 0; *(ptr1 + i); i++) { /* tokenize the input string for each pipeline*/
n++; /* save number of pipelines */
int e = 0; /* a counter */
*pString = "\0"; /* should malloc and free this? */
strcpy(string1, *(ptr1 + i));
if ((string1[0] != '\0') && !isspace(string1[0])) { /* this is neither the end nor a new argument */
ptr = str_split(pString2, *(&string1), ' '); /* split the string at the arguments */
h = 0;
for (j = 0; *(ptr + j); j++) { /* step through the arguments */
/* the pipeline is in cmdtmp and the argument/program is in ptr[i] */
if (ptr + j && !b && strstr(*(ptr + j), "'")) {
b = true;
strcpy(temp, *(ptr + j));
if (y < 1) {
y++;
}
}
while (b) {
if (*(ptr + j) && strstr(*(ptr + j), "'")) { /* end of quote */
b = false;
if (y < 1) {
string[i][j] = strcpy(temp, *(ptr + j));
}
y = 0;
}
else if (*(ptr + j)) { /* read until end of quote */
string[i][j] = temp;
continue;
} else {
b = false;
break;
}
}
if (ptr + j) {
if (*(ptr + j)[0] == '{') {
keep = true;
}
if (testFn(*(ptr + j))) { /* test for last char */
string[i][j - p] = concat(*pString1, *(ptr + j));
keep = false;
free(*pString1);
goto mylabel;
}
if (keep) {
*pString1 = concat(*pString1, *(ptr + j));
*pString1 = concat(*pString1, " ");
p++;
} else {
// strcpy(temp, *(ptr + j));
b1 = false;
int q = j;
for (e = 0; *(ptr + q + e); e++) { /* step through the string */
b1 = true;
if (*(ptr + e + q)) {
*pString = concat(*pString, *(ptr + e + q));
*pString = concat(*pString, " ");
}
j = e;
}
if (makeArgs(*pString, &argc, (const char ***) &argv, pipe, i, h)) {
write_command(&w, argv, string[w]);
/*for (int r = 0; argv[r] != NULL; r++) {
dest = malloc(sizeof(char *) * strlen(argv[r]) + 1);
*dest = '0';
strcpy(dest, argv[r]);
string[w][r] = dest;
}*/
w++;
} else {
if (!b1) { /* no args (?) */
for (int r = 0; argv[r] != NULL; r++) {
string[i][r] = argv[r];
}
}
}
}
}
}
mylabel:
free(ptr);
dump_argv((const char *) "d", argc, argv);
}
}
free(ptr1);
free(cmdtmp);
free(tmpchar);
}
for (i = 0; i < n; i++) {
for (j = 0; DEBUG && string[i][j] != NULL; j++) {
if (i == 0 && j == 0) printf("\n");
printf("p[%d][%d] %s\n", i, j, string[i][j]);
}
structcommand[i].argv = string[i];
}
fflush(NULL);
pid = fork();
if (pid < 0) {
perror("fork failed");
return -1;
}
/* If we are the child process, then go execute the string.*/
if (pid == 0) {
/* spawn(cmd);*/
fork_pipes(n, structcommand);
}
/*
* We are the parent process.
* Wait for the child to complete.
*/
status = 0;
while (((pid = waitpid(pid, &status, 0)) < 0) && (errno == EINTR));
if (pid < 0) {
fprintf(stderr, "Error from waitpid: %s", strerror(errno));
return -1;
}
if (WIFSIGNALED(status)) {
fprintf(stderr, "pid %ld: killed by signal %d\n",
(long) pid, WTERMSIG(status));
return -1;
}
}
return WEXITSTATUS(status);
}
I'm going to suppose that you are trying to make a deep copy of the argv array, with that being a NULL-terminated array of strings such as the second parameter of a C program's main() function. The function you present seems to assume that you have already allocated space for the destination array itself; its job seems limited to copying the argument strings.
First things first, then: let's look at the caller. If you're making a deep copy of a standard argument vector, then the type of the destination variable should be compatible with the type of argv itself (in the colloquial sense of "compatible"). If the lifetime of the copy does not need to extend past the host function's return, then a variable-length array would be a fine choice:
char *copy[argc + 1];
That relieves you of manually managing the memory of the array itself, but not of managing any memory uniquely allocated to its elements. On the other hand, if you need the copy to survive return from the function in which it is declared, then you'll have to use manual allocation:
char **copy = malloc((argc + 1) * sizeof(*copy));
if (!copy) /* handle allocation failure */ ;
Either way, you can pass the resulting array or pointer itself to your write_command() function, and the required parameter type is the same. It is pointless to pass a pointer to copy, because that function will not modify the pointer it receives as its argument; rather, it will modify the memory to which it points.
Here is the signature of the function you seem to want:
void write_command(char *argv[], char *string[]) {
Given such a signature, you would call it as ...
write_command(argv, copy);
....
The key step you seem to want to perform in the loop inside is
string[r] = strdup(argv[r]);
You can accomplish the same thing with a malloc(), initialize, strcpy() sequence, but that's a bit silly when strdup() is ready-made for the same task. Do not forget to check its return value, however (or in your original code, the return value of malloc()), in case memory allocation fails. Either way, you must not free the allocated memory within write_command(), because that leaves you with invalid pointers inside your copied array.
Furthermore, even if you really do have a 2D array of char * in the caller, such as ...
char *copies[n][argc + 1];
... nothing changes with function write_command(). It doesn't need to know or care whether the array it's copying into is an element of a 2D array. You simply have to call it appropriately, something like:
write_command(argv, copies[w]);
No matter what, you must be sure to free the copied argument strings, but only after you no longer need them. Again, you cannot do that inside the write_command() function.
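Putting these pieces together, a minimal sketch of the copy and the matching cleanup might look as follows. It departs from the signature above in one respect, returning an int so that allocation failure can be reported, which is an assumption rather than something the question requires. copy_string is a hypothetical stand-in for strdup() (which is POSIX, not ISO C before C23), and free_command is likewise a made-up name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): returns a malloc'd copy of s. */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) {
        memcpy(p, s, n);
    }
    return p;
}

/* Frees the strings stored in a NULL-terminated array, but not the array
   itself, since the caller may have declared it as a local array. */
void free_command(char *string[]) {
    for (size_t r = 0; string[r] != NULL; r++) {
        free(string[r]);
        string[r] = NULL;
    }
}

/* Deep-copies the NULL-terminated argv into string[], which must have room
   for every argument plus the terminating NULL.
   Returns 0 on success, -1 if an allocation fails. */
int write_command(char *argv[], char *string[]) {
    size_t r;
    for (r = 0; argv[r] != NULL; r++) {
        string[r] = copy_string(argv[r]);
        if (string[r] == NULL) {
            /* string[r] is NULL, so the partial copy is already terminated. */
            free_command(string);
            return -1;
        }
    }
    string[r] = NULL;
    return 0;
}
```

The caller invokes free_command() on the copy only once the strings are no longer needed, for example after the child processes have been started.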

C: Why will I get an error on free()

I wrote the following function, which breaks on the lines marked with // Breakpoint:
char *parseNextWord(char *str)
{
static char *lastStr = "";
static int lastPosition = 0;
if (strcmp(lastStr, str) != 0)
{
lastStr = str;
lastPosition = 0;
}
if (lastPosition > 0 && str[lastPosition - 1] == 0)
{
return 0;
}
char *word = "";
int wLength = 0;
while (str[lastPosition] != ' ' && str[lastPosition] != '\n' && str[lastPosition] != '\0')
{
char *tmp = (char*)malloc(++wLength * sizeof(char));
for (int i = 0; i < sizeof(word); i++)
{
tmp[i] = word[i];
}
tmp[sizeof(*tmp) - 1] = str[lastPosition];
free(word); // Breakpoint
word = (char*)malloc(sizeof(*tmp));
for (int i = 0; i < sizeof(tmp); i++)
{
word[i] = tmp[i];
}
free(tmp); // Breakpoint
lastPosition++;
}
while (str[lastPosition - 1] != '\0' && (str[lastPosition] == ' ' || str[lastPosition] == '\n' || str[lastPosition] == '\0'))
{
lastPosition++;
}
return word;
}
The function can be called like this:
char* string = "Name1 Name2\nName3 Name4\nName1";
int totalCount = 0;
char *nextWord = parseNextWord(string);
while (nextWord != 0)
{
for (int c = 1; c < argc; c++)
{
if (strcmp((const char*)argv[c], nextWord) == 0)
{
totalCount++;
}
}
nextWord = parseNextWord(string);
}
Why is my code breaking on free? How can I improve it?
The relevant code I see is:
char* word = "";
free(word);
You did not allocate the empty string (""), so you cannot free it.
You can only free what you malloc.
If you didn't allocate it, don't try to free it.
P.S. Here is my best list of functions which allocate memory that can be freed:
malloc
realloc
calloc
strdup
asprintf
vasprintf
(notably: _not_ alloca)
Maybe there are others as well?
Because you didn't allocate word the first time you entered the loop.
You just make it point to "", which is not allocated dynamically.
My suggestion to make it work is to add an integer variable flag before the while loop, with initial value 0, and to replace free(word) with:
if (flag != 0) {
free(word);
} else {
flag = 1;
}
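An alternative to the flag, offered only as a sketch: start from a NULL pointer instead of "". Both free(NULL) and realloc(NULL, n) are well defined, so no special first-iteration case is needed. The example below also replaces the static position with an explicit cursor that the caller owns, which is a design choice of this sketch and not part of the original code; next_word is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd copy of the next word at *cursor,
   where words are separated by spaces or newlines, and advances *cursor past
   that word. Returns NULL when no word is left or when allocation fails.
   The caller must free() each returned word. */
char *next_word(const char **cursor) {
    const char *s = *cursor;

    /* Skip leading separators. */
    while (*s == ' ' || *s == '\n') {
        s++;
    }
    if (*s == '\0') {
        *cursor = s;
        return NULL;
    }

    char *word = NULL; /* NULL rather than "", so free() and realloc() are safe */
    size_t len = 0;
    while (*s != ' ' && *s != '\n' && *s != '\0') {
        /* Room for the characters so far, the new one, and the '\0'. */
        char *grown = realloc(word, len + 2);
        if (grown == NULL) {
            free(word);
            return NULL;
        }
        word = grown;
        word[len++] = *s++;
        word[len] = '\0';
    }
    *cursor = s;
    return word;
}
```

Growing by one byte per character keeps the sketch close to the original loop; measuring the word first and allocating once would avoid the repeated realloc() calls.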

Resources