String Tokenization problem occurs when deleting duplicate words from a string - c

What I'm trying to do in the following code is to tokenize a string and store every token in a dynamic allocated structure but exclude any duplicates.
This code kind of works, until I enter a string that contains two equal words. For example, the string "this this", will also store the second word even though it's the same. But if I enter "this this is" instead, it removes the second "this", and completely ignores the last word of the string, so that it doesn't get deleted if there's a duplicate in the string.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define dim 70
typedef struct string {
char* token[25];
} string;
int main() {
string* New = malloc(dim*sizeof(string));
char* s;
char* buffer = NULL;
int i = 0, r = 0;
s = malloc(dim * sizeof(char));
fgets(s, dim, stdin);
printf("The string is: %s\n", s);
New->token[i] = malloc(dim*sizeof(char));
New->token[i] = strtok(s, " ");
++i;
while((buffer = strtok(NULL, " ")) && buffer != NULL){
printf("\nbuffer is: %s", buffer);
for(r = 0; r < i; ++r) {
if(strcmp(New->token[r], buffer) != 0 && r == i-1) {
New->token[i] = malloc(strlen(buffer)*sizeof(char)+1);
New->token[i] = buffer;
++i;
}
else if(New->token[r] == buffer) {
break;
}
}
}
printf("\n New string: ");
for(i = 0; New->token[i] != NULL; ++i) {
printf(" %s", New->token[i]);
}
return 0;
}
In my mind this should work fine but I'm really having a hard time finding what I did wrong here. If you need additional info just ask me please, I apologise for any eventual lack of clarity (and for my english).

Complete re-write of this answer to address some fundamentally wrong things I did not see the first time through. See in-line comments in the code at bottom to explain some of the construct changes:
I ran your code exactly as is and saw what you are describing, and other than the note about using strcmp in the other answer, found several lines of code that can be adjusted, or removed to make it do what you described it should:
First, the struct definition creates a pointer to an array of char. Based on what you are doing later in the code, what you need is a simple array of char
typedef struct string {
//char* token[25]; //this create a pointer to array of 25 char
char token[25]; //this is all you need
} string;
As you will see later, this will greatly simplify memory allocation.
some basic problems:
Include the \n newline character in your parsing delimiter. When <enter> is hit as the end of entering the string, a newline is appended, causing the first instance of this and the second instance of this\n to be unequal.
while((buffer = strtok(NULL, " \n")) && buffer != NULL){
^^
This line is creating uninitialized memory.
string* New = malloc(dim*sizeof(string));
A note about using malloc() vs. calloc(): malloc() leaves the memory it creates uninitialized, while calloc() creates a block of memory all initialized to 0.
Memory created using malloc()
Memory created using calloc():
This becomes important in several places in your code, but in particular I see a problem in the last section:
for(i = 0; New->token[i] != NULL; ++i) {
printf(" %s", New->token[i]);
}
If the memory created for New is not initialized, you can get a run-time error when the index i is incremented beyond the area in memory that you have explicitly written to, and loop attempts to test New->token[i]. If New->token[i] contains anything but 0, it will attempt to print that area of memory.
You should also free each instance of memory created in your code with a corresponding call to free().
All of this, and more is addressed in the following re-write of your code:
(tested against this is a string a string.)
typedef struct string {
//char* token[25]; //this create a pointer to array of 25 char
char token[25]; //this is all you need
} string;
int main() {
char* s;
char* buffer = NULL;
int i = 0, r = 0;
string* New = calloc(dim, sizeof(string));//Note: This creates an array of New.
//Example: New[i]
//Not: New->token[i]
s = calloc(dim , sizeof(char));
fgets(s, dim, stdin);
printf("The string is: %s\n", s);
buffer = strtok(s, " \n");
strcpy(New[i].token, buffer); //use strcpy instead of = for strings
//restuctured the parsing loop to a more conventional construct
// when using strtok:
if(buffer)
{
++i;
while(buffer){
printf("\nbuffer is: %s", buffer);
for(r = 0; r < i; ++r) {
if(strcmp(New[r].token, buffer) != 0 && r == i-1) {
strcpy(New[i].token, buffer);
++i;
}
else if(strcmp(New[r].token, buffer)==0) {
break;
}
}
buffer = strtok(NULL, " \n");
}
}
printf("\n New string: ");
for(i = 0; i<dim; i++) {
if(New[i].token) printf(" %s", New[i].token);
}
free(New);
free(s);
return 0;
}

You comparing pointers instead of comparing strings. Replace
}
else if(New->token[r] == buffer) {
break;
With
}
else if(strcmp(New->token[r], buffer) == 0) {
break;
You also need to copy the buffer:
memcpy(New->token[i],buffer,strlen(buffer)+1);
instead of
New->token[i] = buffer;
or replace both lines (along with malloc) with
New->token[i] = strdup(buffer);
And it's better to replace strtok with strtok_r (strtok is not re-entrant).

The structure seems unnecessary.
This uses an array of pointers to store the tokens.
The input can be parsed with strspn and strcspn.
Unique tokens are added to the array of pointers.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DIM 70
int main() {
char* token[DIM] = { NULL};
char s[DIM];
char* buffer = s;
int unique = 0, check = 0;
int match = 0;
int loop = 0;
size_t space = 0;
size_t span = 0;
fgets(s, DIM, stdin);
printf("The string is: %s\n", s);
while ( unique < DIM && *buffer){//*buffer not pointing to zero terminator
space = strspn ( buffer, " \n\t");//leading whitespace
buffer += space;//advance past whitespace
span = strcspn ( buffer, " \n\t");//not whitespace
if ( span) {
printf("\ntoken is: %.*s", (int)span, buffer );//prints span number of characters
}
match = 0;
for ( check = 0; check < unique; ++check) {
if ( 0 == strncmp ( token[check], buffer, span)) {
match = 1;//found match
break;
}
}
if ( ! match) {//no match
token[unique] = malloc ( span + 1);//allocate for token
strncpy ( token[unique], buffer, span);//copy span number of characters
token[unique][span] = 0;//zero terminate
++unique;//add a unique token
}
buffer += span;//advance past non whitespace for next token
}
printf("\n New string: ");
for( loop = 0; loop < unique; ++loop) {
printf(" %s", token[loop]);//print the unique tokens
}
printf("\n");
for( loop = 0; loop < unique; ++loop) {
free ( token[loop]);//free memory
}
return 0;
}

Related

count the occurrences of each word in that phrase using C language

I am not understanding why, when I am doing sub=strtok(NULL,delim);, instead of going to the next word, it is making it null. Please try this code and help me resolve the problem.
#include <stdio.h>
#include <string.h>
int main()
{
char str[100],str1[100],*sub,*sub1;
int c1;
printf("\nEnter the string or sentence: ");
scanf("%[^\n]%*c",str);
strcpy(str1,str);
char delim[]=" ";
sub=strtok(str,delim);
while(sub !=NULL)
{
c1=0;
sub1=strtok(str1,delim);
while(sub1 !=NULL)
{
if(!strcmp(sub,sub1))
{
c1++;
}
sub1=strtok(NULL,delim);
}
printf("\n%s : %d",sub,c1);
sub=strtok(NULL,delim);
}
return 0;
}
You can't parse multiple string buffers concurrently with strtok(), like you are trying to do. Internally, strtok() maintains a static reference to the input string, that is how it knows which string to advance through when subsequent calls specify a NULL string pointer.
Before entering your outer loop, you call strtok(str) to set the internal reference to str. Then, inside the loop, you call strtok(str1) to reset that reference to str1. So, on the 1st iteration of your outer loop, before entering the inner loop, strtok() has already lost its reference to your str buffer since you replace it with the str1 buffer. Then, after the inner loop has finished scanning str1, your outer loop breaks when calling sub=strtok(NULL,delim); because there are no more tokens to read from str1.
For what you are attempting, use strtok_r() instead:
The strtok_r() function is a reentrant version strtok(). The saveptr argument is a pointer to a char * variable that is used internally by strtok_r() in order to maintain context between successive calls that parse the same string.
...
Different strings may be parsed concurrently using sequences of calls to strtok_r() that specify different saveptr arguments.
Also, because strtok(_r)() is destructive to the input string, inserting '\0' after each token found, you need to reset the contents of your 2nd string buffer back to the original string before entering the inner loop. Otherwise, the inner loop will stop scanning after is reads the 1st word in your 2nd string buffer.
Try something more like this:
#include <stdio.h>
#include <string.h>
int main()
{
char orig[100], str1[100], str2[100], *sub1, *sub2, *save1, *save2;
char delim[] = " ";
int c1;
printf("\nEnter the string or sentence: ");
scanf("%[^\n]%*c", orig);
strcpy(str1, orig);
sub1 = strtok_r(str1, delim, &save1);
while (sub1 != NULL)
{
c1 = 0;
strcpy(str2, orig);
sub2 = strtok_r(str2, delim, &save2);
while (sub2 != NULL)
{
if (strcmp(sub1, sub2) == 0)
{
++c1;
}
sub2 = strtok_r(NULL, delim, &save2);
}
printf("\n%s : %d", sub1, c1);
sub1 = strtok_r(NULL, delim, &save1);
}
return 0;
}
Otherwise, you can use a single loop through a single string buffer, storing the word counts in a separate array, eg:
#include <stdio.h>
#include <string.h>
typedef struct _wordInfo
{
char* str;
int count;
} wordInfo;
int main()
{
char str[100], *sub;
char delim[] = " ";
wordInfo words[100], *word;
int numWords = 0;
printf("\nEnter the string or sentence: ");
scanf("%[^\n]%*c", str);
sub = strtok(str, delim);
while (sub != NULL)
{
word = NULL;
for (int i = 0; i < numWords; ++i)
{
if (strcmp(sub, words[i].str) == 0)
{
word = &words[i];
break;
}
}
if (!word)
{
if (numWords == 100) break;
word = &words[numWords++];
word->str = sub;
word->count = 1;
}
else {
word->count++;
}
sub = strtok(NULL, delim);
}
for(int i = 0; i < numWords; ++i)
{
word = &words[i];
printf("\n%s : %d", word->str, word->count);
}
return 0;
}

Searching an array for a specific character [duplicate]

I want to write a program in C that displays each word of a whole sentence (taken as input) at a seperate line. This is what I have done so far:
void manipulate(char *buffer);
int get_words(char *buffer);
int main(){
char buff[100];
printf("sizeof %d\nstrlen %d\n", sizeof(buff), strlen(buff)); // Debugging reasons
bzero(buff, sizeof(buff));
printf("Give me the text:\n");
fgets(buff, sizeof(buff), stdin);
manipulate(buff);
return 0;
}
int get_words(char *buffer){ // Function that gets the word count, by counting the spaces.
int count;
int wordcount = 0;
char ch;
for (count = 0; count < strlen(buffer); count ++){
ch = buffer[count];
if((isblank(ch)) || (buffer[count] == '\0')){ // if the character is blank, or null byte add 1 to the wordcounter
wordcount += 1;
}
}
printf("%d\n\n", wordcount);
return wordcount;
}
void manipulate(char *buffer){
int words = get_words(buffer);
char *newbuff[words];
char *ptr;
int count = 0;
int count2 = 0;
char ch = '\n';
ptr = buffer;
bzero(newbuff, sizeof(newbuff));
for (count = 0; count < 100; count ++){
ch = buffer[count];
if (isblank(ch) || buffer[count] == '\0'){
buffer[count] = '\0';
if((newbuff[count2] = (char *)malloc(strlen(buffer))) == NULL) {
printf("MALLOC ERROR!\n");
exit(-1);
}
strcpy(newbuff[count2], ptr);
printf("\n%s\n",newbuff[count2]);
ptr = &buffer[count + 1];
count2 ++;
}
}
}
Although the output is what I want, I have really many black spaces after the final word displayed, and the malloc() returns NULL so the MALLOC ERROR! is displayed in the end.
I can understand that there is a mistake at my malloc() implementation, but I do not know what it is.
Is there another more elegant or generally better way to do it?
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
Take a look at this, and use whitespace characters as the delimiter. If you need more hints let me know.
From the website:
char * strtok ( char * str, const char * delimiters );
On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.
Once the terminating null character of str is found in a call to strtok, all subsequent calls to this function (with a null pointer as the first argument) return a null pointer.
Parameters
str
C string to truncate.
Notice that this string is modified by being broken into smaller strings (tokens).
Alternativelly [sic], a null pointer may be specified, in which case the function continues scanning where a previous successful call to the function ended.
delimiters
C string containing the delimiter characters.
These may vary from one call to another.
Return Value
A pointer to the last token found in string.
A null pointer is returned if there are no tokens left to retrieve.
Example
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
For the fun of it here's an implementation based on the callback approach:
const char* find(const char* s,
const char* e,
int (*pred)(char))
{
while( s != e && !pred(*s) ) ++s;
return s;
}
void split_on_ws(const char* s,
const char* e,
void (*callback)(const char*, const char*))
{
const char* p = s;
while( s != e ) {
s = find(s, e, isspace);
callback(p, s);
p = s = find(s, e, isnotspace);
}
}
void handle_word(const char* s, const char* e)
{
// handle the word that starts at s and ends at e
}
int main()
{
split_on_ws(some_str, some_str + strlen(some_str), handle_word);
}
malloc(0) may (optionally) return NULL, depending on the implementation. Do you realize why you may be calling malloc(0)? Or more precisely, do you see where you are reading and writing beyond the size of your arrays?
Consider using strtok_r, as others have suggested, or something like:
void printWords(const char *string) {
// Make a local copy of the string that we can manipulate.
char * const copy = strdup(string);
char *space = copy;
// Find the next space in the string, and replace it with a newline.
while (space = strchr(space,' ')) *space = '\n';
// There are no more spaces in the string; print out our modified copy.
printf("%s\n", copy);
// Free our local copy
free(copy);
}
Something going wrong is get_words() always returning one less than the actual word count, so eventually you attempt to:
char *newbuff[words]; /* Words is one less than the actual number,
so this is declared to be too small. */
newbuff[count2] = (char *)malloc(strlen(buffer))
count2, eventually, is always one more than the number of elements you've declared for newbuff[]. Why malloc() isn't returning a valid ptr, though, I don't know.
You should be malloc'ing strlen(ptr), not strlen(buf). Also, your count2 should be limited to the number of words. When you get to the end of your string, you continue going over the zeros in your buffer and adding zero size strings to your array.
Just as an idea of a different style of string manipulation in C, here's an example which does not modify the source string, and does not use malloc. To find spaces I use the libc function strpbrk.
int print_words(const char *string, FILE *f)
{
static const char space_characters[] = " \t";
const char *next_space;
// Find the next space in the string
//
while ((next_space = strpbrk(string, space_characters)))
{
const char *p;
// If there are non-space characters between what we found
// and what we started from, print them.
//
if (next_space != string)
{
for (p=string; p<next_space; p++)
{
if(fputc(*p, f) == EOF)
{
return -1;
}
}
// Print a newline
//
if (fputc('\n', f) == EOF)
{
return -1;
}
}
// Advance next_space until we hit a non-space character
//
while (*next_space && strchr(space_characters, *next_space))
{
next_space++;
}
// Advance the string
//
string = next_space;
}
// Handle the case where there are no spaces left in the string
//
if (*string)
{
if (fprintf(f, "%s\n", string) < 0)
{
return -1;
}
}
return 0;
}
you can scan the char array looking for the token if you found it just print new line else print the char.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
int len = strlen(s);
char delim =' ';
for(int i = 0; i < len; i++) {
if(s[i] == delim) {
printf("\n");
}
else {
printf("%c", s[i]);
}
}
free(s);
return 0;
}
char arr[50];
gets(arr);
int c=0,i,l;
l=strlen(arr);
for(i=0;i<l;i++){
if(arr[i]==32){
printf("\n");
}
else
printf("%c",arr[i]);
}

Scanf unknown number of string arguments C

I wanted to know if there was a way to use scanf so I can take in an unknown number of string arguments and put them into a char* array. I have seen it being done with int values, but can't find a way for it to be done with char arrays. Also the arguments are entered on the same line separated by spaces.
Example:
user enters hello goodbye yes, hello gets stored in array[0], goodbye in array[1] and yes in array[2]. Or the user could just enter hello and then the only thing in the array would be hello.
I do not really have any code to post, as I have no real idea how to do this.
You can do something like, read until the "\n" :
scanf("%[^\n]",buffer);
you need to allocate before hand a big enough buffer.
Now go through the buffer count the number of words, and allocate the necessary space char **array = ....(dynamic string allocation), go to the buffer and copy string by string into the array.
An example:
int words = 1;
char buffer[128];
int result = scanf("%127[^\n]",buffer);
if(result > 0)
{
char **array;
for(int i = 0; buffer[i]!='\0'; i++)
{
if(buffer[i]==' ' || buffer[i]=='\n' || buffer[i]=='\t')
{
words++;
}
}
array = malloc(words * sizeof(char*));
// Using RoadRunner suggestion
array[0] = strtok (buffer," ");
for(int w = 1; w < words; w++)
{
array[w] = strtok (NULL," ");
}
}
As mention in the comments you should use (if you can) fgets instead fgets(buffer,128,stdin);.
More about strtok
If you have an upper bound to the number of strings you may receive from the user, and to the number of characters in each string, and all strings are entered on a single line, you can do this with the following steps:
read the full line with fgets(),
parse the line with sscanf() with a format string with the maximum number of %s conversion specifiers.
Here is an example for up to 10 strings, each up to 32 characters:
char buf[400];
char s[10][32 + 1];
int n = 0;
if (fgets(buf, sizeof buf, sdtin)) {
n = sscanf("%32s%32s%32s%32s%32s%32s%32s%32s%32s%32s",
s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8], s[9]));
}
// `n` contains the number of strings
// s[0], s[1]... contain the strings
If the maximum number is not known of if the maximum length of a single string is not fixed, or if the strings can be input on successive lines, you will need to iterate with a simple loop:
char buf[200];
char **s = NULL;
int n;
while (scanf("%199s", buf) == 1) {
char **s1 = realloc(s, (n + 1) * sizeof(*s));
if (s1 == NULL || (s1[n] = strdup(buf)) == NULL) {
printf("allocation error");
exit(1);
}
s = s1;
n++;
}
// `n` contains the number of strings
// s[0], s[1]... contain pointers to the strings
Aside from the error handling, this loop is comparable to the hard-coded example above but it still has a maximum length for each string. Unless you can use a scanf() extension to allocate the strings automatically (%as on GNU systems), the code will be more complicated to handle any number of strings with any possible length.
You can use:
fgets to read input from user. You have an easier time using this instead of scanf.
malloc to allocate memory for pointers on the heap. You can use a starting size, like in this example:
size_t currsize = 10
char **strings = malloc(currsize * sizeof(*strings)); /* always check
return value */
and when space is exceeded, then realloc more space as needed:
currsize *= 2;
strings = realloc(strings, currsize * sizeof(*strings)); /* always check
return value */
When finished using the requested memory from malloc() and realloc(), it's always to good to free the pointers at the end.
strtok to parse the input at every space. When copying over the char * pointer from strtok(), you must also allocate space for strings[i], using malloc() or strdup.
Here is an example I wrote a while ago which does something very similar to what you want:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define INITSIZE 10
#define BUFFSIZE 100
int
main(void) {
char **strings;
size_t currsize = INITSIZE, str_count = 0, slen;
char buffer[BUFFSIZE];
char *word;
const char *delim = " ";
int i;
/* Allocate initial space for array */
strings = malloc(currsize * sizeof(*strings));
if(!strings) {
printf("Issue allocating memory for array of strings.\n");
exit(EXIT_FAILURE);
}
printf("Enter some words(Press enter again to end): ");
while (fgets(buffer, BUFFSIZE, stdin) != NULL && strlen(buffer) > 1) {
/* grow array as needed */
if (currsize == str_count) {
currsize *= 2;
strings = realloc(strings, currsize * sizeof(*strings));
if(!strings) {
printf("Issue reallocating memory for array of strings.\n");
exit(EXIT_FAILURE);
}
}
/* Remove newline from fgets(), and check for buffer overflow */
slen = strlen(buffer);
if (slen > 0) {
if (buffer[slen-1] == '\n') {
buffer[slen-1] = '\0';
} else {
printf("Exceeded buffer length of %d.\n", BUFFSIZE);
exit(EXIT_FAILURE);
}
}
/* Parsing of words from stdin */
word = strtok(buffer, delim);
while (word != NULL) {
/* allocate space for one word, including nullbyte */
strings[str_count] = malloc(strlen(word)+1);
if (!strings[str_count]) {
printf("Issue allocating space for word.\n");
exit(EXIT_FAILURE);
}
/* copy strings into array */
strcpy(strings[str_count], word);
str_count++;
word = strtok(NULL, delim);
}
}
/* print and free strings */
printf("Your array of strings:\n");
for (i = 0; i < str_count; i++) {
printf("strings[%d] = %s\n", i, strings[i]);
free(strings[i]);
strings[i] = NULL;
}
free(strings);
strings = NULL;
return 0;
}

Scan a sentence with space into *char array in C

I'm not good at using C language. Here is my dumb question. Now I am trying to get input from users, which may have spaces. And what I need to do is to split this sentence using space as delimiter and then put each fragment into char* array. Ex:
Assuming I have char* result[10];, and the input is: Good morning John. The output should be result[0]="Good"; result[1]="morning"; result[2]="John";I have already tried scanf("%[^\n]",input); and gets(input); Yet it is still hard to deal with String in C. And also I have tried strtok, but it seems that it only replaced the space by NULL. Hence the result will be GoodNULLmorningNULLJohn. Obviously it's not what I want. Please help my dumb question. Thanks.
Edit:
This is what I don't understand when using strtok. Here is a test code.
The substr still displayed Hello there. It seems subtok only replace a null at the space position. Thus, I can't use the substr in an if statement.
int main()
{
int i=0;
char* substr;
char str[] = "Hello there";
substr = strtok(str," ");
if(substr=="Hello"){
printf("YES!!!!!!!!!!");
}
printf("%s\n",substr);
for(i=0;i<11;i++){
printf("%c", substr[i]);
}
printf("\n");
system("pause");
return 0;
}
Never use gets, is deprecated in C99 and removed from C11.
IMO, scanf is not a good function to use when you don't know the number of elements before-hand, I suggest fgets:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[128];
char *ptr;
fgets(str, sizeof str, stdin);
/* Remove trailing newline */
ptr = strchr(str, '\n');
if (ptr != NULL) {
*ptr = '\0';
}
/* Tokens */
ptr = strtok(str, " ");
while (ptr != NULL) {
printf("%s\n", ptr);
ptr = strtok(NULL, " ");
}
return 0;
}
gets is not recommended to use, as there is no way to tell the size of the buffer. fgets is ok here because it will stop reading when the 1st new line is encountered. You could use strtok to store all the splited words in to an array of strings, for example:
#include <stdio.h>
#include <string.h>
int main(void) {
char s[256];
char *result[10];
fgets(s, sizeof(s), stdin);
char *p = strtok(s, " \n");
int cnt = 0;
while (cnt < (sizeof result / sizeof result[0]) && p) {
result[cnt++] = p;
p = strtok(NULL, " \n");
}
for (int i = 0; i < cnt; i++)
printf("%s\n", result[i]);
return 0;
}
As most of the other answers haven't covered another thing you were asking:
strtok will not allocate temporary memory and will use your given string to replace every separator with a zero termination. This is why Good morning John becomes GoodNULLmorningNULLJohn. If it wouldn't do this, each token would print the whole rest of the string on its tail like:
result[0] = Good morning John
result[1] = morning John
result[2] = John
So if you want to keep your original input and an array of char* per word, you need 2 buffers. There is no other way around that. You also need the token buffer to stay in scope as long as you use the result array of char* pointers, else that one points to invalid memory and will cause undefined behavior.
So this would be a possible solution:
int main()
{
const unsigned int resultLength = 10;
char* result[resultLength];
memset(result, 0, sizeof result); // we should also zero the result array to avoid access violations later on
// Read the input from the console
char input[256];
fgets(input, sizeof input, stdin);
// Get rid of the newline char
input[strlen(input) - 1] = 0;
// Copy the input string to another buffer for your tokens to work as expected
char tokenBuffer[256];
strcpy(tokenBuffer, input);
// Setting of the pointers per word
char* token = strtok(tokenBuffer, " ");
for (unsigned int i = 0; token != NULL && i < resultLength; i++)
{
result[i] = token;
token = strtok(NULL, " ");
}
// Print the result
for (unsigned int i = 0; i < resultLength; i++)
{
printf("result[%d] = %s\n", i, result[i] != NULL ? result[i] : "NULL");
}
printf("The input is: %s\n", input);
return 0;
}
It prints:
result[0] = Good
result[1] = morning
result[2] = John
result[3] = NULL
result[4] = NULL
result[5] = NULL
result[6] = NULL
result[7] = NULL
result[8] = NULL
result[9] = NULL
The input is: Good morning John

Split string in C every white space

I want to write a program in C that displays each word of a whole sentence (taken as input) at a seperate line. This is what I have done so far:
void manipulate(char *buffer);
int get_words(char *buffer);
int main(){
char buff[100];
printf("sizeof %d\nstrlen %d\n", sizeof(buff), strlen(buff)); // Debugging reasons
bzero(buff, sizeof(buff));
printf("Give me the text:\n");
fgets(buff, sizeof(buff), stdin);
manipulate(buff);
return 0;
}
int get_words(char *buffer){ // Function that gets the word count, by counting the spaces.
int count;
int wordcount = 0;
char ch;
for (count = 0; count < strlen(buffer); count ++){
ch = buffer[count];
if((isblank(ch)) || (buffer[count] == '\0')){ // if the character is blank, or null byte add 1 to the wordcounter
wordcount += 1;
}
}
printf("%d\n\n", wordcount);
return wordcount;
}
void manipulate(char *buffer){
int words = get_words(buffer);
char *newbuff[words];
char *ptr;
int count = 0;
int count2 = 0;
char ch = '\n';
ptr = buffer;
bzero(newbuff, sizeof(newbuff));
for (count = 0; count < 100; count ++){
ch = buffer[count];
if (isblank(ch) || buffer[count] == '\0'){
buffer[count] = '\0';
if((newbuff[count2] = (char *)malloc(strlen(buffer))) == NULL) {
printf("MALLOC ERROR!\n");
exit(-1);
}
strcpy(newbuff[count2], ptr);
printf("\n%s\n",newbuff[count2]);
ptr = &buffer[count + 1];
count2 ++;
}
}
}
Although the output is what I want, I have really many black spaces after the final word displayed, and the malloc() returns NULL so the MALLOC ERROR! is displayed in the end.
I can understand that there is a mistake at my malloc() implementation, but I do not know what it is.
Is there another more elegant or generally better way to do it?
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
Take a look at this, and use whitespace characters as the delimiter. If you need more hints let me know.
From the website:
char * strtok ( char * str, const char * delimiters );
On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.
Once the terminating null character of str is found in a call to strtok, all subsequent calls to this function (with a null pointer as the first argument) return a null pointer.
Parameters
str
C string to truncate.
Notice that this string is modified by being broken into smaller strings (tokens).
Alternativelly [sic], a null pointer may be specified, in which case the function continues scanning where a previous successful call to the function ended.
delimiters
C string containing the delimiter characters.
These may vary from one call to another.
Return Value
A pointer to the last token found in string.
A null pointer is returned if there are no tokens left to retrieve.
Example
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
For the fun of it here's an implementation based on the callback approach:
const char* find(const char* s,
const char* e,
int (*pred)(char))
{
while( s != e && !pred(*s) ) ++s;
return s;
}
void split_on_ws(const char* s,
const char* e,
void (*callback)(const char*, const char*))
{
const char* p = s;
while( s != e ) {
s = find(s, e, isspace);
callback(p, s);
p = s = find(s, e, isnotspace);
}
}
void handle_word(const char* s, const char* e)
{
// handle the word that starts at s and ends at e
}
int main()
{
split_on_ws(some_str, some_str + strlen(some_str), handle_word);
}
malloc(0) may (optionally) return NULL, depending on the implementation. Do you realize why you may be calling malloc(0)? Or more precisely, do you see where you are reading and writing beyond the size of your arrays?
Consider using strtok_r, as others have suggested, or something like:
void printWords(const char *string) {
// Make a local copy of the string that we can manipulate.
char * const copy = strdup(string);
char *space = copy;
// Find the next space in the string, and replace it with a newline.
while (space = strchr(space,' ')) *space = '\n';
// There are no more spaces in the string; print out our modified copy.
printf("%s\n", copy);
// Free our local copy
free(copy);
}
Something going wrong is get_words() always returning one less than the actual word count, so eventually you attempt to:
char *newbuff[words]; /* Words is one less than the actual number,
so this is declared to be too small. */
newbuff[count2] = (char *)malloc(strlen(buffer))
count2, eventually, is always one more than the number of elements you've declared for newbuff[]. Why malloc() isn't returning a valid ptr, though, I don't know.
You should be malloc'ing strlen(ptr), not strlen(buf). Also, your count2 should be limited to the number of words. When you get to the end of your string, you continue going over the zeros in your buffer and adding zero size strings to your array.
Just as an idea of a different style of string manipulation in C, here's an example which does not modify the source string, and does not use malloc. To find spaces I use the libc function strpbrk.
int print_words(const char *string, FILE *f)
{
static const char space_characters[] = " \t";
const char *next_space;
// Find the next space in the string
//
while ((next_space = strpbrk(string, space_characters)))
{
const char *p;
// If there are non-space characters between what we found
// and what we started from, print them.
//
if (next_space != string)
{
for (p=string; p<next_space; p++)
{
if(fputc(*p, f) == EOF)
{
return -1;
}
}
// Print a newline
//
if (fputc('\n', f) == EOF)
{
return -1;
}
}
// Advance next_space until we hit a non-space character
//
while (*next_space && strchr(space_characters, *next_space))
{
next_space++;
}
// Advance the string
//
string = next_space;
}
// Handle the case where there are no spaces left in the string
//
if (*string)
{
if (fprintf(f, "%s\n", string) < 0)
{
return -1;
}
}
return 0;
}
you can scan the char array looking for the token if you found it just print new line else print the char.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
int len = strlen(s);
char delim =' ';
for(int i = 0; i < len; i++) {
if(s[i] == delim) {
printf("\n");
}
else {
printf("%c", s[i]);
}
}
free(s);
return 0;
}
char arr[50];
gets(arr);
int c=0,i,l;
l=strlen(arr);
for(i=0;i<l;i++){
if(arr[i]==32){
printf("\n");
}
else
printf("%c",arr[i]);
}

Resources