Edit: I later discovered this issue mainly stemmed from my confusion with sizeof, and replacing it with strlen was essentially my whole solution. My answer (scroll down) presents a simple but decent example of strtok, if you're interested.
So I've been trying to get a program working where I input a list of words separated by commas, and it then outputs those words line by line, removing any spaces.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define delim ","
int main() {
    //variable declaration:
    char words[100];
    char *word;
    char tempWord[100];
    int n;
    //gets input assigned to "words":
    puts("\nEnter a list of words separated by commas.\n");
    fgets(words, sizeof(words), stdin);
    //sets up the first word in strtok
    word = strtok(words, delim);
    //loops so long as the word isn't null (reaching the last word)
    while (word != NULL) {
        puts("\n");
        //checks if each character in the word is a space (and ignores them if they are)
        for (n = 0; n < sizeof(word); ++n) {
            //for some reason can't directly use word (probably because it's a pointer)
            //so have to copy it to a temporary value
            strcpy(tempWord, word);
            //don't print if it's a space
            if (!isspace(tempWord[n])) printf("%c", tempWord[n]);
        }
        //moves to next word
        word = strtok(NULL, delim);
    }
    return(0);
}
By inputting "LETS, FREAKINGG, GOOOOOOOOOOOO", I seem to encounter an issue:
(running the program):
Enter a list of words separated by commas.
(input) >>>LETS, FREAKINGG, GOOOOOOOOOOOO
LETS
FREAKIN
GOOOOOO
It seems that, depending on the size of the first word, it sets a character limit of no more than 3 beyond that for subsequent words. Can anyone explain why this is happening?
Thank you #rici and #kaylum for the responses!
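For some reason I was getting errors before when I didn't copy the word to a temporary variable. I'm not entirely sure what I was thinking by putting the copy inside the for loop, but it works without it! Just a very small brain moment...
I believe my confusion was around sizeof(): since word is a char *, sizeof(word) gives the size of the pointer itself (typically 8 bytes on a 64-bit system), not the length of the string it points to. That is why " FREAKINGG" was cut to its first 8 characters. Replacing it with strlen() is a complete fix for this!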
Here's the amended (working) code:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define delim ","
int main() {
    //variable declaration:
    char words[100];
    char *word;
    size_t n;
    //gets input assigned to "words":
    puts("\nEnter a list of words separated by commas.\n");
    if (fgets(words, sizeof(words), stdin) == NULL) return 1;
    //sets up the first word in strtok
    word = strtok(words, delim);
    //loops so long as the word isn't null (reaching the last word)
    while (word != NULL) {
        puts("\n");
        //checks if each character in the word is a space (and ignores them if they are)
        for (n = 0; n < strlen(word); ++n) {
            //don't print if it's a space (cast: isspace expects an unsigned char value)
            if (!isspace((unsigned char)word[n])) printf("%c", word[n]);
        }
        //moves to next word
        word = strtok(NULL, delim);
    }
    return(0);
}
and the output:
Enter a list of words separated by commas.
(input) >>>LETS, FREAKINGG, GOOOOOOOOOOOO
LETS
FREAKINGG
GOOOOOOOOOOOO
Thank you very much for the help!
Related
So I am making a small program, and I wanted to ask: how do I break a string into an array of words? I tried strtok, but what if there is a tab or something like that?
const char *S = "This is a cool sentence";
const char *words[5];  // one slot per word
words[0] = "This";
words[1] = "is";
// etc.
Can anyone help?
strtok works just fine even if there are tabs in between. Setting the delimiter (the second argument of strtok) to a space (" ") also skips over runs of consecutive spaces. For further clarification, refer to the code below.
EDIT: As #Chris Dodd rightly mentioned, you should add \t to the delimiter, as in strtok(str, " \t"), so that tabs are treated as separators too.
#include <stdio.h>
#include <string.h>
int main() {
    // Initialize str with a string with a bunch of spaces and tabs in between
    char str[100] = "Hi!  This\tis a \t long sentence.";
    // Get the first word
    char* word = strtok(str, " \t");
    if (!word) return 0; // the string contained only delimiters
    printf("First word: %s", word); // prints `First word: Hi!`
    // Declare an array of strings to store each word
    char * words[20];
    int count = 0;
    // Loop through the string to get the rest of the words
    while (count < 20) { // stop before overflowing words[]
        word = strtok(NULL, " \t");
        if(!word) break; // breaks out of the loop if no more words are left
        words[count] = word; // Store it in the array
        count++;
    }
    int index = 0;
    // Loop through words and print
    while(index < count) {
        // prints a comma after the previous word and then the next word on a new line
        printf(",\n%s", words[index]);
        index++;
    }
    return 0;
}
Output (note that no space is printed between the words and the commas):
First word: Hi!,
This,
is,
a,
long,
sentence.
I'm making no claims about efficiency or elegance, but this is one possible implementation that splits words on all whitespace. It only prints the words; it does not save them to an array or anywhere else. I'll leave that as an exercise for you:
#include <stdio.h>
#include <ctype.h>
void printOrSaveWord(char curWord[], size_t curWordIndex)
{
    curWord[curWordIndex] = '\0';
    if (curWordIndex > 0)
    {
        printf("%s\n", curWord);
    }
}
void separateWords(const char* sentence)
{
    char curWord[256] = { 0 };
    size_t curWordIndex = 0;
    for (size_t i = 0; sentence[i]; i++)
    {
        // skip all white space (cast: isspace expects an unsigned char value)
        if (isspace((unsigned char)sentence[i]))
        {
            // found a space, print out the word. This is where you would
            // add it to an array or otherwise save it, I'll leave that
            // task to you
            printOrSaveWord(curWord, curWordIndex);
            // reset index
            curWordIndex = 0;
        }
        else if (curWordIndex < sizeof(curWord) - 1)
        {
            // copy the character, leaving room for the '\0'
            // (characters past the 255th in a single word are dropped)
            curWord[curWordIndex++] = sentence[i];
        }
    }
    // catch the ending case
    printOrSaveWord(curWord, curWordIndex);
}
I have a simple C program that reads a file line by line, tokenizes each line, and prints the current token. My problem is that I want to print the next token if certain conditions are satisfied. Do you have any idea how to do this? I really need your help for this project. Thank you.
Here is the code:
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *input;
    FILE *output;
    //char filename[100];
    const char *filename = "sample1.txt";
    input = fopen(filename, "r");
    output = fopen("test.st", "w");
    if (input == NULL || output == NULL) {
        perror("fopen");
        return 1;
    }
    char word[1000];
    char *token;
    char var[100];
    fprintf(output, "LEXEME, TOKEN");
    while (fgets(word, sizeof word, input) != NULL) { // reads a line
        token = strtok(word, " \t\n"); // tokenize the line
        while (token != NULL) { // while there are tokens left on this line
            fprintf(output, "\n");
            if (strcmp(token, "SIOL") == 0)
                fprintf(output, "SIOL, SIOL");
            else if (strcmp(token, "DEFINE") == 0)
                fprintf(output, "DEFINE, DEFINE");
            else if (strcmp(token, "INTEGER") == 0) {
                fprintf(output, "INTEGER, INTEGER");
                // note: token+1 skips the first character of the current
                // token ("NTEGER"); it does not give the next token
                strcpy(var, token + 1);
                fprintf(output, "\n%s,Ident", var);
            }
            else {
                printf("%s\n", token);
            }
            token = strtok(NULL, " \t\n"); // get the next token
        }
    }
    fclose(input);
    fclose(output);
    return 0;
}
Continuing from my comment: I'm not sure I completely understand what you need, but if you have the string:
"The quick brown fox";
and you want to tokenize the string, printing the next word only if a condition concerning the current word is met, then you need to adjust your thinking just a bit. In your example, you want to print the next word, "quick", only if the current word is "The".
The adjustment in thinking is how you look at the test. Instead of thinking about printing the next word if the current matches some condition, you need to save the last word, and only print the current if the last word matches some condition -- "The" in your example.
To handle that situation, you can use a fixed-size character array of at least 47 characters (the longest word in Merriam-Webster's Unabridged Dictionary has 46 characters, plus one for the '\0'). I'll use 48 in the example below. You may be tempted to just save a pointer to the last word, but strtok returns pointers into the buffer it is modifying, so a saved pointer is only valid as long as that buffer is not changed or reused (for example, refilled by the next fgets) -- so make a copy of the word.
Putting the pieces together, you could do something like the following. It saves the prior token in last, then prints the current word if last is "The":
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXW 48
int main (void) {
    char str[] = "The quick brown fox";
    char last[MAXW] = {0};
    char *p;
    for (p = strtok (str, " "); p; p = strtok (NULL, " "))
    {
        if (*last && strcmp (last, "The") == 0)
            printf (" '%s'\n", p);
        /* copy at most MAXW-1 chars; last[MAXW-1] stays '\0' from the initializer */
        strncpy (last, p, MAXW - 1);
    }
    return 0;
}
Output
$ ./bin/str_chk_last
'quick'
Let me know if you have any questions.
Test Explanation
As written in the comment, *last is simply shorthand for last[0]. So the first part of the test, *last, is just testing ((last[0] != 0) && ...). Since last was declared and initialized as:
char last[MAXW] = {0};
All chars in last are 0 on the first pass through the loop. Including the check last[0] != 0 simply causes the printf to be skipped the first time the for loop executes. The longhand for the test would look like:
if ((last[0] != 0) && strcmp (last, "The") == 0)
    printf (" '%s'\n", p);
Which in pseudo code just says:
if (NOT first iteration && last == "The")
    printf (" '%s'\n", p);
Let me know if that doesn't make sense.
This is easy to achieve with the strtok function. Note that if you pass a null pointer as the first argument, the function continues scanning the same string where the previous successful call ended. So if you need the next token, just call:
char* token = strtok(NULL, delimiters);
See the small example below:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char str[] = "The quick brown fox";
    // split str by space
    char* token = strtok(str, " ");
    // if a token is found
    if(token != NULL) {
        // print current token
        printf("%s\n", token);
        // if token is "The"
        if(strcmp(token, "The") == 0) {
            // get the next token; it is NULL if "The" was the last word
            char* next = strtok(NULL, " ");
            if(next != NULL)
                printf("%s\n", next); // print next token
        }
    }
    return 0;
}
The output will be
The
quick
Hi, I have this program that reads a text file line by line and is supposed to output the longest word in each sentence. Although it works to a degree, it's overwriting the biggest word with an equally big word, which I'm not sure how to fix. What do I need to think about when editing this program? Thanks
//Program Written and Designed by R.Sharpe
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "memwatch.h"
int main(int argc, char** argv)
{
    FILE* file;
    file = fopen(argv[1], "r");
    char* sentence = (char*)malloc(100*sizeof(char));
    while(fgets(sentence, 100, file) != NULL)
    {
        char* word;
        int maxLength = 0;
        char* maxWord;
        maxWord = (char*)calloc(40, sizeof(char));
        word = (char*)calloc(40, sizeof(char));
        word = strtok(sentence, " ");
        while(word != NULL)
        {
            //printf("%s\n", word);
            if(strlen(word) > maxLength)
            {
                maxLength = strlen(word);
                strcpy(maxWord, word);
            }
            word = strtok(NULL, " ");
        }
        printf("%s\n", maxWord);
        maxLength = 0; //reset for next sentence;
    }
    return 0;
}
The text file that my program reads contains this:
some line with text
another line of words
Jimmy John took the a apple and something reallyreallylongword it was nonsense
and my output is this
text
another
reallyreallylongword
but I would like the output to be
some
another
reallyreallylongword
EDIT: If anyone plans on using this code, remember that when you fix the newline character issue, you must not forget about the null terminator. This is done by setting
sentence[strlen(sentence)-1] = 0, which in effect replaces the newline character with a null terminator.
You get each line by using
fgets(sentence, 100, file)
The problem is that the newline character is stored inside sentence. For instance, the first line is "some line with text\n", which makes the longest word "text\n" (5 characters, so it beats the 4-character "some").
To fix it, remove the newline character every time you read a line into sentence.
My program reads a text file line by line and prints the longest word in each line. However, it sometimes prints previous longest words even though they have nothing to do with the current sentence, and even though I reset my char array at the end of processing each line. Can someone explain to me what is happening in memory to make this happen? Thanks.
//Program Written and Designed by R.Sharpe
//LONGEST WORD CHALLENGE
//Purpose: Find the longest word in a sentence
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "memwatch.h"
int main(int argc, char** argv)
{
    FILE* file;
    file = fopen(argv[1], "r");
    char* sentence = (char*)malloc(100*sizeof(char));
    while(fgets(sentence, 100, file) != NULL)
    {
        //printf("%s\n", sentence);
        char sub[100];
        char maxWord[100];
        strncpy(sub, sentence, strlen(sentence)-1);
        strcpy(sentence, sub);
        char* word;
        int maxLength = 0;
        word = strtok(sentence, " ");
        while(word != NULL)
        {
            if(strlen(word) > maxLength)
            {
                maxLength = strlen(word);
                strcpy(maxWord, word);
                //printf("%s\n", maxWord);
            }
            word = strtok(NULL, " ");
        }
        printf("%s\n", maxWord);
        memset(maxWord, 0, sizeof(char));
        maxLength = 0; //reset for next sentence;
    }
    free(sentence);
    return 0;
}
My text file contains:
some line with text
another line of words
Jimmy John took the a apple and something reallyreallylongword it was nonsense
test test BillGatesSteveJobsWozWasMagnificant
a b billy
The output of the program is:
some
another
reallyreallylongword
BillGatesSteveJobsWozWasMagnificantllyreallylongword
BillGatesSteveJobsWozWasMagnificantllyreallylongword //should be billy
Also, when I arbitrarily change the length of the 5th sentence, the last word sometimes comes out as "reallyreallylongword", which is odd.
EDIT: Even when I comment out the memset, I still get the same result, so it may not have anything to do with memset, but I'm not completely sure.
Trailing null bytes ('\0') are the bane of string manipulation. You have a copy sequence that doesn't quite do what you want:
strncpy(sub, sentence, strlen(sentence)-1);
strcpy(sentence, sub);
The sentence buffer is copied into sub and then back again. However, strncpy copies only strlen(sentence)-1 characters, so it does not copy the '\0' out of sentence. When you then copy sub back into sentence with strcpy, you copy an unknown amount of data: everything up to whatever zero byte happens to follow in sub. Because the stack is being reused and the char arrays are uninitialized, data from the previous iteration is likely still sitting there and gets picked up by the next one.
Adding the following between the strncpy and the strcpy fixes the problem:
sub[strlen(sentence) - 1] = '\0';
You've got a missing null terminator.
char sub[100];
char maxWord[100];
strncpy(sub, sentence, strlen(sentence)-1);
strcpy(sentence, sub);
When you call strncpy and the source string has at least as many characters as the count you pass, no null terminator is added. You've guaranteed this is the case, so sub has no terminator, and you quickly run into behavior you don't want. It looks like you're trying to trim the last character from the string; the easier way to do that is simply to set the character at index strlen(sentence)-1 to '\0'.
This is bad:
strncpy(sub, sentence, strlen(sentence)-1);
strcpy(sentence, sub);
The strncpy function does not null-terminate its buffer if the source string doesn't fit. By doing strlen(sentence)-1 you guaranteed it doesn't fit. Then the strcpy causes undefined behaviour because sub isn't a string.
My advice is to not use strncpy; it is almost never a good solution to a problem. Use strcpy or snprintf.
In this case you never even use sub, so you could replace these lines with:
sentence[ strlen(sentence) - 1 ] = 0;
which has the effect of removing the \n at the end that was left by fgets. (If the line was longer than 99 characters, fgets stops before reaching the \n, so this deletes a character of input instead.)
Find the corrected code below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE* file;
    file = fopen(argv[1], "r");
    if (file == NULL) {
        perror(argv[1]);
        return 1;
    }
    char sub[100];
    char maxWord[100] = ""; // start empty so a blank first line prints nothing
    char* word;
    size_t maxLength = 0;
    char* sentence = (char*)malloc(100*sizeof(char));
    while(fgets(sentence, 100, file) != NULL)
    {
        maxLength = 0;
        strncpy(sub, sentence, strlen(sentence)-1);
        sub[strlen(sentence) - 1] = '\0'; //Fix1
        strcpy(sentence, sub);
        word = strtok(sentence, " ");
        while(word != NULL)
        {
            if(strlen(word) > maxLength)
            {
                maxLength = strlen(word);
                strcpy(maxWord, word);
            }
            word = strtok(NULL, " ");
        }
        printf("%s\n", maxWord);
        maxWord[0] = '\0'; // empty maxWord for the next sentence
        maxLength = 0; //reset for next sentence;
    }
    free(sentence);
    fclose (file); //Fix2
    return 0;
}
Ensure that the file is closed at the end. It is good practice.
Well, I declared a global array of char pointers like this: char * strarr[];
In a function, I am tokenising a line and trying to put everything into that array like this:
*line = strtok(s, " ");
while (line != NULL) {
    *line = strtok(NULL, " ");
}
It seems like this is not working. How can I fix it?
Thanks
Any number of things could be going wrong with the code you haven't shown us, such as undefined behaviour from calling strtok on a string constant, or getting your parameters wrong when calling the function.
But the most likely problem from the code we can see is the use of *line instead of line, assuming that line is of type char *.
Use the following code as a baseline:
#include <stdio.h>
#include <string.h>
int main (void) {
    char str[] = "My name is paxdiablo";
    // Start tokenising words.
    char *line = strtok (str, " ");
    while (line != NULL) {
        // Print current token and get next word.
        printf ("[%s]\n", line);
        line = strtok(NULL, " ");
    }
    return 0;
}
This outputs:
[My]
[name]
[is]
[paxdiablo]
and should be easily modifiable into something you can use.
Be aware that, if you're trying to save the character pointers returned from strtok (which would make sense for using *line), they are transitory and will not be what you expect after you're done. That's because modifications are made in-place within the source string. You can do it with something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (void) {
    char *word[4];        // The array of words.
    size_t i;             // General counter.
    size_t nextword = 0;  // For preventing array overflow.
    char str[] = "My name is paxdiablo";
    // Start tokenising.
    char *line = strtok (str, " ");
    while (line != NULL) {
        // If array not full, duplicate string to array and advance index.
        if (nextword < sizeof(word) / sizeof(*word))
            word[nextword++] = strdup (line);
        // Get next word.
        line = strtok(NULL, " ");
    }
    // Print out all stored words.
    for (i = 0; i < nextword; i++)
        printf ("[%s]\n", word[i]);
    // Release the copies made by strdup.
    for (i = 0; i < nextword; i++)
        free (word[i]);
    return 0;
}
Note the specific size of the word array in the code above. The use of char * strarr[] in your code, along with the compiler message "tentative array definition assumed to have one element", is almost certainly where the problem lies.
If your implementation doesn't come with a strdup (it is POSIX, and only became part of standard C in C23), you'll need to write your own :-)