How would I use strtok to compare word by word

I've been reading up on strtok and thought it would be the best way for me to compare two files word by word. So far I can't really figure out how I would do it, though.
Here is my function that performs it:
int wordcmp(FILE *fp1, FILE *fp2)
{
char *s1;
char *s2;
char *tok;
char *tok2;
char line[BUFSIZE];
char line2[BUFSIZE];
char comp1[BUFSIZE];
char comp2[BUFSIZE];
char temp[BUFSIZE];
int word = 1;
size_t i = 0;
while((s1 = fgets(line,BUFSIZE, fp1)) && (s2 = fgets(line2,BUFSIZE, fp2)))
{
;
}
tok = strtok(line, " ");
tok2 = strtok(line, " ");
while(tok != NULL)
{
tok = strtok (NULL, " ");
}
return 0;
}
Don't mind the unused variables; I've been at this for 3 hours and have tried every way I can think of to compare the values of the first and second strtok. I would also like to know how I would check which file reaches EOF first.
When I tried
if(s1 == EOF && s2 != EOF)
{
return -1;
}
It returns -1 even when the files are the same! Is it because in order for it to reach the if statement outside of the loop both files have reached EOF which makes the program always go to this if statement?
Thanks in advance!

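If you just want to check whether the two files are identical, compare them character by character. Note that s1 and s2 must be declared as int here, because fgetc() returns an int so that EOF can be distinguished from a valid character:
do {
    s1 = fgetc(fp1);
    s2 = fgetc(fp2);
    if (s1 != s2) {
        return -1; // RETURN FALSE: the files differ, or one ended early
    }
    if (s1 == EOF) {
        return 1; // RETURN TRUE: both files ended together
    }
} while (1);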
Good Luck :)

When you use strtok() you typically use code like this:
tok = strtok(line, " ");
while (NULL != tok)
{
tok = strtok(NULL, " ");
}
The NULL in the call in the loop tells strtok to continue from after the previously found token until it finds the null terminating character in the value you originally passed (line) or until there are no more tokens. The current pointer is stored in the run time library, and once strtok() returns NULL to indicate no more tokens any more calls to strtok() using NULL as the first parameter (to continue) will result in NULL. You need to call it with another value (e.g. another call to strtok(line, " ")) to get it to start again.
What this means is that to use strtok on two different strings at the same time you need to manually update the string position and pass in a modified value on each call.
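char *end = line + strlen(line);    /* terminating '\0' of each line, */
char *end2 = line2 + strlen(line2); /* recorded before strtok modifies it */
tok = strtok(line, " ");
tok2 = strtok(line2, " ");
while (NULL != tok && NULL != tok2)
{
    /* Do stuff with tok and tok2 here */
    if (strcmp(tok, tok2) != 0) { /* the words differ */ }
    /* Move to the '\0' that ends the current token */
    tok += strlen(tok);
    tok2 += strlen(tok2);
    /* Get next token, unless that '\0' is the end of the whole line */
    tok = (tok < end) ? strtok(tok + 1, " ") : NULL;
    tok2 = (tok2 < end2) ? strtok(tok2 + 1, " ") : NULL;
}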
You'll still need to add logic for determining whether lines are different. You've not said whether the files are equivalent if a line break occurs at a different position but the words surrounding it are the same. I assume they should be, given your description, but it makes the logic more awkward, as you only need to perform the initial fgets() and strtok() for a file if you don't already have a token. You also need to look at how the files are read in: currently your first while loop just reads lines until the end of a file without processing them.
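As a rough sketch of the whole comparison, here is one way to treat line breaks like any other separator. It swaps strtok for fscanf with %s, which already reads one whitespace-delimited word at a time, so it is not the asker's approach. The return codes and the WORD_MAX limit are assumptions made up for this example.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 256   /* assumed upper bound on word length, including the '\0' */

/* Hypothetical return codes, chosen for this sketch only */
enum { WORDS_SAME = 0, FIRST_ENDED = -1, SECOND_ENDED = 1, WORDS_DIFFER = 2 };

/* Compare two open files word by word. Any run of spaces, tabs or
   newlines counts as a single separator, so a line break in a
   different place does not make the files differ. */
int wordcmp(FILE *fp1, FILE *fp2)
{
    char w1[WORD_MAX];
    char w2[WORD_MAX];

    for (;;) {
        /* %255s skips leading whitespace and reads one word (at most 255 chars) */
        int got1 = fscanf(fp1, "%255s", w1);
        int got2 = fscanf(fp2, "%255s", w2);

        if (got1 != 1 && got2 != 1)
            return WORDS_SAME;      /* both files ran out together */
        if (got1 != 1)
            return FIRST_ENDED;     /* first file has fewer words */
        if (got2 != 1)
            return SECOND_ENDED;    /* second file has fewer words */
        if (strcmp(w1, w2) != 0)
            return WORDS_DIFFER;
    }
}
```

Because both files are read in the same loop iteration, the function can tell which one ran out first, which is the part the EOF check placed after the loop could not do.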

Related

C: Token system tokenize, but doing if statements fail

I have this little function that is supposed to parse tokens.
void LWDL_Parse(LWDL_Data data, LWDL_State state) {
char ch;
LWDL_string contents = "lwdl_data\n";
LWDL_Array tokens;
LWDL_TOOL_INIT_ARRAY( & tokens, 5); // 5 is starting size.
while ((ch = fgetc(state.LWDL_File)) != EOF) {
contents = LWDL_TOOL_AppendCharacters(contents, ch);
}
LWDL_string chunks;
const char remove[4] = " \n";
chunks = strtok(contents, remove);
while (chunks != NULL) {
chunks = strtok(NULL, remove);
if (chunks != NULL){
LWDL_TOOL_INSERT_ARRAY( & tokens, chunks);
}
}
LWDL_TOOL_FREE_ARRAY(&tokens);
}
But if statements on the tokens array or on chunks fail:
if (tokens.array[0] == "token"){
printf("works!\n");
}
Any clue on how to fix this? If I do a for loop over all the elements, they all get parsed correctly.

You need to use a comparison function like strcmp instead of tokens.array[0] == "token", because == on two strings compares the pointers, not the characters they point to.

I expect this:
if (chunks != NULL){
LWDL_TOOL_INSERT_ARRAY( & tokens, chunks);
}
should come before the second call to strtok(); otherwise the first token is ignored.
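Both fixes can be sketched together as follows. The helper names split_tokens and token_equals are hypothetical and stand in for the LWDL_* routines, whose definitions are not shown in the question.

```c
#include <stddef.h>
#include <string.h>

/* Split text in place on spaces and newlines, storing up to max
   token pointers in out. Returns the number of tokens stored.
   The pointers refer to memory inside text, so text must outlive them. */
size_t split_tokens(char *text, char *out[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(text, " \n");

    while (tok != NULL && n < max) {
        out[n++] = tok;              /* store first, so token 0 is kept */
        tok = strtok(NULL, " \n");   /* then advance to the next token */
    }
    return n;
}

/* Compare contents, not addresses: == on two char pointers only
   tests whether they point at the same memory. */
int token_equals(const char *tok, const char *expected)
{
    return strcmp(tok, expected) == 0;
}
```

With this, a check such as token_equals(tokens[0], "token") succeeds whenever the first word of the input is "token", regardless of where the two strings live in memory.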

Seg Fault when reading simple CSV file - C

I am reading a 2 columned csv file into an array of structs:
struct unused_s{
char col1[MAX_ARG_LENGTH];
char col2[MAX_ARG_LENGTH];
};
struct unused_s unused[MAX_USEABLE];
But I am getting a "Segmentation fault: 11" during execution. I have tried my best to debug this myself through reallocation of memory but I'm afraid my abilities are not up to the task. I have, however, pinpointed that the error is occurring somewhere in this section of code:
void readCSV(FILE *file){
int i = 0;
char line[MAX_LINE_LENGTH];
while (fgets(line, 1024, file))
{
char* tmp = strdup(line);
strcpy(unused[i].col1, getunused(tmp, FIRST_COLUMN));
strcpy(unused[i].col2, getunused(tmp, SECOND_COLUMN));
free(tmp);
i++;
}
fclose(file);
}
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
Any help solving this/pointing me in the right direction to solve this myself would be greatly appreciated!

As noted in the comments by John3136, you are returning NULL from getunused(), e.g.
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
From your calls to strtok, it appears you have an input file that will result in tmp similar to:
tmp = "somevalue; othervalue\n"
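After your first call to getunused(), strtok will have replaced the first delimiter in tmp with a nul-character in order to tokenize the string, so tmp will now contain:
tmp = "somevalue\0 othervalue\n"
When you call getunused(tmp, SECOND_COLUMN) (where SECOND_COLUMN is presumably 2), strtok(tmp, ";") sees only "somevalue", because the string now ends at that nul-character. !--n tests false for that token, the next strtok call returns NULL, and the loop ends, so getunused() returns NULL. Passing that NULL to strcpy() is what causes the segmentation fault.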
Why Tokenize?
Rarely will you need to tokenize fields from a .csv file (or in your case a semicolon-separated file). Why? That is the whole purpose of a separated values file -- so you can read the file as input using a formatted input function to separate the fields rather than tokenizing on delimiters (which you can do -- it's just not generally necessary). In your case, if your .csv file format is as set out above, then you can eliminate getunused entirely and simply use sscanf to separate the input strings, e.g.
void readCSV (FILE *file) {
    int i = 0;
    char line[MAX_LINE_LENGTH];
    /* stop when the array is full or the file ends */
    while (i < MAX_USEABLE && fgets(line, sizeof line, file))
        /* the literal ';' in the format consumes the separator */
        if (sscanf (line, "%49[^;]; %49[^;\n]", unused[i].col1, unused[i].col2) == 2)
            i++;
    fclose(file);
}
(note: as in my comment, you should include the field-width modifier of MAX_ARG_LENGTH-1 (the number) as part of your format-specifier -- as edited above after your last comment)
Also, if your second value is terminated by a '\n', then drop the ';' from the character class, e.g. %49[^\n] will do for the 2nd value.
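To try the format string on its own, the parsing step can be pulled out into a small function. This is only a sketch: parse_pair and FIELD_LEN are names invented here, and it assumes each line looks like "somevalue; othervalue" with fields shorter than 50 characters.

```c
#include <stdio.h>

#define FIELD_LEN 50   /* assumed field size; the widths below must be FIELD_LEN - 1 */

/* Split "first;second\n" into two fields.
   Returns 1 on success, 0 if the line does not contain both fields. */
int parse_pair(const char *line, char col1[FIELD_LEN], char col2[FIELD_LEN])
{
    /* %49[^;]   : up to 49 characters that are not ';'
       ;         : the literal separator
       (space)   : skips optional whitespace after the separator
       %49[^;\n] : up to 49 characters up to the next ';' or newline */
    return sscanf(line, "%49[^;]; %49[^;\n]", col1, col2) == 2;
}
```

Checking the return value is what protects the caller here: a line with only one field makes the function return 0 instead of handing a NULL pointer to strcpy.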

Access the next word/string

I have a simple C program that reads a file line by line, tokenizes each line, and prints the current token. My problem is that I want to print the next token if some conditions are satisfied. Do you have any idea how to do it? I really need your help for this project. Thank you.
Here is the code:
main(){
    FILE *input;
    FILE *output;
    //char filename[100];
    const char *filename = "sample1.txt";
    input=fopen(filename,"r");
    output=fopen("test.st","w");
    char word[1000];
    char *token;
    int num =0;
    char var[100];
    fprintf(output,"LEXEME, TOKEN");
    while( fgets(word, 1000, input) != NULL ){ //reads a line
        token = strtok(word, " \t\n" ); // tokenize the line
        while(token!=NULL){ // while line is not equal to null
            fprintf(output,"\n");
            if (strcmp(token,"SIOL")==0)
                fprintf(output,"SIOL, SIOL", token);
            else if (strcmp(token,"DEFINE")==0)
                fprintf(output,"DEFINE, DEFINE", token);
            else if (strcmp(token,"INTEGER")==0){
                fprintf(output,"INTEGER, INTEGER");
                strcpy(var,token+1);
                fprintf(output,"\n%s,Ident",var);
            }
            else{
                printf("%s\n", token);
            }
            token = strtok(NULL, " \t\n" ); //tokenize the word
        }
    }
    fclose(output);
    return 0;
}

Continuing from my comment. I'm not sure I completely understand what you need, but if you have the string:
"The quick brown fox";
And, you want to tokenize the string, printing the next word, only if a condition concerning the current word is met, then you need to adjust your thinking just a bit. In your example, you want to print the next word "quick", only if the current word is "The".
The adjustment in thinking is how you look at the test. Instead of thinking about printing the next word if the current matches some condition, you need to save the last word, and only print the current if the last word matches some condition -- "The" in your example.
To handle that situation, you can use a fixed-size character array of at least 47 characters (the longest word in Merriam-Webster's Unabridged Dictionary has 46 characters). I'll use 48 in the example below. You may be tempted to just save a pointer to the last word, but a token returned by strtok points into the buffer being tokenized, and in your program that buffer is overwritten each time fgets reads a new line -- so make a copy of the word.
Putting the pieces together, you could do something like the following. It saves the prior token in last and then compares the current word to the last and prints the current word if last == "The":
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXW 48
int main (void) {
    char str[] = "The quick brown fox";
    char last[MAXW] = {0};
    char *p;
    for (p = strtok (str, " "); p; p = strtok (NULL, " "))
    {
        if (*last && strcmp (last, "The") == 0)
            printf (" '%s'\n", p);
        /* copy at most MAXW-1 chars so last[MAXW-1] stays '\0' */
        strncpy (last, p, MAXW - 1);
    }
    return 0;
}
Output
$ ./bin/str_chk_last
'quick'
Let me know if you have any questions.
Test Explanation
As written in the comment *last is simply shorthand for last[0]. So the first part of the test, *last is just testing if ((last[0] != 0) && ... Since last was initially declared and initialized:
char last[MAXW] = {0};
All chars in last are 0 for the first pass through the loop. By including the check last[0] != 0, that just causes the printf to be skipped the first time the for loop executes. The longhand for the test would look like:
if ((last[0] != 0) && strcmp (last, "The") == 0)
printf (" '%s'\n", p);
Which in pseudo code just says:
if (NOT first iteration && last == "The")
printf (" '%s'\n", p);
Let me know if that doesn't make sense.

This is easy to achieve with the strtok function. If you pass a null pointer as the first argument, the function continues scanning the same string from where the previous successful call ended. So if you need the next token, just call
char* token = strtok(NULL, delimiters);
See the small example below
#include <stdio.h>
#include <string.h>
int main(void)
{
    char str[] = "The quick brown fox";
    // split str by space
    char* token = strtok(str, " ");
    // if a token is found
    if(token != NULL) {
        // print current token
        printf("%s\n", token);
        // if token is "The"
        if(strcmp(token, "The") == 0) {
            // fetch the next token; it is NULL if "The" was the last word
            char* next = strtok(NULL, " ");
            if(next != NULL)
                printf("%s\n", next);
        }
    }
    return 0;
}
The output will be
The
quick

Segmentation fault on line with fgets() - C

I have this code in my program:
char* tok = NULL;
char move[100];
if (fgets(move, 100, stdin) != NULL)
{
/* then split into tokens using strtok */
tok = strtok(move, " ");
while (tok != NULL)
{
printf("Element: %s\n", tok);
tok = strtok(NULL, " ");
}
}
I have tried adding printf statements before and after fgets, and the one before gets printed, but the one after does not.
I cannot see why this fgets call is causing a segmentation failure.
If someone has any idea, I would much appreciate it.
Thanks
Corey

The strtok runtime function works like this.
The first time you call strtok, you provide the string that you want to tokenize:
char s[] = "this is a string";
In the above string, space seems to be a good delimiter between words, so let's use that:
char* p = strtok(s, " ");
What happens now is that s is searched until a space character is found. That space is overwritten with '\0' and the first token ('this') is returned, so p points to that token (string).
To get the next token and continue with the same string, pass NULL as the first argument, since strtok maintains a static pointer into the string you passed previously:
p = strtok(NULL," ");
p now points to 'is'.
This continues until no more spaces can be found; the rest of the string is then returned as the last token, 'string', and the call after that returns NULL.
More conveniently, you could write it like this to print out all tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDITED HERE:
If you want to store the returned values from strtok you need to copy the token to another buffer e.g. strdup(p); since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.
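A minimal sketch of that copying step is shown below. It uses malloc and memcpy instead of strdup, because strdup comes from POSIX rather than from older versions of standard C. The name copy_tokens and its signature are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Tokenize line in place on spaces and store a heap copy of each
   token in out (at most max). Returns the number of copies made;
   the caller is responsible for freeing each one. */
size_t copy_tokens(char *line, char *out[], size_t max)
{
    size_t n = 0;

    for (char *p = strtok(line, " "); p != NULL && n < max; p = strtok(NULL, " ")) {
        size_t len = strlen(p) + 1;      /* include the terminating '\0' */
        char *copy = malloc(len);
        if (copy == NULL)
            break;                       /* out of memory: stop early */
        memcpy(copy, p, len);
        out[n++] = copy;
    }
    return n;
}
```

The copies stay valid even if the original buffer is later overwritten, for example by the next fgets call.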

tokenizing a string twice in c with strtok()

I'm using strtok() in C to parse a CSV string. First I tokenize it just to find out how many tokens there are, so I can allocate an array of the correct size. Then I go through it again using the same variable I used for the first tokenization. Every time I do it the second time, though, strtok(NULL, ",") returns NULL even though there are still more tokens to parse. Can somebody tell me what I'm doing wrong?
char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
count++;
tok = strtok(NULL, ",");
}
//allocate array
tok = strtok(buffer, ",");
while(tok != NULL) {
//do other stuff
tok = strtok(NULL, ",");
}
So on that second while loop it always ends after the first token is found even though there are more tokens. Does anybody know what I'm doing wrong?

strtok() modifies the string it operates on, replacing delimiter characters with nulls. So if you want to use it more than once, you'll have to make a copy.
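One way to follow that advice is to do the counting pass on a temporary copy, so the original buffer is still intact for the second pass. This is a sketch under that assumption; count_tokens is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Count the tokens in csv without modifying it, by tokenizing a
   temporary copy. Returns 0 if the copy cannot be allocated. */
size_t count_tokens(const char *csv, const char *delims)
{
    size_t len = strlen(csv) + 1;
    char *copy = malloc(len);
    size_t count = 0;

    if (copy == NULL)
        return 0;
    memcpy(copy, csv, len);

    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

After calling count_tokens(buffer, ","), the array can be allocated and buffer tokenized with strtok as in the question's second loop, since no delimiter in buffer has been overwritten yet.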

There's not necessarily a need to make a copy - strtok() does modify the string it's tokenizing, but in most cases that simply means the string is already tokenized if you want to deal with the tokens again.
Here's your program modified a bit to process the tokens after your first pass:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    int i;
    char buffer[] = "some, string with , tokens";
    char* tok;
    int count = 0;
    tok = strtok(buffer, ",");
    while(tok != NULL) {
        count++;
        tok = strtok(NULL, ",");
    }
    // walk through the tokenized buffer again,
    // skipping any delimiters before the first token
    tok = buffer + strspn(buffer, ",");
    for (i = 0; i < count; ++i) {
        printf( "token %d: \"%s\"\n", i+1, tok);
        if (i + 1 < count) {          // don't step past the end after the last token
            tok += strlen(tok) + 1;   // get the next token by skipping past the '\0'
            tok += strspn(tok, ",");  // then skipping any starting delimiters
        }
    }
    return 0;
}
Note that this is unfortunately trickier than I first posted - the call to strspn() needs to be performed after skipping the '\0' placed by strtok() since strtok() will skip any leading delimiter characters for the token it returns (without replacing the delimiter character in the source).

Use strsep - it actually updates your pointer. Instead of passing NULL on every call after the first, as strtok requires, you pass the address of your string pointer each time. The only catch is that strsep advances that pointer, so if the string was allocated on the heap, keep a separate pointer to the beginning so you can free it later. Note that strsep is a BSD/glibc extension rather than standard C, and unlike strtok it returns an empty token between adjacent delimiters.
char *strsep(char **stringp, const char *delim);
char *string = buffer;
char *token;
token = strsep(&string, ",");
strtok is used in your normal intro to C course - use strsep, it's much better. :-)
No more getting confused about having to pass NULL because strtok is tracking the position for you.
