C: Token system tokenize, but doing if statements fail - arrays

I have this little function that is supposed to parse tokens.
void LWDL_Parse(LWDL_Data data, LWDL_State state) {
char ch;
LWDL_string contents = "lwdl_data\n";
LWDL_Array tokens;
LWDL_TOOL_INIT_ARRAY( & tokens, 5); // 5 is starting size.
while ((ch = fgetc(state.LWDL_File)) != EOF) {
contents = LWDL_TOOL_AppendCharacters(contents, ch);
}
LWDL_string chunks;
const char remove[4] = " \n";
chunks = strtok(contents, remove);
while (chunks != NULL) {
chunks = strtok(NULL, remove);
if (chunks != NULL){
LWDL_TOOL_INSERT_ARRAY( & tokens, chunks);
}
}
LWDL_TOOL_FREE_ARRAY(&tokens);
}
But if statements that test the tokens array or chunks fail:
if (tokens.array[0] == "token"){
printf("works!\n");
}
Any clue how to fix this? If I loop over all the elements with a for loop, they all get parsed correctly.

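You need to use a comparison function such as strcmp() instead of tokens.array[0] == "token". The == operator compares the two pointers, not the characters they point to, so the test is false even when the text matches. strcmp() returns 0 when the strings are equal.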

I expect that this:
if (chunks != NULL){
LWDL_TOOL_INSERT_ARRAY( & tokens, chunks);
}
should come before the second call to strtok() (the one inside the loop);
otherwise the first token is never inserted.

Related

C Programming - Space Character Not Detected

Mainly a Java/Python coder here. I am coding a tokenizer for an assignment. (I explicitly cannot use strtok().) The code below is meant to separate the file text into lexemes (aka words and notable characters).
char inText[256];
fgets(inText, 256, inf);
char lexemes[256][256];
int x = 0;
char string[256] = "\0";
for(int i=0; inText[i] != '\0'; i++)
{
char delims[] = " (){}";
char token = inText[i];
if(strstr(delims, &inText[i]) != NULL)
{
if(inText[i] == ' ') // <-- Problem Code
{
if(strlen(string) > 0)
{
strcpy(lexemes[x], string);
x++;
strcpy(string, "\0");
(*numLex)++;
}
}
else if(inText[i] == '(')
{
if(strlen(string) > 0)
{
strcpy(lexemes[x], string);
x++;
strcpy(string, "\0");
(*numLex)++;
}
strcpy(lexemes[x], &token);
x++;
(*numLex)++;
}
else
{
strcpy(lexemes[x], &token);
x++;
(*numLex)++;
}
}
else
{
strcat(string, (char[2]){token});
}
}
For some odd reason, my code cannot recognize the space character as ' ', as 32, or by using isspace(). There are no error messages, and I have confirmed that the code is reaching the space in the text.
This is driving me insane. Does anyone have any idea what is happening here?
You are using the function strstr incorrectly.
if(strstr(delims, &inText[i]) != NULL)
This call searches the string " (){}" for the entire string that starts at &inText[i], that is, the whole rest of the line and not just the single character inText[i]. It matches only when that whole remainder occurs inside delims.
Instead you can use another function, strcspn, which returns the number of leading characters that are not in delims.
Something like
i += strcspn( &inText[i], delims );
or you can introduce another variable, for example
size_t n = strcspn( &inText[i], delims );
depending on the processing logic you are going to follow.
Or, more probably, you need the function strchr, which looks for a single character:
if(strchr( delims, inText[i]) != NULL)

C - Program not detecting blank spaces

I want my program to read a file containing words separated by blank spaces and then print the words one by one. This is what I did:
char *phrase = (char *)malloc(LONGMAX * sizeof(char));
char *mot = (char *)malloc(TAILLE * sizeof(char));
FILE *fp = NULL;
fp = fopen("mots.txt", "r");
if (fp == NULL) {
printf("err ");
} else {
fgets(phrase, LONGMAX, fp);
while (phrase[i] != '\0') {
if (phrase[i] != " ") {
mot[m] = phrase[i];
i++;
m++;
} else {
printf("%s\n", phrase[i]);
mot = "";
}
}
}
but it isn't printing anything! Am I doing something wrong? Thanks!
The i in the following:
while (phrase[i]!='\0'){
should be initialized to 0 before being used, then incremented as you iterate through the string.
You have not shown where or how it is declared.
Also, in this line,
if(phrase[i]!=" "){
the code is comparing a char (phrase[i]) with a string (" "):
// char string
if(phrase[i] != " " ){
Change it to:
// char char
if(phrase[i] != ' '){
// or better yet, treat all whitespace as a separator:
if(!isspace((unsigned char)phrase[i])) {
The following is basically your code with modifications. Read the comments for explanations of the edits: how fgets() is used, why the return of malloc() is not cast, how and when to terminate the output buffer mot, and so on.
It performs the following: reads a file containing words separated by blank spaces and then prints the words one by one.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
// LONGMAX and TAILLE are assumed to be defined as in the question
int main(void)
{
    int i = 0;
    int m = 0;
    char* phrase=malloc(LONGMAX);//sizeof(char) always == 1
    if(phrase)//test to make sure memory created
    {
        char* mot=malloc(TAILLE);//no need to cast the return of malloc in C
        if(mot)//test to make sure memory created
        {
            FILE* fp=fopen("_in.txt","r");
            if(fp)//test to make sure fopen worked
            {//shortcut of what you had :) (left off the print err)
                while (fgets(phrase,LONGMAX,fp))//fgets returns NULL when no more to read.
                {
                    i = 0;//start at the beginning of each new line
                    while(phrase[i] != '\0')//test for end of last line read
                    {
                        // if(phrase[i] == ' ')//see a space, terminate word and write to stdout
                        if(isspace((unsigned char)phrase[i]))//see ANY white space, terminate and write to stdout
                        {
                            mot[m]=0;//null terminate
                            if(m > 0) printf("%s\n",mot);
                            m=0;//reset to capture next word
                        }
                        else if(m < TAILLE - 1)//leave room for the terminator
                        {
                            mot[m] = phrase[i];//copy next char into mot
                            m++;
                        }
                        i++;//move to next char in phrase
                    }
                }
                //per comment about last word: print it here if the file did not end in whitespace
                mot[m]=0;
                if(m > 0) printf("%s\n",mot);
                fclose(fp);
            }
            free(mot);
        }
        free(phrase);
    }
    return 0;
}
phrase[i]!=" "
You compare a character (phrase[i]) with a string (" "). If you want to compare phrase[i] with the space character, use ' ' instead.
If you want to compare strings, use strcmp.
printf("%s\n",phrase[i]);
Here, you use %s, which prints a string, but phrase[i] is a character. Use %c for a single character, or pass mot to print the word collected so far.
Do not use mot=""; to empty a string in C: it makes mot point at a string literal, so the malloc'ed buffer is lost and later writes through mot are undefined behavior. Use strcpy instead:
strcpy(mot, "");
If you want to print a line word by word, you can use strtok to split the string on the space character.
fgets(phrase,LONGMAX,fp);
char * token = strtok(phrase, " ");
while(token != NULL) {
printf("%s \n", token);
token = strtok(NULL, " ");
}
Off topic: your program reads only one line of the file because you call fgets only once. If the file contains many lines, call fgets in a loop.
while(fgets(phrase,LONGMAX,fp)) {
// do something with the phrase string.
// strtok for example.
char * token = strtok(phrase, " ");
while(token != NULL) {
printf("%s \n", token);
token = strtok(NULL, " ");
}
}
Your program has multiple problems:
- The test for end of file is incorrect: you should just compare the return value of fgets() with NULL.
- The test for spaces is incorrect: phrase[i] != " " is a type mismatch, as you are comparing a character with a pointer. You should use isspace() from <ctype.h>.
Here is a much simpler alternative that reads one byte at a time, with neither a line buffer nor a word buffer:
#include <ctype.h>
#include <stdio.h>
int main() {
int inword = 0;
int c;
while ((c = getchar()) != EOF) {
if (isspace(c)) {
if (inword) {
putchar('\n');
inword = 0;
}
} else {
putchar(c);
inword = 1;
}
}
if (inword) {
putchar('\n');
}
return 0;
}

C Reading a file of digits separated by commas

I am trying to read in a file that contains digits separated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (fscanf(fp1, "%d,", &num) == 1 && i < 300)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be useful if you have several different types of fields that require different actions, as in a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
    back one byte in the file
    read number, regardless of length
else continue
The added condition i<300 is for safety: it keeps the code from writing past the end of list. If you really want to check that the file contains nothing other than commas and newlines (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
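// needs <errno.h>, <stdio.h>, <stdlib.h> and <string.h>
int main(int argc, char **argv)
{
    if (argc < 2) {
        return 1;
    }
    // "e" (close-on-exec) is a glibc extension; use "r" for portable code
    FILE *file = fopen(argv[1], "re");
    int list[300];
    if (file == NULL) {
        return 1;
    }
    char *line;
    int numc = 0;
    while ((line = readline(file)) != NULL) {
        char *saveptr;
        // Get the first token
        char *tok = strtok_r(line, ",", &saveptr);
        // Now start parsing the whole line
        while (tok != NULL && numc < 300) {
            // Convert the token to a long if possible
            char *end;
            errno = 0; // strtol only sets errno on failure, so clear it first
            long num = strtol(tok, &end, 10); // base 10, so a leading 0 is not read as octal
            if (errno != 0 || end == tok) {
                // Handle out-of-range value or no conversion
                // ...
                // ...
            }
            list[numc++] = (int) num;
            // Get next token
            tok = strtok_r(NULL, ",", &saveptr);
        }
        free(line);
    }
    fclose(file);
    return 0;
}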
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");

strtok changing value of pointer

I have the following code:
char* pathTokens;
char* paths;
paths = getFilePaths();
//printf("%s", paths);
pathTokens = strtok(paths, "\n");
updateFile(pathTokens, argv[1]);
and these variables in the same file as updateFile():
static FILE* file;
static char content[1024];
static char* token;
static int numChanges = 0;
static char newContent[1024];
Here is updateFile():
void updateFile(char pathTokens[], char searchWord[]) {
while(pathTokens != NULL) {
printf("Token: %s\n", pathTokens);
updateNewContent(pathTokens, searchWord);
pathTokens = strtok(NULL, "\n");
}
}
and updateNewContent():
static void updateNewContent(char fileName[], char searchWord[]) {
if(searchWord == NULL) {
printf("Please enter a word\n");
return;
}
numChanges = 0;
file = fopen(fileName, "r");
if(file == NULL) {
printf("Error opening file\n");
return;
}
while(fgets(content, 1024, file) != NULL) {
token = strtok(content, " ");
}
fclose(file);
}
Whenever token = strtok(content, " "); is called, the value of pathTokens changes. If I comment it out, pathTokens keeps its original values. I don't want pathTokens to change, so why is strtok modifying it?
You are nesting strtok calls, and strtok doesn't work like that. For nested
calls you have to use strtok_r.
Also, when calling strtok, the source argument must be passed only the first
time; for all subsequent calls, NULL has to be passed. When you call strtok
again with a non-NULL argument, strtok "forgets" its previous state and
"restarts" parsing the new content.
In updateNewContent you are doing:
while(fgets(content, 1024, file) != NULL) {
token = strtok(content, " ");
}
strtok will forget about paths (the very first call). Also this loop is
pointless, you read a line, you split it for the first time, and then read the
next line, split it again, etc. You are doing nothing with token. When the
loop ends token will store the first word of the last line.
And then the function returns and you do
pathTokens = strtok(NULL, "\n");
Because you call it with NULL, it will continue parsing the contents
pointed to by content, which seems to be a global variable.
whenever token = strtok(content, " "); is called, the value of pathTokens changes
Of course it does: after updateNewContent returns, you assign a new value to
it. What else did you expect?
I really don't know what you are trying to do here; to me that makes no sense.
If you need to call strtok on a token that was previously returned by another
strtok, then you have to use strtok_r.
Here is an example of how to nest strtok_r calls:
char line[] = "a:b:c,d:e:f,x:y:z";
char *s1, *s2, *token1, *token2, *in1, *in2;
in1 = line;
while(token1 = strtok_r(in1, ",", &s1))
{
in1 = NULL; // for subsequent calls
in2 = token1;
printf("First block: %s\n", token1);
while(token2 = strtok_r(in2, ":", &s2))
{
in2 = NULL; // for subsequent calls
printf(" val: %s\n", token2);
}
}
Output:
First block: a:b:c
val: a
val: b
val: c
First block: d:e:f
val: d
val: e
val: f
First block: x:y:z
val: x
val: y
val: z
If you use the strtok() function, it means that you want to divide your input into tokens. So when you give it input such as strtok(pathtokens,""), it divides that input into tokens, even though pathtokens is only a pointer variable.

How would i Use strtok to compare word by word

I've been reading up on strtok and thought it would be the best way for me to compare two files word by word. So far I can't really figure out how I would do it, though.
Here is my function that performs it:
int wordcmp(FILE *fp1, FILE *fp2)
{
char *s1;
char *s2;
char *tok;
char *tok2;
char line[BUFSIZE];
char line2[BUFSIZE];
char comp1[BUFSIZE];
char comp2[BUFSIZE];
char temp[BUFSIZE];
int word = 1;
size_t i = 0;
while((s1 = fgets(line,BUFSIZE, fp1)) && (s2 = fgets(line2,BUFSIZE, fp2)))
{
;
}
tok = strtok(line, " ");
tok2 = strtok(line, " ");
while(tok != NULL)
{
tok = strtok (NULL, " ");
}
return 0;
}
Don't mind the unused variables; I've been at this for 3 hours and have tried every way I can think of to compare the values of the first and second strtok. Also, I would like to know how I would check which file reaches EOF first.
When I tried
if(s1 == EOF && s2 != EOF)
{
return -1;
}
It returns -1 even when the files are the same! Is it because in order for it to reach the if statement outside of the loop both files have reached EOF which makes the program always go to this if statement?
Thanks in advance!
If you want to check whether the files are the same, try the following (with s1 and s2 declared as int, because fgetc returns an int):
do {
    s1 = fgetc(fp1);
    s2 = fgetc(fp2);
    if (s1 != s2) {
        return -1; // RETURN FALSE
    }
    if (s1 == EOF) {
        return 1; // RETURN TRUE
    }
} while (1);
Good Luck :)
When you use strtok() you typically use code like this:
tok = strtok(line, " ");
while (NULL != tok)
{
tok = strtok(NULL, " ");
}
The NULL in the call in the loop tells strtok to continue from after the previously found token until it finds the null terminating character in the value you originally passed (line) or until there are no more tokens. The current pointer is stored in the run time library, and once strtok() returns NULL to indicate no more tokens any more calls to strtok() using NULL as the first parameter (to continue) will result in NULL. You need to call it with another value (e.g. another call to strtok(line, " ")) to get it to start again.
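What this means is that to use strtok on two different strings at the same time you need to manually update the string position and pass in a modified value on each call. Take care not to step past the end of the buffer after its last token.
char *end = line + strlen(line); /* compute before the first strtok call */
char *end2 = line2 + strlen(line2);
tok = strtok(line, " ");
tok2 = strtok(line2, " ");
while (NULL != tok && NULL != tok2)
{
    /* Do stuff with tok and tok2 here */
    if (strcmp(tok, tok2) != 0) { /* the words differ */ }
    /* Find the terminator that strtok wrote after each token */
    char *next = tok + strlen(tok);
    char *next2 = tok2 + strlen(tok2);
    /* Get next token, restarting strtok just after the previous one */
    tok = (next < end) ? strtok(next + 1, " ") : NULL;
    tok2 = (next2 < end2) ? strtok(next2 + 1, " ") : NULL;
}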
You'll still need to add logic for determining whether lines are different: you've not said whether the files are equivalent if a line break occurs at a different position but the words surrounding it are the same. I assume they should be, given your description, but it makes the logic more awkward, as you only need to perform the initial fgets() and strtok() for a file if you don't already have a token. You also need to look at how the files are read in. Currently your first while loop just reads lines until the end of the file without processing them.
