Validating a CSV file with strtok()

I need to validate that a text file is in CSV format (i.e. that each digit is separated by a comma).
From reading online, people seem to have conflicting views about it, but is strtok() the best way to do this?
Any help would be great.

Your input seems so easy that I would probably just use a loop around fgetc(3); I'll sketch some pseudo-code here:
FILE *fp = fopen("file", "r");
if (fp == NULL) {
    perror("file");
    exit(EXIT_FAILURE);
}
int c;
while ((c = fgetc(fp)) != EOF) {
    switch (c) {
    case '0':
    case '1':
    /* so on */
    case '9':
        handle_digit(c);
        break;
    case ',':
        handle_comma();
        break;
    case '\n':
        handle_newline();
        break;
    default:
        fprintf(stderr, "mistaken input %c\n", c);
        break;
    }
}
fclose(fp);
You'll have to manage the input in the functions in a manner that may be a bit awkward if you're used to higher-level languages such as Ruby or Python where you'd just run line.split(',') to get a list of numbers, but that is pretty idiomatic C.
Of course, if this were a real problem, I'd probably prefer flex and bison, and write a tiny lexer and grammar, mostly because it would be a lot easier to extend in the future as needs change.
Update
With some additional criteria to check, the handle_{digit,comma,newline}() routines are easier to sketch. I'll sketch using global variables, but you could just as easily stuff these into a struct and pass them around from function to function:
enum seen {
    NEWLINE,
    COMMA,
    DIGIT,
};
enum seen last_seen = NEWLINE;
void handle_digit(int c) {
    if (last_seen == DIGIT) {
        /* error if numbers cannot have multiple digits
           or construct a larger number if numbers can have
           multiple digits */
    } else if (last_seen == COMMA || last_seen == NEWLINE) {
        /* start a new entry */
    }
    last_seen = DIGIT;
}
void handle_comma(void) {
    if (last_seen == COMMA) {
        /* error */
    } else if (last_seen == NEWLINE) {
        /* error */
    } else if (last_seen == DIGIT) {
        /* end previous field */
    }
    last_seen = COMMA;
}
void handle_newline(void) {
    if (last_seen == NEWLINE) {
        /* error */
    } else if (last_seen == COMMA) {
        /* error */
    } else if (last_seen == DIGIT) {
        /* end previous field */
    }
    last_seen = NEWLINE;
}
Add whichever checks you need to validate the contents according to whichever rules you have. You might wish to standardize the order and contents of the tests to ensure that you never forget one, even if it means you write a /* nop */ comment once or twice to remind yourself that something is fine.
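As a rough illustration of the "stuff these into a struct" variant, here is a minimal sketch. The names csv_state, csv_init, csv_feed and csv_text_is_valid are made up for this example, and it assumes the strictest reading of the rules: exactly one digit per field, no empty fields, no empty lines.

```c
#include <stdbool.h>
#include <stddef.h>

/* What the previous character was (same idea as last_seen above). */
enum seen { NEWLINE, COMMA, DIGIT };

/* Hypothetical state bundle that replaces the global variable. */
struct csv_state {
    enum seen last_seen;
    bool ok;
};

static void csv_init(struct csv_state *st)
{
    st->last_seen = NEWLINE;
    st->ok = true;
}

/* Feed one character; clears st->ok on the first rule violation. */
static void csv_feed(struct csv_state *st, int c)
{
    if (c >= '0' && c <= '9') {
        /* Assumed rule: one digit per field, so two digits in a row fail. */
        if (st->last_seen == DIGIT)
            st->ok = false;
        st->last_seen = DIGIT;
    } else if (c == ',') {
        /* A comma must follow a digit. */
        if (st->last_seen != DIGIT)
            st->ok = false;
        st->last_seen = COMMA;
    } else if (c == '\n') {
        /* A line must end on a digit: no empty lines, no trailing comma. */
        if (st->last_seen != DIGIT)
            st->ok = false;
        st->last_seen = NEWLINE;
    } else {
        st->ok = false; /* any other character is rejected */
    }
}

/* Convenience wrapper that runs the checks over an in-memory string. */
static bool csv_text_is_valid(const char *text)
{
    struct csv_state st;
    csv_init(&st);
    for (size_t i = 0; text[i] != '\0'; i++)
        csv_feed(&st, (unsigned char)text[i]);
    /* Input that stops right after a comma is incomplete. */
    return st.ok && st.last_seen != COMMA;
}
```

In the fgetc() loop from earlier you would call csv_feed(&st, c) for each character instead of the switch, then inspect st.ok once EOF is reached.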

Related

Problems when trying to skip '\n' in reading txt files

I wrote a fairly small program to help with formatting txt files, but when I tried to read from the input file and skip unwanted '\n' characters, I actually skipped the character after each '\n' instead.
The contents of the sample file look like this:
abcde
abc
ab
abcd
And my code looks like this:
while (!feof(fp1)) {
ch = fgetc(fp1);
if (ch != '\n') {
printf("%c",ch);
}
else {
ch = fgetc(fp1); // move to the next character
if (ch == '\n') {
printf("%c",ch);
}
}
}
The expected result is
abcdeabc
ababcd
But I actually got
abcdebc
abbcd
I guess the problem is in ch = fgetc(fp1); // move to the next character, but I just can't find a correct way to implement this idea.

Think of the flow of your code (lines numbered below):
1: while (!feof(fp1)) {
2: ch = fgetc(fp1);
3: if (ch != '\n') {
4: printf("%c",ch);
5: }
6: else {
7: ch = fgetc(fp1); // move to the next character
8: if (ch == '\n') {
9: printf("%c",ch);
10: }
11: }
12: }
When you get a newline followed by non-newline, the flow is (starting at the else line): 6, 7, 8, 10, 11, 12, 1, 2.
It's that execution of the final 2 in that sequence that effectively throws away the non-newline character that you had read at 7.
If your intent is to basically throw away single newlines and convert sequences of newlines (two or more) to a single one(a), you can use something like the following pseudo-code:
set numNewlines to zero
while not end-file:
    get thisChar
    if numNewlines is one or thisChar is not newline:
        output thisChar
    if thisChar is newline:
        increment numNewlines
    else:
        set numNewlines to zero
This reads the character in one place, making it less likely that you'll inadvertently skip one due to confused flow.
It also uses the newline history to decide what gets printed. It only outputs a newline on the second occurrence in a sequence of newlines, ignoring the first and any after the second.
That means that a single newline will never be echoed and any group of two or more will be transformed into one.
Some actual C code that demonstrates this(b) follows:
#include <stdio.h>
#include <stdbool.h>
int main(void) {
    // Open file.
    FILE *fp = fopen("testprog.in", "r");
    if (fp == NULL) {
        fprintf(stderr, "Cannot open input file\n");
        return 1;
    }
    // Process character by character.
    int numNewlines = 0;
    while (true) {
        // Get next character, stop if none left.
        int ch = fgetc(fp);
        if (ch == EOF) break;
        // Output only second newline in a sequence of newlines,
        // or any non-newline.
        if (numNewlines == 1 || ch != '\n') {
            putchar(ch);
        }
        // Manage sequence information.
        if (ch == '\n') {
            ++numNewlines;
        } else {
            numNewlines = 0;
        }
    }
    // Finish up cleanly.
    fclose(fp);
    return 0;
}
(a) It's unclear from your question how you want to handle sequences of three or more newlines so I've had to make an assumption.
(b) Of course, you shouldn't use this if your intent is to learn, because:
You'll learn more if you try yourself and have to fix any issues.
Educational institutions will almost certainly check submitted code against a web search, and you'll probably be pinged for plagiarism.
I'm just providing it for completeness.

Lexical Analysis in C Language - How to make the asterisk be read and output it in detection of Multi-Line Comment?

I am working on a lexical analysis program. Everything works fine when detecting a single-line comment. This is my code for single-line comment detection:
//Single Comment
if ((Current_Character == '/') && (fgetc(File_Input) == '/')){
printf("%c", Current_Character);
do{
printf ("%c", Current_Character);
Current_Character = fgetc (File_Input);
}while(Current_Character != '\n');
printf("\b \t | COMMENT\n", Current_Character);
i = -1;
Lexeme_Count++;
Comment_Count++;
}
But when I try to detect a multi-line comment, there is a logic error: it cannot detect the opening asterisk. Here is my code for multi-line comment detection:
//Multi-Line Comment
if((Current_Character == '/') && (fgetc(File_Input) == '*')){
printf ("%c", fgetc(File_Input));
do{
printf ("%c", Current_Character);
Current_Character = fgetc(File_Input);
}while(Current_Character != '/');
printf("\b | COMMENT\n", Current_Character);
i = -1;
Lexeme_Count++;
Comment_Count++;
}
Current_Character holds the first character of the multi-line comment opener, which is a forward slash, and the second character, fetched with fgetc(File_Input) (the next character from the file), is the opening asterisk.
This is the content of the file I inputted:
#include <conio.h>
{
int a[3],t1,t2;
t1=2; a[0]=1; a[1]=2; a[t1]=3;
t2=
-
(a[2]+t1*6)/(a[2]
-
t1);
if t2>5 then
print(t2);
else {
int t3;
t3=99;
t2=
-
25;
print(
-
t1+t2*t3); // this is a comment on 2 lines
} endif /* THIS IS A MUTLI-LINE COMMENT ON 2 LINES
*/ }
This is my current output

You have:
if((Current_Character == '/') && (fgetc(File_Input) == '*')){
printf ("%c", fgetc(File_Input));
do{
printf ("%c", Current_Character);
Current_Character = fgetc(File_Input);
}while(Current_Character != '/');
The first printf() should be printing the character returned by the fgetc(), which you know to be a *, so you could use putchar('*'); or (if you really insist) printf("%c", '*') or printf("*").
Note that you've got another problem lurking:
x = a/b;
It isn't clear which of your comment blocks executes first, but both of them lose the b after the division. There are numerous other subtleties in comment detection in C — I won't bore you with them all, but suffice to say "it is hard work removing comments in C" (and harder still in C++). One of the issues you're not addressing is unexpected EOF (end of file).
You probably need a peek() function to look at the next character without consuming it:
int peek(FILE *fp)
{
    int c = fgetc(fp);
    if (c != EOF)
        ungetc(c, fp);
    return c;
}
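To show how peek() avoids losing the b in a/b, here is a small sketch of deciding what a '/' means. The enum slash_kind and the function classify_slash are hypothetical names for this example, not part of your program.

```c
#include <stdio.h>

/* Same helper as above: look at the next character without consuming it. */
static int peek(FILE *fp)
{
    int c = fgetc(fp);
    if (c != EOF)
        ungetc(c, fp);
    return c;
}

/* Hypothetical result kinds for this sketch. */
enum slash_kind { SLASH_DIVIDE, SLASH_LINE_COMMENT, SLASH_BLOCK_COMMENT };

/*
 * Call right after reading a '/' from fp. The second character is
 * consumed only when it really belongs to a comment opener.
 */
static enum slash_kind classify_slash(FILE *fp)
{
    int next = peek(fp);
    if (next == '/') {
        fgetc(fp);               /* consume the second '/' */
        return SLASH_LINE_COMMENT;
    }
    if (next == '*') {
        fgetc(fp);               /* consume the '*' */
        return SLASH_BLOCK_COMMENT;
    }
    return SLASH_DIVIDE;         /* e.g. the b in a/b stays unread */
}
```

Because the decision is made once, neither comment test can swallow a character that the other test (or the rest of the scanner) still needs.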

This code snippet skips all the characters until it detects the closing */, including the special cases where the comment ends with something like ***/:
int state = 0;
while ((c = getchar()) != EOF) {
    switch (state) {
    case 0:
        switch (c) {
        case '*': state = 1; continue;
        default: /* process as comment char, but ignore */
            continue;
        } /* NOTREACHED */
    case 1:
        switch (c) {
        case '*': continue;
        case '/': /* end comment processing and return */
            state = 0;
            return COMMENT; /* or continue, depending on scanner */
        default: /* any other char returns to state 0 */
            state = 0;
            /* process comment char */
            continue;
        } /* NOTREACHED */
    } /* switch */
} /* while */
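The same two-state idea can be tried out on an in-memory string, which is handy for checking the edge cases. This is only a sketch: skip_block_comment is a made-up name, and it assumes the opening slash-star has already been consumed.

```c
#include <stddef.h>

/*
 * Scan the body of a block comment. s points just past the opening
 * slash-star. Returns a pointer to the first character after the
 * closing star-slash, or NULL if the text ends first.
 */
static const char *skip_block_comment(const char *s)
{
    int state = 0;                /* 0: in comment body, 1: just saw '*' */
    for (; *s != '\0'; s++) {
        if (state == 0) {
            if (*s == '*')
                state = 1;
        } else {
            if (*s == '/')
                return s + 1;     /* comment closed */
            if (*s != '*')
                state = 0;        /* a run of stars stays in state 1 */
        }
    }
    return NULL;                  /* unterminated comment */
}
```

Returning NULL for an unterminated comment gives the caller a place to report the unexpected end of input.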

How to count amount of elements on each line of a file

I'm trying to read some data from a file (ultimately into a structure, but that's not important for now) and to ensure that the file has an equal amount of data on each line. Each line can contain words or numbers, so I've read one line of the file into one big string. I then try to split this string into tokens using strtok with commas (which separate the data) as delimiters. But how can I count the number of tokens that exist between the commas?
I've tried to count the number of commas on each line, but for some reason it is not behaving as I expect. Each line in the file has 5 pieces of data, all separated by commas, so there should be 4 commas on each line.
while (fgets(string, sizeof(string), f)) {
input = fgetc(f);
if(input == ','){
i++;
}
else if (input == ' '){
printf("Error");
exit(0);
}
}
if(i % 4 != 0){
printf("Error");
exit(0);
}
Here I am trying to count the number of commas on every line (and if there's a space on a line, it should show an error, as I ONLY want commas dividing the data). Finally, after fgets stops reading, I want to see if the "i" variable is a multiple of 4. I'm sure there is a more efficient and user-friendly way to do this, but I can't think of one.
Quick question as well: does fgetc run through every character on the line before the rest of the commands continue, or will my program move on to the next loop iteration as soon as a comma is encountered?
Thank you!

To count the commas on each line of a file, you need to know the exact line delimiter used in the file. Then you can iterate over the file until end-of-file is reached and count the commas within each line.
In the following example I assume '\n' is the line delimiter.
#define DESIRED_COMMAS_COUNT 4
int commas_per_line = 0;
bool prev_is_comma = false;
int c;
while ((c = fgetc(f)) != EOF) // read one character and check for end-of-file
{
    switch (c)
    {
    case ',':
        commas_per_line++;
        if (prev_is_comma)
        {
            printf("Two successive commas! Empty element in line\n");
            exit(1);
        }
        prev_is_comma = true;
        if (commas_per_line > DESIRED_COMMAS_COUNT)
        {
            printf("Error: too many commas on line (%d so far).\n", commas_per_line);
            exit(1);
        }
        break;
    case ' ':
        printf("Error: space encountered!\n");
        exit(1);
    case '\n':
        if (commas_per_line != DESIRED_COMMAS_COUNT)
        {
            printf("Error: too few commas (%d)\n", commas_per_line);
            exit(1);
        }
        if (prev_is_comma)
        {
            printf("Line ends with comma: no last element in line\n");
            exit(1);
        }
        commas_per_line = 0;
        break;
    default:
        prev_is_comma = false;
    }
}
if ((commas_per_line != DESIRED_COMMAS_COUNT) && // check comma count for last line in file
    (commas_per_line != 0))
{
    printf("Error: too few commas (%d)\n", commas_per_line);
    exit(1);
}
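If you would rather keep the fgets loop from the question, the line that fgets has already read can be checked directly. Note that in the question's loop, fgetc does not scan the line: it reads a single character from the file, which is the first character after the line fgets just consumed. The sketch below counts fields in one line; count_fields is a hypothetical helper, and it walks the string by hand rather than using strtok, because strtok treats consecutive delimiters as one and so cannot report an empty field.

```c
#include <string.h>

/*
 * Count comma-separated fields in one line (as read by fgets).
 * Returns -1 if the line is empty, contains a space, or has an
 * empty field.
 */
static int count_fields(const char *line)
{
    size_t len = strcspn(line, "\r\n");  /* ignore the line ending */
    int fields = 1;
    size_t field_len = 0;

    if (len == 0)
        return -1;                        /* empty line */
    for (size_t i = 0; i < len; i++) {
        if (line[i] == ' ')
            return -1;                    /* spaces are not allowed */
        if (line[i] == ',') {
            if (field_len == 0)
                return -1;                /* empty field before this comma */
            fields++;
            field_len = 0;
        } else {
            field_len++;
        }
    }
    return field_len == 0 ? -1 : fields;  /* trailing comma: empty last field */
}
```

Inside while (fgets(string, sizeof(string), f)) you would then compare count_fields(string) against 5 for every line, instead of testing a running total at the end.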

Detecting and skipping line comments with Flex

How can I detect one line comments like // in Flex and skip those lines?
Also, for /* comments, will the following snippet be enough?
"/*" { comment(); }
%%
comment()
{
char c, c1;
loop:
while ((c = input()) != '*' && c != 0)
putchar(c);
if ((c1 = input()) != '/' && c != 0)
{
unput(c1);
goto loop;
}
if (c != 0)
putchar(c1);
}

Why don't you just use regular expressions to recognize the comments? The whole point of lex/flex is to save you from having to write lexical scanners by hand. The code you present should work (if you put the pattern /* at the beginning of the line), but it's a bit ugly, and it is not obvious that it will work.
Your question says that you want to skip comments, but the code you provide uses putchar() to print the comment, except for the /* at the beginning. Which is it that you want to do? If you want to echo the comments, you can use an ECHO action instead of doing nothing.
Here are the regular expressions:
Single line comment
This one is easy because in lex/flex, . won't match a newline. So the following will match from // to the end of the line, and then do nothing.
"//".* { /* DO NOTHING */ }
Multiline comment
This is a bit trickier, and the fact that * is a regular expression character as well as a key part of the comment marker makes the following regex a bit hard to read. I use [*] as a pattern which recognizes the character *; in flex/lex, you can use "*" instead. Use whichever you find more readable. Essentially, the regular expression matches sequences of characters ending with a (string of) * until it finds one where the next character is a /. In other words, it has the same logic as your C code.
[/][*][^*]*[*]+([^*/][^*]*[*]+)*[/] { /* DO NOTHING */ }
The above requires the terminating */; an unterminated comment will force the lexer to back up to the beginning of the comment and accept some other token, usually a / division operator. That's likely not what you want, but it's not easy to recover from an unterminated comment since there's no really good way to know where the comment should have ended. Consequently, I recommend adding an error rule:
[/][*][^*]*[*]+([^*/][^*]*[*]+)*[/] { /* DO NOTHING */ }
[/][*] { fatal_error("Unterminated comment"); }

For // you can read until you find the end of line \n or EOF (in case the comment is at the end of the file), for example:
static void
skip_single_line_comment(void)
{
int c;
/* Read until we find \n or EOF */
while((c = input()) != '\n' && c != EOF)
;
/* Maybe you want to place back EOF? */
if(c == EOF)
unput(c);
}
As for multi-line comments /* */, you can read until you see * and peek at the next character: if it's /, this is the end of the comment; if not, just skip it like any other character. You shouldn't expect EOF here, since it means an unclosed comment:
static void
skip_multiple_line_comment(void)
{
int c;
for(;;)
{
switch(input())
{
/* We expect ending the comment first before EOF */
case EOF:
fprintf(stderr, "Error unclosed comment, expect */\n");
exit(-1);
goto done;
break;
/* Is it the end of comment? */
case '*':
if((c = input()) == '/')
goto done;
unput(c);
break;
default:
/* skip this character */
break;
}
}
done:
/* exit entry */ ;
}
Complete file:
%{
#include <stdio.h>
static void skip_single_line_comment(void);
static void skip_multiple_line_comment(void);
%}
%option noyywrap
%%
"//" { puts("short comment was skipped ");
skip_single_line_comment();}
"/*" { puts("long comment begins ");
skip_multiple_line_comment();
puts("long comment ends");}
" " { /* empty */ }
[\r\n\t] { /* empty */ }
. { fprintf(stderr, "Tokenizing error: '%c'\n", *yytext);
yyterminate(); }
%%
static void
skip_single_line_comment(void)
{
int c;
/* Read until we find \n or EOF */
while((c = input()) != '\n' && c != EOF)
;
/* Maybe you want to place back EOF? */
if(c == EOF)
unput(c);
}
static void
skip_multiple_line_comment(void)
{
int c;
for(;;)
{
switch(input())
{
/* We expect ending the comment first before EOF */
case EOF:
fprintf(stderr, "Error unclosed comment, expect */\n");
exit(-1);
goto done;
break;
/* Is it the end of comment? */
case '*':
if((c = input()) == '/')
goto done;
unput(c);
break;
default:
/* skip this character */
break;
}
}
done:
/* exit entry */ ;
}
int main(int argc, char **argv)
{
yylex();
return 0;
}

To detect single line comments :
^"//" printf("This is a comment line\n");
This says any line which starts with // will be considered as comment line.
To detect multi line comments :
^"/*"[^*]*|[*]*"*/" printf("This is a Multiline Comment\n");
Explanation:
^"/*" This says beginning should be /*.
[^*]* includes all characters including \n but excludes *.
[*]* says 0 or more number of stars.
[^*]|[*]* - "or" operator is applied to get any string.
"*/" specifies */ as end.
This will work perfectly in lex.
Below is the complete code of lex file :
%{
#include <stdio.h>
int v=0;
%}
%%
^"//" printf("This is a comment line\n");
^"/*"[^*]*|[*]*"*/" printf("This is a Multiline Comment\n");
.|\n {}
%%
int yywrap()
{
return 1;
}
int main(void)
{
yylex();
return 0;
}

Not exiting from loop in c

I want the program to recognize words of the form XYZ1111***, with any number of '1's and '*'s, but there must be at least one '1', and XYZ must appear in that exact order and always be present for the string to be valid. It must read from a file in which I have written many such words, such as XYZ1, XYZ1111*, 1111*, and print ok if a word meets the restrictions. When I run my program, it just takes the name of the file and then does nothing. Here's my code:
#include<stdio.h>
#include<stdlib.h>
int main(int argc,char *argv[]) {
FILE *input;
char c;
if(argc >2) {
printf("Too much arguments");
exit(1);
} else if(argc==2) {
input=fopen(argv[1],"r");
if(input==NULL) {
printf("Unable to find file ");
exit(1);
}
} else {
input=stdin;
c=fgetc(input);
while (!feof(input)) {
if (c=='x') {
int state=1;
while(1) {
switch(state) {
case 1:
c=fgetc(input);
if (c=='Y')
state=2;
break;
case 2:
c=fgetc(input);
if(c=='Z')
state=3;
break;
case 3:
c=fgetc(input);
if (c==1)
state=4;
break;
case 4:
if (c=='1')
state=4;
else if(c=='*')
state=5;
else if(c=='\n' || c=='\t' || c==' ')
printf("ok");
break;
case 5:
if (c=='*')
state=5;
else if(c=='\n' || c=='\t' || c==' ')
printf("ok");
break;
} // end of switch
} // end of while(1)
} // end of if(c=='x')
} // end of while(!feof(input))
} // end of else
printf("bgika");
} // end of main

I think your approach is wrong. As I have understood from your explanation, the string must start with "XYZ1" and after this you may have no matter how many 1 (just 1, not other characters). So, it would be very simple to check for the first part using strncmp, and then check if remaining characters are all 1.
char word[64]; /* needs #include <string.h> for strncmp */
if (fscanf(input, "%63s", word) == 1) {
    if (strncmp(word, "XYZ1", 4) == 0) {
        //check if remaining characters are 1
    } else {
        //the string does not match
    }
}
Also, while(!feof(input)) is discouraged: Why is “while ( !feof (file) )” always wrong?

while(1)
This is your problem.
Within this while loop, you have a switch statement full of break;s. These break;s will only get you out of the switch and will not get you out of the while loop. You have no way of getting out of this loop.
I have no earthly idea of what exactly the program is supposed to do (and I just spent 5 minutes fixing the formatting), so I can't currently make a recommended fix, but this is the problem.
Per @woolstar's comment, the inner while loop is unnecessary. The while(!feof(input)) can take care of repeatedly calling the switch statement. You will probably need to move the int state = 1; outside of this outer while loop however.
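For reference, here is a minimal sketch of a state machine that always terminates, because it advances exactly one character per loop iteration and stops at the end of the word. It assumes each word has already been read into a string (for example with fscanf and a bounded %s), and that the '*' part is optional since XYZ1 is listed as a valid word. The name matches_xyz is made up for this example.

```c
#include <stdbool.h>

/* True if word is "XYZ", then one or more '1', then zero or more '*'. */
static bool matches_xyz(const char *word)
{
    /* 0..2: expect X, Y, Z; 3: expect first '1'; 4: in 1s; 5: in stars */
    int state = 0;
    for (; *word != '\0'; word++) {
        char c = *word;
        switch (state) {
        case 0: if (c != 'X') return false; state = 1; break;
        case 1: if (c != 'Y') return false; state = 2; break;
        case 2: if (c != 'Z') return false; state = 3; break;
        case 3: if (c != '1') return false; state = 4; break;
        case 4:
            if (c == '*')
                state = 5;
            else if (c != '1')
                return false;
            break;
        case 5: if (c != '*') return false; break;
        }
    }
    /* Accept only if at least one '1' was seen. */
    return state == 4 || state == 5;
}
```

Each break leaves only the switch, and the for loop then moves to the next character, so there is no inner loop to get stuck in.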