The exercise reads "Write a program to check a C program for rudimentary syntax errors like unbalanced parentheses, brackets, and braces. Don't forget about quotes, both single and double, escape sequences, and comments."
I solved the problem by pushing parentheses, brackets, and braces onto a stack and checking that they close in LIFO order, with several flags tracking whether we're inside a comment, a quote, and so on.
The issue is that I feel my code, although it works, is poorly structured and not particularly idiomatic. I tried implementing the state variables (the stack, escaped, inString, etc.) within a struct and breaking apart the tests into subroutines. It didn't help much. Is there a way to solve this problem in a cleaner way while still handling escaped characters and the like correctly?
#include <stdio.h>
#include <stdlib.h>
#define INITIALSTACK 8
#define FALSE 0
#define TRUE 1
typedef struct {
int position;
int maxLength;
char* array;
} stack;
int match(char, char);
stack create();
void delete(stack*);
void push(stack*, char);
char pop(stack*);
int main() {
int c; /* int rather than char, so EOF can be told apart from every valid character */
char out;
stack elemStack = create();
int escaped, inString, inChar, inComment, startComment, i, lineNum;
int returnValue;
escaped = inString = inChar = inComment = startComment = 0;
lineNum = 1;
while ((c = getchar()) != EOF) {
if (c == '\n')
lineNum++;
/* Test if in escaped state or for escape character */
if (escaped) {
escaped = FALSE;
}
else if (c == '\\') {
escaped = TRUE;
}
/* Test if currently in double/single quote or a comment */
else if (inString) {
if (c == '"' && !escaped) {
inString = FALSE;
}
}
else if (inChar) {
if (escaped)
escaped = FALSE;
else if (c == '\'' && !escaped) {
inChar = FALSE;
}
}
else if (inComment) {
if (c == '*')
startComment = TRUE;
else if (c == '/' && startComment)
inComment = FALSE;
else
startComment = FALSE;
}
/* Test if we should be starting a comment, quote, or escaped character */
else if (c == '*' && startComment)
inComment = TRUE;
else if (c == '/')
startComment = TRUE;
else if (c == '"') {
inString = TRUE;
}
else if (c == '\'') {
inChar = TRUE;
}
/* Accept the character and check braces on the stack */
else {
startComment = FALSE;
if (c == '(' || c == '[' || c == '{')
push(&elemStack, c);
else if (c == ')' || c == ']' || c == '}') {
out = pop(&elemStack);
if (out == -1 || !match(out, c)) {
printf("Syntax error on line %d: %c matched with %c.\n", lineNum, out, c);
return -1;
}
}
}
}
if (inString || inChar) {
printf("Syntax error: Quote not terminated by end of file.\n");
returnValue = -1;
}
else if (!elemStack.position) {
printf("Syntax check passed on %d line(s).\n", lineNum);
returnValue = 0;
}
else {
printf("Syntax error: Reached end of file with %d unmatched elements.\n ",
elemStack.position);
for(i = 0; i < elemStack.position; ++i)
printf(" %c", elemStack.array[i]);
printf("\n");
returnValue = -1;
}
delete(&elemStack);
return returnValue;
}
int match(char left, char right) {
return ((left == '{' && right == '}') ||
(left == '(' && right == ')') ||
(left == '[' && right == ']'));
}
stack create() {
stack newStack;
newStack.array = malloc(INITIALSTACK * sizeof(char));
newStack.maxLength = INITIALSTACK;
newStack.position = 0;
return newStack;
}
void delete(stack* stack) {
free(stack -> array);
stack -> array = NULL;
}
void push(stack* stack, char elem) {
if (stack -> position >= stack -> maxLength) {
char* newArray = malloc(2 * (stack -> maxLength) * sizeof(char));
int i;
for (i = 0; i < stack -> maxLength; ++i)
newArray[i] = stack -> array[i];
free(stack -> array);
stack -> array = newArray;
stack -> maxLength *= 2; /* record the new capacity, otherwise later pushes overflow */
}
stack -> array[stack -> position] = elem;
(stack -> position)++;
}
char pop(stack* stack) {
if (!(stack -> position)) {
printf("Pop attempted on empty stack.\n");
return -1;
}
else {
(stack -> position)--;
return stack -> array[stack -> position];
}
}
Your solution is not that bad. It is very straightforward, which is a good thing. To learn a bit more from this exercise, I would probably implement it as a state machine: you have a few states such as code, comment, and string, and you define the transitions between them. This gets much easier because the logic depends on the current state, so you don't end up with a blob of code like the one in your main function. You then parse the input according to the state. For example, while you're in the comment state, you ignore everything until you encounter the end-of-comment sequence; then you switch back to the code state, and so forth.
In pseudo code it could look like this:
current_state = CODE
while(...) {
switch(current_state) {
case CODE:
if(input == COMMENT_START) {
current_state = COMMENT
break
}
if(input == STRING_START) {
current_state = STRING
break
}
// handle your {, [, ( stuff...
break
case COMMENT:
if(input == COMMENT_END) {
current_state = CODE
break
}
// handle comment.. i.e. ignore everything
break
case STRING:
// ... string stuff like above with state transitions..
break
}
}
Of course this can be done with e.g. yacc. But as I stated in a comment, I wouldn't suggest you use that. Maybe you could do that if you have enough time and want to learn as much as possible, but first I would implement it "the hard way".
I would probably approach this quite differently, by making use of a parser generator, like yacc, combined with a lexer generator, like lex.
You could base your work on existing input files for these tools for ANSI C. This lex specification and yacc grammar, for example, can be a starting point. Alternatively, K&R also contains a yacc-compatible C grammar in appendix A, or you could of course work directly with the grammar in the C standard.
For this exercise, you would only use those parts of the grammar that are of interest to you, and ignore the rest. The grammar will ensure that the syntax is correct (all braces matched etc.), and lex/yacc will take care of all the code generation. That leaves you with only having to specify some glue code, which will mostly be error messages in this case.
It will be a complete re-write of your code, but will probably give you a better understanding of the C grammar, and at the very least, you'll have learned to work with the great tools lex/yacc, which never hurts.
Related
I have a string for example "ABCDEFG.......", and I want to check if a certain character is in this string or not (the string also contains the newline character). I have the following code, but it doesn't seem to be working. Anyone have any better ideas?
I'm currently using strchr and checking whether it returns NULL, which would mean that the current character in the loop is NOT present in the valid_characters variable.
bool check_bad_characters(FILE *inputFile)
{
int c;
char valid_characters[28] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";
while ((c = fgetc(inputFile)) != EOF) {
char charC = c + '0';
if (strchr(valid_characters, c) == NULL && strncmp(&charC, "\n", 1) != 0)
{
// This means that there was a character in the input file
// that is not valid.
return false;
}
}
return true;
}
Your code considers \n to be a valid character, so put it in the list of valid characters instead of handling it separately. Your routine can be simply:
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
bool check_bad_characters(FILE *inputFile)
{
int c;
while ((c = fgetc(inputFile)) != EOF)
if (!strchr("ABCDEFGHIJKLMNOPQRSTUVWXYZ \n", c))
return false;
return true;
}
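If the text is already in memory rather than behind a FILE *, the same idea can be sketched with strspn, which measures the leading run of characters drawn from a given set. The function name only_valid_characters is hypothetical; the allowed set is the one from the answer above.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if every character of s is in the allowed set. */
bool only_valid_characters(const char *s)
{
    static const char allowed[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ \n";

    /* strspn returns the length of the leading run made only of allowed
       characters; if that run reaches the terminator, nothing was invalid. */
    return s[strspn(s, allowed)] == '\0';
}
```

One difference from the strchr loop is worth knowing: strchr also matches the terminating '\0' of the set, so a zero byte in the file would be accepted there, whereas here a zero byte simply ends the string.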
OK, firstly, I'm a total amateur at programming and I wanted to try something. I want to write a C program that reads a line and prints "ACCEPTED" if all of its characters are valid, or "REJECTED" if they are not.
So I've used a while loop and some if/else-if branches to list the valid characters: the letters of the alphabet and ',' '.' '/' '[' ']'. The problem is that after I type the whole line, it prints ACCEPTED or REJECTED for every character on the line. How can I get the program to read the whole line first and then print the result?
#include <stdio.h>
int main(void) {
char c;
c=getchar();
while(c!=EOF) {
while (c!='\n') {
if (c>='a' && c<='z') {
printf("OK!\n");
}
else if(c==','|| c=='.' ||c=='/') {
printf("OK!\n");
}
else if(c==']'||c=='[') {
printf("OK!\n");
}
else {
printf("ERROR!\n");
}
c=getchar();
}
c=getchar();
}
}
Sorry, my original answer did not seem to relate to your question. Skim reading fail.
Thank you for posting the code, it helps a lot when it comes to answering your question correctly.
Ignoring style for now, I would change your code as follows so that it prints OK only once the entire line has been parsed. This is exactly what @ScottMermelstein said, but with code.
#include <stdio.h>
int main(void) {
int c; // This needs to be an int otherwise you won't recognize EOF correctly
int is_ok;
c=getchar();
while(c!=EOF) {
is_ok = 1; // Let's assume all characters will be correct for each line.
while (c!='\n' && c!=EOF) { // So long as we are in this loop we are on a single line
if (c>='a' && c<='z') {
// Do nothing (leave for clarity for now)
}
else if(c==','|| c=='.' ||c=='/') {
// Do nothing (leave for clarity for now)
}
else if(c==']'||c=='[') {
// Do nothing (leave for clarity for now)
}
else {
is_ok = 0; // Set is_ok to false, but keep reading so the rest of the line is consumed
}
c=getchar();
}
if (is_ok) // Only print our result after we finished processing the line.
{
printf("OK!\n");
} else
{
printf("ERROR!\n");
}
c=getchar();
}
return 0; // If you declare main to return int, you should return an int...
}
However, I would recommend modularizing your code a little more. This will come with time and practice but you can write things in a way that is much easier to understand if you hide things away in appropriately named functions.
#include <ctype.h> // for isalpha()
#include <stdio.h>
int is_valid_char(int c)
{
return (isalpha(c) || c == ',' || c == '.' || c == '/' || c == '[' || c == ']');
}
int main(void) {
int c;
int is_valid_line;
c=getchar();
while(c!=EOF) {
is_valid_line = 1;
while (c!='\n' && c!=EOF) {
if (!is_valid_char(c)) {
is_valid_line = 0; // Set is_valid_line to false, but keep reading
// so the rest of the line is consumed
}
c=getchar();
}
if (is_valid_line) // Only print our result after we finished processing the line.
{
printf("OK!\n");
} else
{
printf("ERROR!\n");
}
c=getchar();
}
return 0;
}
You can use scanf and put a space before the format specifier %c to skip whitespace.
char ch;
scanf(" %c", &ch);
This might be what you are looking for?
Read a line and process good/bad chars and print either OK or Error.
#include <stdio.h>
int main ( void )
{
char buff[1000];
char *p = buff ;
char c ;
int flgError= 0 ; // Assume no errors
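if ( fgets( buff, sizeof buff, stdin ) == NULL ) // gets() cannot limit the input length and was removed in C11
return 1 ;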
printf("You entered '%s'\n", buff );
while ( *p ) // use pointer to scan through each char of line entered
{
c=*p++ ; // get char and point to next one
if ( // Your OK conditions
(c>='a' && c<='z')
|| (c>='A' && c<='Z') // probably want upper case letter to be OK
|| (c==','|| c=='.' ||c=='/')
|| (c==']'||c=='[')
|| (c=='\n' ) // assume linefeed OK
)
{
// nothing to do since these are OK
}
else
{
printf ("bad char=%c\n",c);
flgError = 1; // 1 or more bad chars
}
}
if ( flgError )
printf ( "Error\n" );
else
printf ( "OK\n" );
}
I have written the following program to answer Kernighan and Ritchie's chapter 1, exercise 12.
The issue is that I have never really understood how to properly use functions and would like to know why the one I wrote into this program, getcharc(), does not work.
What are good resources that explain where and how to use functions correctly?
I know the optimal solution to this problem from Richard Heathfield's site (which uses || or, rather than nested while statements, which I have used), however I would like to know how to make my program work properly:
#include <stdio.h>
int getcharc ();
// Exercise 1-12
// Copy input to output, one word per line
// words delimited by tab, backspace, \ and space
int main()
{
int c;
while ((c = getchar()) != EOF) {
while ( c == '\t') {
getcharc(c);
}
while ( c == '\b') {
getcharc(c);
}
while ( c == '\\') {
getcharc(c);
}
while ( c == ' ') {
getcharc(c);
}
putchar(c);
}
}
int getcharc ()
{
int c;
c = getchar();
printf("\n");
return 0;
}
The original program (and I know it has bugs), without the function was:
#include <stdio.h>
// Exercise 1-12
// Copy input to output, one word per line
// words delimited by tab, backspace, \ and space
int main()
{
int c;
while ((c = getchar()) != EOF) {
while ( c == '\t') {
c = getchar();
printf("\n");
}
while ( c == '\b') {
c = getchar();
printf("\n");
}
while ( c == '\\') {
c = getchar();
printf("\n");
}
while ( c == ' ') {
c = getchar();
printf("\n");
}
putchar(c);
}
}
So all I am trying to do with the function is to stop
c = getchar();
printf("\n");
being repeated every time.
What, exactly, is this getcharc() function supposed to do? What it does, is read a character from input, print a newline, and return zero. The character just read from input is discarded, because you didn't do anything with it. When it's called, the return value is ignored as well. In each of the places where it is called, you're calling it in an infinite loop, because there's no provision made for changing the loop control variable.
Perhaps you were intending something like c = getcharc(), but that wouldn't really help because you aren't returning c from the function, anyway. (Well, it would help with the "infinite loop" part, anyway.)
What's the point of this function anyway? If you just use getchar() correctly in its place, it looks like you'd have your solution, barring a few other bugs.
One possible solution is to change the prototype of your function to int getcharc (int c, int flag).
Here is your code after some modification:
#include <stdio.h>
int getcharc (int c, int flag);
// Exercise 1-12
// Copy input to output, one word per line
// words delimited by tab, backspace, \ and space
int main()
{
int c;
int flag = 0; //to keep track of repeated newline chars.
while ((c = getchar()) != '\n') {
flag = getcharc(c, flag); // call getcharc() for each char in the input string. Testing for newline and printing of chars be done in the getcharc() function
}
return 0;
}
int getcharc (int c, int flag)
{
if( (c == ' ' || c == '\t' || c == '\b' || c== '\\') && flag == 0)
{
printf("\n");
flag = 1;
}
else
{
if(c != ' ' && c != '\t' && c != '\b' && c!= '\\')
{
putchar(c);
flag = 0;
}
}
return flag;
}
EDIT:
but I wanted to keep the nested while statements rather than using || or
Your nested while loops execute only once for each character, because getchar() reads one character at a time. There is no need for nested loops here! You can check this by replacing while with if: your code will give the same output for a given string. See the output here.
I know the optimal solution to this problem from Richard Heathfield's site (which uses || or, rather than nested while statements, which I have used), however I would like to know how to make my program work properly:
You can make your program work to some extent (with your bugs) by adding an if condition and a break statement, like this:
#include <stdio.h>
int getcharc (int c);
int main()
{
int c;
while ((c = getchar()) != '\n') {
while ( c == '\t') {
c = getcharc(c);
if(c != '\t')
break;
}
....
....
while ( c == ' ') {
c = getcharc(c);
if(c != ' ')
break;
}
putchar(c);
}
return 0;
}
int getcharc (int c)
{
c = getchar();
printf("\n");
return c;
}
// compiled by my brain muhahaha
#include <stdio.h>
int getcharc(); // we prototype getcharc without an argument
int main()
{
int c; // we declare c
// read character from stdio, if end of file quit, store read character in c
while ((c = getchar()) != EOF) {
// if c is tab \t call function getcharc() until forever since c never changes
while ( c == '\t') {
getcharc(c); // we call function getcharc with an argument
// however getcharc doesn't take an argument according to the prototype
}
// if c is \b call function getcharc() until forever since c never changes
while ( c == '\b') {
getcharc(c);
}
// if c is \\ call function getcharc() until forever since c never changes
while ( c == '\\') {
getcharc(c);
}
// if c is ' ' call function getcharc() until forever since c never changes
while ( c == ' ') {
getcharc(c);
}
// since we never will get here but if we happened to get here by some
// strange influence of some rare cosmic phenomena print out c
putchar(c);
}
}
// getcharc doesn't take an argument
int getcharc ()
{
int c; // we declare another c
c = getchar(); // we read from the keyboard a character
printf("\n"); // we print a newline
return 0; // we return 0 which anyway will never be read by anyone
}
Maybe you are getting confused with the old K&R style.
Nowadays, when you write a function that takes an argument, you specify it like this:
int getcharc(int c)
{
...
}
Consider, this message:
N,8545,01/02/2011 09:15:01.815,"RASTA OPTSTK 24FEB2011 1,150.00 CE",S,8.80,250,0.00,0
This is just a sample. The idea is that this is one of the rows in a CSV file. If I split it on commas, there will be a problem with the 1,150.00 figure.
The string inside the double quotes is of variable length, but can be treated as one "element" (if I may use the term).
The other elements are the ones separated by ,
How do I parse it? (other than Ragel parsing engine)
Soham
Break the string into fields separated by commas provided that the commas are not embedded in quoted strings.
A quick way to do this is to use a state machine.
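boolean inQuote = false;
StringBuffer buffer = new StringBuffer();
int ch; // "char" is a reserved word, so the variable needs another name
// readchar() is to be implemented however you read a char; it returns -1 at end of input
while ((ch = readchar()) != -1) {
    switch (ch) {
    case ',':
        if (!inQuote) {
            // store the field in our parsedLine object for later processing.
            parsedLine.addField(buffer.toString());
            buffer.setLength(0);
        } else {
            // a comma inside quotes is part of the field.
            buffer.append((char) ch);
        }
        break;
    case '"':
        inQuote = !inQuote;
        // fall through to next target is deliberate.
    default:
        buffer.append((char) ch);
    }
}
// the last field has no trailing comma, so store it after the loop.
parsedLine.addField(buffer.toString());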
Note that while this provides an example, there is a bit more to CSV files which would have to be accounted for (like embedded quotes within quotes, or whether it is appropriate to strip outer quotes in your example).
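Since the question is about C, here is a rough C sketch of the same state machine. It rests on a few assumptions: the whole row is already in a writable buffer, the outer quotes should be stripped, and a doubled quote ("") inside a quoted field stands for one literal quote. The name split_csv_line and its parameters are invented for the example.

```c
#include <stddef.h>

/*
 * Split one CSV line in place. Commas inside double quotes do not split;
 * the outer quotes are dropped and a doubled quote inside a quoted field
 * becomes a single quote. Stores pointers to the fields in fields[] and
 * returns how many were stored (at most max_fields).
 */
size_t split_csv_line(char *line, char **fields, size_t max_fields)
{
    size_t count = 0;
    int in_quote = 0;
    char *in = line;    /* read position */
    char *out = line;   /* write position; never runs ahead of the read position */

    if (max_fields == 0)
        return 0;
    fields[count++] = out;

    for (; *in != '\0' && *in != '\n'; ++in) {
        if (*in == '"') {
            if (in_quote && in[1] == '"') {
                *out++ = '"';   /* escaped quote inside a quoted field */
                ++in;
            } else {
                in_quote = !in_quote;
            }
        } else if (*in == ',' && !in_quote) {
            *out++ = '\0';      /* terminate the current field */
            if (count == max_fields)
                return count;   /* no room left: the rest of the line is dropped */
            fields[count++] = out;
        } else {
            *out++ = *in;
        }
    }
    *out = '\0';
    return count;
}
```

Called on the sample row with an array of 16 pointers, this yields nine fields, with the quoted one coming back as a single field that still contains its comma.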
A quick and dirty solution if you don't want to add external libraries would be converting the double quotes to \0 (the end of string marker), then parsing the three strings separately using sscanf. Ugly but should work.
Assuming the input is well-formed (otherwise you'll have to add error handling):
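int consumed = 0;
for (i=0; str[i]; i++)
if (str[i] == '"') str[i] = 0;
// sscanf returns the number of items converted, not the number of characters read,
// so %n is used to record how far the first scan got
sscanf(str, "%c,%d,%d/%d/%d %d:%d:%d.%d,%n", &var1, &var2, ..., &var9, &consumed);
var10 = str + consumed + 1; // skip the \0 that replaced the opening quote
sscanf(var10 + strlen(var10) + 1, ",%c,%f,%d,%f,%d", &var11, &var12, ..., &var15);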
You will obviously have to make a copy of var10 if you want to free str immediately.
This is a function to get the next single CSV field from an input file supplied as a FILE *. It expects the file to be opened in text mode, and supports quoted fields with embedded quotes and newlines. Fields longer than the size of the supplied buffer are truncated.
int get_csv_field(FILE *f, char *buf, size_t size)
{
char *p = buf;
int c;
enum { QS_UNQUOTED, QS_QUOTED, QS_GOTQUOTE } quotestate = QS_UNQUOTED;
if (size < 1)
return EOF;
while ((c = getc(f)) != EOF)
{
if ((c == '\n' || c == ',') && quotestate != QS_QUOTED)
break;
if (c == '"')
{
if (quotestate == QS_UNQUOTED)
{
quotestate = QS_QUOTED;
continue;
}
if (quotestate == QS_QUOTED)
{
quotestate = QS_GOTQUOTE;
continue;
}
if (quotestate == QS_GOTQUOTE)
{
quotestate = QS_QUOTED;
}
}
if (quotestate == QS_GOTQUOTE)
{
quotestate = QS_UNQUOTED;
}
if (size > 1)
{
*p++ = c;
size--;
}
}
*p = '\0';
return c;
}
How about libcsv from our very own Robert Gamble?
Here was my original code:
#include <stdio.h>
#define IN 1 // inside a word
#define OUT 0 // outside a word
// program to print input one word per line
int main(void)
{
int c, state;
state = OUT;
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t') {
state = OUT;
printf("\n");
}
else if (state == OUT) {
state = IN;
}
if (state == IN) {
putchar(c);
}
}
return 0;
}
But the problem was if there were multiple blanks (spaces) or multiple tabs next to each other a newline would be printed for both. So I used a variable (last) to keep track of where I was:
#include <stdio.h>
#define IN 1 // inside a word
#define OUT 0 // outside a word
// program to print input one word per line, corrected bug if there was
// more than one space between words to only print one \n
int main(void)
{
int c, last, state;
last = EOF;
state = OUT;
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t') {
if (last != c) {
state = OUT;
printf("\n");
}
}
else if (state == OUT) {
state = IN;
}
if (state == IN) {
putchar(c);
}
last = c;
}
return 0;
}
That solved it, except now if there is [blank][tab] next to each other, a newline gets printed for both.
Could someone please help?
The problem with your original code is that it outputs a newline for every whitespace character. You only want to do that when transitioning from word to non-word:
Change:
if (c == ' ' || c == '\n' || c == '\t') {
state = OUT;
printf("\n");
}
to:
if (c == ' ' || c == '\n' || c == '\t') {
if (state == IN) printf("\n");
state = OUT;
}
In fact, what I originally thought I'd suggest would be an enumeration for the states along the lines of:
enum eState {IN, OUT};
:
enum eState state = OUT;
but, for a simple finite state machine with only two states, you can just use a boolean:
#include <stdio.h>
#define FALSE (1==0)
#define TRUE (1==1)
// Or: enum eBoolean {FALSE = 0, TRUE = 1};
int main (void) {
int ch;
int inWord = FALSE; // Or: enum eBoolean inWord = FALSE;
// Process every character.
while ((ch = getchar()) != EOF) {
// Check for whitespace.
if (ch == ' ' || ch == '\n' || ch == '\t') {
// Check if transitioning nonwhite to white.
if (inWord) {
printf("\n");
}
// Mark white no matter what.
inWord = FALSE;
} else {
// Mark non whitespace.
inWord = TRUE;
}
// If not whitespace, output character.
if (inWord) {
putchar(ch);
}
}
return 0;
}
As paxdiablo said, your program is a typical finite state automaton (FSA). You have to print a newline on the transition from state IN to state OUT, and only then.
Below is how I would write such code. In this particular case it can be made simpler, but the structure is interesting because it is typical and applies to any FSA. You have a big outer switch with a case for each state. Inside each case, another switch implements the transitions; here the transition events are input characters. All that is left to do is think about what should be done for each transition. This structure is also quite efficient.
You should keep it in mind; it's a very common one to have in your toolkit of pre-thought program structures. I certainly use it.
#include <stdio.h>
#define IN 1 // inside a word
#define OUT 0 // outside a word
// program to print input one word per line
int main(void)
{
int c, state;
state = OUT;
while ((c = getchar()) != EOF) {
switch (state){
case OUT:
switch (c){
case ' ': case '\n': case '\t':
break;
default:
putchar(c);
state = IN;
}
break;
case IN:
switch (c){
case ' ': case '\n': case '\t':
putchar('\n');
state = OUT;
break;
default:
putchar(c);
}
break;
}
}
return 0;
}
Look at the check in your second version:
if (last != c) {
You are not checking all the conditions. last could be a space, a tab, or a newline, and in all of those cases a newline should not be printed. Let's call the set of these three special characters X.
When printing a newline, you need to make sure that the last character does not belong to X. But you only check that last != c, and c holds just one of the three values at a time, so a tab following a space still passes the test.
So instead replace it with
if (last != ' ' && last != '\n' && last != '\t' ) {
You can see the code here:
#include <stdio.h>
#define IN 1 // inside a word
#define OUT 0 // outside a word
// program to print input one word per line, corrected bug if there was
// more than one space between words to only print one \n
int main(void)
{
int c, last, state;
last = 0; // We need it to make sure that a newline is not printed in case first
// char is space, tab or new line.
state = OUT;
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t') {
// if (last != c)
if (last != ' ' && last != '\n' && last != '\t' && last != 0 )
{
state = OUT;
printf("\n");
}
} else if (state == OUT) {
state = IN;
}
if (state == IN) {
putchar(c);
}
last = c;
}
return 0;
}
Edit
Fixed the bug paxdiablo pointed out in comments.
#include<stdio.h>
#define OFF 0
#define ON 1
int main(void)
{
int c,state=ON;
while((c=getchar())!=EOF)
{
if(c=='\n'||c==' '||c=='\t')
{
if(state==OFF)putchar('\n');
state=ON;
}
else if(state==ON)
{
putchar(c);
state=OFF;
}
else if(state==OFF)
{
putchar(c);
}
}
}
Here's one way of approaching the problem, which was used above:
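Here, STE = space, tab, or enter (newline). Each row shows the previous input, the current input, and the resulting action:
<STE><WORD>   ----> TYPE<WORD>
<STE><STE>    ----> DO NOTHING
<WORD><STE>   ----> TYPE<WORD><ENTER/NEWLINE>
<WORD><WORD>  ----> TYPE<WORD>
You can replace <STE> and <WORD> with ON and OFF, as illustrated above.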