C Program won't remove comments that take up the whole line - c

So I'm working through the K&R C book and there is a bug in my code that I simply cannot figure out.
The program is supposed to remove all the comments from a C program. Obviously I'm just using stdin.
#include <stdio.h>
int getaline (char s[], int lim);
#define MAXLINE 1000 //maximum number of characters to put into string[]
#define OUTOFCOMMENT 0
#define INASINGLECOMMENT 1
#define INMULTICOMMENT 2
int main(void)
{
int i;
int isInComment;
char string[MAXLINE];
getaline(string, MAXLINE);
for (i = 0; string[i] != EOF; ++i) {
//finds whether loop is in a comment or not
if (string[i] == '/') {
if (string[i+1] == '/')
isInComment = INASINGLECOMMENT;
if (string[i+1] == '*')
isInComment = INMULTICOMMENT;
}
//fixes the problem of print messing up after the comment
if (isInComment == INASINGLECOMMENT && string[i] == '\0')
printf("\n");
//if the line is done, restates all the variables
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
//prints current character in loop
if(isInComment == OUTOFCOMMENT && string[i] != EOF)
printf("%c", string[i]);
//checks to see of multiline comment is over
if(string[i] == '*' && string[i+1] == '/' ) {
++i;
isInComment = OUTOFCOMMENT;
}
}
return 0;
}
So this works great except for one problem. Whenever a line starts with a comment, it prints that comment.
So for instance, if I had a line that was simply
//this is a comment
without anything before the comment begins, it will print that comment even though it's not supposed to.
I thought I was making good progress, but this bug has really been holding me up. I hope this isn't some super easy thing I've missed.
EDIT: Forgot the getaline function
//puts line into s[], returns length of that line
int getaline(char s[], int lim)
{
int c, i;
for (i = 0; i < lim-1 && (c = getchar()) != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}

There are many problems in your code:
isInComment is not initialized in function main.
As pointed out by others, string[i] != EOF is wrong. You need to test for end of file more precisely, especially for files that do not end with a linefeed. This test only works if the char type is signed and EOF is a valid signed char value. Even then it will mistakenly stop on a stray \377 character, which is legal in a string or in a comment.
When you detect the end of line, you read another line and reset i to 0, but i will be incremented by the for loop before you test again for a single-line comment... hence the bug!
You do not handle special cases such as /* // */ or // /*
You do not handle strings. This is not a comment: "/*", nor is this: '//'
You do not handle \ at end of line (escaped linefeed). This can be used to extend single-line comments, strings, etc. There are more subtle cases related to \ handling, and if you really want completeness, you should handle trigraphs too.
Your implementation has a limit on line size; this is not needed.
The problem you are assigned is a bit tricky. Instead of reading and parsing lines, read one character at a time and implement a state machine to parse escaped linefeeds, strings, and both comment styles. The code is not too difficult if you do it right with this method.
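To make the state-machine idea concrete, here is a minimal sketch. It is not a complete solution: the names strip_comments, next_code_state and the state names are made up for this example, it works on an in-memory string rather than on getchar() so that it is easy to try out, and it deliberately ignores escaped linefeeds and trigraphs. A block comment is replaced by a single space so that tokens on either side do not run together.

```c
#include <stddef.h>

/* Parser states for the comment stripper (names are illustrative). */
enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
             STRING, STRING_ESC, CHARLIT, CHARLIT_ESC };

/* State that follows an ordinary (non-comment) character c. */
static enum state next_code_state(char c)
{
    if (c == '"')
        return STRING;
    if (c == '\'')
        return CHARLIT;
    return CODE;
}

/*
 * Copy src to dst without comments. dst needs room for strlen(src) + 1
 * bytes. Returns the number of characters written, not counting '\0'.
 */
size_t strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    size_t n = 0;

    for (; *src != '\0'; ++src) {
        char c = *src;

        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;              /* possibly the start of a comment */
            } else {
                dst[n++] = c;
                st = next_code_state(c);
            }
            break;
        case SLASH:
            if (c == '/') {
                st = LINE_COMMENT;
            } else if (c == '*') {
                st = BLOCK_COMMENT;
            } else {                     /* the '/' was ordinary code */
                dst[n++] = '/';
                dst[n++] = c;
                st = next_code_state(c);
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') {             /* keep the newline itself */
                dst[n++] = c;
                st = CODE;
            }
            break;
        case BLOCK_COMMENT:
            if (c == '*')
                st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') {
                dst[n++] = ' ';          /* a comment counts as one space */
                st = CODE;
            } else if (c != '*') {
                st = BLOCK_COMMENT;
            }
            break;
        case STRING:
            dst[n++] = c;
            if (c == '\\')
                st = STRING_ESC;
            else if (c == '"')
                st = CODE;
            break;
        case STRING_ESC:                 /* character after a backslash */
            dst[n++] = c;
            st = STRING;
            break;
        case CHARLIT:
            dst[n++] = c;
            if (c == '\\')
                st = CHARLIT_ESC;
            else if (c == '\'')
                st = CODE;
            break;
        case CHARLIT_ESC:
            dst[n++] = c;
            st = CHARLIT;
            break;
        }
    }
    if (st == SLASH)                     /* input ended right after a '/' */
        dst[n++] = '/';
    dst[n] = '\0';
    return n;
}
```

To turn this into a filter, the same switch can be driven by a while ((c = getchar()) != EOF) loop with putchar() in place of the writes to dst, which also removes the line-length limit.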

if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
When you start a new line, you initialize i to 0. But then in the next iteration:
for (i = 0; string[i] != EOF; ++i)
i will be incremented, so you'll begin the new line with index 1. Therefore there is a bug when the line begins with //.
You can see that it solves the problem if you write instead:
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = -1; /* the for loop's ++i brings this back to 0 */
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
continue; /* skip the rest of the body, which would read string[-1] */
}
though it's usually considered bad style to modify for loop indices inside the loop. You may want to redesign your implementation in a more readable way.
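If the off-by-one is hard to picture, this tiny experiment isolates it. The function name first_index_after_reset and the loop bounds are invented for the demonstration; the only point is what the loop body sees on the iteration after the index is overwritten.

```c
/*
 * Overwrites the loop index with reset_value on the first pass and
 * returns the index the body sees on the very next pass.
 */
int first_index_after_reset(int reset_value)
{
    int was_reset = 0;
    int i;

    for (i = 5; i < 10; ++i) {
        if (was_reset)
            return i;          /* first index visited after the reset */
        i = reset_value;       /* same trick as "i = 0" in the question */
        was_reset = 1;
    }
    return -1;                 /* not reached with the values used here */
}
```

Resetting to 0 makes the next pass start at index 1, because ++i still runs before the condition is checked; resetting to -1 makes it start at 0.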

Related

Kernighan and Ritchie C exercise 1-16

I tried to implement a solution for the exercise on the C language of K&R's book. I wanted to ask here if this could be considered a legal "solution", just modifying the main without changing things inside external functions.
Revise the main routine of the longest-line program so it will
correctly print the length of arbitrary long input lines, and as much
as possible of the text.
#include <stdio.h>
#define MAXLINE 2 ////
int get_line1(char s[], int lim)
{
int c, i;
for (i = 0; i < lim - 1 && ((c = getchar()) != EOF) && c != '\n'; i++) {
s[i] = c;
}
if (c == '\n') {
s[i] = c;
i++;
}
s[i] = '\0';
return i;
}
int main()
{
int len;
int max = MAXLINE;
char line[MAXLINE];
int tot = 0;
int text_l = 0;
while ((len = get_line1(line, max)) > 0) {
if (line[len - 1] != '\n') {
tot = tot + len;
}
if (line[1] == '\n' || line[0] == '\n') {
printf("%d\n", tot + 1);
text_l = text_l + (tot + 1);
tot = 0;
}
}
printf("%d\n", text_l);
}
The idea is to set the max length of the string considered for the array line to 2.
For a string such as abcdef\n, the array line will be ab. Since the last element of the array is not \n (so the line we are considering is not over), we save the length so far and repeat the cycle. We then get the array made of cd, then ef, and at the end we get the array of just \n. Then the second if condition is executed, since the first element of this array is \n, and we print the tot length obtained from the previous additions. We add +1 in order to also count the newline character \n. This works for odd-length strings too: with abcdefg\n the process goes on until we reach g\n and the sum is done correctly.
Outside the loop then we print the total amount of text.
Is this a correct way to do the exercise?
The exercise says to “Revise the main routine,” but you altered the definition of MAXLINE, which is outside of main, so that is not a valid solution.
Also, your code does not have the copy or getline routines of the original. Your get_line1 appears to be identical except for the name. However, a correct solution would use identical source code except for the code inside main.
Additionally, the exercise says to print “as much as possible of the text.” That is unclearly stated, but I expect it means to keep a buffer of MAXLINE characters (with MAXLINE at its original value of 1000) and use it to print the first MAXLINE−1 characters of the longest line.
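Here is one way that reading of the exercise could look, as a sketch only. To keep it short and easy to try on a file, it is written as a function taking a FILE * (the name longest_line and its parameters are my own), and it uses the standard fgets and strcpy where the book's program uses getline and copy; fgets chunks the input the same way, storing at most lim-1 characters and keeping the newline. It assumes lim is at least 2.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns the full length of the longest line in `in`, however long it
 * is, and leaves as much of that line as fits (lim - 1 characters) in
 * `longest`. `line` is a scratch buffer of the same size lim.
 */
long longest_line(FILE *in, char line[], char longest[], int lim)
{
    long max = 0;

    longest[0] = '\0';
    while (fgets(line, lim, in) != NULL) {
        long total = (long)strlen(line);

        /* No newline in the chunk: the line is longer than the buffer
           (or the input ended), so count the rest without storing it. */
        if (total > 0 && line[total - 1] != '\n') {
            int c;

            while ((c = getc(in)) != EOF) {
                ++total;
                if (c == '\n')
                    break;
            }
        }
        if (total > max) {
            max = total;
            strcpy(longest, line);   /* first lim - 1 chars of the line */
        }
    }
    return max;
}
```

In the exercise itself the body of this function would sit in main, with MAXLINE left at 1000 and the book's getline and copy unchanged; only the "keep counting after the buffer is full" part is new.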

C programming language exercise 1-18

Hi guys, I am reading The C Programming Language by Kernighan and Ritchie and trying to solve the exercises along the way. This program is supposed to eliminate all the blanks and tabs from the end of the input lines, but it's not working.
I have added comments to the section of code I am having problems with
#include <stdio.h>
#define MAXLINE 1000
int getline(char line[], int max);
void copy(char to[], char frm[]);
int main(void) {
int maxLen=0;
int currLen;
char line[MAXLINE];
char longestLine[MAXLINE];
while((currLen=getline(line,MAXLINE)) > 0){
if(currLen > maxLen){
maxLen = currLen;
copy(longestLine, line);
}
}
if(maxLen>0){
printf("%d\n", maxLen);
printf("%s", longestLine);
}
return 0;
}
int getline(char line[], int max){
int c,i;
for(i=0;i<max-1 && (c=getchar())!=EOF && c!='\n';i++){
line[i] = c;
}
if(c == '\n'){
/* This is the logic I added to take care of the trailing
spaces and tabs. The rest of the program is the same as the book.
It's still counting spaces at the end of the line when variable maxLen is
printed in the main function; I would like to know if the logic is
erroneous */
while(line[i] == ' ' || line[i] == '\t'){
i--;
}
line[i]=c;
++i;
}
line[i] = '\0';
return i;
}
void copy(char to[], char from[]){
int i=0;
while((to[i]=from[i])!='\0'){
++i;
}
}
As far as I can tell, your program prints the longest line entered and does not eliminate blanks or tabs.
You are welcome to look at the solution at the link below:
solution to 1-18
I hope this helped
good luck
First, let's take a look at your output:
if(maxLen>0){
printf("%d\n", maxLen);
printf("%s", longestLine);
}
This part isn't in any loop and does not contain any loop, so it prints just one line. However, the program should print all the lines. Therefore, your code that prints lines should be in a loop similar to the code that gets a line. They could be in the same loop, like this:
while there is text left
get line
removing the spaces
print line
end loop
Or in different loops. Actually, I think you misunderstood the exercise, because your program prints only the longest line, and going by your code that clearly seems to be your intention.
Deleting the whitespace
It seems that this is where you're trying to delete the whitespace:
while(line[i] == ' ' || line[i] == '\t'){
i--;
}
Note what i holds when this loop starts. The for loop that reads the line:
for(i=0;i<max-1 && (c=getchar())!=EOF && c!='\n';i++){
uses the same i that was declared here:
int c,i;
and leaves it equal to the number of characters stored. So line[i] is the slot one past the last character read, and nothing has been written there yet. The while loop therefore tests an indeterminate value (or a leftover from an earlier, longer line) instead of the trailing characters, and it will either do nothing or decrease i based on rubbish. To look at the actual end of the line you have to test line[i-1], and stop when i reaches 0 so that an all-blank line does not take the index below the start of the array.
My suggestion to you is that you should try to solve the exercise incrementally. The task can be split in multiple parts:
Removing trailing whitespace from a line
Reading in the input
Do one after another. First, you write a program that takes one hardcoded line and removes the trailing whitespace and prints it. Then, when that works, you modify it to work with not one line, but with all the input lines. This way you don't have multiple parts that aren't working, but instead you can focus on doing one part at a time. It's easier to write software this way.
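For the first step, something as small as the following is enough to experiment with. It is only a sketch: trim_end is a name I made up, and it works on a plain string without a newline, which is exactly the "one hardcoded line" case.

```c
#include <string.h>

/*
 * Removes trailing blanks and tabs from s in place and returns the new
 * length. The n > 0 test stops the scan at the start of the array.
 */
int trim_end(char s[])
{
    int n = (int)strlen(s);

    while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t'))
        --n;
    s[n] = '\0';
    return n;
}
```

Calling it on an array such as char line[] = "hello \t "; and printing the result between two visible markers, for example printf("[%s]\n", line);, makes it easy to see whether the trailing whitespace is really gone before any input handling is added.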
Thanks for your help guys. Problem was with the value of i in function getLine. Here is the solution:
int getline(char line[], int max){
int c,i;
for(i=0;i<max-1 && (c=getchar())!=EOF && c!='\n';i++){
line[i] = c;
}
/* After the loop i is one past the last character stored, so test
line[i-1]; the i > 0 check keeps the index inside the array */
while(i > 0 && (line[i-1] == ' ' || line[i-1] == '\t')){
i--;
}
if(c == '\n'){
line[i]=c;
++i;
}
line[i] = '\0';
return i;
}

K&R exercise 1-18 gives segmentation fault when EOF is encountered

I've been stuck at this problem:
Write a Program to remove the trailing blanks and tabs from each input line and to delete entirely blank lines.
for the last couple of hours, it seems that I can not get it to work properly.
#include<stdio.h>
#define MAXLINE 1000
int mgetline(char line[],int lim);
int removetrail(char rline[]);
//==================================================================
int main(void)
{
int len;
char line[MAXLINE];
while((len=mgetline(line,MAXLINE))>0)
if(removetrail(line) > 0)
printf("%s",line);
return 0;
}
//==================================================================
int mgetline(char s[],int lim)
{
int i,c;
for(i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if( c == '\n')
{
s[i]=c;
++i;
}
s[i]='\0';
return i;
}
/* To remove Trailing Blanks,tabs. Go to End and proceed backwards removing */
int removetrail(char s[])
{
int i;
for(i = 0; s[i] != '\n'; ++i)
;
--i; /* To consider raw line without \n */
for(i > 0; ((s[i] == ' ') || (s[i] == '\t')); --i)
; /* Removing the Trailing Blanks and Tab Spaces */
if( i >= 0) /* Non Empty Line */
{
++i;
s[i] = '\n';
++i;
s[i] = '\0';
}
return i;
}
I am using the gedit text editor on Debian.
Anyway when I type text into the terminal and hit enter it just copies the whole line down, and if I type text with blanks and tabs and I press EOF(ctrl+D) I get the segmentation fault.
I guess the program is running out of memory and/or using memory 'blocks' out of its array, I am still really new to all of this.
Any kind of help is appreciated, thanks in advance.
P.S.: I tried using both the code from the solutions book and code from random sites on internet but both of them give me the segmentation fault message when EOF is encountered.
It's easy:
mgetline returns the buffer filled with the data you entered in two cases:
when a newline character is encountered
when EOF is encountered.
In the first case it puts the newline character into the buffer; in the latter it does not.
Then you pass the buffer to the removetrail function, which first tries to find the newline character:
for(i=0; s[i]!='\n'; ++i)
;
But there is no newline character when you hit Ctrl-D! The loop runs past the end of the buffer, and you get a memory access violation once it reads beyond the mapped memory.
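A version of removetrail that does not depend on the newline being present could look roughly like this. It keeps the question's function name and return convention (0 means "blank line, do not print"), but the details are my own sketch rather than the book's solution.

```c
/*
 * Strips trailing blanks and tabs from s, keeping the newline if there
 * was one. Returns the new length, or 0 for an entirely blank line.
 */
int removetrail(char s[])
{
    int i = 0;
    int had_newline;

    /* Stop at the terminator as well, in case EOF ended the line. */
    while (s[i] != '\0' && s[i] != '\n')
        ++i;
    had_newline = (s[i] == '\n');

    /* Walk back over blanks and tabs without leaving the array. */
    while (i > 0 && (s[i - 1] == ' ' || s[i - 1] == '\t'))
        --i;

    if (i == 0) {              /* nothing but whitespace on this line */
        s[0] = '\0';
        return 0;
    }
    if (had_newline)
        s[i++] = '\n';
    s[i] = '\0';
    return i;
}
```

With this, a last line typed without Enter before Ctrl-D is trimmed like any other line instead of sending the scan off the end of the buffer.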

Can't assign value to a variable inside a for loop

Here is what I want to do:
Read all characters from a '.c' file and store that into an array.
When a character from that array is '{', it will be pushed into a stack. And count of pushed characters will be increased by 1.
When a character from that array is '}', stack will pop and the count of popped characters will be increased by 1.
Compare those two counts to check whether there is a missing '{' or '}'
Here is my code:
int getLinesSyntax(char s[], int limit, FILE *cfile)
{
int i, c, push_count = 0, pop_count = 0;
int state = CODE;
int brackets[limit];
char braces[limit];
for(i = 0; i < 100; i++)
{
braces[i] = 0;
}
for(i = 0; i < limit - 1 && (c = getc(cfile)) != EOF && c != '\n'; i++)
{
s[i] = c;
if(s[i] == '{')
{
braces[0] = s[i];
//push(s[i], braces);
++push_count;
}
else if(s[i] == '}')
{
pop(braces);
++pop_count;
}
}
//Mor shiljih uyed array -n togsgold 0-g zalgana (English: at a line break, append 0 to the end of the array)
if(c == '\n')
{
s[i] = c;
i++;
}
s[i] = '\0';
i = i -1; //Suuld zalgasan 0 -g toonoos hasna (English: subtract the last appended 0 from the count)
if(c == EOF)
{
//just checking
for(i = 0; i < 100; i++)
{
printf("%d", braces[i]);
}
if(push_count != pop_count)
{
printf("%d and %d syntax error: braces", push_count, pop_count);
}
return -1;
}
else
{
return i;
}
}
Here is the output
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The problem is:
Assignments inside the for loop are not working. (They work when I put them outside of the loop.)
I would like to know if there's something wrong with my code :).
There are several problems.
Let's go through it step by step.
1) Your array initialization loop:
int brackets[limit];
char braces[limit];
for(i = 0; i < 100; i++)
{
braces[i] = 0;
}
You declare the array having size of limit but only initialize 100 items. Change 100 to limit to fully initialize it depending on the parameter of the function.
2) The conditional statement of the main for loop:
i < limit - 1 && (c = getc(cfile)) != EOF && c != '\n'
Although the first subexpression is correct, I have two remarks:
Firstly, (c = getc(cfile)) != EOF might be one reason why the loop body is never entered and everything is still 000000.... Check that the file exists, that the pointer is not NULL, and that no other silent error occurred.
Secondly, the c != '\n'. What if one of these characters occurs? In that case you won't continue with the next iteration but break out of the entire for loop. Remove it from the condition and put it in the first line of the body like this:
if(c == '\n')
{
i -= 1; // to really skip the character and maintain the index.
continue;
}
3) s[i] = c;
Can you be certain that the array s really has room for limit elements?
4) Checking for curly braces
if(s[i] == '{')
{
braces[0] = s[i];
//push(s[i], braces);
++push_count;
}
else if(s[i] == '}')
{
pop(braces);
++pop_count;
}
You assign to braces[0] always, why?
5) Uninitialized access
if(c == '\n')
{
s[i] = c;
i++;
}
s[i] = '\0';
i = i -1; //Suuld zalgasan 0 -g toonoos hasna (English: subtract the last appended 0 from the count)
You're now using the function-wide variable i, which is never set explicitly for this block. Using one variable basically everywhere is no problem from the memory point of view, but here you rely on whatever value the loop left behind. Is this done on purpose? If not, reinitialize i properly. I have to ask since I can't read the comments in your code.
What I'm quite unhappy about is that you rely entirely on one variable in all the loops and statements. Usually a loop index should never be altered from inside the loop. Maybe you can come up with a cleaner design of the function, such as an additional index variable that you increase in parallel without altering i. The additional index would be used for array access where appropriate, whereas i really remains just a counter.
I think the problem is in this condition "c != '\n'" which is breaking the for loop right after the first line, before it reaches any brackets. And hence the output.
For the task of counting whether there are balanced braces in the data, the code is excessively complex. You could simply use:
int l_brace = 0;
int r_brace = 0;
int c;
while ((c = getchar()) != EOF)
{
if (c == '{')
l_brace++;
else if (c == '}')
r_brace++;
}
if (l_brace != r_brace)
printf("Number of { = %d; number of } = %d\n", l_brace, r_brace);
Of course, this can be confused by code such as:
/* This is a comment with an { in it */
char string[] = "{{{";
char c = '{';
There are no braces that mark control-of-flow statement grouping in that fragment, even though there are 5 left braces ({) in the source code. Parsing C properly is hard work.

Newbie C Array question over functions

I bought a C book called "The C (ANSI C) PROGRAMMING LANGUAGE" to try and teach myself, well, C. Anyhow, the book includes a lot of examples and practices to follow across the chapters, which is nice.
Anyhow, the code below is my answer to the book's "count the longest line" type of program. The authors use a for loop in the function getLine(char s[], int lim), which allows for a proper display of the string line inside the main() function. However, using while won't work, for a reason that is unknown to me. Perhaps someone can shed some light on what my error is.
EDIT: To summarize the above. printf("%s\n", line); won't display anything.
Thankful for any help.
#include <stdio.h>
#define MAXLINE 1024
getLine(char s[], int lim) {
int c, i = 0;
while((c = getchar()) != EOF && c != '\n' && i < lim) {
s[++i] = c;
}
if(c == '\n' && i != 0) {
s[++i] = c;
s[++i] = '\0';
}
return i;
}
main(void) {
int max = 0, len;
char line[MAXLINE], longest[MAXLINE];
while((len = getLine(line,MAXLINE)) > 0) {
if(len > max) {
max = len;
printf("%s\n", line);
}
}
return 0;
}
You have a number of serious bugs. Here are the ones I found and how to fix them.
Change your code to postincrement i, to avoid leaving the first array member uninitialised and to avoid double printing the final character:
s[++i] = c;
...
s[++i] = c;
s[++i] = '\0';
to
s[i++] = c;
...
// s[++i] = c; see below
...
s[i++] = '\0';
and fix your EOF bug:
if(c == '\n' && i != 0) {
s[++i] = c;
s[++i] = '\0';
}
to
if(c == '\n')
{
s[i++] = '\n';
}
s[i] = '\0';
Theory
When writing programs that deal with strings, arrays or other vector-type structures it is vitally important that you check the logic of your program. You should do this by hand, and run a few sample cases through it, providing sample inputs to your program and thinking out what happens.
The cases you need to run through it are:
a couple general cases
all the edge cases
In this case, your edge cases are:
first character ever is EOF
first character is 'x', second character ever is EOF
first character is '\n', second character is EOF
first character is 'x', second character is '\n', third character is EOF
a line has equal to lim characters
a line has one less than lim characters
a line has one more than lim characters
Sample edge case
first character is 'x', second character is '\n', third character is EOF
getLine(line[MAXLINE], MAXLINE)
(s := line[MAXLINE] = '!!!!!!!!!!!!!!!!!!!!!!!!...')
c := undef, i := 0
while(...)
c := 'x'
i := 1
s[1] := 'x' => s == '!x!!!!...' <- first bug found
while(...)
c := '\n'
end of while(...)
if (...)
(c== '\n' (T) && i != 0 (T)) = T
i := i + 1 = 2
s[2] = '\n' => s == '!x\n!!!!'
i := i + 1 = 3
s[3] = '\0' => s == '!x\n\0!!!' <- good, it's terminated
return i = 3
((len = i = 3) > 0) = T (the while eval)
if (...)
len (i = 3) > max (0) = T
max = 3 <- does this make sense? is '!x\n' a line 3 chars long? perhaps. why did we keep the '\n' character? this is likely to be another bug.
printf("%s\n", line) <- oh, we are adding ANOTHER \n character? it was definitely a bug.
outputs "!x\n\n\0" <- oh, I expected it to print "x\n". we know why it didn't.
while(...)
getLine(...)
(s := line[MAXLINE] = '!x\n\0!!!!!!!!!!!!!!!!!!!...' ; <- oh, that's fun.
c := undef, i := 0
while(...)
c := EOF
while terminates without executing body
(c == '\n' && i != 0) = F
if body not executed
return i = 0
(len = i = 0 > 0) = F
while terminates
program stops.
So you see this simple process, that can be done in your head or (preferably) on paper, can show you in a matter of minutes whether your program will work or not.
By following through the other edge cases and a couple general cases you will discover the other problems in your program.
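Once the hand trace has shown what should happen, the same edge cases can be kept around as quick checks. The sketch below is one way to do that and rests on a few assumptions of mine: get_line is a corrected line reader that takes a FILE * instead of using getchar(), so it can be fed from a temporary file, and first_line_length is a made-up helper that writes a test input to such a file and reads the first line back.

```c
#include <stdio.h>

/*
 * Reads at most lim - 1 characters of one line from `in` into s, keeps
 * the newline if it fits, always terminates s, and returns the length.
 */
static int get_line(FILE *in, char s[], int lim)
{
    int c = EOF;
    int i = 0;

    /* The bound is tested first, so no character is read and dropped. */
    while (i < lim - 1 && (c = getc(in)) != EOF && c != '\n')
        s[i++] = (char)c;
    if (c == '\n')
        s[i++] = '\n';
    s[i] = '\0';
    return i;
}

/*
 * Feeds `input` to get_line through a temporary file. Returns the
 * length of the first line read (left in buf), or -1 if the temporary
 * file could not be created.
 */
int first_line_length(const char *input, char buf[], int lim)
{
    FILE *f = tmpfile();
    int len;

    if (f == NULL)
        return -1;
    fputs(input, f);
    rewind(f);
    len = get_line(f, buf, lim);
    fclose(f);
    return len;
}
```

Each edge case from the list then becomes a single call, for instance first_line_length("x\n", buf, 4) for "first character is 'x', second character is '\n', third character is EOF", and the value to compare against comes straight from the paper trace.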
It's not clear from your question exactly what problem you're having with getLine (compile error? runtime error?), but there are a couple of bugs in your implementation. Instead of
s[++i] = something;
You should be using the postfix operator:
s[i++] = something;
The difference is that the first version stores 'something' at index i+1, whereas the second version stores it at index i. In C/C++, arrays are indexed from 0, so you need to make sure the character is stored in s[0] on the first pass through your while loop, in s[1] on the second pass, and so on. With the code you posted, s[0] is never assigned to, which causes printf() to print uninitialised data.
The following implementation of getline works for me:
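int getLine(char s[], int lim) {
int c = EOF;
int i;
i = 0;
/* test the bound first so that no character is read and then dropped */
while(i < lim - 1 && (c = getchar()) != EOF && c != '\n') {
s[i++] = c;
}
if(c == '\n') {
s[i++] = c;
}
s[i] = '\0'; /* always terminate, even when EOF ends the line */
return i;
}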
By doing ++i instead of i++, you are not assigning anything to s[0] in getLine()!
Also, you are unnecessarily incrementing when assigning '\0' at the end of the loop, which BTW you should always assign, so take it out of the conditional.
Also add return types to the functions (int main and int getLine)
Watch out for the overflow as well - you are assigning to s[i] at the end with a limit of i == lim thus you may be assigning to s[MAXLINE]. This would be a - wait for it - stack overflow, yup.

Resources