C - Reading from a file issue with the last line

The expected input to my program for my assignment is something like
./program "hello" < helloworld.txt. The trouble is that I must analyse every line in the file, so I test for the end of a line like this:
while((c = getchar()) != EOF) {
if (c == '\n') {
/*stuff will be done*/
However, my problem with this is that if the helloworld.txt file contains:
hello
world
It will only read the first line (or, if there were more lines, everything up to the second-to-last line).
To fix this, I have to add a trailing newline so that helloworld.txt looks something like:
hello
world
//
Is there another way around this?

Fix your algorithm. Instead of:
while((c = getchar()) != EOF) {
if (c == '\n') {
/* stuff will be done */
} else {
/* buffer the c character */
}
}
Do:
do {
c = getchar();
if (c == '\n' || c == EOF) {
/* do stuff with the buffered line */
/* clear the buffered line */
} else {
/* add the c character to the buffered line */
}
} while (c != EOF);
But please note that you shouldn't use the value of the c variable if it is EOF.

You need to re-structure your program so it can "do stuff" on EOF, if it has read any characters since the previous linefeed. That way, a non-terminated final line will still be processed.
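A minimal sketch of that restructuring follows. It assumes lines fit in a fixed-size buffer (MAX_LINE is an arbitrary limit chosen here) and uses a hypothetical handle_line() as a stand-in for "do stuff"; neither name comes from the question. It reads from a FILE * so it can be tried on any stream, and passing stdin should behave like the getchar() version.

```c
#include <stdio.h>

#define MAX_LINE 1024  /* assumed limit; longer lines are truncated */

/* Hypothetical helper: called once per complete line. */
static void handle_line(const char *line, FILE *out)
{
    fprintf(out, "[%s]\n", line);
}

/* Reads `in` one character at a time and calls handle_line() for every
 * line, including a final line that has no trailing '\n'.
 * Returns the number of lines handled. */
int process_lines(FILE *in, FILE *out)
{
    char buf[MAX_LINE];
    size_t len = 0;
    int count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            buf[len] = '\0';
            handle_line(buf, out);
            count++;
            len = 0;
        } else if (len < MAX_LINE - 1) {
            buf[len++] = (char)c;
        }
    }

    /* EOF reached: flush a last line that was not newline-terminated. */
    if (len > 0) {
        buf[len] = '\0';
        handle_line(buf, out);
        count++;
    }
    return count;
}
```

Calling process_lines(stdin, stdout) from main with helloworld.txt redirected to stdin handles both hello and world, whether or not the file ends with a newline.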

Related

C code doesn't print whole paragraph with newlines

This is my C code:
#include <stdio.h>
int main()
{
int c = getchar();
while (c != EOF) {
if (c != '\n')
putchar(c);
else putchar(32);
c = getchar();
}
return 0;
}
I want to make a program that prints out a paragraph with newlines, by replacing the \n character with spaces. The problem is, it only prints out the last line, when I use the code provided above.
For example, for the text:
This is
my
text
the result printed is text.
The paragraph is properly printed when I remove the if(), else conditions, and only leave the putchar(), without trying to replace anything.
What's the problem?
Your input file has CRLF newlines. You need to ignore the CR characters when you're replacing LF with space. Otherwise, printing the CR characters will go back to the beginning of the line and overwrite what was already printed.
#include <stdio.h>
int main()
{
int c;
while ((c = getchar()) != EOF) {
if (c == '\n') {
// replace newline with space
putchar(' ');
} else if (c == '\r') {
// ignore CR
} else {
putchar(c);
}
}
return 0;
}

Problems when trying to skip '\n' in reading txt files

I wrote a fairly small program to help with txt file formatting, but when I tried to read from the input file and skip unwanted '\n' characters, I actually skipped the next character after each '\n' instead.
The sample file looks like this:
abcde
abc
ab
abcd
And my code looks like this:
while (!feof(fp1)) {
ch = fgetc(fp1);
if (ch != '\n') {
printf("%c",ch);
}
else {
ch = fgetc(fp1); // move to the next character
if (ch == '\n') {
printf("%c",ch);
}
}
}
The expected result is
abcdeabc
ababcd
But I actually got
abcdebc
abbcd
I guess the problem is in ch = fgetc(fp1); // move to the next character, but I just can't find a correct way to implement this idea.
Think of the flow of your code (lines numbered below):
1: while (!feof(fp1)) {
2: ch = fgetc(fp1);
3: if (ch != '\n') {
4: printf("%c",ch);
5: }
6: else {
7: ch = fgetc(fp1); // move to the next character
8: if (ch == '\n') {
9: printf("%c",ch);
10: }
11: }
12: }
When you get a newline followed by non-newline, the flow is (starting at the else line): 6, 7, 8, 10, 11, 12, 1, 2.
It's that execution of the final 2 in that sequence that effectively throws away the non-newline character that you had read at 7.
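If you would rather keep the look-ahead structure, one option is to push the extra character back with ungetc() so that the next read sees it again. This is only a sketch: the function name join_lines and the in/out parameters are made up for illustration. It also tests the return value of fgetc() instead of calling feof(), so the loop stops as soon as a read fails.

```c
#include <stdio.h>

/* Drops single newlines; when a newline is immediately followed by a
 * second newline, that second one is written out. */
void join_lines(FILE *in, FILE *out)
{
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (ch != '\n') {
            fputc(ch, out);
            continue;
        }

        /* Look one character ahead of the newline. */
        int next = fgetc(in);
        if (next == '\n') {
            fputc(next, out);
        } else if (next != EOF) {
            /* Not a newline: push it back so the next iteration reads it. */
            ungetc(next, in);
        }
    }
}
```

With this version each pair of newlines yields one newline, so a run of four newlines produces two.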
If your intent is to basically throw away single newlines and convert sequences of newlines (two or more) to a single one(a), you can use something like the following pseudo-code:
set numNewlines to zero
while not end-file:
    get thisChar
    if numNewlines is one or thisChar is not newline:
        output thisChar
    if thisChar is newline:
        increment numNewlines
    else:
        set numNewlines to zero
This reads the character in one place, making it less likely that you'll inadvertently skip one due to confused flow.
It also uses the newline history to decide what gets printed. It only outputs a newline on the second occurrence in a sequence of newlines, ignoring the first and any after the second.
That means that a single newline will never be echoed and any group of two or more will be transformed into one.
Some actual C code that demonstrates this(b) follows:
#include <stdio.h>
#include <stdbool.h>
int main(void) {
// Open file.
FILE *fp = fopen("testprog.in", "r");
if (fp == NULL) {
fprintf(stderr, "Cannot open input file\n");
return 1;
}
// Process character by character.
int numNewlines = 0;
while (true) {
// Get next character, stop if none left.
int ch = fgetc(fp);
if (ch == EOF) break;
// Output only second newline in a sequence of newlines,
// or any non-newline.
if (numNewlines == 1 || ch != '\n') {
putchar(ch);
}
// Manage sequence information.
if (ch == '\n') {
++numNewlines;
} else {
numNewlines = 0;
}
}
// Finish up cleanly.
fclose(fp);
return 0;
}
(a) It's unclear from your question how you want to handle sequences of three or more newlines so I've had to make an assumption.
(b) Of course, you shouldn't use this if your intent is to learn, because:
You'll learn more if you try yourself and have to fix any issues.
Educational institutions will almost certainly check submitted code against a web search, and you'll probably be pinged for plagiarism.
I'm just providing it for completeness.

reading a file using getc and skipping a line if it starts with semicolon

while((c = getc(file)) != -1)
{
if (c == ';')
{
//here I want to skip the line that starts with ;
//I don't want to read any more characters on this line
}
else
{
do
{
//Here I do my stuff
}while (c != -1 && c != '\n');//until end of file
}
}
Can I completely skip a line using getc if first character of line is a semicolon?
Your code contains a couple of references to -1. I suspect that you're assuming that EOF is -1. That's a common value, but it is simply required to be a negative value — any negative value that will fit in an int. Do not get into bad habits at the start of your career. Write EOF where you are checking for EOF (and don't write EOF where you are checking for -1).
int c;
while ((c = getc(file)) != EOF)
{
if (c == ';')
{
// Gobble the rest of the line, or up until EOF
while ((c = getc(file)) != EOF && c != '\n')
;
}
else
{
do
{
//Here I do my stuff
…
} while ((c = getc(file)) != EOF && c != '\n');
}
}
Note that getc() returns an int so c is declared as an int.
Let's assume that by "line" you mean a string of characters until you hit a designated end-of-line character (here assumed as \n, different systems use different characters or character sequences like \r\n). Then whether the current character c is in a semicolon-started line or not becomes a state information which you need to maintain across different iterations of the while-loop. For example:
bool is_new_line = true;
bool starts_with_semicolon = false;
int c;
while ((c = getc(file)) != EOF) {
if (is_new_line) {
starts_with_semicolon = c == ';';
}
if (!starts_with_semicolon) {
// Process the character.
}
// If c is '\n', then next letter starts a new line.
is_new_line = c == '\n';
}
The code is just to illustrate the principle -- it's not tested or anything.
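For reference, a compilable version of the same idea might look like the following. It is a sketch: the function name and the in/out parameters are invented here, and "process the character" is reduced to copying it to out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Copies `in` to `out`, omitting every line whose first character is ';'. */
void skip_semicolon_lines(FILE *in, FILE *out)
{
    bool at_line_start = true;  /* true before the first character of a line */
    bool skipping = false;      /* true while inside a ';' line */
    int c;

    while ((c = getc(in)) != EOF) {
        if (at_line_start) {
            skipping = (c == ';');
        }
        if (!skipping) {
            putc(c, out);       /* stands in for "process the character" */
        }
        /* If c is '\n', the next character starts a new line. */
        at_line_start = (c == '\n');
    }
}
```

Because the decision is made only on the first character of each line, a ';' that appears later in a line is copied like any other character.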

Kernighan & Ritchie code example confusion

Is there any reason of the second 'c = getchar()' mention in this code example?
#include <stdio.h>
/* copy input to output; 1st version */
int main(void) {
int c;
c = getchar();
while (c != EOF) {
putchar(c);
c = getchar(); // <-- I mean this one.
}
return 0;
}
c = getchar(); //read for the first time before entering while loop
while (c != EOF) {
putchar(c);
c = getchar(); // read the next input and go back to condition checking
}
return 0;
The first getchar() reads the first input character.
The second getchar() keeps reading the following characters until EOF.
In other words, the purpose of while (c != EOF) is to keep checking whether c is EOF. If c never changed, the while() loop would be meaningless, wouldn't it? The second getchar() is responsible for changing the value of c on each iteration.
Yes, so it won't putchar EOF.
It reads the first character, checks that it's not EOF, then putchars it, then gets another char, goes back to the top of the while loop and checks that it's not EOF.
The second c = getchar() is there to read another char, and yet another one, until EOF is met.
The first c = getchar(); runs only once, but the c = getchar(); inside the while loop runs on every iteration until c == EOF.
c = getchar(); // Read value of `c` if `c != EOF` it will enter while loop else it will exit
while (c != EOF) { // checking condition
putchar(c); //printing value of c
c = getchar(); // again getting new value of c and checking in while loop,
//if condition is true it will continue, else it will exit
}
It's there because a while loop tests at the top but you really need the test in the middle. An alternative to duplicating code above the loop and inside it, is using break.
while (1) {
c = getchar();
if (c == EOF) break; /* test in middle */
putchar(c);
}
That's my inattention. I was running in terminal this version of code:
while((c = getchar()), c != EOF) {
putchar(c);
}
and couldn't see the difference between results. Stupid situation.
Thanks to all anyway.

Why is this program yielding wrong output

This program is supposed to remove all comments from C source code (in this case a comment is a double slash '//', a newline character '\n' and anything in between them, and also anything between '/*' and '*/').
The program:
#include <stdio.h>
/* This is a multi line comment
testing */
int main() {
int c;
while ((c = getchar()) != EOF)
{
if (c == '/') //Possible comment
{
c = getchar();
if (c == '/') // Single line comment
while (c = getchar()) //While there is a character and is not EOF
if (c == '\n') //If a space character is found, end of comment reached, end loop
break;
else if (c == '*') //Multi line comment
{
while (c = getchar()) //While there is a character and it is not EOF
{
if (c == '*' && getchar() == '/') //If c equals '*' and the next character equals '/', end of comment reached, end loop
break;
}
}
else putchar('/'); putchar(c); //If not comment, print '/' and the character next to it
}
else putchar(c); //if not comment, print character
}
}
After I use this source code as its own input, this is the output I get:
#include <stdio.h>
* This is a multi line comment
testing *
int main() {
int c;
while ((c = getchar()) != EOF)
{
if (c == '') ////////////////
{
c = getchar();
if (c == '') ////////////////////
while (c = getchar()) /////////////////////////////////////////
if (c == '\n') ///////////////////////////////////////////////////////////////
break;
else if (c == '*') ///////////////////
{
while (c = getchar()) ////////////////////////////////////////////
{
No more beyond this point. I'm compiling it using g++ on the ubuntu terminal.
As you can see, multi-line comments had only their '/' characters removed, while single-line ones had all their characters replaced by '/'. Apart from that, any '/' characters that were NOT the beginning of a new comment were also removed, as in the line if (c == ''), which was supposed to be if (c == '/').
Does anybody know why? thanks.
C does not take notice of the way you indent your code. It only cares about its own grammar.
Look carefully at your elses and think about which if they attach to (hint: the closest open one).
There are other bugs, as well. EOF is not 0, so only the first while is correct. And what happens if the comment looks like this: /* something **/?
You have some (apparent) logic errors...
1.
while (c = getchar()) //While there is a character and is not EOF
You're assuming that EOF == 0. Why not be explicit and change the preceding line to:
while((c = getchar()) != EOF)
2.
else putchar('/'); putchar(c);
Are both of the putchars supposed to be part of the else clause? If so, you need braces {} around the two putchar statements. Also, give each putchar its own line; it not only looks nicer but it's more readable.
Conclusion
Other than what I've mentioned, your logic looks sound.
As already mentioned, the if/else matching is incorrect. One additional piece of missing functionality is that you must make it more stateful, to keep track of whether you are inside a string or not, e.g.
printf("This is not // a comment\n");

Resources