Reading input from file in C - c

Okay so I have a file of input that I calculate the amount of words and characters in each line with success.
When I get to the end of the line using the code below it exits the loop and only reads in the first line. How do I move on to the next line of input to continue the program?
EDIT: I must parse each line separately, so I can't use EOF
while( (c = getchar()) != '\n')

Change '\n' to EOF. You're reading until the end of the line when you want to read until the end of the file. EOF is a macro in stdio.h: a negative int that getchar() returns when there is no more input. It is not a character stored at the end of the file.
Disclaimer: I make no claims about the security of the method.

'\n' is the line feed (newline) character, so the loop terminates when the end of the first line is reached. The end of the file is not marked by a character; getchar() reports it by returning EOF. stdio.h (cstdio in C++), which declares getchar(), defines the EOF constant, so just change the while line to
while( (c = getchar()) != EOF)

From the man page: "reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error." EOF is a macro (often -1) for the return value of this and related functions that indicates end of file. You want to check whether this is what you're getting back. Note that getc returns a signed int, but that valid values are unsigned chars cast to ints. Watch out if c is a signed char.

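Well, on Windows a line ending is stored in the file as a combination of two characters, two bytes:
byte 13 (carriage return) followed by byte 10 (line feed). A stream opened in text mode translates that pair into a single '\n', but if you are reading the raw bytes you could try something like,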
int c2=getchar(),c1;
while(1)
{
c1=c2;
c2=getchar();
if(c1==EOF)
break;
if(c1==(char)13 && c2==(char)10)
break;
/*use c1 as the input character*/
}
This tests whether two consecutive input characters form the pair (13, 10).

Related

How to read a non-text file containing EOF characters in C?

I'm trying to loop over all bytes of a file using a simple while loop, like so:
char c = fgetc(InputFile);
while (c != EOF)
{
doStuff(c);
c = fgetc(InputFile);
}
However, when working with non-text files, I've found that some of the bytes within the file (that aren't the last one) contain the value 255, and therefore register as EOF and the while loop ends prematurely.
How do I get around this and loop over all bytes?
As mentioned in the comments, you should assign the value returned by fgetc to an int variable, not a char. That way, you will be able to distinguish between a successfully read character that has the hex value 0xFF (fgetc will return 255) and an end-of-file condition (fgetc will return EOF, which is negative and typically -1).
From the cppreference page for fgetc:
On success, returns the obtained character as an unsigned char
converted to an int. On failure, returns EOF.

Can EOF be on the same line as \n

As the title says can EOF be on the same line as "\n"? I am currently working on a project recreating getline in C and I am on the part of implementing what happens at EOF and I was trying to figure out will I get the last line of a text file as:
"blahblahblahblah\n\0"
or
"blahblahblah\n
\0"
Can EOF be on the same line as \n
Yes, but rarely.
EOF is a constant usually used to signal 1) end-of-file or a 2) rare input error. It is not usually encoded as a character in a text stream.
A return value of EOF from fgetc() and friends can occur at any time - even in the middle of a line due to an error.
Otherwise, no:
An end-of-file does not occur at the end of a line with a '\n'. Typically the last line in a file that contains at least 1 character (and no '\n') will also be a line.
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
EOF is a signal to notify that there is no more input. There is no actual data in the file called EOF at the end. EOF is not the same as the null character \0.
I think you want to know whether you can get an EOF without getting a newline first. The answer is yes. It is possible if there is no newline at the end of file, i.e. when the last line finished without a newline at end. You can generate such a file easily by opening a text editor, typing some text and save, quit without hitting ENTER. That file won't have any newline at end.
Expanding on @taskinoor's and @chux's answers and after looking at your examples, I think you might be confusing EOF with the '\0'-terminating byte for strings.
EOF is a value that functions like getc return to tell the user that it was not able to read any more from the stream. That's why you often see this:
// fp is a FILE*, it may be stdin
// it may be a file on the disk or something
// else
int c;
while((c = getc(fp)) != EOF)
{
// doing something with the character
}
Note here that c is declared as int, because EOF is a signed int and that
is what getc returns. I previously made a statement here (footnote 1) that wasn't quite correct; chux
pointed out my mistake, and I'd like to quote his/her comment here instead:
User chux comment on my answer
EOF, as a negative constant, is often -1 and can fit in a char, when char is a signed char.
It is not so much that it does not fit, it is that storing an EOF in a char can be indistinguishable
from storing, say, a character with the value of 255 in a char.
'\0', on the other hand, is the byte that is used to terminate a string in C. In
C a string is nothing more than a sequence of characters that ends with that
byte. We use char arrays to store strings. So EOF and \0 are different
things and even have different values.
Footnote
1. Note here that c is declared as int, because EOF is a signed int and
actually doesn't fit in a char, so it will never be part of a C string.

Count lines in ASCII file using C

I would like to count the number of lines in an ASCII text file.
I thought the best way to do this would be by counting the newlines in the file:
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count word line endings. */
if (c == '\n') ++lines;
}
However, I'm not sure if this would account for the last line on both MS Windows and Linux. That is, if my text file finishes as below, without an explicit newline, is there one encoded there anyway or should I add an extra ++lines; after the for loop?
cat
dog
Then what about if there is an explicit newline at the end of the file? Or do I just need to test for this case by keeping track of the previously read value?
If there is no newline, one won't be generated. C tells you exactly what's there.
Text files are always expected to end with a line feed. There's no canonical way of handling files that don't.
Here's how some tools choose to deal with characters after the last line feed:
wc doesn't count it as a line (so you have good precedent for that)
Vim marks the file as [noeol], and saves the file without a trailing line feed
GNU sed treats the file as if it had a last line feed
sh's read exits with error, but still returns the data
Since behaviour is pretty much undefined, you can just do whatever's convenient or useful to you.
First, there will not be any implicitly encoded newline at the end of the last line. The only way there will be a newline is if the software or person that produced the file put it there. Putting it there is generally considered good practice, however.
The ultimate answer for what you should report as the line count depends on the convention that you need to follow for the software or people that will be using this line count, and probably what you can assume about the behavior of the input source as well.
Most command-line tools will terminate their output with a newline character. In this case, the sensible answer may be to report the number of newline characters as the number of actual lines.
On the other hand, when a text editor is displaying a file, you will see that the line numbering in the margin (if supported) contains a number for the last line whether it is empty or not. This is in part to tell the user that there is a blank line there, but if you want to count the number of lines displayed in the margin, it is one plus the number of newline characters in the file. It is typical for some coders to not terminate their last lines with a newline character (sometimes due to sloppiness), so in this case this convention would actually be the right answer.
I'm not sure any other conventions make much sense. For example, if you choose not to count the last line unless it is non-empty, then what counts as non-empty? The file ending after newline? What if there is whitespace on that line? What if there are several empty lines at the end of the file?
If you're going to use this method, you could always keep a separate counter for how many characters you have seen on the current line. If the count at the end is greater than 0, then you know there is stuff on the last line that wasn't counted.
int letters = 0;
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count line endings. */
letters++; // Increase count on character
if (c == '\n')
{
++lines;
letters = 0; // Set back to 0 after new line
}
}
if (letters > 0)
{
++lines; // Unterminated last line
}
Your concern is real, the last line in the file may be missing the final end of line marker. The end of line marker is a single '\n' in Linux, a CR LF pair in Windows that the C runtime converts automatically into a '\n'.
You can simplify your code and handle the special case of the last line missing a linefeed this way:
int c, last = '\n', lines = 0;
while ((c = getc(fp)) != EOF) { /* Count word line endings. */
if (c == '\n')
lines += 1;
last = c;
}
if (last != '\n')
lines += 1;
Since you are concerned with speed, using getc instead of fgetc will help on platforms where it is defined as a macro that handles the stream structures directly and calls a function only to refill the buffer, every BUFSIZ characters or so, unless the stream is unbuffered.
How about this:
Create a flag for yourself to keep track of any non \n characters following a \n that is reset when c=='\n'.
After the EOF, check to see if the flag is true and increment if yes.
bool more_chars = false; /* needs <stdbool.h> */
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count line endings. */
if (c == '\n') {
more_chars = false;
++lines;
} else more_chars = true;
}
if (more_chars) lines++;
Windows and UNIX/Linux style line breaks make no difference here. On either system a text file may or may not have a newline at the end of the last line.
If you always add 1 to the line count, this effectively counts the empty line at the end of the file when there is a newline at the end (i.e., file "foo\n" will count as having two lines: "foo" and ""). This may be an entirely reasonable solution, depending on how you want to define a line.
Another definition of a "line" is that it always ends in a newline, i.e., the file "foo\nbar" would only have one line ("foo") by this definition. This definition is used by wc.
Of course you could keep track of whether the newline was the last character in file and only add 1 to the count in case it wasn't. Then a "line" would be defined as either ending in a newline or being non-empty at the end of the file, which sounds quite complex to me.

C: printf() and putchar() questions

I was reading K&R book and wanted to test out printf() and putchar() functions in ways that I never tried. I encountered several unexpected events and would like to hear from more experienced programmers why that happens.
char c;
while((c = getchar()) != EOF) {
//putchar(c);
printf("%d your character was.\n", c);
}
1. How would one get EOF (end of file) in the input stream (getchar() or scanf() functions)? Unexpected key that is not recognized by getchar()/scanf() function could produce it?
2. In my book, it says that c has to be an integer, because it needs to store EOF and the variable has to be big enough to hold any possible char that EOF can hold. This doesn't make sense for me, because EOF is a constant integer with a value of -1, which even char can store. Can anyone clarify what was meant by this?
3. What happens when I send "hello" or 'hello' to putchar() function? It expects to get an integer, but returns weird output, such as EE or oo, if I send the latter string or char sequence.
4. Why when I use printf() function that is written above I get two outputs? One is the one I entered and the other one is integer, which in ASCII is end of line. Does it produce the second output, because I press enter, which it assumes to be the second character?
Thanks.
1. On Linux, you can send it with Ctrl+D.
2. You need an int, otherwise you can't tell the difference between EOF and the last possible character (0xFFFF is not the same as 0x00FF).
3. putchar wants a character, not a string. If you give it a string, the pointer is converted to an integer and it prints part of the string's address.
4. You only get one output: the ASCII value of the character you entered. The other "input" is what you typed in the terminal.
edit - more details about 2
You need an int because getchar can return both the character 0xFF (0x00FF as an int, which turns into -1 when stored in a signed char) and the integer -1 (0xFFFF). They don't have the same meaning: the first is a valid character (for instance, it's ÿ in Latin-1), while the integer -1 is EOF in this context.
Here's a simple program that shows the difference:
#include <stdio.h>
int main(int argc, char ** argv) {
{
char c = 0xFF; /* this is a valid char */
if (c == EOF) printf("wrong end of file detection\n");
}
{
int c = 0xFF; /* this is a valid char */
if (c == EOF) printf("wrong end of file detection\n");
}
}
The first test succeeds when char is signed, because 0xFF stored in a char becomes -1, while the second test fails because 0x00FF != -1 for int.
I hope that makes it a bit clearer.
1. You must close the input stream to get EOF. Usually with CTRL-D in UNIX, but see your tty config (stty -a).
2. Already answered.
3. Same.
4. Your tty echoes what you type by default. If you don't want this, set it to no-echo mode (stty -echo). Be careful, as some shells set it back to echo. Try with sh. You must be aware that the tty also buffers your input until RETURN is typed (see the stty manual for raw mode if you need it).

EOF missing for unix file [duplicate]

I have created a file and named it "file.txt" in Unix. I tried to read the file content from my C program. I am not able to receive the EOF character. Unix doesn't store EOF character on file creation? If so what is the alternative way to read the EOF from a Unix created file using C.
Here's the code sample
int main(){
File *fp;
int nl,c;
nl =0;
fp = fopen("file.txt", "r");
while((c = fgetc(fp)) != EOF){
if (c=='\n')
nl++;
}
return 0;
}
If I explicitly give CTRL + D the EOF is detected even when I use char c.
This can happen if the type of c is char (and char is unsigned in your compiler, you can check this by examining the value of CHAR_MIN in <limits.h>) and not int.
The value of EOF is negative according to the C standard.
So, implicitly casting EOF to unsigned char will lose the true value of EOF and the comparison will always fail.
UPDATE: There's a bigger problem that has to be addressed first. In the expression c = fgetc(fp) != EOF, fgetc(fp) != EOF is evaluated first (to 0 or 1) and then the value is assigned to c. As long as there is a character left in the file, fgetc(fp) != EOF evaluates to 1, so c is always 1 inside the loop and the test c == '\n' never succeeds. You need to add parentheses, like so: (c = fgetc(fp)) != EOF.
Missing parentheses. Should be:
while((c = fgetc(fp)) != EOF)
Remember: fgetc() returns an int, not a char. It has to return an int because its set of return values includes all possible valid characters plus a separate (negative) EOF indicator.
There are two possible traps if you use type char for c instead of int:
If the type char is signed with your compiler, you will detect a valid character as EOF. Often, the character ÿ (y-umlaut, officially known in Unicode as LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF, hex code 0xFF in the ISO 8859-1 aka Latin 1 code set) will be detected as equivalent to EOF, when it is a valid character.
If the type char is unsigned, then the comparison with EOF will never be true, so the loop never terminates.
Both problems are serious, and both are avoided by using the correct type:
FILE *fp = fopen("file.txt", "r");
if (fp != 0)
{
int c;
int nl = 0;
while ((c = fgetc(fp)) != EOF)
if (c == '\n')
nl++;
printf("Number of lines: %d\n", nl);
fclose(fp);
}
Note that the type is FILE and not File. Note that you should check that the file was opened before trying to read via fp.
If I explicitly give CTRL + D, the EOF is detected even when I use char c.
This means that your compiler provides you with char as a signed type. It also means you will not be able to count lines accurately in files which contain ÿ.
Unlike CP/M and DOS, Unix does not use any character to indicate EOF; you reach EOF when there are no more characters to read. What confuses many people is that if you type a certain key combination at the terminal, programs detect EOF. What actually happens is that the terminal driver recognizes the character and sends any unread characters to the program. If there are no unread characters, the program gets 0 bytes returned, which is the same result you get when you've reached the end of file. So, the character combination (often, but not always, Ctrl-D) appears to 'send EOF' to the program. However, the character is not stored in a file if you are using cat >file; further, if you read a file which contains a control-D, that is a perfectly fine character with byte value 0x04. If a program generates a control-D and sends that to a program, that does not indicate EOF to the program. It is strictly a property of Unix terminals (tty and pty — teletype and pseudo-teletype — devices).
You do not show how you declare the variable c; it should be of type int, not char.

Resources