Extra EOF character - c

I have a program that reads a file into a buffer structure. The problem I'm having is that when I look at the output of the file, there's an extra EOF character at the end. I'll post the related functions. (NOTE: I removed parameter checks and only posted the code in each function related to the issue.)
b_load
int b_load(FILE * const fi, Buffer * const pBD){
unsigned char character; /*Variable to hold read character from file*/
Buffer * tempBuffer; /*Temporary Buffer * to prevent destruction of main Buffer*/
short num_chars = 0; /*Counter of the amount of characters read into the buffer*/
/*Assigns main Buffer to tempBuffer*/
tempBuffer = pBD;
/*Infinite loop that breaks after EOF is read*/
while(1){
/*calls fgetc() and returns the char into the character variable*/
character = (unsigned char)fgetc(fi);
if(!feof(fi)){
tempBuffer = b_addc(pBD,character);
if(tempBuffer == NULL)
return LOAD_FAIL;
++num_chars;
}else{
break;
}
}
return num_chars;
}
b_print
int b_print(Buffer * const pBD){
int num_chars = 0;
if(pBD->addc_offset == 0)
printf("The buffer is empty\n");
/*Sets getc_offset to 0*/
b_set_getc_offset(pBD, 0);
pBD->eob=0;
/*b_eob returns the structures eob field*/
while (!b_eob(pBD)){
printf("%c",b_getc(pBD));
++num_chars;
}
printf("\n");
return num_chars;
}
b_getc
char b_getc(Buffer * const pBD){
if(pBD->getc_offset == pBD->addc_offset){
pBD->eob = 1;
return R_FAIL_1;
}
pBD->eob = 0;
return pBD->ca_head[(pBD->getc_offset)++];
}
at the end I end up with:
"a catÿ"
(the ÿ is the EOF character)
It prints an EOF character, but one is never added to the buffer. When the driver code adds an EOF character to the end of the buffer, 2 appear. Any idea what is causing this? I might be using feof() wrong, so that may be it, but it is required in the code.

There is no "EOF character". EOF is a value returned by getchar() and related functions to indicate that they have no more input to read. It's a macro that expands to a negative integer constant expression, typically (-1).
(For Windows text files, an end-of-file condition may be triggered by a Control-Z character in a file. If you read such a file in text mode, you won't see that character; it will just act like it reached the end of the file at that point.)
Don't use the feof() function to detect that there's no more input to read. Instead, look at the value returned by whatever input function you're using. Different input functions use different ways to indicate that they weren't able to read anything; read the documentation for whichever one you're using. For example, fgets() returns a null pointer, getchar() returns EOF, and scanf() returns the number of items it was able to read.
getchar(), for example, returns either the character it just read (treated as an unsigned char and converted to int) or the value EOF to indicate that it wasn't able to read anything. The negative value of EOF is chosen specifically to avoid colliding with any valid value of type unsigned char. Which means you need to store the value returned by getchar() in an int object; if you store it in a char or unsigned char instead, you can lose information, and an actual character with the value 0xff can be mistaken for EOF.
The feof() function returns the value of the end-of-file indicator for the file you're reading from. That indicator becomes true after you've tried and failed to read from the file. And if you ran out of input because of an error, rather than because of an end-of-file condition, feof() will never become true.
You can use feof() and/or ferror() to determine why there was no more input to be read, but only after you've detected it by other means.
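As a rough sketch of that last point: the loop below notices the end of input from the return value of fgetc(), and only afterwards asks ferror() why reading stopped. The helper name count_bytes and its return convention are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in `fp`.
   Returns the count, or -1 if reading stopped because of an error. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                      /* int, so EOF stays distinct from byte 0xFF */

    while ((c = fgetc(fp)) != EOF)
        ++n;

    /* Only now ask WHY fgetc() returned EOF. */
    if (ferror(fp))
        return -1;              /* read error */
    return n;                   /* no error, so this was a real end of file */
}
```

Note that a byte with the value 0xFF in the file is counted like any other byte, because c is an int.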
Recommended reading: Section 12 of the comp.lang.c FAQ, which covers stdio. (And the rest of it.)
UPDATE :
I haven't seen enough of your code to understand what you're doing with the Buffer objects. Your input loop actually looks (almost) correct, though it's written in a clumsy way.
The usual idiom for reading characters from a file is:
int c; /* `int`, NOT `char` or `unsigned char` */
while ((c = fgetc(fi)) != EOF) {
/* process character in `c` */
}
But your approach, which I might rearrange like this:
while (1) {
c = fgetc(fi);
if (feof(fi) || ferror(fi)) {
/* no more input */
break;
}
/* process character in c */
}
should actually work. Note that I've added a check for ferror(fi). Could it be that you have an error on input (which you're not detecting)? That would cause c to contain EOF, or the value of EOF converted to the type of c. That's doubtful, though, since with only the feof() check it would probably give you an infinite loop.
Suggested approach: Using either an interactive debugger or added printf calls, show the value of character every time through the loop. If your input loop is working correctly, then build a stripped-down version of your program with a hard-wired sequence of calls to b_addc(), and see if you can reproduce the problem that way.
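A stripped-down version along those lines might look like the sketch below. It does not use the real Buffer type (which the question doesn't show): MiniBuf, mini_getc and the value -1 standing in for R_FAIL_1 are all assumptions. What it illustrates is general: a flag that only gets set by a failed read, whether that is feof() or an eob field, has to be tested after the read, not before it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's Buffer; not the real type. */
typedef struct {
    const char *data;
    size_t len;     /* plays the role of addc_offset */
    size_t pos;     /* plays the role of getc_offset */
    int eob;        /* set only after a read hits the end */
} MiniBuf;

/* Same shape as b_getc(); -1 is an assumed value for R_FAIL_1. */
int mini_getc(MiniBuf *b)
{
    if (b->pos == b->len) {
        b->eob = 1;
        return -1;
    }
    b->eob = 0;
    return (unsigned char)b->data[b->pos++];
}

/* Tests the flag BEFORE reading: the failed read still gets processed. */
size_t drain_test_first(MiniBuf *b)
{
    size_t processed = 0;
    b->pos = 0;
    b->eob = 0;
    while (!b->eob) {
        (void)mini_getc(b);     /* the final call returns -1 */
        ++processed;
    }
    return processed;
}

/* Reads first, then tests the result: the failed read is discarded. */
size_t drain_read_first(MiniBuf *b)
{
    size_t processed = 0;
    b->pos = 0;
    b->eob = 0;
    while (mini_getc(b) != -1)
        ++processed;
    return processed;
}
```

For a buffer holding the 5 characters of "a cat", the first loop processes 6 values and the second processes 5. If the extra value is -1 and it is printed with %c, a Latin-1 terminal would typically show it as ÿ.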

There you go ...
int b_load(FILE * const fi, Buffer * const pBD){
int character; /*Variable to hold read character from file*/
Buffer * tempBuffer; /*Temporary Buffer * to prevent destruction of main Buffer*/
short num_chars ; /*Counter of the amount of characters read into the buffer*/
/*Infinite loop that breaks WHEN EOF is read*/
for(num_chars = 0; 1; num_chars++ ) {
character = fgetc(fi);
if (character == EOF || feof(fi)) break; // since you insist on the silly feof() ...
tempBuffer = b_addc(pBD, (unsigned char) character);
if(tempBuffer == NULL) return LOAD_FAIL;
}
return num_chars;
}

Related

The strange behavior of 'read' system function

The program I tried writing was supposed to read strings no longer than 8 characters and check whether a given string is present in the file. I decided to use the 'read' system call for it, but I ran into strange behavior. According to the manual, it must return 0 when the end of file is reached, but in my case, when there were no more characters to read, it still read a '\n' and returned 1 (the number of bytes read). (I checked the ASCII code of the character read and it is 10, which is '\n'.) Taking this into account I changed my code and it worked, but I still can't understand why it behaves this way. Here is the code of my function:
int is_present(int fd, char *string)
{
int i;
char ch, buf[9];
if (!read(fd, &ch, 1)) //file is empty
return 0;
while (1) {
i = 0;
while (ch != '\n') {
buf[i++] = ch;
read(fd, &ch, 1);
}
buf[i] = '\0';
if (!strncmp(string, buf, strlen(buf))) {
close(fd);
return 1;
}
if(!read(fd, &ch, 1)) //EOF reached
break;
}
close(fd);
return 0;
}
I think that your problem is in the inner read() call. There you are not checking the return value of the function.
while (ch != '\n') {
buf[i++] = ch;
read(fd, &ch, 1);
}
If read() hits end of file inside this loop, it returns 0 and leaves ch unmodified, so unless ch already equals '\n' the loop never terminates. BTW, you are also not checking the bounds of buf.
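A minimal sketch of a line reader that addresses both points (checking what read() returns and staying inside buf) could look like this. The name read_line and its return convention are invented for the example, and it doesn't retry a read() that was interrupted by a signal.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: reads one line from `fd` into `buf` (capacity
   `cap`, at least 1), dropping the '\n'. Characters that do not fit are
   discarded. Returns 1 if a line was read, 0 at end of file with nothing
   read, -1 on error. */
int read_line(int fd, char *buf, size_t cap)
{
    size_t i = 0;
    char ch;
    ssize_t n;

    while ((n = read(fd, &ch, 1)) == 1) {
        if (ch == '\n') {
            buf[i] = '\0';
            return 1;
        }
        if (i + 1 < cap)        /* keep room for the terminator */
            buf[i++] = ch;
    }
    buf[i] = '\0';
    if (n < 0)
        return -1;              /* read() failed */
    return i > 0;               /* final line without a trailing newline */
}
```

With this, the outer loop in is_present() would reduce to calling read_line() until it stops returning 1 and comparing each line against string.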
I'm assuming the question is 'why does read() work this way' and not 'what is wrong with my program?'.
This is not an error. From the manual page:
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
If you think about it read must work this way. If it returned 0 to indicate an end of file was reached when some data had been read, you would have no idea how much data had been read. Therefore read returns 0 only when no data is read because of an end-of-file condition.
Therefore in this case, where there is only a \n available, read() will succeed and return 1. The next read will return a zero to indicate end of file.
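To make the return values concrete, here is a small sketch (the helper name bytes_until_eof is made up) that keeps calling read() until it reports end of file. Short counts are added up as they come; only a return of 0 ends the loop normally.

```c
#include <unistd.h>

/* Hypothetical helper: returns how many bytes `fd` delivers before
   read() returns 0 (end of file), or -1 if read() fails. */
long bytes_until_eof(int fd)
{
    char chunk[64];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, chunk, sizeof chunk)) > 0)
        total += n;             /* fewer bytes than requested is not an error */

    return n < 0 ? -1 : total;
}
```

For input that consists of a single '\n', the first read() returns 1 even though 64 bytes were requested, and the next one returns 0.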
Unless it hits end of file, read() keeps reading bytes and places them in the buffer. Here '\n' is just another byte, so it is read as well. After the '\n' there was nothing left, so it is only the following read() that returns 0. End of file is the only thing that stops read(); every other byte is treated as ordinary data. Cheers!

GNU-getline: strange behaviour about EOF

Test
In order to find the behaviour of getline() when confronted with EOF, I wrote the following test:
int main (int argc, char *argv[]) {
size_t max = 100;
char *buf = malloc(sizeof(char) * 100);
size_t len = getline(&buf, &max, stdin);
printf("length %zu: %s", len, buf);
}
And input1 is:
abcCtrl-DEnter
Result:
length 4: abc //notice that '\n' is also taken into consideration and printed
Input2:
abcEnter
Exactly same output:
length 4: abc
It seems that the EOF is left out by getline().
Source code
So I found the source code of getline(); the relevant snippet follows (I have left out some comments and irrelevant code for conciseness):
while ((c = getc (stream)) != EOF)
{
/* Push the result in the line. */
(*lineptr)[indx++] = c;
/* Bail out. */
if (c == delim) //delim here is '\n'
break;
}
/* Make room for the null character. */
if (indx >= *n)
{
*lineptr = realloc (*lineptr, *n + line_size);
if (*lineptr == NULL)
return -1;
*n += line_size;
}
/* Null terminate the buffer. */
(*lineptr)[indx++] = 0;
return (c == EOF && (indx - 1) == 0) ? -1 : indx - 1;
Question
So my question is:
Why is the length here 4? (As far as I can see it should be 5. As the wiki says, it won't be an EOF if it is not at the beginning of a line.)
A similar question: EOF behavior when accompanied by other values. But notice that getline() in that question is different from GNU getline.
I use GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Ctrl-D causes your terminal to flush the input buffer if it isn’t already flushed. Otherwise, the end-of-file indicator for the input stream is set. A newline also flushes the buffer.
So you didn't close the stream, but only flushed the input buffer, which is why getline doesn't see an end-of-file indicator.
In neither of these cases does getline receive a literal EOT character (ASCII 0x04, ^D). To send one, you can type Ctrl-V Ctrl-D.
Type
abcCtrl-DCtrl-D
or
abcEnterCtrl-D
to actually set the end-of-file indicator.
From POSIX:
Special characters
EOF
Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
FYI, the ICANON flag is specified here.
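On the getline() side, end of input shows up as a return value of -1, so the result should go into a signed ssize_t; the test program above stores it in a size_t, where -1 would turn into a huge positive number. A minimal sketch follows. The helper name count_lines is invented, and the feature-test macro is what glibc needs under a strict -std=c11 to declare getline().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines left in `fp` with POSIX getline().
   Returns the count, or -1 on a read error. */
long count_lines(FILE *fp)
{
    char *line = NULL;      /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;            /* signed: -1 means end of file or error */
    long lines = 0;

    while ((len = getline(&line, &cap, fp)) != -1)
        ++lines;

    free(line);
    return ferror(fp) ? -1 : lines;
}
```

A final line that lacks a trailing newline is still returned (with its own length) before getline() reports -1.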

Reading from stdin and storing \n and whitespace

I've been trying to use scanf to get input from stdin but it truncates the string after seeing whitespace or after hitting return.
What I'm trying to get is a way to read keyboard input that stores in the buffer linebreaks as well as whitespace. And ending when ctrl-D is pressed.
Should I try using fgets? I figured that wouldn't be optimal either since fgets returns after reading in a \n
There is no ready-made function to read everything from stdin, but fortunately creating your own is easy. Untested code snippet, with some explanation in comments, which can read an arbitrarily large number of chars from stdin:
size_t size = 0; // how many chars have actually been read
size_t reserved = 10; // how much space is allocated
char *buf = malloc(reserved);
int ch;
if (buf == NULL) exit(1); // out of memory
// read one char at a time from stdin, until EOF.
// let stdio handle input buffering
while ( (ch = getchar()) != EOF) {
buf[size] = (char)ch;
++size;
// make buffer larger if needed, must have room for '\0' below!
// size is always doubled,
// so reallocation is going to happen limited number of times
if (size == reserved) {
reserved *= 2;
buf = realloc(buf, reserved);
if (buf == NULL) exit(1); // out of memory
}
}
// add terminating NUL character to end the string,
// maybe useless with binary data but won't hurt there either
buf[size] = 0;
// now buf contains size chars, everything from stdin until eof,
// optionally shrink the allocation to contain just input and '\0'
buf = realloc(buf, size+1);
scanf() splits the input at whitespace boundaries, so it's not suitable in your case. Indeed fgets() is the better choice. What you need to do is keep reading after fgets() returns; each call will read a line of input. You can keep reading until fgets() returns NULL, which means that nothing more can be read.
You can also use fgetc() instead if you prefer getting input character by character. It will return EOF when nothing more can be read.
If you want to read all input, regardless of whether it is whitespace or not, try fread.
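A sketch of what an fread-based version could look like is below. The helper name read_all, the 4096-byte starting size and the doubling strategy are all choices made for this example, not something fread itself prescribes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything left in `fp` into a malloc'd,
   NUL-terminated buffer and stores the byte count in *out_len.
   Returns NULL on allocation failure or read error. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always leave one byte free for the terminator. */
        len += fread(buf + len, 1, cap - len - 1, fp);
        if (len < cap - 1)
            break;              /* short read: end of file or error */

        /* Buffer is full: double it and keep reading. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

Called as read_all(stdin, &n), it keeps whitespace and line breaks exactly as typed and returns once Ctrl-D signals end of input.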
Read like this
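int ch; //int, not char, so that EOF can be told apart from a real character
char line[20];
int i=0; //define a counter
//check the counter value first to avoid overflow (and to avoid
//consuming a character that cannot be stored), then read a character
//into ch and check whether it is end of file or not.
while(i < 19 && (ch=getchar())!=EOF)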
{
line[i]=ch;
i++;
}
line[i]='\0';

I don't understand the behavior of fgets in this example

While I could use strings, I would like to understand why this small example I'm working on behaves this way, and how I can fix it.
int ReadInput() {
char buffer [5];
printf("Number: ");
fgets(buffer,5,stdin);
return atoi(buffer);
}
void RunClient() {
int number;
int i = 5;
while (i != 0) {
number = ReadInput();
printf("Number is: %d\n",number);
i--;
}
}
This should, in theory or at least in my head, let me read 5 numbers from input (albeit overwriting them).
However this is not the case, it reads 0, no matter what.
I understand printf puts a \0 null terminator ... but I still think I should be able to either read the first number, not just have it by default 0. And I don't understand why the rest of the numbers are OK (not all 0).
CLARIFICATION: I can only read 4/5 numbers, first is always 0.
EDIT:
I've tested and it seems that this was causing the problem:
main.cpp
scanf("%s",&cmd);
if (strcmp(cmd, "client") == 0 || strcmp(cmd, "Client") == 0)
RunClient();
somehow.
EDIT:
Here is the code if someone wishes to compile it. I still don't know how to fix it.
http://pastebin.com/8t8j63vj
FINAL EDIT:
Could not get rid of the error. Decided to simply add
# ReadInput()
int ReadInput(BOOL check) {
...
if (check)
printf ("Number: ");
...
# RunClient()
void RunClient() {
...
ReadInput(FALSE); // a pseudo - buffer flush. Not really but I ignore
while (...) { // line with garbage data
number = ReadInput(TRUE);
...
}
And call it a day.
fgets reads the input as well as the newline character. So when you input a number, it's like: 123\n.
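atoi doesn't report errors when the conversion fails; it simply returns 0. In your program, the earlier scanf("%s", ...) leaves the newline from that line in the input, so the first fgets reads just "\n" and atoi returns 0.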
Remove the newline character from the buffer:
char buffer[5];
fgets(buffer, sizeof buffer, stdin);
size_t length = strlen(buffer);
if (length > 0 && buffer[length - 1] == '\n')
buffer[length - 1] = '\0';
Then use strtol to convert the string to a number; it provides better error detection when the conversion fails.
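For instance, a conversion with error checks might be sketched like this. The helper name parse_int and its 1/0 return convention are made up for the example; strtol, errno and ERANGE are the standard pieces.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal int from `s`, tolerating one
   trailing newline as left by fgets. Returns 1 and stores the value in
   *out on success, 0 otherwise. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return 0;               /* no digits at all, e.g. "\n" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* value does not fit in an int */
    if (*end != '\0' && *end != '\n')
        return 0;               /* trailing junk, e.g. "12abc" */
    *out = (int)v;
    return 1;
}
```

Unlike atoi, this lets the caller tell an input of "0" apart from an empty line or garbage.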
char * fgets ( char * str, int num, FILE * stream );
Get string from stream.
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str. (This means that you carry \n)
A terminating null character is automatically appended after the characters copied to str.
Notice that fgets is quite different from gets: not only fgets accepts a stream argument, but also allows to specify the maximum size of str and includes in the string any ending newline character.
P.S.: Try to use a larger buffer.

While (( c = getc(file)) != EOF) loop won't stop executing

I can't figure out why my while loop won't work. The code works fine without it... The purpose of the code is to find a secret message in a bin file. So I got the code to find the letters, but now when I try to get it to loop until the end of the file, it doesn't work. I'm new at this. What am I doing wrong?
main(){
FILE* message;
int i, start;
long int size;
char keep[1];
message = fopen("c:\\myFiles\\Message.dat", "rb");
if(message == NULL){
printf("There was a problem reading the file. \n");
exit(-1);
}
//the first 4 bytes contain an int that tells how many subsequent bytes you can throw away
fread(&start, sizeof(int), 1, message);
printf("%i \n", start); //#of first 4 bytes was 280
fseek(message, start, SEEK_CUR); //skip 280 bytes
keep[0] = fgetc(message); //get next character, keep it
printf("%c", keep[0]); //print character
while( (keep[0] = getc(message)) != EOF) {
fread(&start, sizeof(int), 1, message);
fseek(message, start, SEEK_CUR);
keep[0] = fgetc(message);
printf("%c", keep[0]);
}
fclose(message);
system("pause");
}
EDIT:
After looking at my code in the debugger, it looks like having "getc" in the while loop threw everything off. I fixed it by creating a new char called letter, and then replacing my code with this:
fread(&start, sizeof(int), 1, message);
fseek(message, start, SEEK_CUR);
while( (letter = getc(message)) != EOF) {
printf("%c", letter);
fread(&start, sizeof(int), 1, message);
fseek(message, start, SEEK_CUR);
}
It works like a charm now. Any more suggestions are certainly welcome. Thanks everyone.
The return value from getc() and its relatives is an int, not a char.
If you assign the result of getc() to a char, one of two things happens when it returns EOF:
If plain char is unsigned, then EOF is converted to 0xFF, and 0xFF != EOF, so the loop never terminates.
If plain char is signed, then EOF is equivalent to a valid character (in the 8859-1 code set, that's ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), and your loop may terminate early.
Given the problem you face, we can tentatively guess you have plain char as an unsigned type.
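The two cases can be checked in isolation with the sketch below. The function names are invented, and the signed results assume the common setup where EOF is -1 and char is 8 bits wide in two's complement.

```c
#include <stdio.h>

/* Returns nonzero if `value`, once squeezed into an unsigned char,
   still compares equal to EOF. Since EOF is negative and an unsigned
   char promotes to a non-negative int, this never does. */
int looks_like_eof_unsigned(int value)
{
    unsigned char c = (unsigned char)value;
    return c == EOF;
}

/* Same test with a signed char: here both EOF itself and the data
   byte 0xFF (y with diaeresis in ISO 8859-1) typically end up as -1. */
int looks_like_eof_signed(int value)
{
    signed char c = (signed char)value;
    return c == EOF;
}
```

So with an unsigned char the test (c = getc(fp)) != EOF can never become false, while with a signed char it becomes false both at the real end of file and at the first 0xFF byte in the data.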
The reason that getc() et al return an int is that they have to return every possible value that can fit in a char and also a distinct value, EOF. In the C standard, it says:
ISO/IEC 9899:2011 §7.21.7.1 The fgetc() function
int fgetc(FILE *stream);
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int ...
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF.
Similar wording applies to the getc() function and the getchar() function: they are defined to behave like the fgetc() function except that if getc() is implemented as a macro, it may take liberties with the file stream argument that are not normally granted to standard macros — specifically, the stream argument expression may be evaluated more than once, so calling getc() with side-effects (getc(fp++)) is very silly (but change to fgetc() and it would be safe, but still eccentric).
In your loop, you could use:
int c;
while ((c = getc(message)) != EOF) {
keep[0] = c;
This preserves the assignment to keep[0]; I'm not sure you truly need it.
You should be checking the other calls to fgets(), getc(), fread() to make sure you are getting what you expect as input. Especially on input, you cannot really afford to skip those checks. Sooner, rather than later, something will go wrong and if you aren't religiously checking the return statuses, your code is likely to crash, or simply 'go wrong'.
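As a sketch of what those checks could look like for this file layout (an int skip count, that many bytes to discard, then one message byte, repeated until the end of the file, as the question describes it). The function name decode_message and its buffer-based interface are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: decodes the message in `in` into `out` (capacity
   `cap`, at least 1), always NUL-terminating it. Returns the number of
   letters stored, or -1 on a bad skip count, failed seek or read error. */
long decode_message(FILE *in, char *out, size_t cap)
{
    long letters = 0;
    int skip, c;                /* c is an int so EOF is detectable */

    out[0] = '\0';
    for (;;) {
        if (fread(&skip, sizeof skip, 1, in) != 1)
            break;              /* no complete count left: stop */
        if (skip < 0 || fseek(in, skip, SEEK_CUR) != 0)
            return -1;
        if ((c = getc(in)) == EOF)
            break;              /* nothing after the skipped bytes */
        if ((size_t)letters + 1 < cap) {
            out[letters++] = (char)c;
            out[letters] = '\0';
        }
    }
    return ferror(in) ? -1 : letters;
}
```

Every call that can fail (fread, fseek, getc) is checked before its result is used, so a truncated or damaged file ends the loop instead of printing garbage.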
There are 256 different char values that might be returned by getc() and stored in a char variable like keep[0] (yes, I'm oversummarising wildly). To detect end-of-file reliably, EOF has to have a value different from all of them. That's why getc() returns int rather than char: because a 257th distinct value for EOF wouldn't fit into a char.
Thus you need to store the value returned by getc() in an int at least until you check it against EOF:
int tmpc;
while( (tmpc = getc(message)) != EOF) {
keep[0] = tmpc;
...

Resources