GNU-getline: strange behaviour about EOF

GNU-getline: strange behaviour about EOF - c

Test
In order to find the behaviour of getline() when confronted with EOF, I wrote the following test:
int main (int argc, char *argv[]) {
size_t max = 100;
char *buf = malloc(sizeof(char) * 100);
size_t len = getline(&buf, &max, stdin);
printf("length %zu: %s", len, buf);
}
And input1 is:
abcCtrl-DEnter
Result:
length 4: abc //notice that '\n' is also taken into consideration and printed
Input2:
abcEnter
Exactly same output:
length 4: abc
It seems that the EOF is left out out by getline()
Source code
So I find the source code of getline() and following is a related snippet of it (and I leave out some comments and irrelevant codes for conciseness):
while ((c = getc (stream)) != EOF)
{
/* Push the result in the line. */
(*lineptr)[indx++] = c;
/* Bail out. */
if (c == delim) //delim here is '\n'
break;
}
/* Make room for the null character. */
if (indx >= *n)
{
*lineptr = realloc (*lineptr, *n + line_size);
if (*lineptr == NULL)
return -1;
*n += line_size;
}
/* Null terminate the buffer. */
(*lineptr)[indx++] = 0;
return (c == EOF && (indx - 1) == 0) ? -1 : indx - 1;
Question
So my question is:
why length here is 4 (as far as I can see it should be 5)(as wiki says, It won't be a EOF if it not at the beginning of a line)
A similar question:EOF behavior when accompanied by other values but notice getline() in that question is different from GNU-getline
I use GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2

Ctrl-D causes your terminal to flush the input buffer if it isn’t already flushed. Otherwise, the end-of-file indicator for the input stream is set. A newline also flushes the buffer.
So you didn't close the stream, but only flushed the input buffer, which is why getline doesn't see an end-of-file indicator.
In neither of these cases, a literal EOT character (ASCII 0x04, ^D) is received by getline (in order to do so, you can type Ctrl-VCtrl-D).
Type
abcCtrl-DCtrl-D
or
abcEnterCtrl-D
to actually set the end-of-file indicator.
From POSIX:
Special characters
EOF
Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
FYI, the ICANON flag is specified here.

Related

The strange behavior of 'read' system function

The programm I tried writing should have been able to read a string of a length not longer than 8 characters and check if such string were present in the file. I decided to use 'read' system function for it, but I've come up with a strange behavior of this function. As it's written in manual, it must return 0 when the end of file is reached, but in my case when there were no more characters to read it still read a '\n' and returned 1 (number of bytes read) (I've checked the ASCII code of the read character and it's actually 10 which is of '\n'). So considering this fact I changed my code and it worked, but I still can't understand why does it behave in this way. Here is the code of my function:
int is_present(int fd, char *string)
{
int i;
char ch, buf[9];
if (!read(fd, &ch, 1)) //file is empty
return 0;
while (1) {
i = 0;
while (ch != '\n') {
buf[i++] = ch;
read(fd, &ch, 1);
}
buf[i] = '\0';
if (!strncmp(string, buf, strlen(buf))) {
close(fd);
return 1;
}
if(!read(fd, &ch, 1)) //EOF reached
break;
}
close(fd);
return 0;
}

I think that your problem is in the inner read() call. There you are not checking the return of the function.
while (ch != '\n') {
buf[i++] = ch;
read(fd, &ch, 1);
}
If the file happens to be at EOF when entering the function and ch equals '\n' then it will be an infinite loop, because read() will not modify the ch value. BTW, you are not checking the bounds of buf.

I'm assuming the question is 'why does read() work this way' and not 'what is wrong with my program?'.
This is not an error. From the manual page:
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
If you think about it read must work this way. If it returned 0 to indicate an end of file was reached when some data had been read, you would have no idea how much data had been read. Therefore read returns 0 only when no data is read because of an end-of-file condition.
Therefore in this case, where there is only a \n available, read() will succeed and return 1. The next read will return a zero to indicate end of file.

The read() function unless it finds a EOF keeps reading characters and places it on the buffer. here in this case \n is also considered as a character. hence it reads that also. Your code would have closed after it read the \n as there was nothing else other than EOF . So only EOF is the delimiter for the read() and every other character is considered normal. Cheers!

Dynamically allocate user inputted string

I am trying to write a function that does the following things:
Start an input loop, printing '> ' each iteration.
Take whatever the user enters (unknown length) and read it into a character array, dynamically allocating the size of the array if necessary. The user-entered line will end at a newline character.
Add a null byte, '\0', to the end of the character array.
Loop terminates when the user enters a blank line: '\n'
This is what I've currently written:
void input_loop(){
char *str = NULL;
printf("> ");
while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){
/*Add null byte to the end of str*/
/*Do stuff to input, including traversing until the null byte is reached*/
free(str);
str = NULL;
}
free(str);
str = NULL;
}
Now, I'm not too sure how to go about adding the null byte to the end of the string. I was thinking something like this:
last_index = strlen(str);
str[last_index] = '\0';
But I'm not too sure if that would work though. I can't test if it would work because I'm encountering this error when I try to compile my code:
warning: ISO C does not support the 'a' scanf flag [-Wformat=]
So what can I do to make my code work?
EDIT: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.

First of all, scanf format strings do not use regular expressions, so I don't think something close to what you want will work. As for the error you get, according to my trusty manual, the %a conversion flag is for floating point numbers, but it only works on C99 (and your compiler is probably configured for C90)
But then you have a bigger problem. scanf expects that you pass it a previously allocated empty buffer for it to fill in with the read input. It does not malloc the sctring for you so your attempts at initializing str to NULL and the corresponding frees will not work with scanf.
The simplest thing you can do is to give up on n arbritrary length strings. Create a large buffer and forbid inputs that are longer than that.
You can then use the fgets function to populate your buffer. To check if it managed to read the full line, check if your string ends with a "\n".
char str[256+1];
while(true){
printf("> ");
if(!fgets(str, sizeof str, stdin)){
//error or end of file
break;
}
size_t len = strlen(str);
if(len + 1 == sizeof str){
//user typed something too long
exit(1);
}
printf("user typed %s", str);
}
Another alternative is you can use a nonstandard library function. For example, in Linux there is the getline function that reads a full line of input using malloc behind the scenes.

No error checking, don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.
#include <stdio.h>
#include <stdlib.h>
char *readInfiniteString() {
int l = 256;
char *buf = malloc(l);
int p = 0;
char ch;
ch = getchar();
while(ch != '\n') {
buf[p++] = ch;
if (p == l) {
l += 256;
buf = realloc(buf, l);
}
ch = getchar();
}
buf[p] = '\0';
return buf;
}
int main(int argc, char *argv[]) {
printf("> ");
char *buf = readInfiniteString();
printf("%s\n", buf);
free(buf);
}

If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of memory allocation for you.
You can use in in a loop like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strcmp
int main(void)
{
char *line = NULL;
size_t nline = 0;
for (;;) {
ptrdiff_t n;
printf("> ");
// read line, allocating as necessary
n = getline(&line, &nline, stdin);
if (n < 0) break;
// remove trailing newline
if (n && line[n - 1] == '\n') line[n - 1] = '\0';
// do stuff
printf("'%s'\n", line);
if (strcmp("quit", line) == 0) break;
}
free(line);
printf("\nBye\n");
return 0;
}
The passed pointer and the length value must be consistent, so that getline can reallocate memory as required. (That means that you shouldn't change nline or the pointer line in the loop.) If the line fits, the same buffer is used in each pass through the loop, so that you have to free the line string only once, when you're done reading.

Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
what happens when the line is too large, and
what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice to say, if your program doesn't check the return value your program won't know when EOF or an error occurs; your program will probably be caught in an infinite loop.
When no '\n' is present, the remaining bytes of the line are yet to have been read. Thus, fgets will always parse the line at least once, internally. When you introduce extra logic, to check for a '\n', to that, you're parsing the data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
char *get_dynamic_line(FILE *f) {
size_t bytes_read = 0;
char *bytes = NULL, *temp;
do {
size_t alloc_size = bytes_read * 2 + 1;
temp = realloc(bytes, alloc_size);
if (temp == NULL) {
free(bytes);
return NULL;
}
bytes = temp;
temp = fgets(bytes + bytes_read, alloc_size - bytes_read, f); /* Parsing data the first time */
bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
} while (temp && bytes[bytes_read] != '\n');
bytes[bytes_read] = '\0';
return bytes;
}
Those who do manage to read the manual and come up with something correct (like this) may soon realise the complexity of an fgets solution is at least twice as poor as the same solution using fgetc. We can avoid parsing data the second time by using fgetc, so using fgetc might seem most appropriate. Alas most C programmers also manage to use fgetc incorrectly when neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It may return typically one of 256 distinct values, between 0 and UCHAR_MAX (inclusive). It may otherwise return EOF, meaning there are typically 257 distinct values that fgetc (or consequently, getchar) may return. Trying to store those values into a char or unsigned char results in loss of information, specifically the error modes. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255)
char *get_dynamic_line(FILE *f) {
size_t bytes_read = 0;
char *bytes = NULL;
do {
if ((bytes_read & (bytes_read + 1)) == 0) {
void *temp = realloc(bytes, bytes_read * 2 + 1);
if (temp == NULL) {
free(bytes);
return NULL;
}
bytes = temp;
}
int c = fgetc(f);
bytes[bytes_read] = c >= 0 && c != '\n'
? c
: '\0';
} while (bytes[bytes_read++]);
return bytes;
}

Return value of fgets()

I have just recently started working with I/O in C. Here is my question - I have a file, from which I read my input. Then I use fgets() to get strings in a buffer which I utilise in some way. Now, what happens if the input is too short for the buffer i.e. if the first read by fgets() reaches EOF. Should fgets() return NULL(as I have read in fgets() documentation)? It seems that it doesn't and I get my input properly. Besides even my feof(input) does not say that we have reached EOF. Here is my code snippet.
char buf[BUFSIZ];
FILE *input,
*output;
input = fopen(argv[--argc], "r");
output = fopen(argv[--argc], "w");
/**
* If either of the input or output were unable to be opened
* we exit
*/
if (input == NULL) {
fprintf(stdout, "Failed to open file - %s.\n", argv[argc + 1]);
exit(EXIT_FAILURE);
}
if (output == NULL) {
fprintf(stdout, "Failed to open file - %s.\n", argv[argc + 0]);
exit(EXIT_FAILURE);
}
if (fgets(buf, sizeof(buf), input) != NULL) {
....
}
/**
* After the fgets() condition exits it is because, either -
* 1) The EOF was reached.
* 2) There is a read error.
*/
if (feof(input)) {
fprintf(stdout, "Reached EOF.\n");
}
else if (ferror(input)) {
fprintf(stdout, "Error while reading the file.\n");
}

The documentation for fgets() does not say what you think it does:
From my manpage
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading
stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored
after the last character in the buffer.
And later
gets() and fgets() return s on success, and NULL on error or when end of file occurs while no characters have been read.
I don't read that as saying an EOF will be treated as an error condition and return NULL. Indeed it says a NULL would only occur where EOF occurs when no characters have been read.
The POSIX standard (which defers to the less accessible C standard) is here: http://pubs.opengroup.org/onlinepubs/009695399/functions/fgets.html and states:
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer, and shall set errno to indicate the error.
This clearly indicates it's only going to return a NULL if it's actually at EOF when called, i.e. if any bytes are read, it won't return NULL.

Extra EOF character

I have a program that reads a file into a buffer structure. The problem I'm having is that when I look at the output of the file, there's an extra EOF character at the end. Ill post the related functions:(NOTE: I removed parameter checks and only posted code in the function related to the issue)
b_load
int b_load(FILE * const fi, Buffer * const pBD){
unsigned char character; /*Variable to hold read character from file*/
Buffer * tempBuffer; /*Temparary Bufer * to prevent descruction of main Buffer*/
short num_chars = 0; /*Counter of the amount of characters read into the buffer*/
/*Assigns main Buffer to tempBuffer*/
tempBuffer = pBD;
/*Infinite loop that breaks after EOF is read*/
while(1){
/*calls fgetc() and returns the char into the character variable*/
character = (unsigned char)fgetc(fi);
if(!feof(fi)){
tempBuffer = b_addc(pBD,character);
if(tempBuffer == NULL)
return LOAD_FAIL;
++num_chars;
}else{
break;
}
}
return num_chars;
}
b_print
int b_print(Buffer * const pBD){
int num_chars = 0;
if(pBD->addc_offset == 0)
printf("The buffer is empty\n");
/*Sets getc_offset to 0*/
b_set_getc_offset(pBD, 0);
pBD->eob=0;
/*b_eob returns the structures eob field*/
while (!b_eob(pBD)){
printf("%c",b_getc(pBD));
++num_chars;
}
printf("\n");
return num_chars;
}
b_getc
char b_getc(Buffer * const pBD){
if(pBD->getc_offset == pBD->addc_offset){
pBD->eob = 1;
return R_FAIL_1;
}
pBD->eob = 0;
return pBD->ca_head[(pBD->getc_offset)++];
}
at the end I end up with:
"a catÿ"
(the y is the EOF character)
It prints an EOF character but is never added to the buffer. When the driver code adds an EOF character to the end of the buffer, 2 appear. Any idea what is causing this? I might be using feof() wrong so that may be it, but it is required in the code

There is no "EOF character". EOF is a value returned by getchar() and related functions to indicate that they have no more input to read. It's a macro that expands to a negative integer constant expression, typically (-1).
(For Windows text files, an end-of-file condition may be triggered by a Control-Z character in a file. If you read such a file in text mode, you won't see that character; it will just act like it reached the end of the file at that point.)
Don't use the feof() function to detect that there's no more input to read. Instead, look at the value returned by whatever input function you're using. Different input functions use different ways to indicate that they weren't able to read anything; read the documentation for whichever one you're using. For example, fgets() returns a null pointer, getchar() returns EOF, and scanf() returns the number of items it was able to read.
getchar(), for example, returns either the character it just read (treated as an unsigned char and converted to int) or the value EOF to indicate that it wasn't able to read anything. The negative value of EOF is chosen specifically to avoid colliding with any valid value of type unsigned char. Which means you need to store the value returned by getchar() in an int object; if you store it in a char or unsigned char instead, you can lose information, and an actual character with the value 0xff can be mistaken for EOF.
The feof() function returns the value of the end-of-file indicator for the file you're reading from. That indicator becomes true after you've tried and failed to read from the file. And if you ran out of input because of an error, rather than because of an end-of-file condition, feof() will never become true.
You can use feof() and/or ferror() to determine why there was no more input to be read, but only after you've detected it by other means.
Recommended reading: Section 12 of the comp.lang.c FAQ, which covers stdio. (And the rest of it.)
UPDATE :
I haven't seen enough of your code to understand what you're doing with the Buffer objects. Your input look actually looks (almost) correct, though it's written in a clumsy way.
The usual idiom for reading characters from a file is:
int c; /* `int`, NOT `char` or `unsigned char` */
while ((c = fgetc(fi)) != EOF) {
/* process character in `c` */
}
But your approach, which I might rearrange like this:
while (1) {
c = fgetc(fi);
if (feof(fi) || ferror(fi)) {
/* no more input */
break;
}
/* process character in c */
}
should actually work. Note that I've added a check for ferror(f1). Could it be that you have an error on input (which you're not detecting)? That would cause c to contain EOF, or the value of EOF converted to the type of c. That's doubtful, though, since it would probably give you an infinite loop.
Suggested approach: Using either an interactive debugger or added printf calls, show the value of character every time through the loop. If your input loop is working correctly, then build a stripped-down version of your program with a hard-wired sequence of calls to b_addc(), and see if you can reproduce the problem that way.

There you go ...
int b_load(FILE * const fi, Buffer * const pBD){
int character; /*Variable to hold read character from file*/
Buffer * tempBuffer; /*Temparary Bufer * to prevent descruction of main Buffer*/
short num_chars ; /*Counter of the amount of characters read into the buffer*/
/*Infinite loop that breaks WHEN EOF is read*/
while(num_chars = 0; 1; num_chars++ ) {
character = fgetc(fi);
if (character == EOF || feof(fi)) break; // since you insist on the silly feof() ...
tempBuffer = b_addc(pBD, (unsigned char) character);
if(tempBuffer == NULL) return LOAD_FAIL;
}
}
return num_chars;
}

Reading from stdin and storing \n and whitespace

I've been trying to use scanf to get input from stdin but it truncates the string after seeing whitespace or after hitting return.
What I'm trying to get is a way to read keyboard input that stores in the buffer linebreaks as well as whitespace. And ending when ctrl-D is pressed.
Should I try using fgets? I figured that wouldn't be optimal either since fgets returns after reading in a \n

There is no ready-made function to read everyting from stdin, but creating your own is fortunately easy. Untested code snippet, with some explanation in comments, which can read arbitrarily large amount of chars from stdin:
size_t size = 0; // how many chars have actually been read
size_t reserved = 10; // how much space is allocated
char *buf = malloc(reserved);
int ch;
if (buf == NULL) exit(1); // out of memory
// read one char at a time from stdin, until EOF.
// let stdio to handle input buffering
while ( (ch = getchar()) != EOF) {
buf[size] = (char)ch;
++size;
// make buffer larger if needed, must have room for '\0' below!
// size is always doubled,
// so reallocation is going to happen limited number of times
if (size == reserved) {
reserved *= 2;
buf = realloc(buf, reserved);
if (buf == NULL) exit(1); // out of memory
}
}
// add terminating NUL character to end the string,
// maybe useless with binary data but won't hurt there either
buf[size] = 0;
// now buf contains size chars, everything from stdin until eof,
// optionally shrink the allocation to contain just input and '\0'
buf = realloc(buf, size+1);

scanf() splits the input at whitespace boundaries, so it's not suitable in your case. Indeed fgets() is the better choice. What you need to do is keep reading after fgets() returns; each call will read a line of input. You can keep reading until fgets() returns NULL, which means that nothing more can be read.
You can also use fgetc() instead if you prefer getting input character by character. It will return EOF when nothing more can be read.

If you want to read all input, regardless of whether it is whitespace or not, try fread.

Read like this
char ch,line[20];
int i=0; //define a counter
//read a character assign it to ch,
//check whether the character is End of file or not and
//also check counter value to avoid overflow.
while((ch=getchar())!=EOF && i < 19 )
{
line[i]=ch;
i++;
}
line[i]='\0';

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

GNU-getline: strange behaviour about EOF - c

Related

The strange behavior of 'read' system function

Dynamically allocate user inputted string

Return value of fgets()

Extra EOF character

Reading from stdin and storing \n and whitespace

Categories

Resources