Can EOF be on the same line as \n - c

As the title says can EOF be on the same line as "\n"? I am currently working on a project recreating getline in C and I am on the part of implementing what happens at EOF and I was trying to figure out will I get the last line of a text file as:
"blahblahblahblah\n\0"
or
"blahblahblah\n
\0"

Can EOF be on the same line as \n
Yes, but rarely.
EOF is a constant usually used to signal 1) end-of-file or a 2) rare input error. It is not usually encoded as a character in a text stream.
A return value of EOF from fgetc() and friends can occur at any time - even in the middle of a line due to an error.
Otherwise, no:
A end-of-file does not occur at the end of a line with a '\n'. Typically the last line in a file that contains at least 1 character (and no `'\n') will also be a line.
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2

EOF is a signal to notify that there is no more input. There is no actual data in the file called EOF at the end. EOF is not same of null character \0.
I think you want to know whether you can get an EOF without getting a newline first. The answer is yes. It is possible if there is no newline at the end of file, i.e. when the last line finished without a newline at end. You can generate such a file easily by opening a text editor, typing some text and save, quit without hitting ENTER. That file won't have any newline at end.

Expanding on #taskinoor's and #chux's answers and after looking at your examples, I think you might be confusing EOF with the '\0'-terminating byte for strings.
EOF is a value that functions like getc return to tell the user that it was not able to read any more from the stream. That's why you often see this:
// fp is a FILE*, it may be stdin
// it may be a file on the disk or something
// else
int c;
while((c = getc(fp)) != EOF)
{
// doing something with the character
}
Note here that c is declare as int, because EOF is a signed int and that
is what getc returns. I previously made a statement here1 that isn't that correct, chux
pointed out my mistake and I'd like to quote his/her comment here instead:
User chux comment on my answer
EOF, as a negative constant, is often -1 and can fit in a char, when char a signed char.
It is not so much that it does not fit, it is that storing a EOF in a char can be indistinguishable
from storing say a character with the value of 255 in a char.
'\0' is on the other hand is the byte that is used to terminate a string in C. In
C a strings is nothing more than a sequence of characters that ends with that
byte. We use char arrays to store strings. So EOF and \0 are different
things and have even different values.
Footnote
1Note here that c is declare as int, because EOF is a signed int and
actually doesn't fit in a char, so it will never be part of a c-string.

Related

fgets doesn't read in at most one less than size characters from stream

I am learning fgets from the manpage. I did some tests on fgets to make sure I understand it. One of the tests I did results in behaviour contrary to what is specified in the man page. The man page says:
char *fgets(char s[restrict .size], int size, FILE *restrict stream);
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an EOF
or a newline. If a newline is read, it is stored into the buffer. A
terminating null byte ('\0') is stored after the last character in the
buffer.
But it doesn't "read in at most one less than size characters from stream". As demonstrated by the following program:
#include<stdio.h>
#include<stdlib.h>
int main(){
FILE *fp;
fp=fopen("sample", "r");
char *s=calloc(50, sizeof(char));
while(fgets(s,2,fp)!=NULL) printf("%s",s);
}
The sample file:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The output of the compiled binary:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The expected output according to the man page:
t
j
a
4
5
Is the man page wrong, or am I missing something?
Is the man page wrong or am i missing something?
I won't say that the man page is wrong but it could be more clear.
There are 3 things that may stop fgets from reading from the stream.
The buffer is full (i.e. only room left for the termination character)
A newline character was read from the stream
End-Of-File occured
The quoted man page only mentions two of those conditions clearly.
Reading stops after an EOF or a newline.
That is #2 and #3 are mentioned very explicit while #1 is (kind of) derived from
reads in at most one less than size characters from stream
Here is another description from https://man7.org/linux/man-pages/man3/fgets.3p.html
... read bytes from stream into the array pointed to by s until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered.
where the 3 cases are clearly mentioned.
But yes... you are missing something. Once the buffer gets full, the rest of the current line is not read and discarded. The rest will stay in the stream and be available for the next read. So nothing is lost. You just need more fgets calls to read all data.
As suggested in a number of comments (e.g. Fe2O3 and Lundin) you can see this if you change the print statement so that it includes a delimiter of some kind. For instance (from Lundin):
printf("|%s|",s);
This will make clear exactly what you got from the individual fgets calls.
In the provided quote there is writte clear
If a newline is read, it is stored into the buffer.
Where do you see that this call fgets(s,2,fp) reads the new line character for example when reading this line?
thiis is line no. 1
The line contains only one new line character at its end.
This call reads only one character after another that is character by character that is appended by the terminating zero character '\0'.
So the read strings look like
{ 't', '\0' }
{ 'h', '\0' },
{ 'i', '\0' }
// ...
{ '1', '\0' }
{ '\n', '\0' }
If you have a call of fgets like that
fgets(s,n,fp)
then at most n-1 characters are read from the input stream. One character is reserved for the terminating zero character '\0' to build a string.
From the C Standard (7.21.7.2 The fgets function)
2 The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array

Does the gets() function in C automatically add a NULL character at the end of the input string?

I am writing a simple program to convert a number(+ve,32-bit) from binary to decimal. Here's my code:
int main()
{
int n=0,i=0;
char binary[33];
gets(binary);
for (i = 0; i < 33, binary[i] != '\0'; i++)
n=n*2+binary[i]-'0';
printf("%d",n);
}
If I remove binary[i]!='\0', then it gives wrong answer due to garbage values but if I don't it gives the correct answer. My question is: does the gets function automatically add a '\0' (NULL) character at the end of the string or is this just a coincidence?
Yes it does, writing past the end of binary[33] if it needs to.
Never use gets; automatic buffer overrun.
See Why is the gets function so dangerous that it should not be used? for details.
When gets was last supported (though deprecated) by the C standard, it had the following description (§ 7.19.7.7, The gets function):
The gets function reads characters from the input stream pointed to by stdin, into the
array pointed to by s, until end-of-file is encountered or a new-line character is read.
Any new-line character is discarded, and a null character is written immediately after the last character read into the array.
This means that if the string read from stdin was exactly as long as, or longer than, the array pointed to by s, gets would still (try to) append the null character to the end of the string.
Even if you are on a compiler or C standard revision that supports gets, don't use it. fgets is much safer since it requires the size of the buffer being written to as a parameter, and will not write past its end. Another difference is that it will leave the newline in the buffer, unlike gets did.

if my scanf variable is a float and a user inputs a character how can i prompt them to input a number? assuming the scanf is inside a do while loop

i have tried to use k = getchar() but it doesn't work too;
here is my code
#include<stdio.h>
int main()
{
float height;
float k=0;
do
{
printf("please type a value..\n");
scanf("%f",&height);
k=height;
}while(k<0);// i assume letters and non positive numbers are below zero.
//so i want the loop to continue until one types a +ve float.
printf("%f",k);
return 0;
}
i want a if a user types letters or negative numbers or characters he/she should be prompted to type the value again until he types a positive number
Like Govind Parmar already suggested, it is better/easier to use fgets() to read a full line of input, rather than use scanf() et al. for human-interactive input.
The underlying reason is that the interactive standard input is line-buffered by default (and changing that is nontrivial). So, when the user starts typing their input, it is not immediately provided to your program; only when the user presses Enter.
If we do read each line of input using fgets(), we can then scan and convert it using sscanf(), which works much like scanf()/fscanf() do, except that sscanf() works on string input, rather than an input stream.
Here is a practical example:
#include <stdlib.h>
#include <stdio.h>
#define MAX_LINE_LEN 100
int main(void)
{
char buffer[MAX_LINE_LEN + 1];
char *line, dummy;
double value;
while (1) {
printf("Please type a number, or Q to exit:\n");
fflush(stdout);
line = fgets(buffer, sizeof buffer, stdin);
if (!line) {
printf("No more input; exiting.\n");
break;
}
if (sscanf(line, " %lf %c", &value, &dummy) == 1) {
printf("You typed %.6f\n", value);
continue;
}
if (line[0] == 'q' || line[0] == 'Q') {
printf("Thank you; now quitting.\n");
break;
}
printf("Sorry, I couldn't parse that.\n");
}
return EXIT_SUCCESS;
}
The fflush(stdout); is not necessary, but it does no harm either. It basically ensures that everything we have printf()'d or written to stdout, will be flushed to the file or device; in this case, that it will be displayed in the terminal. (It is not necessary here, because standard output is also line buffered by default, so the \n in the printf pattern, printing a newline, also causes the flush.
I do like to sprinkle those fflush() calls, wherever I need to remember that at this point, it is important for all output to be actually flushed to output, and not cached by the C library. In this case, we definitely want the prompt to be visible to the user before we start waiting for their input!
(But, again, because that printf("...\n"); before it ends with a newline, \n, and we haven't changed the standard output buffering, the fflush(stdout); is not needed there.)
The line = fgets(buffer, sizeof buffer, stdin); line contains several important details:
We defined the macro MAX_LINE_LEN earlier on, because fgets() can only read a line as long as the buffer it is given, and will return the rest of that line in following calls.
(You can check if the line read ended with a newline: if it does not, then either it was the final line in an input file that does not have a newline at the end of the last line, or the line was longer than the buffer you have, so you only received the initial part, with the rest of the line still waiting for you in the buffer.)
The +1 in char buffer[MAX_LINE_LEN + 1]; is because strings in C are terminated by a nul char, '\0', at end. So, if we have a buffer of 19 characters, it can hold a string with at most 18 characters.
Note that NUL, or nul with one ell, is the name of the ASCII character with code 0, '\0', and is the end-of-string marker character.
NULL (or sometimes nil), however, is a pointer to the zero address, and in C99 and later is the same as (void *)0. It is the sentinel and error value we use, when we want to set a pointer to a recognizable error/unused/nothing value, instead of pointing to actual data.
sizeof buffer is the number of chars, total (including the end-of-string nul char), used by the variable buffer.
In this case, we could have used MAX_LINE_LEN + 1 instead (the second parameter to fgets() being the number of characters in the buffer given to it, including the reservation for the end-of-string char).
The reason I used sizeof buffer here, is because it is so useful. (Do remember that if buffer was a pointer and not an array, it would evaluate to the size of a pointer; not the amount of data available where that pointer points to. If you use pointers, you will need to track the amount of memory available there yourself, usually in a separate variable. That is just how C works.)
And also because it is important that sizeof is not a function, but an operator: it does not evaluate its argument, it only considers the size (of the type) of the argument. This means that if you do something silly like sizeof (i++), you'll find that i is not incremented, and that it yields the exact same value as sizeof i. Again, this is because sizeof is an operator, not a function, and it just returns the size of its argument.
fgets() returns a pointer to the line it stored in the buffer, or NULL if an error occurred.
This is also why I named the pointer line, and the storage array buffer. They describe my intent as a programmer. (That is very important when writing comments, by the way: do not describe what the code does, because we can read the code; but do describe your intent as to what the code should do, because only the programmer knows that, but it is important to know that intent if one tries to understand, modify, or fix the code.)
The scanf() family of functions returns the number of successful conversions. To detect input where the proper numeric value was followed by garbage, say 1.0 x, I asked sscanf() to ignore any whitespace after the number (whitespace means tabs, spaces, and newlines; '\t', '\n', '\v', '\f', '\r', and ' ' for the default C locale using ASCII character set), and try to convert a single additional character, dummy.
Now, if the line does contain anything besides whitespace after the number, sscanf() will store the first character of that anything in dummy, and return 2. However, because I only want lines that only contain the number and no dummy characters, I expect a return value of 1.
To detect the q or Q (but only as the first character on the line), we simply examine the first character in line, line[0].
If we included <string.h>, we could use e.g. if (strchr(line, 'q') || strchr(line, 'Q')) to see if there is a q or Q anywhere in the line supplied. The strchr(string, char) returns a pointer to the first occurrence of char in string, or NULL if none; and all pointers but NULL are considered logically true. (That is, we could equivalently write if (strchr(line, 'q') != NULL || strchr(line, 'Q') != NULL).)
Another function we could use declared in <string.h> is strstr(). It works like strchr(), but the second parameter is a string. For example, (strstr(line, "exit")) is only true if line has exit in it somewhere. (It could be brexit or exitology, though; it is just a simple substring search.)
In a loop, continue skips the rest of the loop body, and starts the next iteration of the loop body from the beginning.
In a loop, break skips the rest of the loop body, and continues execution after the loop.
EXIT_SUCCESS and EXIT_FAILURE are the standard exit status codes <stdlib.h> defines. Most prefer using 0 for EXIT_SUCCESS (because that is what it is in most operating systems), but I think spelling the success/failure out like that makes it easier to read the code.
I wouldn't use scanf-family functions for reading from stdin in general.
fgets is better since it takes input as a string whose length you specify, avoiding buffer overflows, which you can later parse into the desired type (if any). For the case of float values, strtof works.
However, if the specification for your deliverable or homework assignment requires the use of scanf with %f as the format specifier, what you can do is check its return value, which will contain a count of the number of format specifiers in the format string that were successfully scanned:
§ 7.21.6.2:
The [scanf] function returns the value of the macro EOF if an input failure occurs
before the first conversion (if any) has completed. Otherwise, the function returns the
number of input items assigned, which can be fewer than provided for, or even zero, in
the event of an early matching failure.
From there, you can diagnose whether the input is valid or not. Also, when scanf fails, stdin is not cleared and subsequent calls to scanf (i.e. in a loop) will continue to see whatever is in there. This question has some information about dealing with that.

C programming language (scanf)

I have read strings with spaces in them using the following scanf() statement.
scanf("%[^\n]", &stringVariableName);
What is the meaning of the control string [^\n]?
Is is okay way to read strings with white space like this?
This mean "read anything until you find a '\n'"
This is OK, but would be better to do this "read anything until you find a '\n', or read more characters than my buffer support"
char stringVariableName[256] = {}
if (scanf("%255[^\n]", stringVariableName) == 1)
...
Edit: removed & from the argument, and check the result of scanf.
The format specifier "%[^\n]" instructs scanf() to read up to but not including the newline character. From the linked reference page:
matches a non-empty sequence of character from set of characters.
If the first character of the set is ^, then all characters not
in the set are matched. If the set begins with ] or ^] then the ]
character is also included into the set.
If the string is on a single line, fgets() is an alternative but the newline must be removed as fgets() writes it to the output buffer. fgets() also forces the programmer to specify the maximum number of characters that can be read into the buffer, making it less likely for a buffer overrun to occur:
char buffer[1024];
if (fgets(buffer, 1024, stdin))
{
/* Remove newline. */
char* nl = strrchr(buffer, '\n');
if (nl) *nl = '\0';
}
It is possible to specify the maximum number of characters to read via scanf():
scanf("%1023[^\n]", buffer);
but it is impossible to forget to do it for fgets() as the compiler will complain. Though, of course, the programmer could specify the wrong size but at least they are forced to consider it.
Technically, this can't be well defined.
Matches a nonempty sequence of characters from a set of expected
characters (the scanset).
If no l length modifier is present, the corresponding argument shall
be a pointer to the initial element of a character array large enough
to accept the sequence and a terminating null character, which will be
added automatically.
Supposing the declaration of stringVariableName looks like char stringVariableName[x];, then &stringVariableName is a char (*)[x];, not a char *. The type is wrong. The behaviour is undefined. It might work by coincidence, but anything that relies on coincidence doesn't work by my definition.
The only way to form a char * using &stringVariableName is if stringVariableName is a char! This implies that the character array is only large enough to accept a terminating null character. In the event where the user enters one or more characters before pressing enter, scanf would be writing beyond the end of the character array and invoking undefined behaviour. In the event where the user merely presses enter, the %[...] directive will fail and not even a '\0' will be written to your character array.
Now, with that all said and done, I'll assume you meant this: scanf("%[^\n]", stringVariableName); (note the omitted ampersand)
You really should be checking the return value!!
A %[ directive causes scanf to retrieve a sequence of characters consisting of those specified between the [ square brackets ]. A ^ at the beginning of the set indicates that the desired set contains all characters except for those between the brackets. Hence, %[^\n] tells scanf to read as many non-'\n' characters as it can, and store them into the array pointed to by the corresponding char *.
The '\n' will be left unread. This could cause problems. An empty field will result in a match failure. In this situation, it's possible that no data will be copied into your array (not even a terminating '\0' character). For this reason (and others), you really need to check the return value!
Which manual contains information about the return values of scanf? The scanf manual.
Other people have explained what %[^\n] means.
This is not an okay way to read strings. It is just as dangerous as the notoriously unsafe gets, and for the same reason: it has no idea how big the buffer at stringVariableName is.
The best way to read one full line from a file is getline, but not all C libraries have it. If you don't, you should use fgets, which knows how big the buffer is, and be aware that you might not get a complete line (if the line is too long for the buffer).
Reading from the man pages for scanf()...
[ Matches a non-empty sequence of characters from the
specified set of accepted characters; the next pointer must be a
pointer to char, and there must be enough room for all the characters
in the string, plus a terminating null byte. The usual skip of
leading white space is suppressed. The string is to be made up of
characters in (or not in) a particular set; the set is defined by the
characters between the open bracket [ character and a close bracket ]
character. The set excludes those characters if the first character
after the open bracket is a circumflex (^). To include a close
bracket in the set, make it the first character after the open bracket
or the circumflex; any other position will end the set. The hyphen
character - is also special; when placed between two other
characters, it adds all intervening characters to the set. To
include a hyphen, make it the last character before the final close
bracket. For instance, [^]0-9-] means the set "everything except
close bracket, zero through nine, and hyphen". The string ends with
the appearance of a character not in the (or, with a
circumflex, in) set or when the field width runs out.
In a nutshell, the [^\n] means that read everything from the string that is not a \n and store that in the matching pointer in the argument list.

Reading input from file in C

Okay so I have a file of input that I calculate the amount of words and characters in each line with success.
When I get to the end of the line using the code below it exits the loop and only reads in the first line. How do I move on to the next line of input to continue the program?
EDIT: I must parse each line separately so I cant use EOF
while( (c = getchar()) != '\n')
Change '\n' to EOF. You're reading until the end of the line when you want to read until the end of the file (EOF is a macro in stdio.h which corresponds to the character at the end of a file).
Disclaimer: I make no claims about the security of the method.
'\n' is the line feed (new line)-character, so the loop will terminate when the end of first line is reached. The end of the file is marked by an end-of-file (EOF)-characte. cstdio (or stdio.h), which contains the getchar()-function, has the EOF -constant defined, so just change the while-line to
while( (c = getchar()) != EOF)
From the man page: "reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error." EOF is a macro (often -1) for the return of this and related functions that indicates end of file. You want to check whether this is what you're getting back. Note that getc returns a signed int, but that valid values are unsigned chars cast to ints. What out if c is a signed char.
Well, the \n character is actually a combination of two characters, two bytes:
the 13th byte + the 10th byte. You could try something like,
int c2=getchar(),c1;
while(1)
{
c1=c2;
c2=getchar();
if(c1==EOF)
break;
if(c1==(char)13 && c2==(char)10)
break;
/*use c1 as the input character*/
}
this should test if two input characters make the proper couplet (13,10)

Resources