reading string from a file via stdin in c11

I have a .txt file that I want to read via stdin in a C11 program using scanf().
The file consists of many lines, each containing a single string.
Example:
hello
how
are
you
How can I know when the file is finished? I tried comparing a string with a string made only of the EOF character, but the code loops in error.
Any advice is much appreciated.

Linux manual says (RETURN section):
RETURN VALUE
On success, these functions return the number of input items
successfully matched and assigned; this can be fewer than
provided for, or even zero, in the event of an early matching
failure.
The value EOF is returned if the end of input is reached before
either the first successful conversion or a matching failure
occurs. EOF is also returned if a read error occurs, in which
case the error indicator for the stream (see ferror(3)) is set,
and errno is set to indicate the error.
So test whether the return value of scanf() equals EOF.

You can read the file redirected from standard input using scanf(), one word at a time, testing for successful conversion, until no more words can be read from stdin.
Here is a simple example:
#include <stdio.h>
int main() {
    char word[40];
    int n = 0;
    while (scanf("%39s", word) == 1) {
        printf("%d: %s\n", ++n, word);
    }
    return 0;
}
Note that you must tell scanf() the maximum number of characters to store into the destination array before the null terminator (39 for a 40-byte array). Otherwise, any longer word present in the input stream will cause undefined behavior, a flaw attackers can try to exploit using specially crafted input.
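One way to keep the width in the format string and the array size in sync is to derive both from a single constant. The following is only a sketch: WORD_MAX, STR_, STR, and count_words are hypothetical names that do not appear in the answer above, and the function takes a FILE * (pass stdin for redirected input) so that it can be tried on any stream.

```c
#include <stdio.h>

#define WORD_MAX 39      /* longest word stored, not counting the '\0' */
#define STR_(x) #x       /* turn the argument into a string literal */
#define STR(x) STR_(x)   /* expand the macro first, then stringify it */

/* Count whitespace-separated words on a stream, never storing more than
   WORD_MAX characters per conversion. A longer word is split into pieces,
   each of which is counted separately. */
int count_words(FILE *in)
{
    char word[WORD_MAX + 1];
    int n = 0;

    /* The format string below expands to "%39s". */
    while (fscanf(in, "%" STR(WORD_MAX) "s", word) == 1) {
        n++;
    }
    return n;
}
```

Calling count_words(stdin) on the four-line example file from the question would count four words. A word longer than WORD_MAX characters is counted as several pieces rather than overflowing the array.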

Related

How do you differentiate between the end of file and an error using fscanf() in C?

According to the manual, int fscanf(FILE *stream, const char *format, ...) returns the number of input items successfully matched and assigned, which can be fewer than provided for, or even zero in the event of an early matching failure. How can I differentiate between:
zero matched and assigned items
end of the file
an empty file?

See the 'return values' section of the POSIX specification for fscanf(), for example.
Zero matched and assigned items is reported by fscanf() returning 0.
End of file is reported by fscanf() returning EOF.
An empty file is reported by the first call to fscanf() returning EOF.
Note that the prescription makes it difficult to spot the difference between an empty file (no bytes) and a file with no useful data in it. Consider this test program (I called it eof59.c, compiled to create eof59):
#include <stdio.h>
int main(void)
{
    char buffer[256];
    int rc;
    while ((rc = scanf("%255s", buffer)) != EOF)
        printf("%d: %s\n", rc, buffer);
}
If you run it with /dev/null as input (the ultimate empty file), it says nothing. However, it also says nothing if you feed it a file with a single blank (and no newline), or just a newline, or indeed any sequence of white space. You can also experiment by replacing the format string with " x %255s". Feeding it a file with just an x (possibly with white space around it) generates no output. Feed it a file with a y as the first character (other than white space) and the program runs for a long time, reporting 0: on each line of output, because the matching failure returns 0 rather than EOF and the y is never consumed.
Note that while (!feof(file)) is always wrong, but after a function such as scanf() has returned EOF, you can legitimately use feof() and ferror() to disambiguate between genuine EOF and an error on the file stream (such as a disk crashing or …).
if (feof(stdin))
    printf("EOF on standard input\n");
if (ferror(stdin))
    printf("Error on standard input\n");
With the code shown, you should normally see 'EOF on standard input'; it would probably be quite hard to generate (even simulate) an error on standard input. You should not see both messages, and you should always see one of the messages.
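The three outcomes discussed above (a matching failure, genuine end of file, and a read error) can be told apart in one place. The sketch below is an illustration rather than code from the answer: the stop_reason enum and the sum_ints function are hypothetical names, and it reads integers with "%d" so that a matching failure is easy to provoke.

```c
#include <stdio.h>

enum stop_reason { STOP_EOF, STOP_ERROR, STOP_MISMATCH };

/* Sum the integers on a stream and report why reading stopped. */
enum stop_reason sum_ints(FILE *in, long *sum)
{
    int value;
    int rc;

    *sum = 0;
    /* Loop only while exactly one item was converted. Stopping on 0
       as well as on EOF avoids spinning forever on unparseable input. */
    while ((rc = fscanf(in, "%d", &value)) == 1) {
        *sum += value;
    }
    if (rc == 0) {
        return STOP_MISMATCH;   /* input remains, but it is not a number */
    }
    /* rc == EOF: only now is it meaningful to ask the stream why. */
    return ferror(in) ? STOP_ERROR : STOP_EOF;
}
```

With input such as "1 2 x 3", the loop stops at the x with a sum of 3 and reports STOP_MISMATCH. An empty file reports STOP_EOF with a sum of 0.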

Does feof() work when called after reading in last line?

I just read in a string using the following statement:
fgets(string, 100, file);
This string that was just read in was the last line. If I call feof() now will it return TRUE? Is it the same as calling feof() right at the start before reading in any lines?

No, don't use feof() to detect the end of the file. Instead, check for a read failure. For example, fgets() returns NULL if it attempts to read past the end of the file, whereas feof() returns 0 until some function has attempted to read past the end of the file; only after that does it return non-zero.

Does feof() work when called after reading in last line?
No.
feof() becomes true only after an attempt to read past the end of the data. Reading the last line may not go past the end of the data if the last line ended in '\n'.
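This behavior is easy to observe directly. The helper below is a hypothetical sketch (eof_flag_after_one_line is not a library function): it reads one line with fgets() and reports what feof() says immediately afterwards.

```c
#include <stdio.h>

/* Read one line with fgets() and return 1 if the end-of-file indicator
   is set right afterwards, 0 if it is not, or -1 if nothing was read. */
int eof_flag_after_one_line(FILE *f)
{
    char line[100];

    if (fgets(line, sizeof line, f) == NULL) {
        return -1;
    }
    return feof(f) != 0;
}
```

On a file whose only line ends in '\n', the first call returns 0 because fgets() stopped at the newline without trying to read further. A second call returns -1, and only then does feof() report end of file. If the last line has no trailing newline, fgets() has to hit the end of the data to finish the line, so the first call returns 1.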

The short answer is NO. Here is why:
If fgets successfully read the '\n' at the end of the line, the end-of-file indicator in the FILE structure has not been set. Hence feof() will return 0, just like it should before reading anything, even on an empty file.
feof() can only be used to distinguish between end-of-file and read-error conditions after an input operation has failed. Similarly, ferror() can be used to check for a read error after an input operation has failed.
Programmers usually ignore the difference between end-of-file and read-error, and rely only on checking whether the input operation succeeded or failed. Thus they never use feof(), and you should do the same.
The behavior is similar to that of errno: errno is set by some library functions in case of error or failure. It is not reset to 0 upon success. Checking errno after a function call is only meaningful if the operation failed and if errno was cleared to 0 before the function call.
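The errno pattern described above can be illustrated with strtol(), which sets errno to ERANGE when the value does not fit in a long. This is a side sketch and not part of the original answer; parse_long is a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert text to a long. Returns 1 on success, 0 if no digits were
   found or the value is out of range for a long. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;                    /* clear any stale value first */
    value = strtol(text, &end, 10);
    if (end == text) {
        return 0;                 /* nothing was converted */
    }
    if (errno == ERANGE) {
        return 0;                 /* overflow or underflow */
    }
    *out = value;
    return 1;
}
```

Without the errno = 0 line, an ERANGE left over from some earlier call would make a perfectly good conversion look like a failure, which is the same trap as trusting feof() before a read has actually failed.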
If you want to check whether you have indeed reached the end of the file, you need to try to read extra input. For example, you can use this function:
int is_at_end_of_file(FILE *f) {
    int c = getc(f);
    if (c == EOF) {
        return 1;
    } else {
        ungetc(c, f);
        return 0;
    }
}
But reading extra input might not be worthwhile when reading from the console: it requires the user to type extra input, which is then kept in the input stream. If reading from a pipe or a device, the side effect might be even more problematic. Alas, there is no portable way to test whether a FILE stream is associated with an actual file.

Confused with making an input into an empty array.

Say I make an input :
"Hello world" // hit a new line
"Goodbye world" // second input
How could I scan through the two lines and store them separately in two different arrays? I believe I need to use getchar until it hits a '\n', but how do I scan for the second input?
Thanks in advance. I am a beginner in C, so it would be helpful to do it without pointers, as I haven't covered that topic.

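Try this code:
#include <stdio.h>
int main(void)
{
    int flx = 0, fly = 0;
    int a;              /* int, not char, so that EOF can be detected */
    char b[10][100];
    while (1)
    {
        a = getchar();
        if (a == EOF || a == '\n')
        {
            b[flx][fly] = '\0';     /* terminate the current string */
            if (a == EOF || flx == 9) break;   /* input ended or array full */
            flx++;
            fly = 0;
        }
        else if (fly < 99)          /* leave room for the '\0' */
        {
            b[flx][fly++] = (char)a;
        }
    }
    return 0;
}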
Here I use a two-dimensional array to store the strings. I read the input character by character. First I create an infinite loop which keeps reading characters. If the user enters the end-of-file character, the input stops. If there is a newline character, the flx variable is incremented and the following characters are stored in the next array position. You can refer to the stored strings with b[n], where n is the index.

The function that you should probably look at is fgets. At least on my system, the definition is as follows:
char *fgets(char * restrict str, int size, FILE * restrict stream);
So a very simple program to read input from the keyboard would run something like this:
#include <stdio.h>
#include <stdlib.h>
#define MAXSTRINGSIZE 128
int main(void)
{
    char array[2][MAXSTRINGSIZE];
    int i;
    void *result;
    for (i = 0; i < 2; i++)
    {
        printf("Input String %d: ", i);
        result = fgets(&array[i][0], MAXSTRINGSIZE, stdin);
        if (result == NULL) exit(1);
    }
    printf("String 1: %s\nString 2: %s\n", &array[0][0], &array[1][0]);
    exit(0);
}
That compiles and runs correctly on my system. The only issue with fgets, though, is that it retains the newline character \n in the string, so if you don't want that, you will need to remove it. As for the FILE * parameter, stdin is a predefined FILE * stream that refers to standard input, or file descriptor 0. There are also stdout for standard output (file descriptor 1) and stderr for error messages and diagnostics (file descriptor 2). The file descriptor numbers correspond to the ones used in a shell like so:
$$$-> cat somefile > someotherfile 2>&1
What that does is take the output of file descriptor 2 and redirect it to file descriptor 1, with 1 in turn being redirected to a file. In addition, I am using the & operator because we are addressing parts of an array, and the functions in question (fgets, printf) require pointers. As for the result, the man page for gets and fgets states the following:
RETURN VALUES
Upon successful completion, fgets() and gets() return a pointer to the string. If end-of-file occurs before any characters are read,
they return NULL and the buffer contents remain unchanged. If an
error occurs, they return NULL and the buffer contents are
indeterminate. The fgets() and gets() functions do not distinguish
between end-of-file and error, and callers must use feof(3) and
ferror(3) to determine which occurred.
So to make your code more robust, if you get a NULL result, you need to check for errors using ferror or for end of file using feof, and respond appropriately. Furthermore, never EVER use gets. It cannot be used securely, because you would have to know in advance how long the input will be, which nobody can. It will just open you up to a buffer overflow attack.
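The two points above, stripping the retained newline and telling end of file from an error when fgets returns NULL, can be combined in one small helper. This is a minimal sketch; read_line and its return convention are hypothetical, not something the answer or the man page defines.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (at most size - 1 characters).
   Returns 1 if a line was read (trailing newline removed),
   0 at end of file, and -1 on a read error. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL) {
        /* fgets failed: now ferror() can say whether it was an error. */
        return ferror(in) ? -1 : 0;
    }
    /* strcspn() gives the index of the first '\n', or the string
       length if there is none, so this is safe either way. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

Called twice on the input from the question, it would leave "Hello world" in one array and "Goodbye world" in the other, each without the newline.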

what does fscanf being == 1 do

There is obviously more to this code, but I am just curious as to what this line actually does. I understand the while loop and such, but I am new to fscanf().
while (fscanf(input_file, "%s", curr_word) == 1)

fscanf() returns the number of input items successfully scanned and stored.
As per the man page:
Return Value
These functions return the number of input items successfully matched and assigned, which can be fewer than provided for, or even zero in the event of an early matching failure.
In your case
while (fscanf(input_file, "%s", curr_word) == 1)
fscanf() will return a value of 1 if it is able to successfully scan a string (as per the %s format specifier) from input_file and put it into curr_word.

fscanf(input_file, "%s", curr_word) reads the input stream input_file, stores the next sequence of non-whitespace characters into the array pointed to by curr_word, and appends a '\0' byte. As you can see, the size of this array is not passed to fscanf. This is a classic case of potential buffer overflow, a security flaw that can be exploited by a hacker by storing appropriate contents in the input stream.
After gets, the scanf family of library functions is the best source of buffer overflow bugs one can find.
It is very difficult to use fscanf correctly. Most C programmers should avoid it.

Little trouble with fgets and error handling

I have recently started to learn C and have encountered a little problem with a code snippet.
I want to read a character string, 20 chars long, from stdin, so I have chosen fgets.
Here is my code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    unsigned int length = 20;
    char input_buffer[length];
    fgets(input_buffer, length, stdin);
    if(input_buffer == NULL)
        printf("Error, input too long");
    else
        printf("%s", input_buffer);
    return 0;
}
It compiles without any error, and if I enter a sentence shorter than 20 characters, everything is fine. But when I try to test the error handling, it fails.
The output is:
peter#antartica ~/test % ./test
Hello world, this is a very long test
Hello world, this i%
Am I doing anything wrong? I thought fgets returns a NULL pointer if it fails(which should be the case here)?
Thank you for any help.

From fgets reference:
On success, the function returns str.
If the end-of-file is
encountered while attempting to read a character, the eof indicator is
set (feof). If this happens before any characters could be read, the
pointer returned is a null pointer (and the contents of str remain
unchanged).
If a read error occurs, the error indicator (ferror) is
set and a null pointer is also returned (but the contents pointed by
str may have changed).
So it may return NULL without an error having occurred, simply because end-of-file was reached. You may need to check the return value of ferror to be sure.
Please note that fgets() reads at most length - 1 characters and always appends a null character (\0), so a longer line is truncated rather than rejected. The call won't fail if a line with more than length characters is found.

On error, fgets() returns NULL, but the contents of the buffer are indeterminate.
You are checking the buffer, not the return value.
Try:
if ( fgets(input_buffer, length, stdin) == NULL )
{
    printf("Error, input too long");
}
else
{
    printf("%s", input_buffer);
}
Good advice: Always use {}, even for one-line blocks. It really helps avoid errors and makes code more readable.
Edit: +1 to Mauren: A too-long input line is actually not considered an error, but silently truncated, so your whole concept won't work as intended. If in doubt, always check the docs for the function you're using. Try man fgets -- on the command line in a Unix-ish environment, or if in a pinch, in a web search engine. ;)
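Since fgets() truncates silently, detecting an over-long line has to be done by the caller, typically by checking whether the newline made it into the buffer. The following is one possible sketch under that assumption; read_line_checked and its return values are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf. Returns 1 if the whole line fit (newline
   removed), -1 if the line was too long (buf holds the first size - 1
   characters and the rest of the line is discarded), and 0 if nothing
   could be read (end of file or error). */
int read_line_checked(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (fgets(buf, (int)size, in) == NULL) {
        return 0;
    }
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* complete line, drop the newline */
        return 1;
    }
    /* No newline stored: look at the next character to find out why. */
    c = getc(in);
    if (c == EOF || c == '\n') {
        return 1;   /* last line without newline, or an exact fit */
    }
    /* More characters remain on this line: skip them. */
    while (c != '\n' && c != EOF) {
        c = getc(in);
    }
    return -1;
}
```

With a 20-byte buffer and the input "Hello world, this is a very long test", this returns -1 and leaves "Hello world, this i" in the buffer, so the caller can print the "input too long" message itself.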