Using scanf in a while loop - c

Probably an extremely simple answer to this extremely simple question:
I'm reading "C Primer Plus" by Prata and he keeps using the example
while (scanf("%d", &num) == 1)...
Is the == 1 really necessary? It seems like one could just write:
while (scanf("%d", &num))
It seems like the equality test is unnecessary since scanf returns the number of objects read and 1 would make the while loop true. Is the reason to make sure that the number of elements read is exactly 1 or is this totally superfluous?

In C, 0 is evaluated to false and everything else to true. Thus, if scanf returned EOF, which is a negative value, the loop would evaluate to true, which is not what you'd want.

Since scanf returns the value EOF (a negative value, typically -1) on end of file, the loop as written is correct. It runs as long as the input contains text that matches %d, and stops either at the first non-match or end of file.
It would have been clearer at a glance if scanf were expecting more than one input....
while (scanf("%d %d", &x, &y)==2) { ... }
would exit the loop the first time it was unable to match two values, either due to end of file (scanf returns EOF, typically -1) or on an input matching error (e.g. the input xyzzy 42 does not match %d %d, so scanf stops on the first failure and returns 0 without writing to either x or y). In both cases it returns some value less than 2.
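As a rough sketch of those outcomes, the same format can be applied to fixed strings with sscanf, which reports its result the same way scanf does. The helper name scan_pair is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns whatever sscanf reports for "%d %d".
 *   2      both values converted
 *   0 or 1 matching failure part-way through
 *   EOF    input ran out before any conversion */
int scan_pair(const char *input, int *x, int *y)
{
    return sscanf(input, "%d %d", x, y);
}
```

Calling scan_pair("xyzzy 42", &x, &y) should give 0 and leave x and y untouched, while scan_pair("", &x, &y) should give EOF, which is non-zero and so would count as "true" in a bare while condition.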
Of course, scanf is not your friend when parsing real input from normal humans. There are many pitfalls in its handling of error cases.
Edit: Corrected an error: scanf returns EOF on end of file, or a non-negative integer counting the number of variables it successfully set.
The key point is that since any non-zero value is TRUE in C, failing to test the return value correctly in a loop like this can easily lead to unexpected behavior. In particular, while(scanf(...)) is an infinite loop unless it encounters input text that cannot be converted according to its format.
And I cannot emphasize strongly enough that scanf is not your friend. A combination of fgets and sscanf might be enough for some simple parsing, but even then it is easily overwhelmed by edge cases and errors.
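Here is a small sketch of why the == 1 matters in a loop. It uses sscanf on a string so it can be run without typing input; sum_leading_ints is a hypothetical name, and %n is only there to advance through the string.

```c
#include <stdio.h>

/* Sums the leading integers in `text`, stopping at the first token that
 * is not an integer or at the end of the string. */
long sum_leading_ints(const char *text)
{
    long total = 0;
    int value, used;

    /* %n stores how many characters were consumed so far; it does not
     * count toward the return value. */
    while (sscanf(text, "%d%n", &value, &used) == 1) {
        total += value;
        text += used;
    }
    return total;
}
```

Once the string is exhausted, sscanf returns EOF rather than 0. With the comparison the loop ends there; written as while (sscanf(...)) it would treat EOF as true and keep spinning.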

You understood the C code correctly.
Sometimes the reason for testing the number of items read is that someone wants to make sure that all items were read, instead of scanf quitting early when the input didn't match the expected type. In this particular case it didn't matter.
Usually scanf is a poor choice of function because it doesn't meet the needs of interactive input from a human user. Usually a combination of fgets and sscanf yields better results. In this particular case it didn't matter.
If later chapters explain why some kinds of coding practices are better than this trivial example, good. But if not, you should dump the book you're reading.
On the other hand, your substitute code isn't exactly a substitute. If scanf returns -1 then your while loop will execute.

While you are correct it is not strictly necessary, some people prefer it for several reasons.
First, by comparing to 1 it becomes an explicit boolean value (true or false). Without the comparison, you are testing on an integer, which is valid in C, but not in later languages (like C#).
Secondly, some people would read the second version in terms of while([function]), instead of while([return value]), and be momentarily confused by testing a function, when what is clearly meant is testing the return value.
This can be completely a matter of personal preference, and as far as I'm concerned, both are valid.

One probably could write it without an explicit comparison (see the JRL's answer though), but why would one? I'd say that comparison-less conditions should only be used with values that have explicitly boolean semantics (like an isdigit() call, for example). Everything else should use an explicit comparison. In this case (the result of scanf) the semantics is pronouncedly non-boolean, so the explicit comparison is in order.
Also, the comparison one can usually omit is normally a comparison with zero. When you feel the urge to omit the comparison with something else (like 1 in this case) it is better to think twice and make sure you know what you are doing (see the JRL's answer again).
In any case, when the comparison can be safely omitted and you actually omit it, the actual semantical meaning of the condition remains the same. It has absolutely no impact on the efficiency of the resultant code, if that's something you are worrying about.

Related

ignoring return value (C Programming) [duplicate]

I just started learning C programming a few days back and I'm trying out some problems from the Kattis (open.kattis.com) website. I came up with this problem along the way where I don't really understand what it means.
//two stones
#include <stdio.h>
int main()
{
int n,x=2;
printf("The number of stones placed on the ground is ");
scanf("%d",&n);
if (n%x != 0)
{
printf("The winner of the game is Alice! \n");
}
else
{
printf("The winner of the game is Bob! \n");
}
return 0;
}
This appeared >>
warning: ignoring return value of scanf, declared with attribute warn_unused_result [-Wunused-result]
regarding scanf("%d",&n);
Can anyone explain what's wrong with this and how to rectify this problem? Thanks.
scanf has a return value that indicates success:
C Standard; §7.19.6.4.3:
The scanf function returns the value of the macro EOF if an input failure occurs before
any conversion. Otherwise, the scanf function returns the number of input items
assigned, which can be fewer than provided for, or even zero, in the event of an early
matching failure.
If you have a format string in your call to scanf that has one format specifier, then you can check that scanf succeeded in receiving an input of that type from the stdin by comparing its return value to 1.
Your compiler is warning you about this not specifically because scanf returns a value, but because it's important to inspect the result of scanf. A standard-compliant implementation of printf, for example, will also return a value (§7.19.6.3.3), but it's not critical to the soundness of your program that you inspect it.
You are ignoring the return value of the scanf call.
That is what the compiler warns about and was told to treat as an error.
Please understand that many subtle mistakes are possible with scanf() when you do not check for success, which is indicated by the return value.
To hide the problem which the compiler kindly notifies you about, I recommend first trying the "obvious" straightforward approach:
int IreallyWantToIgnoreTheImportantInfo;
/* ... */
IreallyWantToIgnoreTheImportantInfo = scanf("%d",&n);
This will, however, only move the problem somewhere else: the valid complaint about ignoring the scanf() return value will then probably (or maybe "hopefully") turn into a "variable set but never used" warning.
The proper way to really solve the problem here is to USE the return value.
You could e.g. make a loop, which attempts reading user input (giving an explanation and removing unscanned attempts) until the return value indicates success.
That would probably make much better code, at least much more robust.
If you really really want to ignore, without instead ignoring a variable which contains the info, then try
(void) scanf("%d",&n); /* I really really do not care ... */
But please take this as helpfully as I mean it: that is not a good idea.
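The retry loop suggested above could look roughly like this. It is only a sketch: read_int_retry is a made-up name, it takes a FILE * so it can be tried on something other than stdin, and a real program would also print a prompt before each attempt.

```c
#include <stdio.h>

/* Hypothetical helper: keeps trying to read one int from `in`.
 * Returns 1 with *out set on success, 0 if the input ends first. */
int read_int_retry(FILE *in, int *out)
{
    for (;;) {
        int rc = fscanf(in, "%d", out);
        if (rc == 1)
            return 1;
        if (rc == EOF)
            return 0;

        /* Matching failure: the offending text is still in the stream,
         * so discard the rest of the line before trying again. */
        int c;
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
        if (c == EOF)
            return 0;
    }
}
```

With input such as "abc" on one line followed by "12" on the next, the first attempt fails, the line is thrown away, and the second attempt succeeds.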
Can someone explain to me what's wrong with this and how to rectify this problem?
Many C functions return values to their callers. C does not require the caller to acknowledge or handle such return values, but usually, ignoring return values constitutes a program flaw. This is because ignoring the return value usually means one of these things is happening:
the function was called in order to obtain its return value, so failing to do anything with that value is an error in itself, or
the function was called primarily for its side effects, but its return value, which conveys information about the function's success in producing those side effects, was ignored. Occasionally the caller really doesn't care about the function's [degree of] success, but usually, ignoring the return value means the program is assuming complete success, such that it will malfunction if that was not actually achieved.
scanf() is ordinarily called for its side effects: reading formatted data from the standard input and recording it in the specified objects. Its return value indicates how many of the given input fields were successfully processed, and if the end of the stream or an I/O error was encountered before parsing any fields, then the return value indicates that, too, via a special return value.
If you do not verify that scanf read all the fields you expected it to do, then you do not know whether it gave you any data to work with, nor can you be confident about the state of the input. For example, suppose that when you run your program, you enter "x" instead of a number. What do you think it will do?
You appear to be using GCC and GLIBC. These are instrumented to produce warnings by default when the return values of certain functions, including scanf, are ignored. This catches many of the more common cases of such flaws. To avoid such warnings, check the return value (appropriately). For example,
if (scanf("%d",&n) != 1) {
fputs("Invalid input -- aborting.\n", stderr);
exit(1);
}
What happens is that your compiler is configured to warn you if you don't check the value returned by scanf.
You have many ways to "fix" this:
You can configure your compiler to stop warning you. This is usually a bad idea, but since you're still learning C, it may be counterproductive to focus on error checking at this stage.
You can put the result of scanf in a variable ... and do nothing with it. It will probably fool the compiler. Same as the previous option, not a good idea ...
You can actually check scanf for errors. It will probably be confusing since you're learning C, but in the end it is a very good habit to have: each time you call a function that may fail, check whether it succeeded or failed. To do that, you will have to read the scanf manual. Don't try to read it all, you will probably have a headache before the end. Just read the "Return Value" section, it's enough.
Good luck !
What your compiler is warning you about is this:
You are reading input (which you don't control) with scanf(). You tell scanf() that you expect an integer "%d", and scanf() is supposed to place the result of the conversion into the variable you supplied with &n.
Now, what happens when your user does not input an integer, but says "evil message" instead? Well, scanf() cannot convert that into an integer. Accordingly, your variable n will not be initialized. And the only way for your program to realize that something went wrong is to check what scanf() returns. If it returns that it has made 1 successful conversion, everything's ok. If it returns some other value, you know that some garbage was input into your program. (Usually you would want to bail out with a descriptive error message in case of an error, but details depend on the context.)
Failures to handle input errors are among the easiest security vulnerabilities to exploit, and the developers of your compiler know this. Thus, they think that it's generally a really bad idea to ignore the return value of scanf(), as it's the only way to catch input errors with scanf(). And they conveniently tell you about this. Try to follow their advice, and make sure that you actually handle the errors that may occur, or prove that they are safe to ignore.

Solve ÿ end of file in C

When I copy the content of a file to another in C at the end of the output file I have this character ÿ. I understand thanks to this forum that it is the EOF indicator but I don't understand what to do in order to get rid of it in the output file.
This is my code:
second_file = fopen(argv[2], "w+");
while (curr_char != EOF)
{
curr_char = fgetc(original_file);
fputc(curr_char, second_file);
}
printf("Your file has been successfully copy\n");
fclose(second_file);
fclose(original_file);
For each character you read, you have two things to do:
Check to see if it's EOF.
If not, write it to the output.
Your problem is you're doing these two things in the wrong order.
There are potentially several different ways of solving this. Which one you pick depends on how much you care about your program looking good, as opposed to merely working.
One. Starting with the code you wrote, we could change it to:
while (curr_char != EOF)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
Here, we explicitly test to see if the character is EOF, immediately after reading it, before writing it. If it's EOF, we break out of the loop early. This will work, but it's ugly: there are two different places where we test for EOF, and one of them never "fires". (Also, as a commentator reminded me, there's the problem that the first time through the loop, we're testing curr_char before we've ever set it.)
Two. You could rearrange it like this:
curr_char = getc(original_file);
while (curr_char != EOF)
{
putc(curr_char, second_file);
curr_char = getc(original_file);
}
Here, we read an initial character, and as long as it's not EOF, we write it and read another. This will work just fine, but it's still a little bit ugly, because this time there are two different places where we read the character.
Three. You could rearrange it like this:
while ((curr_char = getc(original_file)) != EOF)
{
putc(curr_char, second_file);
}
This is the conventional way of writing a character-copying loop in C. The call to getc and the assignment to curr_char are buried inside of the controlling expression of the while loop. It depends on the fact that in C, an assignment expression has a value just like any other expression. That is, the value of the expression a = b is whatever value we just assigned to a (that is, b's value). So the value of the expression curr_char = getc(original_file) is the character we just read. So when we say while ((curr_char = getc(original_file)) != EOF), what we're actually saying is, "Call getc, assign the result to curr_char, and if it's not equal to EOF, take another trip around the loop."
(If you're still having trouble seeing this, I've written other explanations in these notes and this writeup.)
This code is both good and bad. It's good because we've got exactly one place we read characters, one place we test characters, and one place we write characters. But it's a little bit bad because, let's admit it, it's somewhat cryptic at first. It's hard to think about that assignment-buried-inside-the-while-condition. It's code like this that gives C a reputation as being full of obscure gobbledegook.
But, at least in this case, it really is worth learning the idiom, and becoming comfortable with it, because the reductions to just one read and one test and one write really are virtues. It doesn't matter so much in a trivial case like this, but in real programs which are complicated for other reasons, if there's some key piece of functionality that happens in two different places, it's extremely easy to overlook this fact, and to make a change to one of them but forget to make it to the other.
(In fact, this happened to me just last week at work. I was trying to fix a bug in somebody else's code. I finally figured out that when the code did X, it was inadvertently clearing Y. I found the place where it did X, and I added some new code to properly recreate Y. But when I tested my fix, it didn't work! It turned out there were two separate places where the code did X, and I had found and fixed the wrong one.)
Finally, here's an equivalently minimal but unconventional way of writing the loop:
while (1)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
This is kind of like number 1, but it gets rid of the redundant condition in the while loop, and replaces it with the constant 1, which is "true" in C. This will work just fine, too, and it shares the virtue of having one read, one test, and one write. It actually ends up doing exactly the same operations and in exactly the same order as number 3, but by being laid out linearly it may be easier to follow.
The only problem with number 4 is that it's an unconventional, "break in the middle" loop. Personally, I don't have a problem with break-in-the-middle loops, and I find they come up from time to time, but if I wrote one and someone said "Steve, that's ugly, it's not an idiom anyone recognizes, it will confuse people", I'd have to agree.
P.S. I have replaced your calls to fgetc and fputc with the more conventional getc and putc. I'm not sure who told you to use fgetc and fputc, and there are obscure circumstances where you need them, but they're so rare that in my opinion one might as well forget that the "f" variants exist, and always use getc and putc.
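For completeness, here is the conventional loop wrapped in a function, as a sketch. The name copy_stream is made up. The one detail worth stressing is that the variable must be an int: the stray ÿ is typically what you see when EOF (commonly -1) gets written out as the byte 0xFF, and a char variable could not tell EOF apart from a genuine 0xFF byte in the file.

```c
#include <stdio.h>

/* Copies everything from `in` to `out`; returns the number of bytes copied.
 * `c` must be an int: getc returns either an unsigned char value or EOF,
 * and EOF has to stay distinguishable from a real 0xFF byte. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        copied++;
    }
    return copied;
}
```

In the original program this would be called as copy_stream(original_file, second_file) after both fopen calls have been checked for NULL.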

Can we have while loop test two arguments at the same time without &&/||

I was checking Beej's guide to IPC and one line of code took my attention.
In the particular page, the while loop in speak.c has two conditions to check while (gets(s), !feof(stdin)).
So my question is how is this possible, as I have seen while loops testing only one condition most of the time.
PS: I am a little new to this. Will be grateful for any help. Thanks!
The snippet
while (gets(s), !feof(stdin))
uses the comma operator: first it executes gets(s), then it evaluates !feof(stdin), whose value is the result of the condition.
By the way don't use gets, it's extremely unsafe. Be wary of sources using it, they probably aren't good sources for learning the language.
The code
while(gets(s), !feof(stdin)) {
/* loop body */
}
is equivalent to
gets(s);
while(!feof(stdin)) {
/* loop body */
gets(s);
}
just more concise as it avoids the repetition of gets before the loop and in the loop body.
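To see the comma operator in a loop condition without involving gets at all, here is a small sketch. The function count_until_dot is invented for the example and has nothing to do with Beej's code.

```c
/* Counts the characters before the first '.' in `s` (or the whole string
 * if there is none), using a comma expression as the loop condition. */
int count_until_dot(const char *s)
{
    int n = 0;
    char c;

    /* The left operand runs first and its value is discarded;
     * the right operand alone decides whether the loop continues. */
    while (c = s[n], c != '\0' && c != '.')
        n++;
    return n;
}
```

So there is still only one condition being tested. The part before the comma is just a statement that gets executed on every trip around the loop before the test.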
A couple of people have already pointed out some of the problems with this. I certainly agree that using gets (at all) is a lousy idea.
I think it's worth mentioning one other detail though: since this uses feof(file) as the condition for exiting the loop, it can/will also misbehave if you encounter an error before the end of the file. When an error occurs, the error flag will be set but (normally) the EOF flag won't be -- and since you can't read from the file any more (due to the error) it never will be either, so this will go into an infinite loop.
The right way to do the job is with fgets, and check its return value:
while (fgets(s, length_of_s, stdin))
process(s);
This tests for fgets succeeding at reading from the file, so it'll exit the loop for either end of file or an error.
One other minor detail: when fgets reads a string, it normally retains the new-line at the end of the line (where gets throws it away). You'll probably have to add a little more code to strip it off if it's present (and possibly deal with a line longer than the buffer you allocated if the newline isn't present).
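One common way to strip the newline is strcspn. A minimal sketch, with read_line being a name I made up; it does not try to handle over-long lines, which simply continue on the next call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (at most size-1 chars)
 * and removes the trailing newline if one was stored.
 * Returns 1 on success, 0 on end of file or read error. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';  /* no-op when there is no newline */
    return 1;
}
```

The loop then becomes while (read_line(stdin, s, sizeof s)) process(s); assuming s is an array and process is whatever you do with each line.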
This test is using the comma operator and has been used as a way of getting the next line of text using gets(s) and testing for end-of-file using !feof(stdin).
This syntax doesn't test two conditions. It first executes gets(s) and then evaluates !feof(stdin), whose result may be changed by the gets() call.
It's not a very good way to do it, since it uses gets(), which is not a safe function, and it's quite hard for a beginner to read (hence your question).

How to use sscanf correctly and safely

First of all, other questions about usage of sscanf do not answer my question because the common answer is to not use sscanf at all and use fgets or getch instead, which is impossible in my case.
The problem is my C professor wants me to use scanf in a program. It's a requirement.
However the program also must handle all the incorrect input.
The program must read an array of integers. It doesn't matter in what format the integers
for the array are supplied. To make the task easier, the program might first read the size of the array and then the integers each in a new line.
The program must handle the inputs like these (and report errors appropriately):
999999999999999...9 (numbers larger than integer)
12a3 (don't read this as an integer 12)
a...z (strings)
11 aa 22 33\n all in one line (this might be handled by discarding everything after 11)
inputs larger than the input array
There might be more incorrect cases, these are the only few I could think of.
If the erroneous input is supplied, the program must ask the user to input again until
the correct input is given, but the previous correct input must be kept (only incorrect
input must be cleared from the input stream).
Everything must conform to C99 standard.
The scanf family of functions cannot be used safely, especially when dealing with integers. The first case you mentioned is particularly troublesome. The standard says this:
If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined.
Plain and simple. You might think of %5d tricks and such but you'll find they're not reliable. Or maybe someone will think of errno. The scanf functions aren't required to set errno.
Follow this fun little page: they end up ditching scanf altogether.
So go back to your C professor and ask them: how exactly does C99 mandate that sscanf will report errors ?
Well, let sscanf accept all inputs as %s (i.e. strings) and then have the program analyze them.
If you must use scanf to accept the input, I think you start with something a bit like the following.
int array[MAX];
int i, n;
scanf("%d", &n);
for (i = 0; i < n && !feof(stdin); i++) {
scanf("%d", &array[i]);
}
This will handle (more or less) the free-format input problem since scanf will automatically skip leading whitespace when matching a %d format.
The key observation for many of the rest of your concerns is that scanf tells you how many format codes it parsed successfully. So,
int matches = scanf("%d", &array[i]);
if (matches == 0) {
/* no integer in the input stream */
}
I think this handles directly concerns (3) and (4)
By itself, this doesn't quite handle the case of the input 12a3. The first time through the loop, scanf would parse 12 as an integer 12, leaving the remaining a3 for the next loop. You would get an error the next time round, though. Is that good enough for your professor's purposes?
For integers larger than maxint, eg, "999999.......999", I'm not sure what you can do with straight scanf.
For inputs larger than the input array, this isn't a scanf problem per se. You just need to count how many integers you've parsed so far.
If you're allowed to use sscanf to decode strings after they've been extracted from the input stream by something like scanf("%s") you could also do something like this:
while (...) {
scanf("%s", buf);
/* use strtol or sscanf if you really have to */
}
This works for any sequence of white-space separated words, and lets you separate scanning the input for words, and then seeing if those words look like numbers or not. And, if you have to, you can use scanf variants for each part.
The problem is my C professor wants me to use scanf in a program.
It's a requirement.
However the program also must handle all the incorrect input.
This is an old question, so the OP is not in that professor's class any more (and hopefully the professor is retired), but for the record, this is a fundamentally misguided and basically impossible requirement.
Experience has shown that when it comes to interactive user input, scanf is suitable only for quick-and-dirty situations when the input can be assumed to be correct.
If you want to read an integer (or a floating-point number, or a simple string) quickly and easily, then scanf is a nice tool for the job. However, its ability to gracefully handle incorrect input is basically nonexistent.
If you want to read input robustly, reliably detecting incorrect input, and perhaps warning the user and asking them to try again, scanf is simply not the right tool for the job. It's like trying to drive screws with a hammer.
See this answer for some guidelines for using scanf safely in those quick-and-dirty situations. See this question for suggestions on how to do robust input using something other than scanf.
Read each token with scanf("%s", string), then convert it with long int_string = strtol(string, &end_pointer, 10).
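A sketch of what the strtol step might look like, covering the cases from the question (overflow, 12a3, plain strings). The helper name parse_int is hypothetical, and it assumes the token has already been read into a buffer, for example with a width-limited scanf("%63s", buf).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts a whole token to int.
 * Returns 1 and stores the value on success, 0 otherwise. */
int parse_int(const char *token, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(token, &end, 10);

    if (end == token)          /* no digits at all, e.g. "abc" or "" */
        return 0;
    if (*end != '\0')          /* trailing junk, e.g. "12a3" */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;              /* does not fit in an int */

    *out = (int)value;
    return 1;
}
```

Unlike %d, strtol has defined behaviour on out-of-range input: it sets errno to ERANGE, which is what makes the 999999... case detectable here.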

When/why is it a bad idea to use the fscanf() function?

In an answer there was an interesting statement: "It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure. I prefer to use fgets() to get each line in and then sscanf() that."
Could you expand upon when/why it might be better to use fgets() and sscanf() to read some file?
Imagine a file with three lines:
1
2b
c
Using fscanf() to read integers, the first line would read fine but on the second line fscanf() would leave you at the 'b', not sure what to do from there. You would need some mechanism to move past the garbage input to see the third line.
If you do a fgets() and sscanf(), you can guarantee that your file pointer moves a line at a time, which is a little easier to deal with. In general, you should still be looking at the whole string to report any odd characters in it.
I prefer the latter approach myself, although I wouldn't agree with the statement that "it's almost always a bad idea to use fscanf()"... fscanf() is perfectly fine for most things.
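Here is roughly how the fgets() and sscanf() approach plays out on that three-line file. This is a sketch only: count_int_lines is a made-up name, and it assumes lines fit in 128 bytes.

```c
#include <stdio.h>

/* Reads `in` line by line and counts the lines that hold exactly one
 * integer, adding those integers into *sum. */
int count_int_lines(FILE *in, long *sum)
{
    char line[128];
    int value, good = 0;
    char extra;

    *sum = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        /* The trailing " %c" would match any non-space character left
         * after the number, so a result of exactly 1 means the line
         * was clean. */
        if (sscanf(line, "%d %c", &value, &extra) == 1) {
            *sum += value;
            good++;
        }
    }
    return good;
}
```

On the file above, "1" is accepted, "2b" is rejected because the b is picked up by %c, and "c" is rejected because %d never matches. Either way the file position has moved on by exactly one line each time.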
The case where this comes into play is when you match character literals. Suppose you have:
int n = fscanf(fp, "%d,%d", &i1, &i2);
Consider two possible inputs "323,A424" and "323A424".
In both cases fscanf() will return 1 and the next character read will be an 'A'. There is no way to determine if the comma was matched or not.
That being said, this only matters if finding the actual source of the error is important. In cases where knowing there is malformed input error is enough, fscanf() is actually superior to writing custom parsing code.
When fscanf() fails, due to an input failure or a matching failure, the file pointer (that is, the position in the file from which the next byte will be read) is left in a position other than where it would be had the fscanf() succeeded. This is typically undesirable in sequential file reads. Reading one line at a time results in the file input being predictable, while single line failures can be handled individually.
There are two reasons:
scanf() can leave stdin in a state that's difficult to predict; this makes error recovery difficult if not impossible (this is less of a problem with fscanf()); and
The entire scanf() family take pointers as arguments but no buffer size (only an optional field width in the format string, which is easy to forget), so they can overrun a buffer and alter unrelated variables that happen to be after the buffer, causing seemingly random memory corruption errors that are very difficult to understand, find, and debug, particularly for less experienced C programmers.
Novice C programmers are often confused about pointers and the “address-of” operator, and frequently omit the & where it's needed, or add it “for good measure” where it's not. This causes “random” segfaults that can be hard for them to find. This isn't scanf()'s fault, so I leave it off my list, but it is worth bearing in mind.
After 23 years, I still remember it being a huge pain when I started C programming and didn't know how to recognize and debug these kinds of errors, and (as someone who spent years teaching C to beginners) it's very hard to explain them to a novice who doesn't yet understand pointers and stack.
Anyone who recommends scanf() to a novice C programmer should be flogged mercilessly.
OK, maybe not mercilessly, but some kind of flogging is definitely in order ;o)
It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure. I prefer to use fgets() to get each line in and then sscanf() that.
You can always use ftell() to find out the current position in the file, and then decide what to do from there. Basically, if you know what you can expect then feel free to use fscanf().
Basically, there's no way to tell that function not to go out of bounds for the memory area you've allocated for it, short of hard-coding a field width in the format string.
A number of replacements have come out, like fnscanf, which is an attempt to fix those functions by specifying a maximum limit for the reader to write, thus allowing it to not overflow.
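For string conversions specifically, a maximum field width in the format does cap how much scanf will store. A minimal sketch, using sscanf so it runs without input; read_word and WORD_MAX are names made up here.

```c
#include <stdio.h>

#define WORD_MAX 15

/* Hypothetical helper: copies the first whitespace-delimited word of
 * `input` into `out`, never storing more than WORD_MAX characters
 * plus the terminating '\0'. Returns 1 if a word was read. */
int read_word(const char *input, char out[WORD_MAX + 1])
{
    /* The width in %15s must be written as a literal and kept in
     * sync with WORD_MAX by hand. */
    return sscanf(input, "%15s", out) == 1;
}
```

The catch is that the width lives in the format string rather than being passed as an argument, so nothing stops it drifting out of step with the buffer size when the code is edited later.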

Resources