Solve ÿ end of file in C - c

When I copy the content of a file to another in C at the end of the output file I have this character ÿ. I understand thanks to this forum that it is the EOF indicator but I don't understand what to do in order to get rid of it in the output file.
This is my code:
second_file = fopen(argv[2], "w+");
while (curr_char != EOF)
{
curr_char = fgetc(original_file);
fputc(curr_char, second_file);
}
printf("Your file has been successfully copy\n");
fclose(second_file);
fclose(original_file);

For each character you read, you have two things to do:
Check to see if it's EOF.
If not, write it to the output.
Your problem is you're doing these two things in the wrong order.
There are potentially several different ways of solving this. Which one you pick depends on how much you care about your program looking good, as opposed to merely working.
One. Starting with the code you wrote, we could change it to:
while (curr_char != EOF)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
Here, we explicitly test to see if the character is EOF, immediately after reading it, before writing it. If it's EOF, we break out of the loop early. This will work, but it's ugly: there are two different places where we test for EOF, and one of them never "fires". (Also, as a commentator reminded me, there's the problem that the first time through the loop, we're testing curr_char before we've ever set it.)
Two. You could rearrange it like this:
curr_char = getc(original_file);
while (curr_char != EOF)
{
putc(curr_char, second_file);
curr_char = getc(original_file);
}
Here, we read an initial character, and as long as it's not EOF, we write it and read another. This will work just fine, but but it's still a little bit ugly, because this time there are two different places where we read the character.
Three. You could rearrange it like this:
while ((curr_char = getc(original_file)) != EOF)
{
putc(curr_char, second_file);
}
This is the conventional way of writing a character-copying loop in C. The call to getc and the assignment to curr_char are buried inside of the controlling expression of the while loop. It depends on the fact that in C, an assignment expression has a value just like any other expression. That is, the value of the expression a = b is whatever value we just assigned to a (that is, b's value). So the value of the expression curr_char = getc(original_file) is the character we just read. So when we say while ((curr_char = getc(original_file)) != EOF), what we're actually saying is, "Call getc, assign the result to curr_char, and if it's not equal to EOF, take another trip around the loop."
(If you're still having trouble seeing this, I've written other explanations in these notes and this writeup.)
This code is both good and bad. It's good because we've got exactly one place we read characters, one place we test characters, and one place we write characters. But it's a little bit bad because, let's admit it, it's somewhat cryptic at first. It's hard to think about that assignment-buried-inside-the-while-condition. It's code like this that gives C a reputation as being full of obscure gobbledegook.
But, at least in this case, it really is worth learning the idiom, and becoming comfortable with it, because the reductions to just one read and one test and one write really are virtues. It doesn't matter so much in a trivial case like this, but in real programs which are complicated for other reasons, if there's some key piece of functionality that happens in two different places, it's extremely easy to overlook this fact, and to make a change to one of them but forget to make it to the other.
(In fact, this happened to me just last week at work. I was trying to fix a bug in somebody else's code. I finally figured out that when the code did X, it was inadvertently clearing Y. I found the place where it did X, and I added some new code to properly recreate Y. But when I tested my fix, it didn't work! It turned out there were two separate places where the code did X, and I had found and fixed the wrong one.)
Finally, here's an equivalently minimal but unconventional way of writing the loop:
while (1)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
This is kind of like number 1, but it gets rid of the redundant condition in the while loop, and replaces it with the constant 1, which is "true" in C. This will work just fine, too, and it shares the virtue of having one read, one test, and one write. It actually ends up doing exactly the same operations and in exactly the same order as number 3, but by being laid out linearly it may be easier to follow.
The only problem with number 4 is that it's an unconventional, "break in the middle" loop. Personally, I don't have a problem with break-in-the-middle loops, and I find they come up from time to time, but if I wrote one and someone said "Steve, that's ugly, it's not an idiom anyone recognizes, it will confuse people", I'd have to agree.
P.S. I have replaced your calls to fgetc and fputc with the more conventional getc and putc. I'm not sure who told you to use fgetc and fputc, and there are obscure circumstances where you need them, but they're so rare that in my opinion one might as well forget that the "f" variants exist, and always use getc and putc.

Related

Can we have while loop test two arguments at the same time without &&/||

I was checking Beej's guide to IPC and one line of code took my attention.
In the particular page, the while loop in speak.c has two conditions to check while (gets(s), !feof(stdin)).
So my question is how is this possible as I have seen while look testing only one condition most of the time.
PS: I am little new to these. Will be grateful for any help. Thanks!
The snippet
while (gets(s), !feof(stdin))
uses the comma operator, first it executes gets(s), then it tests !feof(stdin), which is the result of the condition.
By the way don't use gets, it's extremely unsafe. Be wary of sources using it, they probably aren't good sources for learning the language.
The code
while(gets(s), !feof(stdin)) {
/* loop body */
}
is equivalent to
gets(s);
while(!feof(stdin)) {
/* loop body */
gets(s);
}
just more concise as it avoids the repetition of gets before the loop and in the loop body.
A couple of people have already pointed out some of the problems with this. I certainly agree that using gets (at all) is a lousy idea.
I think it's worth mentioning one other detail though: since this uses feof(file) as the condition for exiting the loop, it can/will also misbehave if you encounter an error before the end of the file. When an error occurs, the error flag will be set but (normally) the EOF flag won't be -- and since you can't read from the file any more (due to the error) it never will be either, so this will go into an infinite loop.
The right way to do the job is with fgets, and check its return value:
while (fgets(s, length_of_s, stdin))
process(s);
This tests for fgets succeeding at reading from the file, so it'll exit the loop for either end of file or an error.
One other minor detail: when fgets reads a string, it normally retains the new-line at the end of the line (where gets throws it away). You'll probably have to add a little more code to strip it off is it's present (and possibly deal with a line longer than the buffer you allocated if the newline isn't present).
This test is using the comma operator and has been used as a way of getting the next line of text using gets(s) and testing for end-of-file using !feof(stdin).
This syntax doesn't evaluate two expression. It executes first the gets(s) and then evaluates !feof(stdin) which may be modified by the gets() function call.
It's not a very good way to do it since it both use gets(), which is not a safe function and it's quite uneasy to read for a beginner (hence your question).

Using scanf in a while loop

Probably an extremely simple answer to this extremely simple question:
I'm reading "C Primer Plus" by Pratta and he keeps using the example
while (scanf("%d", &num) == 1)...
Is the == 1 really necessary? It seems like one could just write:
while (scanf("%d", &num))
It seems like the equality test is unnecessary since scanf returns the number of objects read and 1 would make the while loop true. Is the reason to make sure that the number of elements read is exactly 1 or is this totally superfluous?
In C, 0 is evaluated to false and everything else to true. Thus, if scanf returned EOF, which is a negative value, the loop would evaluate to true, which is not what you'd want.
Since scanf returns the value EOF (which is -1) on end of file, the loop as written is correct. It runs as long as the input contains text that matches %d, and stops either at the first non-match or end of file.
It would have been clearer at a glance if scanf were expecting more than one input....
while (scanf("%d %d", &x, &y)==2) { ... }
would exit the loop when the first time it was unable to match two values, either due to end of file end of file (scanf returns EOF (which is -1)) or on input matching error (e.g. the input xyzzy 42 does not match %d %d so scanf stops on the first failure and returns 0 without writing to either x or y) when it returns some value less than 2.
Of course, scanf is not your friend when parsing real input from normal humans. There are many pitfalls in its handling of error cases.
Edit: Corrected an error: scanf returns EOF on end of file, or a non-negative integer counting the number of variables it successfully set.
The key point is that since any non-zero value is TRUE in C, failing to test the return value correctly in a loop like this can easily lead to unexpected behavior. In particular, while(scanf(...)) is an infinite loop unless it encounters input text that cannot be converted according to its format.
And I cannot emphasize strongly enough that scanf is not your friend. A combination of fgets and sscanf might be enough for some simple parsing, but even then it is easily overwhelmed by edge cases and errors.
You understood the C code correctly.
Sometimes the reason for testing the number of items read is that someone wants to make sure that all items were read instead of scanf quitting early when it the input didn't match the expected type. In this particular case it didn't matter.
Usually scanf is a poor choice of functions because it doesn't meet the needs of interactive input from a human user. Usually a combination of fgets and sscanf yield better results. In this particular case it didn't matter.
If later chapters explain why some kinds of coding practices are better than this trivial example, good. But if not, you should dump the book you're reading.
On the other hand, your substitute code isn't exactly a substitute. If scanf returns -1 then your while loop will execute.
While you are correct it is not strictly necessary, some people prefer it for several reasons.
First, by comparing to 1 it becomes an explicit boolean value (true or false). Without the comparison, you are testing on an integer, which is valid in C, but not in later languages (like C#).
Secondly, some people would read the second version in terms of while([function]), instead of while([return value]), and be momentarily confused by testing a function, when what is clearly meant is testing the return value.
This can be completely a matter of personal preference, and as far as I'm concerned, both are valid.
One probably could write it without an explicit comparison (see the JRL's answer though), but why would one? I'd say that comparison-less conditions should only be used with values that have explicitly boolean semantics (like an isdigit() call, for example). Everything else should use an explicit comparison. In this case (the result of scanf) the semantics is pronouncedly non-boolean, so the explicit comparison is in order.
Also, the comparison one can usually omit is normally a comparison with zero. When you feel the urge to omit the comparison with something else (like 1 in this case) it is better to think twice and make sure you know what your are doing (see the JRL's answer again).
In any case, when the comparison can be safely omitted and you actually omit it, the actual semantical meaning of the condition remains the same. It has absolutely no impact on the efficiency of the resultant code, if that's something you are worrying about.

How to determine, inside of "while(fscanf != EOF){blah}", if the next fscanf is going to return EOF?

I've got some code executing in a while(fscanf != EOF) loop.
However, even when fscanf has finished executing, I need to keep running that code until some conditions are met.
I mean I guess I could copy/paste the code to outside the while(fscanf) loop, and only use global variables, but that seems messy. Surely someone has encountered something like this before and has a cleaner solution.
Couldn't you just modify the while condition to account for these additional conditions?
while(fscanf(...) != EOF || !(some_other_conditions_met))
You can't - the feof() function only tests for the end of file condition after a a read operation - there is no way of testing if the next operation will fail. You need to change the logic of your program, as C stream I/O doesn't directly support what you are asking for.
The answer to the title of your question is that it's pretty well impossible. Imagine getting a random number and deciding which of two format strings you're going to pass to fscanf. You want an advance prediction whether fscanf will get eof when you don't even know what the random number will be?
The answer to the body of your question is more or less what Andreas Brinck wrote. I have a feeling you might need something more like this though:
for (;;)
{
// .....
eofReached = (fscanf(..) == EOF);
// .....
if (eofReached && otherConditionsMet) break;
// .....
}
Cleaner solutions avoid using scanf/fscanf. It's much less error-prone to read an entire line at a time and to parse the line with sscanf instead.
From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?
The entry on feof usage is also relevant.

Rule of precedence == over =

I am just wondering would it be better to do this:
if((fd = open(filename, O_RDWR)) == -1)
{
fprintf(stderr, "open [ %s ]\n", strerror(errno));
return 1;
}
or this
fd = open(filename, O_RDWR);
if(fd == -1)
{
fprintf(stderr, "open [ %s ]\n", strerror(errno));
return 1;
}
Many thanks for any suggestions,
Yuck, split it up. What do you gain by mashing it all on one line? Let's compare and contrast:
Single-line:
Advantages:
Disadvantages: Hard to read, prone to error. (Consider your first revision.)
Multi-line:
Advantages: Easy to read, less error-prone.
Disadvantages:
I think it's clear. :)
"Sometimes putting it on one line makes more sense, for example: while ((c=getchar())!=EOF)"
That's fine, but that isn't the case here. There are times when not splitting it up makes more sense, but in general, don't.
"It saves more vertical space"
If one line is killing your ability to see the function, you need to 1) buy a monitor with a resolution higher than 640x480, and 2) Write smaller functions.
Really, I've never understood that argument for anything, functions should easily fit on any screen, regardless of a one-line difference.
"Multiple lines make it look complex"
Not really, shoving it on one line is arguably harder to read and more complex looking. Splitting things up makes it simpler to process one bit at a time, one shouldn't assume two lines makes it twice as complex.
Several people have argued in favor of the second. I disagree with them. While there was (apparently) initially a minor issue with = vs. == in the first one, I'd argue that it IS a minor issue.
A much bigger issue is that it's all too common for people (especially if they're in a hurry) to skip over the error checking -- leaving out the if (whatever == -1) completely, usually on the theory that what they're working on is quick, throwaway code and checking for the error isn't really needed. This is a really bad habit; I can practically guarantee that every person reading this has seen real code that skipped error checking like this, even though it really, really should have had it.
In code like this, attempting to open the file and checking for an error in having done so should be inextricably bound together. Putting the two into the same statement reflects the proper intent. Separating the two is just plain wrong -- they should not be separated in any way at any time for any reason. This should be coded as a single operation because it should be a single operation. It should always be thought of and coded as a single operation.
The excuses for doing otherwise are, in my opinion, quite weak. The reality is that anybody who uses C needs to be able to read code that combines an assignment with a conditional test. Just for an obvious example, a loop like while ((ch=getchar()) != EOF) pretty much needs to be written as a combined assignment and test -- attempting to test for EOF separately usually leads to code that just doesn't work correctly, and if you do make it work correctly, the code is substantially more complex.
Likewise, with the problem of - vs. ==. Since I didn't see the defect to start with, I'm not sure how much separating the two would have done to avoid problems, but my immediate guess is that it probably made almost no difference at all. Compilers that will warn you when what was supposed to be a condition contains only an assignment have been around for years (e.g. gcc). In most cases, the symptoms are almost immediately obvious anyway -- in short, the fact that you made a particular typo in one part of this posting but not the other doesn't prove (or honestly even indicate) much of anything about the relative difficulty of the two.
Based on that kind of evidence, I'd apparently believe that "not" is harder to type than "immediately", since I just typed "immediately" without a problem, but had to correct "not" (twice, no less) before it came out right in the previous sentence. I'm pretty sure if we went by how often I mistype it, "the" is the single most difficult word in the English language.
Maybe something where the parentheses make the ordering obvious?
if((fd = open(filename, O_RDWR)) == -1)
In this example, I'll join the chorus saying the second method is better.
The tougher case is when it's in a loop, like:
while ((c=getchar())!=-1)
{
... do something ...
}
versus
while (true)
{
c=getchar();
if (c==-1)
break;
... do something ...
}
In cases like that I prefer to do it on one line because then it makes clear what is controlling the loop, which I think overrides the drawbacks of the complex combination of assignment and testing.
Its a style thing - you're not asking a precedence (not presidence).
Many people will argue the latter example is clearer.
The second is better for readability's sake, but I know I do the first too often. The = operator will take precedence, especially since you have it in quotes, allowing the assigned value to be returned & compared by the == operator.
Except for standard idioms -- ones that are so common that everyone immediately gets what you are trying to do -- I would avoid doing the assignment in the conditional. First, it's more difficult to read. Second, you leave yourself open (at least in weakly typed languages that interpret zero as false and non-zero as true) to creating bugs by using an errant assignment operator in the conditional check.
This is a matter of style and is subjective. They do the same thing. I tend to prefer the later because I find it easier to read, and easier to set breakpoints/examine variables in the debugger.
(-1 == __whatever__)
to minimize typo
The first case is very normal when you're writing an input loop, because the alternative is to have to write the input command twice -- once right before the loop, and once right at the end of the loop.
while ( (ch=getchar()) != -1){
//do something with it
}
I think that the second way is more normal for an if statement, where you don't have the same concern.

When/why is it a bad idea to use the fscanf() function?

In an answer there was an interesting statement: "It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure. I prefer to use fgets() to get each line in and then sscanf() that."
Could you expand upon when/why it might be better to use fgets() and sscanf() to read some file?
Imagine a file with three lines:
1
2b
c
Using fscanf() to read integers, the first line would read fine but on the second line fscanf() would leave you at the 'b', not sure what to do from there. You would need some mechanism to move past the garbage input to see the third line.
If you do a fgets() and sscanf(), you can guarantee that your file pointer moves a line at a time, which is a little easier to deal with. In general, you should still be looking at the whole string to report any odd characters in it.
I prefer the latter approach myself, although I wouldn't agree with the statement that "it's almost always a bad idea to use fscanf()"... fscanf() is perfectly fine for most things.
The case where this comes into play is when you match character literals. Suppose you have:
int n = fscanf(fp, "%d,%d", &i1, &i2);
Consider two possible inputs "323,A424" and "323A424".
In both cases fscanf() will return 1 and the next character read will be an 'A'. There is no way to determine if the comma was matched or not.
That being said, this only matters if finding the actual source of the error is important. In cases where knowing there is malformed input error is enough, fscanf() is actually superior to writing custom parsing code.
When fscanf() fails, due to an input failure or a matching failure, the file pointer (that is, the position in the file from which the next byte will be read) is left in a position other than where it would be had the fscanf() succeeded. This is typically undesirable in sequential file reads. Reading one line at a time results in the file input being predictable, while single line failures can be handled individually.
There are two reasons:
scanf() can leave stdin in a state that's difficult to predict; this makes error recovery difficult if not impossible (this is less of a problem with fscanf()); and
The entire scanf() family take pointers as arguments, but no length limit, so they can overrun a buffer and alter unrelated variables that happen to be after the buffer, causing seemingly random memory corruption errors that very difficult to understand, find, and debug, particularly for less experienced C programmers.
Novice C programmers are often confused about pointers and the “address-of” operator, and frequently omit the & where it's needed, or add it “for good measure” where it's not. This causes “random” segfaults that can be hard for them to find. This isn't scanf()'s fault, so I leave it off my list, but it is worth bearing in mind.
After 23 years, I still remember it being a huge pain when I started C programming and didn't know how to recognize and debug these kinds of errors, and (as someone who spent years teaching C to beginners) it's very hard to explain them to a novice who doesn't yet understand pointers and stack.
Anyone who recommends scanf() to a novice C programmer should be flogged mercilessly.
OK, maybe not mercilessly, but some kind of flogging is definitely in order ;o)
It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure. I prefer to use fgets() to get each line in and then sscanf() that.
You can always use ftell() to find out current position in file, and then decide what to do from there. Basicaly, if you know what you can expect then feel free to use fscanf().
Basically, there's no way to to tell that function not to go out of bounds for the memory area you've allocated for it.
A number of replacements have come out, like fnscanf, which is an attempt to fix those functions by specifying a maximum limit for the reader to write, thus allowing it to not overflow.

Resources