I am a noob teaching myself to program in C using The C Programming Language, Second Edition (by K&R). In Chapter 1, Section 1.5.1 (File Copying), the authors touch very briefly on operator precedence when comparing values, underscoring the importance of using parentheses, in this case, to ensure that the assignment to the variable c is made before the comparison is evaluated. They assert that:
c = getchar() != EOF
is equivalent to
c = (getchar() != EOF)
Which "has the undesired effect of setting c to 0 or 1, depending on whether or not the call of getchar encountered end of file"
The authors then pose Exercise 1-6: Verify that the expression getchar() != EOF is 0 or 1.
Based on the authors' previous assertion, this seemed almost trivial, so I created this code:
#include <stdio.h>

main()
{
    int c;
    while (c = (getchar() != EOF))
        putchar(c);
}
Unfortunately, when I run the program, it simply outputs whatever characters I type in the command window, rather than the expected string of 1s (or a 0 once EOF is encountered).
While I am a noob, I think I get the logic that the authors are trying to teach, and yet I cannot demonstrate this simple task. In this case, should not the variable c take on the value that the comparison expression evaluated to, rather than whatever character getchar() happens to fetch, particularly because of the location of the parentheses? If c is indeed taking on the value of the comparison, putchar() should only output 0 or 1, and yet, as formulated, it outputs what I type in the command window. What am I doing wrong? What do I not understand? Could it be my compiler? I am coding in Visual Studio 2017 Community edition on Windows 10 on x64 architecture. I have the Tiny C Compiler but have not tried executing from the command prompt with TCC yet.
When you run the program, the characters that you see don't come from your program. It's the console's (or terminal's) echo function that shows whatever characters you have typed (you can even erase them before you hit Enter). Your program only outputs characters with ASCII code 0 or 1, both of which are invisible.
If you change putchar(c) to printf("%d", c) you'll be able to see a sequence of 1s. No zero will appear because when c becomes zero, the loop stops and it won't be printed.
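For instance, a minimal sketch of the loop with that one change (the logic is otherwise the asker's original):

#include <stdio.h>

int main(void)
{
    int c;
    /* c is 1 for every character read before EOF, then 0 and the loop ends. */
    while (c = (getchar() != EOF))
        printf("%d", c);   /* print the numeric value, so the 1s are visible */
    return 0;
}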
Characters '0' and '1' have ASCII codes 48 and 49, respectively (though your terminal may use another encoding). If you want to output a literal digit 0, use character notation. You could also try putchar(48), but don't do this too much: you'll later find out that using magic numbers in your program is highly discouraged.
putchar('0');
        ^ ^   (note the single quotes)
The assertion that
c = getchar() != EOF;
is equivalent to
c = (getchar() != EOF);
is due to operator precedence. The != (inequality) operator has higher precedence than = (assignment), so it is evaluated before the assignment.
Finally, it's extremely rare for someone to actually want that behavior. The usual intent is to write this:
while ( (c = getchar()) != EOF )
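Put into a complete program, a minimal sketch of that canonical loop:

#include <stdio.h>

int main(void)
{
    int c;
    /* Parenthesize the assignment so c receives the character first,
       and the comparison against EOF happens second. */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}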
The thing is, c = (getchar() != EOF) reads one input character and then compares it with EOF. The result is 1 when it is not EOF. That result is then assigned; the value of an assignment expression is the value being assigned, so the loop is entered and the character with ASCII value 1 is printed. But that character is non-printable, so you don't see anything.
Once it reads EOF, the comparison yields 0, so c becomes 0 and the loop exits. So you never get to see anything other than the character with ASCII value 1 (and you don't even see that, since it is non-printable). These are known as ASCII control characters; they do not belong to the printable class.
You also said that since c is 0 or 1, it should print 0 or 1. Then try this simple code:
int c = 68;
putchar(c);
Check the output and you will get the idea of what happens when we try to print: it's the character whose ASCII code is 68 (a 'D') that is printed, not the value 68.
The right way to do it is while ((c = getchar()) != EOF).
Originally I mentioned that on some machines this prints funny characters. The representation of non-printable characters depends on the character set in use: a non-standard (non-Unicode) encoding might assign some visible representation to code 1, breaking the idea of non-printables.
Related
Exercise 1-9 in The C Programming Language by Dennis Ritchie and Brian Kernighan, second edition:
Write a program to copy its input to its output, replacing each string of one or more blanks by a single blank.
Given that I'm using the book as a reference, I know only the C principles which have been discussed in the book up to Exercise 1-9, that is, variable assignments, while- and for-loops, if-statements, symbolic constants, character I/O via getchar() and putchar(), escape sequences, and the printf() function. Maybe I'm forgetting something, but that's most of it. (I'm at page 20, where the exercise is.)
Here's my (not working) code:
#include <stdio.h>

main()
{
    int c;
    while ((c = getchar()) != EOF) {  // As long as EOF is not met, repeat the following…
        if (c == ' ') {               // If the input character is a blank, proceed (otherwise skip)…
            putchar(c);               // Output the blank space which was just inputted…
            while (c == ' ') {        // As long as more spaces keep coming in, don't do anything (proceed when another character comes along)…
                ;
            }
        }
        else {                        // When a character other than a blank is inputted, output that character…
            putchar(c);
        }                             // Now retest the master while-loop condition (EOF not met) and proceed…
    }
}
What I'm getting as a result is a working input-to-output program that keeps on accepting input but stops producing output the moment a blank is typed. (An exception to this is if the blank is removed with a backspace before entering a new line in the console.)
For example, the input abcde\nabcde abcde\nabcde will yield the output abcde, omitting the second and third lines, given that a blank is contained in the former. Here I am obviously using \n to represent an inputted new line (normally using the Enter key).
What have I done wrong, and what could I do to fix this issue? I know there are several working models of this program spread all over the internet, but I'm wondering why this one (which is my creation) in particular doesn't work. Again, do note that my knowledge of C is mostly limited to the first twenty pages of the book whose details are provided below.
Specs:
I'm running Eclipse version 2021-12 (4.22.0) on Debian GNU/Linux 11 (bullseye). I downloaded the pre-compiled Eclipse version from the official Eclipse.org website.
References:
Kernighan, B.W. and Ritchie, D.M. (1988). The C programming language / ANSI C Version. Englewood Cliffs, N.J.: Prentice Hall.
while (c == ' ') is an infinite loop: nothing inside its body changes c, so the condition stays true forever.
You should remember the previous character: if the previous character is a blank and the current character is also a blank, skip it.
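A minimal sketch of that previous-character approach (the variable names are my own):

#include <stdio.h>

int main(void)
{
    int c;
    int prev = 0;                      /* no previous character yet */

    while ((c = getchar()) != EOF) {
        if (c != ' ' || prev != ' ')   /* print unless this blank follows a blank */
            putchar(c);
        prev = c;
    }
    return 0;
}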
I know this has been discussed before, but I want to make sure I understand correctly, what is happening in this program, and why. On page 20 of Dennis Ritchie's textbook, The C Programming Language, we see this program:
#include <stdio.h>

int main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
    return 0;
}
When executed, the program reads each character keyed in and prints them out in the same order after the user hits enter. This process is repeated indefinitely unless the user manually exits out of the console. The sequence of events is as follows:
The getchar() function reads the first character keyed in and assigns its value to c.
Because c is an integer type, the character value that getchar() passed to c is promoted to its corresponding ASCII integer value.
Now that c has been initialized to some integer value, the while loop can test to see if that value equals the End-Of-File character. Because the EOF character has a macro value of -1, and because none of the characters that are possible to key in have a negative decimal ASCII value, the condition of the while loop will always be true.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop. If the user keys in only one character before pressing Enter, then the program reads the <return> value as the next character, prints a new line, and waits for the next input to be keyed in.
Is any of this remotely correct?
Yes, you've basically got it. But it's even simpler: getchar and putchar return and accept int types respectively already. So there's no type promotion happening. You're just taking in characters and sending them out in a loop until you see EOF.
Your intuition about why those should be int and not some char form is likely correct: the int type allows for a sentinel EOF value that is outside the value range of any possible character value.
(The K&R stdio functions are very old at this point; they don't know about Unicode, etc., and some of the underlying design rationales are, if not murky, just not relevant. Not a lot of practical code these days would use these functions. That book is excellent for a lot of things, but the code examples are fairly archaic.)
(Also, fwiw, your question title refers to "copying a file", which you still can do this way, but there are more canonical ways)
Well, it is correct in idea, but not in details, and that's where the devil is.
The getchar() function reads the first character from standard input and returns it as an unsigned char promoted to int (or the special EOF value if no character was read)
The return value is assigned into c, which is of type int (as it should, as if it were a char strange things could happen)
Now that c has been assigned some integer value, the while loop can test to see if that value equals the value of the EOF macro.
Because the EOF macro has an implementation-specified negative value, and because the characters were converted to unsigned char and promoted to int, none of them is negative (at least not on any system you'd meet as a novice), so the condition of the while loop will remain true until the end-of-file condition occurs or an error happens while reading standard input.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop.
The standard input, if it is connected to a terminal device, is usually line-buffered, meaning that the program does not receive any of the characters on the line until the user has completed the line and hit the Enter key.
Instead of ASCII, we speak of the execution character set, which nowadays might often be individual bytes of UTF-8 encoded Unicode characters. EOF is negative in binary too; we do not need to think about "its decimal value". The char and unsigned char types are numbers too, and character constants are of type int; i.e., on systems where the execution character set is compatible with ASCII, writing ' ' is the same thing as writing 32, though of course clearer to those who don't remember ASCII codes.
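For instance, a quick check on an ASCII-compatible system (a minimal sketch):

#include <stdio.h>

int main(void)
{
    /* Character constants have type int in C. */
    printf("%d\n", ' ');        /* prints 32 on an ASCII-compatible system */
    printf("%d\n", ' ' == 32);  /* prints 1 there: the two are the same value */
    return 0;
}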
Finally, C is very strict about the meaning of initialization. It is the setting of the initial value into a variable when it is declared.
int c = getchar();
has an initialization.
int c;
c = getchar();
has c uninitialized, and then assigned a value. Knowing the distinction makes it easier to understand compiler error messages when they refer to initialization or assignment.
#include <stdio.h>

int main()
{
    int c;
    while ((c = getchar()) != EOF)
    {
        if (c >= 65 && c <= 90)        /* 'A'..'Z' */
            c += 32;                   /* convert to lowercase */
        else if (c >= 97 && c <= 122)  /* 'a'..'z' */
            c -= 32;                   /* convert to uppercase */
        putchar(c);
    }
    return 0;
}
In this code, how is the input interpreted, processed, and printed?
For example, if we input abcde we get the output ABCDE. Since we are processing the input character by character, we should get the output character by character, yet we get the output only once we press Enter. Until we press Enter, where is the output stored?
What do these functions do?
getchar() is a function used to read a single character from the console.
putchar() is a function used to write a single character to the console; it returns the character written, as an unsigned char cast to an int.
#include <stdio.h>

int main()
{
    int c;
    while ((c = getchar()) != EOF) {
        if (c >= 65 && c <= 90)
            c += 32;
        else if (c >= 97 && c <= 122)
            c -= 32;
        putchar(c);
    }
    return 0;
}
c is an int, so when you read a character from the keyboard it is stored as an integer, not a character.
Check out ASCII: 65 = 'A', 90 = 'Z', 97 = 'a', 122 = 'z'.
Here is what I understood from the code:
create variable c to store our character.
get a character from the user and store it in c.
while c is not equal to EOF (end of file):
if c is greater than or equal to 65 and less than or equal to 90,
add 32 to c;
or else if c is greater than or equal to 97 and less than or equal to 122,
subtract 32 from c;
convert the value of c back into a character and print it.
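As an aside, a sketch of the same program written with the standard <ctype.h> helpers instead of raw ASCII codes; the behavior is the same on ASCII systems:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    int c;
    while ((c = getchar()) != EOF) {
        if (isupper(c))
            c = tolower(c);   /* 'A'..'Z' -> 'a'..'z' */
        else if (islower(c))
            c = toupper(c);   /* 'a'..'z' -> 'A'..'Z' */
        putchar(c);
    }
    return 0;
}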
First, I would like to say that you have asked a good question, one that most programmers won't even think about. In this answer, I've tried my best to explain the matter.
Queue: follows the First In, First Out (FIFO) principle. The principle is similar to a queue of people waiting to buy new iPhones.
The input stream behaves like a queue. Consider it a Scooby Doo: even when what it has eaten is enough for the day, it won't stop eating. If someone offers two tonnes of food at breakfast, Scooby will eat all of it and still ask for lunch.
The input stream is similar to Scooby Doo. Look at this code:
char c = getchar();
Here one character is enough for c, but when you run this code you can type as many characters as you want in the console; no matter what, when you press Enter, c will be assigned the first character you typed.
But notice that checking for EOF like this is considered bad practice for several reasons; I will list them at the end.
Now, coming to your question: why are the characters not printed one by one?
Streams are a big topic in themselves. So, for your convenience, think of the input stream as having a hidden file (a queue) that stores whatever characters you have typed.
Typing characters into the stream is like a machine gun firing continuously: there is no option but to wait for the machine gun to stop firing before making a counter-attack.
Likewise, while you are typing, each character is simply pushed into that file, until you press Enter or give the EOF command (Ctrl+D).
Then your code looks at the file character by character (remember the principle: first in). If a terminating condition is met, it stops executing there and won't care about the remaining characters in the file.
In your case, after the user presses Enter or gives the EOF command, it looks at all the characters in the file and changes their case (upper to lower and vice versa).
As I promised!
Why one should not check for EOF:
One reason is that the keystroke that creates an end-of-file condition from the keyboard is not standard; it differs between platforms.
Another reason is that terminating the program is best done more explicitly (e.g., with a "quit" command).
A third reason is that other error conditions might arise, and this check will not detect them.
So try to avoid it.
stdin and stdout are buffered streams. You can set them to be unbuffered, but that can cause a significant performance hit. And even if you set them to unbuffered, it still may not be effective from a terminal interface, because the terminal often does its own buffering.
See this for more info:
Should I set stdout and stdin to be unbuffered in C?
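For illustration only, a minimal sketch of turning off stdout buffering with setvbuf; note that the terminal driver may still line-buffer the input, so characters can still arrive only after Enter:

#include <stdio.h>

int main(void)
{
    /* Make stdout unbuffered; this must be called before any output. */
    setvbuf(stdout, NULL, _IONBF, 0);

    int c;
    while ((c = getchar()) != EOF)
        putchar(c);   /* each output character is written immediately */
    return 0;
}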
This question references Reflections on Trusting Trust, figure 2.
Take a look at this snippet of code, from figure 2:
...
c = next( );
if (c != '\\')
    return(c);
c = next( );
if (c != '\\')
    return('\\');
if (c == 'n')
    return('\n');
It says:
This is an amazing piece of code. It "knows" in a completely portable way what character code is compiled for a new line in any character set. The act of knowing then allows it to recompile itself, thus perpetuating the knowledge.
I would like to read the rest of the paper. Can someone explain how the above code is recompiling itself? I'm not sure I understand how this snippet of code relates to the code in "Stage 1":
[Figure: the Stage 1 self-reproducing program; source: bell-labs.com]
The stage 2 example is very interesting because it adds an extra level of indirection with a self-replicating program.
What he means is that, since this compiler code is written in C, it is completely portable: it detects the presence of a literal \n and returns the character code for \n without ever knowing what that actual character code is, because the compiler itself was written in C and compiled for the system.
The paper goes on to show a very interesting trojan horse with the compiler. If you use this same technique to make the compiler insert a bug into any program, and then remove the bug from the source code, the compiler will compile the bug into the supposedly bug-free compiler.
It is a bit confusing but essentially it is about multiple levels of indirection.
What this piece of code does is to translate escape characters, which is part of the job of a C compiler.
c = next( );
if (c != '\\')
    return(c);
Here, if c is not '\\' (the backslash character), it is not the start of an escape sequence, so return it as-is.
If it is, then it is the start of an escape sequence.
c = next( );
if (c == '\\')
    return('\\');
if (c == 'n')
    return('\n');
Here you have a typo in your question: it's if (c == '\\'), not if (c != '\\'). This piece of code continues by examining the character following the \: clearly, if it is \, then the whole escape sequence is \\, so return it. The same goes for \n.
The description of that code, from Ken Thompson's paper is: (emphasis added)
Figure 2 is an idealization of the code in the C compiler that interprets the character escape sequence.
So you're looking at part of a C compiler. The C compiler is written in C, so it will be used to compile itself (or, more accurately, the next version of itself). Hence the statement that the code is able to "recompile itself".
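Pieced together, the snippet behaves roughly like the sketch below; the function name, the next() prototype, and the final fallthrough are my assumptions, not the paper's:

int next(void);   /* assumed: returns the next character of the source text */

int interpret_escape(void)
{
    int c = next();
    if (c != '\\')
        return c;        /* an ordinary character: return it unchanged */
    c = next();
    if (c == '\\')
        return '\\';     /* "\\\\" denotes a backslash */
    if (c == 'n')
        return '\n';     /* "\\n" denotes a newline, whatever its code is */
    return c;            /* assumed fallthrough for other escapes */
}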
I'm new to programming and I can't seem to get my head around why the following happens in my code, which is:
#include <stdio.h>

/* copy input to output; 1st version */
main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}
So after doing some reading, I've gathered the following:
Nothing executes until I hit Enter, as getchar() is a blocking function.
Before I hit Enter, all my keystrokes are stored in a buffer
When getchar() is called, it simply looks at the first value in the buffer, returns that value, and then removes it from the buffer.
My question is: when I remove the first c = getchar(), the resulting code has exactly the same functionality as the original, except that before I type anything a smiley-face symbol immediately appears on the screen. Why does this happen? Is it because putchar(c) doesn't hold up the code and tries to display c, which isn't yet defined, hence it outputs some random symbol? I'm using Code::Blocks, if that helps.
The function you listed will simply echo back to you every character you type at it. It is true that the I/O is "buffered". It is the keyboard input driver of the operating system that is doing this buffering. While it's buffering keys you press, it echoes each key back at you. When you press Enter, the driver passes the buffered characters along to your program, and getchar then sees them.
As written, the function should work fine:
c = getchar();       // get (buffer) the first char
while (c != EOF) {   // while the user has not typed ^D (EOF)
    putchar(c);      // put the character retrieved
    c = getchar();   // get the next character
}
Because of the keyboard driver buffering, it will only echo back every time you press a newline or you exit with ^D (EOF).
The smiley face is coming from what @YuHao described: you might be missing the first getchar in what you ran, so putchar is echoing junk, probably a small garbage value that the console font renders as a smiley on your screen.
If you omit the first getchar(), the code will look like this:
int c;
while (c != EOF) {
    putchar(c);
    c = getchar();
}
Here, c is uninitialized, so the first call to putchar(c) outputs a garbage value; that's where you get the smiley face.
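A minimal sketch of the usual fix, folding the read into the loop condition so that c is always assigned before it is tested or printed:

#include <stdio.h>

int main(void)
{
    int c;
    /* The assignment happens first, the comparison with EOF second,
       so c is never used uninitialized. */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}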
"I'm new to programming"
You're not advised to learn programming using C (the difficulties you're going through are because of the C language). For example, my first computer science classes were in Pascal. Other universities may use Scheme or Lisp, or even structured natural languages, to teach programming. MIT's online classes are given in Python.
C is not a language you would want to use in your first months of programming. The specific reason in your case is that the language allowed you to use the value of an uninitialized variable.
When you declare the integer variable c, it gets implicitly reserved space on the program stack, but without any meaningful value: it's "trash", whatever value happened to be in that memory at the time. The C language requires the programmer to know that a variable must be assigned some value before it is used. Removing the first getchar results in uses before assignment, both in the while condition (c != EOF) and in putchar(c), before c has any meaningful value.
Consider the same code rewritten in python:
import sys
c = sys.stdin.read(1)
while c != '':
    sys.stdout.write(c)
    c = sys.stdin.read(1)
Remove the initial read and you get the following error:
hdante@aielwaste:/tmp$ python3 1.py
Traceback (most recent call last):
File "1.py", line 3, in <module>
while c != '':
NameError: name 'c' is not defined
That's a NameError: using the variable without having assigned to it resulted in a language error.
For more information, try an online course, for example:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-00-introduction-to-computer-science-and-programming-fall-2008/video-lectures/
About uninitialized values:
http://en.wikipedia.org/wiki/Uninitialized_variable