Two distinct characters sharing same ASCII value in C - c

I am using Linux x86_64 with gcc 4.8.1.
Code:
#include <stdio.h>
int main(int argc, char *argv[])
{
int ch;
do
{
printf("ch : ");
ch = getchar(); //Q Why CTRL+M = 10 and not 13?
getchar();
printf("ch = %d\n\n", ch);
}while(ch != 'z');
return 0;
}
Output:
ch : ^N
ch = 14
ch :
ch = 10
ch :
ch = 10
ch : z
ch = 122
Question:
In above program when I enter Ctrl+J (linefeed character) it spits 10 which is indeed the ASCII of \n But when I feed Ctrl+M (carriage-return character) then too it spits 10 instead of 13 (ASCII value of \r).
What's going on? Does \n and \r share the same ASCII value? Then which character represents ASCII 13?
EDIT:
$ uname -a
Linux Titanic 3.11.0-26-generic #45-Ubuntu SMP Tue Jul 15 04:02:06 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

The problem is that ICRNL is enabled in the terminal driver. Here's a snippet from the man page for tcsetattr(3), which is used from C to set terminal attributes:
ICRNL Translate carriage return to newline on input (unless IGNCR is set).
To disable ICRNL, you can run the following command before your program (or use tcsetattr() directly):
$ stty -icrnl
stty -a lets you view the current terminal settings.
Note that the above will prevent your Enter key from working normally too (since it generates a carriage return, while the terminal driver is waiting for a linefeed to terminate the line before sending it to your program). You will have to use Ctrl-J instead. :)
Below is a tangent on why Return still works in the shell with IGNCR disabled (in Bash at least), in case you're interested:
Bash uses the readline library to read commands. Before reading a command, readline puts the terminal into noncanonical mode, where input is unbuffered (can be read a character at a time, as soon as a character is typed). readline therefore sees the carriage return character as soon as it is typed, and happens to accept it as a line terminator.
Noncanonical mode is needed to implement fancy line editing like being able to move the cursor with the cursor keys and insert text in the middle of a command. Text UI libraries like ncurses also use this mode.
While your C program runs, the terminal is in canonical mode instead, where the terminal driver does line buffering (sends the input to the process a line at a time). This mode only has rudimentary line editing (e.g., erasing is supported) and does not interpret the cursor keys, which is why you get strange characters sequences on the screen when you press them. (Those characters are the terminal escape sequences generated by the cursor keys, which become visible in this mode. A handy command to experiment with is a plain cat with no arguments.)
Canonical mode is enabled/disabled through ICANON, which is an option just like IGNCR. Experimenting with it from the shell might be a bit tricky though, since the shell sets and resets it as programs (like stty) are run.

I don't know the ^J keyboard shortcut, but I'm willing to bet that if you feed your code with a fixed character (not read from terminal) '\r' and '\n' you'll get the proper ASCII values. This means it is either your terminal setup that's wrong like #alk said, or ^J doesn't do what you think it does...

Related

Ending a Loop with EOF (without enter)

i am currently trying to end a while loop with something like this:
#include <stdio.h>
int main()
{
while(getchar() != EOF)
{
if( getchar() == EOF )
break;
}
return 0;
}
When i press CTRL+D on my Ubuntu, it ends the loop immediately. But on Windows i have to press CTRL+Z and then press ENTER to close the loop. Can i get rid of the ENTER on Windows?
The getchar behavior
For linux the EOF char is written with ctrl + d, while on Windows it is written by the console when you press enter after changing an internal status of the CRT library through ctrl + z (this behaviour is kept for retrocompatibility with very old systems). If I'm not wrong it is called soft end of file. I don't think you can bypass it, since the EOF char is actually consumed by your getchar when you press enter, not when you press ctrl + z.
As reported here:
In Microsoft's DOS and Windows (and in CP/M and many DEC operating systems), reading from the terminal will never produce an EOF. Instead, programs recognize that the source is a terminal (or other "character device") and interpret a given reserved character or sequence as an end-of-file indicator; most commonly this is an ASCII Control-Z, code 26. Some MS-DOS programs, including parts of the Microsoft MS-DOS shell (COMMAND.COM) and operating-system utility programs (such as EDLIN), treat a Control-Z in a text file as marking the end of meaningful data, and/or append a Control-Z to the end when writing a text file. This was done for two reasons:
Backward compatibility with CP/M. The CP/M file system only recorded the lengths of files in multiples of 128-byte "records", so by convention a Control-Z character was used to mark the end of meaningful data if it ended in the middle of a record. The MS-DOS filesystem has always recorded the exact byte-length of files, so this was never necessary on MS-DOS.
It allows programs to use the same code to read input from both a terminal and a text file.
Other information are also reported here:
Some modern text file formats (e.g. CSV-1203[6]) still recommend a trailing EOF character to be appended as the last character in the file. However, typing Control+Z does not embed an EOF character into a file in either MS-DOS or Microsoft Windows, nor do the APIs of those systems use the character to denote the actual end of a file.
Some programming languages (e.g. Visual Basic) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT etc.), and alternate methods must be adopted, e.g. opening the file in binary mode or using the File System Object to progress beyond it.
Character 26 was used to mark "End of file" even if the ASCII calls it Substitute, and has other characters for this.
If you modify your code like that:
#include <stdio.h>
int main() {
while(1) {
char c = getchar();
printf("%d\n", c);
if (c == EOF) // tried with also -1 and 26
break;
}
return 0;
}
and you test it, on Windows you will see that the EOF (-1) it is not written in console until you press enter. Beore of that a ^Z is printed by the terminal emulator (I suspect). From my test, this behavior is repeated if:
you compile using the Microsoft Compiler
you compile using GCC
you run the compiled code in CMD window
you run the compiled code in bash emulator in windows
Update using Windows Console API
Following the suggestion of #eryksun, I successfully written a (ridiculously complex for what it can do) code for Windows that changes the behavior of conhost to actually get the "exit when pressing ctrl + d". It does not handle everything, it is only an example. IMHO, this is something to avoid as much as possible, since the portability is less than 0. Also, to actually handle correctly other input cases a lot more code should be written, since this stuff detaches the stdin from the console and you have to handle it by yourself.
The methods works more or less as follows:
get the current handler for the standard input
create an array of input records, a structure that contains information about what happens in the conhost window (keyboard, mouse, resize, etc.)
read what happens in the window (it can handle the number of events)
iterate over the event vector to handle the keyboard event and intercept the required EOF (that is a 4, from what I've tested) for exiting, or prints any other ascii character.
This is the code:
#include <windows.h>
#include <stdio.h>
#define Kev input_buffer[i].Event.KeyEvent // a shortcut
int main(void) {
HANDLE h_std_in; // Handler for the stdin
DWORD read_count, // number of events intercepted by ReadConsoleInput
i; // iterator
INPUT_RECORD input_buffer[128]; // Vector of events
h_std_in = GetStdHandle( // Get the stdin handler
STD_INPUT_HANDLE // enumerator for stdin. Others exist for stdout and stderr
);
while(1) {
ReadConsoleInput( // Read the input from the handler
h_std_in, // our handler
input_buffer, // the vector in which events will be saved
128, // the dimension of the vector
&read_count); // the number of events captured and saved (always < 128 in this case)
for (i = 0; i < read_count; i++) { // and here we iterate from 0 to read_count
switch(input_buffer[i].EventType) { // let's check the type of event
case KEY_EVENT: // to intercept the keyboard ones
if (Kev.bKeyDown) { // and refine only on key pressed (avoid a second event for key released)
// Intercepts CTRL + D
if (Kev.uChar.AsciiChar != 4)
printf("%c", Kev.uChar.AsciiChar);
else
return 0;
}
break;
default:
break;
}
}
}
return 0;
}
while(getchar() != EOF)
{
if( getchar() == EOF )
break;
}
return 0;
Here it is inconsistent.
If getchar() != EOF it will enter the loop, otherwise (if getchar() == EOF) it will not enter the loop. So, there is no reason to check getchar() == EOF inside the loop.
On the other hand, you call getchar() 2 times, you wait to enter 2 characters instead of only 1.
What did you try to do ?

getchar and EOF

I was working on the following example of C code from Deitel & Deitel. It seems that the code is supposed to print the characters entered before EOF, in the reverse order. But I have to press EOF (ctrl+z in windows) several times and Enter key to get it done. Could you please let me know why it does not respond at the first EOF?
#include <stdio.h>
int main( void )
{
int c;
if ( ( c = getchar() ) != EOF ) {
main();
printf( "%c", c );
} /* end if */
return 0;
}
Well getchar(3) is a function that operates in buffer mode, so you have to input some characters, and press the character used to signal end of data (^D in unix, ^Z in windows)
The problem here is that windows console driver is not specified the same way as the unix tty driver, so the behaviour will, in general, not be the same... Try to test the program in a real unix environment (or linux) and see if the input, at least is reversed, as the example said.
In unix, the behaviour of terminal input is that ^D is interpreted as soon as it is pressed, but if some input is in the driver buffer before it, it will make those input available to the program (so you'll have to press it a second time to signal EOF condition, which consists in a read(2) that results in 0 characters actually read). In case you have pressed <Return> before ^D (return makes all data available to the application, with the difference that the \n char is also appended to the data read), the input buffer is empty, so the EOF condition comes inmediately, after the <return> char.
In windows, you need to press <return> for anything to be read (the ^Z to be interpreted), and things complicate.
By the way, I have executed your program on a BSD unix system, with the following result:
$ a.out
apsiodjfpaosijdfa
^D
afdjisoapfjdoispa$ _
Explanation: first line is the input line "apsiodjfpaosijdfa", followed by a \n, and ^D signalling end of input. All this data goes to the application at once, and getchar() then processes it character by character. It prints the \n first (making the line to appear below ^D) and then the input chars, reversed. As there was no \r at the beginning of data, no return is issued at end, and the prompts appears next to the output.
The final _ signals the position of the cursor at the end.
If you don't want to deal with end of data characters (or don't have any unix at hand to make the test) you can use a text file to test your program (no eof char, only the actual end of the file), by redirecting program input from a file, like in this example that uses the original source code as input:
$ a.out <pru.c
}
;0 nruter
/* fi dne */ }
;) c ,"c%" (ftnirp
;)(niam
{ ) FOE =! ) )(rahcteg = c ( ( fi
;c tni
{
) diov (niam tni
>h.oidts< edulcni#$ _
It's to do with they way the windows console is passing the CTRL+Z to the program. It's probably waiting for you to compose a line and not recognising a line until it has a non- CTRL-Z character in it. So it waits until you accidentally press space and also enter.
Just echo everything in a scratch program to see exactly what is going on.

How to enter a backspace character on a shell command line?

I am trying to do Exercise 1-10 in K&R. I've got the program working and running. So far I've come to know that the backspace character is cooked with the operating system. How can I input the backspace character in Mac OS X?
Not sure what you mean by "cooked with the operating system". I guess you're asking how to enter a backspace character on the shell command line without it being interpreted as a backspace, that is, without actually erasing the previous character.
The key combination for the ASCII backspace control character is ^H (hold down Ctrl and press H. However, pressing this combination on the shell command line simply performs the "backspace" operation. To quote a control character in Bash, which is the OS X default shell, and in Zsh you type a combination of two characters: ^V^H.
You can use (non destructive) backspace \b in printf and re-write. This way:
$ cat w.c
#include <stdio.h>
int main ()
{
printf("abcd\n");
printf("abc\bd\n");
}
$ ./w
abcd
abd
UPDATE
Same story using putchar():
$ cat w.c
#include <stdio.h>
int main ()
{
printf("abcd\n");
putchar('a');
putchar('b');
putchar('c');
putchar('\b');
putchar('d');
putchar('\n');
}
Same output...

execute after EOF in C

I'm doing homework for my C programming class. The question states "Write a program which reads input as a stream of characters until encountering EOF". I'm using Xcode on my macbook and the only way I know to make the program encounter EOF is using ctrl + D or ctrl + Z. But it will exit my program completely.
For example I have this code:
int main()
{
int ch;
while ((ch = getchar()) != EOF)
{
putchar(ch);
}
printf("%d",ch);
return 0;
}
Is there away for the code to execute the printf("%d",ch) after the loop (after i hit ctrl + D on my keyboard)?
You can test your program using (with a POSIX shell) here documents.
First compile your source code mycode.c into a binary mybin with
gcc -std=c99 -Wall -Wextra -g mycode.c -o mybin
(it could be clang or cc instead of gcc)
then run mybin with a "here document" like
./mybin << EOF
here is my
input
EOF
You could also use input redirection. Make some file myfile.txt and run ./mybin < myfile.txt
You could even run your program on its own source code: ./mybin < mycode.c
And the input could even come from some pipe, e.g. ls * | ./mybin
BTW, what you are observing is that stdin, when it is a terminal, is line-buffered. See this answer (most of it should apply to MacOSX).
Notice that your code is incorrect: you are missing an #include <stdio.h> near the top of the file, and your main should really be int main(int argc, char**argv) (BTW you could improve your program so that when arguments are given, they are file names to be read). At last the ending printf would surely show -1 which is generally the value of EOF
Also, it is much better to end your printf format control string with \n or else use appropriately fflush(3)
Notice that end-of-file is not an input (or a valid char), it is a condition on some input file stream like stdin, and the getchar(3) function is specified to return EOF which is an int outside of the range of char (on my Linux system EOF is -1 because char-s are between 0 and 255 included). You might test end-of-file condition after some input operation (never before!) using feof(3)
A terminal in usual cooked mode is handled by the kernel so that when you press Ctrl D an end-of-file condition is triggered on the file descriptor (often STDIN_FILENO i.e. 0) related to that terminal.
I'm using Xcode on my macbook and the only way I know to make the program encounter EOF is using ctrl + D or ctrl + Z. But it will exit my program completely.
No it won't. If you run your program in the Xcode debugger, provided the console pane has the focus, all input you type will go to your program (note that, by default, stdin is line buffered, so you'll only see output when you press the return key). If you hit control-d (not control-z), you're program will exit the loop and print -1 in the console window (which is what you expect because that is the value of EOF in OS X).
Here is the result when I ran your program without change in the Xcode debugger (I typed command-r in Xcode)
bgbgdfsfd
bgbgdfsfd
hgfdgf
hgfdgf
-1
Regular font was typed by me, bold font was from your program. At the end of each of the lines typed by me, I pressed carriage return. After your program printed hgfdgf I typed control-D. Your program then printed the value of the last thing it got from getchar() which was EOF which is -1 in the OS X C library.
Edit
If you are unsure that your program is printing the EOF, change your printf format string to (for example)
printf("Last character is [%d]\n", ch);
Then instead of -1 your program will output Last character is [-1] on the last line.
First of all ctrl+z does not input EOF to your program. If you hit ctrl+Z your shell will put your program to sleep.
Second, if you want to handle these ctrl+Z in your program you need to learn about signal handling in C.
And I think because you were hitting ctrl+Z you were not seeing any output on the screen.
Make sure you are sending the EOF signal, not a signal that actually terminates your program.
For example, for c program running in windows the EOF is represent by typing ctrl+z and pressing enter. Doing this will exit your while loop but still runs the rest of your program.
However, ctrl+c, which some people may have mistakenly tried for EOF, actually kills your program and will prevent the code behind your while loop from executing.
For mac you will need to find what is the input that corresponds to EOF, and make sure that is what you are sending through rather than the kill signal, which I suspect is what you are doing here.

EOF in Windows command prompt doesn't terminate input stream

Code:
#include <stdio.h>
#define NEWLINE '\n'
#define SPACE ' '
int main(void)
{
int ch;
int count = 0;
while((ch = getchar()) != EOF)
{
if(ch != NEWLINE && ch != SPACE)
count++;
}
printf("There are %d characters input\n" , count);
return 0;
}
Question:
Everything works just fine, it will ignore spaces and newline and output the number of characters input to the screen (in this program I just treat comma, exclamation mark, numbers or any printable special symbol character like ampersand as character too) when I hit the EOF simulation which is ^z.
But there's something wrong when I input this line to the program. For example I input this: abcdefg^z, which means I input some character before and on the same line as ^z. Instead of terminating the program and print out total characters, the program would continue to ask for input.
The EOF terminating character input only works when I specify ^z on a single line or by doing this: ^zabvcjdjsjsj. Why is this happening?
This is true in almost every terminal driver. You'll get the same behavior using Linux.
Your program isn't actually executing the loop until \n or ^z has been entered by you at the end of a line. The terminal driver is buffering the input and it hasn't been sent to your process until that occurs.
At the end of a line, hitting ^z (or ^d on Linux) does not cause the terminal driver to send EOF. It only makes it flush the buffer to your process (with no \n).
Hitting ^z (or ^d on Linux) at the start of a line is interpreted by the terminal as "I want to signal EOF".
You can observe this behavior if you add the following inside your loop:
printf("%d\n",ch);
Run your program:
$ ./test
abc <- type "abc" and hit "enter"
97
98
99
10
abc97 <- type "abc" and hit "^z"
98
99
To better understand this, you have to realize that EOF is not a character. ^z is a user command for the terminal itself. Because the terminal is responsible for taking user input and passing it to processes, this gets tricky and thus the confusion.
A way to see this is by hitting ^v then hitting ^z as input to your program.
^v is another terminal command that tells the terminal, "Hey, the next thing I type - don't interpret that as a terminal command; pass it to the process' input instead".
^Z is only translated by the console to an EOF signal to the program when it is typed at the start of a line. That's just the way that the Windows console works. There is no "workaround" to this behaviour that I know of.

Resources