I'm trying to understand how terminal I/O works.
When a terminal is placed in non-canonical mode like so (missing error handling):
struct termios term_original, term_current;
tcgetattr(STDIN_FILENO, &term_original);
term_current = term_original;
term_current.c_lflag &= ~(ICANON | ISIG | IEXTEN | ECHO);
term_current.c_iflag &= ~(BRKINT | ICRNL | IGNBRK | IGNCR | INLCR | INPCK | ISTRIP | IXON | PARMRK);
term_current.c_oflag &= ~(OPOST);
term_current.c_cc[VMIN] = 1;
term_current.c_cc[VTIME] = 0;
tcsetattr(STDIN_FILENO, TCSADRAIN, &term_current);
A simple read loop can read in the data generated by each button press like so:
char c;
while (read(STDIN_FILENO, &c, 1) == 1) { PRINT_CHAR(c); }
Now,
Pressing Esc on my keyboard generates: 0x1b.
Pressing F1 generates: 0x1b 0x4f 0x50.
Pressing F5 generates: 0x1b 0x5b 0x31 0x35 0x7e.
In terms of reading and processing this input, how does one determine where the output from one button press ends and the next one begins? I could find no discernible pattern, and the fact that Esc generates a single byte which is also identical to the first byte of output for most multi-byte generating button presses seems to suggest there is none. Is there some other mechanism for determining where the button boundaries are?
Programs rely on keys not being pressed too fast. If the bytes arrive less than, say, 100 ms apart, they are treated as one key press; otherwise they are separate events.
Yes, programs actually pause for some time after ESC is received, in order to make sure it is ESC and not the start of some other key's sequence. Sometimes this pause can be discerned with the naked eye.
Some programs recognize the ESCDELAY environment variable for fine-tuning this timing.
And yes, this is not perfect: you can fool the system by pressing keys too fast.
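The timing approach described above can be sketched roughly as follows. This is only an illustration, not how any particular program implements it: the helper name read_key and its timeout_ms parameter are hypothetical, and the sketch ignores multi-byte UTF-8 characters. It uses poll() to wait a bounded time for each byte that might follow an ESC.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read one "key" from fd into buf (capacity cap).
 * Returns the number of bytes stored, 0 on EOF, or -1 on error.
 * After an ESC byte it waits up to timeout_ms for each following byte. */
int read_key(int fd, char *buf, size_t cap, int timeout_ms)
{
    if (cap == 0)
        return -1;

    ssize_t n = read(fd, buf, 1);        /* blocks until a first byte arrives */
    if (n != 1)
        return n < 0 ? -1 : 0;

    if (buf[0] != 0x1b)
        return 1;                        /* ordinary byte: nothing more to collect */

    size_t len = 1;
    while (len < cap) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            break;                       /* timed out (or failed): sequence is over */
        if (read(fd, buf + len, 1) != 1)
            break;
        len++;
    }
    return (int)len;
}
```

A return value of 1 with buf[0] == 0x1b then means a lone Esc. As noted above, keys that arrive closer together than the timeout will still be merged into one unit.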
Okay, thanks to n.m., I was set on the right track here.
Trying to read one byte at a time is incorrect. Rather one should attempt to read multiple characters at once.
Something like the following:
int r, i;
char buffer[10]; //10 chosen arbitrarily
while ((r = read(STDIN_FILENO, buffer, sizeof(buffer))) > 0)
{
printf("%d bytes: ", r);
for (i = 0; i < r; ++i) { PRINT_CHAR(buffer[i]); }
printf("\r\n");
}
In this case, the read() call returns as soon as a button is pressed, and its return value is the number of bytes read. Those bytes can then be used to identify the button or character in question.
Pressing the top row of buttons using above loop, I'm seeing:
1 bytes: 1b
3 bytes: 1b 4f 50
3 bytes: 1b 4f 51
3 bytes: 1b 4f 52
3 bytes: 1b 4f 53
5 bytes: 1b 5b 31 35 7e
5 bytes: 1b 5b 31 37 7e
On my machine, I appear to be getting:
A single byte for ASCII characters.
0x1b as the first character, followed by other characters for special buttons (F1-F12, Up, Down, etc...).
Some other multi-byte sequence for non ASCII characters, which turns out to be the UTF-8 representation of the character in question.
I tried jamming down the buttons on my keyboard like a mad man, but the above loop was always able to identify correctly which bytes are a single unit.
However this may not work completely as desired on a heavily taxed machine, or over a buffered high latency network connection. Perhaps in those situations, more bytes from multiple latter button presses will have already found themselves in the terminal buffer, causing multiple buttons to appear as one.
In such a situation, there is probably no way to ensure errors won't occur, but they can be minimized. Single-byte characters always appear to be in the range 0x00-0x7F. Special buttons are always multi-byte and begin with 0x1B followed by something within 0x00-0x7F. Every byte of a multi-byte character is in the range 0x80-0xFF, and in UTF-8 the first byte indicates how many bytes the character occupies. Given this information, there's enough to keep errors minimal and to stop them from propagating to upcoming reads unnecessarily.
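As a small illustration of the point about UTF-8 lead bytes, the following sketch maps a first byte to the length of the character it introduces. The function name utf8_seq_len is made up for this example, and the sketch assumes the terminal really is sending UTF-8.

```c
/* Length in bytes of the UTF-8 sequence introduced by `lead`,
 * or 0 if `lead` cannot start a sequence. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                         /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;   /* continuation byte (0x80-0xBF) or a byte that is never a valid lead */
}
```

A reader loop could use this to split a buffer holding several pasted characters, and to resynchronize on the next byte for which the function returns nonzero after an error.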
Lastly, it's important to stress that what I described is for my machine (PC, classic US 101 keyboard, Terminal encoding set to UTF-8). A full program should minimally see what character encoding the terminal is using.
Ultimately, you must determine these by context. Based on the characters you receive after the escape, you can determine the overall sequence length of known sequences, then return to interpreting characters normally.
You should be able to look up the escape sequences for known terminals.
It is possible some of your function keys have locally configured expansions, especially if they do not match the codes for whatever variety of standard terminal is otherwise implemented.
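To make "determine the sequence length by context" concrete, here is a minimal sketch that assumes the terminal sends ECMA-48 style sequences like the ones shown in the question: ESC O plus one byte for F1-F4, and ESC [ parameters final-byte for the others. The function name escape_seq_len is hypothetical, and treating any other ESC pair as a two-byte unit is an assumption, not something every terminal guarantees.

```c
#include <stddef.h>

/* Returns the length of the escape sequence at the start of buf[0..n),
 * 1 for a lone ESC, or 0 if buf does not start with ESC or the
 * sequence is incomplete or malformed. */
size_t escape_seq_len(const unsigned char *buf, size_t n)
{
    if (n == 0 || buf[0] != 0x1b)
        return 0;
    if (n == 1)
        return 1;                          /* nothing follows: treat as the Esc key */

    if (buf[1] == 'O')                     /* SS3: ESC O <one final byte> */
        return n >= 3 ? 3 : 0;

    if (buf[1] == '[') {                   /* CSI: ESC [ params intermediates final */
        size_t i = 2;
        while (i < n && buf[i] >= 0x30 && buf[i] <= 0x3F)
            i++;                           /* parameter bytes: digits, ';', ... */
        while (i < n && buf[i] >= 0x20 && buf[i] <= 0x2F)
            i++;                           /* intermediate bytes */
        if (i < n && buf[i] >= 0x40 && buf[i] <= 0x7E)
            return i + 1;                  /* final byte closes the sequence */
        return 0;
    }
    return 2;   /* assumption: ESC plus one other byte forms a unit */
}
```

With the bytes from the question, F1 (1b 4f 50) yields 3 and F5 (1b 5b 31 35 7e) yields 5, even if bytes from a later key press already follow in the same buffer.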
Why do we use \r and \b in C? I know that \b turns "hello\bworld" into "hellworld", but we could just write "hellworld" instead of "hello\bworld". So what purpose does it serve in a program? This question describes what \b does, but doesn't clarify why it is used.
It's to a large degree just historical reasons. Many characters can actually be traced back to analogue typewriters! In the past, you had computers with matrix printers instead of monitors. In those days, bold font was achieved by printing something twice in the same location.
For example, on Linux we use the character \n for a new line, but on Windows it is \r\n. So what are those characters? Well, \n (newline, or line feed) moves the head of the typewriter one line down, and \r is the carriage return, which returns the typewriter carriage to the beginning of the line.
Many of these characters are not used very much anymore. They are mostly considered legacy. They are not really that useful in modern programming. You can use \b to go back and overwrite stuff that you have previously written, but today you would use libraries like ncurses to achieve the same thing. In the old days, you could actually use these to get pretty exact positioning of stuff, but on modern terminal emulators, that's no longer the case. For instance, old terminals had fixed sizes. The sizes may not have been standardized, but they did not change during runtime and were the same for the same machine every time you run a program.
I could consider using \b and \r if I wanted to write a cli application with some kind of progress bar. Example:
#include <stdio.h>
#include <unistd.h>
int main(void) {
int n = 0;
while(n<=100) {
printf("\rProgress: %d%%", n);
fflush(stdout);
sleep(1);
n+=10;
}
}
You could achieve the same thing with \b instead of \r, but mostly it's easier to just reprint the whole line. I cannot see any situation where I would use \b in code.
A similar thing can be done if you want to simulate human typing in a text-based game. But I would say that techniques of this kind are mostly for cases where you don't have the time and/or energy to learn how to use proper modern methods.
Let's look at the first 32 characters in the ascii table:
0 Null char
1 Start of Heading
2 Start of Text
3 End of Text
4 End of Transmission
5 Enquiry
6 Acknowledgment
7 Bell
8 Back Space
9 Horizontal Tab
10 Line Feed
11 Vertical Tab
12 Form Feed
13 Carriage Return
14 Shift Out
15 Shift In
16 Data Link Escape
17 Device Control 1 (oft. XON)
18 Device Control 2
19 Device Control 3 (oft. XOFF)
20 Device Control 4
21 Negative Acknowledgement
22 Synchronous Idle
23 End of Transmission Block
24 Cancel
25 End of Medium
26 Substitute
27 Escape
28 File Separator
29 Group Separator
30 Record Separator
31 Unit Separator
Almost all of these have been replaced by other things: for instance by higher-level protocols like TCP, but also by libraries like ncurses. In C, the null character is useful for strings, but that could have been solved in other ways, like making it possible to retrieve the size of an array when it's passed to a function.
You use them to overwrite something you wrote previously. You wouldn't normally use them in the same string, but in different output calls, e.g.
printf("hello");
fflush(stdout); // need to flush because we didn't write a newline
// do some stuff here
printf("\rgoodbye\n"); // This replaces hello
Those escape sequences are used today mainly for creating some kind of CLI-"GUI", for CLI-"animations", like showing a "loading progress" and some special tricks.
In the past those escape sequences were used mainly to control teleprinter and for punched cards.
For example, for deleting the last punched character on a punch card, you used:
putchar('\b'); putchar(127); /* DEL */
\r is used to move the cursor to the beginning of the current line, and \n is used to move the cursor to the beginning of the next line.
#include <iostream>
using namespace std;
int main() {
cout<<"Hello";
cout<<"\r";
cout<<"Gone ";
}
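The program above ends up showing "Gone " rather than "Hello": \r moves the cursor back to the start of the line, and the text printed next overwrites what was there.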
#include <stdio.h>
#include <unistd.h>
int main() {
int i=0;
int n=4;
char pattern[n];
pattern[0]=(char)179;
pattern[1]=(char)47;
pattern[2]=(char)196;
pattern[3]=(char)92;
long long int count=0;
while(count<100000000){
printf("%c",pattern[i]);
i=(i+1)%n;
printf("\r");
count++;
}
}
Test it by yourself.
"\n" is for a new line.
"\b" is a backspace: printing it moves the cursor back one character. For example, cout<<"hello\bHi"; shows "hellHi". After "hello" is printed, the terminal receives \b and moves the cursor back one position (onto the 'o' of hello), so "Hi" is printed starting from there and you get hellHi instead of helloHi.
'\r' is a carriage return: it moves the cursor to the start of the current line (it does not move up a line).
\b = its purpose is backspace
\r = its purpose is carriage return
\r is a carriage return character; it tells your terminal emulator to move the cursor to the start of the line.
The cursor is the position where the next characters will be rendered.
So, printing a \r lets you overwrite the current line of the terminal emulator.
\b, on the other hand, is the backspace character: it moves the cursor one character backward, so that whatever you print next overwrites the characters from that position on.
Thus \b moves back a single character, whereas a carriage return moves the cursor to the start of the current line.
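To see the cursor model without a terminal, here is a toy sketch that simulates a single line of output. It is purely illustrative: the function name render_line is invented, and it handles only \b and \r, not tabs, line wrapping, or any other control character.

```c
#include <stddef.h>
#include <string.h>

/* Simulates one terminal line: stores in `out` (capacity cap, NUL-terminated)
 * what the line would show after printing `in`. Handles only '\b' and '\r'. */
void render_line(const char *in, char *out, size_t cap)
{
    size_t cursor = 0, width = 0;

    if (cap == 0)
        return;
    memset(out, 0, cap);
    for (; *in != '\0'; in++) {
        if (*in == '\b') {
            if (cursor > 0)
                cursor--;                /* move left, erase nothing */
        } else if (*in == '\r') {
            cursor = 0;                  /* back to the start of the line */
        } else if (cursor + 1 < cap) {
            out[cursor++] = *in;         /* overwrite whatever was there */
            if (cursor > width)
                width = cursor;
        }
    }
    out[width] = '\0';
}
```

Feeding it "hello\bworld" leaves "hellworld" in out, and "Hello\rGone " leaves "Gone ". Note that a trailing \b on its own changes nothing visible, since it only moves the cursor.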
I'm creating a program that implements a Linux shell.
I've changed the terminal mode to non-canonical:
void ft_getch_prepare(void)
{
int ret;
struct termios new_opts;
ret = tcgetattr(STDIN_FILENO, &new_opts);
new_opts.c_lflag &= ~(ICANON | ECHO | ECHOE | ECHOK
| ECHONL | ECHOPRT | ECHOKE | ICRNL);
new_opts.c_cc[VMIN] = 1;
new_opts.c_cc[VTIME] = 1;
ret += tcsetattr(STDIN_FILENO, TCSANOW, &new_opts);
}
int ft_getch(void)
{
int c;
c = 0;
ft_getch_prepare();
read(0, &c, 4);
return (c);
}
but when I copy a string and paste it, it only shows the first character of the copied string.
For example, when I paste the string "HELLO WORLD" into my terminal,
it only shows the first character, "H".
If I complete your program with
int main()
{
int i = ft_getch();
printf("%x\n", i);
}
I get
$ ./a.out
4c4c4548
when I try to paste HELLO WORLD, which is what I expect. (48 is the hexadecimal code for H, 45 for E, 4C for L; the bytes look reversed because I'm on a little-endian architecture.)
The ICRNL flag constant applies to c_iflag, not c_lflag. You are turning it off in the wrong place. It is unclear to me why you're turning it off at all, but if you want to do so, then you need to modify the correct flag set.
The ECHOE, ECHOK, ECHONL, ECHOPRT, and ECHOKE local-mode flags are only relevant in canonical mode, which you are turning off. It should not be harmful to turn these off, too, but it does make your code harder to read and follow than it needs to be.
With respect to "when I want to copy a string and paste it, it only show the first character of the copied string", I suspect you're being bitten by the input timer and / or minimum character count properties of non-canonical mode. These are controlled by the c_cc[VTIME] and c_cc[VMIN] elements of the "special characters" array in your termios structure. If you are configuring a terminal that will support interactive input, or for which there may otherwise be unbounded-length pauses in input, then you need to turn off the timer and ensure that reads block properly by setting
new_opts.c_cc[VTIME] = 0;
new_opts.c_cc[VMIN] = 1;
I am uncertain whether that will be sufficient for your purposes, however, in part because I cannot judge whether the manner in which you are reading the input contributes to the issue.
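Putting the points above together, a corrected configuration might look like the sketch below. The helper name configure_noncanonical is hypothetical; it only edits a termios structure that the caller has already filled in with tcgetattr(), and it deliberately leaves out the canonical-only echo flags.

```c
#include <termios.h>

/* Hypothetical helper: adjust an already-fetched termios structure in place. */
void configure_noncanonical(struct termios *opts)
{
    opts->c_lflag &= ~(tcflag_t)(ICANON | ECHO);  /* local modes */
    opts->c_iflag &= ~(tcflag_t)ICRNL;            /* input modes: ICRNL belongs here */
    opts->c_cc[VMIN]  = 1;    /* read() blocks until at least one byte is available */
    opts->c_cc[VTIME] = 0;    /* no inter-byte timer */
}
```

The caller would wrap this between tcgetattr(STDIN_FILENO, &opts) and tcsetattr(STDIN_FILENO, TCSANOW, &opts), checking both return values, and keep a copy of the original settings to restore on exit.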
UPDATE:
Since you've now disclosed your input function, I can say that you indeed do have significant problems there if it is supposed to provide an interface equivalent to getc(). You are reading four bytes at a time instead of one, and you are not handling EOF or errors properly. Moreover, the multi-byte read introduces the possibility of short reads, which you do not detect or handle.
If you're trying to read a single character at a time, then do that. The return value of getc() is int instead of char not because it's appropriate to try to read ints from the stream but to provide for result values that are not valid chars -- specifically, EOF.
I decline to rewrite your code for you, but to emulate getc(), it needs to do this:
read a single char at a time
check the return value of read. If it is anything other than 1 (for a one-character read) then return EOF
otherwise, return the char read, converted to type unsigned char.
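For readers who want to see those three steps spelled out, a minimal sketch could look like this. The name fd_getch and its file-descriptor parameter are illustrative choices, not part of the original program, and the terminal setup is assumed to have been done once elsewhere rather than on every call.

```c
#include <stdio.h>    /* EOF */
#include <unistd.h>   /* read */

/* Hypothetical getc()-like helper that reads from file descriptor fd. */
int fd_getch(int fd)
{
    unsigned char c;

    if (read(fd, &c, 1) != 1)
        return EOF;      /* end of input or error */
    return c;            /* always 0..255, so it cannot collide with EOF */
}
```

Because the byte is read into an unsigned char, a value such as 0xFF comes back as 255 rather than -1, which is what lets the caller tell it apart from EOF.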
I would like to know how to get the cursor position (x, y) in my program, without writing anything on the screen and without tracking it all the time.
I found a way to get its position with this function (I don't check the return values of read, write, etc. here to keep the code short, but I do in my program):
void get_cursor_position(int *col, int *rows)
{
int a = 0;
int i = 0;
char buf[4];
write(1, "\033[6n", 4); // string asking for the cursor position
read(1, buf, 4);
while (buf[i])
{
if (buf[i] >= 48 && buf[i] <= 57)
{
if (a == 0)
*rows = atoi(&buf[i]) - 1;
else
*col = atoi(&buf[i]) - 1;
a++;
}
i++;
}
}
This function gives me the exact cursor position (*rows = y, *col = x), but it writes on the screen.
How can I get the cursor position without writing anything on the screen?
(If the cursor is on one of the printed characters, it will overwrite it.)
Should echo be toggled before and after sending the escape sequence?
This is a school project, so I can only use termcap; I can't use ncurses functions. The only allowed functions are tputs, tgoto, tgetstr, tgetnum, tgetflag.
There are several problems:
canonical mode is buffered (see below)
the read is done on the file-descriptor for standard output (that may happen to work — sometimes — but don't count on it)
the read does not read enough characters to get a typical response
the response would have two decimal integers, separated by semicolon ;
the response would have a final character (which would become an issue if the read actually asked for enough characters...)
Further reading:
General Terminal Interface The Single UNIX ® Specification, Version 2
In canonical mode input processing, terminal input is processed in units of lines. A line is delimited by a newline character (NL), an end-of-file character (EOF), or an end-of-line (EOL) character. See Special Characters for more information on EOF and EOL. This means that a read request will not return until an entire line has been typed or a signal has been received. Also, no matter how many bytes are requested in the read() call, at most one line will be returned. It is not, however, necessary to read a whole line at once; any number of bytes, even one, may be requested in a read() without losing information.
XTerm Control Sequences
CSI Ps n Device Status Report (DSR).
Ps = 5 -> Status Report.
Result ("OK") is CSI 0 n
Ps = 6 -> Report Cursor Position (CPR) [row;column].
Result is CSI r ; c R
That is, your program should be prepared to read Escape[ followed by two decimal integers (with no fixed limit on their length), and two other characters ; and R.
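A rough sketch of reading and parsing such a report is shown below. The function names read_cpr and parse_cpr are invented for this example. It assumes the terminal has already been put in non-canonical mode with echo off (otherwise the reply is held until a newline and echoed to the screen), that the request was written to the terminal, and that fd is the descriptor for terminal input. It does not guard against absurdly long digit runs overflowing int.

```c
#include <stddef.h>
#include <unistd.h>

/* Reads from fd one byte at a time until 'R' arrives or buf is full.
 * Returns the number of bytes stored, or -1 on error, EOF, or overflow. */
int read_cpr(int fd, char *buf, size_t cap)
{
    size_t len = 0;

    while (len < cap) {
        if (read(fd, buf + len, 1) != 1)
            return -1;
        if (buf[len++] == 'R')
            return (int)len;
    }
    return -1;                             /* response did not fit */
}

/* Parses ESC [ row ; col R from buf[0..n). Returns 0 on success, -1 otherwise.
 * Row and column are 1-based, exactly as the terminal reports them. */
int parse_cpr(const char *buf, size_t n, int *row, int *col)
{
    size_t i = 2;
    int vals[2] = {0, 0};

    if (n < 6 || buf[0] != 0x1b || buf[1] != '[')
        return -1;
    for (int k = 0; k < 2; k++) {
        size_t start = i;
        while (i < n && buf[i] >= '0' && buf[i] <= '9')
            vals[k] = vals[k] * 10 + (buf[i++] - '0');
        /* need at least one digit, then ';' after the row or 'R' after the column */
        if (i == start || i >= n || buf[i] != (k == 0 ? ';' : 'R'))
            return -1;
        i++;
    }
    *row = vals[0];
    *col = vals[1];
    return 0;
}
```

Reading up to the final R, instead of a fixed four bytes, is what copes with rows and columns of any number of digits.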
By the way, termcap by itself will do little for your solution. While ncurses has some relevant capabilities defined in the terminal database:
# u9 terminal enquire string (equiv. to ANSI/ECMA-48 DA)
# u8 terminal answerback description
# u7 cursor position request (equiv. to VT100/ANSI/ECMA-48 DSR 6)
# u6 cursor position report (equiv. to ANSI/ECMA-48 CPR)
few programs use those, and in any case you would find it difficult to use the cursor position report in a termcap application.
I am testing a method to simulate a specific input to an application.
This is the application:
#include <stdio.h>
int main()
{
int num1;
char buffer[6] = {0};
scanf("%d", &num1);
read(0, buffer, 6);
printf("num1 = %d\n", num1);
for(num1=0; num1 < 6; num1++)
{
printf("%02X\n", buffer[num1]);
}
return 0;
}
I am trying to simulate the input using the following bash command:
echo -ne "1337\\x0A\\x31\\x02\\x03\\x04\\x05\\x06" | ./test
The output I get is the following:
num1 = 1337
00
00
00
00
00
00
As you can see, the buffer was not filled with values passed to the STDIN.
EDIT:
The function below is only used to illustrate an idea of input automation in mixed i/o functions, I got this function by reverse engineering a binary file, is it possible to automate the input ?
I appreciate your help.
Thanks,
You are mixing scanf() (section 3 man page) and read() (section 2 man page). scanf() belongs to the stdio family, which reads through a buffer. The section 2 read() is unbuffered. The bytes you are trying to read with read() have already been read and put into the buffer scanf() is using.
If you comment out your scanf() line, and change your command to
echo -ne "\\x0A\\x31\\x02\\x03\\x04\\x05\\x06" | ./test
you will get
num1 = 0
0A
31
02
03
04
05
So just use the buffered functions or the unbuffered functions. Pick one.
What's happening here?
echo writes exactly the bytes you put on the command line (because of the -n flag, it does not append a final \n).
You are using scanf() on a pipe, so stdio first does a full-buffer read and then scans that buffer for an integer. It effectively calls n = read(0, buffer, BUFSIZ), which returns 11, and scanf then parses 1337 from the buffer, leaving every byte after it in the stdio buffer.
Then you call read(0, buffer, 6), which returns 0 because the pipe is already empty and the writer has exited, so nothing is stored in buffer.
Then you print buffer, which still holds the zeros it was initialized with.
FIFOs (or pipes) behave quite differently from terminals on input. In canonical mode the terminal driver completes a read when you press Enter, so read returns the number of characters in one input line. On a pipe, read blocks only until some data is available (or every writer has closed its end) and then returns what is there, up to the number of bytes requested.
If you had taken the precaution of checking the return value of read(2), you would have seen the actual number of bytes read: 0, because scanf(3) had already consumed the entire input (it is smaller than BUFSIZ) in its first read.
Never mix stdio functions like scanf with low level io functions like read. Stdio maintains a buffer for efficiency and if any extra data is available when you call scanf it is likely going to read it to fill the buffer.
Use fread instead of read. Also, don't forget to check the return values of these functions.
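As a sketch of the "stay within stdio" advice, the function below reads the integer and the raw bytes through the same FILE stream, so both share one buffer. The name read_int_then_bytes is hypothetical and the error handling is deliberately minimal.

```c
#include <stdio.h>

/* Reads a decimal integer, then up to n raw bytes, all through stdio.
 * Returns the number of raw bytes read, or -1 if no integer was found. */
long read_int_then_bytes(FILE *in, int *num, unsigned char *buf, size_t n)
{
    if (fscanf(in, "%d", num) != 1)
        return -1;
    return (long)fread(buf, 1, n, in);   /* continues where fscanf stopped */
}
```

Called with stdin and a 6-byte buffer on the input from the question, this would store 0A 31 02 03 04 05: the newline that ended the number is still in the stream and becomes the first byte.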
I was experimenting on creating BMP files from scratch, when I found a weird bug I could not explain. I isolated the bug in this minimalist program:
int main()
{
FILE* ptr=NULL;
int success=0,pos=0;
ptr=fopen("test.bin","w");
if (ptr==NULL)
{
return 1;
}
char c[3]={10,11,10};
success=fwrite(c,1,3,ptr);
pos=ftell(ptr);
printf("success=%d, pos=%d\n",success,pos);
return 0;
}
the output is:
success=3, pos=5
with hex dump of the test.bin file being:
0D 0A 0B 0D 0A
In short, whatever value you put instead of 11 (0x0B), fwrite will write it correctly. But for some reason, when fwrite comes across a 10 (0x0A) - and precisely this value - it writes 0D 0A instead, that is, 2 bytes, although I clearly specified 1 byte per write in the fwrite arguments. Thus the 3 bytes written, as can be seen in the success variable, and the mysterious 5 in the ftell result.
Could someone please tell me what the heck is going on here...and why 10, why not 97 or 28??
Thank you very much for your help!
EDIT: oh wait, I think I have an idea...isn't this linked to \n being 0A on Unix, and 0D 0A on windows, and some inner feature of the compiler converting one to the other? how can I force it to write exactly the bytes I want?
Your file was opened in text mode so CRLF translation is being done. Try:
fopen("test.bin","wb");
You must be working on a Windows machine. On Windows, the end-of-line marker is CR-LF (0D 0A), whereas on Unix it is a single LF (0A). In text mode, the C library on your system replaces each 0A you write with 0D 0A.
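A minimal sketch of the binary-mode fix follows. The helper name write_bytes_binary is made up for illustration; the only essential change from the question's program is the "wb" mode string.

```c
#include <stdio.h>

/* Writes n bytes to path in binary mode.
 * Returns the resulting file position, or -1 on failure. */
long write_bytes_binary(const char *path, const unsigned char *data, size_t n)
{
    FILE *fp = fopen(path, "wb");        /* "b": no newline translation */
    long pos = -1;

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, n, fp) == n)
        pos = ftell(fp);                 /* equals n when nothing is translated */
    if (fclose(fp) != 0)
        pos = -1;
    return pos;
}
```

With the bytes {10, 11, 10} from the question, the function should report a position of 3 and the file should contain 0A 0B 0A on any platform. On Unix-like systems the "b" has no effect, so it is harmless to always include it when writing binary data.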