Why is XKeysymToKeycode() making all of my keys lowercase? - c

I'm currently having a problem with Xlib where whenever I call XKeysymToKeycode() and pass in an uppercase KeySym, it returns a lowercase KeyCode. Google doesn't really seem to have an answer to this question, or much documentation at all on the functions I'm using, for that matter.
Here's the code I am using:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <X11/keysym.h>
#include <X11/extensions/XTest.h>
int main(void) {
    Display *display;
    char *ptr;
    char c[2] = {0, 0};
    KeySym ksym;
    KeyCode kcode;

    display = XOpenDisplay(0);
    ptr = "Test";
    while (*ptr) {
        c[0] = *ptr;
        ksym = XStringToKeysym(c);
        printf("Before XKeysymToKeycode(): %s\n", XKeysymToString(ksym));
        kcode = XKeysymToKeycode(display, ksym);
        printf("Key code after XKeysymToKeycode(): %s\n", XKeysymToString(XKeycodeToKeysym(display, kcode, 0)));
        ptr++;
    }
    XCloseDisplay(display);
    return 0;
}
It can be compiled with gcc -o sendkeys sendkeys_min.c -lX11 -lXtst -g -Wall -Wextra -pedantic -ansi (Assuming it has been saved as sendkeys_min.c.)
The current output is the following:
Before XKeysymToKeycode(): T
Key code after XKeysymToKeycode(): t
Before XKeysymToKeycode(): e
Key code after XKeysymToKeycode(): e
Before XKeysymToKeycode(): s
Key code after XKeysymToKeycode(): s
Before XKeysymToKeycode(): t
Key code after XKeysymToKeycode(): t
The expected output is, of course, that the first T in "Test" is still uppercase after being run through XKeysymToKeycode(). (Note that this is not my actual program, but a simplified version for posting here. In the actual program, I am sending key events with the resulting keycode, and the keys sent still have the problem exhibited here: they all become lowercase.)

KeySyms and KeyCodes are semantically different, and there is not a 1-1 relationship between them.
A KeyCode is an arbitrary small integer representing a key on the keyboard. (Not a character. A key.) Xlib requires that key codes be in the range [8, 255], but fortunately most keyboards have only a bit more than 100 keys.
A KeySym is a representation of some actual character associated with a key. There will almost always be several of these per key: lower- and upper-case letters correspond to the same key on most keyboard layouts.
So there is no such thing as an "upper-case" or "lower-case" KeyCode. When you get the KeyCode corresponding to a KeySym, you are actually losing information.
In Xlib, a given key has at least four corresponding KeySyms (lower-case, upper-case, alternate lower-case, alternate upper-case), although some might be unassigned. When you ask for the KeySym corresponding to a KeyCode, you need to supply an index; index 0 (as in your code) will get the unshifted, unmodified character.
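For example, on a typical layout the key that types 't' carries both the lower- and upper-case KeySyms. A quick sketch (using the same XKeycodeToKeysym call as the question; it is deprecated in favor of XkbKeycodeToKeysym, but convenient here) that lists the first four indices for that key:
#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/keysym.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    KeyCode kc;
    int i;

    if (!dpy)
        return 1;
    kc = XKeysymToKeycode(dpy, XK_t);
    for (i = 0; i < 4; i++) {
        KeySym ks = XKeycodeToKeysym(dpy, kc, i);
        printf("index %d: %s\n", i,
               ks == NoSymbol ? "NoSymbol" : XKeysymToString(ks));
    }
    XCloseDisplay(dpy);
    return 0;
}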
For a given keypress, the translation to a KeySym will take into account the state of the modifier keys. There are eight of these, including the Shift and Lock modifiers. Ignoring Lock, which complicates the situation, the shift modifier key would normally turn lower-case letters into their upper-case equivalents (for alphabetic keys).
Keyboard handling is much more complicated than that brief summary, but it's a start.
For your task, you probably should take a look at XkbKeysymToModifiers.
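Since the real program sends key events with XTest, here is a minimal sketch of one possible approach (a hypothetical helper, send_keysym(); it ignores Lock, non-Latin groups, and most error handling): treat the KeySym as shifted when it matches the index-1 entry for its KeyCode, and wrap the fake key press in fake Shift presses when needed.
#include <X11/Xlib.h>
#include <X11/keysym.h>
#include <X11/extensions/XTest.h>

static void send_keysym(Display *dpy, KeySym ksym)
{
    KeyCode kcode = XKeysymToKeycode(dpy, ksym);
    KeyCode shift = XKeysymToKeycode(dpy, XK_Shift_L);
    int need_shift;

    if (kcode == 0)
        return; /* no key produces this KeySym */
    /* Index 1 holds the shifted KeySym for this key in the core mapping. */
    need_shift = (XKeycodeToKeysym(dpy, kcode, 1) == ksym);

    if (need_shift)
        XTestFakeKeyEvent(dpy, shift, True, CurrentTime);
    XTestFakeKeyEvent(dpy, kcode, True, CurrentTime);
    XTestFakeKeyEvent(dpy, kcode, False, CurrentTime);
    if (need_shift)
        XTestFakeKeyEvent(dpy, shift, False, CurrentTime);
    XFlush(dpy);
}

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    const char *ptr = "Test";
    char c[2] = {0, 0};

    if (!dpy)
        return 1;
    while (*ptr) {
        c[0] = *ptr++;
        send_keysym(dpy, XStringToKeysym(c));
    }
    XCloseDisplay(dpy);
    return 0;
}
This builds with the same -lX11 -lXtst flags as the question's example.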

Related

KDGKBENT returns wrong keysym values?

Example code:
#include <stdio.h>
#include <stdlib.h>
#include <linux/keyboard.h>
#include <sys/ioctl.h>
#include <sys/kd.h>
int
main(int argc, char **argv) {
    struct kbentry ke;

    ke.kb_table = (unsigned char)atoi(argv[1]);
    ke.kb_index = (unsigned char)atoi(argv[2]);
    ioctl(0, KDGKBENT, &ke);
    printf("keycode %u = %04x\n", ke.kb_index, ke.kb_value);
    return 0;
}
When I try to get the value of a keycode using e.g. the code above, KDGKBENT returns strange values. It adds a '0B' to ASCII characters: 0x0B61 for 'a' instead of 0x0061, 0x0B41 for 'A' instead of 0x0041.
I cannot find any answer on the internet as to why this happens.
I only found the same question, without any answer, here:
https://www.unix.com/unix-for-advanced-and-expert-users/178627-questions-about-linux-console-keyboard-driver-translation-tables.html
Those values in 0x0Bxx do not appear when running dumpkeys -l (alphabet has normal ASCII values), nor in this list:
https://wiki.linuxquestions.org/wiki/List_of_keysyms
Why does this happen? And how am I supposed to get a proper conversion?
Actually, looking carefully at the dumpkeys tables, the alphabetic key symbols are '+a', '+A', etc., i.e. they are conditioned on Caps Lock to change their case. That could be the explanation behind the 0x0B, but I need to find confirmation of this theory.
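If that theory is right, the high byte of kb_value is a key type rather than part of the character: linux/keyboard.h splits entries with the KTYP()/KVAL() macros, and KT_LETTER, the type of the CapsLock-sensitive letters shown as '+a' by dumpkeys, is 11, i.e. 0x0B, which would explain the 0x0Bxx values. A small sketch of decomposing the entry that way (building on the code above; minimal error handling added):
#include <stdio.h>
#include <stdlib.h>
#include <linux/keyboard.h>   /* KTYP, KVAL, KT_LETTER */
#include <sys/ioctl.h>
#include <sys/kd.h>

int
main(int argc, char **argv) {
    struct kbentry ke;

    if (argc < 3)
        return 1;
    ke.kb_table = (unsigned char)atoi(argv[1]);
    ke.kb_index = (unsigned char)atoi(argv[2]);
    if (ioctl(0, KDGKBENT, &ke) < 0) {
        perror("KDGKBENT");
        return 1;
    }
    /* Split the entry into its type (high byte) and value (low byte). */
    printf("keycode %u = type 0x%02x, value 0x%02x%s\n",
           ke.kb_index, (unsigned)KTYP(ke.kb_value), (unsigned)KVAL(ke.kb_value),
           KTYP(ke.kb_value) == KT_LETTER ? " (KT_LETTER)" : "");
    return 0;
}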

Non-spacing characters in curses

I was trying to write a basic program to print ā (a with overline) in C using curses and non-spacing characters. I have set the locale to en_US.UTF-8, and I am able to print international characters using that. This code only prints the base letter without the overline. I am getting similar results with ncurses too. What else do I need to do to get ā on the screen?
#include <curses.h>
#include <locale.h>
#include <wchar.h>
#include <assert.h>
int main() {
    setlocale(LC_ALL, "");
    initscr();
    int s = 0x41; // represents 'a'
    int ns = 0x0305; // represents COMBINING OVERLINE (a non-spacing character)
    assert(wcwidth(ns) == 0);
    wchar_t wstr[] = { s, ns, L'\0'};
    cchar_t *cc;
    int x = setcchar(cc, wstr, 0x00, 0, NULL);
    assert(x == 0);
    add_wch(cc);
    refresh();
    getch();
    endwin();
    return 0;
}
The curses calls need a pointer to data, not just a pointer.
It's okay to pass a null-terminated array for the wide-characters, but the pointer for the cchar_t data needs some repair.
Here's a fix for the program:
> diff -u foo.c.orig foo.c
--- foo.c.orig 2020-05-21 19:50:48.000000000 -0400
+++ foo.c 2020-05-21 19:51:46.799849136 -0400
@@ -3,7 +3,7 @@
#include <wchar.h>
#include <assert.h>
-int main() {
+int main(void) {
setlocale(LC_ALL, "");
initscr();
int s = 0x41; // represents 'a'
@@ -12,11 +12,11 @@
assert(wcwidth(ns) == 0);
wchar_t wstr[] = { s, ns, L'\0'};
- cchar_t *cc;
- int x = setcchar(cc, wstr, 0x00, 0, NULL);
+ cchar_t cc;
+ int x = setcchar(&cc, wstr, 0x00, 0, NULL);
assert(x == 0);
- add_wch(cc);
+ add_wch(&cc);
refresh();
getch();
That produces (on xterm) an "A" with an overbar.
(For what it's worth, 0x61 is "a", while 0x41 is "A").
Your code is basically correct aside from the declaration of cc. You'd be well-advised to hide the cursor, though; I think it is hiding the overbar, which is being incorrectly rendered in the following character position.
I modified your code as follows:
#include <curses.h>
#include <locale.h>
#include <wchar.h>
#include <assert.h>
int main() {
    setlocale(LC_ALL, "");
    initscr();
    int s = 0x41; // represents 'A'
    int ns = 0x0305; // represents COMBINING OVERLINE (a non-spacing character)
    assert(wcwidth(ns) == 0);
    wchar_t wstr[] = { s, ns, L'\0'};
    cchar_t cc; /* Changed *cc to cc */
    int x = setcchar(&cc, wstr, 0x00, 0, NULL); /* Changed cc to &cc */
    assert(x == 0);
    curs_set(0); /* Added to hide the cursor */
    add_wch(&cc); /* Changed cc to &cc */
    refresh();
    getch();
    endwin();
    return 0;
}
I tested on a kubuntu system, since that's what I have handy. The resulting program worked perfectly on xterm (which has ugly fonts) but not on konsole. On konsole, it rendered the overbar in the following character position, which is clearly a rendering bug since the overbar appears on top of the following character if there is one. After changing konsole's font to Liberation Mono, the test program worked perfectly.
The rendering bug is not going to be easy to track down because it is hard to reproduce, although from my experiments it seems to show up reliably when the font is DejaVu Sans Mono. Curiously, my system is set up to use non-spacing characters from DejaVu Sans Mono as substitutes in other fonts, such as Ubuntu Mono, and when these characters are used as substitutes, the spacing appears to be correct. However, Unicode rendering is sufficiently intricate that I cannot actually prove that the substitute characters really come from the configured font, and the rendering bug seems to come and go. It may depend on the font cache, although I can't prove that either.
If I had more to go on I'd file a bug report, and if I get motivated to look at this some more tomorrow, I might find something. Meanwhile, any information that other people can provide will undoubtedly be useful; at a minimum, that should include operating system and console emulator, with precise version numbers, and a list of fonts tried along with an indication whether they succeeded or not.
It's not necessary to use ncurses to see this bug, by the way. It's sufficient to test in your shell:
printf '\u0041\u0305\u000a'
will suffice. I found it interesting to test
printf '\u0041\u0305\u0321\u000a'
as well.
The system I tested it on:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
$ konsole --version
konsole 17.12.3
$ # Fonts showing bug
$ otfinfo -v /usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf
Version 2.37
$ # Fonts not showing bug
$ otfinfo -v /usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf
Version 1.07.4
There are multiple issues here. First, you're storing the result of setcchar through an uninitialized pointer, cc. Whenever a function takes a pointer for output, you need to pass the address of an object where the result will be stored, not an uninitialized pointer variable. The output must be an array of sufficient length to store the number of characters in the input. I'm not sure what the null-termination convention is, so to be safe I'd use:
cchar_t cc[3];
int x = setcchar(cc, wstr, 0x00, 0, NULL);
Then, the add_wch function takes only a single character to add, and replaces or appends based on whether it's a spacing or non-spacing character. So you need to call it once for each character.

How to Build Curses Program That Supports More Than 223 Columns of Mouse Input

I'm trying to get a curses program working with my terminal spanning my monitor. However, the x coordinate can't move past the 223rd column; instead, it loops around. In the source, this seems to be due to the coordinates being encoded in 8 bits, with the position values starting only after the first 32 values (i.e. x = raw_x - ' ').
Here's an example program from https://gist.github.com/sylt/93d3f7b77e7f3a881603 that demonstrates the issue when compiled with libncurses5. In it, if your cursor moves more than 223 columns to the right of the window, the x value loops back over to 0 - ' ', i.e. -32.
#include <curses.h>
#include <stdio.h>
int main()
{
    initscr();
    cbreak();
    noecho();

    // Enables keypad mode. This makes (at least for me) mouse events get
    // reported as KEY_MOUSE instead of as random letters.
    keypad(stdscr, TRUE);

    // Don't mask any mouse events
    mousemask(ALL_MOUSE_EVENTS | REPORT_MOUSE_POSITION, NULL);

    printf("\033[?1003h\n"); // Makes the terminal report mouse movement events

    for (;;) {
        int c = wgetch(stdscr);

        // Exit the program on newline
        if (c == '\n')
            break;

        char buffer[512];
        size_t max_size = sizeof(buffer);
        if (c == ERR) {
            snprintf(buffer, max_size, "Nothing happened.");
        }
        else if (c == KEY_MOUSE) {
            MEVENT event;
            if (getmouse(&event) == OK) {
                snprintf(buffer, max_size, "Mouse at row=%d, column=%d bstate=0x%08lx",
                         event.y, event.x, event.bstate);
            }
            else {
                snprintf(buffer, max_size, "Got bad mouse event.");
            }
        }
        else {
            snprintf(buffer, max_size, "Pressed key %d (%s)", c, keyname(c));
        }

        move(0, 0);
        insertln();
        addstr(buffer);
        clrtoeol();
        move(0, 0);
    }

    printf("\033[?1003l\n"); // Disable mouse movement events, as l = low

    endwin();
    return 0;
}
For the curious, you can build this with gcc file.c -lcurses.
How do I work around this? I can use vim in full-screen mode, and tmux mouse interactions also work. These both depend on ncurses, so it must be fixed somehow. I tried reading their source for hours and attempting samples of what I thought would work. I've also tried several printf() terminal modes, but none seem to enable this mode. How can I get my mouse event to hold more than 8 bits, and thus let the columns field hold values larger than 223?
That's a terminal-dependent feature (not an ncurses limitation as such). The original xterm protocol dating from the late 1980s encodes each ordinate in a single byte, offset by 32 to stay clear of control characters. Since the largest byte value is 255, that gives 255 - 32 = 223 as the maximum.
xterm introduced an experimental feature in 2010 to extend the range. There is an ncurses terminal description "xterm-1005" which uses that. Some criticized that, and xterm introduced a different feature in 2012. Again, there is an "xterm-1006" description using that feature.
The descriptions in ncurses were added in 2014. ncurses 6 was released in 2015, and still supports (by compile-time options) the ABI 5 for ncurses 5. If your "ncurses5" is at least as new as the changes in 2014, the library supports SGR 1006 without change.
The reason for not making one of those part of the default "xterm" is that portability across the various xterm imitators is poor (as is their documentation), and that would only increase bug reports. But if you happen to be using one of the terminals (such as xterm...) which support the SGR 1006 feature, that's supported in the ncurses library.
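A quick thing to try, assuming your terminal emulator implements SGR 1006 and your ncurses ships the descriptions mentioned above, is to run the unmodified example under the extended entry:
TERM=xterm-1006 ./a.out
With that description in effect, getmouse() should be able to report columns past 223. Depending on the terminal, you may also need to request the extended encoding explicitly, e.g. by adding printf("\033[?1006h\n"); next to the existing 1003 escape (and the matching \033[?1006l on exit).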

Weird key values printed by ncurses

I am doing a little program in C with the ncurses library on Linux.
I decided to check the input I received with the getch() function, more specifically, the backspace key.
The Backspace key's ASCII decimal value is 127.
I decided to print the numerical decimal value of the keys I pressed, for example:
a -> 97
A -> 65
] -> 93
...
Those values are correct.
However, the following values are not correct:
Backspace -> 7 (which is BELL)
Supr -> 74 (which is 'J')
Here is the test code:
#include <curses.h>
int main(int argc, char **argv)
{
    char ch;
    int column, line;
    int s_column, s_line;

    initscr();
    clear();
    noecho();
    raw();
    keypad(stdscr, TRUE);
    printw("Type: \n> ");
    refresh();
    getyx(stdscr, s_line, s_column);
    while ((ch = getch()) != '\n')
    {
        printw("%d", ch);
        addch(ch);
        refresh();
    }
    endwin();
    return 0;
}
NOTE: changing raw() to cbreak() generates the same output
Output test: (note: I type: 'a','A',(Backspace),(Supr),'J')
Type:
> 97a65A7^G74J74J
I don't understand why this is happening. Can somebody explain why the Backspace key outputs 7 instead of 127, and Supr outputs 74, which is the same as 'J'?
For special function keys, getch() doesn't necessarily return the ASCII character; it returns one of the KEY_xxx codes defined in <curses.h>. In the case of Backspace, this is:
#define KEY_BACKSPACE 0407 /* backspace key */
Since you declare ch as char rather than int, the value 0407 is being truncated to 07.
Change the declaration to:
int ch;
and then it will display 263 when you press Backspace. addch() will still display ^G, though, because it doesn't use the KEY_xxx macros. You need to handle these characters in your code.
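As a minimal sketch of that fix, with ch declared as an int and a couple of the KEY_ codes handled explicitly (KEY_DC is what the Supr/Delete key normally produces in keypad mode):
#include <curses.h>

int main(void)
{
    int ch; /* int, so the KEY_ codes are not truncated */

    initscr();
    noecho();
    raw();
    keypad(stdscr, TRUE);
    printw("Type: \n> ");
    refresh();
    while ((ch = getch()) != '\n' && ch != '\r')
    {
        if (ch == KEY_BACKSPACE)
            printw("[backspace]");
        else if (ch == KEY_DC)          /* the Supr / Delete key */
            printw("[delete]");
        else if (ch >= 0 && ch <= 255)  /* ordinary characters */
            addch(ch);
        else
            printw("[%d: %s]", ch, keyname(ch));
        refresh();
    }
    endwin();
    return 0;
}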
I believe the "special" keys are generating multi-character readings, which explains the ^ in the output.
See caret notation for more.

Changing a variable's value through the stack

Okay, we are given the following code:
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include "callstack.h"
#include "tweetIt.h"
#include "badguy2.c"
static char *correctPassword = "ceriaslyserious";
char *message = NULL;
int validateSanity(char *password) {
    for (int i = 0; i < strlen(password); i++)
        if (!isalpha(password[i]))
            return 0;
    unsigned int magic = 0x12345678;
    return badguy(password);
}

int validate(char *password) {
    printf("--Validating something\n", password);
    if (strlen(password) > 128) return 0;
    char *passwordCopy = malloc(strlen(password) + 1);
    strcpy(passwordCopy, password);
    return validateSanity(passwordCopy);
}

int check(char *password, char *expectedPassword) {
    return (strcmp(password, expectedPassword) == 0);
}

int main() {
    char *password = "wrongpassword";
    unsigned int magic = 0xABCDE;
    char *expectedPassword = correctPassword;

    if (!validate(password)) {
        printf("--Invalid password!\n");
        return 1;
    }

    if (check(password, expectedPassword)) {
        if (message == NULL) {
            printf("--No message!\n");
            return 1;
        } else {
            tweetIt(message, strlen(message));
            printf("--Message sent.\n");
        }
    } else {
        printf("--Incorrect password!\n");
    }

    return 0;
}
We are supposed to trick main into sending a tweet using the function badguy. In badguy, we have an offset from a previous problem, which is the difference between the address of password declared in main and the argument passed to badguy. We have been instructed to use this offset to find the addresses of correctPassword and password in main, and to change the value of password to correctPassword so that when the password check occurs it is believed to be legitimate. I am having trouble figuring out how to use this offset to find the addresses and how to continue from there.
First of all, make sure you have good control over your compiler's behavior. That is: make sure you know the calling conventions and that they're being respected (not optimized away or altered in any manner). This usually boils down to turning off optimization settings, at least for testing under more controlled conditions until a robust method is devised. Pay special attention to variables such as expectedPassword, since it is highly likely they'll be optimized away (expectedPassword might never be created on the stack, being substituted with the equivalent of correctPassword, leaving you with no stack reference to the correct password at all).
Secondly, note that "wrongpassword" is shorter than "ceriaslyserious"; in other words, if I got it straight, attempting to write "ceriaslyserious" into the buffer pointed to by passwordCopy (whose size is the length of "wrongpassword" plus one) could result in a segmentation violation. Nonetheless, it should be relatively simple to track the address of expectedPassword in the call stack, if it exists (see above), especially if you already have an offset from main()'s stack frame.
Considering an x86 32-bit target under controlled circumstances, expectedPassword will reside 8 bytes below password (4 for password, 4 for magic if it is not optimized away). Having an offset from password to a parameter as you said, it should suffice to subtract the offset from the address of that parameter, and then add 8. The resulting pointer should be expectedPassword, which then points to the static area containing the password. Again, double check your environment. Check this for an explanation on the stack layout in x64 (the layout in the 32-bit case is similar).
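As a rough illustration only (the offset, its sign, and the 8-byte adjustment all depend on the actual frame layout your compiler produces, so verify every number in gdb before relying on it), badguy() might look something like this; OFFSET is a placeholder for the value measured in the earlier exercise:
#define OFFSET 0 /* placeholder: the distance measured in the previous problem */

int badguy(char *password)
{
    /* Address of main()'s password, computed from this parameter's address. */
    char **main_password = (char **)((char *)&password - OFFSET);
    /* expectedPassword sits a few bytes away; the +8 follows the 32-bit
     * layout described above (4 bytes for password plus 4 for magic). */
    char **expected = (char **)((char *)main_password + 8);

    *main_password = *expected; /* make the later strcmp() in check() succeed */
    return 1;                   /* report the password as sane */
}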
Lastly, if expectedPassword does not exist in the call stack, then, since correctPassword is a global static, it will reside in a data segment, rendering the method useless. To achieve the goal in this situation, you would need to carefully scan the data segment with a more intelligent algorithm. It would probably be easier, though, to simply find the test of check()'s return value in the program text and replace it with nops (after properly manipulating the page permissions to allow writing to the text segment).
If you're having problems, inspecting the resulting assembly code is the way to go. If you're using GCC, gcc -S halts the compilation just before assembling (that is, producing an assembly source code file as output). objdump -d could also help. gdb can step between instructions, show the disassembly of a frame and display register contents; check the documentation.
These exercises are especially useful for understanding how security breaches occur in common programs, and for picking up some basic notions of defensive programming.

Resources