I am exploring shellcode. I wrote an example program as part of my exploration.
Using objdump, I got the following shellcode:
\xb8\x0a\x00\x00\x00\xc3
for the simple function:
int boo()
{
    return 10;
}
I then wrote the following program to attempt to run the shellcode:
#include <stdio.h>
#include <stdlib.h>

unsigned char code[] = "\xb8\x0a\x00\x00\x00\xc3";

int main(int argc, char **argv) {
    int foo_value = 0;
    int (*foo)() = (int(*)())code;
    foo_value = foo();
    printf("%d\n", foo_value);
}
I am compiling using gcc, with the options:
-fno-stack-protector -z execstack
However, when I attempt to run, I still get a segfault.
What am I messing up?
You're almost there!
You have placed code[] outside of main, so it is a global array. Global variables are not placed on the stack. They can be placed:
In the .bss section if they are not initialized
In the .data section if they are initialized and accessed for both reading and writing
In the .rodata section if they are only read (for example, declared const)
Let's verify this. You can use the readelf command to check all the sections of your binary (I only show the ones we are interested in):
$ readelf -S --wide <your binary>
There are 31 section headers, starting at offset 0x39c0:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[...]
[16] .text PROGBITS 0000000000001060 001060 0001a5 00 AX 0 0 16
[...]
[18] .rodata PROGBITS 0000000000002000 002000 000008 00
[...]
[25] .data PROGBITS 0000000000004000 003000 000017 00 WA 0 0 8
[...]
[26] .bss NOBITS 0000000000004017 003017 000001 00 WA 0 0 1
Then we can look for your symbol code in your binary:
$ readelf -s <your binary> | grep code
66: 0000000000004010 7 OBJECT GLOBAL DEFAULT 25 code
This confirms that your array code is in the .data section (section 25), which does not have the X flag: it is mapped without execute permission, so you cannot execute code from it.
From there, the solution is to place your array inside main, so that it lives on the stack, which -z execstack makes executable:
#include <stdint.h>
#include <stdio.h>

int main(int argc, char **argv) {
    uint8_t code[] = "\xb8\x0a\x00\x00\x00\xc3";
    int foo_value = 0;
    int (*foo)() = (int(*)())code;
    foo_value = foo();
    printf("%d\n", foo_value);
}
However, this may also not work!
Your C compiler may notice that, although you use code, you never read anything from it, so it may optimize away the initialization and simply allocate the array on the stack without initializing it. This is what happens with my version of GCC.
To force the compiler not to optimize the array away, use the volatile keyword:
#include <stdint.h>
#include <stdio.h>

int main(int argc, char **argv) {
    volatile uint8_t code[] = "\xb8\x0a\x00\x00\x00\xc3";
    int foo_value = 0;
    int (*foo)() = (int(*)())code;
    foo_value = foo();
    printf("%d\n", foo_value);
}
In a real use case, your array would be allocated on the stack and passed as a parameter to another function, which would fill it with the shellcode, so you wouldn't run into this optimization issue.
Why are function addresses nearly the same as the address of static global variables or dynamically allocated variables? Here is the code for demonstration:
#include <stdio.h>
#include <stdlib.h>

int global_var;
int global_var1;
int global_var2;
static int st_var = 3;

void func()
{
    return;
}

int main(void)
{
    int x;
    int *x_m = malloc(sizeof(int));
    /* %p expects a void *, so each pointer is cast. Converting a function
       pointer to void * is not ISO C, but POSIX systems support it. */
    printf("Malloc: %p\n", (void *)x_m);
    printf("Local: %p\n", (void *)&x);
    printf("Function: %p\n", (void *)&func);
    printf("Global: %p\n", (void *)&global_var);
    printf("Global: %p\n", (void *)&global_var1);
    printf("Global: %p\n", (void *)&global_var2);
    printf("Static: %p\n", (void *)&st_var);
    free(x_m);
    return 0;
}
Output:
Malloc: 0x55bede9ce2a0
Local: 0x7ffdbc67b25c
Function: 0x55bede7151a9
Global: 0x55bede718024
Global: 0x55bede718030
Global: 0x55bede718020
Static: 0x55bede718010
Can somebody explain this? I thought that only global and static variables are stored in the .bss segment.
This is because, usually, the .text section (containing function code) and the .bss section of an ELF executable are mapped "relatively near" each other.
You can check this with readelf:
$ gcc prog.c
$ readelf -S a.out
There are 29 section headers, starting at offset 0x1ac0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[14] .text PROGBITS 00000000000007e0 000007e0
0000000000000302 0000000000000000 AX 0 0 16
...
[24] .bss NOBITS 0000000000201010 00001010
0000000000000010 0000000000000000 WA 0 0 8
...
From the "Address" fields of .text and .bss above, you can see that they will be loaded 0x201010 - 0x7e0 = 0x200830 bytes apart in virtual memory when the program runs.
In any case, this does not mean that your code is in the .bss section or that your variables are in the .text section. They are in two different yet "relatively near" sections.
The distance between the two is arbitrary; there is no real minimum or maximum dictated by the ELF specification. You could write your own linker script to place them farther apart if you really wanted to.
I was using objdump with the -x flag and saw that some sections have an alignment of 2**0. What does that mean? Does this alignment have any practical effect?
#include <stdio.h>

int main(void) {
    char *x = "section"; // .rodata align 2**0
    return 0;
}
On Linux, I would like to store some structures in a custom .note.foobar section and discover them at runtime.
I compile and link the program below once with gold and once without:
$ gcc -o test-ld test.c
$ gcc -o test-gold -fuse-ld=gold test.c
You can see that the ld-linked version finds the section while the gold-linked version does not:
$ ./test-ld
note section at vaddr: 2c4
note section at vaddr: 2f0
found f00dface
note section at vaddr: 324
note section at vaddr: 7a8
note section at vaddr: 270
note section at vaddr: 1c8
$ ./test-gold
note section at vaddr: 254
note section at vaddr: 7a8
note section at vaddr: 270
note section at vaddr: 1c8
However, the section does exist in both binaries:
$ readelf -x .note.foobar test-ld
Hex dump of section '.note.foobar':
0x000002f0 04000000 14000000 67452301 666f6f00 ........gE#.foo.
0x00000300 cefa0df0 00000000 00000000 00000000 ................
0x00000310 04000000 14000000 67452301 666f6f00 ........gE#.foo.
0x00000320 efbeadde ....
$ readelf -x .note.foobar test-gold
Hex dump of section '.note.foobar':
0x00000280 04000000 14000000 67452301 666f6f00 ........gE#.foo.
0x00000290 cefa0df0 00000000 00000000 00000000 ................
0x000002a0 04000000 14000000 67452301 666f6f00 ........gE#.foo.
0x000002b0 efbeadde ....
So you would expect the test-gold program to report a section at vaddr 280, but it does not.
Why can dl_iterate_phdr not find this section, while readelf can, and what is gold doing differently to cause this?
#define _GNU_SOURCE
#include <link.h>
#include <stdlib.h>
#include <stdio.h>

typedef struct {
    unsigned int elf_namesize;
    unsigned int elf_datasize;
    unsigned int elf_type;
    unsigned int elf_name;
    unsigned int bar;
} foo_t;

const foo_t __attribute__((used,section(".note.foobar,\"a\"#"))) foo1 = {
    4,
    20,
    0x01234567,
    0x6f6f66,
    0xf00dface,
};

const foo_t __attribute__((used,section(".note.foobar,\"a\"#"))) foo2 = {
    4,
    20,
    0x01234567,
    0x6f6f66,
    0xdeadbeef,
};

static int
callback(struct dl_phdr_info *info, size_t size, void *data)
{
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr)* phdr = &info->dlpi_phdr[i];
        if (phdr->p_type == PT_NOTE) {
            foo_t *payload = (foo_t*)(info->dlpi_addr + phdr->p_vaddr);
            printf("note section at vaddr: %lx\n", phdr->p_vaddr);
            if (phdr->p_memsz >= sizeof(foo_t) && payload->elf_type == 0x01234567 && payload->elf_name == 0x6f6f66) {
                printf("found %x\n", payload->bar);
            }
        }
    }
    return 0;
}

int
main(int argc, char *argv[])
{
    dl_iterate_phdr(callback, NULL);
    return 0;
}
This code:
foo_t *payload = (foo_t*)(info->dlpi_addr + phdr->p_vaddr);
assumes that your .note.foobar is the very first Elf...Note in the PT_NOTE segment, but you can't make that assumption: the order of notes in PT_NOTE is not guaranteed, so you need to iterate over all of them.
You can verify that there are multiple notes with readelf -n test-{ld,gold}.
It appears that GNU ld emits a separate PT_NOTE for each .note* section, while gold merges them all into a single PT_NOTE segment. Either behavior is perfectly fine as far as the ELF standard is concerned, though GNU ld is wasteful (there is no need to emit extra PT_NOTE program headers).
Here is what I get for your test program:
readelf -l test-ld | grep NOTE
NOTE 0x00000000000002c4 0x00000000004002c4 0x00000000004002c4
NOTE 0x00000000000002f0 0x00000000004002f0 0x00000000004002f0
NOTE 0x0000000000000324 0x0000000000400324 0x0000000000400324
readelf -l test-gold | grep NOTE
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
P.S. As for the title question, "Why does the gold linker cause dl_iterate_phdr() not to return my custom note section?":
The direct answer is that dl_iterate_phdr doesn't deal with (or care about) sections. It iterates over segments, and the assignment of sections to segments is up to the linker to perform as it sees fit.
Recently, I learned that the .bss segment stores uninitialized data. However, when I compile the small program below and run the size(1) command on it, the .bss size doesn't change, even after I add some global variables. Am I misunderstanding something?
jameschu#aspire-e5-573g:~$ cat test.c
#include <stdio.h>
int main(void)
{
printf("hello world\n");
return 0;
}
jameschu#aspire-e5-573g:~$ gcc -c test.c
jameschu#aspire-e5-573g:~$ size test.o
text data bss dec hex filename
89 0 0 89 59 test.o
jameschu#aspire-e5-573g:~$ cat test.c
#include <stdio.h>
int a1;
int a2;
int a3;
int main(void)
{
printf("hello world\n");
return 0;
}
jameschu#aspire-e5-573g:~$ gcc -c test.c
jameschu#aspire-e5-573g:~$ size test.o
text data bss dec hex filename
89 0 0 89 59 test.o
This is because of the way global variables work.
The problem being solved is that it is possible to declare a global variable, without initializing it, in several .c files without getting a duplicate symbol error. That is, every uninitialized global declaration behaves like a weak definition: if another file defines the variable with an initializer, the uninitialized declarations simply refer to that definition.
How is this implemented by the compiler? Easy:
When compiling, instead of placing the variable in the .bss section, the compiler emits it as a COMMON symbol.
When linking, the linker merges all the COMMON symbols with the same name and discards any that are already defined in another section. The remaining ones are allocated in the .bss of the executable.
And that is why you don't see your variables in the .bss of the object file, but you do in the executable file. Note that this only applies when common symbols are used: GCC 10 and later default to -fno-common, in which case uninitialized globals go straight into the .bss of the object file.
You can inspect the object's sections and symbols with a more detailed tool than size, such as objdump -x. Note how the variables are placed in *COM*.
It is worth noting that if you declare your global variable as static, you are saying that the variable belongs to that compilation unit, so COMMON is not used and you get the behavior you expect:
int a;
int b;
static int c;
$ size test.o
text data bss dec hex filename
91 0 4 95 5f test.o
Initializing to 0 will get a similar result.
int a;
int b;
int c = 0;
$ size test.o
text data bss dec hex filename
91 0 4 95 5f test.o
However, initializing to anything other than 0 moves the variable to .data:
int a;
int b = 1;
int c = 0;
$ size test.o
text data bss dec hex filename
91 4 4 99 63 test.o
To check whether I can change code at run time, I wrote a small piece of code (below) on Linux.
#include <stdio.h>

int add(int a, int b)
{
    printf("reached inside the function");
    return a+b;
}

int main()
{
    int x=10;
    int y = 20;
    int * p;
    int z;
    int (*fp) (int , int);
    fp = add;
    p = (int *)fp;
    *(p+0) = 0;
    z = add(x,y);
}
As there is no issue from the C language point of view, the compiler compiles it perfectly and linking also succeeds. But at run time it fails with the error below:
Segmentation fault (core dumped)
The above error is expected, because the code segment is not supposed to be changed at run time. But I want to know how this is enforced at run time.
To learn more about the code-area restrictions, I ran readelf on the output file, and the section headers show:
[13] .text PROGBITS 08048330 000330 0001cc 00 AX 0 0 16
where the section header flags "AX" mean that this section is only allocatable and executable; it is not writable ("W").
With a small change to the ELF file, I was able to change the flags of this section to "WAX":
[13] .text PROGBITS 08048330 000330 0001cc 00 WAX 0 0 16
But still I get the same "segmentation fault" error.
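How does the system enforce this?
The system ignores the W flag here. Section flags are metadata for tools such as the linker; the kernel maps the file according to the program headers (segments), so changing the section flag does not make .text writable: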
$ gcc -Wall file.c
$ readelf -S a.out | grep .text
[14] .text PROGBITS 08048330 000330 0001cc 00 AX 0 0 16
$ objcopy a.out --set-section-flags .text=alloc,code,data a.out
$ readelf -S a.out | grep .text
[14] .text PROGBITS 08048330 000330 0001cc 00 WAX 0 0 16
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
(gdb) r
Starting program: a.out
Program received signal SIGSEGV, Segmentation fault.
0x0804842f in main ()
(gdb) x/i 0x0804842f
0x804842f <main+45>: movl $0x0,(%eax)
(gdb)
You still cannot write to p. You can change the memory page protection at runtime using mprotect:
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/mman.h>

int add(int a, int b)
{
    printf("reached inside the function");
    return a+b;
}

int main()
{
    int x=10;
    int y = 20;
    int * p;
    int z;
    int (*fp) (int , int);
    long pagesize;
    fp = add;
    p = (int *)fp;
    pagesize = sysconf(_SC_PAGESIZE);
    if(mprotect((void *)((uintptr_t)p & ~((uintptr_t)pagesize - 1)), pagesize, PROT_READ | PROT_WRITE | PROT_EXEC) == -1)
        perror("Error mprotect()");
    *(p+0) = 0;
    z = add(x,y);
    return 0;
}
This will leave you with the bad instruction to fix:
$ gcc -Wall file.c
$ ./a.out
Segmentation fault
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
(gdb) r
Starting program: a.out
Program received signal SIGSEGV, Segmentation fault.
0x08048484 in add ()
(gdb) x/i 0x08048484
0x8048484 <add>: add %al,(%eax)
(gdb)
Does the segmentation fault happen at the same place?
It could be that the OS ignores the W flag, but I don't think that's the case here. Assuming the OS honours the flag, the following is relevant.
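You are overwriting the first instruction of the add function with 0. Assuming a 4-byte int, that writes four zero bytes, which a 16-bit x86 disassembler shows as:
00000000 0000 add [bx+si],al
00000002 0000 add [bx+si],al
This most likely ends up accessing invalid memory at bx+si. In 32-bit code the same bytes decode as add %al,(%eax), as in the gdb output above, so the write goes to whatever address eax holds.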