I am trying to run some shellcode on a server where I dont have access to the shell, but I have access to my own executable bash script.
My shellcode looks like this:
unsigned char code[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff/bin/sh";
When I run it locally, I spawn a shell with the code. I can also run other commands such as /bin/ls... However, when I try to change /bin/sh in favor of ./abcde it wont run my executable.
unsigned char code[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff./abcde";
What am I doing wrong? I am on a x86-32 machine..
EDIT:
To make it more clear, this is the scenario:
unsigned char code[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff/bin/sh";
unsigned char code1[] = "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
"\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff./abcde";
int main(void){
void (*f)(void);
f = (void (*)(void))code; //works
f = (void (*)(void))code1; //Does NOT work
f();
}
Your program is not very portable as you include ia32 instructions in your strings. With some help from gdb it was easier to read:
(gdb) disassemble/r code,code1
Dump of assembler code from 0x804a040 to 0x804a080:
0x0804a040 <code+0>: eb 15 jmp 0x804a057 <code+23>
0x0804a042 <code+2>: 5b pop %ebx
0x0804a043 <code+3>: 31 c0 xor %eax,%eax
0x0804a045 <code+5>: 89 5b 08 mov %ebx,0x8(%ebx)
0x0804a048 <code+8>: 88 43 07 mov %al,0x7(%ebx)
0x0804a04b <code+11>: 8d 4b 08 lea 0x8(%ebx),%ecx
0x0804a04e <code+14>: 89 43 0c mov %eax,0xc(%ebx)
0x0804a051 <code+17>: 89 c2 mov %eax,%edx
0x0804a053 <code+19>: b0 0b mov $0xb,%al
0x0804a055 <code+21>: cd 80 int $0x80
0x0804a057 <code+23>: e8 e6 ff ff ff call 0x804a042 <code+2>
0x0804a05c <code+28>: 2f das
0x0804a05d <code+29>: 62 69 6e bound %ebp,0x6e(%ecx)
0x0804a060 <code+32>: 2f das
0x0804a061 <code+33>: 73 68 jae 0x804a0cb
0x0804a063 <code+35>: 00 00 add %al,(%eax)
0x0804a065: 00 00 add %al,(%eax)
0x0804a067: 00 00 add %al,(%eax)
0x0804a069: 00 00 add %al,(%eax)
0x0804a06b: 00 00 add %al,(%eax)
0x0804a06d: 00 00 add %al,(%eax)
0x0804a06f: 00 00 add %al,(%eax)
0x0804a071: 00 00 add %al,(%eax)
0x0804a073: 00 00 add %al,(%eax)
0x0804a075: 00 00 add %al,(%eax)
0x0804a077: 00 00 add %al,(%eax)
0x0804a079: 00 00 add %al,(%eax)
0x0804a07b: 00 00 add %al,(%eax)
0x0804a07d: 00 00 add %al,(%eax)
0x0804a07f: 00 eb add %ch,%bl
however a helpful compiler puts the code in a variable segment which will cause a segmentation fault when the processor jumps to the "string" adn tries to execute from it.
I think this is a similar question:
sys_execve system call from Assembly
A careful reading of the question reveals what is actually going on. Indeed the compiler will happily bork you by placing this in a variable region; however the OS platform targeted probably doesn't have NX enabled (enabling NX on arbitrary 32 bit process was a recipe for disaster for a long time as GCC extensions required the stack to be executable).
The actual problem is you don't have execute access to bash. Your ./abcde is a bash script by your own admission, so the loader interprets #!/bin/bash, goes to open /bin/bash and discovers you don't have x permissions and barfs. exec() returns -Esomething with unpredictable results when you run off the end of the shellcode.
Related
When calling a function in the test portion of an if statement in c, does it evaluate exactly as if you had called it normally? As in, will all the effects besides the return value evaluate and persist?
For example, if I want to include an error check when calling fseek, can I write
if( fseek(file, 0, SEEK_END) ) {fprintf(stderr, "File too long")};
and be functionally the same as:
long int i = fseek(file, 0, SEEK_END);
if( i ) {fprintf(stderr, "File too long")};
?
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#The-if-Statement
https://www.gnu.org/software/libc/manual/html_node/File-Positioning.html
Yes, this is exactly the same. This only difference is you won't be able to use again the result of the operation executed in the if statement.
In both cases the operation is being executed BEFORE the condition (comparison) happens. To illustrate this, we can see what is the result of the two different cases in machine code. Please do note that the output machine code will vary depending of the OS and compiler.
Source file 'a.c':
#include <stdio.h>
int
main(void)
{
FILE *f = fopen("testfile", "r");
long int i = fseek(f, 0, SEEK_END);
if (i)
fprintf(stderr, "Error\n");
return 0;
}
$ gcc -O1 a.c -o a
Source file 'b.c':
#include <stdio.h>
int
main(void)
{
FILE *f = fopen("testfile", "r");
if (fseek(f, 0, SEEK_END))
fprintf(stderr, "Error\n");
return 0;
}
$ gcc -O1 b.c -o b
You will note that for both cases I used the option '-O1' which allows the compiler to introduce small optimizations, this is mostly to make the machine code a little cleaner as without optimization the compiler converts "literally" to machine code.
$ objdump -Mintel -D a |grep -i main -A20
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 48 83 ec 08 sub rsp,0x8
1191: 48 8d 35 6c 0e 00 00 lea rsi,[rip+0xe6c] # 2004 <_IO_stdin_used+0x4>
1198: 48 8d 3d 67 0e 00 00 lea rdi,[rip+0xe67] # 2006 <_IO_stdin_used+0x6>
119f: e8 dc fe ff ff call 1080 <fopen#plt>
# Interesting part
11a4: 48 89 c7 mov rdi,rax # Sets return of fopen as param 1
11a7: ba 02 00 00 00 mov edx,0x2 # Sets Ox2 (SEEK_END) as param 3
11ac: be 00 00 00 00 mov esi,0x0 # Sets 0 as param 2
11b1: e8 ba fe ff ff call 1070 <fseek#plt> # Call to FSEEK being made and stored in register
11b6: 85 c0 test eax,eax # Comparison being made
11b8: 75 0a jne 11c4 <main+0x3b> # Comparison jumping
# End of interesting part
11ba: b8 00 00 00 00 mov eax,0x0
11bf: 48 83 c4 08 add rsp,0x8
11c3: c3 ret
11c4: 48 8b 0d 55 2e 00 00 mov rcx,QWORD PTR [rip+0x2e55] # 4020 <stderr##GLIBC_2.2.5>
11cb: ba 06 00 00 00 mov edx,0x6
11d0: be 01 00 00 00 mov esi,0x1
11d5: 48 8d 3d 33 0e 00 00 lea rdi,[rip+0xe33] # 200f <_IO_stdin_used+0xf>
11dc: e8 af fe ff ff call 1090 <fwrite#plt>
11e1: eb d7 jmp 11ba <main+0x31>
11e3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
11ea: 00 00 00
11ed: 0f 1f 00 nop DWORD PTR [rax]
Objdumping on binary 'b' yields an almost identical same machine code result. To sum it up, whatever you put in your if statement is evaluated and will yield a beind-the-scene equivalent result whether or not you assign it a variable first.
Edit:
For reference, this is the output of $ objdump -Mintel -D b |grep -i main -A20:
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 48 83 ec 08 sub rsp,0x8
1191: 48 8d 35 6c 0e 00 00 lea rsi,[rip+0xe6c] # 2004 <_IO_stdin_used+0x4>
1198: 48 8d 3d 67 0e 00 00 lea rdi,[rip+0xe67] # 2006 <_IO_stdin_used+0x6>
119f: e8 dc fe ff ff call 1080 <fopen#plt>
# Interesting Part
11a4: 48 89 c7 mov rdi,rax
11a7: ba 02 00 00 00 mov edx,0x2
11ac: be 00 00 00 00 mov esi,0x0
11b1: e8 ba fe ff ff call 1070 <fseek#plt>
11b6: 85 c0 test eax,eax
11b8: 75 0a jne 11c4 <main+0x3b>
# End of interesting part
11ba: b8 00 00 00 00 mov eax,0x0
11bf: 48 83 c4 08 add rsp,0x8
11c3: c3 ret
11c4: 48 8b 0d 55 2e 00 00 mov rcx,QWORD PTR [rip+0x2e55] # 4020 <stderr##GLIBC_2.2.5>
11cb: ba 06 00 00 00 mov edx,0x6
11d0: be 01 00 00 00 mov esi,0x1
11d5: 48 8d 3d 33 0e 00 00 lea rdi,[rip+0xe33] # 200f <_IO_stdin_used+0xf>
11dc: e8 af fe ff ff call 1090 <fwrite#plt>
11e1: eb d7 jmp 11ba <main+0x31>
11e3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
11ea: 00 00 00
11ed: 0f 1f 00 nop DWORD PTR [rax]
The short answer is yes (as in your trivial example), the long answer is maybe.
When the logical expression (any) is more complex the C language evaluates it until the result of the whole expression is fully determined. The remaining operations are not evaluated.
Examples:
int x = 0;
if(x && foo()) {}
the foo will not be called because x is false - and then the whole operation is false.
int x = 1;
if(x && foo()) {}
the foo will be called because x is true and the second part of the expression is needed to get the result.
It is called Short circuit evaluation and all logical expressions in C are evaluated this way.
Background
I'm making an app that needs to run several tasks concurrently. I can't use threads and such because the app should work without any OS (i.e. straight from the bootsector). Using x86 tasks looks like an overkill (both logically and performance-wise). Thus, I decided to implement a task-switching utility myself. I would save processor state, make a call to the task code and then restore the previous state. So I have to make the call from inline assembly.
Problem
Here's some example code:
#include <stdio.h>
void func() {
printf("Hello, world!\n");
}
void (*funcptr)();
int main() {
funcptr = func;
asm(
"call *%0;"
:
:"r"(funcptr)
);
return 0;
}
It compiles perfectly under icc with no options, gcc and clang and yields "Hello, world!" when run. However, if I compile it with icc main.c -ipo, it segfaults.
I disassembled the code that was generated by icc main.c and got the following:
0000000000401220 <main>:
401220: 55 push %rbp
401221: 48 89 e5 mov %rsp,%rbp
401224: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401228: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40122f: bf 03 00 00 00 mov $0x3,%edi
401234: 33 f6 xor %esi,%esi
401236: e8 45 00 00 00 callq 401280 <__intel_new_feature_proc_init>
40123b: 0f ae 1c 24 stmxcsr (%rsp)
40123f: 48 c7 05 f6 78 00 00 movq $0x401270,0x78f6(%rip) # 408b40 <funcptr>
401246: 70 12 40 00
40124a: b8 70 12 40 00 mov $0x401270,%eax
40124f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401256: 0f ae 14 24 ldmxcsr (%rsp)
40125a: ff d0 callq *%rax
40125c: 33 c0 xor %eax,%eax
40125e: 48 89 ec mov %rbp,%rsp
401261: 5d pop %rbp
401262: c3 retq
401263: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
401268: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40126f: 00
0000000000401270 <func>:
401270: bf 04 40 40 00 mov $0x404004,%edi
401275: e9 e6 fd ff ff jmpq 401060 <puts#plt>
40127a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
On the other hand, icc main.c -ipo yields:
0000000000401210 <main>:
401210: 55 push %rbp
401211: 48 89 e5 mov %rsp,%rbp
401214: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401218: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40121f: bf 03 00 00 00 mov $0x3,%edi
401224: 33 f6 xor %esi,%esi
401226: e8 25 00 00 00 callq 401250 <__intel_new_feature_proc_init>
40122b: 0f ae 1c 24 stmxcsr (%rsp)
40122f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401236: 48 8b 05 cb 2d 00 00 mov 0x2dcb(%rip),%rax # 404008 <funcptr_2.dp.0>
40123d: 0f ae 14 24 ldmxcsr (%rsp)
401241: ff d0 callq *%rax
401243: 33 c0 xor %eax,%eax
401245: 48 89 ec mov %rbp,%rsp
401248: 5d pop %rbp
401249: c3 retq
40124a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
So, while -ipo didn't remove funcptr variable (see address 401236), it did remove assignment. I guess that icc noticed that func is not called from C code so it can be safely removed, so funcptr is allowed to contain garbage. However, it didn't notice that I'm calling func indirectly via assembly.
What I tried
Replacing "r"(funcptr) with "r"(func) works but I can't hardcode a specific function (see background).
Calling funcptr and/or func before and/or after inline assembly block don't help because icc just inlines printf("Hello, world!\n");.
I can't get rid of inline assembly because I have to do low-level register, flags and stack manipulation before and after call.
Making funcptr volatile yields the following warning but still segfaults:
a value of type "void (*)()" cannot be assigned to an entity of type "volatile void (*)()"
Adding volatile to almost every other word doesn't help either.
Moving func and/or funcptr to other source files and then linking them together doesn't help.
Moving inline assembly to a separate function doesn't work.
Am I doing something wrong or is it an icc bug? If the former, how do I fix the code? If the latter, is there any workaround and should I report the bug?
$ icc --version
icc (ICC) 19.1.0.166 20191121
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.
Here gdb does not stop at Line:4.
Next,
Without hitting the declaration line at Line:5, variable x is existing and initialized.
Next,
But here it shows out of scope (yes it should according to me).
Now, I have the following doubts regarding this particular instance of c program.
When exactly the memory for variable x in P1() gets created and initialized?
why gdb did not stop at static declaration statement in inside P1() in the first example?
If we call P1() again will the program control simply skip the declaration statement?
It has already been explained (in related topics linked in comments below question) how static variables work.
Here is actual code generated by a gcc for your p1 function (by gcc -c -O0 -fomit-frame-pointer -g3 staticvar.c -o staticvar.o) then disassembled with related source.
Disassembly of section .text:
0000000000000000 <p1>:
#include <stdio.h>
void p1(void)
{
0: 48 83 ec 08 sub $0x8,%rsp
static int x = 10;
x += 5;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <p1+0xa>
a: 83 c0 05 add $0x5,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <p1+0x13>
printf("%d\n", x);
13: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 19 <p1+0x19>
19: 89 c6 mov %eax,%esi
1b: bf 00 00 00 00 mov $0x0,%edi
20: b8 00 00 00 00 mov $0x0,%eax
25: e8 00 00 00 00 callq 2a <p1+0x2a>
}
2a: 90 nop
2b: 48 83 c4 08 add $0x8,%rsp
2f: c3 retq
So, as you see there is no code for declaration of x. GDB can only break on actual machine code instruction and as there is none, it breaks on next instruction (mov), which matches line 5.
I am working on a problem where I am attempting to create different scenarios in different C programs such as
Data Hazard
Branch Evaluation
Procedure Call
This is in an attempt at learning pipelining and the different hazards that come up.
So I am writing simple C programs and disassembling to assembly language to see if a hazard gets created. But I cannot figure out how to create these hazards. Do yall have any idea how I could do this? Here is some of the simple code I have written.
I compile using.
gcc -g -c programName.c -o programName.o
gcc programName.o -o programName
objdump -d programName.o > programName.asm
Code:
#include <stdio.h>
int main()
{
int i = 0;
int size = 5;
int num[5] = {1,2,3,4,5};
int sum=0;
int average = 0;
for(i = 0; i < size; i++)
{
sum += num[i];
}
average=sum/size;
return 0;
}
...and here is the assembly for that.
average.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c7 45 f0 00 00 00 00 movl $0x0,0xfffffffffffffff0(%rbp)
b: c7 45 f4 05 00 00 00 movl $0x5,0xfffffffffffffff4(%rbp)
12: c7 45 d0 01 00 00 00 movl $0x1,0xffffffffffffffd0(%rbp)
19: c7 45 d4 02 00 00 00 movl $0x2,0xffffffffffffffd4(%rbp)
20: c7 45 d8 03 00 00 00 movl $0x3,0xffffffffffffffd8(%rbp)
27: c7 45 dc 04 00 00 00 movl $0x4,0xffffffffffffffdc(%rbp)
2e: c7 45 e0 05 00 00 00 movl $0x5,0xffffffffffffffe0(%rbp)
35: c7 45 f8 00 00 00 00 movl $0x0,0xfffffffffffffff8(%rbp)
3c: c7 45 fc 00 00 00 00 movl $0x0,0xfffffffffffffffc(%rbp)
43: c7 45 f0 00 00 00 00 movl $0x0,0xfffffffffffffff0(%rbp)
4a: eb 10 jmp 5c <main+0x5c>
4c: 8b 45 f0 mov 0xfffffffffffffff0(%rbp),%eax
4f: 48 98 cltq
51: 8b 44 85 d0 mov 0xffffffffffffffd0(%rbp,%rax,4),%eax
55: 01 45 f8 add %eax,0xfffffffffffffff8(%rbp)
58: 83 45 f0 01 addl $0x1,0xfffffffffffffff0(%rbp)
5c: 8b 45 f0 mov 0xfffffffffffffff0(%rbp),%eax
5f: 3b 45 f4 cmp 0xfffffffffffffff4(%rbp),%eax
62: 7c e8 jl 4c <main+0x4c>
64: 8b 55 f8 mov 0xfffffffffffffff8(%rbp),%edx
67: 89 d0 mov %edx,%eax
69: c1 fa 1f sar $0x1f,%edx
6c: f7 7d f4 idivl 0xfffffffffffffff4(%rbp)
6f: 89 45 fc mov %eax,0xfffffffffffffffc(%rbp)
72: b8 00 00 00 00 mov $0x0,%eax
77: c9 leaveq
78: c3 retq
Would appreciate any insight or help. Thanks!
Since this is homework, I'm not going to give you a straight answer, but some food for thought to push you in the right direction.
x86 is a terrible ISA to be using to try and comprehend pipelining. A single x86 instruction can hide two or three side-effects, making it difficult to tease out how a given instruction would perform in even the simplest of pipelines. Are you sure you're not provided a RISC ISA to use for this problem?
Put your loop/hazard code into a function and preferably randomize the creation of the array. Make the array much longer. A good compiler will basically figure out the answer otherwise and remove most of the code you wrote! For reasons I don't understand it's putting your variables in memory.
A good compiler will also do things such as loop unrolling in attempt to hide data hazards and get better code scheduling. Learn how to defeat that (or if you can, give the flag the compiler telling it to NOT do those things if messing around with the compiler is allowed).
The keyword "volatile" can be very helpful in telling the compiler to not optimize around/away certain variables (it tells the compiler this value can change at any moment, so don't be clever and optimize code with it and also don't keep the variable inside the register file).
A data hazard means the pipeline will stall waiting on data. Normally instructions get bypassed just in time, so no stalling occurs. Think about which types of instructions may not be able to be bypassed and could cause a stall on a data hazard. This is dependent on the pipeline, so code that stalls for a specific processor may not stall for another. Modern out-of-order Intel processors are excellent at avoiding these stalls and compilers are great at re-scheduling code so they won't occur even on an in-order core.
I am learning some anti-debugging techniques on Linux and found a snippet of code for checking 0xcc byte in memory to detect the breakpoints in gdb. Here is that code:
if ((*(volatile unsigned *)((unsigned)foo + 3) & 0xff) == 0xcc)
{
printf("BREAKPOINT\n");
exit(1);
}
foo();
But it does not work. I even tried to set a breakpoint on foo() function and observe the contents in memory, but did not see any 0xcc byte written for breakpoint. Here is what I did:
(gdb) b foo
Breakpoint 1 at 0x804846a: file p4.c, line 8.
(gdb) x/x 0x804846a
0x804846a <foo+6>: 0xe02404c7
(gdb) x/16x 0x8048460
0x8048460 <frame_dummy+32>: 0x90c3c9d0 0x83e58955 0x04c718ec 0x0485e024
0x8048470 <foo+12>: 0xfefae808 0xc3c9ffff .....
As you can see, there seems to be no 0xcc byte written on the entry point of foo() function. Does anyone know what's going on or where I might be wrong? Thanks.
Second part is easily explained (as Flortify correctly stated):
GDB shows original memory contents, not the breakpoint "bytes". In default mode it actually even removes breakpoints when debugger suspends and re-inserts them before continuing. Users typically want to see their code, not strange modified instructions used for breakpoints.
With your C code you missed breakpoint for few bytes. GDB sets breakpoint after function prologue, because function prologue is not typically what gdb users want to see. So, if you put break to foo, actual breakpoint will be typically located few bytes after that (depends on prologue code itself that is function dependent as it may or might not have to save stack pointer, frame pointer and so on). But it is easy to check. I used this code:
#include <stdio.h>
int main()
{
int i,j;
unsigned char *p = (unsigned char*)main;
for (j=0; j<4; j++) {
printf("%p: ",p);
for (i=0; i<16; i++)
printf("%.2x ", *p++);
printf("\n");
}
return 0;
}
If we run this program by itself it prints:
0x40057d: 55 48 89 e5 48 83 ec 10 48 c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01
Now we run it in gdb (output re-formatted for SO).
(gdb) break main
Breakpoint 1 at 0x400585: file ../bp.c, line 6.
(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y 0x0000000000400585 in main at ../bp.c:6
(gdb) disas/r main,+32
Dump of assembler code from 0x40057d to 0x40059d:
0x000000000040057d (main+0): 55 push %rbp
0x000000000040057e (main+1): 48 89 e5 mov %rsp,%rbp
0x0000000000400581 (main+4): 48 83 ec 10 sub $0x10,%rsp
0x0000000000400585 (main+8): 48 c7 45 f8 7d 05 40 00 movq $0x40057d,-0x8(%rbp)
0x000000000040058d (main+16): c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
0x0000000000400594 (main+23): eb 5a jmp 0x4005f0
0x0000000000400596 (main+25): 48 8b 45 f8 mov -0x8(%rbp),%rax
0x000000000040059a (main+29): 48 89 c6 mov %rax,%rsi
End of assembler dump.
With this we verified, that program is printing correct bytes. But this also shows that breakpoint has been inserted at 0x400585 (that is after function prologue), not at first instruction of function.
If we now run program under gdb (with run) and then "continue" after breakpoint is hit, we get this output:
(gdb) cont
Continuing.
0x40057d: 55 48 89 e5 48 83 ec 10 cc c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01
This now shows 0xcc being printed for address 9 bytes into main.
If your hardware supports it, GDB may be using Hardware Breakpoints, which do not patch the code.
While I have not confirmed this via any official docs, this page indicates that
By default, gdb attempts to use hardware-assisted break-points.
Since you indicate expecting 0xCC bytes, I'm assuming you're running on x86 hardware, as the int3 opcode is 0xCC. x86 processors have a set of debug registers DR0-DR3, where you can program the address of data to cause a breakpoint exception. DR7 is a bitfield which controls the behavior of the breakpoints, and DR6 indicates the status.
The debug registers can only be read/written from Ring 0 (kernel mode). That means that the kernel manages these registers for you (via the ptrace API, I believe.)
However, for the sake of anti-debugging, all hope is not lost! On Windows, the GetThreadContext API allows you to get (a copy) of the CONTEXT for a (stopped) thread. This structure includes the contents of the DRx registers. This question is about how to implement the same on Linux.
This may also be a white lie that GDB is telling you... there may be a breakpoint there in RAM but GDB has noted what was there beforehand (so it can restore it later) and is showing you that, instead of the true contents of RAM.
Of course, it could also be using Hardware Breakpoints, which is a facility available on some processors. Setting h/w breakpoints is done by telling the processor the address it should watch out for (and trigger a breakpoint interrupt if it gets hit by the program counter while executing code).