pthread condition support for CLOCK_TAI

I have to be missing something. I can obtain CLOCK_TAI using clock_gettime. However, when I attempt to use CLOCK_TAI with a pthread condition I get an EINVAL.
#include <pthread.h>
#include <stdio.h>

int main()
{
    clockid_t clockTai = 11;

    pthread_cond_t condition;
    pthread_condattr_t eventConditionAttributes;

    pthread_condattr_init( &eventConditionAttributes );

    int ret = pthread_condattr_setclock( &eventConditionAttributes, clockTai );
    printf( "%d %d\n", ret, clockTai );

    pthread_cond_init( &condition, &eventConditionAttributes );

    return( 0 );
}
When compiled and run as follows, it produces the following output:
g++ -o taiTest taiTest.cxx -lpthread -lrt
./taitest$ ./taiTest
22 11
Where EINVAL = 22 and CLOCK_TAI = 11.
This happens on both my Ubuntu 14.04 system and my embedded ARM device with an OS built from Yocto.
Any thoughts or help here are greatly appreciated. Thanks in advance.

As per the manual page, pthread_condattr_setclock() accepts only a limited set of clock IDs, and CLOCK_TAI is not one of them. The manual page talks about "the system clock", which does sound somewhat ambiguous; CLOCK_REALTIME, CLOCK_MONOTONIC, and their derivatives are the acceptable values.
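For example, a minimal variant of the program above using CLOCK_MONOTONIC, which is one of the accepted values:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main()
{
    pthread_cond_t condition;
    pthread_condattr_t attr;

    pthread_condattr_init( &attr );

    /* CLOCK_MONOTONIC is accepted here, unlike CLOCK_TAI */
    int ret = pthread_condattr_setclock( &attr, CLOCK_MONOTONIC );
    printf( "%d\n", ret ); /* prints 0 on success */

    pthread_cond_init( &condition, &attr );
    return 0;
}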

Related

Compiling old C code Y2038-conform still results in 4-byte variables

According to this overview, in order to compile old code Y2038-conform, we just need to pass the preprocessor macro __USE_TIME_BITS64 to gcc, but that does not seem to work on an ARMv7 board with Debian 12 (bookworm):
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    struct stat sb;
    printf("sizeof time_t: %zu\n", sizeof(time_t));
    printf("sizeof stat timestamp: %zu\n", sizeof(sb.st_atime));
    return 0;
}
time_t is still 4 bytes:
root@debian:~# gcc -D__USE_TIME_BITS64 time.c -o time
root@debian:~# ./time
sizeof time_t: 4
sizeof stat timestamp: 4
root@debian:~#
glibc is 2.33, what am I doing wrong here?
According to this post (which is getting a little old now, and some parts of which are probably no longer relevant):
... defining _TIME_BITS=64 would cause all time functions to use 64-bit times by default. The _TIME_BITS=64 option is implemented by transparently mapping the standard functions and types to their internal 64-bit variants. Glibc would also set __USE_TIME_BITS64, which user code can test for to determine if the 64-bit variants are available.
Presumably, this includes making time_t 64 bit.
So if your version of glibc supports this at all, it looks like you're setting the wrong macro. You want:
-D_TIME_BITS=64
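If your glibc honours it (if I remember correctly, _TIME_BITS=64 support arrived in glibc 2.34, so your 2.33 may simply be too old; glibc also insists on _FILE_OFFSET_BITS=64 being set alongside it), you can verify it from code. A minimal sketch:

#include <time.h>
#include <stdio.h>

int main(void)
{
/* glibc defines __USE_TIME_BITS64 itself once it honours _TIME_BITS=64 */
#ifdef __USE_TIME_BITS64
    puts("64-bit time_t is active");
#endif
    printf("sizeof time_t: %zu\n", sizeof(time_t));
    return 0;
}

Compiled with gcc -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64, this should report a time_t of 8 bytes on a 32-bit ARM target.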

gcc's __builtin_memcpy performance with certain numbers of bytes is terrible compared to clang's

I thought I'd first share this here to get your opinions before doing anything else. While designing an algorithm, I found that gcc-compiled code for some simple operations performed catastrophically compared to clang's.
How to reproduce
Create a test.c file containing this code:
#include <sys/stat.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>

int main(int argc, char *argv[]) {
    const uint64_t size = 1000000000;
    const size_t alloc_mem = size * sizeof(uint8_t);
    uint8_t *mem = (uint8_t*)malloc(alloc_mem);
    for (uint_fast64_t i = 0; i < size; i++)
        mem[i] = (uint8_t) (i >> 7);

    uint8_t block = 0;
    uint_fast64_t counter = 0;
    uint64_t total = 0x123456789abcdefllu;
    uint64_t receiver = 0;

    for(block = 1; block <= 8; block ++) {
        printf("%u ...\n", block);
        counter = 0;
        while (counter < size - 8) {
            __builtin_memcpy(&receiver, &mem[counter], block);
            receiver &= (0xffffffffffffffffllu >> (64 - ((block) << 3)));
            total += ((receiver * 0x321654987cbafedllu) >> 48);
            counter += block;
        }
    }

    printf("=> %llu\n", total);
    return EXIT_SUCCESS;
}
gcc
Compile and run:
gcc-7 -O3 test.c
time ./a.out
1 ...
2 ...
3 ...
4 ...
5 ...
6 ...
7 ...
8 ...
=> 82075168519762377
real 0m23.367s
user 0m22.634s
sys 0m0.495s
info:
gcc-7 -v
Using built-in specs.
COLLECT_GCC=gcc-7
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/7.3.0/libexec/gcc/x86_64-apple-darwin17.4.0/7.3.0/lto-wrapper
Target: x86_64-apple-darwin17.4.0
Configured with: ../configure --build=x86_64-apple-darwin17.4.0 --prefix=/usr/local/Cellar/gcc/7.3.0 --libdir=/usr/local/Cellar/gcc/7.3.0/lib/gcc/7 --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-7 --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-isl=/usr/local/opt/isl --with-system-zlib --enable-checking=release --with-pkgversion='Homebrew GCC 7.3.0' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --disable-nls
Thread model: posix
gcc version 7.3.0 (Homebrew GCC 7.3.0)
So we get about 23s of user time. Now let's do the same with cc (clang on macOS):
clang
cc -O3 test.c
time ./a.out
1 ...
2 ...
3 ...
4 ...
5 ...
6 ...
7 ...
8 ...
=> 82075168519762377
real 0m9.832s
user 0m9.310s
sys 0m0.442s
info:
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
That's more than 2.5x faster! Any thoughts?
I replaced __builtin_memcpy with memcpy to test things out, and this time the compiled code runs in about 34s on both sides: consistent, and slower as expected.
It would appear that the combination of __builtin_memcpy and bitmasking is interpreted very differently by the two compilers.
I had a look at the assembly code, but couldn't see anything standing out that would explain such a drop in performance as I'm not an asm expert.
Edit 03-05-2018:
Posted this bug : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84719.
I find it suspicious that you get different code for memcpy vs __builtin_memcpy. I don't think that's supposed to happen, and indeed I cannot reproduce it on my (linux) system.
If you add #pragma GCC unroll 16 (implemented in gcc-8+) before the for loop, gcc gets the same perf as clang (making block a constant is essential to optimize the code), so essentially llvm's unrolling is more aggressive than gcc's, which can be good or bad depending on cases. Still, feel free to report it to gcc, maybe they'll tweak the unrolling heuristics some day and an extra testcase could help.
Once unrolling is taken care of, gcc does ok for some values (block equals 4 or 8 in particular), but much worse for some others, in particular 3. But that's better analyzed with a smaller testcase without the loop on block. Gcc seems to have trouble with memcpy(,,3), it works much better if you always read 8 bytes (the next line already takes care of the extra bytes IIUC). Another thing that could be reported to gcc.
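For illustration, a sketch of where the pragma would sit in the test program above (gcc 8 or later; the pragma applies to the loop that immediately follows it):

/* Fully unrolling the loop over block makes block a compile-time
 * constant in each copy, which lets gcc expand __builtin_memcpy
 * as aggressively as clang does. */
#pragma GCC unroll 16
for(block = 1; block <= 8; block ++) {
    printf("%u ...\n", block);
    counter = 0;
    while (counter < size - 8) {
        __builtin_memcpy(&receiver, &mem[counter], block);
        receiver &= (0xffffffffffffffffllu >> (64 - ((block) << 3)));
        total += ((receiver * 0x321654987cbafedllu) >> 48);
        counter += block;
    }
}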

PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP not defined on x64 or x86?

My code is as follows:
#include <sys/ptrace.h>
#include <stdio.h>

int
main()
{
    printf("PTRACE_CONT: %d\n", PTRACE_CONT);
    printf("PTRACE_SYSCALL: %d\n", PTRACE_SYSCALL);
    printf("PTRACE_SINGLESTEP: %d\n", PTRACE_SINGLESTEP);
    printf("PTRACE_SYSEMU: %d\n", PTRACE_SYSEMU);
    printf("PTRACE_SYSEMU_SINGLESTEP: %d\n", PTRACE_SYSEMU_SINGLESTEP);
    printf("PTRACE_LISTEN: %d\n", PTRACE_LISTEN);
    return 0;
}
I'm compiling with the default flags on Ubuntu 16.04 (Linux x86_64 4.4.0-38), with gcc v5.4.0.
This throws an error that PTRACE_SYSEMU is undeclared, even though the ptrace man page states it exists. The same error occurs for PTRACE_SYSEMU_SINGLESTEP if the line containing PTRACE_SYSEMU is commented out. The man page states PTRACE_SYSEMU_SINGLESTEP is only available for x86, but a patch unifying the x86 and x64 handling of PTRACE_SYSEMU_SINGLESTEP was merged in 2008.
This produces the same error on 32-bit (well, i686) and 64-bit (AMD64). Is this distro specific? What is going on?
I can confirm that neither of these values is defined in my /usr/include/x86_64/linux/sys/ptrace.h. But they are defined in the kernel sources?!?
On Ubuntu 16.04 (and also 14.04), these are defined in <asm/ptrace-abi.h>, which is included by <asm/ptrace.h>, which in turn is included by <linux/ptrace.h>, but not by <sys/ptrace.h>.
Since these request codes are Linux-specific (not part of any standard), if you want them, you need to #include <linux/ptrace.h>.
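A minimal sketch of the fix (only the Linux header is included here, since mixing <sys/ptrace.h> and <linux/ptrace.h> in one translation unit can clash on some glibc versions):

#include <linux/ptrace.h>
#include <stdio.h>

int main(void)
{
    /* these request codes reach us from <asm/ptrace-abi.h> via <linux/ptrace.h> */
    printf("PTRACE_SYSEMU: %d\n", PTRACE_SYSEMU);
    printf("PTRACE_SYSEMU_SINGLESTEP: %d\n", PTRACE_SYSEMU_SINGLESTEP);
    return 0;
}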
Sysemu is used in User-Mode Linux as an optimization and is described at http://sysemu.sourceforge.net/. It is a feature for UML (where a special kernel runs as an ordinary process), not for typical users of ptrace.
Its implementation in x86 Linux can be checked via the TIF_SYSCALL_EMU flag in the kernel's ptrace_resume():
http://lxr.free-electrons.com/source/kernel/ptrace.c?v=4.10#L767
767 static int ptrace_resume(struct task_struct *child, long request,
768 unsigned long data)
...
780 #ifdef TIF_SYSCALL_EMU
781 if (request == PTRACE_SYSEMU || request == PTRACE_SYSEMU_SINGLESTEP)
782 set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
783 else
784 clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
785 #endif
http://lxr.free-electrons.com/ident?i=TIF_SYSCALL_EMU
The only definition is for x86:
http://lxr.free-electrons.com/source/arch/x86/include/asm/thread_info.h?v=4.10#L85
85 #define TIF_SYSCALL_EMU 6 /* syscall emulation active */

difference between time() and gettimeofday() and why does one cause a segfault

I'm trying to measure the amount of time for a system call, and I tried using time(0) and gettimeofday() in this program, but whenever I use gettimeofday() it seg faults. I suppose I can just use time(0) but I'd like to know why this is happening. And I know you guys can just look at it and see the problem. Please don't yell at me!
I want to get the time but not save it anywhere.
I've tried every combination of code I can think of, but I pasted the simplest version here. I'm new to C and Linux. I looked at the .stackdump file but it's pretty meaningless to me.
GetRDTSC is in util.h and it does rdtsc(), as one might expect. Now it's set to 10 iterations but later the loop will run 1000 times, without printf.
#include <stdio.h>
#include <time.h>
#include "util.h"

int main() {
    int i;
    uint64_t cycles[10];

    for (i = 0; i < 10; ++i) {
        // get initial cycles
        uint64_t init = GetRDTSC();

        gettimeofday(); // <== time(0) will work here without a seg fault.

        // get cycles after
        uint64_t after = GetRDTSC();

        // save cycles for each operation in an array
        cycles[i] = after - init;

        printf("%i\n", (int)(cycles[i]));
    }
}
The short version
gettimeofday() requires a pointer to a struct timeval to fill with time data.
So, for example, you'd do something like this:
#include <sys/time.h>
#include <stdio.h>

int main() {
    struct timeval tv;
    gettimeofday(&tv, NULL); // timezone should be NULL
    printf("%ld seconds\n", (long) tv.tv_sec); // the field is tv_sec
    return 0;
}
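For comparison, time() returns the seconds directly, and its pointer argument may be NULL if you don't want the result stored anywhere, which is why time(0) works where an argument-less gettimeofday() crashes. A minimal sketch:

#include <time.h>
#include <stdio.h>

int main() {
    time_t now = time(NULL); /* time() also returns the value, so NULL is fine */
    printf("%ld seconds\n", (long) now);
    return 0;
}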
The long version
The real problem is that gcc is automatically linking your program against the vDSO on your system, which contains a symbol for the gettimeofday syscall. Consider this program (entire file):
int main() {
    gettimeofday();
    return 0;
}
By default, gcc will compile this without warning. If you check the symbols it's linked against, you'll see:
ternus@event-horizon ~> gcc -o foo foo.c
ternus@event-horizon ~> ldd foo
linux-vdso.so.1 => (0x00007ffff33fe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f56a5255000)
/lib64/ld-linux-x86-64.so.2 (0x00007f56a562b000)
You just happen to be using a function that has a defined symbol, but without the prototype, there's no way to tell how many arguments it's supposed to take.
If you compile it with -Wall, you'll see:
ternus@event-horizon ~> gcc -Wall -o foo foo.c
foo.c: In function ‘main’:
foo.c:2:3: warning: implicit declaration of function ‘gettimeofday’ [-Wimplicit-function-declaration]
Of course, it'll segfault when you try to run it. Interestingly, it'll segfault inside the system's own gettimeofday implementation (this is on macOS):
cternus@astarael ~/foo> gcc -o foo -g foo.c
cternus@astarael ~/foo> gdb foo
GNU gdb 6.3.50-20050815 (Apple version gdb-1822) (Sun Aug 5 03:00:42 UTC 2012)
[etc]
(gdb) run
Starting program: /Users/cternus/foo/foo
Reading symbols for shared libraries +.............................. done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000001
0x00007fff87eeab73 in __commpage_gettimeofday ()
Now consider this program (again, no header files):
typedef struct {
    long tv_sec;
    long tv_usec;
} timeval;

int main() {
    timeval tv;
    gettimeofday(&tv, 0);
    return 0;
}
This will compile and run just fine -- no segfault. You've provided it with the memory location it expects, even though there's still no gettimeofday prototype provided.
More information:
Can anyone understand how gettimeofday works?
Is there a faster equivalent of gettimeofday?
The POSIX gettimeofday specification

Operations from atomic.h seem to be non-atomic

The following code produces random values for both n and v. It's not surprising that n is random, since it isn't properly protected, but v is supposed to end up at 0. Is there anything wrong in my code? Or could anyone explain this for me? Thanks.
I'm working on a 4-core server of x86 architecture. The uname is as follows.
Linux 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
#include <stdio.h>
#include <pthread.h>
#include <asm-x86_64/atomic.h>

int n = 0;
atomic_t v;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

#define LOOP 10000

void* foo(void *p)
{
    int i = 0;
    for(i = 0; i < LOOP; i++) {
        // pthread_mutex_lock(&mutex);
        ++n;
        --n;
        atomic_inc(&v);
        atomic_dec(&v);
        // pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

#define COUNT 50

int main(int argc, char **argv)
{
    int i;
    pthread_t pids[COUNT];
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    atomic_set(&v, 0);

    for(i = 0; i < COUNT; i++) {
        pthread_create(&pids[i], &attr, foo, NULL);
    }
    for(i = 0; i < COUNT; i++) {
        pthread_join(pids[i], NULL);
    }

    printf("%d\n", n);
    printf("%d\n", v);
    return 0;
}
You should use the gcc built-ins instead (see this). They work fine, and also work with icc.
int a;
__sync_fetch_and_add(&a, 1); // atomic a++
Note that you should be aware of the cache consistency issues when you modify variables without locking.
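As an illustration, a sketch of what foo() from the question could look like with the builtins (the __sync family; newer code would typically use C11 <stdatomic.h> instead):

#include <stddef.h>

int v = 0; /* a plain int; no kernel atomic_t needed */

#define LOOP 10000

void* foo(void *p)
{
    for (int i = 0; i < LOOP; i++) {
        __sync_fetch_and_add(&v, 1); /* atomic v++ */
        __sync_fetch_and_sub(&v, 1); /* atomic v-- */
    }
    return NULL;
}

With this version, v reliably ends up at 0 after all the threads are joined.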
This old post implies that
It's not obvious that you're supposed to include this kernel header in userspace programs
It's been known to fail to provide atomicity for userspace programs.
So ... Perhaps that's the reason for the problems you're seeing?
Can we get a look at the assembler output of the code (gcc -S)? Even though the uname indicates it's SMP-aware, that doesn't necessarily mean it was compiled with CONFIG_SMP.
Without CONFIG_SMP, the generated assembly does not have the lock prefix, and you can find your cores interfering with one another.
But I would be using the pthread functions anyway since they're portable across more platforms.
The Linux kernel's atomic.h is not usable from userland and never was. On x86, some of it might work, because x86 is a rather synchronization-friendly architecture, but on some platforms it relies heavily on being able to do privileged operations (older ARM) or at least on being able to disable preemption (older ARM and SPARC at least), which is not the case in userland!
