The following code produces random values for both n and v. It's not surprising that n is random, since it isn't properly protected. But v is supposed to end up at 0. Is there anything wrong with my code? Or could anyone explain this to me? Thanks.
I'm working on a 4-core server with x86 architecture. The uname output is as follows.
Linux 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
#include <stdio.h>
#include <pthread.h>
#include <asm-x86_64/atomic.h>

int n = 0;
atomic_t v;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

#define LOOP 10000

void* foo(void *p)
{
    int i = 0;
    for(i = 0; i < LOOP; i++) {
        // pthread_mutex_lock(&mutex);
        ++n;
        --n;
        atomic_inc(&v);
        atomic_dec(&v);
        // pthread_mutex_unlock(&mutex);
    }
    return NULL;
}
#define COUNT 50
int main(int argc, char **argv)
{
    int i;
    pthread_t pids[COUNT];
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    atomic_set(&v, 0);

    for(i = 0; i < COUNT; i++) {
        pthread_create(&pids[i], &attr, foo, NULL);
    }
    for(i = 0; i < COUNT; i++) {
        pthread_join(pids[i], NULL);
    }

    printf("%d\n", n);
    printf("%d\n", v);
    return 0;
}
You should use the gcc built-ins instead (see this). This works fine, and it also works with icc.
int a;
__sync_fetch_and_add(&a, 1); // atomic a++
Note that you should be aware of the cache consistency issues when you modify variables without locking.
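For example, the loop from the question could be rewritten with these builtins (a minimal sketch; v becomes a plain int rather than the kernel atomic_t):

#include <pthread.h>  /* foo() is a thread body for pthread_create, as in the question */

#define LOOP 10000

int v = 0;  /* plain int, modified only through the __sync builtins */

void* foo(void *p)
{
    int i;
    for(i = 0; i < LOOP; i++) {
        __sync_fetch_and_add(&v, 1);  /* atomic ++v */
        __sync_fetch_and_sub(&v, 1);  /* atomic --v */
    }
    return NULL;
}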
This old post implies that:
- it's not obvious that you're supposed to include this kernel header in userspace programs, and
- it's been known to fail to provide atomicity for userspace programs.
So... perhaps that's the reason for the problems you're seeing?
Can we get a look at the assembler output of the code (gcc -S)? Even though the uname indicates the kernel is SMP-aware, that doesn't necessarily mean the headers were configured with CONFIG_SMP.
Without CONFIG_SMP, the generated assembly does not have the lock prefix, and you can find your cores interfering with one another.
But I would be using the pthread functions anyway since they're portable across more platforms.
The Linux kernel's atomic.h is not usable from userland and never was. On x86 some of it might work, because x86 is a rather synchronization-friendly architecture, but on some platforms it relies heavily on being able to perform privileged operations (older ARM) or at least on being able to disable preemption (older ARM and SPARC at least), neither of which is possible in userland!
I have to be missing something. I can obtain CLOCK_TAI using clock_gettime(). However, when I attempt to use CLOCK_TAI with a pthread condition variable, I get EINVAL.
#include <pthread.h>
#include <stdio.h>

int main()
{
    clockid_t clockTai = 11;

    pthread_cond_t condition;
    pthread_condattr_t eventConditionAttributes;

    pthread_condattr_init( &eventConditionAttributes );

    int ret = pthread_condattr_setclock( &eventConditionAttributes, clockTai );
    printf( "%d %d\n", ret, clockTai );

    pthread_cond_init( &condition,
                       &eventConditionAttributes );

    return( 0 );
}
When compiled and run as follows, it produces the following output:
g++ -o taiTest taiTest.cxx -lpthread -lrt
./taitest$ ./taiTest
22 11
Where EINVAL = 22 and CLOCK_TAI = 11.
This happens on both my Ubuntu 14.04 system and my embedded ARM device with an OS built from Yocto.
Any thoughts or help here are greatly appreciated. Thanks in advance.
As per the manual page, pthread_condattr_setclock() accepts only a limited set of clock IDs, and CLOCK_TAI is not one of them. The manual page talks about the "system clock", which does sound somewhat ambiguous; CLOCK_REALTIME, CLOCK_MONOTONIC, and their derivatives should be the acceptable values.
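For example, the same setup with CLOCK_MONOTONIC should succeed (a minimal sketch):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main()
{
    pthread_cond_t condition;
    pthread_condattr_t attr;

    pthread_condattr_init( &attr );

    /* CLOCK_MONOTONIC is among the clocks the manual page permits. */
    int ret = pthread_condattr_setclock( &attr, CLOCK_MONOTONIC );
    printf( "%d\n", ret );  /* prints 0 on success */

    pthread_cond_init( &condition, &attr );
    return( 0 );
}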
I'm learning the basics of SIMD so I was given a simple code snippet to see the principle at work with SSE and SSE2.
I recently installed minGW to compile C code in windows with gcc instead of using the visual studio compiler.
The objective of the example is to add two floats and then multiply by a third one.
The headers included are the following (which I guess are needed to be able to use the SSE intrinsics):
#include <stdio.h>
#include <time.h>
#include <xmmintrin.h>
#include <pmmintrin.h>
#include <sys/time.h> // for timing
Then I have a function to check what time it is, to compare time between calculations:
double now(){
    struct timeval t; double f_t;
    gettimeofday(&t, NULL);
    f_t = t.tv_usec; f_t = f_t/1000000.0; f_t += t.tv_sec;
    return f_t;
}
The function to do the calculation in the "scalar" sense is the following:
void run_scalar(){
    unsigned int i;
    for( i = 0; i < N; i++ ){
        rs[i] = (a[i]+b[i])*c[i];
    }
}
Here is the code for the sse2 function:
void run_sse2(){
    unsigned int i;
    __m128 *mm_a = (__m128 *)a;
    __m128 *mm_b = (__m128 *)b;
    __m128 *mm_c = (__m128 *)c;
    __m128 *mm_r = (__m128 *)rv;
    for( i = 0; i < N/4; i++)
        mm_r[i] = _mm_mul_ps(_mm_add_ps(mm_a[i],mm_b[i]),mm_c[i]);
}
The vectors are defined the following way (N is the size of the vectors and it is defined elsewhere) and a function init() is called to initialize them:
float a[N] __attribute__((aligned(16)));
float b[N] __attribute__((aligned(16)));
float c[N] __attribute__((aligned(16)));
float rs[N] __attribute__((aligned(16)));
float rv[N] __attribute__((aligned(16)));
void init(){
    unsigned int i;
    for( i = 0; i < N; i++ ){
        a[i] = (float)rand () / RAND_MAX / N;
        b[i] = (float)rand () / RAND_MAX / N;
        c[i] = (float)rand () / RAND_MAX / N;
    }
}
Finally here is the main that calls the functions and prints the results and computing time.
int main(){
    double t;
    init();

    t = now();
    run_scalar();
    t = now()-t;
    printf("S = %10.9f Temps du code scalaire : %f seconde(s)\n", 1e5*sum(rs), t);

    t = now();
    run_sse2();
    t = now()-t;
    printf("S = %10.9f Temps du code vectoriel 2: %f seconde(s)\n", 1e5*sum(rv), t);
}
For some reason, if I compile this code with "gcc -o vec vectorial.c -msse -msse2 -msse3" or "mingw32-gcc -o vec vectorial.c -msse -msse2 -msse3", it compiles without any problems, but I can't run it on my Windows machine: in the command prompt I get "access denied", and a big message appears on the screen saying "This app can't run on your PC. To find a version for your PC, check with the software publisher."
I don't really understand what is going on, and I don't have much experience with MinGW or C (just an introductory course on C++ done on Linux machines). I've tried playing around with different headers because I thought maybe I was targeting a different processor than the one in my PC, but I couldn't solve the issue. Most of the info I found was confusing.
Can someone help me understand what is going on? Is it a problem with the MinGW configuration, such that it is compiling for a Linux target? Is it something in the code that has no equivalent on Windows?
I'm trying to run it on a 64-bit Windows 8.1 PC.
Edit: I tried the configuration suggested on the site linked below. The output remains the same.
If I try to run through MSYS I get "Bad File number".
If I try to run through the command prompt I get "Access is Denied".
I'm guessing there's some sort of bug arising from permissions. I tried turning off the antivirus and User Account Control, but still no luck.
Any ideas?
There is nothing wrong with your code, apart from the missing definitions of sum() and N, which is, however, not the problem (hypothetical definitions are sketched below). The switches -msse -msse2 appear not to be required.
I was able to compile and run your code on Linux (Ubuntu x86_64, compiled with gcc 4.8.2 and 4.6.3, on Atom D2700 and AMD Athlon LE-1640) and Windows 7/64 (compiled with gcc 4.5.3 (32 bit) and 4.8.2 (64 bit), on Core i3-4330 and Core i7-4960X). It ran without problems.
Are you sure your CPU supports the required instructions? What exactly was the error code you got? Which MinGW configuration did you use? Out of curiosity, I used the one available at http://win-builds.org/download.html, which was very straightforward.
However, using the optimization flag -O3 produced the best result -- with the scalar loop! Also useful are -m64 -mtune=native -s.
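For reference, hypothetical definitions of N and sum() that would make the snippet self-contained (these names are assumptions, not from the original post):

#define N 100000  /* element count; a multiple of 4, as the SSE2 loop requires */

float sum(float *v){
    unsigned int i;
    float s = 0.0f;
    for( i = 0; i < N; i++ )
        s += v[i];
    return s;
}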
Hello, I have the following code, which I compile with gcc (> 4.2) with the -fopenmp flag:
int main(void)
{
    #pragma omp parallel for
    int i;
    for(i=0;i<4;i++) while(1);
    return 0;
}
I get a SIGSEGV on OS X Lion (10.7.3, llvm-gcc 4.2.1) and CentOS 6.2. What am I doing wrong here? Thanks.
Not sure if this is relevant to your compiler version and configuration, but in C++11 the compiler is allowed to assume that a loop like while(true){} terminates.
More precisely, if you write a loop which
- makes no calls to library I/O functions, and
- does not access or modify volatile objects, and
- performs no synchronization operations (1.10) or atomic operations (Clause 29),
and does not terminate, you have undefined behaviour.
This may end up not applying to your situation, but as C++11 becomes more established, watch out.
Very interesting. I changed the code a little, first to this:
int main(void)
{
    int i;
    #pragma omp parallel
    {
        while(1);
    }
    return 0;
}
and then to this:
inline void func() {
    while (1) ;
}

int main(void)
{
    int i;
    #pragma omp parallel for
    for(i=0;i<8;i++) {
        func();
    }
    return 0;
}
And they both worked OK.
There was a bug in GCC regarding this issue. I reported it, and they will provide a fix. Here is the link: GCC bug
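For reference, OpenMP requires the for statement to immediately follow the parallel for directive, so the canonical form declares the loop variable first (a minimal sketch; whether it avoids the crash depends on the GCC bug above):

int main(void)
{
    int i;
    #pragma omp parallel for
    for(i=0;i<4;i++)
        while(1);  /* each iteration spins forever, as in the question */
    return 0;
}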
A simple C program which uses gettimeofday() works fine when compiled without any flags (gcc 4.5.1) but doesn't give output when compiled with the flag -mno-sse.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>  /* for gettimeofday() */

int main()
{
    struct timeval s,e;
    float time;
    int i;
    gettimeofday(&s, NULL);
    for( i=0; i< 10000; i++);
    gettimeofday(&e, NULL);
    time = e.tv_sec - s.tv_sec + e.tv_usec - s.tv_usec;
    printf("%f\n", time);
    return 0;
}
I have CFLAGS=-march=native -mtune=native
Could someone explain why this happens?
The program returns a correct value normally, but prints "0" when compiled with -mno-sse enabled.
The flag -mno-sse causes floating point arguments to be passed on the stack, whereas the usual x86_64 ABI specifies that they should be passed via SSE registers.
Since printf() in your C library was compiled without -mno-sse, it is expecting floating point arguments to be passed in accordance with the ABI. This is why your code fails. It has nothing to do with gettimeofday().
If you wish to use printf() from your code compiled with -mno-sse and pass it floating point arguments, you will need to recompile your C library with that option and link against that version.
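A minimal sketch of the mismatch, assuming an x86-64 SysV target (the file name is hypothetical; depending on the GCC version, this is either rejected at compile time or prints a wrong value):

/* abi_demo.c -- compile with: gcc -mno-sse abi_demo.c */
#include <stdio.h>

int main(void)
{
    double d = 1.5;
    /* printf, compiled normally, expects the double in %xmm0 */
    printf("%f\n", d);
    return 0;
}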
It appears that you are using a loop which does nothing in order to observe a time difference. The problem is, the compiler may optimize this loop away entirely. The issue may not be with -mno-sse itself; it may be that it allows an optimization that removes the loop, thus giving you the same time each time you run it.
I would recommend trying to put something in that loop which can't be optimized out (such as incrementing a number which you print out at the end). See if you still get the same behavior. If not, I'd recommend looking at the generated assembler (gcc -S) and seeing what the code difference is.
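For instance, a volatile counter keeps the delay loop from being optimized away (a minimal sketch):

#include <stdio.h>

int main(void)
{
    volatile long sink = 0;  /* volatile: every access must actually occur */
    int i;
    for( i=0; i< 10000; i++)
        sink += i;           /* the loop can no longer be removed */
    printf("%ld\n", sink);
    return 0;
}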
The fields tv_sec and tv_usec are usually longs.
Declaring the variable "time" as a long integer instead solved the issue.
The following link addresses the issue.
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00525.html
Working code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>  /* for gettimeofday() */

int main()
{
    struct timeval s,e;
    long time;
    int i;
    gettimeofday(&s, NULL);
    for( i=0; i< 10000; i++);
    gettimeofday(&e, NULL);
    time = e.tv_sec - s.tv_sec + e.tv_usec - s.tv_usec;
    printf("%ld\n", time);
    return 0;
}
Thanks for the prompt replies. Hope this helps.
What do you mean doesn't give output?
0 (zero) is a perfectly reasonable output to expect.
Edit: Try compiling to assembler (gcc -S ...) and see the differences between the normal and the no-sse version.
I'm looking for a way to atomically increment a short and then return that value. I need to do this both in kernel mode and in user mode, so it's in C, under Linux, on 32-bit Intel architecture. Unfortunately, due to speed requirements, a mutex lock isn't going to be a good option.
Is there any other way to do this? At this point, it seems like the only option available is to inline some assembly. If that's the case, could someone point me towards the appropriate instructions?
GCC __atomic_* built-ins
As of GCC 4.8, __sync built-ins have been deprecated in favor of the __atomic built-ins: https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/_005f_005fatomic-Builtins.html
They implement the C++ memory model, and std::atomic uses them internally.
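Applied to the short from the original question, a minimal sketch (the builtins work on 1-, 2-, 4-, and 8-byte integral types, and __atomic_add_fetch returns the new value, i.e. increment-and-return):

short counter = 0;

short increment_and_get(void)
{
    /* Atomically add 1 and return the resulting value. */
    return __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
}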
The following POSIX threads example fails consistently with ++ on x86-64 and always works with __atomic_fetch_add.
main.c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

enum CONSTANTS {
    NUM_THREADS = 1000,
    NUM_ITERS = 1000
};

int global = 0;

void* main_thread(void *arg) {
    int i;
    for (i = 0; i < NUM_ITERS; ++i) {
        __atomic_fetch_add(&global, 1, __ATOMIC_SEQ_CST);
        /* This fails consistently. */
        /*global++*/;
    }
    return NULL;
}

int main(void) {
    int i;
    pthread_t threads[NUM_THREADS];
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, main_thread, NULL);
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    assert(global == NUM_THREADS * NUM_ITERS);
    return EXIT_SUCCESS;
}
Compile and run:
gcc -std=c99 -Wall -Wextra -pedantic -o main.out ./main.c -pthread
./main.out
Disassembly analysis at: How do I start threads in plain C?
Tested in Ubuntu 18.10, GCC 8.2.0, glibc 2.28.
C11 _Atomic
As of GCC 5.1, the above code works with:
_Atomic int global = 0;
global++;
And C11 threads.h was added in glibc 2.28, which allows you to create threads in pure standard C without POSIX; minimal runnable example: How do I start threads in plain C?
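A self-contained variant of the POSIX threads example above, rewritten with _Atomic (a sketch; compile with gcc -std=c11 ... -pthread):

#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

enum CONSTANTS {
    NUM_THREADS = 1000,
    NUM_ITERS = 1000
};

_Atomic int global = 0;

void* main_thread(void *arg) {
    int i;
    (void)arg;
    for (i = 0; i < NUM_ITERS; ++i)
        global++;  /* an atomic read-modify-write on an _Atomic object */
    return NULL;
}

int main(void) {
    int i;
    pthread_t threads[NUM_THREADS];
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, main_thread, NULL);
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    assert(global == NUM_THREADS * NUM_ITERS);
    return EXIT_SUCCESS;
}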
GCC supports atomic operations:
gcc atomics