I have a C program that uses the OpenMP library. I am trying to measure the execution time of a loop that adds numbers, but I am getting an error that the reference to the function omp_get_wtime() is undefined, even though I have included the library header at the top of my code file...
Why am I getting this error and how can I resolve it?
#include <omp.h>
#include <stdio.h>   // needed for printf
#include <stdlib.h>

int main() {
    // Number of threads to use
    int no_threads = 3;

    // Mark the start of execution
    double start = omp_get_wtime();

    // Parallelize the loop
    #pragma omp parallel num_threads(no_threads)
    for (int i = 0; i < 1000; i++) {
        printf("%d", i);
    }

    double end_time = omp_get_wtime();
    printf("%f", end_time - start);
}
I tried to compile even before I started using the #pragma directive and got the same error that the reference to the omp function is undefined.
You should build an OpenMP program with the -fopenmp flag:
gcc -fopenmp test.c
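The -fopenmp flag both enables the OpenMP pragmas and links the OpenMP runtime library (libgomp with GCC), which is what provides the definition of omp_get_wtime. As a minimal sketch, assuming your source file is named test.c:
gcc -fopenmp test.c -o test
./test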
I am trying to use OpenMP in code that does some complex algebra using FFTW operations. Each thread needs to work separately with its own work arrays, so I followed this answer (Reusable private dynamically allocated arrays in OpenMP) and tried to use the threadprivate functionality. The FFTW plans and workspace variables need to be used many times in the code, in multiple parallel segments.
The following is a test code. However, it gives all kinds of errors (segmentation faults, etc.) at the line p1=fftw_plan_.... Any idea what I am doing wrong? This seems to work well with plain arrays, but it is not working for the FFTW plans. Can an FFTW plan be shared by threads while each works on its separate data segments inpd, outc1, outc2?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
...
#include <math.h>
#include <complex.h>
#include <omp.h>
#include <fftw3.h>
int main(int argc, char *argv[])
{
    int npts2, id;
    static double *inpd;
    static fftw_complex *outc1, *outc2;
    static fftw_plan p1, p2;
    static int npts2stat;
    #pragma omp threadprivate(inpd,outc1,outc2,p1,p2,npts2stat)

    npts2 = 1000;

    #pragma omp parallel private(id) shared(npts2)
    {
        id = omp_get_thread_num();
        printf("Thread %d: Allocating threadprivate memory \n", id);
        npts2stat = npts2;
        printf("step1 \n");
        inpd = malloc(sizeof(double) * npts2stat);
        outc1 = fftw_malloc(sizeof(fftw_complex) * npts2stat);
        outc2 = fftw_malloc(sizeof(fftw_complex) * npts2stat);
        printf("step2 \n");
        // CODE COMPILES FINE BUT STOPS HERE WITH ERROR
        p1 = fftw_plan_dft_r2c_1d(npts2stat, inpd, outc1, FFTW_ESTIMATE);
        p2 = fftw_plan_dft_1d(npts2stat, outc1, outc2, FFTW_BACKWARD, FFTW_ESTIMATE);
        printf("step3 \n");
    }

    // multiple omp parallel segments with different threads doing some calculations on complex numbers using FFTW

    #pragma omp parallel private(id)
    {
        id = omp_get_thread_num();
        printf("Thread %d: Deallocating threadprivate memory \n", id);
        free(inpd);
        fftw_free(outc1);
        fftw_free(outc2);
        fftw_destroy_plan(p1);
        fftw_destroy_plan(p2);
    }
}
EDIT1: I seem to have solved the segmentation fault issue as shown below. However, if you have a better way of doing this, that would be helpful. For example, I don't know whether a single plan for all threads would be sufficient, given that it acts on data inpd that is private to each thread.
Sorry, the section on Thread Safety in the FFTW manual makes it pretty clear that the FFTW planner should be called by only one thread at a time. Wrapping the planner calls in a critical section seems to have solved the segmentation fault issue.
#pragma omp critical
{
    p1 = fftw_plan_dft_r2c_1d(npts2stat, inpd, outc1, FFTW_ESTIMATE);
    p2 = fftw_plan_dft_1d(npts2stat, outc1, outc2, FFTW_BACKWARD, FFTW_ESTIMATE);
}
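For what it's worth, plan creation and destruction are the only FFTW calls that need to be serialized; executing plans with fftw_execute is thread-safe. If your FFTW installation is version 3.3.5 or newer (an assumption about your system), an alternative is to make the planner itself thread-safe once, before any threads create plans, so the critical section is no longer needed:
fftw_make_planner_thread_safe();  /* call once, e.g. at the top of main */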
Is there a good way to use OpenMP to parallelize a for-loop, only if an -omp argument is passed to the program?
This seems not possible, since #pragma omp parallel for is a compiler directive and is thus evaluated at compile time, while whether the argument is passed to the program is of course only known at runtime.
At the moment I am using a very ugly solution to achieve this, which leads to an enormous duplication of code.
if (ompDefined) {
    #pragma omp parallel for
    for(...)
        ...
}
else {
    for(...)
        ...
}
I think what you are looking for can be solved using a CPU dispatcher technique.
For benchmarking OpenMP code vs. non-OpenMP code you can create different object files from the same source code like this
//foo.c
#ifdef _OPENMP
double foo_omp() {
#else
double foo() {
#endif
    double sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000000000; i++) sum += i % 10;
    return sum;
}
Compile like this
gcc -O3 -c foo.c
gcc -O3 -fopenmp -c foo.c -o foo_omp.o
This creates two object files foo.o and foo_omp.o. Then you can call one of these functions like this
//bar.c
#include <stdio.h>
double foo();
double foo_omp();
double (*fp)();
int main(int argc, char *argv[]) {
    if (argc > 1) {
        fp = foo_omp;
    }
    else {
        fp = foo;
    }
    double sum = fp();
    printf("sum %e\n", sum);
}
Compile and link like this
gcc -O3 -fopenmp bar.c foo.o foo_omp.o
Then I time the code like this
time ./a.out -omp
time ./a.out
and the first case takes about 0.4 s and the second case about 1.2 s on my system with 4 cores/8 hardware threads.
Here is a solution which only needs a single source file
#include <stdio.h>
typedef double foo_type();
foo_type foo, foo_omp, *fp;
#ifdef _OPENMP
#define FUNCNAME foo_omp
#else
#define FUNCNAME foo
#endif
double FUNCNAME () {
    double sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000000000; i++) sum += i % 10;
    return sum;
}
#ifdef _OPENMP
int main(int argc, char *argv[]) {
    if (argc > 1) {
        fp = foo_omp;
    }
    else {
        fp = foo;
    }
    double sum = fp();
    printf("sum %e\n", sum);
}
#endif
Compile like this
gcc -O3 -c foo.c
gcc -O3 -fopenmp foo.c foo.o
You can set the number of threads at run-time by calling omp_set_num_threads:
#include <omp.h>

int main()
{
    int threads = 1;
#ifdef _OPENMP
    omp_set_num_threads(threads);
#endif
    #pragma omp parallel for
    for(...)
    {
        ...
    }
}
This isn't quite the same as disabling OpenMP, but it will stop it running calculations in parallel. I've found it's always a good idea to set this using a command line switch (you can implement this using GNU getopt or Boost.ProgramOptions). This allows you to easily run single-threaded and multi-threaded tests on the same code.
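As a minimal sketch of such a switch (the -t option name and the use of POSIX getopt are my assumptions, not part of the original answer):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* POSIX getopt */
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char *argv[])
{
    int threads = 1;
    int opt;
    /* parse a hypothetical "-t N" switch selecting the thread count */
    while ((opt = getopt(argc, argv, "t:")) != -1) {
        if (opt == 't')
            threads = atoi(optarg);
    }
#ifdef _OPENMP
    omp_set_num_threads(threads);
#endif
    /* ... parallel loops as above ... */
    return 0;
}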
As Vladimir F pointed out in the comments, you can also set the number of threads by setting the environment variable OMP_NUM_THREADS before executing your program:
gcc -Wall -Werror -pedantic -O3 -fopenmp -o test test.c
export OMP_NUM_THREADS=1
./test
unset OMP_NUM_THREADS
Finally, you can disable OpenMP at compile-time by not providing GCC with the -fopenmp option. However, you will need to put preprocessor guards around any lines in your code that require OpenMP to be enabled (see above). If you want to use some functions included in the OpenMP library without actually enabling the OpenMP pragmas you can simply link against the OpenMP library by replacing the -fopenmp option with -lgomp.
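For example, a sketch for GCC (where the OpenMP runtime library is libgomp):
gcc -Wall -O3 -o test test.c -lgomp
This resolves runtime functions such as omp_get_wtime while the #pragma omp directives remain inert, since -fopenmp was not given.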
One solution would be to use the preprocessor to ignore the pragma statement if you do not pass an additional flag to the compiler.
For example in your code you might have:
#ifdef MP_ENABLED
#pragma omp parallel for
#endif
for(...)
...
and then when you compile you can pass a flag to the compiler to define the MP_ENABLED macro. In the case of GCC (and Clang) you would pass -DMP_ENABLED.
You then might compile with gcc as
gcc -fopenmp SOME_SOURCE.c -DMP_ENABLED -o SOME_OUTPUT
(note that with GCC the -fopenmp flag is needed for the pragma to take effect at all). Then, when you want to disable the parallelism, you can make a minor tweak to the compile command by dropping -DMP_ENABLED:
gcc -fopenmp SOME_SOURCE.c -o SOME_OUTPUT
This leaves the macro undefined, which leads the preprocessor to skip the pragma.
You could also use a similar solution using ifndef instead depending on whether you consider the parallel behavior the default or not.
Edit: As noted in some comments, compiling with OpenMP enabled defines the _OPENMP macro, which you could use in place of your own user-defined macro. That looks to be a superior solution, but the difference in effort is reasonably small.
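A minimal sketch using the predefined macro (no custom define is needed, since -fopenmp defines _OPENMP automatically):
#ifdef _OPENMP
#pragma omp parallel for
#endif
for(...)
    ...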
My OpenMP program is like this:
#include <stdio.h>
#include <omp.h>
int main (void)
{
    int i = 10;

    #pragma omp parallel lastprivate(i)
    {
        printf("thread %d: i = %d\n", omp_get_thread_num(), i);
        i = 1000 + omp_get_thread_num();
    }

    printf("i = %d\n", i);
    return 0;
}
Compiling it with gcc generates the following error:
# gcc -fopenmp test.c
test.c: In function 'main':
test.c:8:26: error: 'lastprivate' is not valid for '#pragma omp parallel'
#pragma omp parallel lastprivate(i)
^~~~~~~~~~~
Why does OpenMP forbid the use of lastprivate in #pragma omp parallel?
The meaning of lastprivate is to assign "the sequentially last iteration of the associated loops, or the lexically last section construct [...] to the original list item."
Hence, it has no meaning for a pure parallel construct. It would also not be a good idea to use a meaning like "the last thread to exit the parallel construct": that would be a race condition.
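For contrast, a minimal sketch of a valid use on a loop construct (my own illustration, not from the original answer):
#include <stdio.h>

int main(void)
{
    int i = 10;
    /* valid: lastprivate on a loop construct; after the loop, i holds
       the value from the sequentially last iteration, i.e. 100 */
    #pragma omp parallel for lastprivate(i)
    for (i = 0; i < 100; i++) {
        /* work */
    }
    printf("i = %d\n", i);  /* prints i = 100 */
    return 0;
}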
I am working on a machine with an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz. It supports SSE4.2.
I have written C code to perform a XOR operation over string bits. But I want to write corresponding SIMD code and check for performance improvement. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define LENGTH 10

unsigned char xor_val[LENGTH];

void oper_xor(const unsigned char *r1, const unsigned char *r2)
{
    unsigned int i;
    for (i = 0; i < LENGTH; ++i)
    {
        xor_val[i] = (unsigned char)(r1[i] ^ r2[i]);
        printf("%d", xor_val[i]);
    }
}

int main() {
    int i;
    time_t start, stop;
    double cur_time;

    start = clock();
    oper_xor((const unsigned char *)"1110001111", (const unsigned char *)"0000110011");
    stop = clock();
    cur_time = ((double) stop - start) / CLOCKS_PER_SEC;
    printf("Time used %f seconds.\n", cur_time);

    for (i = 0; i < LENGTH; ++i)
        printf("%d", xor_val[i]);
    printf("\n");
    return 0;
}
On compiling and running the sample code I get the output shown below. The time is zero here, but in the actual project it consumes significant time.
gcc xor_scalar.c -o xor_scalar
pan88: ./xor_scalar
1110111100 Time used 0.000000 seconds.
1110111100
How can I start writing corresponding SIMD code for SSE4.2?
The Intel Compiler and any OpenMP 4.0 compiler support #pragma simd and #pragma omp simd, respectively. These are your best bet to get the compiler to do SIMD code generation for you. If that fails, you can use intrinsics or, as a means of last resort, inline assembly.
Note that the printf calls will almost certainly interfere with vectorization, so you should remove them from any loops in which you want to see SIMD.
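If you go the intrinsics route, a minimal sketch of a 16-bytes-at-a-time XOR might look as follows (this uses the SSE2 intrinsics _mm_loadu_si128, _mm_xor_si128, and _mm_storeu_si128; the function name and the assumption that len is a multiple of 16 are mine, and a scalar tail loop would handle any remainder):
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

void oper_xor_sse(const unsigned char *r1, const unsigned char *r2,
                  unsigned char *out, size_t len)
{
    size_t i;
    /* process 16 bytes per iteration with unaligned loads/stores */
    for (i = 0; i < len; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(r1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(r2 + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_xor_si128(a, b));
    }
}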
I've recently started to play around with OpenMP and like it very much.
I am a just-for-fun Classic-VB programmer and like coding functions for my VB programs in C. As such, I use Windows 7 x64 and GCC 4.7.2.
I usually set up all my C functions in one large C file and then compile a DLL out of it. Now I would like to use OpenMP in my DLL.
First of all, I set up a simple example and compiled an exe file from it:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int n = 520000;
    int i;
    int a[n];
    int NumThreads;

    omp_set_num_threads(4);

    #pragma omp parallel for
    for (i = 0; i < n; i++)
    {
        a[i] = 2 * i;
        NumThreads = omp_get_num_threads();
    }

    printf("Value = %d.\n", a[77]);
    printf("Number of threads = %d.", NumThreads);
    return(0);
}
I compile that using gcc -fopenmp !MyC.c -o !MyC.exe and it works like a charm.
However, when I try to use OpenMP in my DLL, it fails. For example, I set up this function:
__declspec(dllexport) int __stdcall TestAdd3i(struct SAFEARRAY **InArr1, struct SAFEARRAY **InArr2, struct SAFEARRAY **OutArr) //OpenMP Test
{
    int LengthArr;
    int i;
    int *InArrElements1;
    int *InArrElements2;
    int *OutArrElements;

    LengthArr = (*InArr1)->rgsabound[0].cElements;

    InArrElements1 = (int*) (**InArr1).pvData;
    InArrElements2 = (int*) (**InArr2).pvData;
    OutArrElements = (int*) (**OutArr).pvData;

    omp_set_num_threads(4);

    #pragma omp parallel for private(i)
    for (i = 0; i < LengthArr; i++)
    {
        OutArrElements[i] = InArrElements1[i] + InArrElements2[i];
    }

    return(omp_get_num_threads());
}
The structs are defined, of course. I compile that using
gcc -fopenmp -c -DBUILD_DLL dll.c -o dll.o
gcc -fopenmp -shared -o mydll.dll dll.o -lgomp -Wl,--add-stdcall-alias
The compiler and linker do not complain (not even warnings come up) and the DLL file is actually built. But when I try to call the function from within VB, the VB runtime claims that the DLL file could not be found (run-time error 53). The strange thing is that as soon as one single OpenMP "command" is present inside the .c file, VB claims a missing DLL even if I call a function that does not contain a single line of OpenMP code. When I comment out all the OpenMP stuff, the function works as expected, but of course doesn't use OpenMP for parallelization.
What is wrong here? Any help appreciated, thanks in advance! :-)
The problem in this case is most probably that the OpenMP runtime library your DLL depends on (libgomp-1.dll in MinGW builds of GCC) cannot be found at load time. You must make sure the directory containing it is on the library search path (on Windows this is the PATH environment variable; LD_LIBRARY_PATH is the Unix equivalent), or the system will not be able to load your DLL and will complain that it is missing.