I'm new to Frama-C. I tried to run the value analysis plugin on the following C code with OpenMP directives:
static void kernel_2mm(int ni, int nj, int nk, int nl, float alpha,
                       float beta, float *tmp, float *A, float *B, float *C, float *D) {
  int i, j, k;
  /* D := alpha*A*B*C + beta*D */
  #pragma omp parallel for collapse(2)
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      tmp[i * nj + j] = 0.0;
      for (k = 0; k < nk; ++k)
        tmp[i * nj + j] += alpha * A[i * nk + k] * B[k * nj + j];
    }
  #pragma omp parallel for collapse(2)
  for (i = 0; i < ni; i++)
    for (j = 0; j < nl; j++) {
      D[i * nl + j] *= beta;
      for (k = 0; k < nj; ++k)
        D[i * nl + j] += tmp[i * nj + k] * C[k * nl + j];
    }
}
But I got the following errors:
rouki@rouki-VirtualBox:~/Téléchargements/frama-c$ frama-c -val 2mm_mp.c
[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i
(no preprocessing)
[kernel] Parsing 2mm_mp.c (with preprocessing)
[kernel] syntax error at 2mm_mp.c:78:
76 int i, j, k;
77 /* D := alpha*A*B*C + beta*D */
78 #pragma omp parallel for collapse(2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
79 for (i = 0; i < ni; i++)
80 for (j = 0; j < nj; j++) {
[kernel] Frama-C aborted: invalid user input.
When I tried to add the -fopenmp flag to the preprocessor options with:
frama-c -machdep gcc_x86_64 -val -cpp-command 'gcc -fopenmp -C -E -I. ' 2mm_mp.c
I got another error message:
[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i
(no preprocessing)
[kernel] warning: your preprocessor is not known to handle option `-nostdinc'.
If pre-processing fails because of it, please add
-no-cpp-frama-c-compliant option to Frama-C's command-line.
If you do not want to see this warning again, explicitly use option
-cpp-frama-c-compliant.
[kernel] warning: your preprocessor is not known to handle option `-dD'.
If pre-processing fails because of it, please add -no-cpp-frama-c-compliant
option to Frama-C's command-line.
If you do not want to see this warning again, explicitly use option
-cpp-frama-c-compliant.
[kernel] Parsing 2mm_mp.c (with preprocessing)
[kernel] warning: trying to preprocess annotation with an
unknown preprocessor.
[kernel] syntax error at 2mm_mp.c:78:
76 int i, j, k;
77 /* D := alpha*A*B*C + beta*D */
78 #pragma omp parallel for collapse(2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
79 for (i = 0; i < ni; i++)
80 for (j = 0; j < nj; j++) {
[kernel] Frama-C aborted: invalid user input.
How can I make it so that Frama-C can analyze code with OpenMP directives?
Is there a way to force Frama-C to use a compiler other than gcc (e.g. clang, pgcc)?
I use Frama-C Phosphorus-20170501, with gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5).
Answer to first question (how to make it so that Frama-C can analyze code with OpenMP?)
OpenMP pragmas are currently (up to and including Frama-C 16 Sulfur) not supported by Frama-C.
Frama-C tries to parse the pragmas it encounters; in some cases it will just ignore them, but in other cases (as in the one you encountered) it will try to parse them and fail. Such pragmas are not part of the C standard and constitute compiler extensions that are implementation-defined. Some pragmas, such as #pragma pack(), are supported by Frama-C on a case-by-case basis.
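For instance, a packing pragma like the following (a hypothetical snippet, not from the question) is parsed without complaint, while the OpenMP pragma above triggers a syntax error:

#pragma pack(1)  /* a pragma Frama-C knows how to parse */
struct pair { char tag; int value; };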
Also note that the usage of -cpp-command is no longer recommended if you can use -cpp-extra-args instead. In your case, using this would mean using -cpp-extra-args="-fopenmp". Not that it would help much here, since those pragmas are not supported anyway, but it should avoid the extra warnings you mentioned.
I'm afraid that, currently, the best solution would consist in manually commenting out such pragmas, and then trying to parse the sources again.
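For the first loop nest in the question, that would look like this:

/* #pragma omp parallel for collapse(2) */  /* commented out so Frama-C can parse the file */
for (i = 0; i < ni; i++)
  for (j = 0; j < nj; j++) {
    tmp[i * nj + j] = 0.0;
    for (k = 0; k < nk; ++k)
      tmp[i * nj + j] += alpha * A[i * nk + k] * B[k * nj + j];
  }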
Answer to second question (is there a way to force Frama-C to use a compiler other than gcc, e.g. clang or pgcc?)
Yes, and using -cpp-command as you did is indeed the way to do it. But a good understanding of the C compilation chain is helpful here. In particular, an often recommended approach to deal with some architecture-specific issues (such as custom stdlib headers and non-standard features) is to use the compiler to produce preprocessed code (e.g. gcc -E <inputs> -o file.i), and then give that file to Frama-C.
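A sketch of that two-step workflow with the file from the question (though, as noted just below, it would not help with the OpenMP pragmas, which survive preprocessing):

gcc -fopenmp -E -I. 2mm_mp.c -o 2mm_mp.i
frama-c -val 2mm_mp.i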
Note that in the case of OpenMP in particular, the pragmas used by GCC are not removed by preprocessing (which is logical, since GCC uses those pragmas after preprocessing, during the compilation itself), so it wouldn't help in your case. But it does help, for instance, when using MSVC-specific code that includes several stdlib headers from the Microsoft SDK that are incompatible with those from the GNU libc.
Finally, remember that Frama-C uses gcc (or another compiler) only for preprocessing the sources; the rest of the compilation chain is not used. Therefore, it is rarely the case that switching from GCC to Clang changes the result, since both implement very similar features in terms of preprocessing. Again, it is often possible to use exclusively -cpp-extra-args instead of -cpp-command, which was recommended mainly when -cpp-extra-args did not exist yet.
Related
I have the following 4x4 matrix-vector multiply code:
double const __restrict__ a[16];
double const __restrict__ x[4];
double __restrict__ y[4];

//#pragma GCC unroll 1 - does not work either
#pragma GCC nounroll
for ( int j = 0; j < 4; ++j )
{
    double const* __restrict__ aj = a + j * 4;
    double const xj = x[j];
    #pragma GCC ivdep
    for ( int i = 0; i < 4; ++i )
    {
        y[i] += aj[i] * xj;
    }
}
I compile with -O3 -mavx flags. The inner loop is vectorized (single FMAD). However, gcc (7.2) keeps unrolling the outer loop 4 times, unless I use -O2 or lower optimization.
Is there a way to override -O3 unrolling of a particular loop?
NB: a similar #pragma nounroll works if I use Intel icc.
According to the documentation, #pragma GCC unroll 1 is supposed to work, if you place it just so. If it doesn't then you should submit a bug report.
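A minimal sketch of that placement, with the pragma directly before the loop it should affect (assuming a GCC version that implements #pragma GCC unroll):

#pragma GCC unroll 1
for ( int j = 0; j < 4; ++j )
{
    double const* __restrict__ aj = a + j * 4;
    double const xj = x[j];
    #pragma GCC ivdep
    for ( int i = 0; i < 4; ++i )
        y[i] += aj[i] * xj;
}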
Alternatively, you can use a function attribute to set optimizations, I think:
void myfn () __attribute__((optimize("no-unroll-loops")));
For concise functions, when full and partial loop unrolling must be suppressed, try the following function attribute:
__attribute__((optimize("Os")))
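Applied to the matrix-vector loop from the question, it might look like this (mat_vec_4x4 is a hypothetical name; the attribute applies -Os, and hence no unrolling, to this function only):

/* Optimize this one function for size, which discourages unrolling. */
__attribute__((optimize("Os")))
static void mat_vec_4x4 (double const *a, double const *x, double *y)
{
    for ( int j = 0; j < 4; ++j )
    {
        double const xj = x[j];
        for ( int i = 0; i < 4; ++i )
            y[i] += a[j * 4 + i] * xj;  /* accumulate column j's contribution */
    }
}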
Consider the following toy example, where A is an n x 2 matrix stored in column-major order and I want to compute its column sums. sum_0 only computes the sum of the 1st column, while sum_1 does the 2nd column as well. This is really an artificial example, as there is essentially no need to define two functions for this task (I can write a single function with a double loop nest, where the outer loop iterates from 0 to j). It is constructed to demonstrate the template problem I have in reality.
/* "test.c" */
#include <stdlib.h>
// j can be 0 or 1
static inline void sum_template (size_t j, size_t n, double *A, double *c) {
if (n == 0) return;
size_t i;
double *a = A, *b = A + n;
double c0 = 0.0, c1 = 0.0;
#pragma omp simd reduction (+: c0, c1) aligned (a, b: 32)
for (i = 0; i < n; i++) {
c0 += a[i];
if (j > 0) c1 += b[i];
}
c[0] = c0;
if (j > 0) c[1] = c1;
}
#define macro_define_sum(FUN, j) \
void FUN (size_t n, double *A, double *c) { \
sum_template(j, n, A, c); \
}
macro_define_sum(sum_0, 0)
macro_define_sum(sum_1, 1)
If I compile it with
gcc -O2 -mavx test.c
GCC (say the latest 8.2), after inlining, constant propagation and dead code elimination, would optimize out code involving c1 for function sum_0 (Check it on Godbolt).
I like this trick. By writing a single template function and passing in different configuration parameters, an optimizing compiler can generate different versions. It is much cleaner than copying and pasting a large proportion of the code and manually defining different function versions.
However, such convenience is lost if I activate OpenMP 4.0+ with
gcc -O2 -mavx -fopenmp test.c
sum_template is no longer inlined and no dead code elimination is applied (Check it on Godbolt). But if I remove the flag -mavx to work with 128-bit SIMD, compiler optimization works as I expect (Check it on Godbolt). So is this a bug? I am on x86-64 (Sandy Bridge).
Remark
Using GCC's auto-vectorization with -ftree-vectorize -ffast-math would not have this issue (Check it on Godbolt). But I wish to use OpenMP because it allows a portable alignment pragma across different compilers.
Background
I write modules for an R package, which needs to be portable across platforms and compilers. Writing an R extension requires no Makefile. When R is built on a platform, it knows what the default compiler is on that platform, and configures a set of default compilation flags. R does not have an auto-vectorization flag, but it does have an OpenMP flag. This means that using OpenMP SIMD is the ideal way to utilize SIMD in an R package. See 1 and 2 for a bit more elaboration.
The simplest way to solve this problem is with __attribute__((always_inline)), or other compiler-specific overrides.
#ifdef __GNUC__
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
#define ALWAYS_INLINE __forceinline inline
#else
#define ALWAYS_INLINE inline // cross your fingers
#endif
ALWAYS_INLINE
static inline void sum_template (size_t j, size_t n, double *A, double *c) {
...
}
Godbolt proof that it works.
Also, don't forget to use -mtune=haswell, not just -mavx. It's usually a good idea. (However, promising aligned data will stop gcc's default -mavx256-split-unaligned-load tuning from splitting 256-bit loads into 128-bit vmovupd + vinsertf128, so code gen for this function is fine with tune=haswell. But normally you want this for gcc to auto-vectorize any other functions.)
You don't really need static along with inline; if a compiler decides not to inline it, it can at least share the same definition across compilation units.
Normally gcc decides to inline or not according to function-size heuristics. But even setting -finline-limit=90000 doesn't get gcc to inline with your #pragma omp (How do I force gcc to inline a function?). I had been guessing that gcc didn't realize that constant-propagation after inlining would simplify the conditional, but 90000 "pseudo-instructions" seems plenty big. There could be other heuristics.
Possibly OpenMP sets some per-function stuff differently in ways that could break the optimizer if it let them inline into other functions. Using __attribute__((target("avx"))) stops that function from inlining into functions compiled without AVX (so you can do runtime dispatching safely, without inlining "infecting" other functions with AVX instructions across if(avx) conditions.)
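A minimal sketch of that pattern (sum_avx is a hypothetical name, not from the question):

#include <stddef.h>

/* This function may be compiled with AVX enabled, but GCC will not
   inline it into callers that are compiled without AVX support. */
__attribute__((target("avx")))
static double sum_avx (const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}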
One thing OpenMP does that you don't get with regular auto-vectorization is that reductions can be vectorized without enabling -ffast-math.
Unfortunately OpenMP still doesn't bother to unroll with multiple accumulators or anything to hide FP latency. #pragma omp is a pretty good hint that a loop is actually hot and worth spending code-size on, so gcc should really do that, even without -fprofile-use.
So especially if this ever runs on data that's hot in L2 or L1 cache (or maybe L3), you should do something to get better throughput.
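For instance, here is a minimal sketch of manual unrolling with two accumulators (sum2 is a hypothetical name; real code would want more accumulators and SIMD on top of this):

#include <stddef.h>

/* Two independent running sums let the CPU overlap the latency of
   consecutive FP additions; they are combined once at the end. */
static double sum2 (const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* handle odd n */
        s0 += a[i];
    return s0 + s1;
}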
And BTW, alignment isn't usually a huge deal for AVX on Haswell. But 64-byte alignment does matter a lot more in practice for AVX512 on SKX. Like maybe 20% slowdown for misaligned data, instead of a couple %.
(But promising alignment at compile time is a separate issue from actually having your data aligned at runtime. Both are helpful, but promising alignment at compile time makes tighter code with gcc7 and earlier, or on any compiler without AVX.)
I desperately needed to resolve this issue, because in my real C project, if no template trick were used for auto generation of different function versions (simply called "versioning" hereafter), I would need to write a total of 1400 lines of code for 9 different versions, instead of just 200 lines for a single template.
I was able to find a way out, and am now posting a solution using the toy example in the question.
I planned to utilize an inline function sum_template for versioning. If successful, versioning occurs at compile time when the compiler performs optimization. However, the OpenMP pragma turns out to defeat this compile-time versioning. The option is then to do the versioning at the preprocessing stage using macros only.
To get rid of the inline function sum_template, I manually inline it in the macro macro_define_sum:
#include <stdlib.h>

// j can be 0 or 1
#define macro_define_sum(FUN, j) \
void FUN (size_t n, double *A, double *c) { \
  if (n == 0) return; \
  size_t i; \
  double *a = A, *b = A + n; \
  double c0 = 0.0, c1 = 0.0; \
  #pragma omp simd reduction (+: c0, c1) aligned (a, b: 32) \
  for (i = 0; i < n; i++) { \
    c0 += a[i]; \
    if (j > 0) c1 += b[i]; \
  } \
  c[0] = c0; \
  if (j > 0) c[1] = c1; \
}

macro_define_sum(sum_0, 0)
macro_define_sum(sum_1, 1)
In this macro-only version, j is directly substituted by 0 or 1 during macro expansion, whereas in the inline function + macro approach in the question, I only have sum_template(0, n, A, c) or sum_template(1, n, A, c) at the preprocessing stage, and j in the body of sum_template is only propagated later, at compile time.
Unfortunately, the above macro gives an error: a preprocessing directive can not appear inside a macro definition (see 1, 2, 3). The OpenMP pragma, which starts with #, is the problem here. So I have to split this template into two parts: the part before the pragma and the part after.
#include <stdlib.h>

#define macro_before_pragma \
  if (n == 0) return; \
  size_t i; \
  double *a = A, *b = A + n; \
  double c0 = 0.0, c1 = 0.0;

#define macro_after_pragma(j) \
  for (i = 0; i < n; i++) { \
    c0 += a[i]; \
    if (j > 0) c1 += b[i]; \
  } \
  c[0] = c0; \
  if (j > 0) c[1] = c1;

void sum_0 (size_t n, double *A, double *c) {
  macro_before_pragma
  #pragma omp simd reduction (+: c0) aligned (a: 32)
  macro_after_pragma(0)
}

void sum_1 (size_t n, double *A, double *c) {
  macro_before_pragma
  #pragma omp simd reduction (+: c0, c1) aligned (a, b: 32)
  macro_after_pragma(1)
}
I no longer need macro_define_sum; I can define sum_0 and sum_1 straight away using the two macros just defined. I can also adjust the pragma appropriately. Here, instead of having a template function, I have templates for code blocks of a function, and can reuse them with ease.
The compiler output is as expected in this case (Check it on Godbolt).
Update
Thanks for the various feedback; they are all very constructive (this is why I love Stack Overflow).
Thanks to Marc Glisse for pointing me to Using an openmp pragma inside #define. Yeah, it was my bad not to have searched this issue. #pragma is a directive, not a real macro, so there must be some way to put it inside a macro. Here is the neat version using the _Pragma operator:
/* "neat.c" */
#include <stdlib.h>
// stringizing: https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html
#define str(s) #s
// j can be 0 or 1
#define macro_define_sum(j, alignment) \
void sum_ ## j (size_t n, double *A, double *c) { \
if (n == 0) return; \
size_t i; \
double *a = A, * b = A + n; \
double c0 = 0.0, c1 = 0.0; \
_Pragma(str(omp simd reduction (+: c0, c1) aligned (a, b: alignment))) \
for (i = 0; i < n; i++) { \
c0 += a[i]; \
if (j > 0) c1 += b[i]; \
} \
c[0] = c0; \
if (j > 0) c[1] = c1; \
}
macro_define_sum(0, 32)
macro_define_sum(1, 32)
Other changes include:
I used token concatenation to generate the function names;
alignment is made a macro argument. For AVX, a value of 32 means good alignment, while a value of 8 (sizeof(double)) essentially implies no alignment. Stringizing is required to turn those tokens into the string that _Pragma requires.
Use gcc -E neat.c to inspect the preprocessing result. Compilation gives the desired assembly output (Check it on Godbolt).
A few comments on Peter Cordes' informative answer
Using the compiler's function attributes. I am not a professional C programmer. My experience with C comes merely from writing R extensions. Because of that development environment, I am not very familiar with compiler attributes. I know some, but don't really use them.
-mavx256-split-unaligned-load is not an issue in my application, because I will allocate aligned memory and apply padding to ensure alignment. I just need to promise the compiler the alignment, so that it can generate aligned load/store instructions. I do need to do some vectorization on unaligned data, but that contributes only a very limited part of the whole computation. Even if I got a performance penalty on split unaligned loads, it would not be noticed in reality. I also don't compile every C file with auto-vectorization. I only do SIMD when the operation is hot in the L1 cache (i.e., it is CPU-bound, not memory-bound). By the way, -mavx256-split-unaligned-load is for GCC; what is it for other compilers?
I am aware of the difference between static inline and inline. If an inline function is only accessed by one file, I will declare it as static so that compiler does not generate a copy of it.
OpenMP SIMD can do reduction efficiently even without GCC's -ffast-math. However, it does not use horizontal addition to aggregate results inside the accumulator register at the end of the reduction; it runs a scalar loop to add up each double (see code blocks .L5 and .L27 in the Godbolt output).
Throughput is a good point (especially for floating-point arithmetic, which has relatively high latency but high throughput). My real C code where SIMD is applied is a triple loop nest. I unroll the outer two loops to enlarge the code block in the innermost loop, to enhance throughput. Vectorization of the innermost one is then sufficient. With the toy example in this Q & A, where I just sum an array, I can use -funroll-loops to ask GCC for loop unrolling, using several accumulators to enhance throughput.
On this Q & A
I think most people would treat this Q & A in a more technical way than me. They might be interested in using compiler attributes or tweaking compiler flags/parameters to force function inlining. Therefore, Peter's answer as well as Marc's comment under it are still very valuable. Thanks again.
I am quite new to meson and C, so please forgive me if the answer to this question is trivial...
I want to use OpenMP in a C project, and I am using meson as a build tool.
I want to compile the parallel for example from this tutorial.
My main.c looks very similar:
#include <omp.h>

#define N 1000
#define CHUNKSIZE 100

int main(int argc, char *argv[]) {
  int i, chunk;
  float a[N], b[N], c[N];

  /* Some initializations */
  for (i = 0; i < N; i++)
    a[i] = b[i] = i * 1.0;
  chunk = CHUNKSIZE;

  #pragma omp parallel for \
          shared(a,b,c,chunk) private(i) \
          schedule(static,chunk)
  for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];

  return 0;
}
My short meson.build file contains this:
project('openmp_with_meson', 'c')
# add_project_arguments('-fopenmp', language: 'c')
exe = executable('some_exe', 'src/main.c') #, c_args: '-fopenmp')
I commented out the c_args keyword in the call to executable here.
Now I end up with the following scenarios:
without the '-fopenmp' option, I get the warning that the pragma is unknown and will be ignored (as I would expect): ../src/main.c:15:0: warning: ignoring pragma omp parallel [-Wunknown-pragmas] #pragma omp parallel for
with the option c_args: '-fopenmp' inserted, I do not get the above warning anymore; instead, I get errors for undefined references to GOMP_parallel, omp_get_num_threads and omp_get_thread_num, and nothing gets built
when I use gcc manually with gcc -Wall -o manually_with_gcc ../src/main.c -fopenmp the program compiles and executes without any errors.
Can anyone tell me how to get the executable to compile with meson?
Meson 0.46 or later
Meson 0.46 (released Apr 23, 2018) added OpenMP support. So, if you have meson 0.46 or later,
project('openmp_with_meson', 'c')
omp = dependency('openmp')
exe = executable('some_exe', 'src/main.c',
dependencies : omp)
This should work with both GCC and Clang.
Meson 0.45 or earlier
If you happen to have an older version (e.g. on Debian Stretch, Ubuntu Bionic (18.04 LTS), or Fedora 27), you can do the following:
You need another keyword arg link_args : '-fopenmp' for executable().
exe = executable('some_exe', 'src/main.c',
c_args: '-fopenmp',
link_args : '-fopenmp')
Meson builds a C program in two phases: compiling and linking. You can pass extra arguments with c_args for compiling and link_args for linking.
The option -fopenmp enables OpenMP directives while compiling, and
the flag also arranges for automatic linking of the OpenMP runtime
library.
That is, -fopenmp is a dual-purpose option.
The above is simple and good. Once you understand it, however, you can also compile your program with -fopenmp to activate the OpenMP directives, and link the OpenMP library yourself without passing -fopenmp to link_args.
Here is a complete meson.build:
project('openmp_with_meson', 'c')
cc = meson.get_compiler('c')
libgomp = cc.find_library('gomp')
exe = executable('some_exe', 'src/main.c',
c_args: '-fopenmp',
dependencies : libgomp)
Meson >= 0.46 now has a builtin for this (docs):
openmp = dependency('openmp') # meson builtin
I'm having a bit of trouble compiling and running my .c file in terminal. First, when compiling, I see:
HW3.c: In function ‘main’:
HW3.c:87:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
for(int j = 0; j < 10; j++) {
^
HW3.c:87:5: note: use option -std=c99 or -std=gnu99 to compile your code
HW3.c:100:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
for(int j = 0; j < 10; j++) {
^
All of my variables are declared and assigned at the beginning of the program, including j, so I'm not sure why I'm seeing an error about 'for' loop initial declarations.
Secondly, when attempting to run my program, I type:
./a.out HW3.c
and see error
./a.out: Command not found.
What could possibly be the issue here? Is it not running because of the error in compiling? I'm sure I have the command right, right..? Let me know if you need to see the whole program to help, it's not too long, I could copy it over. Thanks!
If j is already declared at the beginning of the program, then remove the int part of for (int j:
for(j = 0; j < 10; j++) {
You can declare j inside the for loop, as you seem to have attempted to do, but you would need to tell your compiler to support a newer revision of the C standard.
You need to target a more recent C standard revision in your compiler options. Try adding the flag -std=c99 and it should work.
As for your second problem, a.out is the executable produced by the compiler. If there are errors in the program, it won't produce an executable, so you have to fix the errors.
You can also specify the name of the executable with the -o flag:
gcc -std=c99 HW3.c -o HW3
This will produce an executable named HW3.
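Then run it directly, without passing the source file as an argument:

./HW3

(Your original ./a.out HW3.c failed simply because no executable had been produced; the HW3.c argument would have been ignored anyway.)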
I am having an extraordinary problem with my current Code::Blocks (GNU GCC compiler) setup. Some GSL functions run fine, but for some reason the program has great problems when it executes certain other GSL functions.
For example, I have lifted the following code from this location:
https://www.gnu.org/software/gsl/manual/html_node/Example-programs-for-matrices.html
I assume that, because the code comes from the official GNU website, it is correct:
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>

int
main (void)
{
  size_t i, j;
  gsl_matrix *m = gsl_matrix_alloc (10, 10);

  for (i = 0; i < 10; i++)
    for (j = 0; j < 10; j++)
      gsl_matrix_set (m, i, j, sin (i) + cos (j));

  for (j = 0; j < 10; j++)
    {
      gsl_vector_view column = gsl_matrix_column (m, j);
      double d;

      d = gsl_blas_dnrm2 (&column.vector);

      printf ("matrix column %d, norm = %g\n", j, d);
    }

  gsl_matrix_free (m);
  return 0;
}
From debugging, I have learned that the source of the error is the following line:
d = gsl_blas_dnrm2 (&column.vector);
The program crashes at this point and prints the following error message:
Process returned -1073741819 <0xC0000005>
I have spent a lot of time trying to discover the source of the bug but have sadly not had much success. I am generally not sure why there is a crash at all. The debugger prints no warnings or error messages.
I'm going to suggest that perhaps you have mismatched headers and libraries. Perhaps there is more than one version of GSL installed: you have the headers from one version referenced in your source, and the linker is referencing the libs from another version.
I went looking up the typedef of gsl_vector_view and ended up here, and you may even be able to discern that that version doesn't support the vector member of that struct type.
You will typically get this 0xC0000005 error when you use some uninitialised pointer. It's not that your pointer is uninitialised here, though... I'd say what's happening is that the 'vector' bit of &column.vector is being interpreted as something other than what is intended.
In summary, I think this is some kind of environmental issue, perhaps with your linker settings? There are some details on how to configure these here: http://www.learncpp.com/cpp-tutorial/a3-using-libraries-with-codeblocks/
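As a quick sanity check outside Code::Blocks, you could also try building against a single known GSL installation from the command line (file and output names here are hypothetical):

gcc -Wall yourfile.c -o gsl_test -lgsl -lgslcblas -lm
./gsl_test

If that runs cleanly, the IDE's include and library path configuration is the prime suspect.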
Hope this helps.