MKL BLAS functions not behaving as expected

I can't get Intel MKL to work as it should from C.
I have the following test program:
#include "stdafx.h"
#include "mkl.h"
int main()
{
    int one = 1;
    int ten = 10;
    double copy[10];
    double threes[10];
    for (int i = 0; i < 10; ++i) threes[i] = 3;
    dcopy(&ten, threes, &one, copy, &one);
    double l1norm;
    l1norm = dasum(&ten, threes, &one);
    return 0;
}
It builds and links fine but does not do what I intended. Specifically, at the return line the array "copy" still holds whatever it contained when it was declared, and l1norm is equal to 0.
I am linking to the libraries mkl_blas95_ilp64.lib, mkl_core_dll.lib, mkl_intel_ilp64_dll.lib and mkl_intel_thread_dll.lib.
I'm also getting similar problems when running third-party code that calls MKL, so I assume the problem is how I have the build configured (in Visual Studio 2015).
The equivalent Fortran program runs fine.

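Please check the libraries you link when porting from Fortran to C/C++. MKL requires different libraries and compiler flags depending on the compiler and settings. At the very least, mkl_blas95_ilp64.lib (the Fortran 95 interface) is not required with a C compiler.
Also, ILP64 is much less common than the default LP64 model. The ILP64 interface expects 64-bit integer arguments, so a C program that passes pointers to 32-bit int, as yours does, should either link the LP64 interface library (mkl_intel_lp64_dll.lib) or be compiled with MKL_ILP64 defined and use MKL_INT for its integer arguments.
Intel provides the MKL Link Line Advisor for exactly this problem. You can use it to check that your libraries and compiler flags are correct.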
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
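
To illustrate why the integer model matters, here is a minimal sketch that does not use MKL at all. It only mimics what a routine built for 64-bit integers would see if it were handed the address of a 32-bit int. The helper name read_as_ilp64 and the "neighbour" value are made up for the illustration; what actually sits next to a variable such as ten in a real program depends on the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: place a 32-bit value next to whatever happens to
 * follow it in memory, then read 8 bytes starting at its address, the way
 * a library expecting 64-bit integers would. */
int64_t read_as_ilp64(int32_t value, int32_t neighbour)
{
    int32_t slot[2] = { value, neighbour };
    int64_t seen;

    /* memcpy keeps the reinterpretation well defined in C */
    memcpy(&seen, slot, sizeof seen);
    return seen;
}
```

On a little-endian machine, read_as_ilp64(10, 1) yields 4294967306 rather than 10, so a length or stride argument read this way can easily come out huge, zero or negative. That would be consistent with dcopy copying nothing and dasum returning 0, although I have not verified that this is what happens in the program above.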

Related

How to link a lib written in D to use it with a program written in C, under Windows, using MinGW GCC?

I would like to use a library written in D from a C program compiled with MinGW GCC, on Windows. Here is the code:
dll.d
extern (C) int dsquare(int n) nothrow
{
    return n * n;
}
main.c
#include <stdio.h>
int main()
{
    int res = dsquare(6); // Expect '36'
    printf("res = %d\n", res);
    return 0;
}
There is a tutorial on D's site, but it seems to target only Linux. No explanation is given for creating such a dynamic D library for Windows and MinGW users.
D's documentation also says that the -shared option should generate a DLL version of the D code, but in my case it generates an executable, and I don't know why.
Also, everything that generates linkable files seems to target MSVC formats, and nothing seems to be suitable for MinGW GCC compilers.
So, how can I generate a "GCC-friendly" DLL with D, so that I can link it to my C program (via gcc main.c -o main -ldll -L., I guess) without having to use another compiler such as GDC or LDC?

I have attached a link with a short explanation. Linking D into C is not as straightforward as linking C into D. See the D documentation page here:
https://dlang.org/spec/betterc.html

Force gcc to use syscalls

I am currently learning assembly language (AT&T syntax). gcc can generate assembly code from C code with the -S option, and I would like to use that to see how some code looks in assembly. The problem is that in our labs we build with as + ld and, for now, cannot use the C library, so, for example, we cannot use printf. We have to do everything with syscalls (32-bit is enough). I have this C code:
#include <stdio.h>
int main()
{
    int a = 5;
    int b = 3;
    int c = a + b;
    printf("%d", c);
    return 0;
}
This is simple code, so I know how it would look with syscalls. But for more complicated code, I don't want to go through the generated assembly replacing every call to printf and adjusting registers by hand, because gcc generated the code for a printf call and I need it done with syscalls. So can I somehow make gcc generate assembly that uses syscalls (for example for console and file I/O) instead of C library calls?

Under Linux there is a family of macros, _syscallX, that generate a syscall wrapper, where X is the number of parameters. It is marked as obsolete, but IMHO it still works. For example, the following code should work (not tested here):
_syscall3(int, syswrite, int, handle, char *, str, int, len);
// ---
char str[] = "Hello, world!\n";
// file handle 1 is stdout
syswrite(1, str, 14);
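
If the _syscallX macros are not available in your headers (as far as I know, newer kernel headers no longer ship them to user space), the generic syscall() wrapper is the usual substitute. The following is only a sketch, assuming Linux with glibc; the name sys_write is my own. Note that syscall() is itself a C library function, so gcc -S will show a call to it rather than a raw trap instruction, but its arguments correspond one-to-one to what you would load into registers by hand.

```c
#define _GNU_SOURCE        /* makes sure syscall() is declared */
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical wrapper: invoke the write system call by number,
 * bypassing stdio entirely. Returns the number of bytes written,
 * or -1 on error (with errno set by syscall()). */
long sys_write(int fd, const char *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling sys_write(1, "Hello, world!\n", 14) writes the string to stdout (file descriptor 1).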

cacosf (Complex arc cos) function in C returns indefinite

I have an algorithm coded in MATLAB that involves the complex arc cosine of some value (the computation requires arccos of 15, which is approximately 3.4i). I want to write a C or C++ counterpart of this code to run on my Windows 7 PC. Specifically, I want to build it as a mex function compiled with Visual Studio C++.
I included "complex.h" and used the cacosf function (complex arccos returning float _Complex), but I could not compile it as a mex function because the Visual C++ compiler does not support "complex.h". However, a mex file can take libraries as input, so I can compile my C code with another compiler that MATLAB does support (for example MinGW, which I integrated into MATLAB with the gnumex utility). I downloaded the Bloodshed C++ IDE, which uses MinGW as its backend, and with it I can compile my C++ code. The following C++ code performs an operation similar to my goal:
#include <stdio.h>
#include <complex.h>
int main() {
    float _Complex myComplex;
    myComplex = cacosf(5);
    printf("Complex number result of acos(5) is : %f + %fi \r\n",crealf(myComplex),cimagf(myComplex));
    return 0;
}
The output should be:
Complex number result of acos(5) is : 0.000000 + -2.292432i
However, I get:
Complex number result of acos(5) is : -1.#IND00, -0.000000
When I compile my C++ code with GCC on an Ubuntu 14.04 computer using Eclipse CDT Luna, I get the expected output:
Complex number result of acos(5) is : 0.000000 + -2.292432i
Where am I going wrong? Why does this code not work with the Windows + MinGW setup?
Note: With MinGW, cacosf(0) is computed correctly as 1.570796 + -0.000000.

What version of mingwrt are you using? With mingwrt-3.21.1, the following works for me, (cross-compiling on a Linux host, and running under wine):
$ cat foo.c
#include <stdio.h>
#include <complex.h>
int main()
{
    double _Complex Z = cacos(5.0);
    printf( "arcos(5) = (%g, %gi)\n", __real__ Z, __imag__ Z );
    return 0;
}
$ mingw32-gcc -o foo.exe foo.c
$ ./foo.exe
arcos(5) = (0, -2.29243i)
This seems to be consistent with your expected result. However, if you use any mingwrt version pre-dating mingwrt-3.21 (and the less said about the utterly broken mingwrt-4.x the better), there is a known bug: any purely real cacos() argument greater than (1.0, 0.0i) is arbitrarily deemed to be outside the valid domain (as it would be for acos() on its real part), which yields the result you report.

Visual C++, as the name says, is a C++ compiler. C++ uses the <complex> header and std::complex<float> type. Since C++ has overloading, you can call std::acos for complex values too.
Your code is in fact C99, and the _Complex types from <complex.h> are not supported by MSVC++, whose C support has long remained at roughly the C89 level.

Why does this SIMD example code in C compile with minGW but the executable doesn't run on my windows machine?

I'm learning the basics of SIMD so I was given a simple code snippet to see the principle at work with SSE and SSE2.
I recently installed minGW to compile C code in windows with gcc instead of using the visual studio compiler.
The objective of the example is to add two floats and then multiply by a third one.
The headers included are the following (which I guess are used to be able to use the SSE intrinsics):
#include <stdio.h>
#include <stdlib.h> // for rand()
#include <xmmintrin.h>
#include <pmmintrin.h>
#include <time.h>
#include <sys/time.h> // for timing
Then I have a function to check what time it is, to compare time between calculations:
double now(){
    struct timeval t; double f_t;
    gettimeofday(&t, NULL);
    f_t = t.tv_usec; f_t = f_t/1000000.0; f_t +=t.tv_sec;
    return f_t;
}
The function to do the calculation in the "scalar" sense is the following:
void run_scalar(){
    unsigned int i;
    for( i = 0; i < N; i++ ){
        rs[i] = (a[i]+b[i])*c[i];
    }
}
Here is the code for the sse2 function:
void run_sse2(){
    unsigned int i;
    __m128 *mm_a = (__m128 *)a;
    __m128 *mm_b = (__m128 *)b;
    __m128 *mm_c = (__m128 *)c;
    __m128 *mm_r = (__m128 *)rv;
    for( i = 0; i <N/4; i++)
        mm_r[i] = _mm_mul_ps(_mm_add_ps(mm_a[i],mm_b[i]),mm_c[i]);
}
The vectors are defined the following way (N is the size of the vectors and it is defined elsewhere) and a function init() is called to initialize them:
float a[N] __attribute__((aligned(16)));
float b[N] __attribute__((aligned(16)));
float c[N] __attribute__((aligned(16)));
float rs[N] __attribute__((aligned(16)));
float rv[N] __attribute__((aligned(16)));
void init(){
    unsigned int i;
    for( i = 0; i < N; i++ ){
        a[i] = (float)rand () / RAND_MAX / N;
        b[i] = (float)rand () / RAND_MAX / N;
        c[i] = (float)rand () / RAND_MAX / N;
    }
}
Finally here is the main that calls the functions and prints the results and computing time.
int main(){
    double t;
    init();
    t = now();
    run_scalar();
    t = now()-t;
    printf("S = %10.9f Temps du code scalaire : %f seconde(s)\n",1e5*sum(rs),t);
    t = now();
    run_sse2();
    t = now()-t;
    printf("S = %10.9f Temps du code vectoriel 2: %f seconde(s)\n",1e5*sum(rv),t);
}
For some reason, if I compile this code with the command line "gcc -o vec vectorial.c -msse -msse2 -msse3" or "mingw32-gcc -o vec vectorial.c -msse -msse2 -msse3", it compiles without any problems, but I can't run it on my Windows machine: in the command prompt I get "Access is denied", and a big message appears on the screen saying "This app can't run on your PC. To find a version for your PC, check with the software publisher".
I don't really understand what is going on, neither do I have much experience with MinGW or C (just an introductory course to C++ done on Linux machines). I've tried playing around with different headers because I thought maybe I was targeting a different processor than the one on my PC but couldn't solve the issue. Most of the info I found was confusing.
Can someone help me understand what is going on? Is it a problem in the minGW configuration that is compiling in targeting a Linux platform? Is it something in the code that doesn't have the equivalent in windows?
I'm trying to run it on a 64 bit Windows 8.1 pc
Edit: I tried the configuration suggested on the site linked below. The output remains the same.
If I try to run it through MSYS, I get "Bad file number".
If I try to run it through the command prompt, I get "Access is denied".
I'm guessing there's some sort of bug arising from permissions. I tried turning off the antivirus and User Account Control, but still no luck.
Any ideas?

There is nothing wrong with your code, apart from the missing definitions of sum() and N, which is not a problem here. The switches -msse -msse2 do not appear to be required.
I was able to compile and run your code on Linux (Ubuntu x86_64, compiled with gcc 4.8.2 and 4.6.3, on Atom D2700 and AMD Athlon LE-1640) and Windows7/64 (compiled with gcc 4.5.3 (32bit) and 4.8.2 (64bit), on Core i3-4330 and Core i7-4960X). It was running without problem.
Are you sure your CPU supports the required instructions? What exactly was the error code you got? Which MinGW configuration did you use? Out of curiosity, I used the one available at http://win-builds.org/download.html which was very straight-forward.
However, using the optimization flag -O3 created the best result -- with the scalar loop! Also useful are -m64 -mtune=native -s.
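
For anyone who wants to try this without the missing pieces, here is a self-contained sketch of the same computation. It is my own guess at how the parts could look, not the original code: add_mul and this two-argument sum() are hypothetical, the arrays and their length are passed in instead of relying on a global N, and unaligned loads are used so the data does not need 16-byte alignment. When SSE is not available, only the scalar loop runs.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_loadu_ps, _mm_add_ps, _mm_mul_ps, _mm_storeu_ps */
#endif

/* out[i] = (a[i] + b[i]) * c[i]; n does not have to be a multiple of 4 */
void add_mul(const float *a, const float *b, const float *c,
             float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* process four floats per iteration */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_add_ps(va, vb), vc));
    }
#endif
    /* scalar loop for the remaining elements (or for all of them without SSE) */
    for (; i < n; i++)
        out[i] = (a[i] + b[i]) * c[i];
}

/* Hypothetical stand-in for the sum() that the question does not show */
float sum(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Comparing sum() over the scalar and vector results, as the main() in the question does, is a simple way to check that both paths agree.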

Why is clang optimizing out my array even when using -O0 flag?

I am trying to debug the following C Program using GDB:
// Program to generate a user specified number of
// fibonacci numbers using variable length arrays
// Chapter 7 Program 8 2013-07-14
#include <stdio.h>
int main(void)
{
    int i, numFibs;
    printf("How many fibonacci numbers do you want (between 1 and 75)?\n");
    scanf("%i", &numFibs);
    if (numFibs < 1 || numFibs > 75)
    {
        printf("Between 1 and 75 remember?\n");
        return 1;
    }
    unsigned long long int fibonacci[numFibs];
    fibonacci[0] = 0; // by definition
    fibonacci[1] = 1; // by definition
    for(i = 2; i < numFibs; i++)
        fibonacci[i] = fibonacci[i-2] + fibonacci[i-1];
    for(i = 0; i < numFibs; i++)
        printf("%llu ", fibonacci[i]);
    printf("\n");
    return 0;
}
The issue arises when I compile the code using:
clang -ggdb3 -O0 -Wall -Werror 7_8_FibonacciVarLengthArrays.c
and then run gdb on the resulting a.out file and step through the program. Any time after the fibonacci[] array is declared, if I type:
info locals
the result says fibonacci <value optimized out> (until after the first iteration of my for loop), after which fibonacci holds the address 0xbffff128 for the rest of the program (but the memory at that address does not appear to contain any meaningful data).
I am confused as to why clang appears to be optimizing out this array when the -O0 flag is used.
I can use gcc to compile this code and the value displays as expected in GDB.
Any thoughts?
Thank you.

You don't mention which version of clang you are using. I tried it with both 3.2 and a recent SVN install (3.4).
The code generated by the two versions looks pretty similar to me, but the debugging information is different. The clang 3.2 (which comes from a default ubuntu 13.04 install) produces an error when I try to examine fibonacci in gdb:
fibonacci = <error reading variable fibonacci (DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.)>
In the code compiled with clang 3.4, it all works fine. In neither case is the array "optimized out"; it's clearly allocated on the stack.
So I suspect the oddity that you're seeing has more to do with the emission of debugging information than with the actual code.

gdb does not yet support debugging stack allocated variable-length arrays. See https://sourceware.org/gdb/wiki/VariableLengthArray
Use a compile time constant or malloc to allocate fibonacci so that it will be visible to gdb.
See also GDB reports "no symbol in current context" upon array initialization
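
As a sketch of the malloc route (the function name make_fibonacci is mine, not from the question), the array could be built like this. It also avoids writing to element 1 when only one number is requested.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated array holding the first
 * `count` Fibonacci numbers, or NULL if count < 1 or allocation fails.
 * The caller is responsible for calling free() on the result. */
unsigned long long *make_fibonacci(int count)
{
    if (count < 1)
        return NULL;

    unsigned long long *fib = malloc((size_t)count * sizeof *fib);
    if (fib == NULL)
        return NULL;

    fib[0] = 0;              /* by definition */
    if (count > 1)
        fib[1] = 1;          /* by definition */
    for (int i = 2; i < count; i++)
        fib[i] = fib[i - 2] + fib[i - 1];

    return fib;
}
```

Since the result is an ordinary pointer, gdb can show its contents with something like print *fib@count once the pointer is stored in a local variable named fib.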

clang is not "optimizing out" the array at all! The array is declared as a variable-length array on the stack, so it has to be explicitly allocated (using techniques similar to those used by alloca()) when its declaration is reached. The starting address of the array is unknown until that process is complete.
