My professor sent out test code to run against our program. However, the test code itself dies with a segmentation fault as soon as it runs. The crash happens on the first printf; if that line is commented out, it just occurs on the next line. It sounds like the code works fine for him, so I'm trying to figure out why it's failing for me. I know he's using C while I'm using C++, but even when I compile the test code with gcc instead of g++ it still fails. Anyone know why I might be having problems? Thanks! The code is below.
#include <stdio.h>
main()
{ double A[400000][4], b[400000], c[4] ;
double result[4];
int i, j; double s, t;
printf("Preparing test: 4 variables, 400000 inequalities\n");
A[0][0] = 1.0; A[0][1] = 2.0; A[0][2] = 1.0; A[0][3] = 0.0; b[0] = 10000.0;
A[1][0] = 0.0; A[1][1] = 1.0; A[1][2] = 2.0; A[1][3] = 1.0; b[1] = 10000.0;
A[2][0] = 1.0; A[2][1] = 0.0; A[2][2] = 1.0; A[2][3] = 3.0; b[2] = 10000.0;
A[3][0] = 4.0; A[3][1] = 0.0; A[3][2] = 1.0; A[3][3] = 1.0; b[3] = 10000.0;
c[0]=1.0; c[1]=1.0; c[2]=1.0; c[3]=1.0;
for( i=4; i< 100000; i++ )
{ A[i][0] = (12123*i)%104729;
A[i][1] = (47*i)%104729;
A[i][2] = (2011*i)%104729;
A[i][3] = (7919*i)%104729;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 1 + (i%137);
}
A[100000][0] = 0.0; A[100000][1] = 6.0; A[100000][2] = 1.0;
A[100000][3] = 1.0; b[100000] = 19.0;
for( i=100001; i< 200000; i++ )
{ A[i][0] = (2323*i)%101111;
A[i][1] = (74*i)%101111;
A[i][2] = (2017*i)%101111;
A[i][3] = (7915*i)%101111;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 2 + (i%89);
}
A[200000][0] = 5.0; A[200000][1] = 2.0; A[200000][2] = 0.0;
A[200000][3] = 1.0; b[200000] = 11.0;
for( i=200001; i< 300000; i++ )
{ A[i][0] = (23123*i)%100003;
A[i][1] = (47*i)%100003;
A[i][2] = (2011*i)%100003;
A[i][3] = (7919*i)%100003;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 2 + (i%57);
}
A[300000][0] = 1.0; A[300000][1] = 2.0; A[300000][2] = 1.0;
A[300000][3] = 3.0; b[300000] = 20.0;
A[300001][0] = 1.0; A[300001][1] = 0.0; A[300001][2] = 5.0;
A[300001][3] = 4.0; b[300001] = 32.0;
A[300002][0] = 7.0; A[300002][1] = 1.0; A[300002][2] = 1.0;
A[300002][3] = 7.0; b[300002] = 40.0;
for( i=300003; i< 400000; i++ )
{ A[i][0] = (13*i)%103087;
A[i][1] = (99*i)%103087;
A[i][2] = (2012*i)%103087;
A[i][3] = (666*i)%103087;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 1;
}
printf("Running test: 400000 inequalities, 4 variables\n");
//j = rand_lp(40, &(A[0][0]), &(b[0]), &(c[0]), &(result[0]));
printf("Test: extremal point (%f, %f, %f, %f) after %d recomputation steps\n",
result[0], result[1], result[2], result[3], j);
printf("Answer should be (1,2,3,4)\n End Test\n");
}
Try to change:
double A[400000][4], b[400000], c[4] ;
to
static double A[400000][4], b[400000], c[4] ;
Your declaration of the A array has automatic storage duration, which on most systems means it is stored on the stack. Together with b, those arrays need roughly 16 MB, which is almost certainly more than the stack limit for your process, so you hit a stack overflow.
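To see how much automatic storage those declarations ask for, here is a quick sketch (it only computes sizes with sizeof, so it does not itself allocate anything on the stack):
#include <stdio.h>
int main(void)
{
    /* sizes of the arrays declared at the top of the test's main() */
    size_t bytes = sizeof(double[400000][4])   /* A: 12,800,000 bytes */
                 + sizeof(double[400000])      /* b:  3,200,000 bytes */
                 + 2 * sizeof(double[4]);      /* c and result        */
    printf("automatic storage needed: %zu bytes (about %zu MB)\n",
           bytes, bytes / (1024 * 1024));
    return 0;
}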
On Linux, you can run the ulimit command:
$ ulimit -s
8192
$
to see the stack size in kB allocated for a process. For example, 8192 kB on my machine.
You have overflowed the limits of the stack. Your prof declares about 15 MB of data in main's stack frame. That's just too big.
Since the lifetime of an object declared at the top of main is essentially the entire program, just declare the objects as static. That way they'll live in the (much larger) data segment and have effectively the same lifetime.
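If you'd rather not make them static, heap allocation works just as well. A minimal sketch of that alternative (not the prof's code, just an illustration of the idea):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    /* put the big arrays on the heap instead of main's stack frame */
    double (*A)[4] = malloc(400000 * sizeof *A);   /* 400000 rows of 4 doubles */
    double *b = malloc(400000 * sizeof *b);
    if (A == NULL || b == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    /* ... fill A and b and run the test exactly as before ... */
    free(A);
    free(b);
    return 0;
}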
Try changing this line:
double A[400000][4], b[400000], c[4] ;
to this:
static double A[400000][4], b[400000], c[4] ;
I am trying to optimize a DAXPY procedure as best I can. However, there is a mistake in the following code that I cannot spot:
#include <stdio.h>
#include <time.h>
void daxpy(int n, double a, double *x, double *y) {
int i;
double y0, y1, y2, y3, x0, x1, x2, x3;
// loop unrolling
for (i = 0; i < n; i += 4) {
// multiple accumulating registers
x0 = x[i];
x1 = x[i + 1];
x2 = x[i + 2];
x3 = x[i + 3];
y0 = y[i];
y1 = y[i + 1];
y2 = y[i + 2];
y3 = y[i + 3];
y0 += a * x0;
y1 += a * x1;
y2 += a * x2;
y3 += a * x3;
y[i] = y0;
y[i + 1] = y1;
y[i + 2] = y2;
y[i + 3] = y3;
}
}
int main() {
int n = 10000000;
double a = 2.0;
double x[n];
double y[n];
int i;
for (i = 0; i < n; ++i) {
x[i] = (double)i;
y[i] = (double)i;
}
clock_t start = clock();
daxpy(n, a, x, y);
clock_t end = clock();
double time_elapsed = (double)(end - start) / CLOCKS_PER_SEC;
printf("Time elapsed: %f seconds\n", time_elapsed);
return 0;
}
When executing the binary I am getting the following error message:
Segmentation fault: 11
Using LLDB I am getting:
Process 89219 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7ff7bb2b2cc0)
frame #0: 0x0000000100003e4f daxpy.x`main at daxpy.c:38:14
35 double y[n];
36 int i;
37 for (i = 0; i < n; ++i) {
-> 38 x[i] = (double)i;
39 y[i] = (double)i;
40 }
41 clock_t start = clock();
Target 0: (daxpy.x) stopped.
But I can't see what the mistake is on lines 38 and 39. Any help?
Assuming sizeof(double) == 8, you are trying to allocate 160 MB on the stack (two arrays of 10,000,000 doubles each). The stack is usually nowhere near that big.
You need to allocate the arrays on the heap instead:
size_t n = 10000000;
double a = 2.0;
double *x = malloc(sizeof(*x) * n);
double *y = malloc(sizeof(*y) * n);
The loop for (i = 0; i < n; ++i) and the rest of the code can stay as they are.
A separate thing to watch out for: the unrolled loop in daxpy() steps i by 4 and then reads
y1 = y[i + 1];
y2 = y[i + 2];
y3 = y[i + 3];
What happens on the last iteration if n, the array size, is not a multiple of 4? You may access the array outside its bounds.
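Putting it together, a minimal sketch of main() with the arrays moved to the heap (the daxpy() function from the question is unchanged and assumed to be declared above; here n is a multiple of 4, so the unrolled loop stays in bounds):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void) {
    int n = 10000000;                      /* multiple of 4 */
    double a = 2.0;
    double *x = malloc(sizeof(*x) * n);    /* heap instead of ~160 MB of stack */
    double *y = malloc(sizeof(*y) * n);
    int i;
    if (x == NULL || y == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (i = 0; i < n; ++i) {
        x[i] = (double)i;
        y[i] = (double)i;
    }
    clock_t start = clock();
    daxpy(n, a, x, y);
    clock_t end = clock();
    printf("Time elapsed: %f seconds\n", (double)(end - start) / CLOCKS_PER_SEC);
    free(x);
    free(y);
    return 0;
}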
I'm trying to perform a simple matrix-times-vector multiplication, and for some reason I am getting the wrong sign in a couple of my results. I have no idea why this is happening; any pointers would be greatly appreciated.
Here is my whole code, i.e. the matrix-times-vector function and the mex gateway function. I'm running the code from Matlab via mex.
#include "mex.h"
void mxv(int m, int n, double *A, double *b, double *c) {
double sum;
int i, j;
for (i = 0; i < m; i++) {
sum = 0.0;
for (j = 0; j < n; j++) {
sum += A[i * n + j] * b[j];
}
c[i] = sum;
}
}
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
double *A, *b, *c;
int i, j, Am, An;
A = mxGetPr(prhs[0]);
Am = (int)mxGetM(prhs[0]);
An = (int)mxGetN(prhs[0]);
c = malloc(Am * sizeof(double));
b = mxGetPr(prhs[1]);
mxv(Am, An, A, b, c);
for (i = 0; i < Am; i++)
printf("c[%d] = %1.4f\n", i, c[i]);
}
I call the mex function with the following inputs:
A = [-865.6634 0 0 0;
0 -17002.6822 0 0;
0 0 -1726.2421 2539.6267;
0 0 -2539.6267 -1726.2421;]
b = [-0.00153521; -0.00011165; -0.00037659; 0.00044981]
The correct result should be:
1.3290
1.8983
1.7924
0.1799
But I get
c[0] = 1.3290
c[1] = 1.8983
c[2] = -0.4923
c[3] = -1.7329
So the first two are correct (c[0] and c[1]), but not the latter two.
I added a bunch of print statements into my code to try and figure out where the error occurs:
#include "mex.h"
void mxv(int m, int n, double *A, double *b, double *c) {
double sum;
int i, j;
for (i = 0; i < m; i++) {
sum = 0.0;
printf("********\n");
for (j = 0; j < n; j++) {
printf("A[%d][%d] = %1.10f\nb[%d] = %1.10f\n", i , j, A[i + n * j], j, b[j]);
printf("A[%d][%d]*b[%d] = %1.10f\n", i, j, j, A[i * n + j] * b[j]);
sum += A[i * n + j] * b[j];
printf("sum = %1.10f\n", sum);
}
c[i] = sum;
}
}
void mexFunction(int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[]) {
double *A, *b, *c;
int i, j, Am, An;
A = mxGetPr(prhs[0]);
Am = (int)mxGetM(prhs[0]);
An = (int)mxGetN(prhs[0]);
printf("size(A) = (%d,%d)\n", Am, An);
for (i = 0; i < Am; i++) {
for (j = 0; j < An; j++) {
printf("A[%d][%d] = %1.4f\n", i, j, A[i + Am * j]);
}
}
c = malloc(Am *sizeof(double));
b = mxGetPr(prhs[1]);
for (i = 0; i < Am; i++) {
printf("b[%d] = %1.4f\n", i, b[i]);
}
mxv(Am, An, A, b, c);
for (i = 0; i < Am; i++)
printf("c[%d] = %1.4f\n", i, c[i]);
}
I'm pretty confident that I'm getting the right inputs from Matlab:
size(A) = (4,4)
A[0][0] = -865.6634
A[0][1] = 0.0000
A[0][2] = 0.0000
A[0][3] = 0.0000
A[1][0] = 0.0000
A[1][1] = -17002.6822
A[1][2] = 0.0000
A[1][3] = 0.0000
A[2][0] = 0.0000
A[2][1] = 0.0000
A[2][2] = -1726.2421
A[2][3] = 2539.6267
A[3][0] = 0.0000
A[3][1] = 0.0000
A[3][2] = -2539.6267
A[3][3] = -1726.2421
b[0] = -0.0015
b[1] = -0.0001
b[2] = -0.0004
b[3] = 0.0004
But when I look into the matrix-vector multiplication, I find that some of the multiplication results have the wrong sign. This is what it prints out for i = 2:
********
A[2][0] = 0.0000000000
b[0] = -0.0015352100
A[2][0]*b[0] = -0.0000000000
sum = 0.0000000000
A[2][1] = 0.0000000000
b[1] = -0.0001116500
A[2][1]*b[1] = -0.0000000000
sum = 0.0000000000
A[2][2] = -1726.2421000000
b[2] = -0.0003765900
A[2][2]*b[2] = 0.6500855124
sum = 0.6500855124
A[2][3] = 2539.6267000000
b[3] = 0.0004498100
A[2][3]*b[3] = -1.1423494859 <- THIS SHOULD BE 1.142349... (no minus sign)
sum = -0.4922639735
********
Something similar happens for the i=3 case.
Thanks in advance!
In:
printf("A[%d][%d] = %1.10f\nb[%d] = %1.10f\n", i , j, A[i + n * j], j, b[j]);
printf("A[%d][%d]*b[%d] = %1.10f\n", i, j, j, A[i * n + j] * b[j]);
The first printf uses A[i + n*j], while the second (and the sum) uses A[i*n + j]. These are transposed positions in the array. MATLAB stores its matrices in column-major order, so the data returned by mxGetPr has element (i,j) at A[i + m*j] (with m rows); indexing with A[i*n + j] effectively multiplies by the transpose of A, which is why the results involving the lower-right 2x2 block (whose off-diagonal entries have opposite signs) come out with the wrong sign.
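Since mxGetPr hands you the MATLAB data in column-major order, the multiplication should index the matrix the same way as the first printf. A minimal sketch of mxv with column-major indexing (m is the number of rows, i.e. the leading dimension):
void mxv(int m, int n, double *A, double *b, double *c) {
    int i, j;
    for (i = 0; i < m; i++) {
        double sum = 0.0;
        for (j = 0; j < n; j++) {
            /* MATLAB stores column-major: element (i,j) lives at A[i + m*j] */
            sum += A[i + m * j] * b[j];
        }
        c[i] = sum;
    }
}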
Why does this code for computing the inner product of two vectors yield a double free or corruption error when compiled with:
ejspeiro#Eduardo-Alienware-14:~/Dropbox/HPC-Practices$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
The code comes from this reference.
// Computation of the inner product of vectors aa and bb.
#include <stdio.h>
#include <stdlib.h>
int main() {
size_t nn = 100000000;
size_t total_mem_array = nn*sizeof(double);
double *aa;
double *bb;
double ss = 0.0;
aa = (double *) malloc(total_mem_array);
bb = (double *) malloc(total_mem_array);
int ii = 0;
for (ii = 0; ii < nn; ++ii) {
aa[ii] = 1.0;
bb[ii] = 1.0;
}
double sum1 = 0.0;
double sum2 = 0.0;
for (ii = 0; ii < nn/2 - 1; ++ii) {
sum1 += (*(aa + 0))*(*(bb + 0));
sum2 += (*(aa + 1))*(*(bb + 1));
aa += 2;
bb += 2;
}
ss = sum1 + sum2;
free(aa);
free(bb);
return 0;
}
The error occurs because the values passed to free() are not the same values returned by malloc(), since you increment aa and bb inside the loop.
To correct it you could, for example, define two additional pointer variables that are used only for memory management, i.e. allocation and deallocation. Once memory is acquired by them, assign it to aa and bb.
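A minimal sketch of that approach (the names aa_base and bb_base are just illustrative):
double *aa_base = (double *) malloc(total_mem_array);
double *bb_base = (double *) malloc(total_mem_array);
double *aa = aa_base;   /* these are the pointers the loop advances */
double *bb = bb_base;
/* ... fill the arrays and run the summation loop as before ... */
free(aa_base);          /* free what malloc() returned, not the advanced pointers */
free(bb_base);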
You can simplify:
for (ii = 0; ii < nn/2 - 1; ++ii) {
sum1 += (*(aa + 0))*(*(bb + 0));
sum2 += (*(aa + 1))*(*(bb + 1));
aa += 2;
bb += 2;
}
to:
for (ii = 0; ii < nn/2 - 1; ++ii) {
sum1 += aa[ii * 2] * bb[ii * 2];
sum2 += aa[ii * 2 + 1] * bb[ii * 2 + 1];
}
which has the dual benefits of avoiding the pointer increments that cause your problem and making your code a whole lot clearer.
I am trying to solve a linear optimization problem with 4 variables and 600000 constraints.
I need to generate a large input, so I need A[600000][4] for the constraints' coefficients and b[600000] for the right-hand side. Here is the code that generates the 600000 constraints.
int i, j;
int numberOfInequalities = 600000;
double c[4];
double result[4];
double A[numberOfInequalities][4], b[numberOfInequalities];
printf("\nPreparing test: 4 variables, 600000 inequalities\n");
A[0][0] = 1.0; A[0][1] = 2.0; A[0][2] = 1.0; A[0][3] = 0.0; b[0] = 10000.0;
A[1][0] = 0.0; A[1][1] = 1.0; A[1][2] = 2.0; A[1][3] = 1.0; b[1] = 10000.0;
A[2][0] = 1.0; A[2][1] = 0.0; A[2][2] = 1.0; A[2][3] = 3.0; b[2] = 10000.0;
A[3][0] = 4.0; A[3][1] = 0.0; A[3][2] = 1.0; A[3][3] = 1.0; b[3] = 10000.0;
c[0]=1.0; c[1]=1.0; c[2]=1.0; c[3]=1.0;
for( i=4; i< 100000; i++ )
{
A[i][0] = (12123*i)%104729;
A[i][1] = (47*i)%104729;
A[i][2] = (2011*i)%104729;
A[i][3] = (7919*i)%104729;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 1 + (i%137);
}
A[100000][0] = 0.0; A[100000][1] = 6.0; A[100000][2] = 1.0;
A[100000][3] = 1.0; b[100000] = 19.0;
for( i=100001; i< 200000; i++ )
{
A[i][0] = (2323*i)%101111;
A[i][1] = (74*i)%101111;
A[i][2] = (2017*i)%101111;
A[i][3] = (7915*i)%101111;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 2 + (i%89);
}
A[200000][0] = 5.0; A[200000][1] = 2.0; A[200000][2] = 0.0;
A[200000][3] = 1.0; b[200000] = 13.0;
for( i=200001; i< 300000; i++ )
{
A[i][0] = (23123*i)%100003;
A[i][1] = (47*i)%100003;
A[i][2] = (2011*i)%100003;
A[i][3] = (7919*i)%100003;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 2 + (i%57);
}
A[300000][0] = 1.0; A[300000][1] = 2.0; A[300000][2] = 1.0;
A[300000][3] = 3.0; b[300000] = 20.0;
A[300001][0] = 1.0; A[300001][1] = 0.0; A[300001][2] = 5.0;
A[300001][3] = 4.0; b[300001] = 32.0;
A[300002][0] = 7.0; A[300002][1] = 1.0; A[300002][2] = 1.0;
A[300002][3] = 7.0; b[300002] = 40.0;
for( i=300003; i< 400000; i++ )
{
A[i][0] = (13*i)%103087;
A[i][1] = (99*i)%103087;
A[i][2] = (2012*i)%103087;
A[i][3] = (666*i)%103087;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 1;
}
for( i=400000; i< 500000; i++ )
{
A[i][0] = 1;
A[i][1] = (17*i)%999983;
A[i][2] = (1967*i)%444443;
A[i][3] = 2;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + (1000000.0/(double)i);
}
for( i=500000; i< 600000; i++ )
{
A[i][0] = (3*i)%111121;
A[i][1] = (2*i)%999199;
A[i][2] = (2*i)%444443;
A[i][3] = i;
b[i] = A[i][0] + 2*A[i][1] + 3*A[i][2] + 4* A[i][3] + 1.3;
}
The problem is that it can't create such a large array; it just terminates at run time, BUT it works fine if I create no more than 200000 constraints.
I've tried to increase the stack size to an unlimited value, but it didn't help.
I've tried to use pointers like **A, but then I get incorrect results in the output.
P.S.
I use Ubuntu.
Any ideas?
If numberOfInequalities is a constant known at compile time, you could make it a #define and define A and b as global variables or static local variables:
#define numberOfInequalities 600000
static double A[numberOfInequalities][4], b[numberOfInequalities];
This will move these arrays from the 'stack' to the 'bss' segment.
A better solution is to allocate these arrays with malloc:
double (*A)[4] = malloc(numberOfInequalities * 4 * sizeof(double));
double *b = malloc(numberOfInequalities * sizeof(double));
This will cause these arrays to be allocated from the 'heap' memory.
Don't forget to free them before returning to the caller.
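For example, a minimal sketch of the check and the cleanup (worth adding for allocations this large):
if (A == NULL || b == NULL) {
    fprintf(stderr, "allocation failed\n");
    return 1;
}
/* ... fill A[i][j] and b[i] exactly as in the code above ... */
free(A);
free(b);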
See http://www.geeksforgeeks.org/memory-layout-of-c-program/ for a brief explanation of how memory is arranged in a typical C program.
I've tried to implement the dot product of these two arrays using AVX (https://stackoverflow.com/a/10459028), but my code is very slow.
A is an array of doubles, xb is an array of structures with a double member x, and n is an even number. Can you help me?
const int mask = 0x31;
int sum =0;
for (int i = 0; i < n; i++)
{
int ind = i;
if (i + 8 > n) // padding
{
sum += A[ind] * xb[i].x;
i++;
ind = n * j + i;
sum += A[ind] * xb[i].x;
continue;
}
__declspec(align(32)) double ar[4] = { xb[i].x, xb[i + 1].x, xb[i + 2].x, xb[i + 3].x };
__m256d x = _mm256_loadu_pd(&A[ind]);
__m256d y = _mm256_load_pd(ar);
i+=4; ind = n * j + i;
__declspec(align(32)) double arr[4] = { xb[i].x, xb[i + 1].x, xb[i + 2].x, xb[i + 3].x };
__m256d z = _mm256_loadu_pd(&A[ind]);
__m256d w = _mm256_load_pd(arr);
__m256d xy = _mm256_mul_pd(x, y);
__m256d zw = _mm256_mul_pd(z, w);
__m256d temp = _mm256_hadd_pd(xy, zw);
__m128d hi128 = _mm256_extractf128_pd(temp, 1);
__m128d low128 = _mm256_extractf128_pd(temp, 0);
//__m128d dotproduct = _mm_add_pd((__m128d)temp, hi128);
__m128d dotproduct = _mm_add_pd(low128, hi128);
sum += dotproduct.m128d_f64[0]+dotproduct.m128d_f64[1];
i += 3;
}
There are two big inefficiencies in your loop that are immediately apparent:
(1) these two chunks of scalar code:
__declspec(align(32)) double ar[4] = { xb[i].x, xb[i + 1].x, xb[i + 2].x, xb[i + 3].x };
...
__m256d y = _mm256_load_pd(ar);
and
__declspec(align(32)) double arr[4] = { xb[i].x, xb[i + 1].x, xb[i + 2].x, xb[i + 3].x };
...
__m256d w = _mm256_load_pd(arr);
should be implemented using SIMD loads and shuffles (or at the very least use _mm256_set_pd and give the compiler a chance to do a half-reasonable job of generating code for a gathered load).
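For instance, a minimal sketch of the _mm256_set_pd option for the two gathered loads, written relative to the i at the top of the iteration (note that _mm256_set_pd takes its arguments from the highest lane down):
__m256d y = _mm256_set_pd(xb[i + 3].x, xb[i + 2].x, xb[i + 1].x, xb[i].x);
...
__m256d w = _mm256_set_pd(xb[i + 7].x, xb[i + 6].x, xb[i + 5].x, xb[i + 4].x);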
(2) the horizontal summation at the end of the loop:
for (int i = 0; i < n; i++)
{
...
__m256d xy = _mm256_mul_pd(x, y);
__m256d zw = _mm256_mul_pd(z, w);
__m256d temp = _mm256_hadd_pd(xy, zw);
__m128d hi128 = _mm256_extractf128_pd(temp, 1);
__m128d low128 = _mm256_extractf128_pd(temp, 0);
//__m128d dotproduct = _mm_add_pd((__m128d)temp, hi128);
__m128d dotproduct = _mm_add_pd(low128, hi128);
sum += dotproduct.m128d_f64[0]+dotproduct.m128d_f64[1];
i += 3;
}
should be moved out of the loop:
__m256d xy = _mm256_setzero_pd();
__m256d zw = _mm256_setzero_pd();
...
for (int i = 0; i < n; i++)
{
...
xy = _mm256_add_pd(xy, _mm256_mul_pd(x, y));
zw = _mm256_add_pd(zw, _mm256_mul_pd(z, w));
i += 3;
}
__m256d temp = _mm256_hadd_pd(xy, zw);
__m128d hi128 = _mm256_extractf128_pd(temp, 1);
__m128d low128 = _mm256_extractf128_pd(temp, 0);
//__m128d dotproduct = _mm_add_pd((__m128d)temp, hi128);
__m128d dotproduct = _mm_add_pd(low128, hi128);
sum += dotproduct.m128d_f64[0]+dotproduct.m128d_f64[1];
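Putting both changes together, a rough sketch of the whole loop (assumptions: sum is declared double, n is a multiple of 8, j is 0 so A is indexed linearly, and <immintrin.h> is included; the final horizontal add uses _mm_cvtsd_f64 and _mm_unpackhi_pd instead of the compiler-specific m128d_f64 member):
__m256d xy = _mm256_setzero_pd();
__m256d zw = _mm256_setzero_pd();
double sum = 0.0;
for (int i = 0; i < n; i += 8)
{
    __m256d x = _mm256_loadu_pd(&A[i]);
    __m256d y = _mm256_set_pd(xb[i + 3].x, xb[i + 2].x, xb[i + 1].x, xb[i].x);
    __m256d z = _mm256_loadu_pd(&A[i + 4]);
    __m256d w = _mm256_set_pd(xb[i + 7].x, xb[i + 6].x, xb[i + 5].x, xb[i + 4].x);
    xy = _mm256_add_pd(xy, _mm256_mul_pd(x, y));   /* keep accumulating in registers */
    zw = _mm256_add_pd(zw, _mm256_mul_pd(z, w));
}
/* horizontal sum, done once after the loop */
__m256d acc = _mm256_add_pd(xy, zw);
__m128d lo = _mm256_castpd256_pd128(acc);
__m128d hi = _mm256_extractf128_pd(acc, 1);
__m128d pair = _mm_add_pd(lo, hi);
sum += _mm_cvtsd_f64(_mm_add_sd(pair, _mm_unpackhi_pd(pair, pair)));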