Segmentation fault right at the end of the program - c

I have a problem with this code.
It works as expected, except that it segfaults right at the end.
Here is the code:
void distribuie(int *nrP, pach *pachet, post *postas) {
    int nrPos, k, i, j;
    nrPos = 0;
    for (k = 0; k < 18; k++)
        pos[k].nrPac = 0;
    for (i = 0; i < *nrP; i++) {
        int distributed = 0;
        for (j = 0; j < nrPos; j++)
            if (pac[i].idCar == pos[j].id) {
                pos[j].vec[pos[j].nrPac] = pac[i].id;
                pos[j].nrPac++;
                distributed = 1;
                break;
            }
        if (distributed == 0) {
            pos[nrPos].id = pac[i].idCar;
            pos[nrPos].vec[0] = pac[i].id;
            pos[nrPos].nrPac = 1;
            nrPos++;
        }
    }
    for (i = 0; i < nrPos; i++) {
        printf("%d %d ", pos[i].id, pos[i].nrPac);
        for (j = 0; j < pos[i].nrPac; j++)
            printf("%d ", pos[i].vec[j]);
        printf("\n");
    }
}
and calling this function in main().
Running with gdb resulted in this error:
Program received signal SIGSEGV, Segmentation fault.
0x00000001 in ?? ()

If gdb can't find the stack trace, it means your code wrote over the stack so thoroughly that neither the normal C runtime nor gdb can find the information about where the function should return on the stack.
Or, in other words, you have a (major) buffer overflow on the stack.
Somewhere, your code is writing out of bounds of an array. It is curious that the code posted references global variables pos and pac but is passed (unused) variables postas and pachet. It suggests that the code you're showing isn't the code you're executing. However, assuming that pos and pac are really spelled the same as postas and pachet, then it could be that you are mishandling the call to your distribuie() function. (If, as a comment suggests, pos and pac really are global variables, then why does the function get passed postas and pachet?)
Are you getting any compilation warnings? Have you enabled compilation warnings? If you've got GCC, does the code compile cleanly with -Wall? What about with -Wall -Wextra? If you're getting any warnings, fix the causes. Remember, at this stage in your career, it is probable that the C compiler knows more about C than you do.
You can help yourself with the debugging by printing key values (like *nrP) on entry to the function. If that isn't a sane value, you know where to start looking. You might also take a good look at the data for the line:
pos[j].vec[pos[j].nrPac] = pac[i].id;
There is lots of room there for things to go badly astray!

I lack the information to help you completely: I don't know the size of the pos[] array. The loop with k<18 suggests it has 18 elements (but it could be fewer; I simply don't know). Then you start processing *nrP pachets, but you don't check that you process at most 18 of these. If there are more, you overwrite some other memory. Then you want to print the result and, voilà, a segmentation fault: some memory got corrupted and is later used as if it held a valid pointer, but the pointer is invalid and... bang, segfault.
So the for loop should at least check the bounds (assuming 18):
for (i = 0; i < *nrP && i < 18; i++) {
In the same way, the pos structure apparently contains an array vec, but its size is unknown and, by the same reasoning, it could hold 18 elements, fewer, or more:
pos[j].vec[pos[j].nrPac]
If you add all your bounds checks it will probably run.
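As a sketch of what those bounds checks could look like: the struct layouts and the MAX_POSTMEN and MAX_PACKAGES capacities below are assumptions, since the question does not show the real definitions, and distribuie_checked is a hypothetical name. The function reports failure instead of writing past either array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POSTMEN 18   /* assumed size of the postmen array */
#define MAX_PACKAGES 50  /* assumed capacity of each vec[] */

/* Hypothetical layouts; the real pach/post types are not shown. */
typedef struct { int id; int idCar; } pach;
typedef struct { int id; int nrPac; int vec[MAX_PACKAGES]; } post;

/* Groups packages by idCar. Returns the number of postmen used,
 * or -1 if either array would overflow. */
int distribuie_checked(int nrP, const pach *pac, post *pos)
{
    int nrPos = 0;
    assert(pac != NULL && pos != NULL);

    for (int i = 0; i < nrP; i++) {
        int j = 0;
        while (j < nrPos && pos[j].id != pac[i].idCar)
            j++;
        if (j == nrPos) {               /* first package for this id */
            if (nrPos == MAX_POSTMEN)
                return -1;              /* pos[] is full */
            pos[nrPos].id = pac[i].idCar;
            pos[nrPos].nrPac = 0;
            nrPos++;
        }
        if (pos[j].nrPac == MAX_PACKAGES)
            return -1;                  /* vec[] is full */
        pos[j].vec[pos[j].nrPac++] = pac[i].id;
    }
    return nrPos;
}
```

Checking the return value in main() makes an oversized input show up as an error message rather than as a corrupted stack.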

Related

Parallelized execution in nested for using Cilk

I'm trying to implement a 2D-stencil algorithm that manipulates a matrix. For each field in the matrix, the fields above, below, left and right of it are to be added and divided by 4 in order to calculate the new value. This process may be iterated multiple times for a given matrix.
The program is written in C and compiles with the cilkplus gcc binary.
**Edit: I figured you might be interested in the compiler flags:
~/cilkplus/bin/gcc -fcilkplus -lcilkrts -pedantic-errors -g -Wall -std=gnu11 -O3 `pkg-config --cflags glib-2.0 gsl` -c -o sal_cilk_tst.o sal_cilk_tst.c
Please note that the real code involves some pointer arithmetic to keep everything consistent. The sequential implementation works. I'm omitting these steps here to enhance understandability.
A pseudocode would look something like this (No edge case handling):
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        for(int k = 0; k < matrix.height; k++){
            result_matrix[j][k] = (matrix[j-1][k] +
                                   matrix[j+1][k] +
                                   matrix[j][k+1] +
                                   matrix[j][k-1]) / 4;
        }
    }
    matrix = result_matrix;
}
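For reference, a runnable sequential sketch of this pseudocode. The flat row-major layout and the names stencil_step and stencil_run are assumptions, not the code from the assignment. Border cells are skipped rather than handled, and the two buffers are swapped after each sweep so that no cell is read from the buffer being written.

```c
#include <assert.h>
#include <stddef.h>

/* One sweep over the interior of a grid with h rows and w columns,
 * stored row-major. Reads only from in, writes only to out. */
static void stencil_step(const double *in, double *out, size_t w, size_t h)
{
    assert(in != out);
    for (size_t j = 1; j + 1 < h; j++)
        for (size_t k = 1; k + 1 < w; k++)
            out[j * w + k] = (in[(j - 1) * w + k] +
                              in[(j + 1) * w + k] +
                              in[j * w + k - 1] +
                              in[j * w + k + 1]) / 4.0;
}

/* Runs the given number of sweeps, swapping the buffers after each one.
 * Returns the buffer that holds the final result. Border cells are never
 * written, so both buffers must start with the same border values. */
static double *stencil_run(double *a, double *b, size_t w, size_t h,
                           int iterations)
{
    for (int i = 0; i < iterations; i++) {
        stencil_step(a, b, w, h);
        double *tmp = a;
        a = b;
        b = tmp;
    }
    return a;
}
```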
The stencil calculation itself is then moved to the function apply_stencil(...)
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        for(int k = 0; k < matrix.height; k++){
            apply_stencil(matrix, result_matrix, j, k);
        }
    }
    matrix = result_matrix;
}
and parallelization is attempted:
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        cilk_for(int k = 0; k < matrix.height; k++){ /* <--- */
            apply_stencil(matrix, result_matrix, j, k);
        }
    }
    matrix = result_matrix;
}
This version compiles without errors/warning, but just straight out produces a Floating point exception when executed. In case you are wondering: It does not matter which of the for loops are made into cilk_for loops. All configurations (except no cilk_for) produce the same error.
the possible other method:
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        for(int k = 0; k < matrix.height; k++){
            cilk_spawn apply_stencil(matrix, result_matrix, j, k); /* <--- */
        }
    }
    cilk_sync; /* <--- */
    matrix = result_matrix;
}
This produces 3 warnings when compiled: i, j and k appear to be uninitialized.
When trying to execute, the function which executes the matrix = result_matrix; step appears to be undefined.
Now for the actual question: Why and how does Cilk break my sequential code; or rather how can I prevent it from doing so?
The actual code is of course available too, should you be interested. However, this project is for a university class and therefore subject to plagiarism by other students who find this thread, which is why I would prefer not to share it publicly.
**UPDATE:
As suggested I attempted to run the algorithm with only 1 worker thread, effectively making the cilk implementation sequential. This did, surprisingly enough, work out fine. However as soon as I change the number of workers to two, the familiar errors return.
I don't think this behavior is caused by race conditions though. Since the working matrix is changed after each iteration and cilk_sync is called, there is effectively no critical section. No thread depends on data written by another thread in the same iteration.
The next step I will attempt is to try out other versions of the cilkplus compiler, to see if it's maybe an error on their side.
With regards to the floating point exception in a cilk_for, there are some issues that have been fixed in some versions of the Cilk Plus runtime. Is it possible that you are using an outdated version?
https://software.intel.com/en-us/forums/intel-cilk-plus/topic/558825
Also, what were the specific warning messages that are produced? There are some "uninitialized variable" warnings that occur with older versions of Cilk Plus GCC, which I thought were spurious warnings.
The Cilk runtime uses a recursive divide and conquer algorithm to parallelize your loop. Essentially, it breaks the range in half, and recursively calls itself twice, spawning half and calling half.
As part of the initialization, it calculates a "grain size" which is the size of the minimum size it will break your range into. By default, that's loopRange/8P, where P is the number of cores.
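The splitting scheme described above can be sketched in plain, sequential C. This is not the Cilk runtime's code: all names are hypothetical, and the two recursive calls simply run one after the other where the runtime would spawn one half. The grain size follows the loopRange/8P formula quoted above, clamped here to at least 1.

```c
#include <assert.h>

/* Grain size as described in the text: range / (8 * workers), at least 1. */
static int grain_size(int range, int workers)
{
    assert(workers > 0);
    int g = range / (8 * workers);
    return g > 0 ? g : 1;
}

/* Recursively halve [lo, hi) until a piece is no larger than grain,
 * then run the loop body on that piece. */
static void split_range(int lo, int hi, int grain,
                        void (*body)(int idx, void *ctx), void *ctx)
{
    if (hi - lo <= grain) {
        for (int i = lo; i < hi; i++)
            body(i, ctx);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    split_range(lo, mid, grain, body, ctx);  /* the runtime would spawn this half */
    split_range(mid, hi, grain, body, ctx);  /* and call this one directly */
}

/* Example body: counts how many indices were visited. */
static void count_visit(int idx, void *ctx)
{
    (void)idx;
    ++*(int *)ctx;
}
```

With 1000 iterations and 4 workers the formula gives a grain of 31, so the range is halved until each piece holds at most 31 iterations.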
One interesting experiment would be to set the number of Cilk workers to 1. When you do this, all of the cilk_for mechanism is exercised, but because there's only 1 worker, nothing gets stolen.
Another possibility is to try running your code under Cilkscreen, the Cilk race detector. Unfortunately only the cilkplus branch of GCC generates the annotations that Cilkscreen needs. Your choices are to use the Intel compiler, or try using the cilkplus branch of GCC 4.9. Directions on how to pull down the code and build it are at the cilkplus.org website.

OpenMP gives (core dumped)

I have a loop which I want to parallelize with OpenMP. When I compile with gcc -o prog prog.c -lm -fopenmp I get no errors. But when I execute it, I get segmentation fault (core dumped). The problem surely comes from the OpenMP directives because the program works when I delete the #pragma...
Here is the parallel loop:
ix = (i-1)%ILIGNE+1;
iy = (i-1)/ILIGNE+1;
k = 1;
# pragma omp parallel for private(j,jx,jy,r,R,voisin) shared(NTOT,k,i,ix,iy) num_threads(2) schedule(auto)
for(j = 1;j <= NTOT;j++){
    if(j != i){
        jx = (j-1)%ILIGNE+1;
        jy = (j-1)/ICOLONE+1;
        r[k][0] = (jx-ix)*a;
        r[k][1] = (jy-iy)*a;
        R[k] = sqrt(pow(r[k][0],2.0)+pow(r[k][1],2.0));
        voisin[k] = j;
        k++;
    }
}
I tried to change the stack size to unlimited but it doesn't fix the problem. Is this a memory leak, a race condition, or something else? Thank you for your help.
As a side note, be careful when you make an array private.
If you allocated it as a static array
e.g.
int R[5] or something similar then that's fine, each thread gets its own personal copy :).
If you malloc these however
e.g.:
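int *R = malloc(5*sizeof(int));
then only the pointer is made private, not the memory it points to. With private each thread gets its own uninitialized pointer, and with firstprivate every thread's pointer refers to the same block, so the data is effectively shared (which could potentially lead to undefined behaviour, segfaults, gibberish in the array etc).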
I'm not sure what your code does, but I'm pretty sure the OpenMP version is wrong. Indeed, you parallelised over the j loop, but the heart of your algorithm revolves around the k, which is loosely derived from j and i (which is not presented here BTW).
So when you distribute your j indexes across your OpenMP threads, they all start from a different value of j, but all from the same value of k which is shared. From that, k is incremented quite randomly and the accesses to the various arrays using k are very very likely to generate segmentation faults.
Moreover, arrays r, R and voisin shouldn't be declared private if one wants the parallelisation to have any effect.
Finally, C loops like this for(j = 1;j <= NTOT;j++) look utterly suspicious to me for off-by-one accesses... Shouldn't that be rather for(j = 0;j < NTOT;j++)? (just mentioning this since the initial value of k is 1 as well...)
Bottom line is that you'd probably better define k from j's value with k = j<i ? j : j-1 instead of trying to increment it inside the code.
Assuming all the rest is correct, this might be a valid version:
ix = (i-1)%ILIGNE+1;
iy = (i-1)/ILIGNE+1;
# pragma omp parallel for private(j,jx,jy,k) num_threads(2) schedule(auto)
for(j = 1;j <= NTOT;j++){
    if(j != i){
        k = j<i ? j : j-1;
        jx = (j-1)%ILIGNE+1;
        jy = (j-1)/ICOLONE+1;
        r[k][0] = (jx-ix)*a;
        r[k][1] = (jy-iy)*a;
        R[k] = sqrt(pow(r[k][0],2.0)+pow(r[k][1],2.0));
        voisin[k] = j;
    }
}
Still, be careful with the C indexing from 0 to size-1, not from 1 to size...
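As a quick sanity check of the closed form k = j<i ? j : j-1, the sketch below compares it against the counter that the original sequential loop maintains. The helper names are hypothetical. The point is only that each j maps to its own k without any shared state, which is what makes the loop safe to distribute across threads.

```c
#include <assert.h>

/* Index computed directly from j, as suggested in the answer above. */
static int slot_for(int j, int i)
{
    assert(j != i);
    return j < i ? j : j - 1;
}

/* Returns 1 if the closed form matches the sequential k++ counter
 * for every j in 1..ntot with j != i, and 0 otherwise. */
static int closed_form_matches(int ntot, int i)
{
    int k = 1;                       /* counter from the original loop */
    for (int j = 1; j <= ntot; j++) {
        if (j == i)
            continue;
        if (slot_for(j, i) != k)
            return 0;
        k++;
    }
    return 1;
}
```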

Debugging C code with gdb

This is a homework assignment, I just want help with gdb, not specific answers.
I have no experience with gdb whatsoever and little terminal experience. I followed a simple example online to debug some code using gdb but in the example gdb pointed out that a problem happened when it ran the code. When I try to mimic the process for this assignment gdb doesn't say anything. I am still somewhat new to C, but I can see problems when I look at the code and gdb isn't saying anything.
Say the file is named test.c, in the terminal I type gcc test.c and it gives me a warning because printf() is there but #include <stdio.h> is not, which is good because that is supposed to be wrong.
It also produces a.out and if I run it in the terminal with ./a.out nothing happens. The terminal just is ready for my next input with no messages. If I type gdb ./a.out and then run it just tells me the program exited normally.
Can someone point out what I have to do to make gdb point to the errors please?
// insertion sort, several errors
int X[10], // input array
Y[10], // workspace array
NumInputs, // length of input array
NumY = 0; // current number of
// elements in Y
void GetArgs(int AC, char **AV) {
int I;
NumInputs = AC - 1;
for (I = 0; I < NumInputs; I++) X[I] = atoi(AV[I+1]);
}
void ScootOver(int JJ) {
int K;
for (K = NumY-1; K > JJ; K++) Y[K] = Y[K-1];
}
void Insert(int NewY) {
int J;
if (NumY = 0) { // Y empty so far,
// easy case
Y[0] = NewY;
return;
}
// need to insert just before the first Y
// element that NewY is less than
for (J = 0; J < NumY; J++) {
if (NewY < Y[J]) {
// shift Y[J], Y[J+1],... rightward
// before inserting NewY
ScootOver(J);
Y[J] = NewY;
return;
}
}
}
void ProcessData() {
// insert new Y in the proper place
// among Y[0],...,Y[NumY-1]
for (NumY = 0; NumY < NumInputs; NumY++) Insert(X[NumY]);
}
void PrintResults() {
int I;
for (I = 0; I < NumInputs; I++) printf("%d\n",Y[I]);
}
int main(int Argc, char ** Argv) {
GetArgs(Argc,Argv);
ProcessData();
PrintResults();
}
Edit: The code is not mine, it is part of the assignment
There are different kinds of errors. Some can be detected by programs (the compiler, the OS, the debugger), and some cannot.
The compiler is required (by the C standard) to issue errors if it detects any constraint violations. It may issue other errors and warnings when not in standards compliance mode. The compiler will give you more error diagnostics if you add the -Wall and -Wextra options. The compiler may be able to detect even more errors if you enable optimizations (-O0 through -O3 set different levels of optimization), but you may want to skip optimizations if you want to single-step in the debugger, because the optimizer will make it harder for the debugger to show you the relevant source-lines (some may be re-ordered, some may be eliminated).
The operating system will detect errors involving traversing bad pointers (usually), or bad arguments to system calls, or (usually) integer division by zero. Floating-point division by zero normally just produces an infinity or NaN instead of a crash.
But anything that doesn't crash the program is a semantic error. And these require a human brain to hunt for them.
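A small illustration of such a semantic error, deliberately unrelated to the assignment code (both function names are made up for this example). The two functions compile and run without crashing, but the first one quietly returns the wrong answer, and only stepping through it or printing the intermediate values reveals why.

```c
#include <assert.h>

/* Semantic error: sum / n is integer division, so the fraction is lost
 * before the result is converted to double. */
static double average_wrong(const int *v, int n)
{
    assert(n > 0);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Intended behaviour: convert to double before dividing. */
static double average_right(const int *v, int n)
{
    assert(n > 0);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return (double)sum / n;
}
```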
So, as Brian says, you need to set breakpoints and single-step through the program. And, as jweyrich says, you need to compile the program with -g to add debugging symbols.
You can inspect variables with print (eg. print Argc will tell you how many command-line arguments were on the run line). And display will add variables to a list that is displayed just before each prompt. If I were debugging through that for-loop in Insert, I'd probably do display J and display Y[J], next, and then hit enter a bunch of times watching the calculation progress.
If your breakpoint is deeply nested, you can get a "stack dump" with backtrace.
next will take you to the next statement (following the semicolon). step will take you into function calls and to the first statement of the function. And remember: if you're single-stepping through a function and get to the 'return' statement, use step to enter the next function call in the calling statement; use next at the return to finish the calling statement (and just execute any remaining function calls in the statement, without prompting). You may not need to know this bit just yet, but if you do, there you go.
From gdb, do break main, then run.
From there, next or step until you find where you went wrong.

Bakery Lock when used inside a struct doesn't work

I'm new at multi-threaded programming and I tried to code the Bakery Lock Algorithm in C.
Here is the code:
int number[N]; // N is the number of threads
int choosing[N];
void lock(int id) {
    choosing[id] = 1;
    number[id] = max(number, N) + 1;
    choosing[id] = 0;
    for (int j = 0; j < N; j++)
    {
        if (j == id)
            continue;
        while (1)
            if (choosing[j] == 0)
                break;
        while (1)
        {
            if (number[j] == 0)
                break;
            if (number[j] > number[id]
                || (number[j] == number[id] && j > id))
                break;
        }
    }
}
void unlock(int id) {
    number[id] = 0;
}
Then I run the following example. I run 100 threads and each thread runs the following code:
for (i = 0; i < 10; ++i) {
lock(id);
counter++;
unlock(id);
}
After all threads have been executed, the result of the shared counter is 10 * 100 = 1000 which is the expected value. I executed my program multiple times and the result was always 1000. So it seems that the implementation of the lock is correct. That seemed weird based on a previous question I had because I didn't use any memory barriers/fences. Was I just lucky?
Then I wanted to create a multi-threaded program that will use many different locks. So I created this (full code can be found here):
typedef struct {
int number[N];
int choosing[N];
} LOCK;
and the code changes to:
void lock(LOCK l, int id)
{
l.choosing[id] = 1;
l.number[id] = max(l.number, N) + 1;
l.choosing[id] = 0;
...
Now when executing my program, sometimes I get 997, sometimes 998, sometimes 1000. So the lock algorithm isn't correct.
What am I doing wrong? What can I do in order to fix it?
Is it perhaps a problem now that I'm reading arrays number and choosing from a struct
and that's not atomic or something?
Should I use memory fences and if so at which points (I tried using asm("mfence") in various points of my code, but it didn't help)?
With pthreads, the standard states that accessing a variable in one thread while another thread is, or might be, modifying it is undefined behavior. Your code does this all over the place. For example:
while (1)
if (choosing[j] == 0)
break;
This code accesses choosing[j] over and over while waiting for another thread to modify it. The compiler is entirely free to modify this code as follows:
int cj=choosing[j];
while(1)
if(cj == 0)
break;
Why? Because the standard is clear that another thread may not modify the variable while this thread may be accessing it, so the value can be assumed to stay the same. But clearly, that won't work.
It can also do this:
while(1)
{
int cj=choosing[j];
if(cj==0) break;
choosing[j]=cj;
}
Same logic. It is perfectly legal for the compiler to write back a variable whether it has been modified or not, so long as it does so at a time when the code could be accessing the variable. (Because, at that time, it's not legal for another thread to modify it, so the value must be the same and the write is harmless. In some cases, the write really is an optimization and real-world code has been broken by such writebacks.)
If you want to write your own synchronization functions, you have to build them with primitive functions that have the appropriate atomicity and memory visibility semantics. You must follow the rules or your code will fail, and fail horribly and unpredictably.
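One possible set of such primitives is C11's <stdatomic.h>. The sketch below rewrites the lock from the question with atomic_int and the default sequentially consistent loads and stores. It is an illustration under assumptions, not a tuned or verified implementation: N is fixed at 4 here, max_ticket is a hypothetical helper, and LOCK is passed by pointer. That last change matters on its own, because lock(LOCK l, int id) as written in the question receives a copy of the struct, so its writes never reach the other threads.

```c
#include <assert.h>
#include <stdatomic.h>

#define N 4  /* number of threads; an assumed value for this sketch */

/* Must be zero-initialized before first use, e.g. "static LOCK l;". */
typedef struct {
    atomic_int number[N];
    atomic_int choosing[N];
} LOCK;

/* Largest ticket currently held by any thread. */
static int max_ticket(LOCK *l)
{
    int m = 0;
    for (int j = 0; j < N; j++) {
        int t = atomic_load(&l->number[j]);
        if (t > m)
            m = t;
    }
    return m;
}

void lock(LOCK *l, int id)
{
    assert(id >= 0 && id < N);
    atomic_store(&l->choosing[id], 1);
    atomic_store(&l->number[id], max_ticket(l) + 1);
    atomic_store(&l->choosing[id], 0);

    for (int j = 0; j < N; j++) {
        if (j == id)
            continue;
        /* Wait until thread j has finished picking its ticket. */
        while (atomic_load(&l->choosing[j]) != 0)
            ;
        /* Wait while thread j holds a ticket that goes before ours. */
        for (;;) {
            int nj = atomic_load(&l->number[j]);
            int ni = atomic_load(&l->number[id]);
            if (nj == 0 || nj > ni || (nj == ni && j > id))
                break;
        }
    }
}

void unlock(LOCK *l, int id)
{
    atomic_store(&l->number[id], 0);
}
```

Because every access goes through atomic_load or atomic_store, the compiler may not hoist the read out of the spin loop or invent write-backs as described above.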

Seg Fault when initializing array

I'm taking a class on C, and running into a segmentation fault. From what I understand, seg faults are supposed to occur when you're accessing memory that hasn't been allocated, or otherwise outside the bounds. 'Course all I'm trying to do is initialize an array (though rather large at that)
Am I simply misunderstanding how to parse a 2d array? Misplacing a bound is exactly what would cause a seg fault-- am I wrong in using a nested for-loop for this?
The professor provided the clock functions, so I'm hoping that's not the problem. I'm running this code in Cygwin, could that be the problem? Source code follows. Using c99 standard as well.
To be perfectly clear: I am looking for help understanding (and eventually fixing) the reason my code produces a seg fault.
#include <stdio.h>
#include <time.h>
int main(void){
//first define the array and two doubles to count elapsed seconds.
double rowMajor, colMajor;
rowMajor = colMajor = 0;
int majorArray [1000][1000] = {};
clock_t start, end;
//set it up to perform the test 100 times.
for(int k = 0; k<10; k++)
{
start=clock();
//first we do row major
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j<1000; j++)
{
majorArray[i][j] = 314;
}
}
end=clock();
rowMajor+= (end-start)/(double)CLOCKS_PER_SEC;
//at this point, we've only done rowMajor, so elapsed = rowMajor
start=clock();
//now we do column major
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j<1000; j++)
{
majorArray[j][i] = 314;
}
}
end=clock();
colMajor += (end-start)/(double)CLOCKS_PER_SEC;
}
//now that we've done the calculations 100 times, we can compare the values.
printf("Row major took %f seconds\n", rowMajor);
printf("Column major took %f seconds\n", colMajor);
if(rowMajor<colMajor)
{
printf("Row major is faster\n");
}
else
{
printf("Column major is faster\n");
}
return 0;
}
Your program works correctly on my computer (x86-64/Linux) so I suspect you're running into a system-specific limit on the size of the call stack. I don't know how much stack you get on Cygwin, but your array is 4,000,000 bytes (with 32-bit int) - that could easily be too big.
Try moving the declaration of majorArray out of main (put it right after the #includes) -- then it will be a global variable, which comes from a different allocation pool that can be much bigger.
By the way, this comparison is backwards:
if(rowMajor>colMajor)
{
printf("Row major is faster\n");
}
else
{
printf("Column major is faster\n");
}
Also, to do a test like this you really ought to repeat the process for many different array sizes and shapes.
You are trying to grab 1000 * 1000 * sizeof( int ) bytes on the stack. This is more than your OS allows for stack growth. On any Unix, check ulimit -a for the maximum stack size of the process.
As a rule of thumb - allocate big structures on the heap with malloc(3). Or use static arrays - outside of scope of any function.
In this case, you can replace the declaration of majorArray with:
int (*majorArray)[1000] = calloc(1000, sizeof *majorArray);
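A slightly fuller sketch of the heap approach, with hypothetical helper names. The element size is written as sizeof *m so that each of the ROWS elements is a whole row of COLS ints; the size of the pointer itself would be far too small.

```c
#include <assert.h>
#include <stdlib.h>

#define ROWS 1000
#define COLS 1000

/* Allocates a zeroed ROWS x COLS int matrix on the heap.
 * Returns NULL on failure; release the result with free(). */
static int (*alloc_matrix(void))[COLS]
{
    /* sizeof *m is the size of one row (COLS ints), not of the pointer. */
    int (*m)[COLS] = calloc(ROWS, sizeof *m);
    return m;
}

/* Fills the matrix in row-major order, like the first loop in the question. */
static void fill_row_major(int (*m)[COLS], int value)
{
    assert(m != NULL);
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            m[i][j] = value;
}
```

The result can still be indexed as m[i][j], so the timing loops from the question need no other changes apart from a NULL check and a final free(m).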
I was unable to find any error in your code, so I compiled and ran it, and it worked as expected.
You have, however, a semantic error in your code:
start=clock();
//set it up to perform the test 100 times.
for(int k = 0; k<10; k++)
{
Should be:
//set it up to perform the test 100 times.
for(int k = 0; k<10; k++)
{
start=clock();
Also, the condition at the end should be changed to its inverse:
if(rowMajor<colMajor)
Finally, to avoid the problem of the os-specific stack size others mentioned, you should define your matrix outside main():
#include <stdio.h>
#include <time.h>
int majorArray [1000][1000];
int main(void){
//first define the array and two doubles to count elapsed seconds.
double rowMajor, colMajor;
rowMajor = colMajor = 0;
This code runs fine for me under Linux and I can't see anything obviously wrong about it. You can try to debug it via gdb. Compile it like this:
gcc -g -o testcode test.c
and then say
gdb ./testcode
and in gdb say run
If it crashes, say where and gdb tells you where the crash occurred. Then you know in which line the error is.
The program works perfectly when compiled with gcc and run on Linux, so Cygwin may very well be your problem here.
If it runs correctly elsewhere, you're most likely trying to grab more stack space than the OS allows. You're allocating 4MB on the stack (1 million integers), which is way too much for allocating "safely" on the stack. malloc() and free() are your best bets here.
