Null distribution of rank tests in C

I need to compute the distribution of a test statistic in C. The test statistic is based on ranks, so rather than generating observations and ranking them, I think I can work with the natural numbers 1 to 6 and enumerate all their possible distinct combinations for my computation. I have written the program below, but it does not display the expected output, just some strange symbols. When I ran it in an online compiler it reported a segmentation fault. Please help me rectify the errors; any suggestions will be appreciated. The following is the code that I have written.
#include <stdio.h>
#include <conio.h>

void main()
{
  int i1, i2, i3, j1, j2, j3, i, k, p, m1, n1, combo1, combo2, combo3;
  int fcombo1 = 0, fcombo2 = 0, fcombo3 = 0;
  m1 = 3;
  n1 = 3;
  p = 1;
  int factorial(int n, int r);
  for (i1 = 1; i1 <= 6; i1++)
    for (i2 = i1 + 1; i2 <= 6; i2++)
      for (i3 = i2 + 1; i3 <= 6; i3++)
        for (j1 = 1; j1 <= 6; j1++)
          if (j1 != i1 && j1 != i2 && j1 != i3)
            for (j2 = 1; j2 <= 6; j2++)
              if (j2 != i1 && j2 != i2 && j2 != i3 && j2 > j1)
                for (j3 = 1; j3 <= 6; j3++)
                  if (j3 != i1 && j3 != i2 && j3 != i3 && j3 > j2)
                  {
                    for (i = 1; i <= 3; i++)
                      for (k = 0; k <= 1; k++)
                      {
                        if (i == 1)
                        {
                          combo1 = factorial(i-1,1) * factorial(m1-1,p) * factorial(i1-i,p-k) * factorial(n1-i1+i,p+k+1);
                          fcombo1 = fcombo1 + combo1;
                        }
                        else if (i == 2)
                        {
                          combo2 = factorial(i-1,p) * factorial(m1-1,p) * factorial(i2-i,p-k) * factorial(n1-i2+i,p+k+1);
                          fcombo2 = fcombo2 + combo2;
                        }
                        else if (i == 3)
                        {
                          combo3 = factorial(i-1,p) * factorial(m1-1,p) * factorial(i3-i,p-k) * factorial(n1-i3+i,p+k+1);
                          fcombo3 = fcombo3 + combo3;
                        }
                        printf("%3d%3d%3d%3d%3d%3d%3d%3d%3d\n", i1, i2, i3, j1, j2, j3, fcombo1, fcombo2, fcombo3);
                      }
                  }
  getch();
}
int factorial(int n, int r)
{
  if (n < r)
    return (0);
  else if (n == r)
    return (1);
  else if (r == 1)
    return (n);
  else
    return (factorial(n-1, r) + factorial(n-1, r-1));
}
Output:
1 2 3 4 5 6 0 0 0
1 2 3 4 5 6 0 0 0
1 2 3 4 5 6 0 0 0
1 2 3 4 5 6 0 2 0
1 2 3 4 5 6 0 2 0
1 2 3 4 5 6 0 2 4
1 2 4 3 5 6 0 2 4
1 2 4 3 5 6 0 2 4
1 2 4 3 5 6 0 2 4
1 2 4 3 5 6 0 4 4
1 2 4 3 5 6 0 4 8
Segmentation fault

First of all, the program's design uses eight nested for loops (plus interleaved ifs); that is usually a terrible idea, so think about how you can refactor it.
Second, for the specific problem: it lies in the factorial function. If r drops below 1 (e.g. the call factorial(1, 0), which happens whenever k = 1, since the third factor is factorial(..., p-k) with p = 1), none of the termination conditions matches and the function recurses infinitely. As a result, the program aborts with a segmentation fault caused by a stack overflow inside the function. So, depending on how you need to calculate, add a termination condition to the function, for example:
int factorial(int n, int r)
{
  if (n < r)
    return (0);
  else if (n == r)
    return (1);
  else if (r <= 1)
    return (n);
  else
    return (factorial(n-1, r) + factorial(n-1, r-1));
}
or
if (r <= 0) abort(); // this should never happen
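For what it's worth, the recursion factorial(n-1, r) + factorial(n-1, r-1) is Pascal's rule, so factorial(n, r) here is really the binomial coefficient C(n, r). A minimal iterative sketch, keeping the question's (misleading) factorial name so it can drop in, that returns the mathematically correct C(n, 0) = 1 and cannot overflow the stack:
int factorial(int n, int r)   /* actually C(n, r); name kept for drop-in use */
{
  if (r < 0 || n < r)
    return 0;                 /* out-of-range cases terminate instead of recursing */
  long long result = 1;
  for (int k = 1; k <= r; k++)
    result = result * (n - r + k) / k;  /* exact: result is C(n-r+k, k) at each step */
  return (int)result;
}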

I just wanted to know why I am getting a segmentation fault....
I guess the reason is a stack overflow. You are calling the recursive function factorial within 8 levels of nesting, so this adds up to millions of calls.
Each time you call a function, some memory is allocated on the stack. Over millions of nested calls, the stack will overflow and cause a segmentation fault.

Related

Program skipping some lines when trying to read

I'm trying to read this kind of file:
0 //Lonely number that represents the turn of a game
0 0 3 1 2 2
0 1 0 0 3 0
0 2 9 2 0 1
0 3 11 2 3 1
1
4 6 7 8 5 6
3 5 6 7 7 8
1 4 5 6 9 6
4 5 7 6 5 4
but in my code, every time I try to detect a new line, the program just takes a completely random line and writes:
while (fgets(line, 255, *save) != NULL) {
    printf("The line says: %s", line);
    if (line[0] == '\n') {
        // the pointer is because the file was passed to a function
        fscanf(*save, "%*[^\n]\n"); // to skip this new line and get the one integer
        fscanf(*save, "%d", &turno); // get the lonely number
        printf("Turno: %d\n", turno);
        getch();
    }
    fscanf(*save, "%d %d %d %d %d %d", &player, &pilha, &id, &x, &y, &qtd); // get the integers from each line
    printf("%d %d %d %d %d %d\n", player, pilha, id, x, y, qtd);
}
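One likely cause is the mix of fgets and fscanf on the same stream: fscanf leaves the file position wherever its last conversion stopped, so the next fgets picks up mid-line. A minimal sketch that reads every line with fgets and parses it with sscanf (reusing the question's variable names; the dispatch logic is an assumption about the file layout) sidesteps the mixed positioning:
char line[256];
int turno, player, pilha, id, x, y, qtd;

while (fgets(line, sizeof line, *save) != NULL) {
    if (sscanf(line, "%d %d %d %d %d %d",
               &player, &pilha, &id, &x, &y, &qtd) == 6) {
        /* a full board line: six integers */
        printf("%d %d %d %d %d %d\n", player, pilha, id, x, y, qtd);
    } else if (sscanf(line, "%d", &turno) == 1) {
        /* a line with a single integer: the turn marker */
        printf("Turno: %d\n", turno);
    }
    /* blank lines match neither pattern and are simply skipped */
}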

Standard C function is slower at the first call, how to solve this properly?

I want to make timing tests to learn how to benchmark using "time.h", but I noticed the first test is always longer:
0 1 2 3 4 5 6 7 8 9
time 0.000138
0 1 2 3 4 5 6 7 8 9
time 0.000008
0 1 2 3 4 5 6 7 8 9
time 0.000007
If I want to do several tests in the same main() function, the results will be unreliable.
Here is the (admittedly silly) code that prints the output above.
#include <stdio.h>
#include <time.h>

const int COUNT = 10;

void test() {
    clock_t start = clock();
    for (int i = 0; i < COUNT; i++) {
        printf("%d ", i);
    }
    printf("\ntime %lf\n", (double)(clock() - start) / (double)CLOCKS_PER_SEC);
}

int main() {
    test();
    test();
    test();
    return 0;
}
I solved this by ignoring the first call to test. Writing a first printf that prints some integer before the tests also works, but I guess that's not a proper solution.
The CPU has caches. When code and data are not yet in the cache, the code takes longer to run.
It's standard practice to discard the result of the first run (or the first few runs) when measuring performance; this is sometimes called "cache warmup".
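A minimal sketch of that pattern, assuming the test() function from the question: run it once (or a few times) untimed, then keep only the later measurements.
#include <stdio.h>

void test(void);             /* the timed function from the question */

#define WARMUP_RUNS 1        /* runs to discard while the caches warm up */
#define TIMED_RUNS  3        /* runs whose timings we actually keep */

int main(void) {
    for (int i = 0; i < WARMUP_RUNS; i++)
        test();              /* warm-up: ignore this timing */
    for (int i = 0; i < TIMED_RUNS; i++)
        test();              /* these timings should now be comparable */
    return 0;
}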

Trouble with extra zero in bubble sort in C

I am trying to sort, say, 10 integers in ascending order using bubble sort in C.
This is the code I am using:
#include <stdio.h>

void main()
{
    int x[20], i, j;
    float temp;
    printf("Enter 10 values: \n");
    for (i = 0; i < 10; i++)
    {
        printf("x[%d]: ", i + 1);
        scanf("%d", &x[i]);
    }
    for (i = 0; i < 10 - 1; i++)
    {
        for (j = 0; j <= 10 - i - 1; j++)
        {
            if (x[j] > x[j+1])
            {
                temp = x[j];
                x[j] = x[j+1];
                x[j+1] = temp;
            }
        }
    }
    printf("The sorted list in ascending order is \n");
    for (i = 0; i < 10; i++)
    {
        printf("%5d", x[i]);
    }
}
The problem is that I am getting an extra zero in the output despite entering only non-zero values for my 10 integers.
This is the input and the corresponding output. Note that the output contains a zero and that the value 19 has disappeared from the sorted list:
Enter 10 values:
x[1]: 4
x[2]: 2
x[3]: 7
x[4]: 4
x[5]: 8
x[6]: 2
x[7]: 3
x[8]: 9
x[9]: 13
x[10]: 19
The sorted list in ascending order is
2 0 2 3 4 4 7 8 9 13
--------------------------------
Process exited after 44.89 seconds with return value 5
Press any key to continue . . .
I am unable to locate my precise mistake.
for (i = 0; i < 10 - 1; i++)
{
    for (j = 0; j < 10 - i - 1; j++)
    {
        if (x[j] > x[j+1])
        {
            temp = x[j];
            x[j] = x[j+1];
            x[j+1] = temp;
        }
    }
}
The bug: when i = 0, the inner loop condition is j <= 10-0-1 = 9, and at j = 9 you compare x[j] with x[j+1], i.e. with x[10]. The array has indices 0 to 19, but you only initialized the first 10 elements (0-9); the remaining 10 (10-19) happen to hold 0 here, so one of those zeros gets swapped into your result while 19 gets pushed out past index 9.
Change j<=10-i-1 to j<10-i-1 and the code will run as you expect.
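Two small asides beyond the fix (suggestions, not part of the question): temp is declared float although the array holds ints, and bubble sort can stop as soon as a full pass makes no swaps. A sketch of the corrected loops with both changes:
int temp, swapped;
for (i = 0; i < 10 - 1; i++) {
    swapped = 0;                       /* no swaps seen in this pass yet */
    for (j = 0; j < 10 - i - 1; j++) { /* strict <, so x[j+1] stays in bounds */
        if (x[j] > x[j+1]) {
            temp = x[j];               /* int temp avoids silent float conversions */
            x[j] = x[j+1];
            x[j+1] = temp;
            swapped = 1;
        }
    }
    if (!swapped)                      /* already sorted: stop early */
        break;
}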

MPI IO - MPI_File_seek - How to calculate correct offset size for shared file

Update: the junk is now removed from the end of the shared file, but there is still some "junk" in the middle of the file, where process 0 ends writing and process 1 starts writing:
10 4 16 16 0 2 2 3 1 3 4 2 4 5 1 0 4 6 2 8 5 3 8 10 4 9 5 4 ^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#10 4 16 16 1 2 6 0 3 5 2 2 2 8 1 5 6 5 6 6 4 8 9 7 6 2 1 3 6 4 10 2 5 7 7 6 10 6 5 9 9 10 6 7 5 8
However, if I count the gibberish characters, I get 40. When I try
offset = (length-40)*my_rank;
it works, but it is not a very scalable or robust solution. Therefore I need to compute this number for a more general solution. Does anybody see how this can be done? Here is my current function:
#define MAX_BUFF 50

int write_parallel(Context *context, int num_procs, int my_rank, MPI_Status status)
{
    int written_chars = 0;
    int written_chars_accumulator = 0;
    int n = context->n;
    void *charbuffer = malloc(n * MAX_BUFF);
    if (charbuffer == NULL) {
        exit(1);
    }
    MPI_File file;
    MPI_Offset offset;
    MPI_File_open(MPI_COMM_WORLD, "test_write.txt",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &file);
    written_chars = snprintf((char *)charbuffer, n * MAX_BUFF, "%d %d %d %d\n",
                             n, context->BOX_SIDE, context->MAX_X, context->MAX_Y);
    if (written_chars < 0) { exit(1); }
    written_chars_accumulator += written_chars;
    int i, j;
    for (i = 0; i < n; i++) {
        if (context->allNBfrom[i] > 0) {
            written_chars = snprintf((char *)charbuffer + written_chars_accumulator,
                                     (n * MAX_BUFF - written_chars_accumulator),
                                     "%d %d %d ", i, context->x[i], context->y[i]);
            if (written_chars < 0) { exit(1); }
            written_chars_accumulator += written_chars;
            for (j = 0; j < context->allNBfrom[i]; j++) {
                written_chars = snprintf((char *)charbuffer + written_chars_accumulator,
                                         (n * MAX_BUFF - written_chars_accumulator),
                                         "%d ", context->delaunayEdges[i][j]);
                if (written_chars < 0) { exit(1); }
                written_chars_accumulator += written_chars;
            }
            written_chars = snprintf((char *)charbuffer + written_chars_accumulator,
                                     (n * MAX_BUFF - written_chars_accumulator), "\n");
            if (written_chars < 0) { exit(1); }
            written_chars_accumulator += written_chars;
        }
    }
    int length = strlen((char *)charbuffer);
    offset = (length - 40) * my_rank; // Why is this correct? The constant 40 needs to be computed in some way...
    //printf("proc=%d:\n%s", my_rank, charbuffer);
    MPI_File_seek(file, offset, MPI_SEEK_SET);
    MPI_File_write(file, charbuffer, length, MPI_CHAR, &status);
    MPI_File_close(&file);
    return 0;
}
Here is my current result with this solution, which is also correct: 10 4 16 16 0 2 2 3 1 3 4 2 4 5 1 0 4 6 2 8 5 3 8 10 4 9 5 4 10 4 16 16 1 2 6 0 3 5 2 2 2 8 1 5 6 5 6 6 4 8 9 7 6 2 1 3 6 4 10 2 5 7 7 6 10 6 5 9 9 10 6 7 5 8
But it will not scale, because I don't know how to compute the number of gibberish elements. Does anybody have a clue?
If I understand your code, your goal is to remove the NUL chars in between your text blocks. In this parallel writing approach there is no way to solve this without violating the safe boundaries of your buffers: none of the processes knows in advance how long the output of the other processes is going to be. This makes it hard (/impossible) to have dynamic ranges for the write offset.
If you shift your offset, then you will be writing in an area not reserved for that process, and the program could overwrite data.
In my opinion there are two solutions to your problem of removing the NULs from the file:
Write separate files and concatenate them after the program is finished.
Post-process your output file with a program that reads/copies chars from your output file and skips NUL bytes, saving the result as a new file; a sketch of this follows.
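A minimal sketch of the second option (the output file name matches the question's test_write.txt; the cleaned file name is an assumption): copy the file byte by byte and drop the NULs.
#include <stdio.h>

/* Strip NUL bytes from the MPI output file; a post-processing sketch,
   not part of the original answer's code. */
int main(void) {
    FILE *in  = fopen("test_write.txt", "rb");
    FILE *out = fopen("test_write_clean.txt", "wb");
    int c;
    if (in == NULL || out == NULL)
        return 1;
    while ((c = fgetc(in)) != EOF)
        if (c != '\0')                 /* skip the gap-filling NULs */
            fputc(c, out);
    fclose(in);
    fclose(out);
    return 0;
}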
offset = (length-40)*my_rank; // Why is this correct? The constant 40 needs to be computed in some way...
The way you compute this is with MPI_Scan. As others have noted, you need to know how much data each process will contribute.
I'm pretty sure I've answered this before. Adapted to your code, where each process has computed the length of its string:
long long length, new_offset;   /* long long to match MPI_LONG_LONG_INT */
length = strlen(charbuffer);
MPI_Scan(&length, &new_offset, 1, MPI_LONG_LONG_INT,
         MPI_SUM, MPI_COMM_WORLD);
new_offset -= length; /* MPI_Scan is inclusive, but that also means
                         it has a defined value on rank 0 */
MPI_File_write_at_all(file, new_offset, charbuffer, (int)length, MPI_CHAR, &status);
The important feature of MPI_Scan is that it runs through your ranks and applies an operation (in this case, SUM) over all the preceding ranks. After the call, rank 0 holds just its own value, rank 1 holds the sum of itself and rank 0, rank 2 holds the sum of itself, rank 1 and rank 0, and so on.
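For example (hypothetical numbers): if rank 0's buffer is 28 characters long and rank 1's is 46, the inclusive scan yields 28 on rank 0 and 74 on rank 1; subtracting each rank's own length gives write offsets of 0 and 28, which is exactly where each block should start in the shared file.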

How to resolve "return 0 or return 1" within a for loop in OpenMP?

This is what I have done so far to resolve the return 1; / return 0; statements. It is actually a sudoku solver using a backtracking algorithm, which I am trying to parallelize, but I can't get a complete result. (Correct me if my implementation is wrong.)
What actually happens? Can anybody help?!
This is the site I referred to and whose approach I followed: http://www.thinkingparallel.com/2007/06/29/breaking-out-of-loops-in-openmp/#reply
int solver(int row, int col)
{
    int i;
    boolean flag = FALSE;
    if (outBoard[row][col] == 0)
    {
        #pragma omp parallel num_threads(2)
        #pragma omp parallel for // it works if I remove this line
        for (i = 1; i < 10; i++)
        {
            if (checkExist(row, col, i)) // if not, assign number i to the empty cell
                outBoard[row][col] = i;
            #pragma omp flush (flag)
            if (!flag)
            {
                if (row == 8 && col == 8)
                {
                    //return 1;
                    flag = TRUE;
                    #pragma omp flush (flag)
                }
                else if (row == 8)
                {
                    if (solver(0, col+1))
                    {
                        //return 1;
                        flag = TRUE;
                        #pragma omp flush (flag)
                    }
                }
                else if (solver(row+1, col))
                {
                    //return 1;
                    flag = TRUE;
                    #pragma omp flush (flag)
                }
            }
        }
        if (flag) { return 1; }
        if (i == 10)
        {
            if (outBoard[row][col] != inBoardA[row][col])
                outBoard[row][col] = 0;
            return 0;
        }
    }
    else
    {
        if (row == 8 && col == 8)
        {
            return 1;
        }
        else if (row == 8)
        {
            if (solver(0, col+1)) return 1;
        }
        else
        {
            if (solver(row+1, col)) return 1;
        }
        return 0;
    }
}
5 0 0 0 0 3 7 0 0
7 4 6 1 0 2 3 0 0
0 3 8 0 9 7 5 0 2
9 7 0 4 0 0 2 0 0
0 0 2 0 0 0 4 0 0
0 0 4 0 0 5 0 1 9
4 0 3 2 7 0 9 8 0
0 0 5 3 0 9 6 7 4
0 0 7 5 0 0 0 0 3
Sudoku solved :
5 2 9 8 0 3 7 4 1
7 4 6 1 5 2 3 9 0
1 3 8 0 9 7 5 6 2
9 7 0 4 1 0 2 3 6
0 1 2 9 6 0 4 5 8
3 6 4 7 8 5 0 1 9
4 0 3 2 7 6 9 8 5
2 8 5 3 0 9 6 7 4
6 9 7 5 4 8 1 2 3
The //return 1; lines are the original serial code. Since return is not allowed inside a parallel for, I used #pragma omp flush to eliminate them, but the result is not complete: it still leaves a few empty cells in the sudoku.
Thanks for answering :>
First, since solver is called recursively, it doubles the number of threads with each level of recursion. I don't think it's what you intended to do.
Edit: This is only true if nested parallelism is enabled with omp_set_nested(), and by default it is not. So only the first call of solver will fork.
#pragma omp parallel num_threads(2)
#pragma omp parallel for
in your code tries to create one parallel region within another, which causes the loop that follows to execute twice, because the outer parallel has already created two threads. This should be replaced with
#pragma omp parallel num_threads(2)
#pragma omp for
or equivalent #pragma omp parallel for num_threads(2).
Second, this code:
if (checkExist(row,col,i))//if not, assign number i to the empty cell
outBoard[row][col] = i;
creates a race condition, with both threads writing different values to the same cell in parallel. You might want to give each thread a separate copy of the board to work with, as in the sketch below.
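A minimal sketch of that idea (localBoard, the 9x9 board dimensions, and the adjusted recursion are assumptions, not code from the question): each iteration copies the shared board into a private array before trying its candidate value.
/* needs <string.h> for memcpy */
#pragma omp parallel for num_threads(2)
for (int i = 1; i < 10; i++) {
    int localBoard[9][9];                 /* private to this iteration */
    memcpy(localBoard, outBoard, sizeof localBoard);
    if (checkExist(row, col, i))
        localBoard[row][col] = i;         /* write into the copy, not the shared board */
    /* ... recurse on localBoard instead of the global outBoard ... */
}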
Another code part,
if (outBoard[row][col] != inBoardA[row][col])
outBoard[row][col] = 0;
seems to be outside the parallel region, but in nested calls to solver it is also executed in parallel by different threads created by the outermost solver.
Final edit (18.09): Anyway, even if you manage to debug/change your code to run correctly in parallel (as I did myself, just for the heck of it; I'll try to provide the code if anyone is interested, which I doubt), the outcome will be that parallel execution of this code doesn't give you much advantage. The reason, to my mind, is as follows:
Imagine: when solver iterates over the 9 possible cell values it creates 9 branches of execution. If you create 2 threads using OpenMP, it will distribute the top-level branches between the threads in some way, say 5 branches executed by one thread and 4 by the other, and within each thread the branches execute consecutively, one by one. If the initial sudoku state is valid, only one branch leads to the correct solution. The other branches are cut short when they encounter a contradiction, so some take longer and some take less time to run, while the branch leading to the correct solution runs the longest. You cannot predict which branches will take how long, so there is no way to reasonably balance the workload among the threads. Even if you use OpenMP dynamic scheduling, chances are that while one thread executes the longest branch, the other thread(s) will already have finished all the other branches and will wait for the last one, because there are too few branches (so dynamic scheduling will be of little help).
Since creating threads and synchronizing data between them incurs substantial overhead (compared to a sequential solver running time of 0.01-10 ms), you'll see a parallel execution time somewhat longer or shorter than the sequential one, depending on the input.
In any case, if the sequential solver running time is under 10 ms, why do you want to make it parallel?
