OpenMP | Parallelizing a loop with dependent iterations

I am trying to parallelize an encoding function. I tried adding a simple pragma around the for loop, but the result was wrong. I figured that the iterations are dependent (through the code variable) and therefore cannot be parallelized directly.
int encodePrimeFactorization(int number){
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--){
code = code * 2;
int f = prime_factors[i];
if (number % f == 0){
code = code + 1;
}
}
return code;
}
Is there a way to make the code variable independent for each iteration?

Yes. At least for me, it's easier to think about that if you look at the algorithm this way:
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--) {
code = code << 1;
int f = prime_factors[i];
if (number % f == 0){
// The last bit of code is never set here,
// because it has been just shifted to the left
code = code | 1;
}
}
Now you can fold the shift into the set operation itself and set bit i directly:
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--) {
int f = prime_factors[i];
if (number % f == 0){
code = code | (1 << i);
}
}
Now the iterations are independent, and the loop becomes a trivial bitwise-OR reduction:
int code = 0;
#pragma omp parallel for reduction(|:code)
for (int i=PF_NUMBER-1; i>=0 ; i--) {
int f = prime_factors[i];
if (number % f == 0){
code |= (1 << i);
}
}
That said, you will not get any performance gain. With a 32-bit int this works for at most 31 factors (1 << 31 does not fit in a signed int), which is far too little work to outweigh the overhead of parallelization. If this is a hot part of your code, you have to find something around it to parallelize instead.
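As a sanity check, here is a minimal sketch that puts the serial encoder next to the OR-reduction version so their results can be compared. The question does not show prime_factors or PF_NUMBER, so the six-prime table below is an assumption for illustration only. Compiled without -fopenmp, the pragma is simply ignored and the loop runs serially, which still gives the same result.

```c
/* Hypothetical factor table: the question does not show prime_factors or
   PF_NUMBER, so these values are assumptions for the example. */
#define PF_NUMBER 6
static const int prime_factors[PF_NUMBER] = {2, 3, 5, 7, 11, 13};

/* Original serial version: each iteration depends on the previous code. */
int encode_serial(int number)
{
    int code = 0;
    for (int i = PF_NUMBER - 1; i >= 0; i--) {
        code = code * 2;
        if (number % prime_factors[i] == 0)
            code = code + 1;
    }
    return code;
}

/* Independent iterations: bit i is set directly, then OR-reduced. */
int encode_reduction(int number)
{
    int code = 0;
    #pragma omp parallel for reduction(|:code)
    for (int i = PF_NUMBER - 1; i >= 0; i--) {
        if (number % prime_factors[i] == 0)
            code |= (1 << i);
    }
    return code;
}
```

With this table, both functions return 7 for 30 (factors 2, 3 and 5 set bits 0 to 2) and 24 for 77 (factors 7 and 11 set bits 3 and 4).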

Related

Efficiently print every x iterations in for loop

I am writing a program in which a certain for loop gets iterated many, many times.
A single iteration doesn't take too long, but since the program runs the loop so often, it takes quite some time to compute.
To get more information on the progress of the program without slowing it down too much, I would like to print the progress every xth step.
Is there a different way to do this than a conditional with a modulo, like this?
for(int i = 0; i < some_large_number; i++){
if(i % x == 0)
printf("%f%%\r", percent);
//some other code
.
.
.
}
Thanks in advance

This code:
for(int i = 0; i < some_large_number; i++){
if(i % x == 0)
printf("%f%%\r", percent);
//some other code
.
.
.
}
can be restructured as:
/* Partition the execution into blocks of x iterations, possibly including a
final fragmentary block. The expression (some_large_number+(x-1))/x
calculates some_large_number/x with any fraction rounded up.
*/
for (int block = 0, i = 0; block < (some_large_number+(x-1))/x; ++block)
{
printf("%f%%\r", percent);
// Set limit to the lesser of the end of the current block or some_large_number.
int limit = (block+1) * x;
if (some_large_number < limit) limit = some_large_number;
// Iterate the original code.
for (; i < limit; ++i)
{
//some other code
}
}
With the following caveats and properties:
The inner loop has no more work than the original loop (it has no extra variable to count or test) and has the i % x == 0 test completely removed. This is optimal for the inner loop in the sense it reduces the nominal amount of work as much as possible, although real-world hardware sometimes has finicky behaviors that can result in more compute time for less actual work.
New identifiers block and limit are introduced but can be changed to avoid any conflicts with uses in the original code.
Other than the above, the inner loop operates identically to the original code: It sees the same values of i in the same order as the original code, so no changes are needed in that code.
some_large_number+(x-1) could overflow int.
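One way around the overflow caveat is to compute the block count as n / x + (n % x != 0), which also rounds up but never forms n + (x - 1), and to compute each block's limit from the remaining distance n - i. A minimal sketch under those assumptions follows. The function name run_blocked is hypothetical, and instead of printing it counts how often the progress line would be printed and sums the i values the inner loop sees, so the result can be checked against the original loop.

```c
/* Blocked loop with an overflow-free round-up of n / x (assumes x > 0).
   Instead of printing, it counts prints and sums the visited i values. */
long long run_blocked(int n, int x, int *prints)
{
    int blocks = n / x + (n % x != 0);   /* ceil(n / x) without n + (x - 1) */
    long long visited = 0;
    *prints = 0;
    for (int block = 0, i = 0; block < blocks; ++block) {
        ++*prints;                        /* printf("%f%%\r", percent); */
        /* End of this block: i + x, or n for the final fragmentary block.
           n - i cannot overflow, and i + x is only formed when it is < n. */
        int limit = (n - i > x) ? i + x : n;
        for (; i < limit; ++i)
            visited += i;                 /* stands in for "some other code" */
    }
    return visited;
}
```

With n = 10 and x = 3 it reports 4 prints, at the starts of i = 0, 3, 6 and 9, exactly where i % x == 0 would fire, and the inner loop still visits every i from 0 to 9.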

I would do it like this:
int j = x;
for (int i = 0; i < some_large_number; i++){
if(--j == 0) {
printf("%f%%\r", percent);
j = x;
}
//some other code
.
.
.
}
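Note that this countdown fires on iterations x-1, 2x-1, ... rather than on 0, x, 2x, ...; starting j at 1 instead of x reproduces the i % x == 0 schedule exactly. A small sketch to make that visible: countdown_prints is a hypothetical helper that records the i values where the progress line would be printed, with start being the initial value of j.

```c
/* Countdown instead of modulo: record the i values where the progress
   line would be printed (up to cap of them) and return how many times
   it fired. start = x matches the answer; start = 1 matches i % x == 0. */
int countdown_prints(int n, int x, int start, int *where, int cap)
{
    int count = 0;
    int j = start;
    for (int i = 0; i < n; i++) {
        if (--j == 0) {
            if (count < cap)
                where[count] = i;    /* printf("%f%%\r", percent); */
            count++;
            j = x;
        }
        /* some other code */
    }
    return count;
}
```

For n = 10 and x = 3, start = 3 fires at i = 2, 5, 8, while start = 1 fires at i = 0, 3, 6, 9.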

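Divide some_large_number by x. Then loop x times, nest an inner loop over that many iterations, and print the percentage after each inner loop. I meant this:
int temp = some_large_number/x;
for (int i = 0; i < x; i++){
for (int j = 0; j < temp; j++){
//some code
}
printf("%f%%\r", percent);
}
// leftover iterations when some_large_number is not a multiple of x
for (int j = 0; j < some_large_number % x; j++){
//some code
}
Note that in this layout x is the number of progress updates rather than the spacing between them.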

A nested loop avoids the per-iteration check entirely, which addresses your performance concern:
unsigned int x = 6;
unsigned int segments = some_large_number / x;
unsigned int y;
for ( unsigned int i = 0; i < segments; i++ ) {
printf("%f%%\r", percent);
for ( unsigned int j = 0; j < x; j++ ) {
/* some code here */
}
}
// If some_large_number can't be divided evenly by `x`:
if (( y = (some_large_number % x)) != 0 )
{
for ( unsigned int i = 0; i < y; i++ ) {
/* same code as inside of the former inner loop. */
}
}
Another option is to use a separate counter for the check: compare it to x - 1 to decide when to print, and reset it to -1 when it matches (the j++ at the end of the iteration brings it back to 0):
unsigned int x = 6;
unsigned int some_large_number = 100000000;
int j = 0; /* declared outside: a for-init cannot declare variables of two different types */
for ( unsigned int i = 0; i < some_large_number; i++, j++ ) {
if(j == (int)x - 1)
{
printf("%f%%\r", percent);
j = -1;
}
/* some code here */
}

Finding subsets discretely in C

for (i = 0; i < (pow(2,n)-1); i++) {
x = binary_conversion(i);
for (j = (n-1); j > 0; j--) {
if (x == 0) {
M[i][j] = 0;
}
else {
M[i][j] = x % 10;
x = x / 10;
}
}
}
I want to print the subsets of a set, so for a set of n elements I compute 2^n, and I convert each value from 0 to 2^n to binary. I keep the binary values in a matrix, and as I go through the matrix, if a value is 1, I print the corresponding element of the original set. But while creating the matrix, it assigns the same binary value to two consecutive rows, so in the end I cannot even get half of the subsets. What do you think is wrong with the code?

Ah, it's because you don't cover the LSB, i.e. the 0th element. The inner loop should be for (j = (n-1); j >= 0; j--); you missed the =.
Also, you have to know whether the j-th bit is set in i or not.
And instead of pow you can simply use (1<<n), which is equal to 2^n.
Your code is hard to read, so here is a sketch of the idea instead (assuming Set is an int array of n elements):
for (int i = 0; i <= (1 << n) - 1; i++)
{
  // print subset i
  for (int pos = 0; pos <= n - 1; pos++)
    if (i & (1 << pos))
      printf("%d ", Set[pos]);
  printf("\n");
}
Why am I not using pow?
pow operates on double values and is implemented with a general floating-point algorithm, so raising a value to the power n is not necessarily the same as multiplying it by itself n times. As a result you can end up with rounding errors, and execution is a bit slower too.
Is bitwise faster?
Yes. Even on modern architectures you won't lose performance by using bitwise operations; they are usually at least as fast as addition.
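If you still want the matrix from the question, each row can be filled directly with a shift and a mask instead of a decimal "binary conversion". A minimal sketch, assuming a hypothetical fixed upper bound MAXN on the set size and storing bit j of i in column j, so column 0 holds the least significant bit (the reverse of the question's column order):

```c
#define MAXN 10   /* hypothetical upper bound on the set size */

/* Fill M so that M[i][j] is 1 exactly when bit j of i is set, for every
   i in [0, 2^n) and j in [0, n). Row i then describes subset i. */
void fill_subset_matrix(int n, int M[][MAXN])
{
    for (int i = 0; i < (1 << n); i++)
        for (int j = 0; j < n; j++)
            M[i][j] = (i >> j) & 1;
}
```

Every row differs from the one before it, so printing Set[j] wherever M[i][j] == 1 yields all 2^n subsets, including the empty one in row 0.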

Your program does more work than necessary (a separate binary conversion and a matrix), and there are better solutions for this problem. Anyway, the problem in your code is fixed by using '<=':
i <= pow(2,n)-1
You can also use i < (1<<n). Both work the same, but the second one is better and faster. The same problem happens in the inner loop, where you left out the '=' sign, i.e. j >= 0. Otherwise the program was fine.
A better solution for your problem may look like this:
#include <stdio.h>
/* Print every subset of the N characters in A, one subset per line:
   bit j of i decides whether A[j] belongs to subset i. */
void subsets(char A[], int N)
{
  int i, j;
  for (i = 0; i < (1 << N); ++i)
  {
    for (j = 0; j < N; ++j)
      if (i & (1 << j))
        printf("%c ", A[j]);
    printf("\n");
  }
}
This needs no separate binary conversion or matrix.

Trying to turn the factorial part into another function

I have to begin by thanking you guys for the help. I am trying to turn the factorial part of the code into another function and was wondering if I needed to add everything within the
#include <stdio.h>
#include <stdlib.h>
int main()
{
int num;
int indx;
int arrayIndx;
int accumulator;
int fact;
int individualDigit[50];
int length;
for(indx = 99999; indx > 0; indx--)
{
num = indx;
for (length = 0; num > 0; length++)
{
individualDigit[length] = num % 10;
num /= 10;
}
accumulator = 0;
for (arrayIndx = 0; arrayIndx < length; arrayIndx++)
{
fact = 1;
while(individualDigit[arrayIndx] > 0)
{
fact*= individualDigit[arrayIndx];
individualDigit[arrayIndx]--;
}
accumulator += fact;
}
if(accumulator == indx)
{
printf("%d ", accumulator);
}
}
return 0;
}

Your program is badly designed. It is not indented, and you are using the variable names index, indx and idex, which is confusing for the reader and would lead to nightmares in long-term maintenance. Also, the factorial computation deserves to be in a function for better modularity.
But apart from that, your program does what you ask: it correctly computes factorials and adds them in the accumulator variable. The only problem is that you never print that accumulator except in the last 2 cases (2 and 1), where n = n!.
Simply replace:
if (accumulator == indx)
{
printf("\n%d\n", indx);
}
with
printf("\n%d\n", accumulator);
and you will see your results.
If you want to store the sums of factorials in an array, just declare int sumOfFact[26] = {0}; right before int individualDigit[50]; to define the array, initialize sumOfFact[0] to 1, and then add sumOfFact[indx] = accumulator; just before printing the accumulator.
To put the factorial part in a function, it is quite simple. First declare it above your main:
int ffact(int n);
then define it anywhere in your code (possibly in another compilation unit, i.e. another .c file, if you want):
int ffact(int n) {
  int fact = 1;
  while (n > 1) {
    fact *= n--;
    /* if (fact < 0) { fprintf(stderr, "Overflow in ffact(%d)\n", n); return 0; } */
  }
  return fact;
}
I commented out the test for overflow because I assume you use at least a 32-bit int, and fact(9) will not overflow (but fact(13) would...).
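Be aware that the commented-out fact < 0 test relies on signed overflow, which is undefined behavior in C, so a compiler may legally remove it. A safer sketch checks against INT_MAX before each multiplication; the name ffact_checked and its return convention are hypothetical.

```c
#include <limits.h>

/* Factorial with the overflow check done before multiplying.
   Returns 1 and stores n! in *out on success, 0 if n! does not fit in int. */
int ffact_checked(int n, int *out)
{
    int fact = 1;
    while (n > 1) {
        if (fact > INT_MAX / n)
            return 0;              /* fact * n would exceed INT_MAX */
        fact *= n--;
    }
    *out = fact;
    return 1;
}
```

With a 32-bit int, ffact_checked succeeds up to 12! = 479001600 and reports failure for 13.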
The loop computing the sum of factorials becomes:
accumulator = 0;
for (arrayIndx = 0; arrayIndx < length; arrayIndx++)
{
accumulator += ffact(individualDigit[arrayIndx]);
}
printf("\n%d\n", accumulator);
Advantages of that modularity: it is simpler to test ffact separately, so when things go wrong you don't have to crawl through a single block of more than 40 lines (not counting the absent but necessary comments). And the code no longer modifies the individualDigit array in place.
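Putting the pieces together, a sketch of the digit-factorial sum as its own function might look like this. sumDigitFactorials is a hypothetical helper name, and ffact is repeated with the declaration and type fixes it needs so the block is self-contained. Because each digit is passed by value, nothing is modified in place.

```c
/* n! for small n; digits 0..9 cannot overflow a 32-bit int. */
int ffact(int n)
{
    int fact = 1;
    while (n > 1)
        fact *= n--;
    return fact;
}

/* Sum of the factorials of the decimal digits of num (num >= 0). */
int sumDigitFactorials(int num)
{
    int accumulator = 0;
    do {
        accumulator += ffact(num % 10);   /* current lowest digit */
        num /= 10;
    } while (num > 0);
    return accumulator;
}
```

A main that scans indx from 99999 down to 1 and prints indx whenever sumDigitFactorials(indx) == indx reproduces the original search, for example 1! + 4! + 5! = 145.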

When is the gcc optimization most effective?

When I pass the -O3 option to compile C code with gcc, it usually reduces the running time by around 10~30% compared to compiling without optimization. Today I found that the running time of one of my programs was reduced remarkably, to about 1/10, with the -O3 option. Without optimization it took about 7 seconds to complete, but with -O3 it runs in 0.7 seconds! I have never seen such an incredible reduction in time.
So I wonder what types of program patterns are more likely to benefit from gcc's optimization options, or whether there are ways of writing code that make it easier for the compiler to optimize.
The code that got the 1/10 speedup is below. It is a simple program that calculates the sum of all prime numbers less than the macro constant MAXX, using a wheel factorization algorithm.
#include <stdio.h>
#include <math.h>
#include <inttypes.h>
#include <time.h>
#define MAXX 5000000
#define PBLEN 92160
#define PBMAX 510510
int main(){
clock_t startT, endT;
startT = clock();
int pArr[7] = {2, 3, 5, 7, 11, 13, 17};
int pBase[PBLEN];
pBase[0] = 1;
int i, j, k, index = 1;
for (i = 19; i <= PBMAX; ++i){
for (j = 0; j < 7; ++j){
if (i % pArr[j] == 0){
goto next1;
}
}
pBase[index] = i;
++index;
next1:;
}
uint64_t sum = 2 + 3 + 5 + 7 + 11 + 13 + 17;
for (i = 1; i < PBLEN; ++i){
for (j = 0; j < 7; ++j){
if (pArr[j] <= (int)sqrt((double)pBase[i]) + 1){
if (pBase[i] % pArr[j] == 0){
goto next2;
}
}
else{
sum += pBase[i];
goto next2;
}
}
for (j = 1; j < PBLEN; ++j){
if (pBase[j] <= (int)sqrt((double)pBase[i]) + 1){
if (pBase[i] % pBase[j] == 0){
goto next2;
}
}
else{
sum += pBase[i];
goto next2;
}
}
next2:;
}
int temp, temp2;
for (i = PBMAX; ; i += PBMAX){
for (j = 0; j < PBLEN; ++j){
temp = i + pBase[j];
if (temp > MAXX){
endT = clock();
printf("%"PRIu64"\n\n", sum);
printf("%.3f\n", (double)(endT - startT) / (double)CLOCKS_PER_SEC);
return 0;
}
for (k = 0; k < 7; ++k){
if (temp % pArr[k] == 0){
goto next3;
}
}
for (k = 1; k < PBLEN; ++k){
if (pBase[k] <= (int)sqrt((double)temp) + 1){
if (temp % pBase[k] == 0){
goto next3;
}
}
else{
sum += temp;
break;
}
}
next3:;
}
}
}

I'm going to guess how this could happen from looking at your code. My guess is that the thing that takes the longest, by far, is sqrt(). You have tight loops where you run sqrt() on a value that isn't changing. gcc probably decided to make a single call, save the result, and reuse it: same results, far fewer calls to sqrt(), so a significantly faster run time. You will probably see the same improvement if you manually move the calls to sqrt() out of your tight loops so they run less often. Run a profiler to be certain, though.
So, the short answer: sometimes gcc can fix pretty major issues that you could have fixed yourself, but only if they're small in scope. Usually you need to sit down with a profiler to see what's really taking up your time and how to work around the problem.
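To try the manual version of that fix, keep the loop-invariant bound in a variable instead of recomputing it inside the divisor loop. The sketch below is plain trial division, not the question's wheel, and it is not a claim about what gcc actually did. To avoid depending on sqrt() and -lm, it maintains floor(sqrt(n)) incrementally as n grows (a hypothetical simplification); the point is only that the inner loop compares against a precomputed value.

```c
/* Sum of primes below limit by trial division. floor(sqrt(n)) is kept in
   root and updated as n grows, so the inner loop never recomputes it. */
long long sum_primes_below(int limit)
{
    long long sum = 0;
    int root = 1;                              /* floor(sqrt(n)) */
    for (int n = 2; n < limit; n++) {
        while ((long long)(root + 1) * (root + 1) <= n)
            root++;                            /* at most one step per n */
        int prime = 1;
        for (int d = 2; d <= root; d++) {      /* bound is a plain variable */
            if (n % d == 0) {
                prime = 0;
                break;
            }
        }
        if (prime)
            sum += n;
    }
    return sum;
}
```

For example, the primes below 10 sum to 2 + 3 + 5 + 7 = 17.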

Controlled nested looping

I have loops nested r = 3 levels deep, each running n = 5 times.
for (i=0; i<n; i++)
{
for (j=0; j<n; j++)
{
for (k=0; k<n; k++)
//
}
}
But how can the nesting be done dynamically at run time? Say we know it should be nested r times, with each loop running n times. I thought of something like recursion, but it recurses indefinitely.
funloop (int r)
{
for (int i = 0; i < n; i++)
{
//
if (r < 3)
funloop (r++);
else
return;
}
}
Please let me know how this could be done. I couldn't find any sources online.

If you don't know the depth of the nesting statically, the most common approach is to use recursion to represent the looping. For example, suppose that you need d levels of nested loops that all iterate k times. Then you could implement that using recursion of this form:
void RecursivelyNestIterations(unsigned d, unsigned k) {
  /* Base case: If the depth is zero, there are no loops left to nest. */
  if (d == 0) return;
  /* Recursive step: to nest d loops, run one loop of k iterations and,
   * in its body, call the function recursively to nest the other d - 1.
   */
  for (unsigned i = 0; i < k; ++i) {
    /* Recurse to nest d - 1 more loops with the same number of iterations. */
    RecursivelyNestIterations(d - 1, k);
  }
}
Hope this helps!
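In practice the loop body usually needs the counters, so a common variation passes an array of indices down the recursion and runs the body once all levels are entered. A sketch under that assumption: the idx array, the level parameter and the visit counter (standing in for real work) are hypothetical additions to the answer's function.

```c
/* Recursively nest d loops of k iterations each. idx[level] holds the
   counter of the loop at that level (0 = outermost). The innermost body
   runs once per combination; here it just increments *visits. */
void nest(unsigned level, unsigned d, unsigned k, unsigned idx[],
          unsigned long *visits)
{
    if (level == d) {          /* all loops entered: innermost body */
        ++*visits;             /* use idx[0] .. idx[d - 1] here */
        return;
    }
    for (idx[level] = 0; idx[level] < k; ++idx[level])
        nest(level + 1, d, k, idx, visits);
}
```

Calling nest(0, 3, 5, idx, &visits) with idx sized for 3 levels runs the body 5^3 = 125 times, the same as three hand-written nested loops.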

The simplest method is just to collapse it to one for loop:
for(i=0; i<pow(n, r); i++) {
}
That can, however, make it harder to access the loop counters if you need them, but they can be computed mathematically. For example, the value of the innermost loop counter is given by:
int c = i % n;
You could keep an array of such counters and compute their values with similar equations, or you can simply increment them as needed, e.g.:
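#include <stdlib.h>
void iterate(int r, int n) {
  long total = 1, i;
  int rc, *c = calloc(r, sizeof(int));   // all counters start at 0
  for (rc = 0; rc < r; rc++)
    total *= n;                          // n^r computed once, in integers
  for (i = 0; i < total; i++) {
    // code here, using loop counters in the 'c' array, where c[0] is counter
    // for the outer loop, and c[r - 1] is the counter for the innermost loop
    // update the counters like an odometer: bump the innermost one and
    // carry into the next outer one whenever it wraps around to 0
    rc = r;
    while (rc > 0) {
      rc--;
      c[rc]++;
      if (c[rc] == n) {
        c[rc] = 0;
      } else {
        break;
      }
    }
  }
  free(c);
}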
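The "similar equations" for the other counters can be written as repeated division: i % n gives the innermost counter, then dividing i by n moves one level out. A sketch, with the function names hypothetical, that also computes n^r by integer multiplication so the flat loop bound is exact:

```c
/* Number of iterations of r nested loops of n each: n^r, in integers. */
long ipow(long n, int r)
{
    long total = 1;
    while (r-- > 0)
        total *= n;
    return total;
}

/* Recover the loop counters from the flat index i: c[0] is the outermost
   counter and c[r - 1] the innermost, matching the layout above. */
void decode_counters(long i, int r, int n, int c[])
{
    for (int level = r - 1; level >= 0; level--) {
        c[level] = (int)(i % n);   /* digit for this level, base n */
        i /= n;                    /* move one level out */
    }
}
```

For r = 3 and n = 5, index 38 = 1*25 + 2*5 + 3 decodes to the counters {1, 2, 3}.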

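Just call if (r) funloop(r-1); in the loop body. Your funloop(r++) passes the old value of r (post-increment), so the depth never changes and the recursion never stops.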

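#include <stdlib.h>
#include <stdio.h>
static int n = 3;
/* Runs the loop at nesting level cur; each iteration recurses one level
   deeper, so the loops are nested rather than run one after another. */
void funloop_level(int cur, int total)
{
  if (cur != total)
  {
    for (int cnt = 0; cnt != n; ++cnt)
    {
      fprintf(stdout, "%d::%d\n", cur, cnt);
      funloop_level(cur + 1, total);
    }
  }
}
void funloop(int total)
{
  funloop_level(0, total);
}
int main()
{
  funloop(3);   /* 3 nested levels of n = 3 iterations each */
  return 0;
}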

A solution that doesn't use recursion is discussed in this Tip:
http://www.codeproject.com/Tips/759707/Generating-dynamically-nested-loops
The code is in C++:
#include <iostream>
#define MAXROWS 9
#define MAXVALUES 9
using namespace std;
char display[] = {'1','2','3','4','5','6','7','8','9'};
int main() {
int arrs[MAXROWS]; // represent the different variables in the for loops
bool status = false;
for (int r=0;r<MAXROWS;r++)
arrs[r] = 0; // Initialize values
while (!status) {
int total = 0;
// calculate total for exit condition
for (int r=0;r<MAXROWS;r++)
total +=arrs[r];
// test for exit condition
if (total == (MAXVALUES-1)*MAXROWS)
status = true;
// printing
for (int r=0;r<MAXROWS;r++)
cout << display[arrs[r]]; // print(arrs[r])
cout << endl; // print(endline)
// increment loop variables
bool change = true;
int r = MAXROWS-1; // start from innermost loop
while (change && r>=0) {
// increment the innermost variable and check if it spills over
if (++arrs[r] > MAXVALUES-1) {
arrs[r] = 0; // reinitialize loop variable
// Change the upper variable by one
// We need to increment the immediate upper level loop by one
change = true;
}
else
change = false; // Stop, as the upper levels of the loop are unaffected
// We can perform any inner loop calculation here arrs[r]
r=r-1; // move to upper level of the loop
}
}
return 0;
}
