I am writing a program in which a certain for-loop is iterated many, many times.
A single iteration doesn't take too long, but since the program runs the loop so often, the whole computation takes quite some time.
To get more information on the progress of the program without slowing it down too much, I would like to print the progress every xth step.
Is there a different way to do this than a conditional with a modulo, like so?
for(int i = 0; i < some_large_number; i++){
if(i % x == 0)
printf("%f%%\r", percent);
//some other code
/* ... */
}
Thanks in advance
This code:
for(int i = 0; i < some_large_number; i++){
if(i % x == 0)
printf("%f%%\r", percent);
//some other code
/* ... */
}
can be restructured as:
/* Partition the execution into blocks of x iterations, possibly including a
final fragmentary block. The expression (some_large_number+(x-1))/x
calculates some_large_number/x with any fraction rounded up.
*/
for (int block = 0, i = 0; block < (some_large_number+(x-1))/x; ++block)
{
printf("%f%%\r", percent);
// Set limit to the lesser of the end of the current block or some_large_number.
int limit = (block+1) * x;
if (some_large_number < limit) limit = some_large_number;
// Iterate the original code.
for (; i < limit; ++i)
{
//some other code
}
}
With the following caveats and properties:
The inner loop has no more work than the original loop (it has no extra variable to count or test) and has the i % x == 0 test completely removed. This is optimal for the inner loop in the sense that it reduces the nominal amount of work as much as possible, although real-world hardware sometimes has finicky behaviors that can result in more compute time for less actual work.
New identifiers block and limit are introduced but can be changed to avoid any conflicts with uses in the original code.
Other than the above, the inner loop operates identically to the original code: It sees the same values of i in the same order as the original code, so no changes are needed in that code.
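some_large_number+(x-1) could overflow int when some_large_number is close to INT_MAX; computing the block count as some_large_number/x + (some_large_number%x != 0) avoids that.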
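A minimal, runnable sketch of the block-partitioned loop (not the answerer's exact code): some_other_code() is a hypothetical stand-in for the loop body that only tallies the iterations it sees, and run_modulo() keeps the original i % x == 0 shape for comparison. It computes the block count as n / x + (n % x != 0), which sidesteps the overflow mentioned above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for "//some other code": it only records how many
   iterations ran and the sum of the i values it saw, so the two loop shapes
   can be compared. */
static long work_count, work_sum;
static void some_other_code(int i) { work_count++; work_sum += i; }

/* Block-partitioned loop: one progress report per block of x iterations,
   no i % x test in the inner loop. Returns the number of reports printed. */
static int run_blocked(int n, int x)
{
    assert(x > 0 && n >= 0);
    int reports = 0;
    /* ceil(n / x) without forming n + (x - 1), which could overflow */
    int blocks = n / x + (n % x != 0);
    for (int block = 0, i = 0; block < blocks; ++block) {
        printf("%d%%\r", (int)(100.0 * i / n));
        reports++;
        /* the last block stops at n; earlier blocks stop at (block + 1) * x,
           which is below n and therefore cannot overflow */
        int limit = (block + 1 < blocks) ? (block + 1) * x : n;
        for (; i < limit; ++i)
            some_other_code(i);
    }
    return reports;
}

/* The original shape, for comparison. */
static int run_modulo(int n, int x)
{
    assert(x > 0 && n >= 0);
    int reports = 0;
    for (int i = 0; i < n; i++) {
        if (i % x == 0) {
            printf("%d%%\r", (int)(100.0 * i / n));
            reports++;
        }
        some_other_code(i);
    }
    return reports;
}
```

Both versions should report the same number of times and hand the body the same sequence of i values; only the inner loop's per-iteration work differs.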
I would do it like this:
int j = x;
for (int i = 0; i < some_large_number; i++){
if(--j == 0) {
printf("%f%%\r", percent);
j = x;
}
//some other code
/* ... */
}
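A sketch of the countdown idea, wrapped in a hypothetical run_countdown() that returns how many reports it printed. With int j = x as written above, the first report comes at i = x - 1 (then 2x - 1, and so on); starting the counter at 1 instead gives exactly the schedule of i % x == 0 (i = 0, x, 2x, ...).

```c
#include <assert.h>
#include <stdio.h>

/* Countdown variant: one decrement and one compare against zero per
   iteration, no division. Returns the number of progress reports. */
static int run_countdown(int n, int x)
{
    assert(x > 0 && n >= 0);
    int reports = 0;
    int j = 1;                  /* 1 => report on the very first iteration */
    for (int i = 0; i < n; i++) {
        if (--j == 0) {
            printf("%d%%\r", (int)(100.0 * i / n));
            reports++;
            j = x;              /* next report x iterations from now */
        }
        /* some other code */
    }
    return reports;
}
```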
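Divide some_large_number by x. Then run an outer loop x times, nest an inner loop that runs some_large_number/x times inside it, and print the percentage after each inner loop finishes. I meant this (it assumes some_large_number is a multiple of x; otherwise the last some_large_number % x iterations are skipped):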
int temp = some_large_number/x;
for (int i = 0; i < x; i++){
for (int j = 0; j < temp; j++){
//some code
}
printf("%f%%\r", percent);
}
The fastest approach regarding your performance concern would be to use a nested loop:
unsigned int x = 6;
unsigned int segments = some_large_number / x;
unsigned int y;
for ( unsigned int i = 0; i < segments; i++ ) {
printf("%f%%\r", percent);
for ( unsigned int j = 0; j < x; j++ ) {
/* some code here */
}
}
// If some_large_number can't be divided evenly by x:
if (( y = (some_large_number % x)) != 0 )
{
for ( unsigned int i = 0; i < y; i++ ) {
/* same code as inside of the former inner loop. */
}
}
Another option is to use a separate counter for the print check: compare it to x - 1 and, when it matches, print and reset it to -1 (the j++ in the loop header then brings it back to 0):
unsigned int x = 6;
unsigned int some_large_number = 100000000;
int j = 0; /* declared outside: a for-init clause can only declare variables of one type */
for ( unsigned int i = 0; i < some_large_number; i++, j++ ) {
if(j == (int)(x - 1))
{
printf("%f%%\r", percent);
j = -1;
}
/* some code here */
}
Is there a simple way to convert this
for (int i=0; i < 31; i++)
for (int j=0; j < 74; j++)
for (int k=1; k < 12; k++)
for (int l=13; l < 15; l++)
...
into something simpler, like this:
mfor (int start[]={0,0,1,13}; int max[]={31,74,12,15}) {
printf("%i %i\n", start[1], start[3]);
}
Is there a macro or some plugin-like mechanism for this?
This loop could iterate through a tensor (like an image) to do things like tensor convolution or pooling, in any number of dimensions (possibly more than 4).
Alternatively, how could I add such syntax to C? I already have an implementation of the mfor loop, since a for loop is really just a while loop.
There's no simple way, but you could implement an iterator-like type that generates the Cartesian product over n given ranges:
#include <stdio.h>   // printf (used in the example below)
#include <string.h>  // memcpy
enum {
MAX = 8
};
typedef struct Combinator Combinator;
struct Combinator {
size_t index; // running index of generated numbers
size_t n; // number of dimensions
int data[MAX]; // current combination
int start[MAX]; // lower and ...
int end[MAX]; // .. exclusive upper limits
};
/*
* Adds a dimension with valid range [start, end) to the combinator
*/
void combo_add(Combinator *c, int start, int end)
{
if (c->n < MAX && start < end) {
c->data[c->n] = start;
c->start[c->n] = start;
c->end[c->n] = end;
c->n++;
}
}
/*
* Reset the combinator to the lower limits
*/
void combo_reset(Combinator *c)
{
c->index = 0;
memcpy(c->data, c->start, sizeof(c->start));
}
/*
* Get the next combination in c->data. Returns 1 if there
* is a next combination, 0 otherwise.
*/
int combo_next(Combinator *c)
{
size_t i = 0;
if (c->index++ == 0) return 1;
do {
c->data[i]++;
if(c->data[i] < c->end[i]) return 1;
c->data[i] = c->start[i];
i++;
} while (i < c->n);
return 0;
}
This implements an odometer-like counter: it increments the first counter. If it overflows, it resets it and increments the next counter, moving on to the next counter as needed. If the last counter overflows, the generation of combinations stops. (There's a bit of a kludge with the index for the first combination so that you can control everything from the loop condition of a while loop. There's probably a more elegant way to solve this.)
Use the combinator like this:
Combinator combo = {0}; // Must initialize with zero
combo_add(&combo, 0, 3);
combo_add(&combo, 10, 12);
combo_add(&combo, 4, 7);
while (combo_next(&combo)) {
printf("%4zu: [%d, %d, %d]\n", combo.index,
combo.data[0], combo.data[1], combo.data[2]);
}
This combinator is designed to be used only once: create it, set up the ranges, then exhaust the combinations.
If you break out of the loop, the combinator retains its state, so further calls to combo_next continue where you left off. You can start afresh by calling combo_reset. (This is a bit like reading from a file: the usual way is to read everything, but you can rewind.)
I am trying to parallelize an encoding function. I tried adding a simple pragma around the for loop, but the result was wrong. I figured that the iterations are dependent (through the code variable) and therefore cannot be directly parallelized.
int encodePrimeFactorization(int number){
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--){
code = code * 2;
int f = prime_factors[i];
if (number % f == 0){
code = code + 1;
}
}
return code;
}
Is there a way to make code variable independent for each iteration?
Yes. At least for me, it's easier to think about that if you look at the algorithm this way:
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--) {
code = code << 1;
int f = prime_factors[i];
if (number % f == 0){
// The last bit of code is never set here,
// because it has been just shifted to the left
code = code | 1;
}
}
Now you can put the bit into its final position at the moment you set it (the bit set at iteration i would otherwise be shifted left i more times):
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--) {
int f = prime_factors[i];
if (number % f == 0){
code = code | (1 << i);
}
}
Now it becomes a trivial reduction:
int code = 0;
#pragma omp parallel for reduction(|:code)
for (int i=PF_NUMBER-1; i>=0 ; i--) {
int f = prime_factors[i];
if (number % f == 0){
code |= (1 << i);
}
}
That said, you will not get any performance gain. This works only for up to 31 factors (bits), which is far too little work to outweigh the overhead of parallelization. If this is a hot part of your code, you have to find something around it to parallelize.
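A small self-contained check that the bit-per-index version computes the same code as the original loop. PF_NUMBER and the contents of prime_factors are hypothetical, since the question does not show them. Compiled with OpenMP enabled (for example gcc -fopenmp) the reduction runs in parallel; without it the pragma is ignored and the loop runs serially, which makes it easy to test either way. Note that the OpenMP reduction clause is written reduction(operator:list), with a colon.

```c
#include <assert.h>

/* Hypothetical data: the question does not show prime_factors or PF_NUMBER. */
#define PF_NUMBER 5
static const int prime_factors[PF_NUMBER] = {2, 3, 5, 7, 11};

/* The question's version: each iteration depends on the previous code. */
static int encode_sequential(int number)
{
    int code = 0;
    for (int i = PF_NUMBER - 1; i >= 0; i--) {
        code = code * 2;
        if (number % prime_factors[i] == 0)
            code = code + 1;
    }
    return code;
}

/* The answer's version: iteration i only ever sets bit i, so iterations are
   independent and combine with a bitwise-OR reduction. */
static int encode_reduction(int number)
{
    assert(PF_NUMBER <= 31);   /* 1 << i must fit in an int */
    int code = 0;
    #pragma omp parallel for reduction(|:code)
    for (int i = PF_NUMBER - 1; i >= 0; i--) {
        if (number % prime_factors[i] == 0)
            code |= 1 << i;
    }
    return code;
}
```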
EDIT: Found a solution! Like the commenters suggested, using memset is an insanely better approach. Replace the entire for loop with
memset(lookup->n, -3, (dimensions*sizeof(signed char)));
where
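long int dimensions = box1 * box2 * box3 * box4 * box5 * box6 * box7 * box8 * memvara * memvarb * memvarc * memvard * adirect * tdirect * fs * bs * outputnum;
This works as long as lookup->n is one contiguous block of signed char: memset writes -3 (converted to unsigned char) into every byte, and with one-byte elements every byte is one element.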
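A sketch of the memset approach, assuming the whole 17-dimensional table is one contiguous allocation of signed char; alloc_table, reset_table and offset3 are hypothetical helpers. offset3 shows the row-major offset that an access like lookup->n[j][k][l] corresponds to in such a table; the same pattern extends to 17 indices. Because memset fills bytes, this gives -3 per element only for one-byte element types; for int it would not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: one contiguous allocation for the whole table. */
static signed char *alloc_table(size_t dimensions)
{
    assert(dimensions > 0);
    return malloc(dimensions * sizeof(signed char));
}

/* One memset replaces the 17 nested loops. memset stores (unsigned char)-3,
   i.e. 0xFD, in every byte; on two's-complement machines a one-byte
   signed char holding 0xFD reads back as -3. */
static void reset_table(signed char *table, size_t dimensions)
{
    memset(table, -3, dimensions * sizeof(signed char));
}

/* Row-major offset of element [j][k][l] in a box1 x box2 x box3 table. */
static size_t offset3(size_t j, size_t k, size_t l, size_t box2, size_t box3)
{
    return (j * box2 + k) * box3 + l;
}
```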
Intro
Right now, I'm looking at a beast of a for-loop:
for (j = 0;j < box1; j++)
{
for (k = 0; k < box2; k++)
{
for (l = 0; l < box3; l++)
{
for (m = 0; m < box4; m++)
{
for (x = 0;x < box5; x++)
{
for (y = 0; y < box6; y++)
{
for (xa = 0;xa < box7; xa++)
{
for (xb = 0; xb < box8; xb++)
{
for (nb = 0; nb < memvara; nb++)
{
for (na = 0; na < memvarb; na++)
{
for (nx = 0; nx < memvarc; nx++)
{
for (nx1 = 0; nx1 < memvard; nx1++)
{
for (naa = 0; naa < adirect; naa++)
{
for (nbb = 0; nbb < tdirect; nbb++)
{
for (ncc = 0; ncc < fs; ncc++)
{
for (ndd = 0; ndd < bs; ndd++)
{
for (o = 0; o < outputnum; o++)
{
lookup->n[j][k][l][m][x][y][xa][xb][nb][na][nx][nx1][naa][nbb][ncc][ndd][o] = -3; //set to default value
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
The Problem
This loop is called every cycle in the main run to reset values to an initial state. Unfortunately, it is necessary for the structure of the program that this many values are kept in a single data structure.
Here's the kicker: for every 60 seconds of program run time, 57 seconds goes to this function alone.
The Question
My question is this: would hash tables be an appropriate substitute for a linear array? This array has an O(n^17) cardinality, yet hash tables have an ideal of O(1).
If so, what hash library would you recommend? This program is in C and has no native hash support.
If not, what would you recommend instead?
Can you provide some pseudo-code on how you think this should be implemented?
Notes
OpenMP was used in an attempt to parallelize this loop. Numerous implementations only resulted in slightly-to-greatly increased run time.
Memory usage is not particularly an issue -- this program is intended to be run on an insanely high-spec'd computer.
We are student researchers, thrust into a heretofore unknown world of optimization and parallelization -- please bear with us, and thank you for any help.
Hash vs Array
As comments have specified, an array should not be a problem here. Lookup into an array with a known offset is O(1).
The Bottleneck
It seems to me that the bulk of the work here (and the reason it is slow) is the number of pointer de-references in the inner-loop.
To explain in a bit more detail, consider myData[x][y][z] in the following code:
for (int x = 0; x < someVal1; x++) {
for (int y = 0; y < someVal2; y++) {
for (int z = 0; z < someVal3; z++) {
myData[x][y][z] = -3; // x and y only change in outer-loops.
}
}
}
To compute the location for the -3, we do a lookup and add a value - once for myData[x], then again to get to myData[x][y], and once more finally for myData[x][y][z].
Since this lookup is in the inner-most portion of the loop, we have redundant reads. myData[x] and myData[x][y] are being recomputed, even when only z's value is changing. The lookups were performed during a previous iteration, but the results weren't stored.
For your loop, there are many layers of lookups being computed each iteration, even when only the value of o is changing in that inner-loop.
An Improvement for the Bottleneck
To do only one lookup per loop level per iteration, simply store the intermediate lookups. Using int* as the indirection (though any element type would work here), the sample code above (with myData) would become:
int **a, *b;
for (int x = 0; x < someVal1; x++) {
a = myData[x]; // Store the lookup.
for (int y = 0; y < someVal2; y++) {
b = a[y]; // Indirection based on the stored lookup.
for (int z = 0; z < someVal3; z++) {
b[z] = -3; // This can be extrapolated as needed to deeper levels.
}
}
}
This is just sample code; small adjustments may be necessary to get it to compile (casts and so forth). It assumes myData is an array of pointers to arrays of pointers (int ***), which is what the int **a, *b declarations imply. Note that there is probably no advantage to using this approach with a 3-dimensional array. However, for a 17-dimensional large data set with simple inner-loop operations (such as assignment), this approach should help quite a bit.
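A runnable sketch of the hoisted-lookup pattern on a jagged int *** array. alloc3, fill_hoisted and free3 are hypothetical helpers, and allocation failures are only checked with assert to keep the sketch short. For a 17-dimensional table the same idea means keeping one saved pointer per loop level.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical jagged 3-D array built from separate allocations (int ***).
   A sketch: real code should handle allocation failure properly. */
static int ***alloc3(int n1, int n2, int n3)
{
    int ***d = malloc(n1 * sizeof *d);
    assert(d != NULL);
    for (int x = 0; x < n1; x++) {
        d[x] = malloc(n2 * sizeof *d[x]);
        assert(d[x] != NULL);
        for (int y = 0; y < n2; y++) {
            d[x][y] = malloc(n3 * sizeof *d[x][y]);
            assert(d[x][y] != NULL);
        }
    }
    return d;
}

/* Each level loads its pointer once instead of redoing myData[x][y] per z. */
static void fill_hoisted(int ***myData, int n1, int n2, int n3, int value)
{
    for (int x = 0; x < n1; x++) {
        int **a = myData[x];            /* one load per x */
        for (int y = 0; y < n2; y++) {
            int *b = a[y];              /* one load per (x, y) */
            for (int z = 0; z < n3; z++)
                b[z] = value;           /* innermost loop touches only b */
        }
    }
}

static void free3(int ***d, int n1, int n2)
{
    for (int x = 0; x < n1; x++) {
        for (int y = 0; y < n2; y++)
            free(d[x][y]);
        free(d[x]);
    }
    free(d);
}
```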
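Finally, I'm assuming you aren't actually just assigning the value of -3. If you are, you can use memset to accomplish that goal much more efficiently (memset sets bytes, so this gives -3 per element only because the elements here are signed char).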
I saw an interview question which asked to
Interchange arr[i] and i for i=[0,n-1]
EXAMPLE :
input:  1 2 4 5 3 0
answer: 5 0 1 4 2 3
explanation: a[1]=2 in the input, so a[2]=1 in the answer, and so on
I attempted this but am not getting the correct answer.
What I am able to handle is a pair of numbers p and q with a[p]=q and a[q]=p.
Any thoughts on how to improve it are welcome.
FOR(j,0,n-1)
{
i=j;
do{
temp=a[i];
next=a[temp];
a[temp]=i;
i=next;
}while(i>j);
}
print_array(a,i,n);
It would be easier for me to understand your answer if it contained pseudocode with some explanation.
EDIT: I came to know it is a cyclic permutation, so I changed the question title.
Below is what I came up with (Java code).
For each index i that hasn't been processed yet, it follows the cycle i -> a[i] -> a[a[i]] -> ... and writes into each element the index it was reached from (so a[a[i]] becomes i), until it gets back to i.
I use negative values as a flag to indicate that a value has already been processed; a value v is stored as -v-1 so that 0 can be flagged too.
Running time:
Since it only processes each value once, the running time is O(n).
Code:
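import java.util.Arrays;
public class InversePermutation {
public static void main(String[] args) {
int[] a = {1,2,4,5,3,0};
for (int i = 0; i < a.length; i++)
{
if (a[i] < 0) // already processed as part of an earlier cycle
continue;
int j = a[i];
int last = i;
do
{
int temp = a[j]; // remember where the cycle goes next
a[j] = -last-1; // a[j] := index that led to j, flagged as negative
last = j;
j = temp;
}
while (i != j);
a[j] = -last-1; // close the cycle at its starting index
}
for (int i = 0; i < a.length; i++)
a[i] = -a[i]-1; // undo the -v-1 encoding
System.out.println(Arrays.toString(a)); // [5, 0, 1, 4, 2, 3]
}
}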
Here's my suggestion, O(n) time, O(1) space:
void OrderArray(int[] A)
{
int X = A.Max() + 1; // A.Max() needs "using System.Linq;"
for (int i = 0; i < A.Length; i++)
A[i] *= X;
for (int i = 0; i < A.Length; i++)
A[A[i] / X] += i;
for (int i = 0; i < A.Length; i++)
A[i] = A[i] % X;
}
A short explanation:
We use X as the basic unit for values from the original array: we multiply each value in the original array by X, which is larger than any number in A (for a permutation of 0..n-1 it is simply the length of A). So at any point we can retrieve the number that was in a given cell of the original array by computing A[i] / X, as long as we add less than X to that cell.
This gives us two layers of values, where A[i] % X holds the value of the cell after the reordering; the two layers don't interfere with each other during the process. (This requires the largest intermediate value, X * X - 1, to fit in an int.)
When we are finished, we clear the original values multiplied by X from A by performing A[i] = A[i] % X.
Hope that's clear enough.
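For comparison, a C port of the same two-layer idea. order_array is a hypothetical name; the sketch assumes the input is a permutation of 0..n-1 and that X * X - 1 fits in an int, which the assert checks.

```c
#include <assert.h>
#include <limits.h>

/* C port of the multiply-and-mod trick: each cell temporarily holds
   original * X + result, then the original layer is discarded. */
static void order_array(int *a, int n)
{
    assert(n > 0);
    int X = a[0];
    for (int i = 1; i < n; i++)      /* X = A.Max() + 1 */
        if (a[i] > X) X = a[i];
    X += 1;
    assert((long long)X * X - 1 <= INT_MAX);

    for (int i = 0; i < n; i++)
        a[i] *= X;                   /* original values move to the upper layer */
    for (int i = 0; i < n; i++)
        a[a[i] / X] += i;            /* a[i] / X is still the original a[i] */
    for (int i = 0; i < n; i++)
        a[i] %= X;                   /* keep only the lower layer */
}
```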
If a separate output array is allowed, you can use the images of the input permutation (the values input[i]) as indices:
void inverse( unsigned int* input, unsigned int* output, unsigned int n )
{
for ( unsigned int i = 0; i < n; i++ )
output[ input[ i ] ] = i;
}
I have loops nested (r =) 3 times, with each loop running (n =) 5 times.
for (i=0; i<n; i++)
{
for (j=0; j<n; j++)
{
for (k=0; k<n; k++)
{
// loop body
}
}
}
But how can we do the nesting dynamically at run time? Say we know it should be nested r times, with each loop running n times. I thought of something like recursion, but it recurses indefinitely.
funloop (int r)
{
for (int i = 0; i < n; i++)
{
//
if (r < 3)
funloop (r++);
else
return;
}
}
Please let me know how this could be done. I couldn't find any sources online.
If you don't know the nesting depth statically, the most common approach is to use recursion to represent the looping. For example, suppose that you need to have d levels of nesting of loops that all need to iterate k times. Then you could implement that using recursion of this form:
void RecursivelyNestIterations(unsigned d, unsigned k) {
/* Base case: If the depth is zero, we don't need to iterate. */
if (d == 0) return;
/* Recursive step: If we need to loop d times, loop once, calling the
* function recursively to have it loop d - 1 times.
*/
for (unsigned i = 0; i < k; ++i) {
/* Recurse by looping d - 1 times using the same number of iterations. */
RecursivelyNestIterations(d - 1, k);
}
}
Hope this helps!
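If the loop body needs the counters, one option is to keep them in an array that the recursion fills in level by level. nest, loop_body and tally are hypothetical names for this sketch; unlike the base case above, reaching depth d here runs the body once, so d = 0 means a single body execution with no loops around it.

```c
#include <assert.h>

/* Hypothetical callback standing in for the innermost loop body; counters[0]
   is the outermost loop's counter and counters[depth - 1] the innermost. */
typedef void (*loop_body)(const unsigned *counters, unsigned depth, void *ctx);

/* level is the loop being run; once all d loops are active, run the body. */
static void nest(unsigned level, unsigned d, unsigned k, unsigned *counters,
                 loop_body body, void *ctx)
{
    assert(level <= d);
    if (level == d) {
        body(counters, d, ctx);
        return;
    }
    for (counters[level] = 0; counters[level] < k; ++counters[level])
        nest(level + 1, d, k, counters, body, ctx);
}

/* Example body: counts how often it runs and remembers the last counters. */
struct tally { unsigned long calls; unsigned last[8]; };

static void tally_body(const unsigned *counters, unsigned depth, void *ctx)
{
    struct tally *t = ctx;
    t->calls++;
    for (unsigned i = 0; i < depth && i < 8; i++)
        t->last[i] = counters[i];
}
```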
The simplest method is just to collapse it to one for loop:
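long total = 1;
for (int k = 0; k < r; k++) total *= n; /* n^r, computed once in integer arithmetic */
for (long i = 0; i < total; i++) {
}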
That can, however, make it difficult to access the loop counters if you need them, but they can be computed mathematically. For example, the innermost loop counter's value is given by:
int c = i % n;
You could have an array of such counters and determine the values with similar equations, or you can just increment them when required, e.g.:
#include <stdlib.h>
void iterate(int r, int n) {
long i, total = 1;
int rc, *c = calloc(r > 0 ? r : 1, sizeof(int)); // all counters start at 0
if (c == NULL) return;
for (rc = 0; rc < r; rc++)
total *= n; // n^r in integer arithmetic, computed once (pow() returns a double)
for(i = 0; i < total; i++) {
// code here, using loop counters in the 'c' array, where c[0] is counter
// for the outer loop, and c[r - 1] is the counter for the innermost loop
// update the counters
rc = r;
while(rc > 0) {
rc--;
c[rc]++;
if(c[rc] == n) {
c[rc] = 0;
} else {
break;
}
}
}
free(c);
}
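A sketch of computing all counters directly from the flat index, following the "similar equations" idea: decode_counters (a hypothetical helper) reads i as an r-digit number in base n. It costs r divisions per iteration, which is why incrementing the counters as above is usually cheaper when they are needed on every iteration.

```c
#include <assert.h>

/* Recover all r loop counters from the flat index i of the collapsed loop:
   the innermost counter c[r - 1] is i % n, c[r - 2] is (i / n) % n, and so
   on out to the outermost counter c[0]. */
static void decode_counters(long i, int r, int n, int *c)
{
    assert(n > 0 && i >= 0);
    for (int level = r - 1; level >= 0; level--) {
        c[level] = (int)(i % n);
        i /= n;
    }
}
```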
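Just call if (r) funloop(r-1); in the loop body. The original funloop(r++) passes the current value of r (the increment happens after the argument is evaluated), so every call gets the same r and the recursion never ends. With this fix, funloop(r) nests r + 1 loops, so call funloop(r - 1) to get r levels.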
#include <stdlib.h>
#include <stdio.h>
static int n = 3;
void _funloop(int cur,int total)
{
if(cur!=total)
{
for(int cnt=0;cnt!=n;++cnt)
{
fprintf(stdout,"%d::%d\n",cur,cnt);
// recurse from inside the loop body, so level cur+1 is nested in level cur
_funloop(cur+1,total);
}
}
}
void funloop(int total)
{
_funloop(0,total);
}
int main()
{
funloop(10);
return 0;
}
A solution that doesn't use recursion is discussed in this tip:
http://www.codeproject.com/Tips/759707/Generating-dynamically-nested-loops
The code is in C++:
#include <iostream>
#define MAXROWS 9
#define MAXVALUES 9
using namespace std;
char display[] = {'1','2','3','4','5','6','7','8','9'};
int main() {
int arrs[MAXROWS]; // represent the different variables in the for loops
bool status = false;
for (int r=0;r<MAXROWS;r++)
arrs[r] = 0; // Initialize values
while (!status) {
int total = 0;
// calculate total for exit condition
for (int r=0;r<MAXROWS;r++)
total +=arrs[r];
// test for exit condition
if (total == (MAXVALUES-1)*MAXROWS)
status = true;
// printing
for (int r=0;r<MAXROWS;r++)
cout << display[arrs[r]]; // print(arrs[r])
cout << endl; // print(endline)
// increment loop variables
bool change = true;
int r = MAXROWS-1; // start from innermost loop
while (change && r>=0) {
// increment the innermost variable and check if spill overs
if (++arrs[r] > MAXVALUES-1) {
arrs[r] = 0; // reinitialize loop variable
// Change the upper variable by one
// We need to increment the immediate upper level loop by one
change = true;
}
else
change = false; // Stop, as the upper levels of the loop are unaffected
// We can perform any inner loop calculation here arrs[r]
r=r-1; // move to upper level of the loop
}
}
return 0;
}