The following snippet is from the water-nsq benchmark in SPLASH-2:
if (comp_last > NMOL1)
{
for (mol = StartMol[ProcID]; mol < NMOL; mol++)
{
pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
for ( dir = XDIR; dir <= ZDIR; dir++) {
temp_p = VAR[mol].F[DEST][dir];
temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
temp_p[O] += PFORCES[ProcID][mol][dir][O];
temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
}
pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
}
comp = comp_last % NMOL;
for (mol = 0; ((mol <= comp) && (mol < StartMol[ProcID])); mol++)
{
pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
for ( dir = XDIR; dir <= ZDIR; dir++)
{
temp_p = VAR[mol].F[DEST][dir];
temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
temp_p[O] += PFORCES[ProcID][mol][dir][O];
temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
}
pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
}
}
else
{
for (mol = StartMol[ProcID]; mol <= comp_last; mol++)
{
pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
for ( dir = XDIR; dir <= ZDIR; dir++)
{
temp_p = VAR[mol].F[DEST][dir];
temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
temp_p[O] += PFORCES[ProcID][mol][dir][O];
temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
}
pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
}
}
pthread_barrier_wait(&(gl->start));
The problem is that the result at the final barrier is not deterministic: if you run this code twice with the same inputs, it gives different answers. In other words, if the order in which the mutexes are acquired changes, the results change.
And yes, I have verified this by inspecting the memory pages. I can also confirm that the difference occurs in the memory of VAR (pointed to by temp_p).
I want to know why. Each thread only adds its own values (PFORCES[ProcID]...) to the sum in temp_p, so by the time the barrier is reached the results should be the same, no matter the order in which the threads acquired the locks.
[EDITED]
Also, please note that variables comp, dir and mol are all local variables of the thread and therefore not shared.
Second try.
I can't check it, but I assume that in temp_p[H1] += PFORCES[ProcID][mol][dir][H1]; you are adding doubles or floats.
For floating point types, the order of addition matters! Floating point addition is not associative!
A different thread order means a different addition order. So changes in the outcome are to be expected.
See http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems for some explanation.
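To make the point concrete, here is a minimal sketch (not taken from water-nsq) that sums the same values in two different orders. The function name sum_in_order and the sample values are my own assumptions, chosen so that the rounding difference is large enough to see.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds values[] in the sequence given by order[],
 * which is assumed to be a permutation of 0..n-1. This mimics threads
 * adding their contributions in whatever order they got the lock. */
double sum_in_order(const double *values, const size_t *order, size_t n)
{
    assert(values != NULL && order != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += values[order[i]];
    }
    return total;
}
```

With values {1e20, -1e20, 1.0}, the order 0,1,2 yields 1.0, while the order 0,2,1 yields 0.0, because 1e20 + 1.0 rounds back to 1e20 before the cancellation happens. The force contributions in the benchmark are far closer in magnitude, so the differences there would typically show up only in the last few bits.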
I notice that you do not show the declaration of the loop variables, like mol and dir.
Could it be that they are accidentally shared between threads?
If so, all kinds of race conditions between, e.g., one thread's mol++ and another thread's [mol % MAXLCKS] will cause problems.
UPDATE: According to the comments below, this does not seem to be the case.
This is a tough one, at least for my minimal C skills.
Basically, the user enters a list of prices into an array, and then the desired number of items he wants to purchase, and finally a maximum cost not to exceed.
I need to check how many combinations of the desired number of items are less than or equal to the cost given.
If the problem was a fixed number of items in the combination, say 3, it would be much easier with just three loops selecting each price and adding them to test.
Where I get stumped is the requirement that the user enter any number of items, up to the number of items in the array.
This is what I decided on at first, before realizing that the user could specify combinations of any number, not just three. It was created with help from a similar topic on here, but again it only works if the user specifies he wants 3 items per combination. Otherwise it doesn't work.
// test if any combinations of items can be made
for (one = 0; one < (count-2); one++) // count -2 to account for the two other variables
{
for (two = one + 1; two < (count-1); two++) // count -1 to account for the last variable
{
for (three = two + 1; three < count; three++)
{
total = itemCosts[one] + itemCosts[two] + itemCosts[three];
if (total <= funds)
{
// DEBUG printf("\nMatch found! %d + %d + %d, total: %d.", itemCosts[one], itemCosts[two], itemCosts[three], total);
combos++;
}
}
}
}
As far as I can tell there's no easy way to adapt this to be flexible based on the user's desired number of items per combination.
I would really appreciate any help given.
One trick to flattening nested iterations is to use recursion.
Make a function that takes an array of items that you have selected so far, and the number of items you've picked up to this point. The algorithm should go like this:
If you have picked the number of items equal to your target of N, compute the sum and check it against the limit
If you have not picked enough items, add one more item to your list, and make a recursive call.
To ensure that you do not pick the same item twice, pass the smallest index from which the function may pick. The declaration of the function may look like this:
int count_combinations(
int itemCosts[]
, size_t costCount
, int pickedItems[]
, size_t pickedCount
, size_t pickedTargetCount
, size_t minIndex
, int funds
) {
if (pickedCount == pickedTargetCount) {
// This is the base case. It has the code similar to
// the "if" statement from your code, but the number of items
// is not fixed.
int sum = 0;
for (size_t i = 0 ; i != pickedCount ; i++) {
sum += pickedItems[i];
}
// The following line will return 0 or 1,
// depending on the result of the comparison.
return sum <= funds;
} else {
// This is the recursive case. It is similar to one of your "for"
// loops, but instead of setting "one", "two", or "three"
// it sets pickedItems[0], pickedItems[1], etc.
int res = 0;
for (size_t i = minIndex ; i != costCount ; i++) {
pickedItems[pickedCount] = itemCosts[i];
res += count_combinations(
itemCosts
, costCount
, pickedItems
, pickedCount+1
, pickedTargetCount
, i+1
, funds
);
}
return res;
}
}
You call this function like this:
int itemCosts[C] = {...}; // The costs
int pickedItems[N]; // No need to initialize this array
int res = count_combinations(itemCosts, C, pickedItems, 0, N, 0, funds);
This can be done with a backtracking algorithm, which is equivalent to implementing a list of nested for loops. It is easier to understand if you look at the execution pattern of a sequence of nested for loops.
For example, let's say you have, as you presented, a sequence of 3 for loops and execution has reached the third (innermost) level. After it goes through all its iterations, you return to the second-level for loop, which moves to its next iteration and enters the third level again. Similarly, when the second level finishes all its iterations, you jump back to the first-level for loop, which continues with its next iteration, entering the second level and from there the third.
So, at a given level you try to go to the deeper one (if there is one), and if there are no more iterations you go back a level (backtrack).
With backtracking you represent the nested for loops by an array where each element is an index variable: array[0] is the index for the level 0 for loop, and so on.
Here is a sample implementation for your problem:
#define NUMBER_OF_OBJECTS 10
#define FORLOOP_DEPTH 4 // This is equivalent to the number
// of nested fors; in the problem it is
// the number of requested objects
#define FORLOOP_ARRAY_INIT -1 // This is a init value for each "forloop" variable
#define true 1
#define false 0
typedef int bool;
int main(void)
{
int object_prices[NUMBER_OF_OBJECTS];
int forLoopsArray[FORLOOP_DEPTH];
bool isLoopVariableValueUsed[NUMBER_OF_OBJECTS];
int forLoopLevel = 0;
for (int i = 0; i < FORLOOP_DEPTH; i++)
{
forLoopsArray[i] = FORLOOP_ARRAY_INIT;
}
for (int i = 0; i < NUMBER_OF_OBJECTS; i++)
{
isLoopVariableValueUsed[i] = false;
}
forLoopLevel = 0; // Start from level zero
while (forLoopLevel >= 0)
{
bool isOkVal = false;
if (forLoopsArray[forLoopLevel] != FORLOOP_ARRAY_INIT)
{
// We'll mark the loop variable value from the last iteration as unused
// since we'll use a new one (in this iteration)
isLoopVariableValueUsed[forLoopsArray[forLoopLevel]] = false;
}
/* All iterations (in all levels) start basically from zero
* Because of that here I check that the loop variable for this level
* is different than the previous ones or try the next value otherwise
*/
while ( isOkVal == false
&& forLoopsArray[forLoopLevel] < (NUMBER_OF_OBJECTS - 1))
{
forLoopsArray[forLoopLevel]++; // Try a new value
if (isLoopVariableValueUsed[forLoopsArray[forLoopLevel]] == false)
{
isLoopVariableValueUsed[forLoopsArray[forLoopLevel]] = true;
isOkVal = true;
}
}
if (isOkVal == true) // Have we found a different item at this level?
{
if (forLoopLevel == FORLOOP_DEPTH - 1) // Is it the innermost?
{
/* Here is the innermost level where you can test
* if the sum of all selected items is smaller than
* the target
*/
}
else // Nope, go a level deeper
{
forLoopLevel++;
}
}
else // We've run out of values in this level, go back
{
forLoopsArray[forLoopLevel] = FORLOOP_ARRAY_INIT;
forLoopLevel--;
}
}
}
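As a compact companion to the idea of driving the "nested loops" from an index array, here is a sketch that fills in the innermost test and counts the affordable combinations. It is my own variant and differs from the sample above: every level starts just past the previous level's index, as in the question's three loops, so each set of items is visited once. The names count_affordable and MAX_PICK are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PICK 32  /* assumed upper bound on items per combination */

/* Counts the k-item combinations of prices[0..n-1] whose total is
 * <= funds. idx[level] plays the role of the loop variable at that
 * nesting level. Returns 0 when k is 0, larger than n, or > MAX_PICK. */
int count_affordable(const int *prices, size_t n, size_t k, int funds)
{
    assert(prices != NULL);
    size_t idx[MAX_PICK];
    int count = 0;

    if (k == 0 || k > n || k > MAX_PICK) {
        return 0;
    }
    for (size_t i = 0; i < k; i++) {
        idx[i] = i;                      /* first combination: 0,1,...,k-1 */
    }
    for (;;) {
        long sum = 0;
        for (size_t i = 0; i < k; i++) {
            sum += prices[idx[i]];
        }
        if (sum <= funds) {
            count++;
        }
        /* Backtrack: find the deepest level that can still advance. */
        size_t level = k;
        while (level > 0 && idx[level - 1] == n - k + (level - 1)) {
            level--;
        }
        if (level == 0) {
            break;                       /* every level is exhausted */
        }
        idx[level - 1]++;
        for (size_t i = level; i < k; i++) {
            idx[i] = idx[i - 1] + 1;     /* restart the deeper levels */
        }
    }
    return count;
}
```

For prices {1, 2, 3, 4}, count_affordable(prices, 4, 2, 4) returns 2 (the pairs 1+2 and 1+3).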
Is there a way to continue the most outer loop from the most nested one in ABAP?
Example in Java. There is a construct in this language using labels (most people do not know of it anyway) which allows me to continue the most outer loop from the nested one.
public class NestedLoopContinue {
public static void main(String[] args) {
label1: for (int i = 0; i < 5; i++) {
for (int j = 0; j < 2; j++) {
if (i == 3) {
continue label1;
}
}
System.out.println(i + 1);
}
}
}
This outputs
1
2
3
5
Now, how can I do it in ABAP in a smart way? One solution would be to use a TRY. ENDTRY. block, but that is rather a hack. Any other ideas?
DATA: l_outer_counter TYPE i.
DO 5 TIMES.
l_outer_counter = sy-index.
TRY.
DO 2 TIMES.
IF l_outer_counter = 4.
RAISE EXCEPTION TYPE cx_abap_random.
ENDIF.
ENDDO.
WRITE / l_outer_counter.
CATCH cx_abap_random.
CONTINUE.
ENDTRY.
ENDDO.
Or maybe there is a way to tell whether the DO. ENDDO. ended with an EXIT statement (without introducing my own variable of course, something like a SYST global variable)?
DATA: l_outer_counter TYPE i.
DO 5 TIMES.
l_outer_counter = sy-index.
DO 2 TIMES.
IF l_outer_counter = 4.
EXIT.
ENDIF.
ENDDO.
IF sy-last_loop_ended_with_exit = abap_true. "???
CONTINUE.
ENDIF.
WRITE / l_outer_counter.
ENDDO.
I don't know of an ABAP-specific solution, but I've used a general programming solution to handle this before; simply use a boolean and check at the end of the inner loop whether or not to continue.
In Java:
public class NestedLoopContinue
{
public static void main(String[] args)
{
for (int i = 0; i < 5; i++)
{
boolean earlyBreak = false;
for (int j = 0; j < 2; j++)
{
if (i == 3)
{
earlyBreak = true;
break;
}
}
if (earlyBreak)
{
continue;
}
System.out.println(i + 1);
}
}
}
And in ABAP:
DATA: l_outer_counter type i,
early_break type FLAG.
DO 5 TIMES.
l_outer_counter = sy-index.
DO 2 TIMES.
IF l_outer_counter = 4.
early_break = ABAP_TRUE.
EXIT.
ENDIF.
ENDDO.
IF early_break = ABAP_TRUE.
CLEAR early_break.
CONTINUE.
ENDIF.
WRITE / l_outer_counter.
ENDDO.
I've read that the reason label-based breaks exist in Java in the first place is because GOTO statements explicitly do not, and the case covered by label-based break was one of the few "good" uses of GOTO that the team wanted to maintain.
In general, though, this is a very awkward construction. Is there no potential way to refactor your code (perhaps swapping the inner-ness of the loops) to remove the need for this in the first place?
When working with nested loops, I often find the best way to improve readability, and avoid using more unusual approaches (such as breaking to a label, which is not only controversial because of its goto-like nature, but also reduces readability because a lot of people are not familiar with them) is to extract the inner loop into a separate function. I do not know how this is done in ABAP, but the refactored Java equivalent would be:
public class NestedLoopContinue {
public static void main(String[] args) {
for (int i = 0; i < 5; i++) {
NestedLoopContinue.innerLoop(i);
}
}
static void innerLoop(int i) {
for (int j = 0; j < 2; j++) {
if (i == 3) {
return;
}
}
System.out.println(i + 1);
}
}
I would argue that in this example, this actually becomes less readable because it is harder to follow the logic across the two methods. However, if this was a real-world example (where the methods and variables had some actual meanings and appropriate names to go with them), then the result of extracting the inner loop into a separate method would be more readable than using a label.
Based on robjohncox's answer, the ABAP code might look like this.
CLASS lcl_nested_loop_continue DEFINITION FINAL.
PUBLIC SECTION.
CLASS-METHODS:
main.
PRIVATE SECTION.
CLASS-METHODS:
inner_loop
IMPORTING
i_index TYPE i.
ENDCLASS.
CLASS lcl_nested_loop_continue IMPLEMENTATION.
METHOD main.
DO 5 TIMES.
lcl_nested_loop_continue=>inner_loop( sy-index ).
ENDDO.
ENDMETHOD.
METHOD inner_loop.
DO 2 TIMES.
IF i_index = 4.
RETURN.
ENDIF.
ENDDO.
WRITE / i_index.
ENDMETHOD.
ENDCLASS.
I'm quite new to programming and have recently started in Processing.
In my code, the collide function sets the touch boolean to true, but when I call it in a loop over an array, it only tests true for the final array element and not the ones before it. Where am I going wrong here? I hope my question is clear enough.
edit:
Sorry, let me try again.
I guess my problem is finding out how to apply the collide function to an array properly. I can't seem to add an [i] for the collide in the array.
At the moment, the code works, but it only tests true for the last array element and not for the ones before it.
The array code:
for(int i = 0 ; i < lineDiv; i++){
collide(xPts[i], yPts[i], vecPoints.xPos, vecPoints.yPos, myDeflector.Thk, vecPoints.d);
}
The collide function:
void collide(float pt1x, float pt1y, float pt2x, float pt2y, int size1, int size2){
if (pt1x + size1/2 >= pt2x - size2/2 &&
pt1x - size1/2 <= pt2x + size2/2 &&
pt1y + size1/2 >= pt2y - size2/2 &&
pt1y - size1/2 <= pt2y + size2/2) {
touch = true;
}
else{
touch=false;
}
}
Your "touch" variable is global. Every time you call the collide() function, it overwrites whatever it was set to before. Perhaps you just want to test if touch is true after calling collide(), then exit the for loop?
Alternatively, you may want to make collide() return the touch boolean, avoiding the global.
It looks like what you want to do is run through a loop, run the function on each element of the array, and return a value if any of them are true. This is my best guess; you might want to edit your question to clarify what you are looking to do. So assuming this:
1) change your method to a function
boolean collide(float pt1x, float pt1y, float pt2x, float pt2y, int size1, int size2){
if (pt1x + size1/2 >= pt2x - size2/2 &&
pt1x - size1/2 <= pt2x + size2/2 &&
pt1y + size1/2 >= pt2y - size2/2 &&
pt1y - size1/2 <= pt2y + size2/2) {
return true;
}
else{
return false;
}
}
2) change your loop and how you are calling it
touch = false; // if you don't set this to false before the loop, it will be the last value taken
for(int i = 0 ; i < lineDiv; i++){
if (collide(xPts[i], yPts[i], vecPoints.xPos, vecPoints.yPos, myDeflector.Thk, vecPoints.d)) touch = true;
}
Before, touch could flip between true and false as you iterated through the array. In Processing (where you would likely be drawing from this data) that is unlikely to be the behavior you want, because you couldn't do anything with the intermediate values unless you stored them in another structure such as an array.
So now, "touch" is set to false and changes to true if any function call returns true. If all are false, it stays false.
note: you might consider using either xPts.length or yPts.length instead of lineDiv. This would reduce the possibility of an array out of bounds exception, assuming xPts and yPts have the same number of elements.
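The first answer's suggestion of leaving the loop as soon as one hit is found can be sketched in plain C as follows. This is not Processing code, and the names boxes_overlap and any_overlap are hypothetical. It also takes the sizes as floats, so the half-sizes are not truncated the way the integer division size1/2 in the question would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bounding-box test as collide(), but returning the result
 * instead of writing to a global. */
bool boxes_overlap(float x1, float y1, float x2, float y2,
                   float size1, float size2)
{
    float h1 = size1 / 2.0f;
    float h2 = size2 / 2.0f;
    return x1 + h1 >= x2 - h2 && x1 - h1 <= x2 + h2 &&
           y1 + h1 >= y2 - h2 && y1 - h1 <= y2 + h2;
}

/* Returns true as soon as any point (xs[i], ys[i]) overlaps the box
 * centred on (px, py); later points are not examined. */
bool any_overlap(const float *xs, const float *ys, size_t n,
                 float px, float py, float size1, float size2)
{
    assert(xs != NULL && ys != NULL);
    for (size_t i = 0; i < n; i++) {
        if (boxes_overlap(xs[i], ys[i], px, py, size1, size2)) {
            return true;   /* early exit: one hit is enough */
        }
    }
    return false;
}
```

Because the function returns on the first hit, a later miss can no longer overwrite an earlier hit, which was the original symptom.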
I'm very new to DSP and have to solve the following problem: applying a low-shelving filter to an array of data. The original data is stored as fract16 (VisualDSP++).
I've written the code below, but I'm not sure whether it's correct.
1) Does the following code have any problem with overflow?
2) If 1 is true, what should I do to prevent it?
3) Any advice on this problem?
fract16 org_data[256]; //original data
float16 ArrayA[],ArrayB[];
long tmp_A0, tmp_A1, tmp_A2, tmp_B1, tmp_B2;
float filter_paraA[3], filter_paraB[3]; // correctness: 0.xxxxx
// For equalizing
// Low-Shelving filter
for ( i=0; i<2; i++)
{
tmp_A1 = ArrayA[i*2];
tmp_A2 = ArrayA[i*2+1];
tmp_B1 = ArrayB[i*2];
tmp_B2 = ArrayB[i*2+1];
for(j=0;j<256;j++){
tmp_A0 = org_data[j];
org_data[j] = filter_paraA[0] * tmp_A0
+ filter_paraA[1] * tmp_A1
+ filter_paraA[2] * tmp_A2
- filter_paraB[1] * tmp_B1
- filter_paraB[2] * tmp_B2;
tmp_A2 = tmp_A1;
tmp_B2 = tmp_B1;
tmp_A1 = tmp_A0;
tmp_B1 = org_data[j];
}
ArrayA[i*2] = tmp_A1;
ArrayA[i*2+1] = tmp_A2;
ArrayB[i*2] = tmp_B1;
ArrayB[i*2+1] = tmp_B2;
}
I don't know what the range is for fract16, just -1 to +1 approx?
The part that stands out to me as possibly generating an overflow is the assignment to org_data[j], but whether it does depends on what you know about your input signal and your filter coefficients. If you can ensure that applying filter_paraA[2:0] to a signal with tmp_A2..tmp_A0 = [1,1,1] stays below max(fract16), you should be fine regardless of the 'B' side.
I would recommend adding some overflow checks to your code. They don't necessarily have to fix the overflow, but they would let you identify an otherwise very tricky bug. Unless you need absolute maximum performance, I would even leave the check code in place, but with less output or just setting a flag that gets checked later.
// macA: feed-forward terms; macB: feedback terms that the filter subtracts
macA = filter_paraA[0] * tmp_A0 + filter_paraA[1] * tmp_A1
+ filter_paraA[2] * tmp_A2;
macB = filter_paraB[1] * tmp_B1 + filter_paraB[2] * tmp_B2;
if((macA-macB)>1){
printf("ERROR! Overflow detected!\n");
// the tmp_* variables are declared long in the question, hence %ld
printf("tmp_A[] = [%ld, %ld, %ld]\n",tmp_A2,tmp_A1,tmp_A0);
printf("tmp_B[] = [%ld, %ld]\n",tmp_B2,tmp_B1);
printf(" i = %i, j = %i\n",i,j);
}
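If you would rather clip than just report, one common approach is to saturate the wide intermediate result before storing it. The sketch below assumes that a fract16 sample is held as a signed 16-bit integer in 1.15 format, so the raw range is -32768..32767. It uses only standard C; saturate_to_q15 is a hypothetical name, not a VisualDSP++ function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamps a wide accumulator to the signed 16-bit range and reports
 * through *overflowed (set to 1 or 0) whether clamping was needed.
 * Assumption: the accumulator is already scaled to raw 1.15 units. */
int16_t saturate_to_q15(long acc, int *overflowed)
{
    assert(overflowed != NULL);
    *overflowed = 0;
    if (acc > INT16_MAX) {
        *overflowed = 1;
        return INT16_MAX;
    }
    if (acc < INT16_MIN) {
        *overflowed = 1;
        return INT16_MIN;
    }
    return (int16_t)acc;
}
```

The flag can be OR-ed into a per-block status word, so the check stays cheap and is reported once per buffer rather than once per sample.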
When I remove the tests that compute the minimum and maximum from the loop, the execution time is actually longer than with the tests. How is that possible?
Edit:
After running more tests, it seems the runtime is not constant, i.e. the same code can run in 9 sec or 13 sec. So it was just a repeatable coincidence. Repeatable until you do enough tests, that is...
Some details:
execution time with the min max test: 9 sec
execution time without the min max test: 13 sec
CFLAGS=-Wall -O2 -fPIC -g
gcc 4.4.3 32 bit
Section to remove is now indicated in code
Some guesses:
bad cache interaction?
void FillFullValues(void)
{
int i,j,k;
double X,Y,Z;
double p,q,r,p1,q1,r1;
double Ls,as,bs;
unsigned long t1, t2;
t1 = GET_TICK_COUNT();
MinLs = Minas = Minbs = 1000000.0;
MaxLs = Maxas = Maxbs = 0.0;
for (i=0;i<256;i++)
{
for (j=0;j<256;j++)
{
for (k=0;k<256;k++)
{
X = 0.4124*CielabValues[i] + 0.3576*CielabValues[j] + 0.1805*CielabValues[k];
Y = 0.2126*CielabValues[i] + 0.7152*CielabValues[j] + 0.0722*CielabValues[k];
Z = 0.0193*CielabValues[i] + 0.1192*CielabValues[j] + 0.9505*CielabValues[k];
p = X * InvXn;
q = Y;
r = Z * InvZn;
if (q>0.008856)
{
Ls = 116*pow(q,third)-16;
}
else
{
Ls = 903.3*q;
}
if (q<=0.008856)
{
q1 = 7.787*q+seiz;
}
else
{
q1 = pow(q,third);
}
if (p<=0.008856)
{
p1 = 7.787*p+seiz;
}
else
{
p1 = pow(p,third);
}
if (r<=0.008856)
{
r1 = 7.787*r+seiz;
}
else
{
r1 = pow(r,third);
}
as = 500*(p1-q1);
bs = 200*(q1-r1);
//
// cast on short int for reducing array size
//
FullValuesLs[i][j][k] = (char) (Ls);
FullValuesas[i][j][k] = (char) (as);
FullValuesbs[i][j][k] = (char) (bs);
//// Remove this and get slower code
if (MaxLs<Ls)
MaxLs = Ls;
if ((abs(Ls)<MinLs) && (abs(Ls)>0))
MinLs = Ls;
if (Maxas<as)
Maxas = as;
if ((abs(as)<Minas) && (abs(as)>0))
Minas = as;
if (Maxbs<bs)
Maxbs = bs;
if ((abs(bs)<Minbs) && (abs(bs)>0))
Minbs = bs;
//// End of Remove
}
}
}
TRACE(_T("LMax = %f LMin = %f\n"),(MaxLs),(MinLs));
TRACE(_T("aMax = %f aMin = %f\n"),(Maxas),(Minas));
TRACE(_T("bMax = %f bMin = %f\n"),(Maxbs),(Minbs));
t2 = GET_TICK_COUNT();
TRACE(_T("WhiteBalance init : %lu ms\n"), t2 - t1);
}
I think the compiler is trying to unroll the inner loop, because you are removing a dependency between iterations. But somehow this doesn't help in your case, maybe because the loop body is too big and uses too many registers to be unrolled.
Try to turn off unrolling and post results again.
If this is the case, I would suggest submitting a performance issue to gcc.
PS. I think you can merge if (q>0.008856) and if (q<=0.008856).
Maybe it's the cache, maybe it's unrolling problems; there is only one way to answer this: look at the generated code (e.g. by using the -S option). Maybe you can post it, or spot the difference when comparing the two versions.
EDIT: As you have now clarified that it was just the measurement, I can only recommend (or better, command ;-) that when you want runtime numbers, you ALWAYS put the code into a loop and average the results. It is best to do this outside your program (in a shell script), so your cache is not already filled with the right data.
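For a quick first look, the averaging can also be done in-process. The sketch below is that less ideal variant: the caches stay warm between runs, which is exactly what the shell-script approach avoids. average_seconds and sample_workload are hypothetical names; sample_workload merely stands in for a call such as FillFullValues().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Runs fn() `runs` times and returns the mean CPU time per run in
 * seconds, as measured by the standard clock() function. */
double average_seconds(void (*fn)(void), int runs)
{
    assert(fn != NULL && runs > 0);
    clock_t total = 0;
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        fn();
        total += clock() - start;
    }
    return (double)total / CLOCKS_PER_SEC / runs;
}

/* Placeholder workload; the volatile sink keeps the compiler from
 * optimising the loop away. */
static volatile double sink;

void sample_workload(void)
{
    double acc = 0.0;
    for (int i = 0; i < 1000000; i++) {
        acc += i * 0.5;
    }
    sink = acc;
}
```

Calling average_seconds(sample_workload, 10) a few times and comparing the results gives a rough idea of the run-to-run spread before drawing conclusions from a single 9 sec versus 13 sec measurement.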