How can I apply a low-shelving filter using VisualDSP++?

I'm very new to DSP and have to solve the following problem: applying a low-shelving filter to an array of data. The original data is stored as fract16 (VisualDSP++).
I've written the code below but am not sure whether it is correct.
1. Does the following code have any problem with overflow?
2. If so, how should I prevent it?
3. Any advice on this problem?
fract16 org_data[256]; //original data
float16 ArrayA[],ArrayB[];
long tmp_A0, tmp_A1, tmp_A2, tmp_B1, tmp_B2;
float filter_paraA[3], filter_paraB[3]; // correctness: 0.xxxxx
// For equalizing
// Low-Shelving filter
for ( i=0; i<2; i++)
{
tmp_A1 = ArrayA[i*2];
tmp_A2 = ArrayA[i*2+1];
tmp_B1 = ArrayB[i*2];
tmp_B2 = ArrayB[i*2+1];
for(j=0;j<256;j++){
tmp_A0 = org_data[j];
org_data[j] = filter_paraA[0] * tmp_A0
+ filter_paraA[1] * tmp_A1
+ filter_paraA[2] * tmp_A2
- filter_paraB[1] * tmp_B1
- filter_paraB[2] * tmp_B2;
tmp_A2 = tmp_A1;
tmp_B2 = tmp_B1;
tmp_A1 = tmp_A0;
tmp_B1 = org_data[j];
}
ArrayA[i*2] = tmp_A1;
ArrayA[i*2+1] = tmp_A2;
ArrayB[i*2] = tmp_B1;
ArrayB[i*2+1] = tmp_B2;
}

I don't know the exact range of fract16; is it approximately -1 to +1?
The part that stands out to me as possibly generating an overflow is the assignment to org_data[j], but whether it does depends on what you know about your input signal and your filter coefficients. If you can ensure that applying filter_paraA[0..2] to a signal with tmp_A0, tmp_A1 and tmp_A2 all equal to 1 gives a result below max(fract16), you should be fine regardless of the 'B' side.
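As a rough way to check the feed-forward side, the sketch below sums the absolute values of the coefficients. This is a slightly stricter variant of the test described above: it also covers inputs of mixed sign, and it assumes every input sample has magnitude at most 1. The function name feedforward_worst_case is hypothetical, and the check says nothing about the feedback ('B') terms.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of absolute coefficient values: an upper bound on the magnitude of
   the feed-forward sum when every input sample has magnitude at most 1. */
static float feedforward_worst_case(const float *coeffs, size_t n)
{
    assert(coeffs != NULL);
    float bound = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Manual absolute value keeps the sketch free of libm. */
        bound += (coeffs[i] < 0.0f) ? -coeffs[i] : coeffs[i];
    }
    return bound;
}
```

If the returned bound is below 1, the feed-forward part alone cannot leave the -1 to +1 range; a larger value means it might, depending on the signal.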
I would recommend adding some overflow checks to your code. They don't necessarily have to fix the problem, but they would let you identify an otherwise very tricky bug. Unless you need absolute maximum performance, I would even leave the check code in place, with less output or just setting a flag that gets checked later.
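macA = filter_paraA[0] * tmp_A0 + filter_paraA[1] * tmp_A1
     + filter_paraA[2] * tmp_A2;
// Both feedback terms are subtracted from macA below, so they are added here.
macB = filter_paraB[1] * tmp_B1 + filter_paraB[2] * tmp_B2;
// Assumes the values are scaled to the fract16 range of about -1 to +1.
if ((macA - macB) > 1 || (macA - macB) < -1) {
    printf("ERROR! Overflow detected!\n");
    printf("tmp_A[] = [%f, %f, %f]\n", (double)tmp_A2, (double)tmp_A1, (double)tmp_A0);
    printf("tmp_B[] = [%f, %f]\n", (double)tmp_B2, (double)tmp_B1);
    printf(" i = %i, j = %i\n", i, j);
}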
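A flag-based version of that check could look like the sketch below. It is only an illustration under stated assumptions: fract16 is modelled as a plain int16_t holding a 1.15 fixed-point value, the filter state is kept in float, and the names biquad_t, saturate_q15 and biquad_step are hypothetical rather than part of VisualDSP++.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter state; int16_t stands in for fract16 (assumption). */
typedef struct {
    float a[3];     /* feed-forward coefficients (filter_paraA) */
    float b[3];     /* feedback coefficients (filter_paraB); b[0] is unused */
    float x1, x2;   /* two previous inputs */
    float y1, y2;   /* two previous outputs */
    int overflowed; /* set to 1 once any output had to be clipped */
} biquad_t;

/* Clip to the 16-bit range instead of wrapping, and record that it happened. */
static int16_t saturate_q15(float v, int *overflowed)
{
    if (v > 32767.0f)  { *overflowed = 1; return INT16_MAX; }
    if (v < -32768.0f) { *overflowed = 1; return INT16_MIN; }
    return (int16_t)v;
}

/* One sample of the second-order section used in the question. */
static int16_t biquad_step(biquad_t *f, int16_t in)
{
    assert(f != NULL);
    float x0 = (float)in;
    float acc = f->a[0] * x0 + f->a[1] * f->x1 + f->a[2] * f->x2
              - f->b[1] * f->y1 - f->b[2] * f->y2;
    int16_t out = saturate_q15(acc, &f->overflowed);
    f->x2 = f->x1;
    f->x1 = x0;
    f->y2 = f->y1;
    f->y1 = (float)out;
    return out;
}
```

After processing a block, the caller can test the overflowed flag once instead of printing from inside the inner loop.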

node red global array variable - Get / Set not seeming to work

Working in Node-Red. I can use regular global variables across nodes and flows no problem. However, would like to use a global array variable.
Method A - desired functionality
I read in 16 data points at a time (type = double) and want them to be indexes 0-15; the following node would then update indexes 16-31, and the last two nodes indexes 32-47 and 48-63.
Node-RED, however, won't let me update the array from the second node starting at index 16. I get a "TypeError: Cannot read property 'indexOf' of undefined" error.
In lieu of Method A, I could have four different 16-index global arrays. However, accessing them gives erratic results. Trying to access index[n] returns the value from some other index - i.e. global.get("variable"[0]) returns variable[10] and global.get("variable"[1]) returns the value from variable[27].
This describes the problem -
https://www.youtube.com/watch?v=cF1bz8bEozI
Here is my sample flow:
[{"id":"ee1694d.7df4768","type":"i2c in","z":"d556390c.391838","name":"Read Camera","address":"105","command":"128","count":"32","x":240,"y":1480,"wires":[["9e27949c.512c28"]]},{"id":"d9eaa7a4.7f0ed8","type":"inject","z":"d556390c.391838","name":"ON","topic":"1","payload":"1","payloadType":"str","repeat":"","crontab":"","once":false,"x":70,"y":1480,"wires":[["ee1694d.7df4768"]]},{"id":"6dc0727a.4cf53c","type":"i2c in","z":"d556390c.391838","name":"Read Camera","address":"105","command":"160","count":"32","x":240,"y":1520,"wires":[["a7ac4b94.44ce58"]]},{"id":"d6d80973.784148","type":"i2c in","z":"d556390c.391838","name":"Read Camera","address":"105","command":"192","count":"32","x":240,"y":1560,"wires":[["b90d910d.8e743","ebeeb439.54cf18"]]},{"id":"b90d910d.8e743","type":"i2c in","z":"d556390c.391838","name":"Read Camera","address":"105","command":"224","count":"32","x":240,"y":1600,"wires":[["2f7b8dde.7a9902"]]},{"id":"6b1509e2.8bd4d8","type":"debug","z":"d556390c.391838","name":"Row 3,4","active":true,"console":"false","complete":"payload","x":1020,"y":1520,"wires":[]},{"id":"a828b6d2.40da08","type":"delay","z":"d556390c.391838","name":"","pauseType":"delay","timeout":"50","timeoutUnits":"milliseconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":98,"y":1656,"wires":[["d6d80973.784148"]]},{"id":"ad0a1424.eaae08","type":"function","z":"d556390c.391838","name":"Save Global variables for Temperature","func":"global.set(\"RangeTemperaturesA\", 0);\n\nfor(i=0; i<16; i++){\n global.set(\"RangeTemperaturesA\"[i], msg.payload[i]); \n}\n\nreturn msg;","outputs":1,"noerr":0,"x":750,"y":1480,"wires":[["6b810d97.0beee4","6dc0727a.4cf53c"]]},{"id":"d386a34f.525d2","type":"function","z":"d556390c.391838","name":"Save Global variables for Temperature","func":"for(i=0; i<16; i++){\n global.set(\"RangeTemperatureB\"[i], msg.payload[i]); \n}\n\nreturn 
msg;","outputs":1,"noerr":0,"x":750,"y":1520,"wires":[["6b1509e2.8bd4d8"]]},{"id":"11c935c.be330ca","type":"function","z":"d556390c.391838","name":"Save Global variables for Temperature","func":"for(i=0; i<16; i++){\n global.set(\"RangeTemperatureC\"[i], msg.payload[i]); \n}\n\nreturn msg;","outputs":1,"noerr":0,"x":750,"y":1560,"wires":[[]]},{"id":"294185a.d5fe67a","type":"function","z":"d556390c.391838","name":"Save Global variables for Temperature","func":"for(i=0; i<16; i++){\n global.set(\"RangeTemperatureD\"[i], msg.payload[i]); \n}\n\nreturn msg;","outputs":1,"noerr":0,"x":750,"y":1600,"wires":[[]]},{"id":"3ffa9e84.cba002","type":"function","z":"d556390c.391838","name":"Find Max Temperature","func":"//n = Math.max(... global.get(\"RangeTemperature\"));\n\nreturn {payload: global.get(\"RangeTemperature\")};","outputs":1,"noerr":0,"x":900,"y":1660,"wires":[[]]},{"id":"9e27949c.512c28","type":"function","z":"d556390c.391838","name":"Get Temps full row","func":"var gridEye = [];\nvar loop=0;\n\nfor(n=0; n<32; n+=2){\n gridEye[loop] = ((msg.payload[n+1]<<8) | msg.payload[n]) * 0.25;\n //convert to F\n gridEye[loop] = ((5.0/3.0) * gridEye[loop] + 32.0).toFixed(2);\n //add right bitshit to reduce noise\n loop++;\n}\nmsg.payload=gridEye;\n\nreturn msg;","outputs":1,"noerr":0,"x":490,"y":1480,"wires":[["ad0a1424.eaae08"]]},{"id":"a7ac4b94.44ce58","type":"function","z":"d556390c.391838","name":"Get Temps full row","func":"var gridEye = []; //16-byte array with temperature readings\nvar loop=0;\n\nfor(n=0; n<32; n+=2){\n // Get raw values - bitshift left 8 bits then bitwise OR.\n // then take new value and multiply by 0.25 since it reads in 1/4 degree C\n gridEye[loop] = ((msg.payload[n+1]<<8) | msg.payload[n]) * 0.25;\n //convert to F\n gridEye[loop] = ((5.0/3.0) * gridEye[loop] + 32.0).toFixed(2);\n loop++;\n}\n\nmsg.payload=gridEye;\n\nreturn 
msg;","outputs":1,"noerr":0,"x":490,"y":1520,"wires":[["d386a34f.525d2"]]},{"id":"ebeeb439.54cf18","type":"function","z":"d556390c.391838","name":"Get Temps full row","func":"var gridEye = [];\nvar loop=0;\n/*\nvar pixel = 4;\nvar tmp = ((msg.payload[pixel*2 + 1]<<8) | msg.payload[pixel*2])*0.25; \n//gridEye reads in .25 degree C\ntmp = ((5/3 * tmp) + 32.0); //convert to F\n*/\n\nfor(n=0; n<32; n+=2){\n gridEye[loop] = ((msg.payload[n+1]<<8) | msg.payload[n]) * 0.25;\n //convert to F\n gridEye[loop] = ((5.0/3.0) * gridEye[loop] + 32.0).toFixed(2);\n //add right bitshit to reduce noise\n loop++;\n}\n/*\nfor(n=0; n<8; n++){\n gridEye[n]= ((n/35536 * 60 ) + 20);\n //convert to F\n //gridEye[n] = (((5.0/3.0) * gridEye[n]) + 32).toFixed(2);\n}\n*/\nmsg.payload=gridEye;\n\nreturn msg;","outputs":1,"noerr":0,"x":490,"y":1560,"wires":[["11c935c.be330ca"]]},{"id":"2f7b8dde.7a9902","type":"function","z":"d556390c.391838","name":"Get Temps full row","func":"var gridEye = [];\nvar loop=0;\n/*\nvar pixel = 4;\nvar tmp = ((msg.payload[pixel*2 + 1]<<8) | msg.payload[pixel*2])*0.25; \n//gridEye reads in .25 degree C\ntmp = ((5/3 * tmp) + 32.0); //convert to F\n*/\n\nfor(n=0; n<32; n+=2){\n gridEye[loop] = ((msg.payload[n+1]<<8) | msg.payload[n]) * 0.25;\n //convert to F\n gridEye[loop] = ((5.0/3.0) * gridEye[loop] + 32.0).toFixed(2);\n //add right bitshit to reduce noise\n loop++;\n}\n/*\nfor(n=0; n<8; n++){\n gridEye[n]= ((n/35536 * 60 ) + 20);\n //convert to F\n //gridEye[n] = (((5.0/3.0) * gridEye[n]) + 32).toFixed(2);\n}\n*/\nmsg.payload=gridEye;\n\nreturn msg;","outputs":1,"noerr":0,"x":490,"y":1600,"wires":[["294185a.d5fe67a"]]},{"id":"a0de3101.0c307","type":"function","z":"d556390c.391838","name":"Read from A","func":"var p = global.get(\"RangeTemperaturesA\"[1]);\nmsg.payload = p;\nreturn msg;","outputs":1,"noerr":0,"x":510,"y":1780,"wires":[["dbb0935d.742f7"]]},{"id":"dbb0935d.742f7","type":"debug","z":"d556390c.391838","name":"test 
A","active":true,"console":"false","complete":"payload","x":670,"y":1860,"wires":[]},{"id":"f6c50374.59f","type":"function","z":"d556390c.391838","name":"Read from B","func":"var n = global.get(\"RangeTemperatureB\"[0]);\nreturn {payload: n};","outputs":1,"noerr":0,"x":770,"y":1780,"wires":[["3e8947bc.be49b8"]]},{"id":"6b810d97.0beee4","type":"debug","z":"d556390c.391838","name":"Row 1,2","active":true,"console":"false","complete":"payload","x":1020,"y":1480,"wires":[]},{"id":"3e8947bc.be49b8","type":"debug","z":"d556390c.391838","name":"test b","active":true,"console":"false","complete":"payload","x":910,"y":1860,"wires":[]},{"id":"23da3f7f.1c4f8","type":"inject","z":"d556390c.391838","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":340,"y":1780,"wires":[["a0de3101.0c307"]]},{"id":"2a146fa7.577fc","type":"inject","z":"d556390c.391838","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":580,"y":1720,"wires":[["f6c50374.59f"]]}]
Edit -
I did a quick test -
For global.set -
global.set("RangeTemperaturesA", i)[i]; gives "TypeError: Cannot read property '0' of undefined"
global.set("RangeTemperaturesA[i]", i); gives "Error: Invalid property expression: unexpected i at position 19"
global.set("RangeTemperaturesA"[i], i); appears to probably work.
Sample code:
for(i=0; i<16; i++){
global.set("RangeTemperaturesA"[i], i);
node.warn("Value: " + i);
}
return msg;
Global.get -
global.get("RangeTemperaturesA"[n]) gives erratic results.
global.get("RangeTemperaturesA[n]") gives "Error: Invalid property expression: unexpected n at position 19"
global.get("RangeTemperaturesA")[n] gives "Value: undefined; Count: 0", which is perhaps the most promising if the array was never populated correctly.
Sample code:
for(n=0; n<16; n++){
node.warn("Value: " + global.get("RangeTemperaturesA")[n] + "; Count: " + n);
}
return msg;
The problem is due to the way you're trying to address the individual array entries.
With the code: global.get("variable"[0]), you are asking it to use the 0th character of the string "variable" as the argument passed to the get function. In other words, it is equivalent to: global.get("v")
Similarly, global.get("variable"[2]) will be equivalent to global.get("r").
You should either move the array index inside the quotes:
global.get("variable[0]");
or access the 0th element of the result of the get function:
global.get("variable")[0];
The same holds true for how you are trying to use the set function.
Updates to reflect edits to the question
None of your attempts to use global.set() are correct:
global.set("RangeTemperaturesA", i)[i] - here you are setting the global property RangeTemperaturesA to the value of i. The function set doesn't return anything, so attempting to treat it as an array is just wrong.
global.set("RangeTemperaturesA[i]", i); - this is the closest of the three, however, you are setting the string literal RangeTemperaturesA[i] - JavaScript doesn't know you want the i in the middle of that string to be the value of your local variable i.
global.set("RangeTemperaturesA"[i], i); - no. This is the same error as you had in the original question. "RangeTemperaturesA"[i] will evaluate to the ith character of the string RangeTemperaturesA.
To do it properly, you want to use "RangeTemperaturesA["+i+"]" as the key:
global.set("RangeTemperaturesA["+i+"]", i);
When i is 0, that will generate the key RangeTemperaturesA[0].
The same applies for global.get:
var myValue = global.get("RangeTemperaturesA["+i+"]");
All of these examples assume you have already set RangeTemperaturesA to be an array:
global.set("RangeTemperaturesA",[]);

Move Zeroes in Scala

I'm working on "Move Zeroes" of leetcode with scala. https://leetcode.com/problems/move-zeroes/description/
Given an array nums, write a function to move all 0's to the end of it while maintaining the relative order of the non-zero elements. You must do this in-place without making a copy of the array.
I have a solution that works well in IntelliJ, but when it runs on LeetCode the output is the same array as the input. I'm also not sure whether it is done in place. Is something wrong with my code?
Thanks
def moveZeroes(nums: Array[Int]): Array[Int] = {
val lengthOrig = nums.length
val lengthFilfter = nums.filter(_ != 0).length
var numsWithoutZero = nums.filter(_ != 0)
var numZero = lengthOrig - lengthFilfter
while (numZero > 0){
numsWithoutZero = numsWithoutZero :+ 0
numZero = numZero - 1
}
numsWithoutZero
}
And one more thing: the template code given by leetcode returns Unit type but mine returns Array.
def moveZeroes(nums: Array[Int]): Unit = {
}
While I agree with @ayush, LeetCode is explicitly asking you to use mutable state. You need to update the input array so that it contains the changes. Also, they ask you to do that in a minimal number of operations.
So, while it is not idiomatic Scala code, I suggest a solution along these lines:
def moveZeroes(nums: Array[Int]): Unit = {
  var i = 0
  var lastNonZeroFoundAt = 0
  // Compact the non-zero elements to the front, preserving their order.
  while (i < nums.size) {
    if (nums(i) != 0) {
      nums(lastNonZeroFoundAt) = nums(i)
      lastNonZeroFoundAt += 1
    }
    i += 1
  }
  // Fill the rest of the array with zeros.
  i = lastNonZeroFoundAt
  while (i < nums.size) {
    nums(i) = 0
    i += 1
  }
}
As this is non-idiomatic Scala, writing such code is not encouraged, and it is therefore a little difficult to read. The C++ version that is shown in the solutions may actually be easier to read and help you understand my code above:
void moveZeroes(vector<int>& nums) {
    int lastNonZeroFoundAt = 0;
    // If the current element is not 0, then we need to
    // append it just in front of last non 0 element we found.
    for (int i = 0; i < nums.size(); i++) {
        if (nums[i] != 0) {
            nums[lastNonZeroFoundAt++] = nums[i];
        }
    }
    // After we have finished processing new elements,
    // all the non-zero elements are already at beginning of array.
    // We just need to fill remaining array with 0's.
    for (int i = lastNonZeroFoundAt; i < nums.size(); i++) {
        nums[i] = 0;
    }
}
Your answer gives a TLE (Time Limit Exceeded) error on LeetCode. I do not know what the criteria are for that to occur. However, I see a lot of things in your code that are not ideal.
Pure functional programming discourages use of any mutable state and rather focuses on using val for everything.
I would try it this way --
def moveZeroes(nums: Array[Int]): Array[Int] = {
  val nonZero = nums.filter(_ != 0)
  val numZero = nums.length - nonZero.length
  val zeros = Array.fill(numZero){0}
  nonZero ++ zeros
}
P.S. This also gives TLE on LeetCode, but I think it is better in terms of being functional. Open to reviews, though.

Partially working array/hitTestobject

I am trying new things with arrays and having some difficulty. I am trying to create multiple instances of 1 class and putting them into an array.
I am creating the instances like so:
public function creatingitem(e:TimerEvent)
{
amtcreated = Math.ceil(Math.random() * 4);
while (amtcreated >= 1)
{
amtcreated--;
var i:Number = Math.ceil(Math.random() * 3);
switch (i)
{
case 1 :
//Object1
objectnum = 1;
objectwei = 3;
r = new Board(objectnum,objectwei,stagw,stagh);
addChild(r);
fallingitem.push(r);
break;
case 2 :
//Object2
objectnum = 2;
objectwei = 4;
c = new Board(objectnum,objectwei,stagw,stagh);
addChild(c);
fallingitem.push(c);
break;
case 3 :
//Object3
objectnum = 3;
objectwei = 4;
l = new Board(objectnum,objectwei,stagw,stagh);
addChild(l);
fallingitem.push(l);
break;
default :
break;
}
}
}
Once these are created they check if they collide with the main ball:
public function hitcheck(e:Event)
{
for (var v:int = fallingitem.length - 1; v >= 0; v--)
{
if (ball.hitTestObject(fallingitem[v]))
{
trace(fallingitem[v]);
if (fallingitem[v] == r)
{
bonusscore += 100;
fallingitem[v].removeitem();
}
else if (fallingitem[v] == c)
{
bonusscore += 75;
fallingitem[v].removeitem();
}
else if (fallingitem[v] == l)
{
bonusscore += 75;
fallingitem[v].removeitem();
}
trace(bonusscore);
}
}
}
The issue is that the trace output shows every item registering a hit, but not all instances meet the if conditions. As an example, I could have 2 "r" instances, and when I hit both, one will go through and add to the score while the other will just continue past. The trace directly following the hitTestObject shows me that both are being hit and registered, but I am not sure why the second one does not add to the score.
Thank you,
You can't really have 2 r instances. When you're creating the instances, if you happen to create 2 rs, the second r = new Board... statement overwrites the reference, and the variable r is referring to the second one. Both objects still exist, but the variable can only refer to one of them, so when you perform the check, you're ignoring the object that was previously set to r but isn't any more.
To fix this, you could turn r, c and l into Arrays and whenever you create an instance, add it to the appropriate array. Then, you would perform the check using (r.indexOf(fallingitem[v]) != -1), which returns true if the object is in the array.
The other way, based on the provided code, would be to check whatever value objectnum is setting in the constructor, since you're setting that value based on whether it's in the r, c or l category. Though, that won't work if the property is private or might be changed.

Why is this code not deterministic?

The following snippet is code from water-nsq benchmark from SPLASH 2...
if (comp_last > NMOL1)
{
for (mol = StartMol[ProcID]; mol < NMOL; mol++)
{
pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
for ( dir = XDIR; dir <= ZDIR; dir++) {
temp_p = VAR[mol].F[DEST][dir];
temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
temp_p[O] += PFORCES[ProcID][mol][dir][O];
temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
}
pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
}
comp = comp_last % NMOL;
for (mol = 0; ((mol <= comp) && (mol < StartMol[ProcID])); mol++)
{
pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
for ( dir = XDIR; dir <= ZDIR; dir++)
{
temp_p = VAR[mol].F[DEST][dir];
temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
temp_p[O] += PFORCES[ProcID][mol][dir][O];
temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
}
pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
}
}
else
{
for (mol = StartMol[ProcID]; mol <= comp_last; mol++)
{
pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
for ( dir = XDIR; dir <= ZDIR; dir++)
{
temp_p = VAR[mol].F[DEST][dir];
temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
temp_p[O] += PFORCES[ProcID][mol][dir][O];
temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
}
pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
}
}
pthread_barrier_wait(&(gl->start));
The problem is that it is not deterministic at the barrier in the end, that is, if you execute this code two times with same inputs, it gives different answers. In other words, if the lock order of mutexes is changed, the results are different.
And yes, I have verified this by inspecting the memory pages. I can also assure you that the change occurs in the memory of VAR (pointed to by temp_p).
I want to know why? Because apparently, all threads are putting their own values (PFORCES[ProcID]...) to the sum of temp_p and at the end, that is at the barrier, the results should be same, no matter the order in which threads acquired the locks.
[EDITED]
Also, please note that variables comp, dir and mol are all local variables of the thread and therefore not shared.
Second try.
I can't check it, but I assume that in temp_p[H1] += PFORCES[ProcID][mol][dir][H1]; you are adding doubles or floats.
For floating point types, the order of addition matters! Floating point addition is not associative!
A different thread order means a different addition order. So changes in the outcome are to be expected.
See http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems for some explanation.
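The effect can be reproduced without any threads. The sketch below adds the same three values in two different orders, standing in for two different lock acquisition orders. The values and the helper name sum_in_order are made up for illustration, and IEEE 754 double arithmetic is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Add values[] in the sequence given by order[], mimicking threads that
   reach the shared accumulator in different orders. */
static double sum_in_order(const double *values, const size_t *order, size_t n)
{
    assert(values != NULL && order != NULL);
    /* volatile forces every partial sum to be rounded to double, even on
       targets that would otherwise keep extra precision in registers. */
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc = acc + values[order[i]];
    }
    return acc;
}
```

With the values {1e16, -1e16, 1.0}, the order 0, 1, 2 gives 1.0, while the order 0, 2, 1 gives 0.0, because 1e16 + 1.0 rounds back to 1e16 before the large terms cancel.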
I notice that you do not show the declaration of the loop variables, like mol and dir.
Could it be that they are accidentally shared between threads?
If so, all kinds of race conditions between, for example, one thread's mol++ and another thread's [mol % MAXLCKS] will cause problems.
UPDATE: According to the comments below, this does not seem to be the case.

Why is my loop slower when I remove code

When I remove the tests that compute the minimum and maximum from the loop, the execution time is actually longer than with the tests. How is that possible?
Edit:
After running more tests, it seems the runtime is not constant, i.e. the same code can run in 9 sec or 13 sec. So it was just a repeatable coincidence. Repeatable until you do enough tests, that is...
Some details :
execution time with the min max test : 9 sec
execution time without the min max test : 13 sec
CFLAGS=-Wall -O2 -fPIC -g
gcc 4.4.3 32 bit
Section to remove is now indicated in code
Some guess :
bad cache interaction ?
void FillFullValues(void)
{
int i,j,k;
double X,Y,Z;
double p,q,r,p1,q1,r1;
double Ls,as,bs;
unsigned long t1, t2;
t1 = GET_TICK_COUNT();
MinLs = Minas = Minbs = 1000000.0;
MaxLs = Maxas = Maxbs = 0.0;
for (i=0;i<256;i++)
{
for (j=0;j<256;j++)
{
for (k=0;k<256;k++)
{
X = 0.4124*CielabValues[i] + 0.3576*CielabValues[j] + 0.1805*CielabValues[k];
Y = 0.2126*CielabValues[i] + 0.7152*CielabValues[j] + 0.0722*CielabValues[k];
Z = 0.0193*CielabValues[i] + 0.1192*CielabValues[j] + 0.9505*CielabValues[k];
p = X * InvXn;
q = Y;
r = Z * InvZn;
if (q>0.008856)
{
Ls = 116*pow(q,third)-16;
}
else
{
Ls = 903.3*q;
}
if (q<=0.008856)
{
q1 = 7.787*q+seiz;
}
else
{
q1 = pow(q,third);
}
if (p<=0.008856)
{
p1 = 7.787*p+seiz;
}
else
{
p1 = pow(p,third);
}
if (r<=0.008856)
{
r1 = 7.787*r+seiz;
}
else
{
r1 = pow(r,third);
}
as = 500*(p1-q1);
bs = 200*(q1-r1);
//
// cast on short int for reducing array size
//
FullValuesLs[i][j][k] = (char) (Ls);
FullValuesas[i][j][k] = (char) (as);
FullValuesbs[i][j][k] = (char) (bs);
//// Remove this and get slower code
if (MaxLs<Ls)
MaxLs = Ls;
if ((abs(Ls)<MinLs) && (abs(Ls)>0))
MinLs = Ls;
if (Maxas<as)
Maxas = as;
if ((abs(as)<Minas) && (abs(as)>0))
Minas = as;
if (Maxbs<bs)
Maxbs = bs;
if ((abs(bs)<Minbs) && (abs(bs)>0))
Minbs = bs;
//// End of Remove
}
}
}
TRACE(_T("LMax = %f LMin = %f\n"),(MaxLs),(MinLs));
TRACE(_T("aMax = %f aMin = %f\n"),(Maxas),(Minas));
TRACE(_T("bMax = %f bMin = %f\n"),(Maxbs),(Minbs));
t2 = GET_TICK_COUNT();
TRACE(_T("WhiteBalance init : %lu ms\n"), t2 - t1);
}
I think the compiler is trying to unroll the inner loop because you are removing the dependency between iterations. But somehow this doesn't help in your case, maybe because the loop is too big and uses too many registers to be unrolled.
Try to turn off unrolling and post results again.
If this is the case, I would suggest you submit a performance issue to gcc.
PS. I think you can merge if (q>0.008856) and if (q<=0.008856).
Maybe it's the cache, maybe it's an unrolling problem; there is only one way to answer this: look at the generated code (e.g. by using the -S option). Maybe you can post it, or spot the difference when comparing the two versions.
EDIT: As you have now clarified that it was just the measurement, I can only recommend (or better, command ;-)) that whenever you want runtime numbers, you ALWAYS put the code into a loop and average the results. It is best to do this outside your program (in a shell script), so that your cache is not already filled with the right data.
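As a minimal in-program sketch of the "loop and average" advice (not the shell-script variant recommended above), the helper below times a function several times and reports the mean and the fastest run. The names timing_t, time_runs and sample_work are hypothetical, and clock() measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

typedef struct {
    double mean; /* average seconds per run */
    double min;  /* fastest single run, in seconds */
} timing_t;

static int call_count;           /* how often the sample workload ran */
static volatile unsigned long sink;

/* Placeholder workload; substitute the function under test. */
static void sample_work(void)
{
    for (unsigned long i = 0; i < 100000UL; i++) {
        sink += i;
    }
    call_count++;
}

/* Run work() `runs` times, timing each run separately. */
static timing_t time_runs(void (*work)(void), int runs)
{
    assert(work != NULL && runs > 0);
    timing_t t = {0.0, 0.0};
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        work();
        double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
        t.mean += seconds;
        if (i == 0 || seconds < t.min) {
            t.min = seconds;
        }
    }
    t.mean /= runs;
    return t;
}
```

A large gap between the mean and the fastest run is itself a hint that the measurement is noisy, as the 9 sec versus 13 sec observation in the question turned out to be.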
