This question already has answers here:
How do I detect unsigned integer overflow?
Here's my attempt. Any tips on a better solution?:
// for loop to convert 32 to 16 bits
uint32_t i;
int32_t * samps32 = (int32_t *)&(inIQbuffer[0]);
int16_t * samps16 = (int16_t *)&(outIQbuffer[0]);
for( i = 0; i < ( num_samples * 2 /* because each sample is two int32s */ ); i++ ) {
    overflowCount += ( abs(samps32[i]) & 0xFFFF8000 ) ? 1 : 0;
    samps16[i] = (int16_t)samps32[i];
}
// Only report error every 4096 accumulated overflows
if( ( overflowCount & 0x1FFF ) > 4096 ) {
    printf( "ERROR: Overflow has occurred while scaling from 32 "
            "bit to 16 bit samples %d times",
            overflowCount );
}
Here's the part that actually checks for overflow:
overflowCount += ( abs(samps32[i]) & 0xFFFF8000 ) ? 1 : 0;
I personally prefer to use the SafeInt class to do my overflow checking. It reduces the need for tedious error checking and turns it into an easy to process, yet difficult to ignore exception.
http://blogs.msdn.com/david_leblanc/archive/2008/09/30/safeint-3-on-codeplex.aspx
What you already do is close to the fastest possible for a single cast, but the check can be trimmed. Note that simply dropping the abs(), as in
if (samps32[i] & 0xFFFF8000) overflowCount++;
or
if (samps32[i] >> 15) overflowCount++;
would count every negative sample as an overflow, even ones that fit in 16 bits (and abs() itself is undefined for INT32_MIN). On a two's-complement target with arithmetic right shift, a form that avoids the library call while keeping the intended meaning is:
if ((samps32[i] ^ (samps32[i] >> 31)) & 0xFFFF8000) overflowCount++;
where samps32[i] >> 31 smears the sign bit so the XOR acts like abs(). This should be at least as fast as the original.
If you are actually interested in the count of overflows, you might consider processing the array of integers with SIMD operations.
It seems that you are checking for the overflow of a 16-bit addition. You can avoid a branch in the assembled code by just having
overflowCount += (samps32[i] & 0x8000) >> 15;
This generates three ALU operations but no branch in the code. It may or may not be faster than a branching version.
Bit ops would be my choice, too. The only faster way I can imagine at the moment is to use inline assembly where you load the source operand, make a copy onboard the chip, truncate, and bitwise compare (that was pseudo pseudo code).
Your code has an issue: It violates aliasing rules. You could use something like this instead:
union conv_t {
    int32_t i32;
    int16_t i16;
};
Then you could ensure that IQBuffer is of that type. Finally, you could run:
for( i = 0; i < (num_samples * 2); i++ ) {
    <test goes here>
    samps[i].i16 = static_cast<int16_t>(samps[i].i32);
}
edit: As per your edit (https://stackoverflow.com/revisions/677427/list) you've rendered nearly my whole post invalid. Thanks for not mentioning the edit in your question.
Related
So I was browsing the Quake engine source code earlier today and stumbled upon some written utility functions. One of them was 'Q_memcpy':
void Q_memcpy (void *dest, void *src, int count)
{
    int i;

    if (( ( (long)dest | (long)src | count) & 3) == 0 )
    {
        count >>= 2;
        for (i = 0; i < count; i++)
            ((int *)dest)[i] = ((int *)src)[i];
    }
    else
        for (i = 0; i < count; i++)
            ((byte *)dest)[i] = ((byte *)src)[i];
}
I understand the whole premise of the function, but I don't quite understand the reason for the bitwise OR between the source and destination addresses. So my questions are as follows:
Why does 'count' get used in the same bitwise arithmetic?
Why are the last two bits of that result checked?
What purpose does this whole check serve?
I'm sure it's something obvious but please excuse my ignorance because I haven't really delved into the more low level side of things when it comes to programming. I just find it interesting and want to learn more.
It is finding out whether the source and destination pointers are int-aligned, and whether the count is an exact number of ints in bytes.
If those three things are all true, the least significant 2 bits of each of them will be 0 (assuming pointers and int are 4 bytes). So the algorithm ORs the three values and isolates the least significant 2 bits.
If so, it copies int by int. Otherwise it copies char by char.
If the test fails, a more sophisticated algorithm would copy some of the leading and trailing bytes char by char and the intermediate bytes int by int.
The bitwise ORing and ANDing with 3 checks whether the source, destination, and count are all divisible by 4. If they are, the operation can work with 4-byte words; this code assumes int is 4 bytes. Otherwise the operation is performed bytewise.
It first tests if all 3 arguments are divisible by 4. If - and only if - they all are, it proceeds with copying 4 bytes at a time.
I.e., decoded this would be:
if ((long) dest % 4 == 0 && (long) src % 4 == 0 && count % 4 == 0)
{
    count = count / 4;
    for (i = 0; i < count; i++)
        ((int *)dest)[i] = ((int *)src)[i];
}
I am not sure whether they tested their compiler, found it generated bad code for the straightforward test, and therefore decided to write it in this convoluted way. In any case, x | y | z guarantees that bit n is set in the result if it is set in any of x, y, or z. Therefore, if (x | y | z) & 3 results in 0, none of the numbers had either of the 2 lowest bits set, and therefore all are divisible by 4.
Of course it would be rather silly to use now - the standard library memcpy in recent library implementations is almost certainly better than this.
Therefore, on recent compilers you can optimize all calls to Q_memcpy by switching them to memcpy. GCC could generate things like 64-bit or SIMD moves with memcpy depending on the size of area to be copied.
This question already has answers here:
Comparison operation on unsigned and signed integers
I have a "C"code snippet as below
int32_t A = 5;
uint32_t B = 8;
if ( A >= B )
{
printf("Test");
}
When I build this I receive a warning: "comparison between signed and unsigned operands". Can anyone explain this issue?
Everything is OK while A is positive and B is less than 2^31.
But if A is less than 0, unexpected behavior occurs.
A = -1 is stored in memory as 0xFFFFFFFF.
B = 5 is stored in memory as 0x00000005.
When you do
if (A < B) {
    //Something, you are expecting to be here
}
The compiler will compare them as unsigned 32-bit integers, and your if will be expanded to:
if (0xFFFFFFFF < 0x00000005) {
    //Do something, it will fail.
}
Compiler warns you about this possible problem.
Good, very good! You are reading and paying attention to your compiler warnings.
In your code:
int32_t A = 5;
uint32_t B = 8;
if ( A >= B )
{
    printf("Test");
}
You have 'A' as a signed int32_t value with min/max values of -2147483648/2147483647, and you have an unsigned uint32_t with min/max of 0/4294967295, respectively. The compiler generates the warning to guard against comparisons that are always true or false based on the types involved. Here A can never be greater than or equal to B for any value of B in the range 2147483648 - 4294967295; that whole swath of numbers yields FALSE regardless of the individual values involved.
Another instructive example is if ( A < B ): mathematically this would be TRUE for all values of A from -2147483648 through -1, because an unsigned B can never be less than zero; but after the implicit conversion to unsigned, those negative values of A compare as large unsigned numbers, so the test can actually yield FALSE.
The compiler warnings are there to warn that testing with these types may not provide valid comparisons for certain ranges of numbers -- that you might not have anticipated.
In the real world, if you know A only holds values from 0 - 900, then you can tell the compiler that 1) you understand the warning and 2) your cast guarantees the values will provide valid tests, e.g.
int32_t A = 5;
uint32_t B = 8;

if (A >= 0) {
    if ((uint32_t)A >= B)
        printf("Test");
}
else
    /* handle error */
If you cannot make the guarantees for 1) & 2), then it is time to rewrite the code in a way that does not produce the warning.
Two good things happened here. You had compiler warnings enabled, and you took the time to read and understand what the compiler was telling you. This will come up time and time again. Now you know how to approach a determination of what can/should be done.
This question already has answers here:
Catch and compute overflow during multiplication of two large integers
int isOverflow(unsigned int a, unsigned int b) {
    // a and b are unsigned non-zero integers.
    unsigned int c = a * b;
    if (c < (a > b ? a : b))
        return 1;
    else
        return 0;
}
Am I missing something? I think the above snippet will work.
EDIT: I have seen other solutions, like multiplication of large numbers, how to catch overflow, which use some fancy methods to check it. But to me the simple solution above also looks correct. That's why I am asking this question.
It's easy to prove this is wrong by finding an exception:
Consider these two 8-bit unsigned values: a = 0x1F and b = 0xF.
c = a * b
c = 0x1F * 0xF
c = 0xD1 (Overflow! The real answer is 0x1D1)
c < ( a > b ? a : b)
0xD1 < 0x1F => False (Wrong!)
A correct answer is here.
CERT has a great document, INT30-C. Ensure that unsigned integer operations do not wrap, which covers all the cases of unsigned integer wrapping. The check they advocate for multiplication requires that you test before you perform the multiplication, preventing the overflow before it occurs (I modified the example to fit your question):
if (a > SIZE_MAX / b) {
    /* Handle error condition */
}
c = a * b;
This is a straightforward solution to your problem; it has been solved before, and you should use solutions that have been proven to work. Coming up with your own can be error-prone.
This question already has answers here:
Count the number of set bits in a 32-bit integer
What would be the best way to identify all the set bit positions in a 64-bit bitmask? Suppose my bit mask is 0xDeadBeefDeadBeef; what is the best way to identify the positions of all the set bits in it?
unsigned long long mask = 0xdeadbeefdeadbeefULL;
unsigned int bit_pos = 0;
while (mask) {
    if ((mask & 1) == 1) {
        printf("Set bit position is: %u\n", bit_pos);
    }
    bit_pos++;
    mask >>= 1;
}
One way is to loop through it and check whether each bit is set; if it is, print the position and continue until either all the set bits or all 64 bits have been traversed. But there must be a better way of doing it?
Algorithm from Hacker's Delight (book):
int count_bits(long long s)
{
    s = (s&0x5555555555555555L) + ((s>>1)&0x5555555555555555L);
    s = (s&0x3333333333333333L) + ((s>>2)&0x3333333333333333L);
    s = (s&0x0F0F0F0F0F0F0F0FL) + ((s>>4)&0x0F0F0F0F0F0F0F0FL);
    s = (s&0x00FF00FF00FF00FFL) + ((s>>8)&0x00FF00FF00FF00FFL);
    s = (s&0x0000FFFF0000FFFFL) + ((s>>16)&0x0000FFFF0000FFFFL);
    s = (s&0x00000000FFFFFFFFL) + ((s>>32)&0x00000000FFFFFFFFL);
    return (int)s;
}
Besides the already-explained nice bit-twiddling hacks, there are other options.
Assuming you have x86(-64), SSE4, and gcc, and compile with the -msse4 switch, you can use:
int CountOnesSSE4(unsigned int x)
{
    return __builtin_popcount(x);
}
This will compile into a single popcnt instruction. If you need fast code you can actually check for SSE4 at runtime and use the best function available.
If you expect the number to have few ones, this could also be fast (and is always faster than the usual shift-and-compare loop):
int CountOnes(unsigned int x)
{
    int cnt = 0;
    while (x) {
        x >>= ffs(x);
        cnt++;
    }
    return cnt;
}
On x86 (even without SSE4), ffs will compile into a single instruction (bsf), and the number of loop iterations will depend on the number of ones. (ffs is declared in <strings.h>.)
You might do this:
unsigned long long bit_mask = 0xdeadbeefdeadbeefULL;
int i;
for (i = 0; i < (int)(sizeof(unsigned long long) * 8); i++) {
    int res = bit_mask & 1;
    printf("Pos %i is %i\n", i, res);
    bit_mask >>= 1;
}
It depends if you want clarity in your code or a very fast result. I almost always choose clarity in the code unless profiling tells me otherwise. For clarity, you might do something like:
int count_bits(long long value) {
    int n = 0;
    while (value) {
        n += (value & 1);
        value >>= 1;
    }
    return n;
}
For performance you might want to call count_bits from X J's answer.
int count_bits(long long s)
{
    s = (s&0x5555555555555555L) + ((s>>1)&0x5555555555555555L);
    s = (s&0x3333333333333333L) + ((s>>2)&0x3333333333333333L);
    s = (s&0x0F0F0F0F0F0F0F0FL) + ((s>>4)&0x0F0F0F0F0F0F0F0FL);
    s = (s&0x00FF00FF00FF00FFL) + ((s>>8)&0x00FF00FF00FF00FFL);
    s = (s&0x0000FFFF0000FFFFL) + ((s>>16)&0x0000FFFF0000FFFFL);
    s = (s&0x00000000FFFFFFFFL) + ((s>>32)&0x00000000FFFFFFFFL);
    return (int)s;
}
It depends on whether you want to look through your code and say to yourself, "yeah, that makes sense", or "I'll take that guy's word for it".
I've been called out on this before in stack overflow. Some people do not agree. Some very smart people choose complexity over simplicity. I believe clean code is simple code.
If performance calls for it, use complexity. If not, don't.
Also, consider a code review. What are you going to say when someone says "How does count_bits work?"
If you are counting the ones, you can use the Hacker's Delight solution, which is fast, but a lookup table can be (though isn't always) faster, and it is much more understandable. You could pre-prepare a table, for example 256 items deep, that represents the counts for the byte values 0x00 to 0xFF:
0, //0x00
1, //0x01
1, //0x02
2, //0x03
1, //0x04
2, //0x05
2, //0x06
3, //0x07
...
The code to build that table would likely use the slow step-through-every-bit approach.
Once built, though, you can break your larger number into bytes:
count = table8[number&0xFF]; number>>=8;
count += table8[number&0xFF]; number>>=8;
count += table8[number&0xFF]; number>>=8;
count += table8[number&0xFF]; number>>=8;
...
If you have more memory you can make the table even bigger, representing wider numbers: a 65536-deep table for the numbers 0x0000 to 0xFFFF.
count = table16[number&0xFFFF]; number >>= 16;
count += table16[number&0xFFFF]; number >>= 16;
count += table16[number&0xFFFF]; number >>= 16;
count += table16[number&0xFFFF];
Tables are a general way to make things like this faster at the expense of memory consumption. The more memory you are able to consume the more you can pre-compute (at or before compile time) rather than real-time compute.
I got the following question in an interview: "Write a C function that round up a number to next power of 2."
I wrote the following answer:
#include <stdio.h>

int next_pwr_of_2(int num)
{
    int tmp;
    do
    {
        num++;
        tmp = num - 1;
    }
    while (tmp & num != 0);
    return num;
}

void main()
{
    int num = 9;
    int next_pwr;
    next_pwr = next_pwr_of_2(num);
    printf(" %d \n", next_pwr);
}
The question is: why does the program go out of its do-while loop when getting to the values 11 and 10?
Precedence my friend, precedence.
while ((tmp & num) != 0);
Will fix it. ( note the parenthesis around the expression tmp & num)
!= has higher precedence than &, so num != 0 is evaluated before tmp & num.
If you skip the parentheses, the expression that is evaluated is tmp & (num != 0).
The first time round, tmp = 9 (1001) and num != 0 is 1 (0001), so & evaluates to 1 (true) and the loop continues.
At the end of the second iteration we have tmp = 10 (1010). num != 0 is again 0001, so 1010 & 0001 evaluates to 0, and hence the loop breaks.
Here is the table for reference.
The precedence order is quite unusual, as noted here. Happens all the time :).
Of course you don't have to remember any precedence order, which is just to help the compiler in deciding what is done first if the programmer does not make it clear. You can just correctly parenthesize the expression and avoid such situations.
The loop exits because you did not put parentheses around your condition. This should teach you not to put the unnecessary != 0 in your C/C++ conditions.
You can simplify your code quite a bit, though.
First, observe that tmp equals the prior value of num, so you can change your loop to:
int tmp;
do {
    tmp = num++;
} while (tmp & num); // Don't put unnecessary "!= 0"
Second, the interviewer was probably looking to see if you are familiar with this little trick:
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;
Unlike your code that may take up to 1,000,000,000 operations to complete, the above always completes after twelve operations (a decrement, an increment, five shifts, and five ORs).
Such questions always deserve counter questions to clarify requirements, if only to demonstrate your thinking and analytical skills and even creativity - that is what the interview should be about.
For example in the absence of any specification that the "number" in question is necessarily an integer, you might propose the following:
int nextPow2( double x )
{
    return (int)pow( 2, ceil(log10(x) / log10(2))) ;
}
But if you did you might also express concern about the applicability of such a solution to an embedded system with possibly no floating-point unit.
I would answer by saying no one should write that in pure C. Especially in an embedded environment. If the chipset does not provide a feature to count the number of leading zeros in a word, then it's probably pretty old, and certainly not something you want to be using. If it does, you would want to use that feature.
As an example of a non-standard way to round an unsigned integer up to a power of two (you really need to clarify the type of the argument, as "number" is ambiguous) using gcc, you could do:
#include <limits.h>   /* for CHAR_BIT */

unsigned
round_up( unsigned x )
{
    if( x < 2 ) {
        return 1U;
    } else {
        return 1U << ( CHAR_BIT * sizeof x - __builtin_clz( x - 1 ));
    }
}
In contrast to what others have said, the bit-twiddling trick actually can be used on any number of bits portably. Just change it a bit:
unsigned int shift = 1;
for (v--; shift < 8 * sizeof v; shift <<= 1)
{
    v |= v >> shift;
}
return ++v;
I believe any compiler will optimize the loop away, so it should be the same performance-wise (plus I think it looks better).
Yet another variant.
int rndup (int num)
{
    int tmp = 1;
    while (tmp < num)
    {
        tmp *= 2;
    }
    return tmp;
}
Another variant using a while loop and a bit-wise operator:
unsigned next_pwr_of_2(unsigned num)
{
    unsigned int number = 1;
    while (number < num)
    {
        number <<= 1;
    }
    return number;
}