Assuming no string less than 4 bytes is ever passed, is there anything wrong with this optimization? And yes it is a significant speedup on the machines I've tested it on when comparing mostly dissimilar strings.
#define STRCMP(a, b) ( (*(int32_t*)a) == (*(int32_t*)b) && strcmp(a, b) == 0)
And assuming strings are no less than 4 bytes, is there a faster way to do this without resorting to assembly, etc?
Casting the address of a char array to an int * and dereferencing it is always a strict aliasing violation, in addition to possibly violating alignment restrictions.
Example
See UDP checksum calculation not working with newer version of gcc for just one example of the dangers of strict aliasing violations.
Note that C implementations themselves are free to make use of undefined behavior internally. The implementers have knowledge and complete control over the implementation, neither of which someone using someone else's compiler will in general have.
*(int32_t*)a assumes that a is 4-byte aligned. That's in general not the case.
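One conventional way around both the aliasing and the alignment problem (not suggested in the answers above, just a sketch, and the helper name is mine) is to memcpy the first four bytes into a uint32_t; memcpy may alias anything, and compilers typically turn a fixed-size copy into a single load where the target allows it:

#include <stdint.h>
#include <string.h>

/* Sketch only: assumes both strings are known to be at least 4 bytes long. */
static inline int streq_prefix4(const char *a, const char *b)
{
    uint32_t wa, wb;
    memcpy(&wa, a, sizeof wa);   /* no aliasing or alignment violation */
    memcpy(&wb, b, sizeof wb);
    return wa == wb && strcmp(a, b) == 0;
}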
is there anything wrong with this optimization?
Alignment
Yes, (int32_t*)a risks undefined behavior due to a not meeting the alignment requirements of int32_t *.
Inverted meaning
strcmp() returns 0 on match. STRCMP() returns 1 on match. Consider alternatives like STREQ().
Multiple and inconsistent a evaluations
Consider STRCMP(s++, t): s will get incremented once or twice, depending on whether the first comparison short-circuits, as the sketch below illustrates.
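A small illustration of the double evaluation (a hypothetical snippet of mine, using a side-effect-counting stand-in so it stays well defined):

#include <stdio.h>
#include <string.h>

/* Stand-in macro with the same shape as STRCMP: each argument appears twice. */
#define STREQ_DEMO(a, b) ((*(a) == *(b)) && strcmp((a), (b)) == 0)

int main(void)
{
    const char *strs[2] = { "alpha", "alpha" };
    int i = 0;

    /* strs[i++] appears twice in the expansion; the && sequence point keeps
       this well defined, but i still ends up advanced by 2, not 1, and the
       second evaluation reads a different array element. */
    if (STREQ_DEMO(strs[i++], strs[1]))
        puts("equal");
    printf("i == %d\n", i);   /* prints 2 */
    return 0;
}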
And assuming strings are no less than 4 bytes, is there a faster way to do this without resorting to assembly, etc?
Test 1 character
Try profiling the below. It might not be faster than OP's UB code, but it should be faster than a plain strcmp().
//#define STRCMP(a, b) ( (*(int32_t*)a) == (*(int32_t*)b) && strcmp(a, b) == 0)
#define STREQ(a, b) ( (*(unsigned char *)a) == (*(unsigned char *)b) && strcmp(a, b) == 0)
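For example, a rough profiling harness might look like this (my own sketch: the strings, iteration count, and timing method are arbitrary, and argument parentheses were added to the macro). Taking the test strings from argv keeps the compiler from constant-folding the comparison away:

#include <stdio.h>
#include <string.h>
#include <time.h>

#define STREQ(a, b) ( (*(unsigned char *)(a)) == (*(unsigned char *)(b)) && strcmp((a), (b)) == 0 )

int main(int argc, char **argv)
{
    char defaults[2][16] = { "apple pie", "banana split" };
    char *s = argc > 1 ? argv[1] : defaults[0];
    char *t = argc > 2 ? argv[2] : defaults[1];
    volatile long hits = 0;   /* volatile keeps the loop from being removed */
    clock_t start = clock();

    for (long i = 0; i < 10000000L; i++)
        hits += STREQ(s, t);

    printf("hits=%ld  elapsed=%.2fs\n", (long)hits,
           (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}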
Step back and look at the larger picture for performance improvements.
Code:
unsigned char array_add[8]={0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};
...
if ((*((uint32_t*)array_add)!=0)||(*((uint32_t*)array_add+1)!=0))
{
...
}
I want to check if the array is all zero. So naturally I thought of casting the address of the array, which also happens to be the address of the first member, to an unsigned 32-bit int type, so I'll only need to do this twice, since it's a 64-bit, 8-byte array. Problem is, it compiled successfully, but the program crashes every time around here.
I'm running my program on an 8-bit microcontroller, a Cortex-M0.
How wrong am I?
In theory this could work, but in practice there is something you aren't considering: aligned memory accesses.
If a uint32_t requires aligned memory access (e.g., to 4 bytes), then casting an array of unsigned char, which has a 1-byte alignment requirement, to a uint32_t* produces a pointer to an unaligned array of uint32_t.
According to the Cortex-M0 documentation:
There is no support for unaligned accesses on the Cortex-M0 processor. Any attempt to perform an unaligned memory access operation results in a HardFault exception.
In practice this is just dangerous and fragile code which invokes undefined behavior in certain circumstances, as pointed out by Olaf and better explained here.
To test multiple bytes at once, code could use memcmp().
How speedy this is depends more on the compiler, as an optimizing compiler may simply emit code that does a quick 8-byte (or two 4-byte) compare. Even the memcmp() might not be too slow on an 8-bit processor. Profiling the code helps.
Take care with micro-optimizations, as they are too often not an efficient use of a coder's time compared to more significant optimizations.
unsigned char array_add[8] = ...
const unsigned char array_zero[8]={0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};
if (memcmp(array_zero, array_add, 8) == 0) ...
Another method uses a union. Be careful not to assume whether add.array8[0] is the most or least significant byte.
union {
uint8_t array8[8];
uint64_t array64;
} add;
// Check whether all 8 bytes of add.array8[] are zero.
if (add.array64 == 0)
In general, focus on writing clear code and reserve such small optimizations to very select cases.
I am not sure, but if your array has 8 bytes, then just assign its base address to a pointer to long long, dereference that, and compare the value to 0. That should solve your problem of checking if the array is all 0.
Edit 1: After Olaf's comment, I would say replace long long with int64_t. However, why not use a simple loop to iterate over the array and check each element? 8 chars is all you need to compare.
Edit 2: Another approach could be to OR all the elements of the array together and then compare the result with 0; if all are 0, the OR will be zero (see the sketch below). I do not know whether CMP or OR will be faster; please refer to the Cortex-M0 docs for the exact CPU cycle requirements, though I would expect CMP to be slower.
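A minimal sketch of that OR approach (my own illustration; the helper name is made up):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when all n bytes are zero. Only byte-sized accesses are made,
   so there is no alignment concern on the Cortex-M0. */
static bool all_zero(const uint8_t *p, size_t n)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= p[i];
    return acc == 0;
}

It would be called as all_zero(array_add, 8) on the array from the question.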
In https://github.com/numpy/numpy/issues/6428, the root cause for the bug seems to be that at simd.inc.src:543, a compiler optimizes !(tmp == 0.) to tmp != 0..
A comment says that these are "not quite the same thing." But doesn't specify any details. NaNs are mentioned further on, but a test shows that a NaN compares to 0. the expected way.
What are the cases where == and != can both return true/false?
Or is the discrepancy in another area - e.g., returning values that have the same truth value but are different as ints (though testing shows even this doesn't seem to be the case)?
A comment says that these are "not quite the same thing." But doesn't specify any details. NaNs are mentioned further on, but a test shows that a NaN compares to 0. the expected way.
What are the cases where == and != can both return true/false?
The standard says:
The == (equal to) and != (not equal to) operators are analogous to the relational operators except for their lower precedence. [...] For any pair of operands, exactly one of the relations is true.
(C2011, 6.5.9/3; emphasis added)
Therefore, for any expressions X and Y that are jointly allowed as operands of these operators, (X) != (Y) must evaluate to the same result as !((X) == (Y)). If they are found in practice not to do so, then the compiler that yielded that result is non-conforming in that respect. If that non-conformance is unexpected, then it constitutes a bug in the compiler.
Additionally, I observe that 6.5.9/3 applies just as much to NaNs, infinities, and subnormals as to any other operands. NaNs are special with respect to these operators for a different reason: NaNs compare unequal to all operands, including themselves (supposing IEEE semantics).
From the linked post:
charris commented on Oct 9, 2015
I'm going to guess the !(tmp == 0.) is optimized to tmp != 0., which is not quite the same thing.
Comment by the OP:
The author says it's a guess but they are quite positive that !(tmp==0.) and tmp!=0. are not equivalent and express that as if it's common knowledge
How do we reconcile these two?
Clearly, they are logically equivalent. But implementation-wise, they may not be. A compiler might implement !(a == b) as the test a == b followed by a negation. Alternately, it might optimize the expression, and directly test a != b. Resulting assembly code would be different in those two cases. The same result should (must) be achieved, but the execution time could be different.
"not quite the same thing" would simply be an acknowledgement that !(a == b) and a != b are actually different combinations of characters and the compiler might do something technically different with them, that must yield the same result. And if different results are observed, then a bug might exist in the compiler.
Possible Duplicate:
Best way to detect integer overflow in C/C++
This is probably a rookie question, but how can I check whether overflow has affected the value of my numbers in C? For example, when multiplying integers and expecting an integer result, if the actual result was bigger than the maximum integer value, the stored result is altered (right?). So how can I tell if something like this occurred?
Signed integer overflow is like division by zero - it leads to undefined behaviour, so you have to check if it would occur before executing the potentially-overflowing operation. Once you've overflowed, all bets are off - your code could do anything.
The *_MAX and *_MIN macros defined in <limits.h> come in handy for this, but you need to be careful not to invoke undefined behaviour in the tests themselves. For example, to check if a * b will overflow given int a, b;, you can use:
if ((b > 0 && a <= INT_MAX / b && a >= INT_MIN / b) ||
(b == 0) ||
(b == -1 && a >= -INT_MAX) ||
(b < -1 && a >= INT_MAX / b && a <= INT_MIN / b))
{
result = a * b;
}
else
{
/* calculation would overflow */
}
(Note that one subtle pitfall this avoids is that you can't calculate INT_MIN / -1 - such a number isn't guaranteed to be representable and indeed causes a fatal trap on common platforms).
The C99 standard has this section explaining what undefined behavior is:
3.4.3
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
results, to behaving during translation or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
EXAMPLE
An example of undefined behavior is the behavior on integer overflow.
So you're pretty much out of luck, there is no portable way of detecting that in the general case, after the fact.
Your compiler/implementation might have extensions/support for it though, and there are techniques to avoid these situations.
See this question for excellent advice: Best way to detect integer overflow in C/C++.
If you mean while you're programming, you can debug the code.
If you mean at runtime, you can add conditionals so that if a value would exceed the limit, something is done about it.
C doesn't define what happens when a calculation's result would be out of range. You must avoid this by testing the operands beforehand.
Check this: http://www.fefe.de/intof.html. It shows you how to check whether the actual result would be bigger than the maximum integer value.
If the resulting number is smaller than one of the inputs.
a + b = c, if c < a => overflow.
edit: too fast - this only works for addition on unsigned integers.
You cannot know, in the general case, if overflow occurred just by staring at the result. What you can do, however, is to check whether the operation would overflow separately. E.g. if you want to check whether a*b overflows, where a and b are int's, you need to solve the inequality
a * b <= INT_MAX
That is, for positive a and b, if a > INT_MAX / b, then the multiplication would overflow (see the sketch below).
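A sketch of that pre-check in code (my own; it only covers positive a and b, while the fuller test earlier in the thread also handles zero and negative operands):

#include <limits.h>

/* Returns 1 and stores a * b in *result if it fits in an int; returns 0 if
   the multiplication would overflow. Assumes a > 0 and b > 0. */
int mul_would_fit(int a, int b, int *result)
{
    if (a > INT_MAX / b)
        return 0;              /* a * b would exceed INT_MAX */
    *result = a * b;
    return 1;
}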
As long as you do your arithmetic in unsigned integers, or else can rely on implementation-specific guarantees about how signed integer overflow behaves, there are various tricks you can use.
In the case of unsigned multiplication, the simplest is:
unsigned int lhs = something, rhs = something_else;
unsigned int product = lhs * rhs;
if (lhs != 0 && product/lhs != rhs) { overflow occurred }
It's unlikely to be fast, but it's portable. The unsigned overflow check for addition is also quite simple -- pick either one of the operands, then overflow occurred if and only if the sum is less than that.
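In code, that addition check might look like this (a sketch in the same style as the multiplication check above):

unsigned int a = something, b = something_else;
unsigned int sum = a + b;          /* unsigned arithmetic wraps on overflow */
if (sum < a) { /* overflow occurred */ }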
In C99, equality == does not seem ever to be undefined. It can produce 1 by accident if you apply it to invalid addresses (for instance &x + 1 == &y may be true by accident). It does not produce undefined behavior. Many, but not all, invalid addresses are undefined to compute/use according to the standard, so that in p == &x with p a dangling pointer, or in &x + 2 == &y, the invalid address causes the undefined behavior, not ==.
On the other hand, >= and other comparisons are undefined when applied to pointers that do not point within the same object. That includes testing q >= NULL where q is a valid pointer. This test is the subject of my question.
I work on a static analyzer for low-level embedded code. It is normal for this kind of code to do things outside what the standard allows. As an example, an array of pointers may, in this kind of code, be initialized with memset(...,0,...), although the standard does not specify that NULL and 0 must have the same representation. In order to be useful, the analyzer must accept this kind of thing and interpret them the way the programmer expects. Warning the programmer would be perceived as a false positive.
So the analyzer is already assuming that NULL and 0 have the same representation (you are supposed to check your compiler against the analyzer to make sure they agree on this kind of assumptions). I am noticing that some programs compare valid pointers against NULL with >= (this library is an example). This works as intended as long as NULL is represented as 0 and pointer comparison is compiled as an unsigned integer comparison.
I only wish the analyzer to warn about this if, perhaps because of some aggressive optimization, it may be compiled into something different from what the programmer meant on conventional platforms. Hence my question: is there any example of a program not evaluating q >= NULL as 1, on a platform where NULL is represented as 0?
NOTE: this question is not about using 0 in a pointer context to get a null pointer. The assumption about the representation of NULL is a real assumption, because there is no conversion in the memset() example.
There are definitely pointers that, when you reinterpret them as a signed integer of pointer size, will be negative.
In particular, all kernel memory on Win32; and if you use "large address aware", then even 1 GB of user space, since you get 3 GB of user space.
I don't know the details of C pointer arithmetic, but I suspect that these might compare as < 0 in some compilers.
An example of unspecified behavior in the C language is the order of evaluation of arguments to a function. It might be left to right or right to left, you just don't know. This would affect how foo(c++, c) or foo(++c, c) gets evaluated.
What other unspecified behavior is there that can surprise the unaware programmer?
A language lawyer question. Hmkay.
My personal top3:
violating the strict aliasing rule
violating the strict aliasing rule
violating the strict aliasing rule
:-)
Edit: Here is a little example that does it wrong twice:
(assume 32-bit ints and little-endian)
float funky_float_abs (float a)
{
unsigned int temp = *(unsigned int *)&a;
temp &= 0x7fffffff;
return *(float *)&temp;
}
That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.
However, accessing an object through a pointer obtained by casting from one pointer type to another is not valid C. The compiler may assume that pointers to different types don't point to the same chunk of memory. This is true for all kinds of pointers except void* and char* (signedness does not matter).
In the case above I do that twice. Once to get an int-alias for the float a, and once to convert the value back to float.
There are three valid ways to do the same.
Use a char or void pointer during the cast. These may alias anything, so they are safe.
float funky_float_abs (float a)
{
float temp_float = a;
// valid, because it's a char pointer. These are special.
unsigned char * temp = (unsigned char *)&temp_float;
temp[3] &= 0x7f;
return temp_float;
}
Use memcpy. memcpy takes void pointers, so it will force aliasing as well.
float funky_float_abs (float a)
{
int i;
float result;
memcpy (&i, &a, sizeof (int));
i &= 0x7fffffff;
memcpy (&result, &i, sizeof (int));
return result;
}
The third valid way: use unions. This is explicitly not undefined since C99:
float funky_float_abs (float a)
{
union
{
unsigned int i;
float f;
} cast_helper;
cast_helper.f = a;
cast_helper.i &= 0x7fffffff;
return cast_helper.f;
}
My personal favourite undefined behaviour is that if a non-empty source file doesn't end in a newline, behaviour is undefined.
I suspect it's true though that no compiler I will ever see has treated a source file differently according to whether or not it is newline terminated, other than to emit a warning. So it's not really something that will surprise unaware programmers, other than that they might be surprised by the warning.
So for genuine portability issues (which mostly are implementation-dependent rather than unspecified or undefined, but I think that falls into the spirit of the question):
char is not necessarily (un)signed.
int can be any size from 16 bits.
floats are not necessarily IEEE-formatted or conformant.
integer types are not necessarily two's complement, and integer arithmetic overflow causes undefined behaviour (modern hardware won't crash, but some compiler optimizations will result in behavior different from wraparound even though that's what the hardware does. For example if (x+1 < x) may be optimized as always false when x has signed type: see -fstrict-overflow option in GCC).
"/", "." and ".." in a #include have no defined meaning and can be treated differently by different compilers (this does actually vary, and if it goes wrong it will ruin your day).
Really serious ones that can be surprising even on the platform you developed on, because behaviour is only partially undefined / unspecified:
POSIX threading and the ANSI memory model. Concurrent access to memory is not as well defined as novices think. volatile doesn't do what novices think. Order of memory accesses is not as well defined as novices think. Accesses can be moved across memory barriers in certain directions. Memory cache coherency is not required.
Profiling code is not as easy as you think. If your test loop has no effect, the compiler can remove part or all of it. inline has no defined effect.
And, as I think Nils mentioned in passing:
VIOLATING THE STRICT ALIASING RULE.
My favorite is this:
// what does this do?
x = x++;
To answer some comments, it is undefined behaviour according to the standard. Seeing this, the compiler is allowed to do anything up to and including formatting your hard drive.
See for example this comment here. The point is not whether there is some reasonable expectation of a particular behaviour. Because of the way the standard defines sequence points, this line of code is actually undefined behaviour.
For example, if we had x = 1 before the line above, then what would the valid result be afterwards? Someone commented that it should be
x is incremented by 1
so we should see x == 2 afterwards. However this is not actually true, you will find some compilers that have x == 1 afterwards, or maybe even x == 3. You would have to look closely at the generated assembly to see why this might be, but the differences are due to the underlying problem. Essentially, I think this is because the compiler is allowed to evaluate the two assignments statements in any order it likes, so it could do the x++ first, or the x = first.
Dividing something by a pointer to something. Just won't compile for some reason... :-)
result = x/*y;
Another issue I encountered (which is defined, but definitely unexpected).
char is evil.
signed or unsigned depending on what the compiler feels
not mandated as 8 bits
I can't count the number of times I've corrected printf format specifiers to match their argument. Any mismatch is undefined behavior.
No, you must not pass an int (or long) to %x - an unsigned int is required
No, you must not pass an unsigned int to %d - an int is required
No, you must not pass a size_t to %u or %d - use %zu
No, you must not print a pointer with %d or %x - use %p and cast to a void *
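A few corrected calls, as a sketch with made-up values:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    unsigned int u = 0xdeadbeefu;
    int n = -42;
    size_t len = sizeof(double);
    int x = 7;

    printf("%x\n", u);            /* %x wants unsigned int */
    printf("%d\n", n);            /* %d wants int */
    printf("%zu\n", len);         /* %zu wants size_t */
    printf("%p\n", (void *)&x);   /* %p wants void *, so cast */
    return 0;
}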
I've seen a lot of relatively inexperienced programmers bitten by multi-character constants.
This:
"x"
is a string literal (which is of type char[2] and decays to char* in most contexts).
This:
'x'
is an ordinary character constant (which, for historical reasons, is of type int).
This:
'xy'
is also a perfectly legal character constant, but its value (which is still of type int) is implementation-defined. It's a nearly useless language feature that serves mostly to cause confusion.
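A tiny demonstration (my own; the second value is implementation-defined, so the result mentioned in the comment is only what one common compiler happens to produce):

#include <stdio.h>

int main(void)
{
    printf("%zu\n", sizeof 'x');   /* sizeof(int), e.g. 4: 'x' is an int, not a char */
    printf("%d\n", 'xy');          /* implementation-defined; one common result is
                                      ('x' << 8) | 'y' == 30841, but don't rely on it */
    return 0;
}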
A compiler doesn't have to tell you that you're calling a function with the wrong number of parameters/wrong parameter types if the function prototype isn't available.
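A sketch of how that can bite (a hypothetical two-file program of mine; the file names and function are made up):

/* file1.c */
double average(double a, double b) { return (a + b) / 2.0; }

/* file2.c - only a non-prototype declaration is visible here (pre-C23 semantics). */
#include <stdio.h>

double average();   /* return type declared, parameters left unspecified */

int main(void)
{
    /* Wrong number and type of arguments: with no prototype in scope the
       compiler is not required to diagnose this, and the call has
       undefined behavior. */
    printf("%f\n", average(2));
    return 0;
}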
The clang developers posted some great examples a while back, in a post every C programmer should read. Some interesting ones not mentioned before:
Signed integer overflow - no, it's not OK to wrap a signed variable past its max.
Dereferencing a NULL Pointer - yes this is undefined, and might be ignored, see part 2 of the link.
The EEs here just discovered that a >> -2 is a bit fraught.
I nodded and told them it was not natural.
Be sure to always initialize your variables before you use them! When I had just started with C, that caused me a number of headaches.