What does uintptr_t have to do with strict aliasing?

I was doing some research on strict aliasing and how to handle it and found this commit on DPDK.
To fix strict aliasing (according to the comments), they cast the void* parameters src and dst to uintptr_t and then use the cast versions.
In my understanding, this should have nothing to do with the strict aliasing rule, since the rule itself says nothing about casting to uintptr_t.
Would a cast to uintptr_t really help strict-aliasing? Or would this just fix some possible warnings from GCC?

Would a cast to uintptr_t really help strict-aliasing?
No, it would not.
Or would this just fix some possible warnings from GCC?
"Fix" in the sense of disguising the strict-aliasing violations well enough that the compiler does not diagnose them, yes, it might. And presumably it indeed did so for whoever made that change.
This is pernicious, because now, not only may the compiler do something unwanted with the code, but you cannot even prevent it from doing so by passing it the -fno-strict-aliasing option (or whatever similar option a different compiler might provide). Worse, it might work fine with the compiler used today, but break months or years later when you upgrade to a new version or when you switch to a different C implementation.
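To make that concrete, here is a minimal sketch (not the actual DPDK code, just the pattern) of why the cast changes nothing about the aliasing itself:
#include <stdint.h>
#include <string.h>

/* Hypothetical helper in the spirit of the patch being discussed. */
static void copy32(void *dst, const void *src)
{
    /* The detour through uintptr_t may stop the compiler from warning... */
    uint32_t *d = (uint32_t *)(uintptr_t)dst;
    const uint32_t *s = (const uint32_t *)(uintptr_t)src;

    /* ...but if the pointed-to storage has some other effective type
     * (say, an array of uint16_t), these accesses still violate the
     * strict aliasing rules. */
    *d = *s;
}

/* A well-defined alternative: memcpy() may access any object as bytes. */
static void copy32_safe(void *dst, const void *src)
{
    memcpy(dst, src, sizeof(uint32_t));
}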

The "strict aliasing rules" specify situations where even implementations that are not intended to be suitable for low-level programming must allow for the possibility of aliasing between seemingly-unrelated objects. Compilers which are designed to be suitable for low-level programming are allowed to, and will, extend the language by behaving meaningfully--typically processing constructs "in a documented fashion characteristic of the environment" in more circumstances than mandated by the Standard, especially in the presence of constructs that would generally be useless otherwise.
Relatively few programs that aren't intending to access storage in low-level fashion will perform integer-to-pointer conversions. Thus, implementations that treat such conversions as an indication that they should avoid making any assumptions about the pointers formed thereby will be able to usefully process a wider range of programs than those which don't, without having to give up many opportunities for genuinely-useful optimizations. While it would be better to have the Standard specify a syntax for the purpose of erasing any evidence of pointer provenance, conversions through integer types presently work for almost all compilers other than clang.

Related

Are there any examples of semantics non-preserving optimizations (except FP optimizations)?

Optimizations are generally considered to preserve semantics. However, floating-point (FP) optimizations may not preserve the semantics. Usually these FP optimizations are the result of selecting a non-strict FP model (as offered by ICC, MSVC, GCC, Clang/LLVM, KEIL, etc.).
Out of curiosity, are there any examples of other semantics non-preserving optimizations?
There are, but you have to look hard to find them.
Try replacing a standard library function. If it doesn't do what the standard library function does, you may find that your code doesn't do what you expect, because the compiler assumes standard library functions do what the documentation says they do.
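As a concrete illustration (assuming a GCC-like compiler; the rewrite shown is one that GCC commonly performs):
#include <stdio.h>

/* Suppose you link in your own puts() that does something extra, such
 * as logging. GCC routinely rewrites a printf() of a plain string
 * ending in '\n' into a call to puts(), because it assumes puts()
 * behaves as the standard says - so your replacement may be called
 * from places where you never wrote puts(). */
int main(void)
{
    printf("hello\n");   /* may be compiled as puts("hello") */
    return 0;
}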
Also, mmap() a region at address zero. The compiler may omit code that accesses it, because it assumes such code is unreachable: dereferencing a NULL pointer is undefined behavior. However, if that mmap() call succeeds, the behavior of dereferencing address zero (NULL is zero on most platforms) just became defined. gcc has a compiler option to tell it to stop doing that, and Clang eventually caved to pressure and added it because it would otherwise miscompile the kernel. https://reviews.llvm.org/D47894#change-z5AkMbcq7h1h
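A minimal sketch of the pattern being described (the gcc option in question is, as far as I know, -fno-delete-null-pointer-checks, which the linked Clang review added for compatibility):
int read_first(int *p)
{
    int value = *p;   /* the compiler infers "p was dereferenced, so
                         it cannot be NULL"... */
    if (p == 0)       /* ...and may delete this check entirely */
        return -1;
    return value;
}
/* If the zero page really has been mmap()ed, *p is a legitimate
 * access, and deleting the check changes the program's behavior.
 * Building with -fno-delete-null-pointer-checks keeps the check. */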
Back in the 90s when the aliasing rules were just starting to become things, there were more examples, as the aliasing rules changed the definition of the language. But this is well-settled now.

Dependency of MISRA-C coding rules checker to the compiler

I have started using a tool that checks compliance with MISRA-C 2012. The tool is Helix QAC. During configuration it asks you to select one compiler. My understanding is that MISRA-C (and coding rules in general) are not tied to a compiler toolchain, since one of their objectives is portability. Moreover, one rule of MISRA-C is not to use language extensions (obviously this rule may be disabled, or there may be exceptions to it). Helix documentation and support are rather vague about this (I am still trying to get more information from them) and just mention the need to know the integer type lengths or the path of the standard includes. But the rule analysis should be independent of int size, and the interface of the standard includes is standard, so the actual files should not be needed.
What are the dependencies between a MISRA-C rules checker and the compiler?
Some of the Guidelines depend on knowing what the implementation is doing - this is particularly the case for the implementation-defined aspects, including (but not limited to) integer sizes, maximum/minimum values, the method of implementing booleans, etc.
MISRA C even has a section 4.2 Understanding the compiler which, coupled with 4.3 Understanding the static analysis tool, addresses these issues.
There is one thing every MISRA-C checker needs to know, and that's what type you use as bool. This is necessary since MISRA-C:2012 still supports C90, which didn't have standard support for a boolean type. (C99 applications should use _Bool/bool, period.) It also needs to know which constants false and true correspond to, in case stdbool.h with false and true is unavailable. This could be the reason why it asks which compiler is used. Check Appendix D - Essential types for details.
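A sketch of what that configuration has to capture, for a hypothetical C90 project:
/* Hypothetical project-wide boolean for a C90 code base. The MISRA
 * checker must be told that mybool_t is the "essentially Boolean"
 * type and that FALSE/TRUE are its only valid values (see
 * Appendix D - Essential types). */
typedef unsigned char mybool_t;
#define FALSE ((mybool_t)0)
#define TRUE  ((mybool_t)1)

/* Under C99 you would instead #include <stdbool.h> and use
 * bool, false and true directly. */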
The type sizes of int etc. aren't relevant for the MISRA checker to know. Though some awareness of non-standard extensions might be nice. We aren't allowed to use non-standard extensions or implementation-defined behavior without documenting them. The usual suspects are inline assembler, interrupts, memory allocation at specific places and so on. But once we have documented them in our deviation to Dir 1.1/Rule 1.1, we might want to disable warnings about using those specific, allowed deviations. If the MISRA checker is completely unaware of a certain feature, then how can you disable the warning caused by it?
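For example, the checker typically has to at least parse vendor extensions like this hypothetical interrupt handler, even when its use is covered by a documented Dir 1.1/Rule 1.1 deviation:
/* __interrupt is a non-standard keyword offered by many embedded
 * compilers; it is used here purely for illustration. */
void __interrupt timer_isr(void)
{
    /* acknowledge the interrupt, update a counter, ... */
}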

Are there any categories to characterize warnings?

My empirical assumption about what compilers warn about in C code was that they warn about behavior that is implementation-defined, or about constructs they detect as causing undefined behavior but support nevertheless (if they detected a construct they didn't support, they would emit an error rather than just a warning).
After I had a discussion about this, the final proof that I was wrong was this:
#include <whatever_this_needs.h>

int main()
{
    int i = 50;
    return 0;
}
The compiler, obviously, warned that i was declared but never used.
I hadn't been thinking about this kind of warning anymore, since I saw it more as a kind of tool... a piece of information.
While I would strictly distinguish this kind of warning from something that warns me about causing non-portability or dropping significance without an explicit cast, it is still something that can cause confusion through compiler optimizations.
So I'm now interested: are there any categorizations of warning types?
If no standards about this exist, what are the categories GCC groups its warnings into?
What I noticed so far (empirical again):
Warnings about:
implementation-defined / undefined behavior
unnecessary code (targeted for optimization)
breaking of optional standards (e.g. MISRA or POSIX)
But the 2nd point especially bothers me, since there are constructs (e.g. strict aliasing violations) where optimization can even result in unpredictable runtime behavior, while in most cases it just cuts away code that isn't used anyway.
So are my points correct? And what (additional) official categories are there that you can 'typecast' warnings into, what are their characteristics, and what is their impact?
Warnings are beyond the scope of the C standard, so there are no requirements or specification for how they should behave. The C standard is only concerned about diagnostics, as in diagnostic messages from the compiler to the programmer. The standard doesn't split those up in errors and warnings.
However, all compilers out there use errors to indicate direct violations of the C standard: syntax errors and similar. They use warnings to point out things beyond what is required by the C standard.
In almost every case, a warning simply means "oh by the way, you have a bug here".
Regarding GCC, it just categorizes warnings into:
Things that are direct violations against the C standard but valid as non-standard GNU extensions (-pedantic)
"A handful of warnings" (-Wall). Enable all warnings, except some...
"A few warnings more" (-Wextra)
Plus numerous individual warnings with no category.
There's no obvious logic behind the system.
Note that GCC, being filled to the brim with non-standard extensions, has decided to give only warnings instead of errors for some C standard violations. So always compile with -pedantic-errors if you care about standard compliance.
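A small sketch of how those groups play out in practice (flags as documented for GCC; the exact set each option enables varies by version):
/* warn.c - assumed build: gcc -std=c99 -pedantic-errors -Wall -Wextra -c warn.c */

int f(int x, int y)      /* y unused: -Wunused-parameter, enabled by -Wextra */
{
    int unused = 0;      /* -Wunused-variable, enabled by -Wall */
    if (x = 1)           /* -Wparentheses, enabled by -Wall (likely a bug) */
        return 1;
    return 0;
}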
Regarding implementation-defined behavior: C contains a lot of it, and it would get very tedious if you got a warning for every such case ("warning: two's complement int used"...). There's no relation between implementation-defined behavior and compiler warnings.
Regarding any case of undefined behavior, the compiler is often unable to detect it, since the definition of UB is runtime behavior beyond the scope of the standard. Therefore the responsibility to know about and avoid UB lies on the programmer.

Explicit arithmetic: does the compiler take care of it?

Sometimes I find it's easier to understand code (for yourself in the future or others) by being explicit about arithmetic. E.g. writing 1+2+3 if you're adding 3 values from elsewhere, rather than a single magic number +6.
Is this inefficient or would a compiler optimize/reduce it appropriately? I'm thinking about C but in general is this something to worry about?
Yes. All competent C compilers will perform constant folding optimizations where possible, replacing constant mathematical expressions with their results. In most compilers, this type of optimization is applied even when optimizations are otherwise disabled (e.g., -O0). Here's an example:
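A sketch, assuming a GCC-like compiler (the folding can be confirmed by inspecting the generated assembly, e.g. gcc -O0 -S):
/* Even at -O0, typical compilers fold the constant expression at
 * compile time, so this generates the same code as "return 6;". */
int total_items(void)
{
    return 1 + 2 + 3;   /* readable form of the magic number 6 */
}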
This behavior is not restricted to C; most other compiled languages will perform this type of optimization as well. Interpreted languages typically do not, as the benefits are less dramatic there, and some of them have semantics which may make constant folding an unsafe optimization (e.g., allowing basic operations to be overridden on built-in types).

Is C99 backward compatible with C89?

I'm used to old-style C and have just recently started to explore C99 features. I have just one question: will my program compile successfully if I use C99 features in my program, pass the c99 flag to gcc, and link it with libraries built before C99?
So, should I stick to old C89 or evolve?
I believe that they are compatible in that respect. That is, as long as the stuff you are compiling against doesn't step on any of the new goodies. For instance, if the old code contains enum bool { false, true }; then you are in trouble. As a similar dinosaur, I am slowly embracing the wonderful new world of C99. After all, it has only been out there lurking for about 10 years now ;)
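A minimal sketch of that kind of clash:
/* legacy.h - fine as C89 */
enum bool { false, true };

/* In a C99 translation unit that has already included <stdbool.h>,
 * the macros bool, false and true expand the line above to roughly
 *     enum _Bool { 0, 1 };
 * which no longer compiles. */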
You should evolve. Thanks for listening :-)
Actually, I'll expand on that.
You're right that C99 has been around for quite a while. You should (in my opinion) be using that standard for anything other than legacy code (where you just fix bugs rather than add new features). It's probably not worth it for legacy code but you should (as with all business decisions) do your own cost/benefit analysis.
I'm already ensuring my new code is compatible with C1x - while I'm not using any of the new features yet, I try to make sure it won't break.
As to what code to look out for, the authors of the standards take backward compatibility very seriously. Their job was not ever to design a new language, it was to codify existing practices.
The phase they're in at the moment allows them some more latitude in updating the language but they still follow the Hippocratic oath in terms of their output: "first of all, do no harm".
Generally, if your code is broken with a new standard, the compiler is forced to tell you. So simply compiling your code base will be an excellent start. However, if you read the C99 rationale document, you'll see the phrase "quiet change" appear - this is what you need to watch out for.
These are behavioral changes in the compiler that you don't need to be informed about, and they may be the source of much angst and gnashing of teeth if your application starts acting strange. Don't worry about the "quiet change in C89" bits - if they were a problem, you would have already been bitten by them.
That document, by the way, is an excellent read to understand why the actual standard says what it says.
Some C89 features are not valid C99
Arguably, those features exist only for historical reasons, and should not be used in modern C89 code, but they do exist.
The C99 N1256 standard draft foreword paragraph 5 compares C99 to older revisions, and is a good place to start searching for those incompatibilities, even though it has by far more extensions than restrictions.
Implicit int return and variable types
Mentioned by Lutz in a comment, e.g. the following are valid C89:
static i;
f() { return 1; }
but not C99, in which you have to write:
static int i;
int f() { return 1; }
This also precludes calling functions without prototypes in C99: Are prototypes required for all functions in C89, C90 or C99?
n1256 says:
remove implicit int
Return without expression for non void function
Valid C89, invalid C99:
int f() { return; }
I think in C89 it returns an implementation defined value. n1256 says:
return without expression not permitted in function that returns a value
Integer division with negative operand
C89: rounds in an implementation-defined direction
C99: rounds toward 0
So if your compiler rounded toward -inf and you relied on that implementation-defined behavior, your compiler is now forced to break your code on C99 (see the sketch below, after the n1256 note).
https://stackoverflow.com/a/3604984/895245
n1256 says:
reliable integer division
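A sketch of the difference (assuming a hypothetical C89 compiler that rounded toward negative infinity):
#include <stdio.h>

int main(void)
{
    /* C99: division truncates toward zero, so -7 / 2 == -3 and
     * -7 % 2 == -1. Under C89 the rounding direction was
     * implementation-defined, so a compiler rounding toward negative
     * infinity could give -7 / 2 == -4 and -7 % 2 == 1. */
    printf("%d %d\n", -7 / 2, -7 % 2);   /* C99 prints: -3 -1 */
    return 0;
}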
Windows compatibility
One major practical concern is being able to compile on Windows, since Microsoft does not intend to implement C99 fully any time soon.
This is for example why libgit2 limits allowed C99 features.
Respectfully: Try it and find out. :-)
Though, keep in mind that even if you need to fix a few minor compilation differences, moving up is probably worth it.
If you don't violate any of the explicit C99 features, C90 code will work fine when compiled with the c99 flag and linked against pre-C99 libraries.
But there are some DOS-based C89 libraries that will certainly not work.
C99 is much more flexible, so feel free to migrate :-)
The calling conventions between C libraries haven't changed in ages, and in fact, I'm not sure they ever have.
Operating systems at this point rely heavily on the C calling conventions since the C APIs tend to be the glue between the pieces of the OS.
So, basically the answer is "Yes, the binaries will be backwards compatible. No, naturally, code using C99 features can't later be compiled with a non-C99 compiler."
It's intended to be backwards compatible. It formalizes extensions that many vendors have already implemented. It's possible, maybe even probable, that a well written program won't have any issues when compiling with C99.
In my experience, recompiling some modules and not others to save time... wastes a lot of time. Usually there is some easily overlooked detail that needs the new compiler to make it all compatible.
There are a few parts of the C89 Standard which are ambiguously written. Depending upon how one interprets the rule about the types of pointers and the objects they access, the Standard may be viewed as describing one of two very different languages: one which is semantically much more powerful and consequently usable in a wider range of fields, and one which allows more opportunities for compiler-based optimization. The C99 Standard "clarified" the rule to make clear that it makes no effort to mandate compatibility with the former language, even though that language was overwhelmingly favored in many fields; it also treats as undefined some things that were defined in C89 only because the C89 rules weren't written precisely enough to forbid them (e.g. the use of memcpy for type punning in cases where the destination has heap duration).
C99 may thus be compatible with the language that its authors thought was described by C89, but is not compatible with the language that was processed by most C89 compilers throughout the 1990s.
