Should we disable compiler optimization until the program is bug-free?

Sometimes compiler optimization hides errors; for example:
double val = sin(1.5);
If optimization is enabled, this builds even when the math library is not linked, because the compiler evaluates sin(1.5) at compile time and substitutes the constant, so no call into the library remains.
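A minimal sketch of this effect (assuming GCC on a GNU/Linux system, where sin lives in libm):

/* fold.c -- constant folding hides a missing library.
 *
 * gcc -O2 fold.c    # typically links fine: sin(1.5) is folded to a constant
 * gcc -O0 fold.c    # may fail with "undefined reference to `sin'"
 *                   # unless you add -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double val = sin(1.5);   /* constant argument: the compiler can evaluate
                                this at compile time, so no call into libm
                                needs to remain in the binary */
    printf("%f\n", val);
    return 0;
}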
Is it a good practice to disable compiler optimization until the program is bug free?

Since nobody wrote it up as an answer, I'll take a shot.
Is it a good practice to disable compiler optimization until the program is bug free?
I wouldn't recommend it. Instead, I would regularly test my code in release mode as well (with optimizations enabled). I have been bitten by such bugs personally, and I have seen many cases where the code worked beautifully in debug mode but crashed or produced weird results in release mode. (Some of the latter bugs were stack-corruption related.)
The sooner you realize that you have such a bug, the better. You will probably have an easier time finding it while your memories of the code are still fresh.
Another thing I have seen is bugs caused by side effects in code snippets that only run in debug mode. These are clearly the developer's mistake, but the sooner you notice them, the more likely you are to have an easy time fixing them.
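A classic instance of this (my example, not the answerer's) is a side effect buried in assert(), which release builds compile out via NDEBUG:

/* dbgonly.c -- side effect that only happens in debug builds.
 *
 * gcc dbgonly.c            # assert runs, the file gets opened
 * gcc -DNDEBUG dbgonly.c   # assert (and the fopen!) is compiled out
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = NULL;

    /* The fopen() lives inside the assert expression, so defining NDEBUG
       removes the call entirely and f stays NULL. */
    assert((f = fopen("app.log", "a")) != NULL);

    if (f == NULL) {
        fprintf(stderr, "f is NULL: the fopen was compiled out\n");
        return EXIT_FAILURE;
    }

    fputs("hello\n", f);
    fclose(f);
    return EXIT_SUCCESS;
}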
Developing in release mode unless I need the debugger seems a little over the top to me. In debug mode you may get many useful checks in your third party libraries which in turn reduces your time spent on debugging.
In short: develop in debug mode but test regularly in release mode.

I always use the following argument: all code that is deployed is optimized. So there is little point in hampering the development process by unnecessarily using unoptimized code. In particular, it does not make sense to do any performance evaluation with unoptimized code. The only reason for switching off the optimizer is to be able to follow the program in a debugger, nothing else. If switching off optimization breaks the build, that's a nuisance, nothing more.
Much worse is the opposite effect where a program suddenly fails due to optimization. That is the effect you need to safeguard against, because that is the effect your users will get angry about. And due to the combination of clever optimizers and undefined behaviour in the language definition, this effect can happen quite easily.
So I try to do all my testing at least with -O2, and switch to -O0 only when I need to use the debugger.
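As a concrete sketch of that failure mode (my example, not part of the answer): signed overflow is undefined behaviour, and at -O2 GCC may assume it cannot happen.

/* ub.c -- code that "works" at -O0 and changes behaviour at -O2.
 *
 * gcc -O0 ub.c && ./a.out   # typically prints "wrapped"
 * gcc -O2 ub.c && ./a.out   # may print "no wrap": the optimizer assumes
 *                           # x + 1 > x always holds for signed x
 */
#include <limits.h>
#include <stdio.h>

static int wraps(int x)
{
    return x + 1 > x;   /* undefined when x == INT_MAX */
}

int main(void)
{
    if (wraps(INT_MAX))
        puts("no wrap (the optimizer assumed overflow is impossible)");
    else
        puts("wrapped");
    return 0;
}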

Related

How do you debug the bug that only appears when the load is huge?

We are currently developing cluster-manager software in C. If several nodes connect to the manager, it works perfectly, but if we use tools to simulate 1000 nodes connecting to the manager, it sometimes behaves in unexpected ways.
How can one debug this kind of bug, which only appears when the load (connections/nodes) is large?
If I use gdb to debug step by step, the app never malfunctions.
How do you debug this kind of bug?
In general, you want to use at least these techniques:
Make sure the code compiles and links without warnings. -Wall is a good start, but -Wextra is better (see the sketch after this list).
Make sure the application has designed-in logging and tracing, which can be turned on or off, and which has sufficient details to debug these kinds of issues, and low overhead.
Make sure the code has good unit-test coverage.
Make sure the tests are sanitizer-clean.
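On the first point, here is a small sketch (my example, not the answerer's) of what -Wextra adds over -Wall for C code:

/* warn.c -- silent with -Wall alone, flagged by -Wall -Wextra.
 *
 * gcc -Wall warn.c           # no warning
 * gcc -Wall -Wextra warn.c   # warning: comparison of integer expressions
 *                            # of different signedness (-Wsign-compare)
 */
#include <stdio.h>

int main(void)
{
    unsigned int len = 10;
    for (int i = -1; i < len; i++)   /* i is converted to unsigned, so -1
                                        becomes UINT_MAX and the loop body
                                        never runs */
        printf("%d\n", i);
    return 0;
}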
there's also no warning in valgrind check.
It's not clear whether you've simply run the target application under Valgrind, or whether you also have the unit tests and the tests are Valgrind-clean. It's also not clear whether you've observed the application misbehaving under Valgrind or not.
Valgrind used to be the best tool available for heap and uninitialized-memory problems, but in 2017 this is no longer the case.
Compiler-based Address, Thread and Memory sanitizers catch a significantly wider class of errors (e.g. global and stack overflows, and data races), and you should run your unit tests under all of them.
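As a concrete sketch (my example, not the answerer's): Valgrind's default memcheck tool will not report the data race below, but ThreadSanitizer flags it at runtime.

/* race.c -- a textbook data race that ThreadSanitizer catches.
 *
 * gcc -g -fsanitize=thread race.c -lpthread && ./a.out
 */
#include <pthread.h>
#include <stdio.h>

static long counter;            /* shared, unsynchronized */

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++;              /* racy read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 200000)\n", counter);
    return 0;
}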
When all of the above still fails to find the problem, you may be able to run the real application instrumented with sanitizers.
Lastly, there are tools like GDB tracing and systemtap -- they are harder to learn, but give you significant power. Overview here.
Sadly the debugger is less useful for debugging concurrency/load issues.
Keep adding logs/printfs, trigger the issue with load testing, then try to narrow it down with more logs/printfs. Repeat.
The faster it is to trigger the bug, the faster this will converge. Also prefer the classic "bisection" / "binary search" technique when adding logs: try to narrow down the area you're looking at by at least half each time.

Is using an outdated C compiler a security risk?

We have some build systems in production which no one cares about, and these machines run ancient versions of GCC like GCC 3 or GCC 2.
And I can't persuade management to upgrade to something more recent: they say, "if it ain't broke, don't fix it".
Since we maintain a very old code base (written in the 80s), this C89 code compiles just fine on these compilers.
But I'm not sure it is a good idea to use this old stuff.
My question is:
Can using an old C compiler compromise the security of the compiled program?
UPDATE:
The same code is built by Visual Studio 2008 for Windows targets, and MSVC doesn't support C99 or C11 yet (I don't know if newer MSVC does), and I can build it on my Linux box using the latest GCC. So if we just dropped in a newer GCC, it would probably build just as well as before.
Actually I would argue the opposite.
There are a number of cases where behaviour is undefined by the C standard but where it is obvious what would happen with a "dumb compiler" on a given platform: cases like allowing a signed integer to overflow, or accessing the same memory through variables of two different types.
Recent versions of gcc (and clang) have started treating these cases as optimisation opportunities, not caring whether they change how the binary behaves in the "undefined behaviour" condition. This is very bad if your codebase was written by people who treated C like a "portable assembler". As time has gone on, the optimisers have started looking at larger and larger chunks of code when doing these optimisations, increasing the chance that the binary will end up doing something other than what a binary built by a "dumb compiler" would do.
There are compiler switches to restore "traditional" behaviour (-fwrapv and -fno-strict-aliasing for the two cases I mentioned above), but first you have to know about them.
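A hedged sketch of the aliasing case (my example; the exact output depends on compiler version and assumes a little-endian machine with a 64-bit long):

/* alias.c -- strict-aliasing violation that "dumb compilers" tolerated.
 *
 * gcc -O0 alias.c                       # typically prints 2
 * gcc -O2 alias.c                       # may print 1: the compiler assumes
 *                                       # a long* and an int* never alias
 * gcc -O2 -fno-strict-aliasing alias.c  # restores the traditional result
 */
#include <stdio.h>

static long pun(long *l, int *i)
{
    *l = 1;
    *i = 2;      /* if i aliases l, a "dumb compiler" sees the new value... */
    return *l;   /* ...but strict aliasing lets the compiler return 1 here */
}

int main(void)
{
    long x = 0;
    /* Writing through an int* to a long object is the undefined behaviour. */
    printf("%ld\n", pun(&x, (int *)&x));
    return 0;
}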
While in principle a compiler bug could turn compliant code into a security hole, I would consider the risk of this to be negligible in the grand scheme of things.
There are risks in both courses of action.
Older compilers have the advantage of maturity, and whatever was broken in them has probably (but there's no guarantee) been worked around successfully.
In this case, a new compiler is a potential source of new bugs.
On the other hand, newer compilers come with additional tooling:
GCC and Clang both now feature sanitizers which can instrument the runtime to detect undefined behaviors of various sorts (Chandler Carruth, of the Google Compiler team, claimed last year that he expects them to have reached full coverage)
Clang, at least, features hardening; for example, Control Flow Integrity is about detecting hijacks of control flow, and there are also hardening features to protect against stack-smashing attacks (by separating the control-flow part of the stack from the data part). Hardening features are generally low overhead (< 1% CPU overhead).
Clang/LLVM is also working on libFuzzer, a tool to create instrumented fuzzing unit-tests that explore the input space of the function under test smartly (by tweaking the input to take not-as-yet explored execution paths)
Instrumenting your binary with the sanitizers (Address Sanitizer, Memory Sanitizer or Undefined Behavior Sanitizer) and then fuzzing it (using American Fuzzy Lop, for example) has uncovered vulnerabilities in a number of high-profile software packages; see for example this LWN.net article.
Those new tools, and all future tools, are inaccessible to you unless you upgrade your compiler.
By staying on an underpowered compiler, you are putting your head in the sand and crossing your fingers that no vulnerability is found. If your product is a high-value target, I urge you to reconsider.
Note: even if you do NOT upgrade the production compiler, you might want to use a new compiler to check for vulnerability anyway; do be aware that since those are different compilers, the guarantees are lessened though.
Your compiled code contains bugs that could be exploited. The bugs come from three sources: bugs in your source code, bugs in the compiler and libraries, and undefined behaviour in your source code that the compiler turns into a bug. (Undefined behaviour is a bug, but not a bug in the compiled code yet. As an example, i = i++; in C or C++ is a bug, but in your compiled code it may increase i by 1 and be OK, or set i to some junk and be a bug.)
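To make that example concrete (a sketch; any output is "correct", because the behaviour is undefined):

/* seq.c -- unsequenced modification of i: undefined behaviour.
 *
 * gcc -Wall seq.c   # warning: operation on 'i' may be undefined
 */
#include <stdio.h>

int main(void)
{
    int i = 1;
    i = i++;             /* i is modified twice without sequencing */
    printf("%d\n", i);   /* may print 1, 2, or anything else */
    return 0;
}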
The rate of bugs in your compiled code is presumably low due to testing and to fixing bugs due to customer bug reports. So there may have been a large number of bugs initially, but that has gone down.
If you upgrade to a newer compiler, you may lose bugs that were introduced by compiler bugs. But those would all be bugs that, to your knowledge, nobody found and nobody exploited. The new compiler may have bugs of its own, and importantly, newer compilers have a stronger tendency to turn undefined behaviour into bugs in the compiled code.
So you will have a whole lot of new bugs in your compiled code; all bugs that hackers could find and exploit. And unless you do a whole lot of testing, and leave your code with customers to find bugs for a long time, it will be less secure.
If it ain't broke, don't fix it
Your boss sounds right in saying this. However, the more important factor is the safeguarding of inputs and outputs and against buffer overflows. Lack of those safeguards is invariably the weakest link in the chain from that standpoint, regardless of the compiler used.
However, if the code base is ancient and work was put in to mitigate the weaknesses of the K&R C used, such as the lack of type safety, the insecure gets, etc., weigh up the question "Would upgrading the compiler to the more modern C99/C11 standards break everything?"
Provided that there's a clear path to migrate to the newer C standards (which could induce side effects), it might be best to attempt a fork of the old codebase, assess it, put in extra type checks and sanity checks, and determine whether upgrading to the newer compiler has any effect on the input/output datasets (see the sketch below).
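A sketch (my example, not the answerer's) of the kind of pre-standard idiom such an assessment turns up:

/* kandr.c -- accepted quietly by C89 compilers, flagged by modern ones.
 *
 * gcc -std=gnu89 kandr.c       # compiles silently
 * gcc -std=c11 -Wall kandr.c   # warns (or errors, in recent GCC) about
 *                              # the implicit int return type
 */
#include <stdio.h>

/* K&R-style definition: implicit int return, parameter types declared
   separately, and no prototype to check call sites against. */
add(a, b)
    int a, b;
{
    return a + b;
}

int main(void)
{
    printf("%d\n", add(2, 3));
    return 0;
}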
Then you can show it to your boss, "Here's the updated code base, refactored, more in line with industry accepted C99/C11 standards...".
That's the gamble that would have to be weighed up very carefully; resistance to change may well show itself in that environment, and people may refuse to touch the newer stuff.
EDIT
Just sat back for a few minutes and realized this much: K&R-era code could be running on a 16-bit platform, and chances are that upgrading to a more modern compiler could actually break the code base. I am thinking in terms of architecture: 32-bit code would be generated, and this could have funny side effects on the structures used for input/output datasets. That is another huge factor to weigh up carefully.
Also, since the OP has mentioned using Visual Studio 2008 to build the codebase, using gcc could mean bringing either MinGW or Cygwin into the environment, which could have an impact on that environment. On the other hand, if the target is Linux, then it would be worth a shot, though you may have to pass additional switches to the compiler to minimize noise on the old K&R code base. The other important thing is to carry out a lot of testing to ensure no functionality is broken; it may turn out to be a painful exercise.
There is a security risk where a malicious developer can sneak a back-door through a compiler bug. Depending on the quantity of known bugs in the compiler in use, the backdoor may look more or less inconspicuous (in any case, the point is that the code is correct, even if convoluted, at the source level. Source code reviews and tests using a non-buggy compiler will not find the backdoor, because the backdoor does not exist in these conditions). For extra deniability points, the malicious developer may also look for previously-unknown compiler bugs on their own. Again, the quality of the camouflage will depend on the choice of compiler bugs found.
This attack is illustrated on the program sudo in this article. bcrypt wrote a great follow-up for JavaScript minifiers.
Apart from this concern, the evolution of C compilers has been to exploit undefined behaviour more and more aggressively, so old C code that was written in good faith would actually be more secure compiled with a C compiler from its own era, or compiled at -O0 (but some new program-breaking UB-exploiting optimizations are introduced in new versions of compilers even at -O0).
Can using an old C compiler compromise the security of the compiled program?
Of course it can, if the old compiler contains known bugs that you know would affect your program.
The question is, does it? To know for sure, you would have to read the whole change log from your version to present date and check every single bug fixed over the years.
If you find no evidence of compiler bugs that would affect your program, updating GCC just for the sake of it seems a bit paranoid. You would have to keep in mind that newer versions might contain new bugs, that are not yet discovered. Lots of changes were made recently with GCC 5 and C11 support.
That being said, code written in the 80s is most likely already filled to the brim with security holes and reliance on poorly defined behavior, no matter the compiler. We're talking about pre-standard C here.
Older compilers may not have protection against known hacking attacks. Stack smashing protection, for example, was not introduced until GCC 4.1. So yeah, code compiled with older compilers may be vulnerable in ways that newer compilers protect against.
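A sketch of the class of bug that stack-smashing protection catches (my example; flag names as in modern GCC):

/* smash.c -- classic stack buffer overflow.
 *
 * gcc -fno-stack-protector smash.c       # old-style: silent corruption
 * gcc -fstack-protector-strong smash.c   # then run:
 *     ./a.out AAAAAAAAAAAAAAAAAAAAAAAA
 *     # aborts with "*** stack smashing detected ***"
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buf[8] = "";
    if (argc > 1)
        strcpy(buf, argv[1]);   /* no bounds check: long input overflows buf */
    printf("%s\n", buf);
    return 0;
}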
Another aspect to worry about is the development of new code.
Older compilers may have different behavior for some language features than what is standardized and expected by the programmer. This mismatch can slow development and introduce subtle bugs that can be exploited.
Older compilers offer fewer features (including language features!) and don't optimize as well. Programmers will hack their way around these deficiencies — e.g. by reimplementing missing features, or writing clever code that is obscure but runs faster — creating new opportunities for the creation of subtle bugs.
Nope
The reason is simple: an old compiler may have old bugs and exploits, but a new compiler will have new bugs and exploits.
You're not "fixing" any bugs by upgrading to a new compiler. You're trading old bugs and exploits for new bugs and exploits.
Well, there is a higher probability that any bugs in the old compiler are well known and documented, as opposed to those in a new compiler, so actions can be taken to avoid those bugs by coding around them. In a way, then, that alone is not enough of an argument for upgrading. We have the same discussions where I work: we use GCC 4.6.1 on a code base for embedded software, and there is great reluctance (among management) to upgrade to the latest compiler for fear of new, undocumented bugs.
Your question falls into two parts:
Explicit: “Is there a greater risk in using the older compiler” (more or less as in your title)
Implicit: “How can I persuade management to upgrade”
Perhaps you can answer both by finding an exploitable flaw in your existing code base and showing that a newer compiler would have detected it. Of course your management may say "you found that with the old compiler", but you can point out that it cost considerable effort. Or you run it through the new compiler to find the vulnerability, then exploit it, if you are able/allowed to compile the code with the new compiler. You may want help from a friendly hacker, but that depends on trusting them and being able/allowed to show them the code (and use the new compiler).
But if your system is not exposed to hackers, you should perhaps be more interested in whether a compiler upgrade would increase your effectiveness: MSVS 2013 Code Analysis quite often finds potential bugs much sooner than MSVS 2010, and it more or less supports C99/C11 (I'm not sure if it does officially, but declarations can follow statements and you can declare variables in for-loops).

I turned on compiler optimization and my multithreaded C program imploded aggressively, any articles I can read on this?

I'm using MinGW, which is gcc for Windows. My program involves multiple windows, two different main threads, and several worker threads in a thread pool for overlapped network I/O.
It works perfectly fine without compiler optimization.
A) Is compiler optimization even necessary? My program's already very fast. Is it at all likely that it will provide a significant improvement?
B) Are there any articles on how to properly build a multithreaded program so compiler optimization can do its job?
“Imploded aggressively” is a bit weird (is your program a controller for a fission bomb?), but I understand that your program behaved as desired without compiler optimizations and misbehaved mysteriously with compiler optimizations.
The technical term for this is that your program is buggy.
Multithreaded programming is intrinsically hard. Multithreaded programming when the threads share memory is very hard; it's the masochistic way of concurrent programming (message passing is a lot easier to get right). You don't just need to read an article or two; you need to read several books and get a few years' programming experience.
You were unlucky that your program seemed to work without optimizations. It probably wouldn't work on a different machine where the timings are a bit different, or with a different compiler, or on a different operating system, either. So you ended up wasting your time thinking your program worked. But it doesn't. A compiler transforms correct source code into correct executables, no matter what optimization level you choose.¹
¹ Barring compiler bugs, sure. But the odds are very strongly stacked against you.
99.9% of all failures that show up in one optimization mode and not another are due to serious bugs. Multithreading races etc. are very sensitive to code performance. An instruction reorder or loop shortcut can turn a test pass into a debugging nightmare.
I'm assuming that the server runs up OK and detonates under load in apparently different places, making conventional debugging useless?
You are going to have to rely on logging and changing the test conditions to narrow down the point of ignition. My guess is this is going to be a Heisenbug that mutates with changes to the code, optimization, options, load profile, buffer sizes etc.
Not fixing the problem is not a good plan, since it will just show up in another form on next year's boxes with more cores etc. Even with optimization off, it's still there, lurking, waiting for the opportunity to strike.
I hope I'm providing some comfort.
Seriously: log everything you can with a good logger, one that queues up the logs so as to keep disk latency out of the main app. Change things around to try to make the bug mutate and perhaps show up in the non-optimized build too. Write down (type in) absolutely everything that you do and what happens after any change, good or bad. Making the bug worse is actually better than making its symptoms go away (without knowing exactly why). Try the server on various hardware configs, if you can.
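A minimal sketch of such a queued logger (my illustration: a fixed-size queue that drops messages when full, with all disk I/O on a dedicated thread):

/* qlog.c -- app threads append to a memory queue; a writer thread
 * does the slow I/O so logging calls stay cheap on the hot path.
 *
 * gcc -Wall qlog.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define QCAP 1024
#define QMSG 128

static char q[QCAP][QMSG];
static int  qhead, qtail, qdone;   /* all protected by qlock */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

void qlog(const char *msg)         /* cheap: no disk I/O on this path */
{
    pthread_mutex_lock(&qlock);
    int next = (qtail + 1) % QCAP;
    if (next != qhead) {           /* queue full? then drop the message */
        strncpy(q[qtail], msg, QMSG - 1);
        q[qtail][QMSG - 1] = '\0';
        qtail = next;
        pthread_cond_signal(&qcond);
    }
    pthread_mutex_unlock(&qlock);
}

static void *writer(void *arg)     /* slow path: owns the disk latency */
{
    FILE *f = arg;
    pthread_mutex_lock(&qlock);
    while (!qdone || qhead != qtail) {
        while (!qdone && qhead == qtail)
            pthread_cond_wait(&qcond, &qlock);
        while (qhead != qtail) {
            char msg[QMSG];
            memcpy(msg, q[qhead], QMSG);
            qhead = (qhead + 1) % QCAP;
            pthread_mutex_unlock(&qlock);   /* write without holding the lock */
            fputs(msg, f);
            pthread_mutex_lock(&qlock);
        }
    }
    pthread_mutex_unlock(&qlock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, stderr);
    for (int i = 0; i < 5; i++)
        qlog("worker: reached checkpoint\n");
    pthread_mutex_lock(&qlock);
    qdone = 1;                     /* tell the writer to drain and exit */
    pthread_cond_broadcast(&qcond);
    pthread_mutex_unlock(&qlock);
    pthread_join(t, NULL);
    return 0;
}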
Eventually, you will find the bug!
You have one thing going for you - it seems that you can reliably reproduce the problem. That, in itself, is a massive plus.
Forgot to ask: apart from the nuclear-explosive metaphor, what is the main symptom? Is it AV'ing/segfaulting all over the place, or is it deadlocked or livelocked?
To answer part "A" of your question, the unoptimized version of your code still has the concurrency bugs in it, but the timing of how the threads run is such that the bugs have not yet been exposed with your test workloads. The current version of the unoptimized program will eventually fail in use, so you will need to fix the concurrency bugs before using the program for real work.

What can you gain from looking at the binary as opposed to the source in C?

My friend said he thinks I may have made a mistake in my programme and wanted to see if I really did. He asked me to send him the binary as opposed to the source. As I am new to this, I am paranoid that he is doing something to it. What can you do with the binary that would mean you wouldn't want the source?
Thanks.
Black-box testing. Having the source may skew your view on how the program may be behaving.
Not much, at least not much by staring at it. But you can run it with a debugger attached, so you can set breakpoints, inspect memory areas, investigate crashes ...
However, the source code remains the primary tool for debugging. The binary by itself is a bit useless for serious debugging (though not for testing; you can test software thoroughly without having access to its source).
I guess if he wants to recompile your code on his machine, he may want to be able to check that the binary he gets is the same as the one you get, to rule out compile-option or library differences.
Now when I debug, I frequently want to see the assembly - maybe this is what he meant?
He can run it, test it, report any bugs he finds. Not much else, but that itself may be useful; people are notoriously bad at testing their own code, because they tend to believe that it is robust, and don't want to break it. An independent tester will see breaking it as a challenge. Their performance is based on the number of bugs they find; whereas your performance is based on how difficult you make that for them.
Perhaps he wanted to debug. Also, depending on how the compiler is invoked (for instance with -g for gcc), the binary might still contain source code information.

C (or any) compiler's deterministic performance

Whilst working on a recent project, I was visited by a customer QA representative, who asked me a question that I hadn't really considered before:
How do you know that the compiler you are using generates machine code that matches the C code's functionality exactly, and that the compiler is fully deterministic?
To this question I had absolutely no reply, as I have always taken the compiler for granted. It takes in code and spews out machine code. How can I go about testing that the compiler isn't actually adding functionality that I haven't asked for? Or, even more dangerously, implementing code in a slightly different manner to what I expect?
I am aware that this is perhaps not really an issue for everyone, and indeed the answer might just be "you're over a barrel, deal with it". However, when working in an embedded environment, you trust your compiler implicitly. How can I prove to myself and QA that I am right in doing so?
You can apply that argument at any level: do you trust the third party libraries? do you trust the OS? do you trust the processor?
A good example of why this may be a valid concern of course, is how Ken Thompson put a backdoor into the original 'login' program ... and modified the C compiler so that even if you recompiled login you still got the backdoor. See this posting for more details.
Similar questions have been raised about encryption algorithms -- how do we know there isn't a backdoor in DES for the NSA to snoop through?
At the end of the day you have to decide if you trust the infrastructure you are building on enough not to worry about it; otherwise you have to start developing your own silicon chips!
For safety-critical embedded applications, certifying agencies require you to satisfy the "proven-in-use" requirement for the compiler. There are typically certain requirements (kind of like "hours of operation") that need to be met and proven by detailed documentation. However, most people either cannot or do not want to meet these requirements, because it can be very difficult, especially on your first project with a new target/compiler.
One other approach is basically not to trust the compiler's output at all. Any compiler deficiencies, and even language-dependent ones (Appendix G of the C90 standard, anyone?), need to be covered by a strict set of static analysis, unit and coverage testing, in addition to the later functional testing.
A standard like MISRA C can help restrict the input to the compiler to a "safe" subset of the C language. Another approach is to restrict the input to the compiler to a subset of the language and test what the output for the entire subset is. If our application is built only of components from that subset, it is assumed to be known what the output of the compiler will be. This usually goes by the name "qualification of the compiler".
The goal of all of this is to be able to answer the QA representative's question with "We don't just rely on determinism of the compiler but this is the way we prove it...".
You know by testing. When you test, you're testing both your code and the compiler.
You will find that the odds that you or the compiler writer have made an error are much smaller than the odds that you would make an error if you wrote the program in question in some assembly language.
There are compiler validation suites available.
The one I remember is "Perennial".
When I worked on a C compiler for an embedded SoC processor, we had to validate the compiler against this and two other validation suites (whose names I forget). Validating the compiler to a certain level of conformance with these test suites was part of the contract.
It all boils down to trust. Does your customer trust any compiler? Use that, or at least compare output code between yours and theirs.
If they don't trust any, is there a reference implementation for the language? Could you convince them to trust it? Then compare yours against the reference or use the reference.
This is all assuming you actually verify the code you get from the vendor/provider and that you check that the compiler has not been tampered with, which should be the first step.
Anyhow, this still leaves the question of how you would verify a compiler from scratch, without references. That certainly looks like a ton of work, and it requires a definition of the language, which is not always available; sometimes the definition is the compiler.
How do you know that the compiler you are using generates machine code that matches the c code's functionality exactly and that the compiler is fully deterministic?
You don't, that's why you test the resultant binary, and why you make sure to ship the same binary you tested with. And why when you make 'minor' software changes, you regression test to make sure none of the old functionality broke.
The only software I've certified is avionics. FAA certification isn't rigorous enough to prove the software works correctly, while at the same time it does force you to jump through a certain number of hoops. The trick is to structure your 'process' so it improves quality as much as possible, with as little extraneous hoop-jumping as you can get away with. So anything that you know is worthless and won't actually find bugs, you can probably weasel out of. And for anything you know you should do because it will find bugs but that isn't explicitly asked for by the FAA, your best bet is to twist words until it sounds like you're giving the FAA/your QA people what they asked for.
This actually isn't as dishonest as I've made it sound, in general the FAA cares more about you being conscientious and confident that you're trying to do a good job, than about what exactly you do.
Some intellectual ammunition might be found in Crosstalk, a magazine for defense software engineers. This question is the kind of thing they spend many waking hours on. http://www.stsc.hill.af.mil/crosstalk/2006/08/index.html (If I can find my old notes from an old project, I'll be back here...)
You can never fully trust the compiler, even highly recommended ones. They could release an update that has a bug, and your code compiles the same. This problem is compounded when updating old code with the buggy compiler, doing testing and shipping out the goods only to have the customer ring you 3 months later with a problem.
It all comes back to testing, and if there is one thing I have learnt, it is to test thoroughly after any non-trivial change. If the problem seems impossible to find, have a look at the compiled assembler and check that it's doing what it should be doing.
On several occasions I have found bugs in the compiler. One time there was a bug where 16-bit variables would get incremented but without carry, and only if the 16-bit variable was part of an extern struct defined in a header file.
...you trust your compiler implicitly
You'll stop doing that the first time you come across a compiler bug. ;-)
But ultimately this is what testing is for. It doesn't matter to your test regime how the bug got in to your product in the first place, all that matters is that it didn't pass your extensive testing regime.
Well, you can't simply say that you trust your compiler's output, particularly if you work with embedded code. It is not hard to find discrepancies between the code generated when compiling the very same code with different compilers. This is because the C standard itself is too loose: many details can be implemented differently by different compilers without breaking the standard. How do we deal with this stuff? We avoid compiler-dependent constructs whenever possible. We may deal with it by choosing a safer subset of C, like MISRA C, as previously mentioned by the user cschol. I seldom have to inspect the code generated by the compiler, but that has also happened to me at times. Ultimately, though, you are relying on your tests to make sure that the code behaves as intended.
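A small sketch of such a compiler-dependent construct (implementation-defined rather than undefined: whether plain char is signed varies by compiler and target):

/* impl.c -- implementation-defined behaviour: plain char signedness.
 *
 * gcc -fsigned-char impl.c     # prints -1
 * gcc -funsigned-char impl.c   # prints 255
 * (The default differs between targets: signed char on x86 Linux,
 *  unsigned char on ARM Linux, for example.)
 */
#include <stdio.h>

int main(void)
{
    char c = 0xFF;            /* does 0xFF become -1 or 255? the compiler decides */
    printf("%d\n", (int)c);
    return 0;
}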
Is there a better option out there? Some people claim that there is. The other option is to write your code in SPARK/Ada. I have never written code in SPARK but my understanding is that you would still have to link it against routines written in C that would deal with the "bare metal" stuff. The beauty of SPARK/Ada is that you are absolutely guaranteed that the code generated by any compiler is always going to be the same. No ambiguity whatsoever. On top of that, the language allows you to annotate the code with explanations as to how the code is intended to behave. The SPARK toolset will use these annotations to formally prove that the code written does indeed do what the annotations have described. So I have been told that for critical systems, SPARK/Ada is a pretty good bet. I have never tried it myself though.
You don't know for sure that the compiler will do exactly what you expect. The reason is, of course, that a compiler is a piece of software, and is therefore susceptible to bugs.
Compiler writers have the advantage of working from a high quality spec, while the rest of us have to figure out what we're making as we go along. However, compiler specs also have bugs, and complex parts with subtle interactions. So, it's not exactly trivial to figure out what the compiler should be doing.
Still, once you decide what you think the language spec means, you can write a good, fast, automated test for every nuance. This is where compiler writing has a huge advantage over writing other kinds of software: in testing. Every bug becomes an automated test case, and the test suite can be very thorough. Compiler vendors have a lot more budget to invest in verifying the correctness of the compiler than you do (you already have a day job, right?).
What does this mean for you? It means that you need to be open to the possibility of bugs in your compiler, but chances are you won't find any yourself.
I would pick a compiler vendor that is not likely to go out of business any time soon, that has a history of high quality in their compilers, and that has demonstrated their ability to service (patch) their products. Compilers seem to get more correct over time, so I'd choose one that's been around a decade or two.
Focus your attention on getting your code right. If it's clear and simple, then when you do hit a compiler bug, you won't have to think really hard to decide where the problem lies. Write good unit tests, which will ensure that your code does what you expect it to do.
Try unit testing.
If that's not enough, use different compilers and compare the results of your unit tests. Compare strace outputs, run your tests in a VM, keep a log of disk and network I/O, then compare those.
Or propose to write your own compiler and tell them what it's going to cost.
The most you can easily certify is that you are using an untampered compiler from provider X. If they do not trust provider X, it's their problem (if X is reasonably trustworthy). If they do not trust any compiler provider, then they are totally unreasonable.
Answering their question: I make sure I'm using an untampered compiler from X through these means. X is well reputed, plus I have a nice set of tests that show our application behaves as expected.
Everything else is starting to open the can of worms. You have to stop somewhere, as Rob says.
Sometimes you do get behavioural changes when you request aggressive levels of optimisation.
And optimisation and floating point numbers? Forget it!
For most software development (think desktop applications) the answer is probably that you don't know and don't care.
In safety-critical systems (think nuclear power plants and commercial avionics) you do care and regulatory agencies will require you to prove it. In my experience, you can do this one of two ways:
Use a qualified compiler, where "qualified" means that it has been verified according to the standards set out by the regulatory agency.
Perform object code analysis. Essentially, you compile a piece of reference code and then manually analyze the output to demonstrate that the compiler has not inserted any instructions that can't be traced back to your source code.
You get the one Dijkstra wrote.
Select a formally verified compiler, like the CompCert C compiler.
Changing the optimization level of the compiler will change the output.
Slight changes to a function may make the compiler inline it, or stop inlining it.
Changes to the compiler (gcc versions, for example) may change the output.
Certain library functions may be intrinsic (i.e., the compiler emits optimized assembly for them inline) while most others are not.
The good news is that for most things it really doesn't matter that much. Where it does matter, you may want to consider assembly (e.g., in an ISR).
If you are concerned about unexpected machine code which doesn't produce visible results, the only way is probably to contact the compiler vendor for certification of some sort that will satisfy your customer.
Otherwise you'll find out the same way you find out about bugs in your own code: testing.
Machine code from modern compilers can be vastly different and totally incomprehensible for puny humans.
I think it's possible to reduce this problem to the Halting Problem somehow.
The most obvious problem is that if you use some kind of program to analyze the compiler and its determinism, how do you know that your program gets compiled correctly, and produces the correct result?
If you're using another, "safe" compiler, though, I'm not sure. What I am sure of is that writing a compiler from scratch would probably be an easier job.
Even a qualified or certified compiler can produce undesirable results. Keep your code simple and test, test, test. That, or walk through the machine code by hand while not allowing any human error. Plus there's the operating system, or whatever environment you are running on (preferably no operating system, just your program).
This problem has been solved in mission-critical environments since software and compilers began, as many of the others who have responded also know. Each industry has its own rules, from certified compilers to programming style (you must always program this way; never use this or that or the other), with lots of testing and peer review, verifying every execution path, and so on.
If you are not in one of those industries, then you get what you get. A commercial program on a COTS operating system on COTS hardware. It will fail, that is a guarantee.
If you're worried about malicious bugs in the compiler, one recommendation (IIRC, an NSA requirement for some projects) is that the compiler binary predate the writing of the code. At least then you know that no one has added bugs targeted at your program.

Resources