Can a C compiler implement signed right shift "unreasonably"? - c

The title field isn't long enough to a capture a detailed question, so for the record, my actual question defines "unreasonable" in a specific way:
Is it legal1 for a C implementation have an arithmetic right
shift operator that returns different results, over time, for the
identical argument values? That is, must >> be a true function?
Imagine you wanted to write portable code using right-shift >> on signed values in C. Unfortunately, for you, the most efficient implementation of some critical part of your algorithm is fastest when signed right shifts fill the sign bit (i.e., they are arithmetic right shifts). Now since this behavior is implementation defined you are kind of screwed if you want to write portable code that takes advantage of it.
Just reading the compiler's documentation (which it is required to provide in the standard, but may or may not actually be easy to access or even exist) is great if you know all the compilers you will run on and will ever run on, but since that is often impossible, you might look for a way to do this portably.
One way I thought of is to simply test the compiler's behavior at runtime: it it appears to implement arithmetic2 right shift, use the optimized algorithm, but if not use a fallback that doesn't rely on it. Of course, just checking that say (short)0xFF00 >> 4 == 0xFFF0 isn't enough since it doesn't exclude that perhaps char or int values work differently, or even the weird case that it fills for some shift amounts or values but not for others3.
So given that a comprehensive approach would be to exhaustively check all shift values and amounts. For all 8-bit char inputs that's only 28 LHS values and 23 RHS values for a total of 211, while short (typically, but let's say int16_t if you want to be pedantic) totals only 220, which could be validated in a fraction of a second on modern hardware4. Doing all 237 32-bit values would take a few seconds on decent hardware, but still possibly reasonable. 64-bit is out for the foreseeable future however.
Let's say you did that and found that the >> behavior exactly matches the desired arithmetic shift behavior. Are you safe to rely on it according to the standard: does the standard constrain the implementation not to change its behavior at runtime? That is, does the behavior have to be expressible as a function of it's inputs, or can (short)0xFF00 >> 4 be 0xFFF0 one moment and then 0x0FF0 (or any other value) the next?
Now this question is mostly theoretical, but violating this probably isn't as crazy as it might seem, given the presence of hybrid architectures such as big.LITTLE that dynamically move processes from on CPU to another with possible small behavioral differences, or live VM migration between say chips manufactured by Intel and AMD with subtle (usually undocumented) instruction behavior differences.
1 By "legal" I mean according to the C standard, not according to the laws of your country, of course.
2 I.e., it replicates the sign bit for newly shifted in bits.
3 That would be kind of insane, but not that insane: x86 has different behavior (overflow flag bit setting) for shifts of 1 rather than other shifts, and ARM masks the shift amount in a surprising way that a simple test probably wouldn't detect.
4 At least anything better than a microcontroller. The inner validation loop is a few simple instructions, so a 1 GHz CPU could validate all ~1 million short values in ~1 ms at one instruction-per-cycle.

Let's say you did that and found that the >> behavior exactly matches the desired arithmetic shift behavior. Are you safe to rely on it according to the standard: does the standard constrain the implementation not to change its behavior at runtime? That is, does the behavior have to be expressible as a function of it's inputs, or can (short)0xFF00 >> 4 be 0xFFF0 one moment and then 0x0FF0 (or any other value) the next?
The standard does not place any requirements on the form or nature of the (required) definition of the behavior of right shifting a negative integer. In particular, it does not forbid the definition to be conditional on compile-time or run-time properties beyond the operands. Indeed, this is the case for implementation-defined behavior in general. The standard defines the term simply as
unspecified behavior where each implementation documents how the choice is made.
So, for example, an implementation might provide a macro, a global variable, or a function by which to select between arithmetic and logical shifts. Implementations might also define right-shifting a negative number to do less plausible, or even wildly implausible things.
Testing the behavior is a reasonable approach, but it gets you only a probabilistic answer. Nevertheless, in practice I think it's pretty safe to perform just a small number of tests -- maybe as few as one -- for each LHS data type of interest. It is extremely unlikely that you'll run into an implementation that implements anything other than standard arithmetic (always) or logical (always) right shift, or that you'll encounter one where the choice between arithmetic and logical shifting varies for different operands of the same (promoted) type.

The authors of the Standard made no effort to imagine and forbid every unreasonable way implementations might behave. In the days prior to the Standard, implementations almost always defined many behaviors based upon how the underlying platform worked, and the authors of the Standard almost certainly expected that most implementations would do likewise. Since the authors of the Standard didn't want to rule out the possibility that it might sometimes be useful for implementations to behave in other ways (e.g. to support code which targeted other platforms), however, they left many things pretty much wide open.
The Standard is somewhat vague as to the level of precision with which "Implementation Defined" behaviors need to be documented. I think the intention is that a person of reasonable intelligence reading the specification should be able to predict how a piece of code with Implementation-Defined behavior would behave, but I'm not sure that defining an operation as "yielding whatever happens to be in register 7" would be non-conforming.
From a practical perspective, what should matter is whether one is targeting quality implementations for platforms that have been developed within the last quarter century, or whether one is trying to force even deliberately-obtuse compilers to behave in controlled fashion. If the former, it may be worth
using a static assert to ensure the -1>>1==-1, but quality compilers for modern
hardware where that test passes are going to use arithmetic right shift
consistently. While targeting code to obtuse compilers might possibly have
some purpose, it's not generally possible to guard against all the ways that
a pathological-but-conforming compiler could sabotage one's code, and efforts
spent attempting to do so could often be more effectively spent on getting a
quality compiler.

Related

Why do C variables maximum and minimum values touch?

I am working on a tutorial for binary numbers. Something I have wondered for a while is why all the integer maximum and minimum values touch. For example for an unsigned byte 255 + 1 = 0 and 0 - 1 = 255. I understand all the binary math that goes into it, but why was the decision made to have them work this way instead of a straight number line that gives an error when the extremes are breached?
Since your example is unsigned, I assume it's OK to limited the scope to unsigned types.
Allowing wrapping is useful. For example, it's what allows you (and the compiler) to always reorder (and constant-fold) a sequence of additions and subtractions. Even something such as x + 3 - 1 could not be optimized to x + 2 if the language requires trapping, because it changes the conditions under which the expression would trap. Wrapping also mixes better with bit manipulation, with the interpretation of an unsigned number as a vector of bits it makes very little sense if there's trapping. That applies especially to shifts, but addition, subtraction and even multiplication also make sense on bitvectors and combine usefully with the usual bitwise operations.
The algebraic structure you get when allowing wrapping, Z/2kZ, is fairly nice (perhaps not as nice as modulo a prime, but that would interact badly with the bitvector interpretation and it doesn't match typical hardware) and well known, so it's not like anything particularly unexpected or weird will happen, it's not like a wrapped result is a "uselessly arbitrary" result.
And of course testing the carry flag (or whatever may be required) after just about every operation has a big direct overhead as well.
Trapping on "unsigned overflow" is both expensive and undesirable, at least if it is the default behaviour.
Why not "give an error when the extremes are breached"?
Error handling is one of the hardest things in software development. When an error happens, there are many possible ways software could be required to do:
Show an annoying message to the user? Like Hey user, you have just tried to add 1 to this variable, which is already too big. Stop that! - there often is no context to show the user, that would be of any help.
Throw an exception? (BTW C has support for that) - that would show a stack trace, if you happened to execute your code in a debugger. Otherwise, it would just crash - not bad (it won't corrupt your files) but not good either (can be exploited as a denial of service attack).
Write it to a log file? - sometimes it's the best thing to do - record the error and move on, so it can be debugged later.
The right thing to do depends on your code. So a generic programming language like C doesn't want to restrict you by providing any mandatory behavior.
Instead, C provides two guidelines:
For unsigned types like unsigned int or uint8_t or (usually) char - it provides silent wraparound, for best performance.
For signed types like int - it provides "undefined behavior", which makes it possible to "choose", in a very limited way, what will happen on overflow
Throw an exception if using -ftrapv in gcc
Silent wraparound if using -fwrapv in gcc
By default (no fancy command-line options) - the compiler may assume it will never happen, which may help it produce optimized code
The idea here is that you (the programmer) should think where checking for overflow is worth doing, and how to recover from overflow (if the language provided a standard error handling mechanism, it would deny you the latter part). This approach has maximum flexibility, (potentially) maximum performance, and (usually) hardest to do - which fits the philosophy of C.

What replacements are available for formerly-widely-supported behaviors not defined by C standard

In the early days of C prior to standardization, implementations had a variety of ways of handling exceptional and semi-exceptional cases of various actions. Some of them would trigger traps which could cause random code execution if not configured first. Because the behavior of such traps was outside the scope of the C standard (and may in some cases be controlled by an operating system outside the control of the running program), and to avoid requiring that compilers not allow code which had been relying upon such traps to keep on doing so, the behavior of actions that could cause such traps was left entirely up to the discretion of the compiler/platform.
By the end of the 1990s, although not required to do so by the C standard, every mainstream compiler had adopted common behaviors for many of these situations; using such behaviors would allow improvements with respect to code speed, size, and readability.
Since the "obvious" ways of requesting the following operations are no longer supported, how should one go about replacing them in such a way as to not impede readability nor adversely affect code generation when using older compilers? For purposes of descriptions, assume int is 32-bit, ui is a unsigned int, si is signed int, and b is unsigned char.
Given ui and b, compute ui << b for b==0..31, or a value which may arbitrarily behave as ui << (b & 31) or zero for values 32..255. Note that if the left-hand operand is zero whenever the right-hand operand exceeds 31, both behaviors will be identical.
For code that only needs to run on a processor that yields zero when right-shifting or left-shifting by an amount from 32 to 255, compute ui << b for b==0..31 and 0 for b==32..255. While a compiler might be able to optimize out conditional logic designed to skip the shift for values 32..255 (so code would simply perform the shift that will yield the correct behavior), I don't know any way to formulate such conditional logic that would guarantee the compiler won't generate needless code for it.
As with 1 and 2, but for right shifts.
Given si and b such that b0..30 and si*(1<<b) would not overflow, compute si*(1<<b). Note that use of the multiplication operator would grossly impair performance on many older compilers, but if the purpose of the shift is to scale a signed value, casting to unsigned in cases where the operand would remain negative throughout shifting feels wrong.
Given various integer values, perform additions, subtractions, multiplications, and shifts, such fashion that if there are no overflows the results will be correct, and if there are overflows the code will either produce values whose upper bits behave in non-trapping and non-UB but otherwise indeterminate fashion or will trap in recognizable platform-defined fashion (and on platforms which don't support traps, would simply yield indeterminate value).
Given a pointer to an allocated region and some pointers to things within it, use realloc to change the allocation size and adjust the aforementioned pointers to match, while avoiding extra work in cases where realloc returns the original block. Not necessarily possible on all platforms, but 1990s mainstream platforms would all allow code to determine if realloc caused things to move, and determine the what the offset of a pointer into a dead object used to be by subtracting the former base address of that object (note that the adjustment would need to be done by computing the offset associated with each dead pointer, and then adding it the new pointer, rather than by trying to compute the "difference" between old and new pointers--something that would legitimately fail on many segmented architectures).
Do "hyper-modern" compilers provide any good replacements for the above which would not degrade at least one of code size, speed, or readability, while offering no improvements in any of the others? From what I can tell, not only could 99% of compilers throughout the 1990s do all of the above, but for each example one would have been able to write the code the same way on nearly all of them. A few compilers might have tried to optimize left-shifts and right-shifts with an unguarded jump table, but that's the only case I can think of where a 1990s compiler for a 1990s platform would have any problem with the "obvious" way of coding any of the above. If that hyper-modern compilers have ceased to support the classic forms, what do they offer as replacements?
Modern Standard C is specified in such a way that it can be guaranteed to be portable if and only if you write your code with no more expectations about the underlying hardware it will run on than are given by the C abstract machine the standard implicitly and explicitly describes.
You can still write for a specific compiler that has specific behaviour at a given optimization level for a given target CPU and architecture, but then do not expect any other compiler (modern or otherwise, or even a minor revision of the one you wrote for) to go out of its way to try to intuit your expectations if your code violates conditions where the Standard says that it is unreasonable to expect any well defined implementation agnostic behaviour.
Two general principles apply to standard C and standard C++:
Behavior with unsigned numbers is usually better defined than behavior with signed numbers.
Treat optimization as the quality-of-implementation issue that it is. This means that if you're worried about micro-optimization of a particular inner loop, you should read the assembly output of your compiler (such as with gcc -S), and if you discover that it fails to optimize well-defined behavior to appropriate machine instructions, file a defect report with the publisher of your compiler. (This doesn't work, however, when the publisher of the only practical compiler targeting a particular platform isn't very interested in optimization, such as cc65 targeting MOS 6502.)
From these principles, you can usually derive a well-defined way to achieve the same result, and then apply the trust-but-verify principle to the quality of generated code. For example, make shifting functions with well-defined behavior, and let the optimizer remove any unneeded checks that the architecture itself guarantees.
// Performs 2 for unsigned numbers. Also works for signed
// numbers due to rule for casting between signed and unsigned
// integer types.
inline uint32_t lsl32(uint32_t ui, unsigned int b) {
if (b >= 32) return 0;
return ui << b;
}
// Performs 3 for unsigned numbers.
inline uint32_t lsr32(uint32_t ui, unsigned int b) {
if (b >= 32) return 0;
return ui >> b;
}
// Performs 3 for signed numbers.
inline int32_t asr32(int32_t si, unsigned int b) {
if (si >= 0) return lsr32(si, b);
if (b >= 31) return -1;
return ~(~(uint32)si >> b);
}
For 4 and 5, cast to unsigned, do the math, and cast back to signed. This produces non-trapping well-defined behavior.

What's the reason for letting the semantics of a=a++ be undefined?

The line
a = a++;
is undefined behaviour in C. The question I am asking is: why?
I mean, I get that it might be hard to provide a consistent order in which things should be done. But, certain compilers will always do it in one order or the other (at a given optimization level). So why exactly is this left up to the compiler to decide?
To be clear, I want to know if this was a design decision and if so, what prompted it? Or maybe there is a hardware limitation of some kind?
UPDATE: This question was the subject of my blog on June 18th, 2012. Thanks for the great question!
Why? I want to know if this was a design decision and if so, what prompted it?
You are essentially asking for the minutes of the meeting of the ANSI C design committee, and I don't have those handy. If your question can only be answered definitively by someone who was in the room that day, then you're going to have to find someone who was in that room.
However, I can answer a broader question:
What are some of the factors that lead a language design committee to leave the behaviour of a legal program (*) "undefined" or "implementation defined" (**)?
The first major factor is: are there two existing implementations of the language in the marketplace that disagree on the behaviour of a particular program? If FooCorp's compiler compiles M(A(), B()) as "call A, call B, call M", and BarCorp's compiler compiles it as "call B, call A, call M", and neither is the "obviously correct" behaviour then there is strong incentive to the language design committee to say "you're both right", and make it implementation defined behaviour. Particularly this is the case if FooCorp and BarCorp both have representatives on the committee.
The next major factor is: does the feature naturally present many different possibilities for implementation? For example, in C# the compiler's analysis of a "query comprehension" expression is specified as "do a syntactic transformation into an equivalent program that does not have query comprehensions, and then analyze that program normally". There is very little freedom for an implementation to do otherwise.
By contrast, the C# specification says that the foreach loop should be treated as the equivalent while loop inside a try block, but allows the implementation some flexibility. A C# compiler is permitted to say, for example "I know how to implement foreach loop semantics more efficiently over an array" and use the array's indexing feature rather than converting the array to a sequence as the specification suggests it should.
A third factor is: is the feature so complex that a detailed breakdown of its exact behaviour would be difficult or expensive to specify? The C# specification says very little indeed about how anonymous methods, lambda expressions, expression trees, dynamic calls, iterator blocks and async blocks are to be implemented; it merely describes the desired semantics and some restrictions on behaviour, and leaves the rest up to the implementation.
A fourth factor is: does the feature impose a high burden on the compiler to analyze? For example, in C# if you have:
Func<int, int> f1 = (int x)=>x + 1;
Func<int, int> f2 = (int x)=>x + 1;
bool b = object.ReferenceEquals(f1, f2);
Suppose we require b to be true. How are you going to determine when two functions are "the same"? Doing an "intensionality" analysis -- do the function bodies have the same content? -- is hard, and doing an "extensionality" analysis -- do the functions have the same results when given the same inputs? -- is even harder. A language specification committee should seek to minimize the number of open research problems that an implementation team has to solve!
In C# this is therefore left to be implementation-defined; a compiler can choose to make them reference equal or not at its discretion.
A fifth factor is: does the feature impose a high burden on the runtime environment?
For example, in C# dereferencing past the end of an array is well-defined; it produces an array-index-was-out-of-bounds exception. This feature can be implemented with a small -- not zero, but small -- cost at runtime. Calling an instance or virtual method with a null receiver is defined as producing a null-was-dereferenced exception; again, this can be implemented with a small, but non-zero cost. The benefit of eliminating the undefined behaviour pays for the small runtime cost.
A sixth factor is: does making the behaviour defined preclude some major optimization? For example, C# defines the ordering of side effects when observed from the thread that causes the side effects. But the behaviour of a program that observes side effects of one thread from another thread is implementation-defined except for a few "special" side effects. (Like a volatile write, or entering a lock.) If the C# language required that all threads observe the same side effects in the same order then we would have to restrict modern processors from doing their jobs efficiently; modern processors depend on out-of-order execution and sophisticated caching strategies to obtain their high level of performance.
Those are just a few factors that come to mind; there are of course many, many other factors that language design committees debate before making a feature "implementation defined" or "undefined".
Now let's return to your specific example.
The C# language does make that behaviour strictly defined(†); the side effect of the increment is observed to happen before the side effect of the assignment. So there cannot be any "well, it's just impossible" argument there, because it is possible to choose a behaviour and stick to it. Nor does this preclude major opportunities for optimizations. And there are not a multiplicity of possible complex implementation strategies.
My guess, therefore, and I emphasize that this is a guess, is that the C language committee made ordering of side effects into implementation defined behaviour because there were multiple compilers in the marketplace that did it differently, none was clearly "more correct", and the committee was unwilling to tell half of them that they were wrong.
(*) Or, sometimes, its compiler! But let's ignore that factor.
(**) "Undefined" behaviour means that the code can do anything, including erasing your hard disk. The compiler is not required to generate code that has any particular behaviour, and not required to tell you that it is generating code with undefined behaviour. "Implementation defined" behaviour means that the compiler author is given considerable freedom in choice of implementation strategy, but is required to pick a strategy, use it consistently, and document that choice.
(†) When observed from a single thread, of course.
It's undefined because there is no good reason for writing code like that, and by not requiring any specific behaviour for bogus code, compilers can more aggressively optimize well-written code. For example, *p = i++ may be optimized in a way that causes a crash if p happens to point to i, possibly because two cores write to the same memory location at the same time. The fact that this also happens to be undefined in the specific case that *p is explicitly written out as i, to get i = i++, logically follows.
It's ambiguous but not syntactically wrong. What should a be? Both = and ++ have the same "timing." So instead of defining an arbitrary order it was left undefined since either order would be in conflict with one of the two operators definitions.
With a few exceptions, the order in which expressions are evaluated is unspecified; this was a deliberate design decision, and it allows implementations to rearrange the evaluation order from what's written if that will result in more efficient machine code. Similarly, the order in which the side effects of ++ and -- are applied is unspecified beyond the requirement that it happen before the next sequence point, again to give implementations the freedom to arrange operations in an optimal manner.
Unfortunately, this means that the result of an expression like a = a++ will vary based on the compiler, compiler settings, surrounding code, etc. The behavior is specifically called out as undefined in the language standard so that compiler implementors don't have to worry about detecting such cases and issuing a diagnostic against them. Cases like a = a++ are obvious, but what about something like
void foo(int *a, int *b)
{
*a = (*b)++;
}
If that's the only function in the file (or if its caller is in a different file), there's no way to know at compile time whether a and b point to the same object; what do you do?
Note that it's entirely possible to mandate that all expressions be evaluated in a specific order, and that all side effects be applied at a specific point in evaluation; that's what Java and C# do, and in those languages expressions like a = a++ are always well-defined.
The postfix ++ operator returns the value prior to the incrementation. So, at the first step, a gets assigned to its old value (that's what ++ returns). At the next point it is undefined whether the increment or the assignment will take place first, because both operations are applied over the same object (a), and the language says nothing about the order of evaluation of these operators.
Somebody may provide another reason, but from an optimization (better say assembler presentation) point of view, a needs be loaded into a CPU register, and the postfix operator's value should be placed into another register or the same.
So the last assignment can depend on either the optimizer using one register or two.
Updating the same object twice without an intervening sequence point is undefined behaviour ...
because that makes compiler writers happier
because it allows implementations to define it anyway
because it doesn't force a specific constraint when it isn't needed
Suppose a is a pointer with value 0x0001FFFF. And suppose the architecture is segmented so that the compiler needs to apply the increment to the high and low parts separately, with a carry between them. The optimiser could conceivably reorder the writes so that the final value stored is 0x0002FFFF; that is, the low part before the increment and the high part after the increment.
This value is twice either value that you might have expected. It may point to memory not owned by the application, or it may (in general) be a trapping representation. In other words, the CPU may raise a hardware fault as soon as this value is loaded into a register, crashing the application. Even if it doesn't cause an immediate crash, it is a profoundly wrong value for the application to be using.
The same kind of thing can happen with other basic types, and the C language allows even ints to have trapping representations. C tries to allow efficient implementation on a wide range of hardware. Getting efficient code on a segmented machine such as the 8086 is hard. By making this undefined behaviour, a language implementer has a bit more freedom to optimise aggressively. I don't know if it has ever made a performance difference in practice, but evidently the language committee wanted to give every benefit to the optimiser.

Are there well-known "profiles" of the C standard?

I write C code that makes certain assumptions about the implementation, such as:
char is 8 bits.
signed integral types are two's complement.
>> on signed integers sign-extends.
integer division rounds negative quotients towards zero.
double is IEEE-754 doubles and can be type-punned to and from uint64_t with the expected result.
comparisons involving NaN always evaluate to false.
a null pointer is all zero bits.
all data pointers have the same representation, and can be converted to size_t and back again without information loss.
pointer arithmetic on char* is the same as ordinary arithmetic on size_t.
functions pointers can be cast to void* and back again without information loss.
Now, all of these are things that the C standard doesn't guarantee, so strictly speaking my code is non-portable. However, they happen to be true on the architectures and ABIs I'm currently targeting, and after careful consideration I've decided that the risk they will fail to hold on some architecture that I'll need to target in the future is acceptably low compared to the pragmatic benefits I derive from making the assumptions now.
The question is: how do I best document this decision? Many of my assumptions are made by practically everyone (non-octet chars? or sign-magnitude integers? on a future, commercially successful, architecture?). Others are more arguable -- the most risky probably being the one about function pointers. But if I just list everything I assume beyond what the standard gives me, the reader's eyes are just going to glaze over, and he may not notice the ones that actually matter.
So, is there some well-known set of assumptions about being a "somewhat orthodox" architecture that I can incorporate by reference, and then only document explicitly where I go beyond even that? (Effectively such a "profile" would define a new language that is a superset of C, but it might not acknowledge that in so many words -- and it may not be a pragmatically useful way to think of it either).
Clarification: I'm looking for a shorthand way to document my choices, not for a way to test automatically whether a given compiler matches my expectations. The latter is obviously useful too, but does not solve everything. For example, if a business partner contacts us saying, "we're making a device based on Google's new G2015 chip; will your software run on it?" -- then it would be nice to be able to answer "we haven't worked with that arch yet, but it shouldn't be a problem if it has a C compiler that satisfies such-and-such".
Clarify even more since somebody has voted to close as "not constructive": I'm not looking for discussion here, just for pointers to actual, existing, formal documents that can simplify my documentation by being incorporated by reference.
I would introduce a STATIC_ASSERT macro and put all your assumptions in such asserts.
Unfortunately, not only is there a lack of standards for a dialect of C that combines the extensions which have emerged as de facto standards during the 1990s (two's-complement, universally-ranked pointers, etc.) but compilers trends are moving in the opposite direction. Given the following requirements for a function:
* Accept int parameters x,y,z:
* Return 0 if x-y is computable as "int" and is less than Z
* Return 1 if x-y is computable as "int" and is not less than Z
* Return 0 or 1 if x-y is not computable */
The vast majority of compilers in the 1990s would have allowed:
int diffCompare(int x, int y, int z)
{ return (x-y) >= z; }
On some platforms, in cases where the difference between x-y was not computable as int, it would be faster to compute a "wrapped" two's-complement value of x-y and compare that, while on others it would be faster to perform the calculation using a type larger than int and compare that. By the late 1990s, however, nearly every C compiler would implement the above code to use one of whichever one of those approaches would have been more efficient on its hardware platform.
Since 2010, however, compiler writers seem to have taken the attitude that if computations overflow, compilers shouldn't perform the calculations in whatever fashion is normal for their platform and let what happens happens, nor should they recognizably trap (which would break some code, but could prevent certain kinds of errant program behavior), but instead they should overflows as an excuse to negate laws of time and causality. Consequently, even if a programmer would have been perfectly happy with any behavior a 1990s compiler would have produced, the programmer must replace the code with something like:
{ return ((long)x-y) >= z; }
which would greatly reduce efficiency on many platforms, or
{ return x+(INT_MAX+1U)-y >= z+(INT_MAX+1U); }
which requires specifying a bunch of calculations the programmer doesn't actually want in the hopes that the optimizer will omit them (using signed comparison to make them unnecessary), and would reduce efficiency on a number of platforms (especially DSPs) where the form using (long) would have been more efficient.
It would be helpful if there were standard profiles which would allow programmers to avoid the need for nasty horrible kludges like the above using INT_MAX+1U, but if trends continue they will become more and more necessary.
Most compiler documentation includes a section that describes the specific behavior of implementation-dependent features. Can you point to that section of the gcc or msvc docs to describe your assumptions?
You can write a header file "document.h" where you collect all your assumptions.
Then, in every file that you know that non-standard assumptions are made, you can #include such a file.
Perhaps "document.h" would not have real sentences at all, but only commented text and some macros.
// [T] DOCUMENT.H
//
#ifndef DOCUMENT_H
#define DOCUMENT_H
// [S] 1. Basic assumptions.
//
// If this file is included in a compilation unit it means that
// the following assumptions are made:
// [1] A char has 8 bits.
// [#]
#define MY_CHARBITSIZE 8
// [2] IEEE 754 doubles are addopted for type: double.
// ........
// [S] 2. Detailed information
//
#endif
The tags in brackets: [T] [S] [#] [1] [2] stand for:
* [T]: Document Title
* [S]: Section
* [#]: Print the following (non-commented) lines as a code-block.
* [1], [2]: Numbered items of a list.
Now, the idea here is to use the file "document.h" in a different way:
To parse the file in order to convert the comments in "document.h" to some printable document, or some basic HTML.
Thus, the tags [T] [S] [#] etc., are intended to be interpreted by a parser that convert any comment into an HTML line of text (for example), and generate <h1></h1>, <b></b> (or whatever you want), when a tag appears.
If you keep the parser as a simple and small program, this can give you a short hand to handle this kind of documentation.

Are bitwise operations still practical?

Wikipedia, the one true source of knowledge, states:
On most older microprocessors, bitwise
operations are slightly faster than
addition and subtraction operations
and usually significantly faster than
multiplication and division
operations. On modern architectures,
this is not the case: bitwise
operations are generally the same
speed as addition (though still faster
than multiplication).
Is there a practical reason to learn bitwise operation hacks or it is now just something you learn for theory and curiosity?
Bitwise operations are worth studying because they have many applications. It is not their main use to substitute arithmetic operations. Cryptography, computer graphics, hash functions, compression algorithms, and network protocols are just some examples where bitwise operations are extremely useful.
The lines you quoted from the Wikipedia article just tried to give some clues about the speed of bitwise operations. Unfortunately the article fails to provide some good examples of applications.
Bitwise operations are still useful. For instance, they can be used to create "flags" using a single variable, and save on the number of variables you would use to indicate various conditions. Concerning performance on arithmetic operations, it is better to leave the compiler do the optimization (unless you are some sort of guru).
They're useful for getting to understand how binary "works"; otherwise, no. In fact, I'd say that even if the bitwise hacks are faster on a given architecture, it's the compiler's job to make use of that fact — not yours. Write what you mean.
The only case where it makes sense to use them is if you're actually using your numbers as bitvectors. For instance, if you're modeling some sort of hardware and the variables represent registers.
If you want to perform arithmetic, use the arithmetic operators.
Depends what your problem is. If you are controlling hardware you need ways to set single bits within an integer.
Buy an OGD1 PCI board (open graphics card) and talk to it using libpci. http://en.wikipedia.org/wiki/Open_Graphics_Project
It is true that in most cases when you multiply an integer by a constant that happens to be a power of two, the compiler optimises it to use the bit-shift. However, when the shift is also a variable, the compiler cannot deduct it, unless you explicitly use the shift operation.
Funny nobody saw fit to mention the ctype[] array in C/C++ - also implemented in Java. This concept is extremely useful in language processing, especially when using different alphabets, or when parsing a sentence.
ctype[] is an array of 256 short integers, and in each integer, there are bits representing different character types. For example, ctype[;A'] - ctype['Z'] have bits set to show they are upper-case letters of the alphabet; ctype['0']-ctype['9'] have bits set to show they are numeric. To see if a character x is alphanumeric, you can write something like 'if (ctype[x] & (UC | LC | NUM))' which is somewhat faster and much more elegant than writing 'if ('A' = x <= 'Z' || ....'.
Once you start thinking bitwise, you find lots of places to use it. For instance, I had two text buffers. I wrote one to the other, replacing all occurrences of FINDstring with REPLACEstring as I went. Then for the next find-replace pair, I simply switched the buffer indices, so I was always writing from buffer[in] to buffer[out]. 'in' started as 0, 'out' as 1. After completing a copy I simply wrote 'in ^= 1; out ^= 1;'. And after handling all the replacements I just wrote buffer[out] to disk, not needing to know what 'out' was at that time.
If you think this is low-level, consider that certain mental errors such as deja-vu and its twin jamais-vu are caused by cerebral bit errors!
Working with IPv4 addresses frequently requires bit-operations to discover if a peer's address is within a routable network or must be forwarded onto a gateway, or if the peer is part of a network allowed or denied by firewall rules. Bit operations are required to discover the broadcast address of a network.
Working with IPv6 addresses requires the same fundamental bit-level operations, but because they are so long, I'm not sure how they are implemented. I'd wager money that they are still implemented using the bit operators on pieces of the data, sized appropriately for the architecture.
Of course (to me) the answer is yes: there can be practical reasons to learn them. The fact that nowadays, e.g., an add instruction on typical processors is as fast as an or/xor or an and just means that: an add is as fast as, say, an or on those processors.
The improvements in speed of instructions like add, divide, and so on, just means that now on those processors you can use them and being less worried about performance impact; but it is true now as in the past that you usually won't change every adds to bitwise operations to implement an add. That is, in some cases it may depend on which hacks: likely some hack now must be considered educational and not practical anymore; others could have still their practical application.

Resources