Should I disable the C compiler signed/unsigned mismatch warning?

The Microsoft C compiler warns when you try to compare two variables, and one is signed, and the other is unsigned. For example:
int a;
unsigned b;
if ( a < b ) { // warning C4018: '<' : signed/unsigned mismatch
}
Has this warning, in the history of the world, ever caught a real bug? Why's it there, anyway?

Never ignore compiler warnings.

Oh, it has. But the other way around. Ignoring that warning caused me a huge headache one day. I was writing a function that plotted a graph and mixed signed and unsigned variables. In one place, I compared a negative number to an unsigned one:
int32_t t; ...
uint32_t ut; ...
if(t < ut) {
...
}
Guess what happened? The signed number got promoted to the unsigned type, and thus ended up greater, even though it was below 0 originally. It took me a couple of hours until I found the bug.

If you have to ask the question, you do not know enough about whether it is safe to disable it, so the answer is No.
I wouldn't disable it - I don't assume I always know better than the compiler (not least because I often don't), and more particularly because I sometimes make mistakes by oversight when a compiler does not.

You should change a and b to both use signed types, or both use unsigned types. But that may not be practical (e.g. it may be outside your control).
The warning is there to catch comparisons between a signed integer with a negative value and an unsigned integer -- if the magnitudes of both numbers are small, the former will (incorrectly) be deemed larger than the latter.

Binary operators often convert both operands to the same type before doing the comparison; since one is unsigned, the int gets converted to unsigned as well. Normally this won't cause too much trouble, but if your int is a negative number, this will cause errors in comparisons.
e.g. -1 becomes 4294967295 when converted from signed to unsigned; now compare that with 100 (unsigned).
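A minimal sketch of that trap (assuming 32-bit int and unsigned int; the variable names are just illustrative):
#include <stdio.h>
int main(void) {
    int t = -1;
    unsigned int ut = 100;
    /* t is converted to unsigned, becoming 4294967295, so the test is false */
    if (t < ut)
        printf("t is smaller\n");
    else
        printf("t is NOT smaller - the signed/unsigned surprise\n");
    return 0;
}
This comparison is exactly the kind of line that warning C4018 (or gcc's -Wsign-compare) points at.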

The warnings are there for a purpose... They cause you to think hard about your code!
Personally, I would always explicitly cast the signed --> unsigned and unsigned --> signed if possible. By doing this you are ensuring that you take ownership of the transaction and that you know what's going to happen. I realise that, depending on the project, it might not always be possible to do this, but always aim for 0 compiler warnings ... it can only help!

I've been writing code longer than I'd care to admit. From personal experience, ignoring seemingly pedantic compiler warnings can sometimes yield very unpleasant results.
If they annoy you and you accept/understand the situation then set a cast and move on.
Eventually these things move from an overlooked nuance into a conscious decision when designing new code. The result leaves less room for Mickey Mouse corner cases to ruin your day or your customer's day, and overall better quality software.

I have even configured the compiler to make that warning a compile error, for the reasons all the other guys already mentioned.
If I ever encounter a signed/unsigned mismatch I ask myself why I chose different "signedness". It usually is a design error.

@gimel: The explanation about shooting the entire leg off, found behind your link, is really good for this problem.
-"Someone who avoids the simple problems may simply be heading for a not-so-simple one."
This is actually always true when you convert between different types and you don't check for those values that could hurt you.
/Johan
Update: The correct way to convert from uint to int is to check
the values against limits.h, or something like that.
(But I seldom do that myself, even though I know I should... :-)
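A minimal sketch of such a range check (uint_to_int is just an illustrative name; it assumes <limits.h> for INT_MAX):
#include <limits.h>
/* Returns 0 and stores the converted value on success, -1 if it doesn't fit. */
int uint_to_int(unsigned int u, int *out)
{
    if (u > (unsigned int)INT_MAX)
        return -1;      /* value would not survive the conversion */
    *out = (int)u;      /* safe: u is within [0, INT_MAX] */
    return 0;
}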

I think it's best to convert your unsigned number into a signed number (before the comparison), rather than the other way around.

Just one of the many ways in which C allows you to shoot yourself in the foot - you'd better know what you are doing. The foot-shooting quote about C is attributed to Bjarne Stroustrup, creator of C++.

Related

Why do C variables' maximum and minimum values touch?

I am working on a tutorial for binary numbers. Something I have wondered for a while is why all the integer maximum and minimum values touch. For example for an unsigned byte 255 + 1 = 0 and 0 - 1 = 255. I understand all the binary math that goes into it, but why was the decision made to have them work this way instead of a straight number line that gives an error when the extremes are breached?
Since your example is unsigned, I assume it's OK to limit the scope to unsigned types.
Allowing wrapping is useful. For example, it's what allows you (and the compiler) to always reorder (and constant-fold) a sequence of additions and subtractions. Even something such as x + 3 - 1 could not be optimized to x + 2 if the language required trapping, because that changes the conditions under which the expression would trap. Wrapping also mixes better with bit manipulation: the interpretation of an unsigned number as a vector of bits makes very little sense if there's trapping. That applies especially to shifts, but addition, subtraction and even multiplication also make sense on bitvectors and combine usefully with the usual bitwise operations.
The algebraic structure you get when allowing wrapping, Z/2^k Z, is fairly nice (perhaps not as nice as modulo a prime, but that would interact badly with the bitvector interpretation and it doesn't match typical hardware) and well known, so it's not like anything particularly unexpected or weird will happen; it's not like a wrapped result is a "uselessly arbitrary" result.
And of course testing the carry flag (or whatever may be required) after just about every operation has a big direct overhead as well.
Trapping on "unsigned overflow" is both expensive and undesirable, at least if it is the default behaviour.
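A small sketch of why wrapping is the friendlier default (assuming a typical 32-bit unsigned int):
#include <stdio.h>
int main(void) {
    unsigned int x = 0xFFFFFFFFu;   /* UINT_MAX on a 32-bit unsigned int */
    unsigned int y = x + 3 - 1;     /* well-defined: wraps modulo 2^32 */
    printf("%u\n", y);              /* prints 1, the same as x + 2 */
    return 0;
}
If unsigned overflow trapped instead, the folded form x + 2 and the written form x + 3 - 1 would trap under different conditions, which is exactly the reordering problem described above.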
Why not "give an error when the extremes are breached"?
Error handling is one of the hardest things in software development. When an error happens, there are many possible things the software could be required to do:
Show an annoying message to the user? Like Hey user, you have just tried to add 1 to this variable, which is already too big. Stop that! - there is often no context to show the user that would be of any help.
Throw an exception? (BTW C has support for that) - that would show a stack trace, if you happened to execute your code in a debugger. Otherwise, it would just crash - not bad (it won't corrupt your files) but not good either (can be exploited as a denial of service attack).
Write it to a log file? - sometimes it's the best thing to do - record the error and move on, so it can be debugged later.
The right thing to do depends on your code. So a generic programming language like C doesn't want to restrict you by providing any mandatory behavior.
Instead, C provides two guidelines:
For unsigned types like unsigned int or uint8_t or (usually) char - it provides silent wraparound, for best performance.
For signed types like int - it provides "undefined behavior", which makes it possible to "choose", in a very limited way, what will happen on overflow
Throw an exception if using -ftrapv in gcc
Silent wraparound if using -fwrapv in gcc
By default (no fancy command-line options) - the compiler may assume it will never happen, which may help it produce optimized code
The idea here is that you (the programmer) should think about where checking for overflow is worth doing, and how to recover from overflow (if the language provided a standard error handling mechanism, it would deny you the latter part). This approach has maximum flexibility, (potentially) maximum performance, and is (usually) the hardest to do - which fits the philosophy of C.
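A minimal sketch of "the programmer decides where to check" (add_checked is just an illustrative name; it assumes <limits.h>):
#include <limits.h>
/* Returns 0 and stores a + b on success, -1 if the signed addition would overflow. */
int add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return -1;      /* overflow: the caller chooses how to recover */
    *out = a + b;       /* safe: no undefined behavior here */
    return 0;
}
The check has to happen before the addition, because for signed types the overflow itself is already undefined behavior.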

When is it OK to pass a signed integer to a function needing an unsigned one, or vice versa?

I can't find a link now, but passing an int to a function which needs an unsigned type should be OK as long as the int is positive, right? (and falls in the range of unsigned); I got a warning, that's why I am asking
assuming no casts
vice versa should also be OK, assuming the unsigned integer's value fits in the int range?
Passing a signed integer to a function expecting an unsigned integer has well-defined behavior. If the argument value is within the range of the parameter type, there's a trivial conversion; the value remains the same. If the argument value is outside that range, it will be converted, effectively by discarding the high-order bits.
This is ok if and only if that behavior is consistent with the way your application needs to behave.
Passing an unsigned integer to a function expecting a signed integer has well-defined behavior if the value is within the range of the parameter type. If it isn't, then the implicit conversion will yield an implementation-defined result (or, in principle, it can raise an implementation-defined signal, but I don't know of any implementations that do that). Typically the result is equivalent to discarding the high-order bits, or interpreting the unsigned value as if it were a signed value (treating the high-order bit as a sign bit), but that's not guaranteed by the language. For unsigned-to-signed conversions, it's best to ensure that the value is within the range of the target type.
Very often (but not always) a conversion like this is an indication that the argument should have been of the same type as the parameter in the first place.
This all assumes that the function is properly declared with a prototype, so the compiler knows, when it sees the call, what type the function expects. If there is no prototype, or if the function is variadic (like printf), then you need to be more careful. There is no conversion, but you can rely on int and unsigned int being interchangeable for values within the range of both. (But it's not a great idea to depend on that; it's best to ensure that the argument you pass is of the expected type.)
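A small sketch of both directions (assuming 32-bit int and unsigned int; takes_unsigned and takes_int are just illustrative names):
#include <stdio.h>
static void takes_unsigned(unsigned int u) { printf("%u\n", u); }
static void takes_int(int i)               { printf("%d\n", i); }
int main(void) {
    takes_unsigned(42);         /* fine: value unchanged */
    takes_unsigned(-1);         /* well-defined, but prints 4294967295 */
    takes_int(100u);            /* fine: 100 fits in int, value unchanged */
    takes_int(4000000000u);     /* out of range: implementation-defined result */
    return 0;
}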
This can be broken down into two different questions:
Will it work?
Yes, probably. All variable values are just sequences of bits, and you can, to some degree, play fast and loose with types and still get a correct result if you're careful.
Should I do it?
No, for several reasons:
You shouldn't get in the habit of ignoring warnings. They are there for a reason: they let you know that results may not be what you expect. You should put as much effort into correctly resolving them as you would an error.
Your code will be harder for future readers to understand. You should try to make your code as understandable and maintainable as possible for future readers (including yourself). Part of this is using the correct types. If you need to use a function that expects a certain type and you have to give it data of a different type for some reason, you should cast it to make your intentions clear.
It is hard enough to understand and debug code. You're only making it harder on yourself when you make implicit assumptions in your code. Try to refactor it to use the types that are required for the operations you want to do. Your current assumption is that your value will always be positive, and always be within the range of a signed integer. If either of these conditions are violated (which you should assume is possible, since no code is perfect), then you will have unexpected behavior from your code. You should add some checks for these conditions before using the value where an unsigned int is expected, then explicitly cast it. This will make debugging easier by catching errors, and make your intentions clear to future readers.
That's right. It's up to you to determine if your app is safe this way, and that's all the compiler is warning you about. As long as the value stays in the range you expect, no surprises will happen.
Both directions are ok, as long as the current value of the source fits into the allowed range of the destination. Some tools or compilers (gcc: -Wconversion) still warn about this for a good reason: to get your attention.
If you are absolutely sure the value fits, just use a cast to suppress the warning. But add a comment to that line (or before it) explaining why you actually use that cast. Some editors will highlight special keywords in comments (e.g. // NOTE:), so you will see there is something to be aware of.
Once the value does not fit, you exhibit undefined behaviour, so be aware!
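A small sketch of that cast-plus-comment pattern (to_row_index and MAX_ROWS are hypothetical names used only for illustration):
/* hypothetical example: row_count comes from elsewhere and is known to be bounded */
int to_row_index(unsigned int row_count)
{
    // NOTE: callers guarantee row_count <= MAX_ROWS, well below INT_MAX, so the cast cannot truncate.
    return (int)row_count;
}
With gcc's -Wconversion (mentioned above), the explicit cast silences the warning while the comment records why it is safe.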

About the use of signed integers in the C family of languages

When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.
When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happens most of the time.
When reading other people's code, I rarely see unsigned integers, even if the represented value can't be negative.
So I asked myself: «is there a good reason for this, or do people just use signed integers because they don't care»?
I've searched on the subject, here and in other places, and I have to say I can't find a good reason not to use unsigned integers, when it applies.
I came across those questions: «Default int type: Signed or Unsigned?», and «Should you always use 'int' for numbers in C, even if they are non-negative?» which both present the following example:
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
To me, this is just bad design. Of course, it may result in an infinite loop, with unsigned integers.
But is it so hard to check if foo.Length() is 0, before the loop?
So I personally don't think this is a good reason for using signed integers all the way.
Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.
OK, it's good to have a specific value that means «error».
But then, what's wrong with something like UINT_MAX, for that specific value?
I'm actually asking this question because it may lead to some huge problems, usually when using third-party libraries.
In such a case, you often have to deal with signed and unsigned values.
Most of the time, people just don't care about the signedness, and just assign, for instance, an unsigned int to a signed int, without checking the range.
I have to say I'm a bit paranoid with the compiler warning flags, so with my setup, such an implicit cast will result in a compiler error.
For that kind of stuff, I usually use a function or macro to check the range, and then assign using an explicit cast, raising an error if needed.
This just seems logical to me.
As a last example, as I'm also an Objective-C developer (note that this question is not related to Objective-C only):
- ( NSInteger )tableView: ( UITableView * )tableView numberOfRowsInSection: ( NSInteger )section;
For those not fluent with Objective-C, NSInteger is a signed integer.
This method actually retrieves the number of rows in a table view, for a specific section.
The result will never be a negative value (and neither will the section number, by the way).
So why use a signed integer for this?
I really don't understand.
This is just an example, but I just always see that kind of stuff, with C, C++ or Objective-C.
So again, I'm just wondering if people just don't care about that kind of problems, or if there is finally a good and valid reason not to use unsigned integers for such cases.
Looking forward to hearing your answers : )
A signed return value might yield more information (think error numbers: 0 is sometimes a valid answer, -1 indicates an error, see man read) ... which might be relevant especially for developers of libraries.
if you are worrying about the one extra bit you gain when using unsigned instead of signed then you are probably using the wrong type anyway. (also kind of "premature optimization" argument)
Languages like Python, Ruby, JScript etc. are doing just fine without signed vs unsigned. That might be an indicator ...
When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.
When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happens most of the time.
Carefully considering which type is most suitable each time you declare a variable is very good practice! This means you are careful and professional. You should not only consider signedness, but also the potential max value that you expect this type to have.
The reason why you shouldn't use signed types when they aren't needed has nothing to do with performance, but with type safety. There are lots of potential, subtle bugs that can be caused by signed types:
The various forms of implicit promotions that exist in C can cause your type to change signedness in unexpected and possibly dangerous ways. The integer promotion rule that is part of the usual arithmetic conversions, the lvalue conversion upon assignment, the default argument promotions used by for example VA lists, and so on.
When using any form of bitwise operators or similar hardware-related programming, signed types are dangerous and can easily cause various forms of undefined behavior.
By declaring your integers unsigned, you automatically skip past a whole lot of the above dangers. Similarly, by declaring them as large as unsigned int or larger, you get rid of lots of dangers caused by the integer promotions.
Both size and signedness are important when it comes to writing rugged, portable and safe code. This is the reason why you should always use the types from stdint.h and not the native, so-called "primitive data types" of C.
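Two small sketches of the hazards mentioned above (assuming a typical platform with 32-bit int; hazards() is just a container for the examples):
#include <stdint.h>
void hazards(void)
{
    uint8_t b = 0x80;
    /* b is promoted to (signed) int before the shift; 0x80 << 24 does not fit in
       int, so this is undefined behavior even though every declared type "looks"
       unsigned. Writing (uint32_t)b << 24 avoids it. */
    uint32_t r = b << 24;
    int neg = -1;
    /* Right-shifting a negative signed value is implementation-defined. */
    int s = neg >> 1;
    (void)r; (void)s;
}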
So I asked myself: «is there a good reason for this, or do people just use signed integers because they don't care»?
I don't really think it is because they don't care, nor because they are lazy, even though declaring everything int is sometimes referred to as "sloppy typing" - meaning sloppily picked types rather than being too lazy to type.
I rather believe it is because they lack deeper knowledge of the various things I mentioned above. There's a frightening number of seasoned C programmers who don't know how implicit type promotions work in C, nor how signed types can cause poorly-defined behavior when used together with certain operators.
This is actually a very frequent source of subtle bugs. Many programmers find themselves staring at a compiler warning or a peculiar bug, which they can make go away by adding a cast. But they don't understand why, they simply add the cast and move on.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
To me, this is just bad design
Indeed it is.
Once upon a time, down-counting loops would yield more effective code, because the compiler could pick a "branch if zero" instruction instead of a "branch if larger/smaller/equal" instruction - the former is faster. But this was at a time when compilers were really dumb and I don't believe such micro-optimizations are relevant any longer.
So there is rarely ever a reason to have a down-counting loop. Whoever made the argument probably just couldn't think outside the box. The example could have been rewritten as:
for(unsigned int i=0; i<foo.Length(); i++)
{
unsigned int index = foo.Length() - i - 1;
thing[index] = something;
}
This code should not have any impact on performance, but the loop itself turned out a whole lot easier to read, while at the same time fixing the bug that your example had.
As far as performance is concerned nowadays, one should probably spend the time pondering which form of data access is most ideal in terms of data cache use, rather than anything else.
Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.
That's a poor argument. Good API design uses a dedicated error type for error reporting, such as an enum.
Instead of having some hobbyist-level API like
int do_stuff (int a, int b); // returns -1 if a or b were invalid, otherwise the result
you should have something like:
err_t do_stuff (int32_t a, int32_t b, int32_t* result);
// returns ERR_A is a is invalid, ERR_B if b is invalid, ERR_XXX if... and so on
// the result is stored in [result], which is allocated by the caller
// upon errors the contents of [result] remain untouched
The API would then consistently reserve the return of every function for this error type.
(And yes, many of the standard library functions abuse return types for error handling. This is because it contains lots of ancient functions from a time before good programming practice was invented, and they have been preserved the way they are for backwards-compatibility reasons. So just because you find a poorly-written function in the standard library, you shouldn't run off to write an equally poor function yourself.)
Overall, it sounds like you know what you are doing and giving signedness some thought. That probably means that knowledge-wise, you are actually already ahead of the people who wrote those posts and guides you are referring to.
The Google style guide for example, is questionable. Similar could be said about lots of other such coding standards that use "proof by authority". Just because it says Google, NASA or Linux kernel, people blindly swallow them no matter the quality of the actual contents. There are good things in those standards, but they also contain subjective opinions, speculations or blatant errors.
Instead I would recommend referring to real professional coding standards instead, such as MISRA-C. It enforces lots of thought and care for things like signedness, type promotion and type size, where less detailed/less serious documents just skip past it.
There is also CERT C, which isn't as detailed and careful as MISRA, but at least a sound, professional document (and more focused towards desktop/hosted development).
There is one heavyweight argument against using unsigned integers widely:
Premature optimization is the root of all evil.
We all have, on at least one occasion, been bitten by unsigned integers. Sometimes, like in your loop, sometimes in other contexts. Unsigned integers add a hazard, even though a small one, to your program. And you are introducing this hazard just to change the meaning of one bit. One little, tiny, insignificant-but-for-its-sign-meaning bit. On the other hand, the integers we work with in bread-and-butter applications are often far below the limits of the type, more on the order of 10^1 than 10^7. Thus, the different range of unsigned integers is in the vast majority of cases not needed. And when it is needed, it is quite likely that this extra bit won't cut it (when 31 bits are too few, 32 are rarely enough) and you'll need a wider or an arbitrary-width integer anyway. The pragmatic approach in these cases is to just use the signed integer and spare yourself the occasional underflow bug. Your time as a programmer can be put to much better use.
From the C FAQ:
The first question in the C FAQ is which integer type we should decide to use:
If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned types.
Another question concerns types conversions:
If an operation involves both signed and unsigned integers, the situation is a bit more complicated. If the unsigned operand is smaller (perhaps we're operating on unsigned int and long int), such that the larger, signed type could represent all values of the smaller, unsigned type, then the unsigned value is converted to the larger, signed type, and the result has the larger, signed type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both values are converted to a common unsigned type, and the result has that unsigned type.
You can find it here. So basically, using unsigned integers can complicate the situation, mostly because of the arithmetic conversions: you'll have to either make all your integers unsigned, or be at risk of confusing the compiler and yourself. As long as you know what you are doing, this is not really a risk per se; however, it could introduce simple bugs.
And when is it good to use unsigned integers? One situation is when using bitwise operations:
The << operator shifts its first operand left by a number of bits given by its second operand, filling in new 0 bits at the right. Similarly, the >> operator shifts its first operand right. If the first operand is unsigned, >> fills in 0 bits from the left, but if the first operand is signed, >> might fill in 1 bits if the high-order bit was already 1. (Uncertainty like this is one reason why it's usually a good idea to use all unsigned operands when working with the bitwise operators.)
taken from here
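A small sketch of the >> difference described in that quote (the signed result assumes a platform that uses an arithmetic shift, which is typical but not guaranteed):
#include <stdio.h>
int main(void) {
    int si = -16;
    unsigned int ui = 0xFFFFFFF0u;
    printf("%d\n", si >> 2);   /* typically -4: sign bits shifted in (implementation-defined) */
    printf("%u\n", ui >> 2);   /* always 1073741820 (0x3FFFFFFC): zero bits shifted in */
    return 0;
}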
And I've seen this somewhere:
If it were best to use unsigned integers for values that are never negative, we would have started by using unsigned int in the main function: int main(int argc, char* argv[]). One thing is sure: argc is never negative.
EDIT:
As mentioned in the comments, the signature of main is due to historical reasons and apparently it predates the existence of the unsigned keyword.
Unsigned integers are an artifact from the past. This is from the time when processors could do unsigned arithmetic a little bit faster.
This is a case of premature optimization which is considered evil.
Actually, in 2005 when AMD introduced x86_64 (or AMD64, as it was then called), the 64 bit architecture for x86, they brought the ghosts of the past back: if a signed integer is used as an index and the compiler cannot prove that it is never negative, it has to insert a 32 to 64 bit sign extension instruction - because the default 32 to 64 bit extension is unsigned (the upper half of a 64 bit register gets cleared if you move a 32 bit value into it).
But I would recommend against using unsigned in any arithmetic at all, be it pointer arithmetic or just simple numbers.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
Any recent compiler will warn about such a construct, with "condition is always true" or similar. By using a signed variable you avoid such pitfalls altogether. Instead, use ptrdiff_t.
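A minimal sketch of the down-counting loop with a signed index, as suggested (reverse_fill and its parameters are just illustrative; n plays the role of foo.Length()):
#include <stddef.h>
void reverse_fill(int *thing, size_t n, int something)
{
    /* with a signed index, i >= 0 works as intended, and n == 0 simply skips the loop */
    for (ptrdiff_t i = (ptrdiff_t)n - 1; i >= 0; --i)
        thing[i] = something;
}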
A problem might be the C++ library: it often uses an unsigned type for size_t, which is required because of some rare corner cases with very large sizes (between 2^31 and 2^32) on 32 bit systems with certain boot switches (the /3GB switch on Windows).
There are many more; comparisons between signed and unsigned come to my mind, where the signed value automagically gets promoted to an unsigned one and thus becomes a huge positive number, when it was a small negative one before.
One exception for using unsigned exists: For bit fields, flags, masks it is quite common. Usually it doesn't make sense at all to interpret the value of these variables as a magnitude, and the reader may deduce from the type that this variable is to be interpreted in bits.
The result will never be a negative value (as the section number, by the way). So why use a signed integer for this?
Because you might want to compare the return value to a signed value which is actually negative. The comparison should return true in that case, but the C standard specifies that the signed value gets promoted to an unsigned one, and you will get false instead. I don't know about Objective-C though.

Are there well-known "profiles" of the C standard?

I write C code that makes certain assumptions about the implementation, such as:
char is 8 bits.
signed integral types are two's complement.
>> on signed integers sign-extends.
integer division rounds negative quotients towards zero.
double is IEEE-754 doubles and can be type-punned to and from uint64_t with the expected result.
comparisons involving NaN always evaluate to false.
a null pointer is all zero bits.
all data pointers have the same representation, and can be converted to size_t and back again without information loss.
pointer arithmetic on char* is the same as ordinary arithmetic on size_t.
function pointers can be cast to void* and back again without information loss.
Now, all of these are things that the C standard doesn't guarantee, so strictly speaking my code is non-portable. However, they happen to be true on the architectures and ABIs I'm currently targeting, and after careful consideration I've decided that the risk they will fail to hold on some architecture that I'll need to target in the future is acceptably low compared to the pragmatic benefits I derive from making the assumptions now.
The question is: how do I best document this decision? Many of my assumptions are made by practically everyone (non-octet chars? or sign-magnitude integers? on a future, commercially successful, architecture?). Others are more arguable -- the most risky probably being the one about function pointers. But if I just list everything I assume beyond what the standard gives me, the reader's eyes are just going to glaze over, and he may not notice the ones that actually matter.
So, is there some well-known set of assumptions about being a "somewhat orthodox" architecture that I can incorporate by reference, and then only document explicitly where I go beyond even that? (Effectively such a "profile" would define a new language that is a superset of C, but it might not acknowledge that in so many words -- and it may not be a pragmatically useful way to think of it either).
Clarification: I'm looking for a shorthand way to document my choices, not for a way to test automatically whether a given compiler matches my expectations. The latter is obviously useful too, but does not solve everything. For example, if a business partner contacts us saying, "we're making a device based on Google's new G2015 chip; will your software run on it?" -- then it would be nice to be able to answer "we haven't worked with that arch yet, but it shouldn't be a problem if it has a C compiler that satisfies such-and-such".
Clarify even more since somebody has voted to close as "not constructive": I'm not looking for discussion here, just for pointers to actual, existing, formal documents that can simplify my documentation by being incorporated by reference.
I would introduce a STATIC_ASSERT macro and put all your assumptions in such asserts.
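A minimal sketch of that approach (a home-grown STATIC_ASSERT using a negative array size, for compilers without C11's _Static_assert; the checks are the compile-time-testable assumptions from the question):
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#define STATIC_ASSERT(cond, name) typedef char static_assert_##name[(cond) ? 1 : -1]
STATIC_ASSERT(CHAR_BIT == 8, char_is_8_bits);
STATIC_ASSERT((-1 & 3) == 3, twos_complement);   /* -1 has all bits set only in two's complement */
STATIC_ASSERT(sizeof(double) == sizeof(uint64_t), double_punnable_to_uint64);
STATIC_ASSERT(sizeof(void *) <= sizeof(size_t), data_pointer_fits_in_size_t);
Not everything on the list can be checked this way (e.g. that a null pointer is all zero bits), but the assumptions that can be checked will produce an immediate compile error when porting to an architecture that breaks them.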
Unfortunately, not only is there a lack of standards for a dialect of C that combines the extensions which have emerged as de facto standards during the 1990s (two's complement, universally-ranked pointers, etc.), but compiler trends are moving in the opposite direction. Given the following requirements for a function:
/* Accept int parameters x,y,z:
 * Return 0 if x-y is computable as "int" and is less than Z
 * Return 1 if x-y is computable as "int" and is not less than Z
 * Return 0 or 1 if x-y is not computable */
The vast majority of compilers in the 1990s would have allowed:
int diffCompare(int x, int y, int z)
{ return (x-y) >= z; }
On some platforms, in cases where the difference x-y was not computable as int, it would be faster to compute a "wrapped" two's-complement value of x-y and compare that, while on others it would be faster to perform the calculation using a type larger than int and compare that. By the late 1990s, however, nearly every C compiler would implement the above code to use whichever of those approaches would have been more efficient on its hardware platform.
Since 2010, however, compiler writers seem to have taken the attitude that if computations overflow, compilers shouldn't perform the calculations in whatever fashion is normal for their platform and let what happens happen, nor should they recognizably trap (which would break some code, but could prevent certain kinds of errant program behavior), but instead they should treat overflows as an excuse to negate the laws of time and causality. Consequently, even if a programmer would have been perfectly happy with any behavior a 1990s compiler would have produced, the programmer must replace the code with something like:
{ return ((long)x-y) >= z; }
which would greatly reduce efficiency on many platforms, or
{ return x+(INT_MAX+1U)-y >= z+(INT_MAX+1U); }
which requires specifying a bunch of calculations the programmer doesn't actually want in the hopes that the optimizer will omit them (using signed comparison to make them unnecessary), and would reduce efficiency on a number of platforms (especially DSPs) where the form using (long) would have been more efficient.
It would be helpful if there were standard profiles which would allow programmers to avoid the need for nasty horrible kludges like the above using INT_MAX+1U, but if trends continue they will become more and more necessary.
Most compiler documentation includes a section that describes the specific behavior of implementation-dependent features. Can you point to that section of the gcc or msvc docs to describe your assumptions?
You can write a header file "document.h" where you collect all your assumptions.
Then, in every file that you know that non-standard assumptions are made, you can #include such a file.
Perhaps "document.h" would not have real sentences at all, but only commented text and some macros.
// [T] DOCUMENT.H
//
#ifndef DOCUMENT_H
#define DOCUMENT_H
// [S] 1. Basic assumptions.
//
// If this file is included in a compilation unit it means that
// the following assumptions are made:
// [1] A char has 8 bits.
// [#]
#define MY_CHARBITSIZE 8
// [2] IEEE 754 doubles are adopted for type: double.
// ........
// [S] 2. Detailed information
//
#endif
The tags in brackets: [T] [S] [#] [1] [2] stand for:
* [T]: Document Title
* [S]: Section
* [#]: Print the following (non-commented) lines as a code-block.
* [1], [2]: Numbered items of a list.
Now, the idea here is to use the file "document.h" in a different way:
To parse the file in order to convert the comments in "document.h" to some printable document, or some basic HTML.
Thus, the tags [T] [S] [#] etc. are intended to be interpreted by a parser that converts any comment into an HTML line of text (for example), and generates <h1></h1>, <b></b> (or whatever you want) when a tag appears.
If you keep the parser as a simple and small program, this can give you a shorthand to handle this kind of documentation.

Catching overflow of left shift of constant 1 using compiler warning?

We're writing code inside the Linux kernel so, try as I might, I wasn't able to get PC-Lint/Flexelint working on Linux kernel code. Just too many built-in symbols etc. But that's a side issue.
We have any number of compilers, starting with gcc, but others also. Their warnings options have been getting stronger over time, to where they are pretty strong static analysis tools too.
Here is what I want to catch. Yes, I know it violates some things that are easy to catch in code review, such as "no magic numbers", and "beware of bit shifting", but that's only if you happen to look at that section of code. Anyway, here it is:
unsigned long long foo;
unsigned long bar;
[... lots of other code ...]
foo = ~(foo + (1<<bar));
Further UPDATED problem description -- even with bar limited to 16, still a problem. Clarifying: the problem is the implicit int type of the constant which, unplanned, makes the complex expression violate the rule that all calculations be carried out in the same size and signedness.
Problem: '1' is not long long, but, as a small-value constant, defaults to an int. Therefore even if bar's actual value never exceeds, say, 16, still the (1<<bar) expression will overflow and ruin the entire calculation.
Possibly correct solution: write 1ULL instead.
Is there a well-known compiler and compiler warning flag that will point out this (revised) problem?
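For reference, a minimal sketch of the fix suggested above, making the constant unsigned long long so that the shift and the addition are both carried out in 64 bits:
foo = ~(foo + (1ULL << bar));   /* the shift now happens in unsigned long long */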
I am not sure what criteria you are thinking of to flag this construction as suspicious. There is clearly something wrong if the value of bar is as large as the size (in bits) of an int, but usually the compiler wouldn't know that.
From the point of view of a heuristic, bug-finding tool, having good patterns to separate likely bugs from normal constructions is key to avoiding too many false positives (which make users hate the tool and refuse to use it).
The Open Source tool in my URL flags logical shifts by a number larger than the size of the type, but it is primarily a verification tool for critical embedded software, and expect a lot of work to adapt it if you intend to use it on the Linux kernel with its linked structures and other difficulties.
