What is gcc doing under a bitwise complement, and a negative left shift count when constrained by integer size?


I'm trying to eliminate as many gcc warnings as possible in some old code, aiming for "cleaner" code by compiling it with a more recent toolchain.
The code writes and reads registers (or memory) on ARMv6 hardware. I can't say I completely understand what it actually does, but that's the gist of it for the line in question. Side note: all the storage types are uint32.
In the C source it's just a bunch of macros with only one value being passed in, for example:
writel(readl(ADDR_GPIO_IOTR1)&(~(3<<IOTR_GPIO(26))),ADDR_GPIO_IOTR1);
That line, and many others where the 26 is replaced by other values (30, 58, 59, which I presume are GPIO "pins"), generate the warning "left shift count is negative".
When looking at the preprocessed code, the bit (~(3<<IOTR_GPIO(26))) turns out to be:
(~(3<<(~(3<<((26%16)<<1)))))
That is clearly a negative left shift: no matter the value passed to the macro, the bitwise complement operator turns the result of the shift 3<<anything into a negative number.
Considering that all those numbers are inferred to be of type "int" (they are signed), the result of that operation should always be 0xffffffff, right?
So, IOTR_GPIO(GPIO) is defined as (~(3<<((GPIO%16)<<1))). I wrote a testcase to see what the compiler will do in each step for any value of GPIO I pass on the commandline, this is what I get for a run with 26 as the value of GPIO:
26%16=0x0000000a [0b1010]
0xa << 1=0x00000014 [0b10100]
3 << 0x14=0x00300000 [0b1100000000000000000000]
~(0x300000)=0xffcfffff [0b11111111110011111111111111111111]
So far, a negative int32, as expected.
3 << 0xffcfffff=0x80000000 [0b10000000000000000000000000000000]
Now what is going on here? I'm pretty sure that shift should have zeroed out everything.
~(0x80000000)=0x7fffffff [0b1111111111111111111111111111111]
So, no, I'm not getting 0xffffffff after all. Regardless, I still get 0x7fffffff for almost all values (it changes when 0 <= GPIO < 3).
However, here is what happens when I print the result of the whole
preprocessed code with a fixed value:
(~(3<<(~(3<<((26%16)<<1)))))=
0xffffffff [0b11111111111111111111111111111111]
The clear difference is that for my step-by-step test the compiler does not know the value of GPIO beforehand, as I'm passing that as an argument to my test program. When printing the result of the preprocessed code the compiler has optimized out the value at compile time and returns what I had expected.
So why isn't that negative shift returning all zeros for my testcase?, besides the fact that negative shifts are undefined behavior?
A question to myself is "how the heck is this actually working?" I truly don't expect an answer to that.
But I would like at least an opinion, considering:
I have replicated the compilation of this bit of code on a testcase 1:1 (same toolchain, same gcc arguments) of the running code.
I even ran the testcase on the ARMv6 hardware in question and I got the exact same results as on a modern gcc-5.3.0 on x86_64 (with or without -m32, as I'm storing everything in uint32_t).
There are no other versions of these lines anywhere to be found in history, as far as I can deduce, they were added to "support a new chip" (guessing from the couple #ifdef around this).
What could the intention of the programmer have been in this case? Even the original toolchain spits out the exact same warning, so I don't think it was ignored.
What I may really be asking is "how was this intentional?".
Might it be that at some other point (linking perhaps?) something changes and a different result is being used? Kind of hard to duplicate/testcase/inspect that I think. But I'm going to put a printf there somewhere and run it just to make sure I'm not going crazy.
The testcase I made: negative_shift_test.c
The original, unmodified messed up code: starts here
The complete, indented preprocessed line (#L3093 in the linked code above):
({
do {
__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory");
outer_sync();
} while (0);
(
(void)(
(void)0,
*(volatile unsigned int *)(((((0x088CE000) - 0x00000000 + 0xf0000000) + 0x004))) =
(( u32) (( __le32)(__u32)(
({
u32 __v = ({
u32 __v = (( __u32)(__le32)(( __le32) ((void)0, *(volatile unsigned int *)(((((0x088CE000) - 0x00000000 + 0xf0000000) + 0x004))))));
__v;
});
__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory");
__v;
}) & (~(3<<(~(3<<((26%16)<<1))))) /* sits here unmolested */
)))
)
);
});
(read address, bitwise & (AND) the result of the read and write that back to the same address, if I understood it correctly).

One side of the problem is just this:
I wrote a testcase
As you said yourself, you wrote a testcase for an unreadable piece of code that happens to work despite hitting undefined behavior. It's really not a big surprise that your testcase does something unexpected and different. Touching anything in that code can dispel the magic; you can't just deconstruct it and run it bit by bit. Changing the toolchain can also break it, BTW.
Without digging further into what gcc might do with the code, assuming that the facts are all true, this question is unanswerable because it contains a contradiction:
So why isn't that negative shift returning all zeros for my testcase?,
besides the fact that negative shifts are undefined behavior?
You seem to expect undefined behavior to have some defined behavior...
OTOH, the question below is easy to answer:
A question to myself is "how the heck is this actually working?" I
truly don't expect an answer to that.
The answer is "why not?". UB can be the behavior that the author expects, as it's "defined" (hum) as any behavior.
So the actual problem is this:
The code writes and reads registers (or memory) on ARMv6 hardware, I
can't say I completely understand what it actually does
You can't refactor it without understanding it. That involves finding the author, and torturing him (or her) if necessary. No need to torture other innocent people.
PS oh and that question is easy too:
What could the intention of the programmer have been in this case?
Even the original toolchain spits out the exact same warning, so I
don't think it was ignored.
That's called an evil programmer. One more reason to find him.
PPS I'm betting on a bug: the author forgot that IOTR_GPIO already does the ~(3<< shift, and so applied it twice. The to-infinity-and-beyond second shift doesn't make sense.

First and foremost, the source in question belongs to the opensourced kernel for a specific Samsung Galaxy model (the GT-S5367).
That being said, that model belongs to the bcm21553 family of boards, for which there were many source code zip packages released by Samsung.
In that family the S5360 is a whole "sub" family with many variants, the Totoro board. The S5367 is also a Totoro board.
When looking for different versions of the same file to spot differences in the lines that performed the negative left shifts, I restricted my search to the S5360 alone. Suffice it to say I found no differences: every single source had the same bug.
After a while testing with many printk() in the kernel and looking at the generated output, I decided to search on github for the dubious macro itself IOTR_GPIO.
In doing so I found many duplicates of the macro definition on source derived from my own sub family (plenty of board-totoro.c).
But then, to my surprise, a different board, the Torino (still based on the bcm21553), had the same macro definition but without the extra negative shift.
So (I'm assuming) this ended up being just a copypasta bug. I believe the intention was to move the mask to (or from) the macro definition, but the programmer forgot to remove the code off the other side.
The code worked fine because all it does is read a value, AND it against a mask (created by this macro), and then write it back to the same spot.
The actual, working mask just places two zero bits at a position that depends on the GPIO pin. When the value being read (and bitwise ANDed) already has those two bits cleared, the bogus 0xffffffff mask produces the same result as the proper mask, and thus the code works fine even with such a nasty bug in place.
TL;DR, as #ilya pointed out in a comment, the correct macro definition is:
#define IOTR_GPIO(GPIO) ((GPIO%16)<<1)
No negative shift there to worry about, the bitwise complement is done afterwards to create the actual mask and not to shift bits again.
With that change the code compiles without warnings and works just fine as before.
PS Thanks #ilya for helping the brainstorming out :+).

Related

C speed of comparison: Equals "==" vs Bitwise and "&"

Suppose I have an integer that is a power of 2, eg. 1024:
int a = 1 << 10; //works with any power of 2 no.
Now I want to check whether another integer b is the same as a. Which is faster/better (especially on weak embedded systems):
if (b == a) {}
or
if (b & a) {}
?
Sorry if this is a noob question, but couldn't find an answer using the search.
edit: thanks for many insightful answers. I could select only one of them, but all of them are welcome.

These operations are not even equivalent, because a & b will be false when both a and b are 0.
So I'd suggest expressing the semantics that you want (i.e. a == b) and letting the compiler do the optimization.
If you then measure that you have performance issues at that point, you can start analyzing/optimizing...

The short answer is this - it depends on what sort of things you're comparing. However, in this case, I'll assume that you're comparing two variables to each other (as opposed to a variable and an immediate, etc.)
This website, although rather old, studied how many clock cycles different instructions took on the x86 platform. The two instructions we're interested in here are the "AND" instruction and the "CMP" instruction (which the compiler uses for & and == respectively). What we can see here is that both of these instructions take about 1/3 of a cycle - that is to say, you can execute 3 of them in 1 cycle on average. Compare this to the "DIV" instruction which (in 1996) took 23 cycles to execute.
However, this omits one important detail. An "AND" instruction is not sufficient to complete the behavior you're looking for. In fact, a brief compilation on x86_64 suggests that you need both an "AND" and a "TEST" instruction for the "&" version, while "==" simply uses the "CMP" instruction. Because all these instructions are otherwise equivalent in IPC, the "==" will in fact be slightly faster...as of 1996.
Nowadays, processors optimize so well at the bare metal layer that you're unlikely to notice a difference. That said, if you wanted to see for sure...simply write a test program and find out for yourself.
As noted above though, even in the case that you have a power of 2, these instructions are still not equivalent, since it doesn't work for 0. Well...I guess technically zero ISN'T a power of 2. :) However you want to spin it though, use "==".

An X86 CPU sets a flag according to how the result of any operation compares to zero.
For the ==, your compiler will either use a dedicated compare instruction or a subtraction, setting this flag in both cases. The if() is then implemented by a jump that is conditional on this bit.
For the &, another instruction is used: the logical bitwise and instruction. That too sets the flag appropriately. So, again, the next instruction will be the conditional branch.
So, the question boils down to: Is there a performance difference between a subtraction and a bitwise and instruction? And the answer is "no" on any sane architecture. Both instructions use the same ALU, both set the same flags, and this ALU is typically designed to perform a subtraction in a single clock cycle.
Bottom line: Write readable code, and don't try to microoptimize what cannot be optimized.

Translate C program to other programming languages

I am trying to translate a C program. The destination language doesn't really matter, I am just trying to understand what every single part of the program is doing.
I cannot find any detail about:
variable=1;
while(variable);
I understand that this is a loop and that is true (I have read similar questions on stack overflow where a code was actually executed) but in this case there is no code related to this while. So I am wondering, is the program "sleeping" - while this while is executing?
Then, another part I don't understand is:
variable=0;
variable=variable^0x800000;
I believe that value should be 24bits but is this really needed in any other programming language that is not low level as C?
Many thanks

while(variable); implements a spin-lock; i.e. this thread will remain at this statement until variable is 0. I've introduced the term to help you search for a good technique in your new language.
It obviously burns the CPU, but can be quite an efficient way of doing this if only used for a few clock cycles. For it to work well, variable needs to be qualified with volatile.
variable = variable ^ 0x800000; is an XOR operation, actually a single bit toggle in this case. (I would have preferred to see variable ^= 0x800000 in multi-threaded code.) Its exact use is probably explainable from its context. Note that the arguments of the XOR are promoted to int if they are smaller than that. It's doubtful that variable^0x800000 is a 24 bit type unless int is that size on your platform (unlikely but possible).

I am trying to translate a C program.
Don't translate a C program unless you are writing a compiler (sometimes called a transpiler, or source-to-source compiler, when the target is another programming language rather than assembler) that does this task. That takes a lot of work (at least several months for a naive compiler à la TinyCC, and more probably many dozens of years).
Think in C and try to understand its semantics (much more important than its syntax).
while(variable);
that loop has an empty body. It is more readable to make that empty body apparent (semantics remain the same):
while(variable) {};
Since the body (and the test) of the loop don't change variable (it has no observable side-effect) the loop will run indefinitely as soon as the initial value of variable is non-zero. This will heat your processor.
But you might have declared that variable as volatile and then have something external changing it.
variable=variable^0x800000;
The ^ is a bitwise XOR. You are toggling (replacing 0 with 1 and 1 with 0) a single bit (bit 23 counting from zero, i.e. the 24th one).

To answer your second question:
variable=0;
variable=variable^0x800000;
This operation is a bitwise operation called XOR.
An XOR operation is usually used to toggle bits regardless of their previous state:
0 ^ 1 = 1
1 ^ 1 = 0

AVR: if statement

I am new in AVR programming. I would like to control a variable (uint8_t received_msg) if it is equal to 0xFF. would it be correct to do:
if (!(received_msg ^ 0xFF))
or do I need to compare bit by bit
uint8_t test = 0;
test = received_msg ^ 0xFF
for (i =0; i<8; i++){
test = 0 & (1<<received_msg)
}
if(test==0)

If you want to know if a variable is equal to 0xff, just test for equality:
if (received_msg == 0xff)

Your question has fairly little to do with the AVR and more to do with some mistaken ideas about how compilers and microcontrollers work. That's not a complaint that it's a bad question - any question that helps you learn is good!
(TLDR: "use bitwise operators" is only in contrast to AVR specific stuff, feel absolutely free to use all your normal operations.)
First, you've expressed what you want to do - an equality test - in English. The whole point of a programming language like C is to allow you to express computed operations in a fairly readable manner, so use the most obvious (and thus clear) translation of received_msg == 0xFF - it is the compiler's job to convert this into code for the specific computer (AVR), and even if it does a horrible job of it it will waste no more than a few microseconds. (It doesn't, but if you make the code convoluted enough it can fail to do an excellent job.)
Second, you've attempted to express the same operation - comparing every bit against a set value, and collecting the result to see if they were all equal - in two other manners. This gets tricky both to read and write, as is shown by the bugs in the second version, but more importantly the second version shows a misunderstanding of what C's bitwise operators do. Bitwise here means each bit of a value is processed independent of the other bits; they are still all processed. Therefore splitting it into a loop is not needed, and only makes the job of both programmer and compiler harder. The technique used to make bitwise operators only affect single bits, not to be confused with which they operate on, is known as masking; it relies on properties like "0 or n = n", "1 and n = n", and "0 xor n = n".
I'm also getting the impression this was based around the idea that a microcontroller like the AVR would be working on individual bits all the time. This is extremely rare, but frequently emulated by PLCs. What we do have is operations making single bit work less costly than on general purpose CPUs. For instance, consider "PORTB |= 1<<3". This can be read as a few fundamental operations:
v0 := 1 // load immediate
v1 := 3
v2 := v0 shiftleft v1 // shift left
v3 := PORTB // load I/O register
v4 := v3 or v2
PORTB := v4 // store back to I/O register
This interpretation would be an extremely reduced instruction set, where loads and stores never combine with ALU operations such as shift and or. You may even get such code out of the compiler if you ask it not to optimize at all. But since it's such a common operation for a microcontroller, the AVR has a single instruction to do this without spending registers on holding v0-v4:
SBI PORTB, 3 // (set bit in I/O register)
This brings us from needing two registers (from reusing vN which are no longer needed) and six instructions to zero registers and one instruction. Further gains are possible because once it's a single instruction, one can use a skip instead of a branch. But it relies on a few things being known, such as 1<<3 setting only a single, fixed bit, and PORTB being among the lowest 32 I/O registers. If the compiler did not know these things, it could never use the SBI instructions, and there was such a time. This is why we have the advice "use the bitwise operators" - you no longer need to write sbi(PORTB,PB3);, which is inobvious to people who don't know the AVR instruction set, but can now write PORTB |= 1<<3; which is standard C, and therefore clearer while being just as effective. Arguably better macro naming might make more readable code too, but many of these macros came along as typing shorthands instead - for instance _BV(x) which is equal to 1<<x.
Sadly some of the standard C formulations become rather tricky, like clearing bit N: port &= ~(1<<N); It makes a pretty good case for a "clear_bit(port, bit)" macro, like Arduino's digitalWrite. Some microcontrollers (such as 8051) provide specific addresses for single bit work, and some compilers provide syntax extensions such as port.3. I sometimes wonder why AVR Libc doesn't declare bitfields for bit manipulation. Pardon the rant. There also remain some optimizations the compiler doesn't know of, such as converting PORTB ^= x; into PINB = x; (which really looks weird - PIN registers aren't writable, so they used that operation for another function).
See also the AVR Libc manual section on bit manipulation, particularly "Porting programs that use the deprecated sbi/cbi macros".

You can also use a switch(){ case } statement, like this:
#define OTHER_CONST_VALUE 0x19
switch(received_msg){
case 0xff:
do_this();
break;
case 0x0f:
do_that();
break;
case OTHER_CONST_VALUE:
do_other_thing();
break;
case 1:
case 2:
received_1_or_2();
break;
default:
received_something_else();
break;
}
This code executes a different function depending on the value of received_msg. The value after each case keyword must be a constant, and be careful with the break statement: it tells the program when to jump out of the { } block.

I'm unsure of what received_msg will be representing. If it is a numerical value, then by all means use a switch-case, if-else or other comparison structure; no need for a bitmask.
However, if received_msg contains binary data and you only want to look at certain elements and exclude others, a bitmask would be the appropriate approach.

Calculating an 8-bit CRC with the C preprocessor?

I'm writing code for a tiny 8-bit microcontroller with only a few bytes of RAM. It has a simple job which is to transmit 7 16-bit words, then the CRC of those words. The values of the words are chosen at compile time. The CRC specifically is "remainder of division of
word 0 to word 6 as unsigned number divided by the polynomial x^8+x²+x+1 (initial value 0xFF)."
Is it possible to calculate the CRC of those bytes at compile time using the C preprocessor?
#define CALC_CRC(a,b,c,d,e,f,g) /* what goes here? */
#define W0 0x6301
#define W1 0x12AF
#define W2 0x7753
#define W3 0x0007
#define W4 0x0007
#define W5 0x5621
#define W6 0x5422
#define CRC CALC_CRC(W0, W1, W2, W3, W4, W5, W6)

It is possible to design a macro which will perform a CRC calculation at compile time. Something like
// Choosing names to be short and hopefully unique.
#define cZX(n,b,v) (((n) & (1 << (b))) ? (v) : 0)
#define cZY(n,b,w,x,y,z) (cZX(n,b,w)^cZX(n,(b)+1,x)^cZX(n,(b)+2,y)^cZX(n,(b)+3,z))
#define CRC(n) (cZY(n,0,cZ0,cZ1,cZ2,cZ3)^cZY(n,4,cZ4,cZ5,cZ6,cZ7))
should probably work, and will be very efficient if (n) can be evaluated as a compile-time constant; it will simply evaluate to a constant itself. On the other hand, if n is an expression, that expression will end up getting recomputed eight times. Even if n is a simple variable, the resulting code will likely be significantly larger than the fastest non-table-based way of writing it, and may be slower than the most compact way of writing it.
BTW, one thing I'd really like to see in the C and C++ standard would be a means of specifying overloads which would be used for functions declared inline only if particular parameters could be evaluated as compile-time constants. The semantics would be such that there would be no 'guarantee' that any such overload would be used in every case where a compiler might be able to determine a value, but there would be a guarantee that (1) no such overload would be used in any case where a "compile-time-const" parameter would have to be evaluated at runtime, and (2) any parameter which is considered a constant in one such overload will be considered a constant in any functions invoked from it. There are a lot of cases where a function could written to evaluate to a compile-time constant if its parameter is constant, but where run-time evaluation would be absolutely horrible. For example:
#define bit_reverse_byte(n) ( (((n) & 128)>>7)|(((n) & 64)>>5)|(((n) & 32)>>3)|(((n) & 16)>>1)| \
(((n) & 8)<<1)|(((n) & 4)<<3)|(((n) & 2)<<5)|(((n) & 1)<<7) )
#define bit_reverse_word(n) (bit_reverse_byte((n) >> 8) | (bit_reverse_byte(n) << 8))
A simple rendering of a non-looped single-byte bit-reverse function in C on the PIC would be about 17-19 single-cycle instructions; a word bit-reverse would be 34, or about 10 plus a byte-reverse function (which would execute twice). Optimal assembly code would be about 15 single-cycle instructions for byte reverse or 17 for word-reverse. Computing bit_reverse_byte(b) for some byte variable b would take many dozens of instructions totalling many dozens of cycles. Computing bit_reverse_word(w) for some 16-bit word w would probably take hundreds of instructions taking hundreds or thousands of cycles to execute. It would be really nice if one could mark a function to be expanded inline using something like the above formulation in the scenario where it would expand to a total of four instructions (basically just loading the result) but use a function call in scenarios where inline expansion would be heinous.

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or of all those words. The result is appended to the message as an extra word.
To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows that a transmission error occurred.
(source: Wikipedia)
In your example:
#define CALC_LRC(a,b,c,d,e,f,g) ((a)^(b)^(c)^(d)^(e)^(f)^(g))

Disclaimer: this is not really a direct answer, but rather a series of questions and suggestions that are too long for a comment.
First Question: Do you have control over both ends of the protocol, e.g. can you choose the checksum algorithm by means of either yourself or a coworker controlling the code on the other end?
If YES to question #1:
You need to evaluate why you need the checksum, what checksum is appropriate, and the consequences of receiving a corrupt message with a valid checksum (which factors into both the what & why).
What is your transmission medium, protocol, bitrate, etc? Are you expecting/observing bit errors? So for example, with SPI or I2C from one chip to another on the same board, if you have bit errors, it's probably the HW engineer's fault or you need to slow the clock rate, or both. A checksum can't hurt, but shouldn't really be necessary. On the other hand, with an infrared signal in a noisy environment, you'll have a much higher probability of error.
Consequences of a bad message is always the most important question here. So if you're writing the controller for a digital room thermometer and sending a message to update the display 10x a second, one bad value every 1000 messages does very little if any real harm. No checksum or a weak checksum should be fine.
If these 6 bytes fire a missile, set the position of a robotic scalpel, or cause the transfer of money, you better be damn sure you have the right checksum, and may even want to look into a cryptographic hash (which may require more RAM than you have).
For in-between stuff, with noticeable detriment to performance/satisfaction with the product, but no real harm, it's your call. For example, a TV that occasionally changes the volume instead of the channel could annoy the hell out of customers--more so than simply dropping the command if a good CRC detects an error, but if you're in the business of making cheap/knock-off TVs that might be OK if it gets product to market faster.
So what checksum do you need?
If either or both ends have HW support for a checksum built into the peripheral (fairly common in SPI for example), that might be a wise choice. Then it becomes more or less free to calculate.
An LRC, as suggested by vulkanino's answer, is the simplest algorithm.
Wikipedia has some decent info on how/why to choose a polynomial if you really need a CRC:
http://en.wikipedia.org/wiki/Cyclic_redundancy_check
If NO to question #1:
What CRC algorithm/polynomial does the other end require? That's what you're stuck with, but telling us might get you a better/more complete answer.
Thoughts on implementation:
Most of the algorithms are pretty light-weight in terms of RAM/registers, requiring only a couple extra bytes. In general, a function will result in better, cleaner, more readable, debugger-friendly code.
You should think of the macro solution as an optimization trick, and like all optimization tricks, jumping to them too early can be a waste of development time and a cause of more problems than it's worth.
Using a macro also has some strange implications you may not have considered yet:
You are aware that the preprocessor can only do the calculation if all the bytes in a message are fixed at compile time, right? If you have a variable in there, the compiler has to generate code. Without a function, that code will be inlined every time it's used (yes, that could mean lots of ROM usage). If all the bytes are variable, that code might be worse than just writing the function in C. Or with a good compiler, it might be better. Tough to say for certain. On the other hand, if a different number of bytes are variable depending on the message being sent, you might end up with several versions of the code, each optimized for that particular usage.

Catching overflow of left shift of constant 1 using compiler warning?

We're writing code inside the Linux kernel so, try as I might, I wasn't able to get PC-Lint/Flexelint working on Linux kernel code. Just too many built-in symbols etc. But that's a side issue.
We have any number of compilers, starting with gcc, but others also. Their warnings options have been getting stronger over time, to where they are pretty strong static analysis tools too.
Here is what I want to catch. Yes, I know it violates some things that are easy to catch in code review, such as "no magic numbers", and "beware of bit shifting", but that's only if you happen to look at that section of code. Anyway, here it is:
unsigned long long foo;
unsigned long bar;
[... lots of other code ...]
foo = ~(foo + (1<<bar));
Further UPDATED problem description -- even with bar limited to 16, still a problem. Clarifying, the problem is implicit int type of constant that, unplanned, makes the complex expression violate the rule that all calculations be carried out in the same size and signedness.
Problem: '1' is not long long, but, as a small-value constant, defaults to an int. Therefore even if bar's actual value never exceeds, say, 16, still the (1<<bar) expression will overflow and ruin the entire calculation.
Possibly correct solution: write 1ULL instead.
Is there a well-known compiler and compiler warning flag that will point out this (revised) problem?

I am not sure what criteria you are thinking of to flag
this construction as suspicious. There is clearly
something wrong if the value of bar is at least as large as
the size (in bits) of an int, but usually the compiler
wouldn't know that.
From the point of view of a heuristic, bug-finding tool,
having good patterns to separate likely bugs from
normal constructions is key to avoiding too many false
positives (which make users hate the tool and refuse to
use it).
The Open Source tool in my URL flags logical shifts by a number larger
than the size of the type, but it is primarily a verification
tool for critical embedded software, so expect a lot of work
to adapt it if you intend to use it on the Linux kernel
with its linked structures and other difficulties.
