Why is this a warning? I think there are many cases where it is clearer to use multi-char int constants instead of "meaningless" numbers or instead of defining const variables with the same value. When parsing wave/tiff/other file types it is clearer to compare the values read against something like 'EVAW', 'data', etc. instead of their corresponding numeric values.
Sample code:
int waveHeader = 'EVAW';
Why does this give a warning?
According to the standard (§6.4.4.4/10)
The value of an integer character constant containing more than one character (e.g., 'ab'), [...] is implementation-defined.
long x = '\xde\xad\xbe\xef'; // yes, single quotes
This is valid ISO 9899:2011 C. It compiles without warning under gcc with -Wall, and produces a “multi-character character constant” warning with -pedantic.
From Wikipedia:
Multi-character constants (e.g. 'xy') are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified, portable use of multi-character constants is difficult.
For portability's sake, don't use multi-character constants with integral types.
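A minimal demo of why that matters (the printed value, and whether it matches the bytes read from a file, is implementation-defined; gcc will also emit the multichar warning for it):

#include <stdio.h>

int main(void)
{
    /* How the four characters of 'EVAW' are packed into an int differs
       between compilers and platforms, so this value is not portable. */
    int waveHeader = 'EVAW';
    printf("'EVAW' == 0x%08X\n", (unsigned)waveHeader);
    return 0;
}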
This warning is useful for programmers who would mistakenly write 'test' where they meant "test".
That happens much more often than programmers actually wanting multi-char int constants.
If you're happy you know what you're doing and can accept the portability problems, on GCC for example you can disable the warning on the command line:
-Wno-multichar
I use this for my own apps to work with AVI and MP4 file headers for similar reasons to you.
Even if you're willing to look up what behavior your implementation defines, multi-character constants will still vary with endianness.
Better to use a (POD) struct { char c[4]; }; ... and then use a UDL like "WAVE"_4cc to easily construct instances of that class
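For plain C, a sketch of a portable alternative (not from the comment above, which suggests a C++ user-defined literal): compare the four raw bytes directly, which sidesteps both the implementation-defined packing and the endianness issue.

#include <string.h>

/* Returns nonzero if a 4-byte tag read from the file is "WAVE".  Comparing
   the raw bytes avoids relying on how a multi-char constant packs
   characters into an int. */
int is_wave_tag(const unsigned char tag[4])
{
    return memcmp(tag, "WAVE", 4) == 0;
}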
The simplest C/C++ solution, compliant with any compiler/standard, was mentioned by @leftaroundabout in the comments above:
int x = *(int*)"abcd";
Or a bit more specific:
int x = *(int32_t*)"abcd";
One more solution, also compliant with C/C++ compilers/standards since C99 (except clang++, which has a known bug):
int x = ((union {char s[5]; int number;}){"abcd"}).number;
/* just a demo check: */
printf("x=%d stored %s byte first\n", x, x==0x61626364 ? "MSB":"LSB");
Here an anonymous union is used to give a nice symbol name to the desired numeric result, and the string "abcd" is used to initialize the compound literal (C99).
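For reference, a complete, compilable version of the snippet above (the printed value and the MSB/LSB result depend on the implementation):

#include <stdio.h>

int main(void)
{
    /* Compound literal of an anonymous union type, initialized from the
       string "abcd"; reading .number reinterprets those bytes as an int. */
    int x = ((union { char s[5]; int number; }){"abcd"}).number;
    printf("x=%d stored %s byte first\n", x, x == 0x61626364 ? "MSB" : "LSB");
    return 0;
}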
If you want to disable this warning, it is important to know that there are two related warning options in GCC and Clang: -Wno-four-char-constants and -Wno-multichar
Related
I switched to fixed-length integer types in my projects mainly because they help me think about integer sizes more clearly when using them. Including them via #include <inttypes.h> also includes a bunch of other macros like the printing macros PRIu32, PRIu64,...
To assign a constant value to a fixed length variable I can use macros like UINT32_C() and INT32_C(). I started using them whenever I assigned a constant value.
This leads to code similar to this:
uint64_t i;
for (i = UINT64_C(0); i < UINT64_C(10); i++) { ... }
Now I saw several examples which did not care about that. One is the stdbool.h include file:
#define bool _Bool
#define false 0
#define true 1
bool has a size of 1 byte on my machine, so it does not look like an int. But 0 and 1 are integers, which should be turned into the right type automatically by the compiler. If I used that approach in my example, the code would be much easier to read:
uint64_t i;
for (i = 0; i < 10; i++) { ... }
So when should I use the fixed-length constant macros like UINT32_C() and when should I leave that work to the compiler (I'm using GCC)? What if I were writing code under MISRA C?
As a rule of thumb, you should use them when the type of the literal matters. There are two things to consider: the size and the signedness.
Regarding size:
The int type is guaranteed by the C standard to hold values up to at least 32767. Since an integer literal can't have a type smaller than int, values up to 32767 don't need the macros. If you need larger values, then the type of the literal starts to matter and it is a good idea to use those macros.
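A small sketch of where size starts to matter (the shift example is my own illustration, not taken from the question):

#include <stdint.h>

uint32_t small = 30000;            /* <= 32767: always fits in an int, no macro needed */

/* Once an intermediate result must be wider than int, the literal's type matters: */
/* uint64_t bad = 1 << 40; */      /* undefined: the shift would be done in plain int */
uint64_t good = UINT64_C(1) << 40; /* the literal is at least 64 bits wide            */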
Regarding signedness:
Integer literals with no suffix are usually of a signed type. This is potentially dangerous, as it can cause all manner of subtle bugs during implicit type promotion. For example (my_uint8_t + 1) << 31 would cause an undefined behavior bug on a 32 bit system, while (my_uint8_t + 1u) << 31 would not.
This is why MISRA has a rule stating that all integer literals should have a u/U suffix if the intention is to use unsigned types. So in my example above you could use my_uint8_t + UINT32_C(1), but you could just as well use 1u, which is perhaps the most readable. Either should be fine for MISRA.
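A minimal sketch of the promotion pitfall described above (the function and variable names are made up for illustration):

#include <stdint.h>

uint32_t make_mask(uint8_t b)
{
    /* (b + 1) << 31 would be computed in signed int after integer promotion,
       which is undefined behaviour on a platform with 32-bit int. */

    /* The 1u forces the arithmetic into unsigned int, which is well defined;
       UINT32_C(1) would work just as well. */
    return (b + 1u) << 31;
}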
As for why stdbool.h defines true/false to be 1/0, it is because the standard explicitly says so. Boolean conditions in C still use int type, and not bool type like in C++, for backwards compatibility reasons.
It is however considered good style to treat boolean conditions as if C had a true boolean type. MISRA-C:2012 has a whole set of rules regarding this concept, called essentially boolean type. This can give better type safety during static analysis and also prevent various bugs.
It's for integer literals used in contexts where the compiler won't convert them to the correct size on its own.
I've worked on an embedded platform where int is 16 bits and long is 32 bits. If you were trying to write portable code to work on platforms with either 16-bit or 32-bit int types, and wanted to pass a 32-bit "unsigned integer literal" to a variadic function, you'd need the cast:
#define BAUDRATE UINT32_C(38400)
printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
On the 16-bit platform, the macro expands to 38400UL and on the 32-bit platform to just 38400U. Those will match the PRIu32 macro of either "lu" or "u".
I think that most compilers would generate identical code for (uint32_t) X as for UINT32_C(X) when X is an integer literal, but that might not have been the case with early compilers.
I'm trying to do this:
uint64_t key = 0110000010110110011001101111101000111111111010001011000110001110;
Doesn't work. GCC says
warning: integer constant is too large for its type
Any idea why?
Although neither the draft C99 standard nor the draft C11 standard supports binary literals, gcc (which you specifically mention) has an extension for binary literals, which says:
Integer constants can be written as binary constants, consisting of a sequence of ‘0’ and ‘1’ digits, prefixed by ‘0b’ or ‘0B’. This is particularly useful in environments that operate a lot on the bit level (like microcontrollers).
The docs give the following example:
i = 0b101010;
and it looks like clang supports this as an extension as well:
[...]binary literals (for instance, 0b10010) are recognized. Clang supports this feature as an extension in all language modes.
Binary literals are not available in standard C++ either until C++14 [lex.icon].
Due to the leading 0 that's read as a very large octal literal, which is probably not what you want. Amazingly, C doesn't have binary literals.
As an extension, gcc supports binary literals with the 0b prefix, eg, 0b101010. If you don't want to rely on extensions, hexadecimal is probably the most reasonable alternative.
Use hexadecimal: it is more compact and can represent binary directly. Example:
0x01 // This is 0001
0x02 // 0010
Writing out binary the way you want is particularly ugly in code.
Also, a number starting with 0 (as in your example) will be interpreted as octal.
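Applying that to the constant from the question, the hexadecimal equivalent (the same value the example below prints) would be:

#include <stdint.h>

/* Each hex digit encodes four of the original binary digits. */
uint64_t key = UINT64_C(0x60B666FA3FE8B18E);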
To add to @Shafik's answer:
#include <inttypes.h>
#include <stdio.h>
int main(void)
{
    uint64_t i = UINT64_C(
        0b0110000010110110011001101111101000111111111010001011000110001110);
    printf("%" PRIX64 "\n", i);
    return 0;
}
Worked with gcc -std=gnu99; output: 60B666FA3FE8B18E
Embedded systems are much stricter about extensions, particularly if you're unfortunate enough to be required to adhere to standards like MISRA C. Converting to hexadecimal is just grouping the binary digits four at a time, each group becoming 0x0 through 0xF.
I think that is because you exceeded the maximum integer that can be stored in a uint datatype.
Try using an array of boolean type.
My friend says he read on some SO page that they are different, but how could the two possibly be different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In the first case, we are initializing the integer i to 999, then initializing c with i, which is in fact 999. In the second case, we initialize c directly with 999. The truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says: "It's not the same thing. The first is an assignment, the second is an initialization."
So isn't it a lot more than only a question of optimization by the compiler?
They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i is created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.
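A small demo of the point that both forms go through the same conversion (the exact values are implementation-defined, as described above, and the constant initializer is the one more likely to draw a warning):

#include <stdio.h>

int main(void)
{
    int  i  = 999;
    char c1 = i;    /* initialization from a non-constant int value */
    char c2 = 999;  /* initialization from a constant expression    */

    /* Both use the same int-to-char conversion, so c1 == c2: typically 231
       if plain char is unsigned, or -25 if it is signed on a two's-complement
       system with 8-bit char. */
    printf("c1=%d c2=%d\n", c1, c2);
    return 0;
}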
Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)
It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.
Aside from the fact that the first also defines an object iof type int, the semantics are identical.
i, which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.
I am currently dealing with code purchased from a third party contractor. One struct has an unsigned char field while the function that they are passing that field to requires a signed char. The compiler does not like this, as it considers them to be mismatched types. However, it apparently compiles for that contractor. Some Googling has told me that "[i]t is implementation-defined whether a char object can hold negative values". Could the contractor's compiler basically ignore the signed/unsigned type and treat them the same? Or is there a compiler flag that will treat them the same?
C is not my strongest language--just look at my tags on my user page--so any help would be much appreciated.
Actually char, signed char and unsigned char are three different types. From the standard (ISO/IEC 9899:1990):
6.1.2.5 Types
...
The three types char, signed char and unsigned char are collectively called the character types.
(and in C++, for instance, you have to (or at least should) write overloads for all three of them if you have a char argument)
Plain char might be treated signed or unsigned by the compiler, but the standard says (also in 6.1.2.5):
An object declared as type char is large enough to store any member of the basic execution character set. If a member of the required source character set in 5.2.1 is stored in a char object, its value is guaranteed to be positive. If other quantities are stored in a char object, the behavior is implementation-defined: the values are treated as either signed or nonnegative integers.
and
An object declared as type signed char occupies the same amount of storage as a ''plain'' char object.
The characters referred to in 5.2.1 are A-Z, a-z, 0-9, space, tab, newline and the following 29 graphic characters:
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
Answer
All of that I interpret to basically mean that ASCII characters with a value less than 128 are guaranteed to be positive. So if the values stored are always less than 128, it should be safe (from a value-preserving perspective), although not good practice.
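A quick way to see the "three different types" point on a modern compiler (this sketch uses C11 _Generic, which is newer than the C90 text quoted above):

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),        \
        char:          "char",            \
        signed char:   "signed char",     \
        unsigned char: "unsigned char",   \
        default:       "something else")

int main(void)
{
    char c = 'A';
    signed char sc = 'A';
    unsigned char uc = 'A';

    /* Prints "char / signed char / unsigned char": all three are distinct
       types regardless of whether plain char is signed or unsigned here. */
    printf("%s / %s / %s\n", TYPE_NAME(c), TYPE_NAME(sc), TYPE_NAME(uc));
    return 0;
}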
This is compiler-dependent. For example, in VC++ there's a compiler option to make plain char unsigned by default, and a corresponding _CHAR_UNSIGNED macro that is defined when that option is in effect.
I take it that you're talking about fields of type signed char and unsigned char, so the mismatch is explicit. If one of them was simply char, it might match in whatever compiler the contractor is using (IIRC, it's implementation-defined whether char is signed or unsigned), but not in yours. In that case, you might be able to get by with a command-line option or something to change yours.
Alternatively, the contractor might be using a compiler, or compiler options, that allow him to compile while ignoring errors or warnings. Do you know what sort of compilation environment he has?
In any case, this is not good C. If one of the types is just char, it relies on implementation-defined behavior, and therefore isn't portable. If not, it's flat wrong. I'd take this up with the contractor.
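If the command-line route is an option, a small sketch of checking and overriding plain char's signedness on gcc/clang (-fsigned-char and -funsigned-char are gcc/clang options, not standard C):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 when plain char is unsigned and negative when it is
       signed; gcc and clang let you override the platform default with
       -fsigned-char or -funsigned-char. */
    printf("plain char is %s here\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}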