Assign bit field member to char - c

I have some code here that uses bit-fields to store many 1-bit values in a char.
Basically:
struct BITS_8 {
    char _1:1;
    (...)
    char _8:1;
};
Now I was trying to pass one of these bits as a parameter into a function:
void func(char bit){
    if(bit){
        // do something
    }else{
        // do something else
    }
}
// and the call was
struct BITS_8 bits;
// all bits were set to 0 before
bits._7 = 1;
bits._8 = 0;
func(bits._8);
The solution was to single the bit out when calling the function:
func(bits._8 & 0x80);
But I kept going into // do something because other bits were set. I was wondering if this is the correct behaviour or if my compiler is broken. The compiler is an embedded compiler that produces code for Freescale ASICs.
EDIT: two mistakes: the passing of the parameter, and that bits._8 should have been 0, or else the error would make no sense.
Clarification
I am interested in what the standard has to say about the assignment:
struct X {
    unsigned int k:6;
    unsigned int y:1;
    unsigned int z:1;
};
struct X x;
x.k = 0;
x.y = 1;
x.z = 0;
char t = x.y;
Should t now contain 1 or 0b00000010?

I don't think you can set a 1-bit char bit-field to 1. If the field is signed, that one bit is the sign bit, so you end up with the difference between 0 and -1.
What you want is an unsigned char, to hide this sign bit. Then you can just use 1s and 0s.
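For what it's worth, here is a small sketch of that difference; it uses int bit-fields, since whether a plain char bit-field is signed is implementation-defined:
#include <stdio.h>

struct demo {
    signed int   s : 1;   /* can only hold -1 and 0 */
    unsigned int u : 1;   /* can only hold 0 and 1  */
};

int main(void)
{
    struct demo d;
    d.s = 1;   /* 1 doesn't fit in a 1-bit signed field; the stored value is implementation-defined, typically -1 */
    d.u = 1;
    printf("s = %d, u = %d\n", d.s, d.u);   /* commonly prints: s = -1, u = 1 */
    return 0;
}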

Over the years I've grown a bit suspicious when it comes to these specific compilers; quite often their interpretation of what could be described as the finer points of the C standard isn't great. In those cases a more pragmatic approach can help you avoid insanity and get the job done. What this means in this case is that, if the compiler isn't behaving rationally (which you could define as behaving totally differently from what gcc does), you help it a bit.
For example you could modify func to become:
void func(char bit){
    if(bit & 1){ // mask off everything that's none of its business
        // do something
    }else{
        // do something else
    }
}
And with regard to your add-on question: t should contain 1. It can't contain binary 10, because the bit-field it is read from is only 1 bit wide.
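A quick way to convince yourself is a small test program along these lines (a sketch for a hosted compiler, not your embedded target):
#include <stdio.h>

struct X {
    unsigned int k:6;
    unsigned int y:1;
    unsigned int z:1;
};

int main(void)
{
    struct X x;
    x.k = 0;
    x.y = 1;
    x.z = 0;
    char t = x.y;           /* reading the bit-field yields the value 1, not a shifted bit */
    printf("t = %d\n", t);  /* prints: t = 1 */
    return 0;
}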
Some reading material from the last publicly available draft for C99
Paragraph 6.7.2.1
4 A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other
implementation-defined type.
(which to me means that the compiler should at least handle your second example correctly) and
9 A bit-field is interpreted as a signed or unsigned integer type
consisting of the specified number of bits.107) If the value 0 or 1 is
stored into a nonzero-width bit-field of type
_Bool, the value of the bit-field shall compare equal to the value stored.
As your compiler is meant for embedded use, where bitfields are a rather important feature, I'd expect their implementation to be correct - if they claim to have bitfields implemented.

I can see why you might want to use char to store flags if Bool is typedef'd to an unsigned int or similar, or if your compiler does not support packed enums. You can set up some "flags" using macros (I named them for their place values below, but it makes more sense to name them for what they mean). If 8 flags are not enough, this can be extended to other types, up to sizeof(type)*8 flags.
/* if you cannot use a packed enum, you can try these macros
 * in combination with bit operations on the "flags"
 * example printf output is: 1,1,6,4,1
 * hopefully the example doesn't make it more confusing
 * if so see fvu's example */
#include <stdio.h> /* for printf */

#define ONE            ((char) 1)
#define TWO            (((char) 1)<<1)
#define FOUR           (((char) 1)<<2)
#define EIGHT          (((char) 1)<<3)
#define SIXTEEN        (((char) 1)<<4)
#define THIRTYTWO      (((char) 1)<<5)
#define SIXTYFOUR      (((char) 1)<<6)
#define ONETWENTYEIGHT (((char) 1)<<7)

int main(void){
    printf("%d,%d,%d,%d,%d\n", ONE & ONE, ONE | ONE, TWO | 6, FOUR & 6, (int) sizeof(ONE));
    return 0;
}
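Continuing that example, typical use of the macros as flags could look like the fragment below (the variable name flags is just for illustration; it would live inside a function such as the main above):
char flags = 0;          /* no flags set */
flags |= ONE | FOUR;     /* set two flags */
flags &= ~FOUR;          /* clear one of them again */
if (flags & ONE) {
    /* the ONE flag is set */
}
if (!(flags & TWO)) {
    /* the TWO flag is not set */
}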

Ok, it was a compiler bug. I wrote to the people that produce the compiler and they confirmed it. It only happens when a bit-field is used with an inline function.
Their recommended solution (if I don't want to wait for a patch) is:
func((char)bits._8);
The other answers are right about the correct behaviour. Still, I'm marking this one as the answer, as it is the answer to my problem.

Related

What's the point in specifying unsigned integers with "U"?

I have always, for as long as I can remember and ubiquitously, done this:
for (unsigned int i = 0U; i < 10U; ++i)
{
    // ...
}
In other words, I use the U specifier on unsigned integers. Now having just looked at this for far too long, I'm wondering why I do this. Apart from signifying intent, I can't think of a reason why it's useful in trivial code like this?
Is there a valid programming reason why I should continue with this convention, or is it redundant?
First, I'll state what is probably obvious to you, but your question leaves room for it, so I'm making sure we're all on the same page.
There are obvious differences between unsigned ints and regular ints: the difference in their range (-2,147,483,648 to 2,147,483,647 for an int32 versus 0 to 4,294,967,295 for a uint32), and a difference in which bits are shifted in at the most significant end when you use the right-shift >> operator.
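If it helps, here is a small sketch of that right-shift difference; shifting a negative signed value right is implementation-defined, so the first result is only what common two's-complement compilers produce:
#include <stdio.h>

int main(void)
{
    int          s = -16;          /* bit pattern 0xFFFFFFF0 on a 32-bit two's-complement int */
    unsigned int u = 0xFFFFFFF0u;

    printf("%d\n", s >> 2);   /* commonly -4: the sign bit is copied in from the left */
    printf("%u\n", u >> 2);   /* 1073741820 (0x3FFFFFFC): zeros are shifted in */
    return 0;
}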
The suffix is important when you need to tell the compiler to treat the constant value as a uint instead of a regular int. This may be important if the constant is outside the range of a regular int but within the range of a uint. The compiler might throw a warning or error in that case if you don't use the U suffix.
Other than that, Daniel Daranas mentioned in comments the only thing that happens: if you don't use the U suffix, you'll be implicitly converting the constant from a regular int to a uint. That's a tiny bit extra effort for the compiler, but there's no run-time difference.
Should you care? Here's my answer (in bold, for those who only want a quick answer): There's really no good reason to declare a constant as 10U or 0U. Most of the time, you're within the common range of uint and int, so the value of that constant looks exactly the same whether it's a uint or an int. The compiler will immediately take your const int expression and convert it to a const uint.
That said, here's the only argument I can give you for the other side: semantics. It's nice to make code semantically coherent. And in that case, if your variable is a uint, it doesn't make sense to set that value to a constant int. If you have a uint variable, it's clearly for a reason, and it should only work with uint values.
That's a pretty weak argument, though, particularly because as a reader, we accept that uint constants usually look like int constants. I like consistency, but there's nothing gained by using the 'U'.
I see this often when using defines to avoid signed/unsigned mismatch warnings. I build a code base for several processors using different tool chains and some of them are very strict.
For instance, removing the 'u' in the MAX_PRINT_WIDTH define below:
#define MAX_PRINT_WIDTH (384u)
#define IMAGE_HEIGHT (480u) // 240 * 2
#define IMAGE_WIDTH (320u) // 160 * 2 double density
Gave the following warning:
"..\Application\Devices\MartelPrinter\mtl_print_screen.c", line 106: cc1123: {D} warning:
comparison of unsigned type with signed type
for ( x = 1; (x < IMAGE_WIDTH) && (index <= MAX_PRINT_WIDTH); x++ )
You will probably also see 'f' for float vs. double.
I extracted this sentence from a comment, because it's a widely believed incorrect statement, and also because it gives some insight into why explicitly marking unsigned constants as such is a good habit.
...it seems like it would only be useful to keep it when I think overflow might be an issue? But then again, haven't I gone some ways to mitigating for that by specifying unsigned in the first place...
Now, let's consider some code:
int something = get_the_value();
// Compute how many 8s are necessary to reach something
unsigned count = (something + 7) / 8;
So, does the unsigned mitigate potential overflow? Not at all.
Let's suppose something turns out to be INT_MAX (or close to that value). Assuming a 32-bit machine, we might expect count to be 2^28, or 268,435,456. But it's not.
Telling the compiler that the result of the computation should be unsigned has no effect whatsoever on the typing of the computation. Since something is an int, and 7 is an int, something + 7 will be computed as an int, and will overflow. Then the overflowed value will be divided by 8 (also using signed arithmetic), and whatever that works out to be will be converted to an unsigned and assigned to count.
With GCC, arithmetic is actually performed in two's complement, so the overflow will be a very large negative number; after the division it will be a not-so-large negative number, and that ends up being a largish unsigned number, much larger than the one we were expecting.
Suppose we had specified 7U instead (and maybe 8U as well, to be consistent). Now it works. It works because now something + 7U is computed with unsigned arithmetic, which doesn't overflow (or even wrap around).
Of course, this bug (and thousands like it) might go unnoticed for quite a lot of time, blowing up (perhaps literally) at the worst possible moment...
(Obviously, making something unsigned would have mitigated the problem. Here, that's pretty obvious. But the definition might be quite a long way from the use.)
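To make that concrete, here is a small sketch; the first computation is undefined behaviour, so its output is merely what one typically observes on a two's-complement machine:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int something = INT_MAX;

    unsigned bad  = (something + 7) / 8;    /* signed overflow: undefined behaviour */
    unsigned good = (something + 7U) / 8;   /* unsigned arithmetic: no overflow     */

    printf("bad  = %u\n", bad);    /* typically a huge, wrong value    */
    printf("good = %u\n", good);   /* 268435456, i.e. 2^28 as expected */
    return 0;
}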
One reason you should do this for trivial code [1] is that the suffix forces a type on the literal, and the type may be very important to produce the correct result.
Consider this bit of (somewhat silly) code:
#define magic_number(x) _Generic((x), \
    unsigned int : magic_number_unsigned, \
    int          : magic_number_signed \
)(x)

unsigned magic_number_unsigned(unsigned n) {
    // ...
    return n;
}

unsigned magic_number_signed(int n) {
    // ...
    return (unsigned) n;
}

int main(void) {
    unsigned magic = magic_number(10u);
    (void) magic;
}
It's not hard to imagine those functions actually doing something meaningful based on the type of their argument. Had I omitted the suffix, the generic selection would have produced a wrong result for a very trivial call.
[1] But perhaps not the particular code in your post.
In this case, it's completely useless.
In other cases, a suffix might be useful. For instance:
#include <stdio.h>

int main(void)
{
    printf("%zu\n", sizeof(123));
    printf("%zu\n", sizeof(123LL));
    return 0;
}
On my system, it will print 4 then 8.
But back to your code, yes it makes your code more explicit, nothing more.

When should I use UINT32_C(), INT32_C(),... macros in C?

I switched to fixed-length integer types in my projects mainly because they help me think about integer sizes more clearly when using them. Including them via #include <inttypes.h> also includes a bunch of other macros like the printing macros PRIu32, PRIu64,...
To assign a constant value to a fixed length variable I can use macros like UINT32_C() and INT32_C(). I started using them whenever I assigned a constant value.
This leads to code similar to this:
uint64_t i;
for (i = UINT64_C(0); i < UINT64_C(10); i++) { ... }
Now I saw several examples which did not care about that. One is the stdbool.h include file:
#define bool _Bool
#define false 0
#define true 1
bool has a size of 1 byte on my machine, so it does not look like an int. But 0 and 1 should be integers which are turned automatically into the right type by the compiler. If I used that approach in my example, the code would be much easier to read:
uint64_t i;
for (i = 0; i < 10; i++) { ... }
So when should I use the fixed-length constant macros like UINT32_C(), and when should I leave that work to the compiler (I'm using GCC)? What if I were writing code under MISRA C?
As a rule of thumb, you should use them when the type of the literal matters. There are two things to consider: the size and the signedness.
Regarding size:
An int type is guaranteed by the C standard to hold values up to at least 32767. Since you can't get an integer literal with a smaller type than int, values up to 32767 should not need the macros. If you need larger values, then the type of the literal starts to matter and it is a good idea to use those macros.
Regarding signedness:
Integer literals with no suffix are usually of a signed type. This is potentially dangerous, as it can cause all manner of subtle bugs during implicit type promotion. For example, (my_uint8_t + 1) << 31 would cause an undefined behavior bug on a system with 32-bit int, while (my_uint8_t + 1u) << 31 would not.
This is why MISRA has a rule stating that all integer literals should have a u/U suffix if the intention is to use unsigned types. So in my example above you could use my_uint8_t + UINT32_C(1), but you could just as well use 1u, which is perhaps the most readable. Either should be fine for MISRA.
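A sketch of that promotion trap, assuming a 32-bit int and the fixed-width types from <stdint.h>:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t b = 0xFF;

    /* b is promoted to (signed) int, so with a plain 1 the shift below would
       push a bit into the sign position of int: undefined behaviour.          */
    /* uint32_t oops = (b + 1) << 31; */

    /* With the u suffix the whole expression is unsigned and well defined. */
    uint32_t ok = (b + 1u) << 31;

    printf("%u\n", (unsigned) ok);   /* 0: 0x100 << 31 wraps modulo 2^32 */
    return 0;
}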
As for why stdbool.h defines true/false to be 1/0, it is because the standard explicitly says so. Boolean conditions in C still use int type, and not bool type like in C++, for backwards compatibility reasons.
It is however considered good style to treat boolean conditions as if C had a true boolean type. MISRA-C:2012 has a whole set of rules regarding this concept, called essentially boolean type. This can give better type safety during static analysis and also prevent various bugs.
It's for using smallish integer literals where the context won't result in the compiler casting it to the correct size.
I've worked on an embedded platform where int is 16 bits and long is 32 bits. If you were trying to write portable code to work on platforms with either 16-bit or 32-bit int types, and wanted to pass a 32-bit "unsigned integer literal" to a variadic function, you'd need the cast:
#define BAUDRATE UINT32_C(38400)
printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
On the 16-bit platform, the cast creates 38400UL and on the 32-bit platform just 38400U. Those will match the PRIu32 macro of either "lu" or "u".
I think that most compilers would generate identical code for (uint32_t) X as for UINT32_C(X) when X is an integer literal, but that might not have been the case with early compilers.
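For reference, here is a self-contained version of the BAUDRATE example above, with the headers the macros come from:
#include <inttypes.h>   /* PRIu32, and UINT32_C via <stdint.h> */
#include <stdio.h>

#define BAUDRATE UINT32_C(38400)

int main(void)
{
    /* The literal gets a type of at least 32 bits even where int is 16-bit,
       and PRIu32 supplies the matching printf conversion.                    */
    printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
    return 0;
}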

How to define type-less string of 1-bits

I'm writing code for a flash driver and I sometimes have to check whether or not certain locations are empty i.e. contain 0xff.
Sometimes I check bytes, sometimes longer values. So I wanted to have a define that captures a type-less string of 1-bits that I can use to compare against.
It should work for signed and unsigned variables of 8, 16 and 32 bit wide.
I've tried the following two:
#define ERASED (-1)
#define ERASED (~0)
If I then compile the following code (all unsigned variables):
if (some32bitvar == ERASED)
{
    ....
}
if (some16bitvar == ERASED)
{
    ....
}
if (some8bitvar == ERASED)
{
    ....
}
The compiler is happy with the 16 and 32 bits variables, but complains about the 8 bit variable:
[Warning(ccom)] this comparison is always true
Note: The architecture and compiler are 16-bit, so an int is 16 bit.
The compiler is correct because it extends some8bitvar to an int, which yields 0x00nn. It then compares it against ERASED which is also extended to an int and thus yields 0xffff.
A trivial solution is to use:
if (~some8bitvar)
{
....
}
But that relies on implicit knowledge of what erased flash contains, which I want to abstract away, and it obscures what the statement does.
Another solution is to use:
if (some8bitvar == (typeof(some8bitvar)) ERASED)
{
....
}
But that makes the code less clear and my compiler doesn't support the typeof macro.
How can I define ERASED to make comparing to it work for all types?
Define a macro like
#define IsErased(val) ((~(val))==0)
and use it like
if (IsErased(someXXbitvar)) ...
This gives you the abstraction you want and keeps the code readable.
(If you have a modern C11 compiler, you can try to implement this with overloaded functions and the _Generic keyword, providing a different implementation for each type.) But I guess the above solution will be sufficient for your case; it is pretty simple and will also work with older compilers.
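As a rough illustration of that C11 route (the helper names and the choice of fixed-width unsigned types are just assumptions here), the dispatch could look like this:
#include <stdbool.h>
#include <stdint.h>

static bool is_erased_u8 (uint8_t v)  { return v == UINT8_MAX;  }
static bool is_erased_u16(uint16_t v) { return v == UINT16_MAX; }
static bool is_erased_u32(uint32_t v) { return v == UINT32_MAX; }

#define IsErased(val) _Generic((val), \
    uint8_t:  is_erased_u8,           \
    uint16_t: is_erased_u16,          \
    uint32_t: is_erased_u32           \
)(val)

/* signed variants could be added to the _Generic list in the same way */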
I think that your problem is that you use char for 8-bit values, and on your compiler char is unsigned. When you do the comparison, the 8-bit value is promoted to an int value in the range 0-255, which cannot be equal to -1.
But if your 8-bit value is explicitly a signed char, it will be promoted to an int in the range -128 to 127 (assuming you have no negative zero). So this code should be accepted by your compiler if it is conformant:
signed char val = -1;
if (val == -1) {
    // this branch will be executed
}

storage of bool in c under various compilers and optimization levels

trivial example program:
#include <stdio.h>
main()
{
bool tim = true;
bool rob = false;
bool mike = true;
printf("%d, %d, %d\n", tim, rob, mike);
}
Using the gcc compiler it appears, based on looking at the assembly output, that each bool is stored in an individual byte:
0x4004fc <main()+8> movb $0x1,-0x3(%rbp)
0x400500 <main()+12> movb $0x0,-0x2(%rbp)
0x400504 <main()+16> movb $0x1,-0x1(%rbp)
If, however, one turns optimization on, is there a level of optimization that will cause gcc to store these bools as bits in a single byte, or would one have to put the bools in a union of some bools and a short int? What about other compilers? I have tried '-Os', but I must admit I can't make heads or tails of the output disassembly.
A compiler can perform any transformations it likes, as long as the resulting behavior of the program is unaffected, or at least within the range of permitted behaviors.
This program:
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
    bool tim = true;
    bool rob = false;
    bool mike = true;
    printf("%d, %d, %d\n", tim, rob, mike);
}
(which I've modified a bit to make it valid) could be optimized to the equivalent of this, since the behavior is identical:
#include <stdio.h>
int main(void)
{
    puts("1, 0, 1");
}
So the three bool objects aren't just stored in single bits, they're not stored at all.
A compiler is free to play games like that as long as they don't affect the visible behavior. For example, since the program never uses the addresses of the three bool variables, and never refers to their sizes, a compiler could choose to store them all as bits within a single byte. (There's little reason to do so; the increase in the size of the code needed to access individual bits would outweigh any savings in data size.)
But that kind of aggressive optimization probably isn't what you're asking about.
In the "abstract machine", a bool object must be a least one byte unless it's a bit field.
A bool object, or any object other than a bit field, must have an unique address, and must have a size that's a whole multiple of 1 byte. If you print the value of sizeof (bool) or sizeof tim, the result will be at least 1. If you print the addresses of the three objects, they will be unique and at least one byte apart.
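A quick sketch that makes those guarantees visible:
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool tim = true, rob = false, mike = true;

    printf("sizeof(bool) = %zu\n", sizeof(bool));   /* at least 1 */
    /* three distinct addresses, at least one byte apart */
    printf("%p %p %p\n", (void *) &tim, (void *) &rob, (void *) &mike);
    return 0;
}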
@Keith Thompson's good answer explains what happened with the code example in the question. But I'll assume that the compiler doesn't transform the program. According to the standard, a bool (a macro in stdbool.h for the keyword _Bool) must have a size of one byte.
C99 6.2.6.1 General
Except for bit-fields, objects are composed of contiguous sequences of one or more bytes,
the number, order, and encoding of which are either explicitly specified or
implementation-defined.
This means that objects of any type (except bit-fields), including bool, must occupy at least one byte.
C99 6.3.1.1 Boolean, characters, and integers
The rank of _Bool shall be less than the rank of all other standard integer types.
This means bool's size is no more than a char's (which is an integer type). And we also know that the size of a char is guaranteed to be one byte. So the size of bool should be at most one byte.
Conclusion: the size of bool must be one byte.
You would not use a union of bools. Instead you can say
struct a
{
    unsigned char tim : 1;
    unsigned char rob : 1;
    unsigned char mike : 1;
} b;
b.tim = 1;
b.rob = 0;
b.mike = 1;
and it would all get stored in a single char. However, you would not have any guarantees about how it's laid out in memory or how it's aligned.
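A short sketch of that; the exact size and layout are implementation-defined, but one byte is what common compilers produce:
#include <stdio.h>

struct a {
    unsigned char tim  : 1;
    unsigned char rob  : 1;
    unsigned char mike : 1;
};

int main(void)
{
    struct a b = { 1, 0, 1 };

    printf("sizeof(struct a) = %zu\n", sizeof(struct a));   /* commonly 1 */
    printf("%d, %d, %d\n", b.tim, b.rob, b.mike);           /* 1, 0, 1    */
    return 0;
}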

will ~ operator change the data type?

When I read someone's code I found that he had bothered to write an explicit type cast:
#define ULONG_MAX ((unsigned long int) ~(unsigned long int) 0)
When I write code
#include <stdio.h>
int main(void)
{
    unsigned long int max;
    max = ~(unsigned long int)0;
    printf("%lx", max);
    return 0;
}
it works as well. Is it just a meaningless coding style?
The code you read is very bad, for several reasons.
First of all user code should never define ULONG_MAX. This is a reserved identifier and must be provided by the compiler implementation.
That definition is not suitable for use in a preprocessor #if. The _MAX macros for the basic integer types must be usable there.
(unsigned long)0 is just crap. Everybody should just use 0UL, unless you know that you have a compiler that is not compliant with the recent C standards in that respect. (I don't know of any.)
Even ~0UL should not be used for that value, since unsigned long may (theoretically) have padding bits. -1UL is more appropriate, because it doesn't deal with the bit pattern of the value. It uses the guaranteed arithmetic properties of unsigned integer types. -1 will always be the maximum value of an unsigned type. So ~ may only be used in a context where you are absolutely certain that unsigned long has no padding bits. But as such using it makes no sense. -1 serves better.
"recasting" an expression that is known to be unsigned long is just superfluous, as you observed. I can't imagine any compiler that bugs on that.
Recasting of expression may make sense when they are used in the preprocessor, but only under very restricted circumstances, and they are interpreted differently, there.
#if ((uintmax_t)-1UL) == SOMETHING
..
#endif
Here the value on the left evaluates to UINTMAX_MAX in the preprocessor and in later compiler phases. So
#define UINTMAX_MAX ((uintmax_t)-1UL)
would be an appropriate definition for a compiler implementation.
To see the value for the preprocessor, observe that there (uintmax_t) is not a cast but an unknown identifier token inside (), which evaluates to 0. The minus sign is then interpreted as binary minus, and so we have 0-1UL, which is unsigned and thus the maximum value of the type. But that trick only works if the cast contains a single identifier token (not three, as in your example) and if the integer constant carries a - or + sign.
They are trying to ensure that the type of the value 0 is unsigned long. When you assign zero to a variable, it gets cast to the appropriate type.
In this case, if 0 doesn't happen to be an unsigned long then the ~ operator will be applied to whatever other type it happens to be and the result of that will be cast.
This would be a problem if the compiler decided that 0 is a short or char.
However, the type after the ~ operator should remain the same. So they are being overly cautious with the outer cast, but perhaps the inner cast is justified.
They could of course have specified the correct zero type to begin with by writing ~0UL.
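As a quick check: on an implementation without padding bits in unsigned long, all three lines in this sketch print the same value.
#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("%lx\n", ULONG_MAX);                           /* the real macro from <limits.h>   */
    printf("%lx\n", (unsigned long) ~(unsigned long) 0);  /* the definition from the question */
    printf("%lx\n", -1UL);                                /* the form recommended above       */
    return 0;
}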

Resources