Storage of bool in C under various compilers and optimization levels

trivial example program:
#include <stdio.h>
main()
{
bool tim = true;
bool rob = false;
bool mike = true;
printf("%d, %d, %d\n", tim, rob, mike);
}
Using the gcc compiler it appears, based on the assembly output, that each bool is stored in its own byte:
0x4004fc <main()+8> movb $0x1,-0x3(%rbp)
0x400500 <main()+12> movb $0x0,-0x2(%rbp)
0x400504 <main()+16> movb $0x1,-0x1(%rbp)
If, however, one turns optimization on, is there an optimization level that will cause gcc to store these bools as bits within a single byte, or would one have to put the bools in a union of some bools and a short int? What about other compilers? I have tried '-Os', but I must admit I can't make heads or tails of the resulting disassembly.

A compiler can perform any transformations it likes, as long as the resulting behavior of the program is unaffected, or at least within the range of permitted behaviors.
This program:
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
bool tim = true;
bool rob = false;
bool mike = true;
printf("%d, %d, %d\n", tim, rob, mike);
}
(which I've modified a bit to make it valid) could be optimized to the equivalent of this, since the behavior is identical:
#include <stdio.h>
int main(void)
{
puts("1, 0, 1");
}
So not only are the three bool objects not stored in single bits, they're not stored at all.
A compiler is free to play games like that as long as they don't affect the visible behavior. For example, since the program never uses the addresses of the three bool variables, and never refers to their sizes, a compiler could choose to store them all as bits within a single byte. (There's little reason to do so; the increase in the size of the code needed to access individual bits would outweigh any savings in data size.)
But that kind of aggressive optimization probably isn't what you're asking about.
In the "abstract machine", a bool object must be a least one byte unless it's a bit field.
A bool object, or any object other than a bit-field, must have a unique address, and must have a size that's a whole multiple of 1 byte. If you print the value of sizeof (bool) or sizeof tim, the result will be at least 1. If you print the addresses of the three objects, they will be unique and at least one byte apart.
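For example, a minimal sketch (assuming a C99 compiler and <stdbool.h>, not part of the original answer) that makes both properties observable:
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
    bool tim = true;
    bool rob = false;
    bool mike = true;
    /* sizeof is at least 1 for any object type that isn't a bit-field */
    printf("sizeof (bool) = %zu, sizeof tim = %zu\n", sizeof (bool), sizeof tim);
    /* taking the addresses also forces the compiler to give each bool its own storage */
    printf("%p %p %p\n", (void *)&tim, (void *)&rob, (void *)&mike);
    return 0;
}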

@Keith Thompson's good answer explains what happened with the code example in the question, but here I'll assume that the compiler doesn't transform the program. According to the standard, a bool (a macro in stdbool.h that expands to the keyword _Bool) must have a size of one byte.
C99 6.2.6.1 General
Except for bit-fields, objects are composed of contiguous sequences of one or more bytes,
the number, order, and encoding of which are either explicitly specified or
implementation-defined.
This means that any object that is not a bit-field, including a bool, must occupy at least one byte.
C99 6.3.1.1 Boolean, characters, and integers
The rank of _Bool shall be less than the rank of all other standard integer types.
This means a bool's size is no more than that of a char (which is an integer type). We also know that the size of a char is guaranteed to be one byte, so the size of a bool should be at most one byte.
Conclusion: the size of bool must be one byte.

You would not use a union of bools. Instead you can say
struct a
{
unsigned char tim : 1;
unsigned char rob : 1;
unsigned char mike : 1;
} b;
b.tim=1;
b.rob=0;
b.mike=1;
and it would all get stored in a single char. However, you would not have any guarantees about how it's laid out in memory or how it's aligned.
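A self-contained sketch of that (note that unsigned char as a bit-field type is itself implementation-defined, though widely supported; the exact layout is up to the compiler):
#include <stdio.h>
struct a
{
    unsigned char tim : 1;
    unsigned char rob : 1;
    unsigned char mike : 1;
};
int main(void)
{
    struct a b = { 1, 0, 1 };
    /* with a typical compiler the three 1-bit fields share a single byte,
       so this prints 1, but padding and alignment are implementation-defined */
    printf("sizeof b = %zu\n", sizeof b);
    printf("%d, %d, %d\n", b.tim, b.rob, b.mike);
    return 0;
}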

Related

How values are changed in C unions?

#include <stdio.h>
int main()
{
typedef union{
int a ;
char c;
float f;
} myu;
myu sam;
sam.a = 10;
sam.f=(float)5.99;
sam.c= 'H';
printf("%d\n %c\n %f\n",sam.a,sam.c,sam.f);
return 0;
}
Output
1086303816
H
5.990025
How come the value of the integer has changed so drastically while the float is almost the same?
The fields of a union all share the same starting memory address. This means that writing to one member will overwrite the contents of another.
When you write one member and then read a different member, the representation of the written member (i.e. how it is laid out in memory) is reinterpreted as the representation of the read member. Integers and floating point types have very different representations, so it makes sense that reading a float as though it were an int can vary greatly.
Things become even more complicated if the two types are not the same size. If a smaller field is written and a larger field is read, the excess bytes might not have even been initialized.
In your example, you first write the value 10 to the int member. Then you write the value 5.99 to the float member. Assuming int and float are both 4 bytes in length, all of the bytes used by the int member are overwritten by the float member.
When you then change the char member, this only changes the first byte. Assuming a float is represented in little-endian IEEE754, this changes just the low-order byte of the mantissa, so only the digits furthest to the right are affected.
Try this. Instead of using printf (which will mainly output nonsense), show the raw memory after each modification.
The code below assumes that int and float are 32 bit types and that your compiler does not add padding bytes in this union.
#include <string.h>
#include <stdio.h>
#include <assert.h>
void showmemory(void* myu)
{
unsigned char memory[4];
memcpy(memory, myu, 4);
for (int i = 0; i < 4; i++)
{
printf("%02x ", memory[i]);
}
printf("\n");
}
int main()
{
typedef union {
int a;
char c;
float f;
} myu;
assert(sizeof(myu) == 4); // assume size of the union is 4 bytes
myu sam;
sam.a = 10;
showmemory(&sam);
sam.f = (float)5.99;
showmemory(&sam);
sam.c = 'H';
showmemory(&sam);
}
Possible output on a little endian system:
0a 00 00 00 // 0a is 10 in hexadecimal
14 ae bf 40 // 5.99 in float
48 ae bf 40 // 48 is 'H'
How come the value of the integer has changed so drastically while the float is almost the same?
That is just a coincidence. Your union will be stored in 4 bytes. When you assign the field "a" to 10, the binary representation of the union is 0x0000000A. When you then assign the field f to 5.99, it becomes 0x40bfae14. Finally, when you set c to 'H' (0x48 in hex), it overwrites the first byte, which corresponds to the low-order byte of the float's mantissa. Thus, the float part changes only slightly. For more information about floating-point encoding, you can check this handy website out.
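As a rough check of that encoding (assuming standard IEEE754 single precision and the little-endian layout shown above):
0x40bfae14: sign = 0, exponent = 0x81 = 129 (so a scale of 2^(129-127) = 4), mantissa field = 0x3fae14 = 4173332, giving 4 * (1 + 4173332/2^23) ≈ 5.99.
Overwriting the low byte with 'H' (0x48) changes the mantissa field to 0x3fae48 = 4173384, giving 4 * (1 + 4173384/2^23) ≈ 5.990025.
Only the least significant mantissa bits change, which is why the printed float barely moves, while reading the same four bytes as an int gives the unrelated value 0x40bfae48 = 1086303816.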
In "traditional" C, any object that are not bitfields and do not have a register storage class will represent an association between a sequence of consecutive bytes somewhere in memory and a means of reading or writing values of the object's type. Storing a value into an object of type T will convert the value into a a pattern of sizeof (T) * CHAR_BIT bits, and store that pattern into the associated memory. Reading an object of type T will read the sizeof (T) * CHAR_BIT bits from the object's associated storage and convert that bit pattern into a value of type T, without regard for how the underlying storage came to hold that bit pattern.
A union object serves to reserve space for the largest member, and then creates an association between each member and a region of storage that begins at the start of the union object. Any write to a member of a union will affect the appropriate part of the underlying storage, and any read of a union member will interpret whatever happens to be in the underlying storage as a value of its type. This will be true whether the member is accessed directly, or via pointer or array syntax.
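A minimal sketch of that model (the exact bytes printed depend on the endianness and floating-point format of the machine):
#include <stdio.h>
union overlay {
    float f;
    unsigned char bytes[sizeof(float)];
};
int main(void)
{
    union overlay u;
    u.f = 5.99f;
    /* reading the bytes member reinterprets whatever bit pattern
       the float assignment left in the shared storage */
    for (size_t i = 0; i < sizeof u.bytes; i++)
        printf("%02x ", u.bytes[i]);
    printf("\n");
    return 0;
}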
The "traditional C" model is really quite simple. The C Standard, however, is much more complicated because the authors wanted to allow implementations to deviate from that behavior when doing so wouldn't interfere with whatever their customers need to do. This in turn has been interpreted by some compiler writers as an invitation to deviate from the traditional behavior without any regard for whether the traditional behavior might be useful in more circumstances than the bare minimums mandated by the Standard.

How to define type-less string of 1-bits

I'm writing code for a flash driver and I sometimes have to check whether or not certain locations are empty i.e. contain 0xff.
Sometimes I check bytes, sometimes longer values. So I wanted to have a define that captures a type-less string of 1-bits that I can use to compare against.
It should work for signed and unsigned variables of 8, 16 and 32 bit wide.
I've tried the following two:
#define ERASED (-1)
#define ERASED (~0)
If I then compile the following code (all unsigned variables):
if (some32bitvar == ERASED)
{
....
}
if (some16bitvar == ERASED)
{
....
}
if (some8bitvar == ERASED)
{
....
}
The compiler is happy with the 16 and 32 bits variables, but complains about the 8 bit variable:
[Warning(ccom)] this comparison is always true
Note: The architecture and compiler are 16-bit, so an int is 16 bit.
The compiler is correct: some8bitvar is promoted to an int, which yields 0x00nn, and is then compared against ERASED, which as a (16-bit) int has the value 0xffff.
A trivial solution is to use:
if (~some8bitvar)
{
....
}
But that relies on implicit knowledge of what an erased flash location contains, which I want to abstract away, and it obscures what the statement does.
Another solution is to use:
if (some8bitvar == (typeof(some8bitvar)) ERASED)
{
....
}
But that makes the code less clear and my compiler doesn't support the typeof macro.
How can I define ERASED to make comparing to it work for all types?
Define a macro like
#define IsErased(val) ((~(val))==0)
and use it like
if (IsErased(someXXbitvar)) ...
This gives you the abstraction you want and keeps the code readable.
(If you have a modern C11 compiler, you can try to implement this with the _Generic keyword, providing a different implementation for each type.) But I guess the above solution will be sufficient for your case; it is pretty simple and will also work with older compilers.
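For example, a sketch along those lines (the IS_ERASED name and helper functions are made up for illustration; signed variants could be added the same way):
#include <stdint.h>
#include <stdbool.h>
static bool is_erased_u8(uint8_t v)   { return v == UINT8_MAX; }
static bool is_erased_u16(uint16_t v) { return v == UINT16_MAX; }
static bool is_erased_u32(uint32_t v) { return v == UINT32_MAX; }
#define IS_ERASED(x) _Generic((x), \
    uint8_t:  is_erased_u8,        \
    uint16_t: is_erased_u16,       \
    uint32_t: is_erased_u32)(x)
/* usage: if (IS_ERASED(some32bitvar)) { ... } */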
I think that your problem is that you use char for 8-bit values, and on your compiler char is unsigned. When you do the comparison, the 8-bit value is promoted to an int value in the range 0-255, which can never be equal to -1.
But if your 8-bit value is explicitly a signed char, it will be promoted to an int in the range -128 to 127 (assuming you have no negative zero). So this code should be accepted by your compiler if it is conforming:
signed char val=-1;
if (val == -1) {
// this branch will be executed
}

What does casting char* do to a reference of an int? (Using C)

In my course for intro to operating systems, our task is to determine if a system is big or little endian. There's plenty of results I've found on how to do it, and I've done my best to reconstruct my own version of a code. I suspect it's not the best way of doing it, but it seems to work:
#include <stdio.h>
int main() {
int a = 0x1234;
unsigned char *start = (unsigned char*) &a;
int len = sizeof( int );
if( start[0] > start[ len - 1 ] ) {
//biggest in front (Little Endian)
printf("1");
} else if( start[0] < start[ len - 1 ] ) {
//smallest in front (Big Endian)
printf("0");
} else {
//unable to determine with set value
printf( "Please try a different integer (non-zero). " );
}
}
I've seen this line of code (or some version of) in almost all answers I've seen:
unsigned char *start = (unsigned char*) &a;
What is happening here? I understand casting in general, but what happens if you cast an int to a char pointer? I know:
unsigned int *p = &a;
assigns the memory address of a to p, and that can you affect the value of a through dereferencing p. But I'm totally lost with what's happening with the char and more importantly, not sure why my code works.
Thanks for helping me with my first SO post. :)
When you cast between pointers of different types, the result is generally implementation-defined (it depends on the system and the compiler). There are no guarantees that you can access the pointer or that it is correctly aligned, etc.
But for the special case when you cast to a pointer to character, the standard actually guarantees that you get a pointer to the lowest addressed byte of the object (C11 6.3.2.3 §7).
So the compiler will implement the code you have posted in such a way that you get a pointer to the lowest addressed byte of the int. As we can tell from your code, that byte may contain different values depending on endianness.
If you have a 16-bit CPU, the char pointer will point at memory containing 0x12 in case of big endian, or 0x34 in case of little endian.
For a 32-bit CPU, the int would contain 0x00001234, so you would get 0x00 in case of big endian and 0x34 in case of little endian.
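A small sketch (not from the original answer) that dumps the bytes in address order, which shows the difference directly:
#include <stdio.h>
int main(void)
{
    int a = 0x1234;
    unsigned char *start = (unsigned char *)&a;
    /* lowest address first: typically "34 12 00 00" on a little-endian
       machine with 32-bit int, "00 00 12 34" on a big-endian one */
    for (size_t i = 0; i < sizeof a; i++)
        printf("%02x ", start[i]);
    printf("\n");
    return 0;
}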
If you dereference an int pointer you will read sizeof(int) bytes of data (typically 4 with gcc). But if you want only one byte, cast that pointer to a character pointer and dereference it; you will then read just one byte. The cast tells the compiler how many bytes to read, and how to interpret them, instead of using the original data type's size.
Values stored in memory are a set of 1s and 0s which by themselves do not mean anything. Data types are used for recognizing and interpreting what the values mean. So let's say that, at a particular memory location, the data stored is the following set of bits, ad infinitum: 01001010 .....
A pointer (other than a void pointer) carries two pieces of information: the starting address of a set of bytes, and (through its type) the way in which those bits are to be interpreted. For details, you can see: http://en.wikipedia.org/wiki/C_data_types and references therein.
So if you have
a char *c,
a short int *i,
and a float *f
all pointing at the bits mentioned above, then c, i, and f hold the same address, but *c takes the first 8 bits and interprets them in a certain way. So you can do things like printf("The character is %c", *c). On the other hand, *i takes the first 16 bits and interprets them in a certain way, so it is meaningful to say printf("The value is %d", *i). Again, for *f, printf("The value is %f", *f) is meaningful.
The real differences come when you do math with these. For example,
c++ advances the pointer by 1 byte,
i++ advances it by 2 bytes (for a 16-bit short int),
and f++ advances it by 4 bytes (for a 32-bit float).
More importantly, for
(*c)++, (*i)++, and (*f)++ the algorithm used for doing the addition is totally different.
In your question, when you cast from one pointer type to another, you already know that the algorithm you are going to use for manipulating the bits present at that location will be easier if you interpret those bits as an unsigned char rather than an unsigned int. The same operators +, -, etc. will act differently depending upon what data type the operators are looking at. If you have worked on physics problems where a coordinate transformation made the solution very simple, then this is the closest analog to that operation. You are transforming one problem into another that is easier to solve.
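A short sketch of those strides (assuming the usual 1/2/4-byte sizes for char/short/float):
#include <stdio.h>
int main(void)
{
    float buf[4] = { 0 };          /* suitably aligned scratch storage */
    char  *c = (char *)buf;
    short *i = (short *)buf;
    float *f = buf;
    /* each increment moves the pointer by the size of the pointed-to type */
    printf("%td %td %td\n",
           (char *)(c + 1) - (char *)buf,
           (char *)(i + 1) - (char *)buf,
           (char *)(f + 1) - (char *)buf);
    return 0;
}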

Assign bit field member to char

I have some code here that uses bit-fields to store several 1-bit values in a char.
Basically
struct BITS_8 {
char _1:1;
(...)
char _8:1;
}
Now I was trying to pass one of these bits as a parameter into a function
void func(char bit){
if(bit){
// do something
}else{
// do something else
}
}
// and the call was
struct BITS_8 bits;
// all bits were set to 0 before
bits._7 = 1;
bits._8 = 0;
func(bits._8);
The solution was to single the bit out when calling the function:
func(bits._8 & 0x80);
But I kept going into // do something because other bits were set. I was wondering if this is the correct behaviour or if my compiler is broken. The compiler is an embedded compiler that produces code for Freescale ASICs.
EDIT: 2 mistakes: the passing of the parameter and that bits._8 should have been 0 or else the error would make no sense.
Clarification
I am interested in what the standard has to say about the assignment
struct X{
unsigned int k:6;
unsigned int y:1;
unsigned int z:1;
};
struct X x;
x.k = 0;
x.y = 1;
x.z = 0;
char t = x.y;
Should t now contain 1 or 0b00000010?
I don't think you can set a 1-bit char bit-field to 1 if plain char is signed on your compiler: that single bit is then the sign bit, so the only values you can get are 0 and -1.
What you want is an unsigned char bit-field, to get rid of the sign bit. Then you can just use 1s and 0s.
Over the years I've grown a bit suspicious when it comes to these specific compilers; quite often their interpretation of what could be described as the finer points of the C standard isn't great. In those cases a more pragmatic approach can help you avoid insanity and get the job done. What this means in this case is that, if the compiler isn't behaving rationally (which you could define as behaving totally differently from gcc), you help it a bit.
For example you could modify func to become:
void func(char bit){
if(bit & 1){ // mask off everything that's none of its business
// do something
}else{
// do something else
}
}
And regarding your add-on question: t should contain 1. It can't contain binary 10, because the bit-field it is read from is only 1 bit wide.
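A quick sketch of that clarification, assuming a conforming compiler:
#include <stdio.h>
struct X {
    unsigned int k : 6;
    unsigned int y : 1;
    unsigned int z : 1;
};
int main(void)
{
    struct X x = { 0, 1, 0 };
    char t = x.y;
    /* reading the 1-bit field yields its stored value, so this prints 1 */
    printf("%d\n", t);
    return 0;
}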
Some reading material from the last publicly available draft for C99
Paragraph 6.7.2.1
4 A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other
implementation-defined type.
(which to me means that the compiler should at least handle your second example correctly) and
9 A bit-field is interpreted as a signed or unsigned integer type
consisting of the specified number of bits. If the value 0 or 1 is
stored into a nonzero-width bit-field of type
_Bool, the value of the bit-field shall compare equal to the value stored.
As your compiler is meant for embedded use, where bitfields are a rather important feature, I'd expect their implementation to be correct - if they claim to have bitfields implemented.
I can see why you might want to use char to store flags if bool is typedef'd to an unsigned int or similar, or if your compiler does not support packed enums. You can set up some "flags" using macros (I named them for their place values below, but it makes more sense to name them after what they mean). If 8 flags is not enough, this can be extended to other types, up to sizeof(type)*8 flags.
/* if you cannot use a packed enum, you can try these macros
* in combination with bit operations on the "flags"
* example printf output is: 1,1,6,4,1
* hopefully the example doesn't make it more confusing
* if so see fvu's example */
#include <stdio.h> /* for printf */
#define ONE ((char) 1)
#define TWO (((char) 1)<<1)
#define FOUR (((char) 1)<<2)
#define EIGHT (((char) 1)<<3)
#define SIXTEEN (((char) 1)<<4)
#define THIRTYTWO (((char) 1)<<5)
#define SIXTYFOUR (((char) 1)<<6)
#define ONETWENTYEIGHT (((char) 1)<<7)
int main(void){
printf("%d,%d,%d,%d,%d\n",ONE & ONE, ONE | ONE, TWO | 6, FOUR & 6, sizeof(ONE));
}
OK, it was a compiler bug. I wrote to the people who produce the compiler and they confirmed it. It only happens if you have a bitfield with an inline function.
Their recommended solution (if I don't want to wait for a patch) is
func((char)bits._8);
The other answers are right about the correct behaviour. Still, I'm marking this one as the answer as it is the answer to my problem.

Windows: How big is a BOOL?

How big (in bits) is a Windows BOOL data type?
Microsoft defines the BOOL data type as:
BOOL Boolean variable (should be TRUE or FALSE).
This type is declared in WinDef.h as follows:
typedef int BOOL;
Which converts my question into:
How big (in bits) is an int data type?
Edit: In before K&R.
Edit 2: Something to think about
Pretend we're creating a typed programming language and a compiler. You have a type that represents something logically being True or False. If your compiler can also link to Windows DLLs, and you want to call an API that requires a BOOL data type, what data type from your language would you pass/return?
In order to interop with the Windows BOOL data type, you have to know how large a BOOL is. The question gets converted to how big an int is. But that's a C/C++ int, not the Integer data type in our pretend language.
So I need to either find, or create, a data type that is the same size as an int.
Note: In my original question I'm not creating a compiler. I'm calling Windows from a language that isn't C/C++, so I need to find a data type that is the same size as what Windows expects.
int is officially "an integral type that is larger than or equal to the size of type short int, and shorter than or equal to the size of type long." It can be any size, and is implementation specific.
It is 4 bytes (32 bits), on Microsoft's current compiler implementation (this is compiler specific, not platform specific). You can see this on the Fundamental Types (C++) page of MSDN (near the bottom).
Sizes of Fundamental Types
Type Size
======================= =========
int, unsigned int 4 bytes
It is platform-dependent, but easy to find out:
sizeof(int)*8
In terms of code, you can always work out the size in bits of any type via:
#include <windows.h>
#include <limits.h>
sizeof (BOOL) * CHAR_BIT
However, from a semantic point of view, the number of bits in a BOOL is supposed to be one. That is to say, all non-zero values of BOOL should be treated equally, including the value of TRUE. FALSE (which is 0) is the only other value that should have a distinguished meaning. To follow this rule strictly actually requires a bit of thought. For example, to cast a BOOL down to a char you have to do the following:
char a_CHAR_variable = (char) (0 != b_A_BOOL_variable);
(If you were to just cast directly, then values like (1 << 8) would be interpreted as FALSE instead of TRUE.) Or, if you just want to avoid the multi-value issues altogether:
char a_CHAR_variable = !!b_A_BOOL_variable;
If you are trying to use the various different values of BOOL for some other purpose, chances are what you are doing is either wrong or at the very least will lead to something unmaintainable.
It's because of these complications that the C++ language actually added a bona fide bool type.
It depends on the implementation. (Even on the same OS.) Use sizeof(int) to find the size of int on the implementation that you're currently using. You should never hard-code this into your C program.
Better yet, use sizeof(BOOL) so you don't have to worry if MS ever changes their definition of BOOL.
They are both 32 bits big (4 bytes).
In languages that have a native boolean type, booleans are usually 8 bits big (1 byte), not 1 bit as I once thought.
It's as big as sizeof(int) says it is?
(That's in bytes so multiply by 8.)
On Windows 7 x64 with C# 2010, sizeof(bool) gives an answer of 1, whereas sizeof(int) gives an answer of 4.
Therefore the answer to "how big in bits is a bool" is 8, and it is not the same as an int.
