Does it make sense to qualify bit fields as signed / unsigned?
The relevant portion of the standard (ISO/IEC 9899:1999) is 6.7.2.1 #4:
A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other implementation-defined
type.
Yes. An example from here:
struct {
    /* field 4 bits wide */
    unsigned field1 :4;
    /*
     * unnamed 3 bit field
     * unnamed fields allow for padding
     */
    unsigned :3;
    /*
     * one-bit field
     * can only be 0 or -1 in two's complement!
     */
    signed field2 :1;
    /* align next field on a storage unit */
    unsigned :0;
    unsigned field3 :6;
} full_of_fields;
Only you know if it makes sense in your projects; typically, it does for fields with more than one bit, if the field can meaningfully be negative.
It's very important to qualify your variables as signed or unsigned. The compiler needs to know how to treat your variables during comparisons and casting. Examine the output of this code:
#include <stdio.h>

typedef struct
{
    signed s : 1;
    unsigned u : 1;
} BitStruct;

int main(void)
{
    BitStruct x;

    x.s = 1;
    x.u = 1;

    printf("s: %d \t u: %d\r\n", x.s, x.u);
    printf("s>0: %d \t u>0: %d\r\n", x.s > 0, x.u > 0);

    return 0;
}
Output:
s: -1 u: 1
s>0: 0 u>0: 1
The compiler stores the variable using a single bit, 1 or 0. For signed variables, the most significant bit determines the sign (high is treated as negative). Thus the signed variable, although its single bit is stored as 1, is interpreted as negative one.
Expanding on this topic, an unsigned two bit number has a range of 0 to 3, while a signed two bit number has a range of -2 to 1.
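A minimal sketch of those ranges in code (the struct and field names are invented for illustration, and the exact negative range of the signed field assumes two's-complement representation):

#include <stdio.h>

struct two_bit {
    signed int   s : 2;  /* can hold -2 .. 1 on a two's-complement machine */
    unsigned int u : 2;  /* can hold  0 .. 3 */
};

int main(void)
{
    struct two_bit t;

    t.s = -2;
    t.u = 3;
    printf("s: %d  u: %u\n", t.s, (unsigned)t.u);  /* prints: s: -2  u: 3 */
    return 0;
}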
Yes, it can. C bit-fields are essentially just limited-range integers. Frequently, hardware interfaces pack bits together in such a way that some control can go from, say, -8 to 7, in which case you do want a signed bit-field, or from 0 to 15, in which case you want an unsigned bit-field.
I don't think Andrew is talking about single-bit bit fields. For example, 4-bit fields: 3 bits of numerical information, one bit for sign. This can entirely make sense, though I admit to not being able to come up with such a scenario off the top of my head.
Update: I'm not saying I can't think of a use for multi-bit bit fields (having used them all the time back in 2400bps modem days to compress data as much as possible for transmission), but I can't think of a use for signed bit fields, especially not a quaint, obvious one that would be an "aha" moment for readers.
Most certainly ANSI C provides for signed and unsigned bit fields. It is required. This is also part of writing debugger overlays for IEEE-754 floating-point types [1][5][10], [1][8][23], and [1][10][53]. This is useful for machine-type or network translations of such data, or for checking conversions of double (64 bits, for math) to half precision (16 bits, for compression) before sending over a link, like video card textures.
// Fields need to be reordered based on machine/compiler endian orientation
typedef union _DebugFloat {
    float f;
    unsigned long u;
    struct _Fields {
        signed   s :  1;
        unsigned e :  8;
        unsigned m : 23;
    } fields;
} DebugFloat;
Eric
One place where signed bitfields are useful is in emulation, where the emulated machine has fewer bits than your default word.
I'm currently looking at emulating a 48-bit machine and am trying to work out whether it's reasonable to use 48 bits out of a 64-bit long long via bit-fields... the generated code would be the same as if I did all the masking, sign-extending, etc. explicitly, but it would read a lot better...
According to this reference, it's possible:
http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.vacpp6m.doc/language/ref/clrc03defbitf.htm
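For what it's worth, here is a sketch of both variants: the bit-field version (whether a long long base type is accepted is implementation-defined per 6.7.2.1) and the explicit mask-and-sign-extend fallback. Names are invented for this example.

#include <stdio.h>

struct emu_word {
    signed long long value : 48;  /* implementation-defined whether this declaration is accepted */
};

int main(void)
{
    struct emu_word w;
    w.value = -1;                           /* all 48 bits set, sign-extended on read */
    printf("%lld\n", (long long)w.value);   /* -1, if the compiler supports it */

    /* Portable fallback: keep the low 48 bits and sign-extend by hand. */
    unsigned long long raw = 0xFFFFFFFFFFFFULL;   /* a 48-bit two's-complement pattern */
    long long ext = (long long)(raw ^ 0x800000000000ULL) - 0x800000000000LL;
    printf("%lld\n", ext);                  /* also -1 */
    return 0;
}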
Bit masking signed types varies from platform to platform, because the hardware may differ in how it handles overflow from a shift, etc.
Any halfway good QA tool will warn about such usage.
If a 'bit' is signed, then you supposedly have a range of -1, 0, 1, which would make it a ternary digit (in practice a signed one-bit field can hold only 0 and -1, as noted above). I don't think the standard abbreviation for that would be suitable here, but it makes for interesting conversations :)
Related
I'm trying to celebrate 10,000,000 questions on StackOverflow with a simple console application written in C, but I don't want to waste any memory. What's the most efficient way to store the number 10,000,000 in memory?
The type you're looking for is int_least32_t, from stdint.h, which will give you the smallest type with at least 32 bits. This type is guaranteed to exist on C99 implementations.
Exact-width typedefs such as int32_t are not guaranteed to exist, though you'd be hard pressed to find a platform without it.
The number 10000000 (ten million) requires 24 bits to store as an unsigned value.
Most C implementations do not have a 24-bit type. Any implementation that conforms to the 1999 C standard or later must provide the <stdint.h> header, and must define all of:
uint_least8_t
uint_least16_t
uint_least32_t
uint_least64_t
each of which is (an alias for) an unsigned integer type with at least the specified width, such that no narrower integer type has at least the specified width. Of these, uint_least32_t is the narrowest type that's guaranteed to hold the value 10000000.
On the vast majority of C implementations, uint_least32_t is the type you're looking for -- but on an implementation that supports 24-bit integers, there will be a narrower type that satisfies your requirements.
Such an implementation would probably define uint24_t, assuming that its unsigned 24-bit type has no padding bits. So you could do something like this:
#include <stdint.h>
#ifdef UINT24_MAX
typedef uint24_t my_type;
#else
typedef uint_least32_t my_type;
#endif
That's still not 100% reliable (for example if there's a 28-bit type but no 24-bit type, this would miss it). In the worst case, it would select uint_least32_t.
If you want to restrict yourself to the predefined types (perhaps because you want to support pre-C99 implementations), you could do this:
#include <limits.h>
#define TEN_MILLION 10000000
#if UCHAR_MAX >= TEN_MILLION
typedef unsigned char my_type;
#elif USHRT_MAX >= TEN_MILLION
typedef unsigned short my_type;
#elif UINT_MAX >= TEN_MILLION
typedef unsigned int my_type;
#else
typedef unsigned long my_type;
#endif
If you merely want the narrowest predefined type that's guaranteed to hold the value 10000000 on all implementations (even if some implementations might have a narrower type that can hold it), use long (int can be as narrow as 16 bits).
If you don't require using an integer type, you can simply define a type that's guaranteed to be 3 bytes wide:
typedef unsigned char my_type[3];
But actually that will be wider than you need if CHAR_BIT > 8:
typedef unsigned char my_type[24 / CHAR_BIT];
but that will fail if 24 is not a multiple of CHAR_BIT.
Finally, your requirement is to represent the number 10000000; you didn't say you need to be able to represent any other numbers:
enum my_type { TEN_MILLION };
Or you can define a 1-bit bitfield with the value 1 denoting 10000000 and the value 0 denoting not 10000000.
Technically a 24-bit integer can store that, but there are no 24-bit primitive types in C. You will have to use a 32-bit int, or a long.
For performance that would be the best approach, wasting 1 unused byte of memory is irrelevant.
Now, if for study purposes you really, really want to store 10 million in the smallest piece of memory possible, and you are even willing to devise your own storage method to do it, you can store it in 1 byte by customizing a storage method that follows the example of float. You only need 4 bits to represent 10 and another 3 bits to represent 7, and in 1 byte you have all the data you need to calculate pow(10, 7). It even leaves you an extra free bit you can use as a sign.
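A sketch of that do-it-yourself encoding; the layout chosen here (base in the high 4 bits of the byte, exponent in the low 3) is just one arbitrary way to pack it:

#include <stdio.h>

int main(void)
{
    /* pack base 10 and exponent 7 into a single byte */
    unsigned char packed = (unsigned char)((10u << 3) | 7u);

    unsigned base = (packed >> 3) & 0x0Fu;
    unsigned exp  = packed & 0x07u;

    unsigned long value = 1;
    while (exp--)
        value *= base;               /* 10^7 = 10000000 */

    printf("%lu\n", value);          /* prints 10000000 */
    return 0;
}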
The largest value that can be stored in an uint16_t is 0xFFFF, which is 65535 in decimal. That is obviously not large enough.
The largest value that can be stored in an uint32_t is 0xFFFFFFFF, which is 4294967295 in decimal. That is obviously large enough.
Looks like you'll need to use uint32_t.
If you want to store a number n > 0 in an unsigned integer data type, then you need at least ⌈lg (n+1)⌉ bits in your integer type. In your case, ⌈lg (10,000,000 + 1)⌉ = 24, so you'd need at least 24 bits in whatever data type you picked.
To the best of my knowledge, the C spec does not include an integer type that holds specifically 24 bits. The closest option would be to use something like uint32_t, but (as was mentioned in a comment) this type might not exist on all compilers. The type int_least32_t is guaranteed to exist, but it might be way larger than necessary.
Alternatively, if you just need it to work on one particular system, your specific compiler may have a compiler-specific data type for 24-bit integers. (Spoiler: it probably doesn't. ^_^)
The non-smart-alecky answer is uint32_t or int32_t.
But if you're really low on memory,
uint8_t millions_of_questions;
But only if you are a fan of fixed-point arithmetic and are willing to deal with some error (or are using some specialized numerical representation scheme). Depends on what you're doing with it after you store it and what other numbers you want to be storable in your "datatype".
struct {
    uint16_t low;  // Lower 16 bits
    uint8_t high;  // Upper 8 bits
} my_uint24_t;
This provides 24 bits of storage and can store values up to 16,777,215. It's not a native type and would need special accessor functions, but it takes less space than a 32-bit integer on most platforms (packing may be required).
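A sketch of what those accessor functions might look like, assuming the low/high split above (the struct tag and function names are invented for this example):

#include <stdio.h>
#include <stdint.h>

struct uint24 {
    uint16_t low;   /* lower 16 bits */
    uint8_t  high;  /* upper 8 bits  */
};

static uint32_t uint24_get(const struct uint24 *v)
{
    return (uint32_t)v->low | ((uint32_t)v->high << 16);
}

static void uint24_set(struct uint24 *v, uint32_t value)
{
    v->low  = (uint16_t)(value & 0xFFFFu);
    v->high = (uint8_t)((value >> 16) & 0xFFu);
}

int main(void)
{
    struct uint24 n;
    uint24_set(&n, 10000000u);
    printf("%lu\n", (unsigned long)uint24_get(&n));  /* 10000000 */
    return 0;
}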
uint32_t from <stdint.h> would probably be your go-to data type. Since uint16_t would only be able to store 2^16-1=65535, the next available type is uint32_t.
Note that uint32_t stores unsigned values; use int32_t if you want to store signed numbers as well. You can find more info, e.g., here: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html
To determine the endianness of a system, I plan to store a multi-byte integer value in a variable and access the first byte via an unsigned char wrapped in a union; for example:
union {
    unsigned int val;
    unsigned char first_byte;
} test;

test.val = 1; /* stored in little-endian system as "0x01 0x00 0x00 0x00" */

if (test.first_byte == 1) {
    printf("Little-endian system!");
} else {
    printf("Big-endian system!");
}
I want to make this test portable across platforms, but I'm not sure if the C99 standard guarantees that the unsigned int data type will be greater than one byte in size. Furthermore, since a "C byte" does not technically have to be 8 bits in size, I cannot use exact-width integer types (e.g. uint8_t, uint16_t, etc.).
Are there any C data types guaranteed by the C99 standard to be at least two bytes in size?
P.S. Assuming an unsigned int is in fact greater than one byte, would my union behave as I'm expecting (with the variable first_byte accessing the first byte in variable val) across all C99 compatible platforms?
Since int must have a range of at least 16 bits, int will meet your criterion on most practical systems. So would short (and long, and long long). If you want exactly 16 bits, you have to look to see whether int16_t and uint16_t are declared in <stdint.h>.
If you are worried about systems where CHAR_BIT is greater than 8, then you have to work harder. If CHAR_BIT is 32, then only long long is guaranteed to hold two characters.
What the C standard says about sizes of integer types
In a comment, Richard J Ross III says:
The standard says absolutely nothing about the size of an int except that it must be larger than or equal to short, so, for example, it could be 10 bits on some systems I've worked on.
On the contrary, the C standard has specifications on the lower bounds on the ranges that must be supported by different types, and a system with 10-bit int would not be conformant C.
Specifically, in ISO/IEC 9899:2011 §5.2.4.2.1 Sizes of integer types <limits.h>, it says:
¶1 The values given below shall be replaced by constant expressions suitable for use in #if
preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the
following shall be replaced by expressions that have the same type as would an
expression that is an object of the corresponding type converted according to the integer
promotions. Their implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.
— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
[...]
— minimum value for an object of type short int
SHRT_MIN -32767 // −(2^15 − 1)
— maximum value for an object of type short int
SHRT_MAX +32767 // 2^15 − 1
— maximum value for an object of type unsigned short int
USHRT_MAX 65535 // 2^16 − 1
— minimum value for an object of type int
INT_MIN -32767 // −(2^15 − 1)
— maximum value for an object of type int
INT_MAX +32767 // 2^15 − 1
— maximum value for an object of type unsigned int
UINT_MAX 65535 // 2^16 − 1
GCC provides some macros giving the endianness of a system: GCC common predefined macros
example (from the link supplied):
/* Test for a little-endian machine */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
Of course, this is only useful if you use gcc. Furthermore, conditional compilation for endianness can be considered harmful. Here is a nice article about this: The byte order fallacy.
I would prefer to do this using regular conditions to let the compiler check the other case, i.e.:
if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
...
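In the spirit of that article, the usual alternative is not to ask about host endianness at all, but to decode external data byte by byte, which behaves the same on every host. A sketch (the function name is invented):

#include <stdio.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from a byte buffer, regardless of
   the host's own byte order. */
static uint32_t load_le32(const unsigned char *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

int main(void)
{
    unsigned char buf[4] = { 0x01, 0x00, 0x00, 0x00 };
    printf("%lu\n", (unsigned long)load_le32(buf));  /* 1 on any host */
    return 0;
}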
No, nothing is guaranteed to be larger than one byte -- but it is guaranteed that no (non-bitfield) type is smaller than one byte and that one byte can hold at least 256 distinct values, so if you have an int8_t and an int16_t, then it's guaranteed that int8_t is one byte, so int16_t must be two bytes.
The C standard guarantees only that the size of char <= short <= int <= long <= long long [and likewise for unsigned]. So, theoretically, there can be systems that have only one size for all of the sizes.
If it REALLY is critical that this isn't going wrong on some particular architecture, I would add a piece of code to do something like if (sizeof(char) == sizeof(int)) exit_with_error("Can't do this...."); to the code.
On nearly all machines, int or short should be perfectly fine. I'm not actually aware of any machine where char and int are the same size, but I'm 99% sure such machines exist. Those machines may also have a native byte that isn't 8 bits, such as 9 or 14 bits, and words that are 14, 18, 36 or 28 bits...
Take a look at the man page of stdint.h (uint_least16_t for 2 bytes)
At least according to http://en.wikipedia.org/wiki/C_data_types, the size of an int is guaranteed to be two "char"s long. So this test should work, although I'm wondering if there is a more appropriate solution. For one, with rare exceptions, most architectures have their endianness fixed at compile time, not at runtime. There are a few architectures that can switch endianness, though (I believe ARM and PPC are configurable, but ARM is traditionally LE, and PPC is mostly BE).
A conforming implementation can have all its fundamental types of size 1 (and hold at least 32 bits worth of data). For such an implementation, however, the notion of endianness is not applicable.
Nothing forbids a conforming implementation to have, say, little-endian shorts and big-endian longs.
So there are three possible outcomes for each integral type: it could be big-endian, little-endian, or of size 1. Check each type separately for maximum theoretical portability. In practice this probably never happens.
Middle-endian types, or e.g. big-endian stuff on even-numbered pages only, are theoretically possible, but I would refrain from even thinking about such an implementation.
While the answer is basically "no", satisfying the interface requirements for the stdio functions requires that the range [0,UCHAR_MAX] fit in int, which creates an implicit requirement that sizeof(int) is greater than 1 on hosted implementations (freestanding implementations are free to omit stdio, and there's no reason they can't have sizeof(int)==1). So I think it's fairly safe to assume sizeof(int)>1.
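If you would rather have the build fail than rely on that assumption, a small C89-compatible preprocessor check is enough (this is just a sketch of the idea):

#include <limits.h>

#if UCHAR_MAX > INT_MAX
#error "int cannot represent all unsigned char values on this implementation"
#endif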
I have a lot of code that performs bitwise operations on unsigned integers. I wrote my code with the assumption that those operations were on integers of fixed width without any padding bits. For example an array of 32-bit unsigned integers of which all 32 bits available for each integer.
I'm looking to make my code more portable and I'm focused on making sure I'm C89 compliant (in this case). One of the issues that I've come across is possible padded integers. Take this extreme example, taken from the GMP manual:
However on Cray vector systems it may be noted that short and int are always stored in 8 bytes (and with sizeof indicating that) but use only 32 or 46 bits. The nails feature can account for this, by passing for instance 8*sizeof(int)-INT_BIT.
I've also read about this type of padding in other places. I actually read a post on SO last night (forgive me, I don't have the link, so I'm citing something similar from memory) where, if you have, say, a double with 60 usable bits, the other 4 could be used for padding, and those padding bits could serve some internal purpose, so they cannot be modified.
So let's say for example my code is compiled on a platform where an unsigned int type is sized at 4 bytes, each byte being 8 bits, however the most significant 2 bits are padding bits. Would UINT_MAX in that case be 0x3FFFFFFF (1073741823)?
#include <stdio.h>
#include <stdlib.h>

/* padding bits represented by underscores */

int main( int argc, char **argv )
{
    unsigned int a = 0x2AAAAAAA; /* __101010101010101010101010101010 */
    unsigned int b = 0x15555555; /* __010101010101010101010101010101 */
    unsigned int c = a ^ b;      /* ?? __111111111111111111111111111111 */
    unsigned int d = c << 5;     /* ?? __111111111111111111111111100000 */
    unsigned int e = d >> 5;     /* ?? __000001111111111111111111111111 */

    printf( "a: %X\nb: %X\nc: %X\nd: %X\ne: %X\n", a, b, c, d, e );

    return 0;
}
Is it safe to XOR two integers with padding bits?
Wouldn't I XOR whatever the padding bits are?
I can't find this behavior covered in C89.
Furthermore is the c variable guaranteed to be 0x3FFFFFFF or if for example the two padding bits were both on in a or b would c be 0xFFFFFFFF?
Same question with d and e. Am I manipulating the padding bits by shifting?
I would expect to see this below, assuming 32 bits with the 2 most significant bits used for padding, but I want to know if something like this is guaranteed:
a: 2AAAAAAA
b: 15555555
c: 3FFFFFFF
d: 3FFFFFE0
e: 01FFFFFF
Also are padding bits always the most significant bits or could they be the least significant bits?
EDIT 12/19/2010 5PM EST: Christoph has answered my question. Thanks!
I had also asked (above) whether padding bits are always the most significant bits. This is cited in the rationale for the C99 standard, and the answer is no. I am playing it safe and assuming the same for C89. Here is specifically what the C99 rationale says for §6.2.6.2 (Representation of Integer Types):
Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
Footnotes 44 and 45 mention that parity bits might be padding bits. The committee does not know of any machines with user-accessible parity bits within an integer. Therefore, the committee is not aware of any machines that treat parity bits as padding bits.
EDIT 12/28/2010 3PM EST: I found an interesting discussion on comp.lang.c from a few months ago.
Bitwise Operator Effects on Padding Bits (VelocityReviews reader)
Bitwise Operator Effects on Padding Bits (Google Groups alternate link)
One point made by Dietmar which I found interesting:
Let's note that padding bits are not necessary for the existence of trap representations; combinations of value bits which do not represent a value of the object type would also do.
Bitwise operations (like arithmetic operations) operate on values and ignore padding. The implementation may or may not modify padding bits (or use them internally, eg as parity bits), but portable C code will never be able to detect this. Any value (including UINT_MAX) will not include the padding.
Where integer padding might lead to problems is if you use things like sizeof (int) * CHAR_BIT and then try to use shifts to access all these bits. If you want to be portable, either only use (unsigned) char, fixed-size integers (a C99 addition), or determine the number of value bits programmatically. This can be done at compile time with the preprocessor by comparing UINT_MAX against powers of 2, or at runtime by using bit operations.
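A sketch of the run-time variant mentioned above (the compile-time version compares UINT_MAX against powers of 2 in #if directives instead):

#include <stdio.h>
#include <limits.h>

/* Count the value bits of unsigned int by shifting UINT_MAX down to zero;
   padding bits, if any, are never counted. */
static int uint_value_bits(void)
{
    unsigned int max = UINT_MAX;
    int bits = 0;
    while (max != 0) {
        max >>= 1;
        ++bits;
    }
    return bits;
}

int main(void)
{
    printf("unsigned int: %d value bits out of %d storage bits\n",
           uint_value_bits(), (int)(sizeof(unsigned int) * CHAR_BIT));
    return 0;
}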
edit:
C90 does not mention integer padding at all, but as far as I can tell, 'invisible' preceding or trailing integer padding bits shouldn't violate the standard (I didn't go through all relevant sections to make sure this is really the case, though); there probably are problems with mixed padding and value bits as mentioned in the C99 rationale, because otherwise the standard would not have needed to be changed.
As to the meaning of user-accessible: padding bits are accessible insofar as you can always get at any bit of foo (including padding) by using bit operations on ((unsigned char *)&foo)[…]. Be careful when modifying the padding bits, though: the result won't change the value of the integer, but might nevertheless create a trap representation. In case of C90, this is implicitly unspecified (as in not mentioned at all); in case of C99, it's implementation-defined.
This was not what the rationale quotation was about, though: the cited architecture represents 32-bit integers via two 16-bit integers. In case of unsigned types, the resulting integer has 32 value bits and a precision of 32; in case of signed integers, it only has 31 value bits and a precision of 30: one of the sign bits of the 16-bit integers is used as the sign bit of the 32-bit integer, the other one is ignored, thus creating a padding bit surrounded by value bits. Now, if you access a 32-bit signed integer as an unsigned integer (which is explicitly allowed and does not violate the C99 aliasing rules), the padding bit becomes a (user-accessible) value bit.
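A sketch of that kind of byte-wise access: this dumps every storage byte of an unsigned int through an unsigned char pointer, padding bits included, although it cannot tell you which bits (if any) are padding:

#include <stdio.h>

int main(void)
{
    unsigned int foo = 0x2AAAAAAAu;
    const unsigned char *p = (const unsigned char *)&foo;
    size_t i;

    for (i = 0; i < sizeof foo; i++)
        printf("byte %u: %02X\n", (unsigned)i, (unsigned)p[i]);
    return 0;
}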
I understand typecasting... but only in retrospect. My process for figuring out what requires typecasting in expressions is usually retroactive: I can't predict when it will be required, because I don't know how the compiler steps through them. A somewhat trite example:
int8_t x = -50;
uint16_t y = 50;
int32_t z = x * y;
On my 8-bit processor (Freescale HCS08), this sets z to 63036 (2^16 - 50^2). I can see how that would be one possible answer (out of maybe 4 others), but I would not have guessed it would be the one.
A better way to ask might be: when types interact with operators (+-*/), what happens?
The compiler is supposed to upcast to the largest type in the expression and then place the result into the size of the destination. If you look at the assembler output of the above, you can see exactly how the types are read in their native format from memory. Upcasting from a smaller to a larger size is safe and won't generate warnings. It's when you go from a larger type into a smaller type that precision may be lost, and the compiler is supposed to warn or error.
There are cases where you want the information to be lost, though. Say you are working with a sin/cos lookup table that is 256 entries long. It's very convenient and common (at least in embedded land) to use a u8 value to access the table, so that the index wraps naturally to the table size while preserving the circular nature of sin/cos. A typecast back into a u8 is then required, but it is exactly what you want.
The folks here who say that values are always converted to the larger type are wrong. We cannot say anything if we don't know your platform (I see you have provided some information now). Some examples:
int = 32bits, uint16_t = unsigned short, int8_t = signed char
This results in the value -2500, because both operands are converted to int, the operation is carried out signed, and the signed result is written to the int32_t.
int = 16bits, uint16_t = unsigned int, int8_t = signed char
This results in the value 63036, because the int8_t operand is first converted to unsigned int, resulting in 65536 - 50 = 65486. That is then multiplied by 50, resulting in 3274300 % 65536 (unsigned arithmetic is modulo arithmetic), which is 63036. That result is then written to the int32_t.
Notice that the minimum int width is 16 bits, so on your 8-bit platform this second scenario is what likely happens.
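A sketch that reproduces both outcomes on an ordinary 32-bit-int machine by making the conversions explicit (purely for illustration; on a real 16-bit platform the first conversion happens implicitly):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t   x = -50;
    uint16_t y = 50;

    /* What a 16-bit-int platform effectively does: x is converted to the
       unsigned 16-bit type and the result wraps modulo 2^16. */
    uint16_t sim = (uint16_t)((uint16_t)x * y);
    printf("16-bit int scenario: %u\n", (unsigned)sim);   /* 63036 */

    /* What a 32-bit-int platform does: both operands promote to int. */
    int32_t wide = (int32_t)x * (int32_t)y;
    printf("32-bit int scenario: %ld\n", (long)wide);     /* -2500 */
    return 0;
}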
I'm not going to try and explain the rules here because it doesn't make sense to me to repeat what is written in the Standard / Draft (which is freely available) in great detail and which is usually easily understandable.
You need a type cast when you are downcasting.
Upcasting is automatic and safe, which is why the compiler never issues a warning/error. But when you are downcasting, you are placing a value that may have higher precision than the type of the variable you are storing it in, which is why the compiler wants you to be sure and requires an explicit downcast.
If you want a complete answer, look at other people's suggestions. Read the C standard regarding implicit type conversion. And write test cases for your code...
It is interesting that you say this, because this code:
#include <stdio.h>
#include <stdint.h>

int main(int argc, char* argv[])
{
    int8_t x = -50;
    uint16_t y = 50;
    int32_t z = x * y;

    printf("%i\n", z);

    return 0;
}
is giving me the answer -2500.
See: http://codepad.org/JbSR3x4s
This happens for me, both on Codepad.org, and Visual Studio 2010
When the compiler does implicit casting, it follows a standard set of arithmetic conversions. These are documented in the C standard in section 6.3. If you happen to own the K&R book, there is a good summary in appendix section A6.5.
What happens to you here is integer promotion. Basically, before the computation takes place, all types with a rank smaller than int are promoted to signed or unsigned int; here the result ends up unsigned, since one of your operands is an unsigned type of the same width as int.
The computation is then performed with that width and signedness, and the result is finally assigned.
On your architecture, unsigned is probably 16 bits wide, which corresponds to the value that you see. Then for the assignment, the computed value fits in the target type, which is even wider, so the value remains the same.
To explain what happens in your example, you've got a signed 8-bit type multiplied by an unsigned 16-bit type, and so the smaller signed type is promoted to the larger unsigned type. Once this value is created, it's assigned to the 32-bit type.
If you're just working with signed or unsigned integer types, it's pretty simple. The system can always convert a smaller integer type to a larger without loss of precision, so it will convert the smaller value to the larger type in an operation. In mixed floating-point and integer calculations, it will convert the integer to the floating-point type, perhaps losing some precision.
It appears you're being confused by mixing signed and unsigned types. The system will convert to the larger type. If that larger type is signed, and can hold all the values of the unsigned type in the operation, then the operation is done as signed, otherwise as unsigned. In general, the system prefers to interpret mixed mode as unsigned.
This can be the cause of confusion (it confused you, for example), and is why I'm not entirely fond of unsigned types and arithmetic in C. I'd advise sticking to signed types when practical, and not trying to control the type size as closely as you're doing.
Can you generally make any assumptions about the minimum size of a data type?
What I have read so far:
char: 1 byte
short: 2 bytes
int: 2 bytes, typically 4 bytes
long: 4 bytes
float??? double???
Are the values in float.h and limits.h system dependent?
This is covered in the Wikipedia article:
A short int must not be larger than an int.
An int must not be larger than a long int.
A short int must be at least 16 bits long.
An int must be at least 16 bits long.
A long int must be at least 32 bits long.
A long long int must be at least 64 bits long.
The standard does not require that any of these sizes be necessarily different. It is perfectly valid, for example, if all four types are 64 bits long.
Yes, the values in float.h and limits.h are system dependent. You should never make assumptions about the width of a type, but the standard does lay down some minimums. See §6.2.5 and §5.2.4.2.1 in the C99 standard.
For example, the standard only says that a char should be large enough to hold every character in the execution character set. It doesn't say how wide it is.
For the floating-point case, the standard only hints at an ordering of the types' widths:
§6.2.5.10
There are three real floating types, designated as float, double, and long
double. 32) The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
This implicitly defines which type is at least as wide as which, but not specifically how wide they are. "Subset" itself is vague, because a long double could have exactly the same range as a double and still satisfy this clause.
This is pretty typical of how C goes, and a lot is left to each individual environment. You can't assume; you have to ask the compiler.
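For example, a throwaway program along these lines answers the question definitively for whichever compiler you are using:

#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void)
{
    printf("CHAR_BIT       = %d\n", CHAR_BIT);
    printf("sizeof(short)  = %u, SHRT_MAX = %d\n", (unsigned)sizeof(short), SHRT_MAX);
    printf("sizeof(int)    = %u, INT_MAX  = %d\n", (unsigned)sizeof(int), INT_MAX);
    printf("sizeof(long)   = %u, LONG_MAX = %ld\n", (unsigned)sizeof(long), LONG_MAX);
    printf("sizeof(float)  = %u, FLT_DIG  = %d\n", (unsigned)sizeof(float), FLT_DIG);
    printf("sizeof(double) = %u, DBL_DIG  = %d\n", (unsigned)sizeof(double), DBL_DIG);
    return 0;
}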
Nine years and still no direct answer about the minimum size for float, double, long double.
Any guaranteed minimum sizes for types in C?
For floating point type ...
From a practical point of view, float's minimum size is 32 bits and double's is 64 bits. C allows double and long double to share similar characteristics, so a long double could be as small as a double (see example 1 below), or 80-bit, or 128-bit, or ...
I could imagine that a C-compliant 48-bit double may have existed, yet I do not know of any.
Now, let us imagine our rich uncle dies and left us a fortune to pay for the development and cultural promotion for www.smallest_C_float.com.
C specifies:
float finite range is at least [1E-37… 1E+37]. See FLT_MIN, FLT_MAX
(1.0f + FLT_EPSILON) – 1.0f <= 1E-5.
float supports positive and negative values.
Let X: Digit 1-9
Let Y: Digit 0-9
Let E: value -37 to 36
Let S: + or -
Let b: 0 or 1
Our float could minimally represent all the combinations, using base 10, of SX.YYYYY*10^E.
0.0 and ±1E+37 are also needed (3 more). We do not need -0.0, sub-normals, ±infinity nor not-a-numbers.
That is 2*9*10^5*74 + 3 combinations, or 133,200,003, which needs at least 27 bits to encode - somehow. Recall the goal is minimal size.
With a classic base 2 approach, we can assume an implied 1 and get
S1.bbbb_bbbb_bbbb_bbbb_b * 2^e, or 2*2^17*226 combinations, or 26 bits.
If we try base 16, we then need about 2*15*16^(4 or 5)*57 combinations, or at least 26 to 30 bits.
Conclusion: A C float needs at least 26 bits of encoding.
A C double need not express a greater exponential range than float; it only has a different minimal precision requirement: 1E-9.
S1.bbbb_bbbb_bbbb_bbbb_bbbb_bbbb_bbbb_bb * 2^e --> 2*2^30*226 combinations, or 39 bits.
On our imagine-if-you-will computer, we could have a 13-bit char and so encode float, double, and long double without padding. Thus we could realize a non-padded 26-bit float and a 39-bit double and long double.
1: Microsoft Visual C++ for x86, which makes long double a synonym for double
[Edit] 2020
Additional double requirements may require 41 bits. May have to use 42-bit double and 28-bit float. Will need to review. Uncle will not be happy.
However, C99 specifies (in stdint.h) minimum-width types such as uint_least8_t, int_least32_t, and so on; the least-width types for 8, 16, 32 and 64 bits are required, while the exact-width ones (uint8_t, ...) are optional.
(see en_wikipedia_Stdint_h)
If you want to check that the size (in multiples of chars) of any type on your system/platform really is the size you expect, you could do:
enum CHECK_FLOAT_IS_4_CHARS
{
    IF_THIS_FAILS_FLOAT_IS_NOT_4_CHARS = 1/(sizeof(float) == 4)
};
Often developers asking this kind of question are dealing with arranging a packed struct to match a defined memory layout (as for a message protocol). The assumption is that the language should directly specify laying out 16-, 24-, 32-bit, etc. fields for the purpose.
That is routine and acceptable for assembly languages and other application-specific languages closely tied to a particular CPU architecture, but is sometimes a problem in a general purpose language which might be targeted at who-knows-what kind of architecture.
In fact, the C language was not intended for a particular hardware implementation. It was specified generally so a C compiler implementer could properly adapt to the realities of a particular CPU. A Frankenstein hardware architecture consisting of 9 bit bytes, 54 bit words, and 72 bit memory addresses is easily—and unambiguously—mapped to C features. (char is 9 bits; short int, int, and long int are 54 bits.)
This generality is why the C specification says something to the effect of "don't expect much about the sizes of ints beyond sizeof (char) <= sizeof (short int) <= sizeof (int) <= sizeof (long int)." That implies that chars could be the same size as longs!
The current reality is, and the future seems to hold, that software demands architectures provide 8-bit bytes and memory words addressable as individual bytes. This wasn't always so. Not too long ago, I worked on the CDC Cyber architecture, which features 6-bit "bytes" and 60-bit words. A C implementation on that would be interesting. In fact, that architecture is responsible for the weird packing semantics of Pascal, if anyone remembers that.
C99 N1256 standard draft
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
C99 specifies two types of integer guarantees:
minimum size guarantees
relative sizes between the types
Relative guarantees
6.2.5 Types:
8 For any two integer types with the same signedness and different integer conversion rank
(see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a
subrange of the values of the other type.
and 6.3.1.1 Boolean, characters, and integers determines the relative conversion ranks:
1 Every integer type has an integer conversion rank defined as follows:
The rank of long long int shall be greater than the rank of long int, which
shall be greater than the rank of int, which shall be greater than the rank of short
int, which shall be greater than the rank of signed char.
The rank of any unsigned integer type shall equal the rank of the corresponding
signed integer type, if any.
For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has
greater rank than T3, then T1 has greater rank than T3
Absolute minimum sizes
Mentioned by https://stackoverflow.com/a/1738587/895245 , here is the quote for convenience.
5.2.4.2.1 Sizes of integer types <limits.h>:
1 [...] Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown [...]
UCHAR_MAX 255 // 2^8 − 1
USHRT_MAX 65535 // 2^16 − 1
UINT_MAX 65535 // 2^16 − 1
ULONG_MAX 4294967295 // 2^32 − 1
ULLONG_MAX 18446744073709551615 // 2^64 − 1
Floating point
If the __STDC_IEC_559__ macro is defined, then IEEE types are guaranteed for each C type, although long double has a few possibilities: Is it safe to assume floating point is represented using IEEE754 floats in C?
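A minimal sketch of checking that macro (the printed messages are just placeholders):

#include <stdio.h>

int main(void)
{
#ifdef __STDC_IEC_559__
    printf("float/double follow IEC 60559 (IEEE 754) on this implementation\n");
#else
    printf("no IEEE 754 guarantee on this implementation\n");
#endif
    return 0;
}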
Quoting the standard does give what is defined to be "the correct answer" but it doesn't actually reflect the way programs are generally written.
People make assumptions all the time that char is 8 bits, short is 16, int is 32, long is either 32 or 64, and long long is 64.
Those assumptions are not a great idea but you will not get fired for making them.
In theory, <stdint.h> can be used to specify fixed-bit-width types, but you have to scrounge one up for Microsoft. (See here for a MS stdint.h.) One of the problems here is that C++ technically only needs C89 compatibility to be a conforming implementation; even for plain C, C99 is not fully supported even in 2009.
It's also not accurate to say there is no width specification for char. There is; the standard just avoids saying whether it is signed or not. Here is what C99 actually says:
number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
minimum value for an object of type signed char
SCHAR_MIN -127 // −(2^7 − 1)
maximum value for an object of type signed char
SCHAR_MAX +127 // 2^7 − 1
maximum value for an object of type unsigned char
UCHAR_MAX 255 // 2^8 − 1
Most of the libraries define something like this:
#ifdef MY_ARCHITECTURE_1
typedef unsigned char u_int8_t;
typedef short int16_t;
typedef unsigned short u_int16_t;
typedef int int32_t;
typedef unsigned int u_int32_t;
typedef unsigned char u_char;
typedef unsigned int u_int;
typedef unsigned long u_long;
typedef unsigned short u_short;
#endif
You can then use those typedefs in your programs instead of the standard types.