How do I handle a 4-byte char array as a typical int in ANSI C?
Some context:
I'm parsing a binary file, where I need to read 4-bytes unsigned integers.
I want to make sure that, no matter what platform this parser is compiled in, an "int" is always 4 bytes long.
I've read about uint32_t & friends, but I'm limited to ANSI C.
Thanks in advance.
Throw some preprocessor commands in there:
#include <limits.h>
#if LONG_BIT == 32
long buffer [BUFFER_SIZE];
#elif WORD_BIT == 32
int buffer [BUFFER_SIZE];
#endif
You could also check sizeof(int) at runtime.
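Note that LONG_BIT and WORD_BIT come from POSIX's <limits.h>, not from ANSI C itself, so if you must stay strictly ANSI you can test the value macros instead. A minimal sketch along those lines (the typedef name u32 is just for illustration):
#include <limits.h>
/* Pick an unsigned type that is exactly 32 bits wide; UINT_MAX and
   ULONG_MAX are ANSI C and may be used in #if directives. */
#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int u32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long u32;
#else
#error "no 32-bit unsigned type found"
#endif
Then declare the buffer as u32 buffer[BUFFER_SIZE]; just as above.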
I would suggest that rather than trying to overlay data, you may be better off simply writing functions such as uint32 GetUInt32LE(unsigned char *dat), which could then be implemented as return ((uint32)(dat[3]) << 24) | ((uint32)(dat[2]) << 16) | ((uint32)(dat[1]) << 8) | dat[0]; for maximal portability (assuming uint32 has been defined suitably), or defined somewhat more simply on systems where an int is known to be 32 bits or larger. Note that there are still some embedded compilers for smaller systems where int is only 16 bits, so the casts on the individual bytes are needed. The cast on dat[1] matters on such systems because shifting a byte value like 0xFF left by eight would otherwise yield undefined behavior, whose most probable actual result would be the value (int)0xFF00, which would be sign-extended to 0xFFFFFF00 and clobber the two upper bytes of the result.
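Spelled out, the function described above might look like this (a sketch only; here uint32 is typedef'd to unsigned long, which ANSI C guarantees is at least 32 bits wide):
typedef unsigned long uint32;

uint32 GetUInt32LE(unsigned char *dat)
{
    /* Cast each byte up before shifting so the arithmetic is done in a
       type wide enough for 32 bits, even where int is only 16 bits. */
    return ((uint32)(dat[3]) << 24) |
           ((uint32)(dat[2]) << 16) |
           ((uint32)(dat[1]) <<  8) |
           (uint32)(dat[0]);
}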
Related
Is it possible to declare a bitfield of very large numbers e.g.
struct binfield{
uber_int field : 991735910442856976773698036458045320070701875088740942522886681;
}wordlist;
just to clarify, i'm not trying to represent that number in 256bit, that's how many bits I want to use. Or maybe there aren't that many bits in my computer?
C does not support numeric data types of arbitrary size. You can only use the integer sizes provided by the compiler, and if you want your code to be portable, you had better stick to the minimum guaranteed sizes for the standardized types: char (8 bit), short (16 bit), int (16 bit), long (32 bit) and long long (64 bit).
But what you can do instead is create a char[]. A char is always at least 8 bits (and is not more than 8 bits either, except on some very exotic platforms). So you can use an array of char to store as many bit values as you can afford memory for. However, when you want to use a char array as a bitfield you will need some boilerplate code to access the correct byte.
For example, to get the value of bit n of a char array, use
bitfield[n/8] >> n%8 & 0x1
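For instance, the boilerplate could be wrapped in a pair of helpers like these (a sketch that assumes 8-bit bytes, matching the expression above; the names get_bit and set_bit are made up):
int get_bit(const unsigned char *bits, unsigned long n)
{
    return (bits[n / 8] >> (n % 8)) & 0x1;
}

void set_bit(unsigned char *bits, unsigned long n, int value)
{
    if (value)
        bits[n / 8] |= (unsigned char)(1u << (n % 8));
    else
        bits[n / 8] &= (unsigned char)~(1u << (n % 8));
}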
I need to do bitwise operations on 32-bit integers (that indeed represent chars, but whatever).
Is the following kind of code safe?
uint32_t input, output;
input = ...;
if(input & 0x03000000) {
    output = 0x40000000;
    output |= (input & 0xFC000000) >> 2;
}
I mean, in the "if" statement, I am doing a bitwise operation on, on the left side, a uint32_t, and on the right side... I don't know!
So do you know the type and size (by that I mean how many bytes it is stored in) of the hard-coded "0x03000000"?
Is it possible that some systems consider 0x03000000 as an int and hence store it in only 2 bytes, which would be catastrophic?
Is the following kind of code safe?
Yes, it is.
So do you know the type and size (by that I mean how many bytes it is stored in) of the hard-coded "0x03000000"?
0x03000000 is int on a system with 32-bit int and long on a system with 16-bit int.
(As uint32_t is present here, I assume two's complement and a CHAR_BIT of 8. Also, I don't know of any system with 16-bit int and 64-bit long.)
Is it possible that some systems consider 0x03000000 as an int and hence store it in only 2 bytes, which would be catastrophic?
See above: on a 16-bit int system, 0x03000000 is a long and is 32 bits wide. A hexadecimal constant in C takes the first type from the following list in which its value can be represented:
int, unsigned int, long, unsigned long, long long, unsigned long long
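If you want to take the platform's int width out of the picture entirely, you can also suffix the constants yourself; a small sketch (the wrapper function is made up just to give the snippet a home):
#include <stdint.h>

uint32_t transform(uint32_t input)
{
    uint32_t output = 0;

    /* The UL suffix makes each constant unsigned and at least 32 bits
       wide, even on a 16-bit-int system. */
    if (input & 0x03000000UL) {
        output = 0x40000000UL;
        output |= (input & 0xFC000000UL) >> 2;
    }
    return output;
}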
To determine the endianness of a system, I plan to store a multi-byte integer value in a variable and access the first byte via an unsigned char wrapped in a union; for example:
union {
    unsigned int val;
    unsigned char first_byte;
} test;
test.val = 1; /* stored in little-endian system as "0x01 0x00 0x00 0x00" */
if(test.first_byte == 1){
    printf("Little-endian system!");
}else{
    printf("Big-endian system!");
}
I want to make this test portable across platforms, but I'm not sure if the C99 standard guarantees that the unsigned int data type will be greater than one byte in size. Furthermore, since a "C byte" does not technically have to be 8 bits in size, I cannot use the exact-width integer types (e.g. uint8_t, uint16_t, etc.).
Are there any C data types guaranteed by the C99 standard to be at least two bytes in size?
P.S. Assuming an unsigned int is in fact greater than one byte, would my union behave as I'm expecting (with the variable first_byte accessing the first byte in variable val) across all C99 compatible platforms?
Since int must have a range of at least 16 bits, int will meet your criterion on most practical systems. So would short (and long, and long long). If you want exactly 16 bits, you have to look to see whether int16_t and uint16_t are declared in <stdint.h>.
If you are worried about systems where CHAR_BIT is greater than 8, then you have to work harder. If CHAR_BIT is 32, then only long long is guaranteed to hold two characters.
What the C standard says about sizes of integer types
In a comment, Richard J Ross III says:
The standard says absolutely nothing about the size of an int except that it must be larger than or equal to short, so, for example, it could be 10 bits on some systems I've worked on.
On the contrary, the C standard has specifications on the lower bounds on the ranges that must be supported by different types, and a system with 10-bit int would not be conformant C.
Specifically, in ISO/IEC 9899:2011 §5.2.4.2.1 Sizes of integer types <limits.h>, it says:
¶1 The values given below shall be replaced by constant expressions suitable for use in #if
preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the
following shall be replaced by expressions that have the same type as would an
expression that is an object of the corresponding type converted according to the integer
promotions. Their implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.
— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
[...]
— minimum value for an object of type short int
SHRT_MIN -32767 // −(2^15 − 1)
— maximum value for an object of type short int
SHRT_MAX +32767 // 2^15 − 1
— maximum value for an object of type unsigned short int
USHRT_MAX 65535 // 2^16 − 1
— minimum value for an object of type int
INT_MIN -32767 // −(2^15 − 1)
— maximum value for an object of type int
INT_MAX +32767 // 2^15 − 1
— maximum value for an object of type unsigned int
UINT_MAX 65535 // 2^16 − 1
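Since these macros are suitable for use in #if directives, one way to act on those guarantees is a compile-time guard; a minimal sketch (the error message is only illustrative):
#include <limits.h>

/* Refuse to build on platforms whose int cannot hold 32-bit values. */
#if INT_MAX < 2147483647
#error "This code assumes int is at least 32 bits wide."
#endif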
GCC provides some macros giving the endianness of a system: GCC common predefined macros
example (from the link supplied):
/* Test for a little-endian machine */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
Of course, this is only useful if you use gcc. Furthermore, conditional compilation for endianness can be considered harmful. Here is a nice article about this: The byte order fallacy.
I would prefer to do this with a regular condition so that the compiler still checks the code in the other branch, i.e.:
if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
...
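Fleshed out, that might look like the sketch below (still gcc-specific because of the predefined macros; the defined() guard is only there so it compiles elsewhere):
#include <stdio.h>

int main(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    /* An ordinary if: both branches are compiled and type-checked. */
    if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
        printf("Little-endian system!\n");
    else
        printf("Big-endian system!\n");
#endif
    return 0;
}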
No, nothing is guaranteed to be larger than one byte -- but it is guaranteed that no (non-bitfield) type is smaller than one byte and that one byte can hold at least 256 distinct values, so if you have an int8_t and an int16_t, then it's guaranteed that int8_t is one byte, so int16_t must be two bytes.
The C standard guarantees only that the size of char <= short <= int <= long <= long long [and likewise for unsigned]. So, theoretically, there can be systems that have only one size for all of the sizes.
If it REALLY is critical that this isn't going wrong on some particular architecture, I would add a piece of code to do something like if (sizeof(char) == sizeof(int)) exit_with_error("Can't do this...."); to the code.
In nearly all machines, int or short should be perfectly fine. I'm not actually aware of any machine where char and int are the same size, but I'm 99% sure that such machines exist. Those machines may also have a native byte that is not 8 bits, such as 9 or 14 bits, and words that are 14, 18, 36 or 28 bits...
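If you would rather catch that at compile time than at runtime, the classic C89 trick is an array type whose size goes negative when the assumption fails; a sketch (the typedef name is arbitrary):
/* Fails to compile when int is no wider than char. */
typedef char assert_int_wider_than_char[(sizeof(int) > sizeof(char)) ? 1 : -1];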
Take a look at the man page of stdint.h (uint_least16_t is guaranteed to be at least 16 bits, i.e. two 8-bit bytes).
At least according to http://en.wikipedia.org/wiki/C_data_types -- an int is guaranteed to span at least 16 bits, i.e. two 8-bit chars. So this test should work, although I'm wondering if there is a more appropriate solution. For one, with rare exceptions, most architectures have their endianness set at compile time, not at runtime. There are a few architectures that can switch endianness, though (I believe ARM and PPC are configurable, but ARM is traditionally LE, and PPC is mostly BE).
A conforming implementation can have all its fundamental types of size 1 (and hold at least 32 bits worth of data). For such an implementation, however, the notion of endianness is not applicable.
Nothing forbids a conforming implementation to have, say, little-endian shorts and big-endian longs.
So there are three possible outcomes for each integral type: it could be big-endian, little-endian, or of size 1. Check each type separately for maximum theoretical portability. In practice this probably never happens.
Middle-endian types, or e.g. big-endian stuff on even-numbered pages only, are theoretically possible, but I would refrain from even thinking about such an implementation.
While the answer is basically "no", satisfying the interface requirements for the stdio functions requires that the range [0,UCHAR_MAX] fit in int, which creates an implicit requirement that sizeof(int) is greater than 1 on hosted implementations (freestanding implementations are free to omit stdio, and there's no reason they can't have sizeof(int)==1). So I think it's fairly safe to assume sizeof(int)>1.
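Given that sizeof(int) > 1 is a safe assumption on hosted implementations, a compilable version of the question's union test might look like this (a sketch only; it inspects just the first byte, so it cannot distinguish the exotic middle-endian layouts mentioned above):
#include <stdio.h>

int main(void)
{
    union {
        unsigned int val;
        unsigned char first_byte;
    } test;

    test.val = 1;
    if (test.first_byte == 1)
        printf("Little-endian system!\n");
    else
        printf("Big-endian system!\n");
    return 0;
}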
Is it possible to store more than a byte value to a char type?
Say, for example, char c; and I want to store 1000 in c. Is it possible to do that?
Technically, no, you can't store more than one byte's worth of value in a char type. In C, a char and a byte are the same size, but that size is not necessarily limited to 8 bits. Many standards bodies tend to use the term "octet" for an exactly-8-bit value.
If you look inside limits.h (from memory), you'll see the CHAR_BIT symbol (among others) telling you how many bits are actually used for a char and, if that is large enough, then yes, it can store the value 1000.
The range of values you can store in a C type depends on its size, and that is not specified by C, it depends on the architecture. A char type has a minimum of 8 bits. And typically (almost universally) that's also its maximum (you can check it in your limits.h).
Hence, in a char you will be able to store from -128 to 127, or from 0 to 255 (signed or unsigned).
The minimum size for a char in C is 8 bits, which is not wide enough to hold more than 256 values. It may be wider in a particular implementation such as a word-addressable architecture, but you shouldn't rely on that.
Include limits.h and check the value of CHAR_MAX.
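As a quick sketch of that check:
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* On typical platforms CHAR_MAX is 127 or 255, so 1000 will not fit. */
    if (CHAR_MAX >= 1000)
        printf("char can hold 1000 here (CHAR_BIT = %d)\n", CHAR_BIT);
    else
        printf("char cannot hold 1000 here; use short or int instead\n");
    return 0;
}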
Probably not. The C standard requires that a char be at least 8 bits, so you can't portably depend on being able to store a value that needs more than 8 bits in a char.
(In most commonly-used systems today, chars are 8 bits.)
Char's width is system-dependent. But assuming you're using something reasonably C99-compatible, you should have access to the header stdint.h, which defines types of the forms intN_t and uintN_t where N = 8, 16, 32, 64. Where provided, these are guaranteed to be exactly N bits wide. So if you want to be certain to have a type of a particular width (regardless of system), those are the guys you want.
Example:
#include <stdint.h>
uint32_t foo; /* Unsigned, 32 bits */
int16_t bar; /* Signed, 16 bits */
Is there a standards-compliant method to represent a byte in ANSI (C89/90) C? I know that, most often, a char happens to be a byte, but my understanding is that this is not guaranteed to be the case. Also, there is stdint.h in the C99 standard, but what was used before C99?
I'm curious about both 8 bits specifically, and a "byte" (sizeof(x) == 1).
char is always a byte, but it's not always an octet. A byte is the smallest addressable unit of memory (in most definitions); an octet is an 8-bit unit of memory.
That is, sizeof(char) is always 1 for all implementations, but the CHAR_BIT macro in limits.h defines the size of a byte for a platform, and it is not always 8 bits. There are platforms with 16-bit and 32-bit bytes, where char takes up more bits but is still a byte. Since the required range for char is at least -127 to 127 (or 0 to 255), it will be at least 8 bits on all platforms.
ISO/IEC 9899:TC3
6.5.3.4 The sizeof operator
...
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. [...]
When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. [...]
Emphasis mine.
You can always represent a byte (if you mean 8 bits) in an unsigned char. It's always at least 8 bits in size, with all bits taking part in the value, so an 8-bit value will always fit into it.
If you want exactly 8 bits, I also think you'll have to use platform-dependent ways. POSIX systems seem to be required to support int8_t. That means that on POSIX systems, char (and thus a byte) is always 8 bits.
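And if the code simply cannot cope with wider bytes, you can refuse to build on such platforms; a minimal, pure C89 sketch:
#include <limits.h>

/* Bail out at compile time on platforms where a byte is not 8 bits. */
#if CHAR_BIT != 8
#error "This code requires 8-bit chars."
#endif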
In ANSI C89/ISO C90 sizeof(char) == 1. However, it is not always the case that 1 byte is 8 bits. If you wish to count the number of bits in 1 byte (and you don't have access to limits.h), I suggest the following:
unsigned int bitnum(void) {
    unsigned char c = ~0u; /* all bits set; thank you Jonathan. */
    unsigned int v;

    /* Kernighan's method: each pass clears the lowest set bit of c. */
    for(v = 0u; c; ++v)
        c &= c - 1u;
    return(v);
}
Here we use Kernighan's method to count the number of bits set in c. To better understand the code above (or see others like it), I refer you to "Bit Twiddling Hacks".
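A quick usage example, assuming the function above is in scope:
#include <stdio.h>

unsigned int bitnum(void); /* defined above */

int main(void)
{
    /* On most platforms this prints 8, matching CHAR_BIT. */
    printf("bits per byte: %u\n", bitnum());
    return 0;
}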
Before C99? Platform-dependent code.
But why do you care? Just use stdint.h.
In every implementation of C I have used (from old UNIX to embedded compilers written by hardware engineers to big-vendor compilers) char has always been 8-bit.
You can find pretty reliable macros and typedefs in boost.
I notice that some who answered have redefined the word byte to mean something other than 8 bits.
A byte is 8 bits; however, in some C implementations char is 16 bits (2 bytes) and in others 8 bits (1 byte). The people who call a byte the 'smallest addressable unit of memory' or some such garbage have lost grasp of the meaning of byte (8 bits).
The reason that some implementations of C have 16-bit chars (2 bytes) and some have 8-bit chars (1 byte), and that there is no standard type called 'byte', is laziness.
So, we should use int8_t.