Padding bits in unsigned integers and bitwise operations in C89 - c

I have a lot of code that performs bitwise operations on unsigned integers. I wrote my code with the assumption that those operations were on integers of fixed width without any padding bits. For example an array of 32-bit unsigned integers of which all 32 bits available for each integer.
I'm looking to make my code more portable and I'm focused on making sure I'm C89 compliant (in this case). One of the issues that I've come across is possible padded integers. Take this extreme example, taken from the GMP manual:
However on Cray vector systems it may be noted that short and int are always stored in 8 bytes (and with sizeof indicating that) but use only 32 or 46 bits. The nails feature can account for this, by passing for instance 8*sizeof(int)-INT_BIT.
I've also read about this type of padding in other places. I actually read of a post on SO last night (forgive me, I don't have the link and I'm going to cite something similar from memory) where if you have, say, a double with 60 usable bits the other 4 could be used for padding and those padding bits could serve some internal purpose so they cannot be modified.
So let's say for example my code is compiled on a platform where an unsigned int type is sized at 4 bytes, each byte being 8 bits, however the most significant 2 bits are padding bits. Would UINT_MAX in that case be 0x3FFFFFFF (1073741823)?
#include <stdio.h>
#include <stdlib.h>
/* padding bits represented by underscores */
int main( int argc, char **argv )
{
unsigned int a = 0x2AAAAAAA; /* __101010101010101010101010101010 */
unsigned int b = 0x15555555; /* __010101010101010101010101010101 */
unsigned int c = a ^ b; /* ?? __111111111111111111111111111111 */
unsigned int d = c << 5; /* ?? __111111111111111111111111100000 */
unsigned int e = d >> 5; /* ?? __000001111111111111111111111111 */
printf( "a: %X\nb: %X\nc: %X\nd: %X\ne: %X\n", a, b, c, d, e );
return 0;
}
Is it safe to XOR two integers with padding bits?
Wouldn't I XOR whatever the padding bits are?
I can't find this behavior covered in C89.
Furthermore is the c variable guaranteed to be 0x3FFFFFFF or if for example the two padding bits were both on in a or b would c be 0xFFFFFFFF?
Same question with d and e. Am I manipulating the padding bits by shifting?
I would expect to see this below, assuming 32 bits with the 2 most significant bits used for padding, but I want to know if something like this is guaranteed:
a: 2AAAAAAA
b: 15555555
c: 3FFFFFFF
d: 3FFFFFE0
e: 01FFFFFF
Also are padding bits always the most significant bits or could they be the least significant bits?
EDIT 12/19/2010 5PM EST: Christoph has answered my question. Thanks!
I had also asked (above) whether padding bits are always the most significant bits. This is cited in the rationale for the C99 standard, and the answer is no. I am playing it safe and assuming the same for C89. Here is specifically what the C99 rationale says for §6.2.6.2 (Representation of Integer Types):
Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
Footnotes 44 and 45 mention that parity bits might be padding bits. The committee does not know of any machines with user-accessible parity bits within an integer. Therefore, the committee is not aware of any machines that treat parity bits as padding bits.
EDIT 12/28/2010 3PM EST: I found an interesting discussion on comp.lang.c from a few months ago.
Bitwise Operator Effects on Padding Bits (VelocityReviews reader)
Bitwise Operator Effects on Padding Bits (Google Groups alternate link)
One point made by Dietmar which I found interesting:
Let's note that padding bits are not necessary for the existence of trap representations; combinations of value bits which do not represent a value of the object type would also do.

Bitwise operations (like arithmetic operations) operate on values and ignore padding. The implementation may or may not modify padding bits (or use them internally, eg as parity bits), but portable C code will never be able to detect this. Any value (including UINT_MAX) will not include the padding.
Where integer padding might lead to problems on is if you use things like sizeof (int) * CHAR_BIT and then try to use shifts to access all these bits. If you want to be portable, either only use (unsigned) char, fixed-sized integers (a C99 addition) or determine the number of value-bits programatically. This can be done at compile-time with the preprocessor by comparing UINT_MAX against powers of 2 or at runtime by using bit-operations.
edit:
C90 does not mention integer padding at all, but as far as I can tell, 'invisible' preceding or trailing integer padding bits shouldn't violate the standard (I didn't go through all relevant sections to make sure this is really the case, though); there probaby are problems with mixed padding and value bits as mentioned in the C99 rationale because otherwise, the standard would not have needed to be changed.
As to the meaning of user-accessible: Padding bits are accessible insofar as you can alwaye get at any bit of foo (including padding) by using bit-operations on ((unsigned char *)&foo)[…]. Be careful when modifying the padding bits, though: the result won't change the value of the integer, but might create be a trap-representation nevertheless. In case of C90, this is implicitly unspecified (as in not mentioned at all), in case of C99, it's implementation-defined.
This was not what the rationale quotation was about, though: the cited architecture represents 32-bit integers via two 16-bit integers. In case of unsigned types, the resulting integer has 32 value bits and a precision of 32; in case of signed integers, it only has 31 value bits and a precision of 30: one of the sign bits of the 16-bit integers is used as the sign bit of the 32-bit integer, the other one is ignored, thus creating a padding bit surrounded by value bits. Now, if you access a 32-bit signed integer as an unsigned integer (which is explicitly allowed and does not violate the C99 aliasing rules), the padding bit becomes a (user-accessible) value bit.

Related

Swapping an integer with a short using a generic function

Assume I have this generic function that swaps two variables:
void swap(void *v1, void *v2, int size){
char buffer[size];
memcpy(buffer, v1, size);
memcpy(v1, v2, size);
memcpy(v2, buffer, size);
}
It works fine, but I was wondering in what cases this might break. One case that comes to mind is when we have two different data types and the size specified is not enough to capture the bigger data. for example:
int x = 4444;
short y = 5;
swap(&x, &y, sizeof(short));
I'd expect that when I run this it would give an incorrect result, because memcpy would work with only 2 bytes (rather than 4) and part of the data would be lost or changed when dealing with x.
Surprisingly though, when I run it, it gives the correct answer on both my Windows 7 and Ubuntu operating systems. I know that Ubuntu and Windows differ in endianness but apparently that doesn't affect any of the two systems.
I want to know why the generic function works fine in this case.
To understand this fully you have to understand the C standard and the specifics of you machine and compiler. Starting with the C standard, here's some relevant snippets [The standard I'm using is WG14/N1256], summarized a little:
The object representation for a signed integer consists of value bits,
padding bits, and a sign bit. [section 6.2.6.2.2].
These bits are stored in a contiguous sequence of bytes. [section
6.2.6.1].
If there's N value bits, they represent powers of two from 2^0 to
2^{N-1}. [section 6.2.6.2].
The sign bit can have one of three meanings, one of which is that is
has value -2^N (two's complement) [section 6.2.6.2.2].
When you copy bytes from a short to an int, you're copying the value bits, padding bits and the sign bit of the short to bits of the int, but not necessarily preserving the meaning of the bits. Somewhat surprisingly, the standard allows this except it doesn't guarantee that the int you get will be valid if your target implementation has so-called "trap representations" and you're unlucky enough to generate one.
In practice, you've found on your machine and your compiler:
a short is represented by 2 bytes of 8 bits each.
The sign bit is bit 7 of the second byte
The value bits in ascending order of value are bits 0-7 of byte 0, and bits 0-6 of byte 1.
There's no padding bits
an int is represented by 4 bytes of 8 bits each.
The sign bit is bit 7 of the fourth byte
The value bits in ascending order of value are bits 0-7 of byte 0, 0-7 of byte 1, 0-7 of byte 2, and 0-6 of byte 3.
There's no padding bits
You would also find out that both representations use two's complement.
In pictures (where SS is the sign bit, and the numbers N correspond to a bit that has value 2^N):
short:
07-06-05-04-03-02-01-00 | SS-14-13-12-11-10-09-08
int:
07-06-05-04-03-02-01-00 | 15-14-13-12-11-10-09-08 | 23-22-21-20-19-18-17-16 | SS-30-29-28-27-26-25-24
You can see from this that if you copy the bytes of a short to the first two bytes of a zero int, you'll get the same value if the sign bit is zero (that is, the number is positive) because the value bits correspond exactly. As a corollary, you can also predict you'll get a different value if you start with a negative-valued short since the sign bit of the short has value -2^15 but the corresponding bit in the int has value 2^15.
The representation you've found on your machine is often summarized as "two's complement, little-endian", but the C standard provides a lot more flexibility in representations than that description suggests (even allowing a byte to have more than 8 bits), which is why portable code usually avoids relying on bit/byte representations of integral types.
As has already been pointed out in the comments the systems you are using are typically little-endian (least significant byte in the lowest address). Given that the memcpy sets the short to the lowest part of the int.
You might enjoy looking at Bit Twiddling Hacks for 'generic' ways to do swap operations.

Is there a C99 data type guaranteed to be at least two bytes?

To determine the endianness of a system, I plan to store a multi-byte integer value in a variable and access the first byte via an unsigned char wrapped in a union; for example:
union{
unsigned int val;
unsigned char first_byte;
} test;
test.val = 1; /* stored in little-endian system as "0x01 0x00 0x00 0x00" */
if(test.first_byte == 1){
printf("Little-endian system!");
}else{
printf("Big-endian system!");
}
I want to make this test portable across platforms, but I'm not sure if the C99 standard guarantees that the unsigned int data type will be greater than one byte in size. Furthermore, since a "C byte" does not technically have to be 8-bits in size, I cannot use exact width integer types (e.g. uint8_t, uint16_t, etc.).
Are there any C data types guaranteed by the C99 standard to be at least two bytes in size?
P.S. Assuming an unsigned int is in fact greater than one byte, would my union behave as I'm expecting (with the variable first_byte accessing the first byte in variable val) across all C99 compatible platforms?
Since int must have a range of at least 16 bits, int will meet your criterion on most practical systems. So would short (and long, and long long). If you want exactly 16 bits, you have to look to see whether int16_t and uint16_t are declared in <stdint.h>.
If you are worried about systems where CHAR_BIT is greater than 8, then you have to work harder. If CHAR_BIT is 32, then only long long is guaranteed to hold two characters.
What the C standard says about sizes of integer types
In a comment, Richard J Ross III says:
The standard says absolutely nothing about the size of an int except that it must be larger than or equal to short, so, for example, it could be 10 bits on some systems I've worked on.
On the contrary, the C standard has specifications on the lower bounds on the ranges that must be supported by different types, and a system with 10-bit int would not be conformant C.
Specifically, in ISO/IEC 9899:2011 §5.2.4.2.1 Sizes of integer types <limits.h>, it says:
¶1 The values given below shall be replaced by constant expressions suitable for use in #if
preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the
following shall be replaced by expressions that have the same type as would an
expression that is an object of the corresponding type converted according to the integer
promotions. Their implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.
— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
[...]
— minimum value for an object of type short int
SHRT_MIN -32767 // −(215 − 1)
— maximum value for an object of type short int
SHRT_MAX +32767 // 215 − 1
— maximum value for an object of type unsigned short int
USHRT_MAX 65535 // 216 − 1
— minimum value for an object of type int
INT_MIN -32767 // −(215 − 1)
— maximum value for an object of type int
INT_MAX +32767 // 215 − 1
— maximum value for an object of type unsigned int
UINT_MAX 65535 // 216 − 1
GCC provides some macros giving the endianness of a system: GCC common predefined macros
example (from the link supplied):
/* Test for a little-endian machine */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
Of course, this is only useful if you use gcc. Furthermore, conditional compilation for endianness can be considered harmful. Here is a nice article about this: The byte order fallacy.
I would prefer to do this using regular condtions to let the compiler check the other case. ie:
if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
...
No, nothing is guaranteed to be larger than one byte -- but it is guaranteed that no (non-bitfield) type is smaller than one byte and that one byte can hold at 256 distinct values, so if you have an int8_t and an int16_t, then it's guaranteed that int8_t is one byte, so int16_t must be two bytes.
The C standard guarantees only that the size of char <= short <= int <= long <= long long [and likewise for unsigned]. So, theoretically, there can be systems that have only one size for all of the sizes.
If it REALLY is critical that this isn't going wrong on some particular architecture, I would add a piece of code to do something like if (sizeof(char) == sizeof(int)) exit_with_error("Can't do this...."); to the code.
In nearly all machines, int or short should be perfectly fine. I'm not actually aware of any machine where char and int are the same size, but I'm 99% sure that they do exist. Those machines may also have it's native byte != 8 bits, such as 9 or 14 bits, and words that are 14, 18 or 36 or 28 bits...
Take a look at the man page of stdint.h (uint_least16_t for 2 bytes)
At least according to http://en.wikipedia.org/wiki/C_data_types -- the size of an int is guaranteed to be two "char"s long. So, this test should work, although I'm wondering if there is a more appropriate solution. For one, with rare exception, most architectures would have their endianness set compile-time, and not runtime. There are a few architectures that can switch endianness, though (I believe ARM and PPC are configurable, but ARM is traditionally LE, and PPC is mostly BE).
A conforming implementation can have all its fundamental types of size 1 (and hold at least 32 bits worth of data). For such an implementation, however, the notion of endianness is not applicable.
Nothing forbids a conforming implementation to have, say, little-endian shorts and big-endian longs.
So there are three possible outcomes for each integral type: it could be big-endian, little-endian, or of size 1. Check each type separately for maximum theoretical portability. In practice this probably never happens.
Middle-endian types, or e.g. big-endian stuff on even-numbered pages only, are theoretically possible, but I would refrain from even thinking about such an implementation.
While the answer is basically "no", satisfying the interface requirements for the stdio functions requires that the range [0,UCHAR_MAX] fit in int, which creates an implicit requirement that sizeof(int) is greater than 1 on hosted implementations (freestanding implementations are free to omit stdio, and there's no reason they can't have sizeof(int)==1). So I think it's fairly safe to assume sizeof(int)>1.

What is this C syntax?

I have no idea what to call it, so I have no idea how to search for it.
unsigned int odd : 1;
Edit:
To elaborate, it comes from this snippet:
struct bitField {
unsigned int odd : 1;
unsigned int padding: 15; // to round out to 16 bits
};
I gather this involves bits, but I'm still not all the way understanding.
They are bitfields. odd and padding will be stored in one unsigned int (16 bit) where odd will occupy the lowest bit, and padding the upper 15 bit of the unsigned int.
It's a bitfield - Check the C FAQ.
It's:
1 bit of "odd" (e.g. 1)
15 bits of "padding" (e.g. 0000000000000001)
and (potentially) whatever other bits round out the unsigned int. In modern 32-bit platforms where this is 32 bits, you'll see another 16 0s in memory (but not in the struct). (In this case sizeof returns 4)
Bitfields can save memory but potentially add instructions to computations. In some cases compilers may ignore your bitfield settings. You can't make any assumptions about how the compiler will choose to actually lay out your bit field, and it can depend on the endianness of your platform.
The main thing I use bitfields for is when I know I will be doing a lot of copying of the data, and not necessarily a lot of computation on or reference of the specific fields in the bit field.

Is the size of C "int" 2 bytes or 4 bytes?

Does an Integer variable in C occupy 2 bytes or 4 bytes? What are the factors that it depends on?
Most of the textbooks say integer variables occupy 2 bytes.
But when I run a program printing the successive addresses of an array of integers it shows the difference of 4.
I know it's equal to sizeof(int). The size of an int is really compiler dependent. Back in the day, when processors were 16 bit, an int was 2 bytes. Nowadays, it's most often 4 bytes on a 32-bit as well as 64-bit systems.
Still, using sizeof(int) is the best way to get the size of an integer for the specific system the program is executed on.
EDIT: Fixed wrong statement that int is 8 bytes on most 64-bit systems. For example, it is 4 bytes on 64-bit GCC.
This is one of the points in C that can be confusing at first, but the C standard only specifies a minimum range for integer types that is guaranteed to be supported. int is guaranteed to be able to hold -32767 to 32767, which requires 16 bits. In that case, int, is 2 bytes. However, implementations are free to go beyond that minimum, as you will see that many modern compilers make int 32-bit (which also means 4 bytes pretty ubiquitously).
The reason your book says 2 bytes is most probably because it's old. At one time, this was the norm. In general, you should always use the sizeof operator if you need to find out how many bytes it is on the platform you're using.
To address this, C99 added new types where you can explicitly ask for a certain sized integer, for example int16_t or int32_t. Prior to that, there was no universal way to get an integer of a specific width (although most platforms provided similar types on a per-platform basis).
There's no specific answer. It depends on the platform. It is implementation-defined. It can be 2, 4 or something else.
The idea behind int was that it was supposed to match the natural "word" size on the given platform: 16 bit on 16-bit platforms, 32 bit on 32-bit platforms, 64 bit on 64-bit platforms, you get the idea. However, for backward compatibility purposes some compilers prefer to stick to 32-bit int even on 64-bit platforms.
The time of 2-byte int is long gone though (16-bit platforms?) unless you are using some embedded platform with 16-bit word size. Your textbooks are probably very old.
The answer to this question depends on which platform you are using.
But irrespective of platform, you can reliably assume the following types:
[8-bit] signed char: -127 to 127
[8-bit] unsigned char: 0 to 255
[16-bit]signed short: -32767 to 32767
[16-bit]unsigned short: 0 to 65535
[32-bit]signed long: -2147483647 to 2147483647
[32-bit]unsigned long: 0 to 4294967295
[64-bit]signed long long: -9223372036854775807 to 9223372036854775807
[64-bit]unsigned long long: 0 to 18446744073709551615
C99 N1256 standard draft
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
The size of int and all other integer types are implementation defined, C99 only specifies:
minimum size guarantees
relative sizes between the types
5.2.4.2.1 "Sizes of integer types <limits.h>" gives the minimum sizes:
1 [...] Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown [...]
UCHAR_MAX 255 // 2 8 − 1
USHRT_MAX 65535 // 2 16 − 1
UINT_MAX 65535 // 2 16 − 1
ULONG_MAX 4294967295 // 2 32 − 1
ULLONG_MAX 18446744073709551615 // 2 64 − 1
6.2.5 "Types" then says:
8 For any two integer types with the same signedness and different integer conversion rank
(see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a
subrange of the values of the other type.
and 6.3.1.1 "Boolean, characters, and integers" determines the relative conversion ranks:
1 Every integer type has an integer conversion rank defined as follows:
The rank of long long int shall be greater than the rank of long int, which
shall be greater than the rank of int, which shall be greater than the rank of short
int, which shall be greater than the rank of signed char.
The rank of any unsigned integer type shall equal the rank of the corresponding
signed integer type, if any.
For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has
greater rank than T3, then T1 has greater rank than T3
Does an Integer variable in C occupy 2 bytes or 4 bytes?
That depends on the platform you're using, as well as how your compiler is configured. The only authoritative answer is to use the sizeof operator to see how big an integer is in your specific situation.
What are the factors that it depends on?
Range might be best considered, rather than size. Both will vary in practice, though it's much more fool-proof to choose variable types by range than size as we shall see. It's also important to note that the standard encourages us to consider choosing our integer types based on range rather than size, but for now let's ignore the standard practice, and let our curiosity explore sizeof, bytes and CHAR_BIT, and integer representation... let's burrow down the rabbit hole and see it for ourselves...
sizeof, bytes and CHAR_BIT
The following statement, taken from the C standard (linked to above), describes this in words that I don't think can be improved upon.
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand.
Assuming a clear understanding will lead us to a discussion about bytes. It's commonly assumed that a byte is eight bits, when in fact CHAR_BIT tells you how many bits are in a byte. That's just another one of those nuances which isn't considered when talking about the common two (or four) byte integers.
Let's wrap things up so far:
sizeof => size in bytes, and
CHAR_BIT => number of bits in byte
Thus, Depending on your system, sizeof (unsigned int) could be any value greater than zero (not just 2 or 4), as if CHAR_BIT is 16, then a single (sixteen-bit) byte has enough bits in it to represent the sixteen bit integer described by the standards (quoted below). That's not necessarily useful information, is it? Let's delve deeper...
Integer representation
The C standard specifies the minimum precision/range for all standard integer types (and CHAR_BIT, too, fwiw) here. From this, we can derive a minimum for how many bits are required to store the value, but we may as well just choose our variables based on ranges. Nonetheless, a huge part of the detail required for this answer resides here. For example, the following that the standard unsigned int requires (at least) sixteen bits of storage:
UINT_MAX 65535 // 2¹⁶ - 1
Thus we can see that unsigned int require (at least) 16 bits, which is where you get the two bytes (assuming CHAR_BIT is 8)... and later when that limit increased to 2³² - 1, people were stating 4 bytes instead. This explains the phenomena you've observed:
Most of the textbooks say integer variables occupy 2 bytes. But when I run a program printing the successive addresses of an array of integers it shows the difference of 4.
You're using an ancient textbook and compiler which is teaching you non-portable C; the author who wrote your textbook might not even be aware of CHAR_BIT. You should upgrade your textbook (and compiler), and strive to remember that I.T. is an ever-evolving field that you need to stay ahead of to compete... Enough about that, though; let's see what other non-portable secrets those underlying integer bytes store...
Value bits are what the common misconceptions appear to be counting. The above example uses an unsigned integer type which typically contains only value bits, so it's easy to miss the devil in the detail.
Sign bits... In the above example I quoted UINT_MAX as being the upper limit for unsigned int because it's a trivial example to extract the value 16 from the comment. For signed types, in order to distinguish between positive and negative values (that's the sign), we need to also include the sign bit.
INT_MIN -32768 // -(2¹⁵)
INT_MAX +32767 // 2¹⁵ - 1
Padding bits... While it's not common to encounter computers that have padding bits in integers, the C standard allows that to happen; some machines (i.e. this one) implement larger integer types by combining two smaller (signed) integer values together... and when you combine signed integers, you get a wasted sign bit. That wasted bit is considered padding in C. Other examples of padding bits might include parity bits and trap bits.
As you can see, the standard seems to encourage considering ranges like INT_MIN..INT_MAX and other minimum/maximum values from the standard when choosing integer types, and discourages relying upon sizes as there are other subtle factors likely to be forgotten such as CHAR_BIT and padding bits which might affect the value of sizeof (int) (i.e. the common misconceptions of two-byte and four-byte integers neglects these details).
The only guarantees are that char must be at least 8 bits wide, short and int must be at least 16 bits wide, and long must be at least 32 bits wide, and that sizeof (char) <= sizeof (short) <= sizeof (int) <= sizeof (long) (same is true for the unsigned versions of those types).
int may be anywhere from 16 to 64 bits wide depending on the platform.
Is the size of C “int” 2 bytes or 4 bytes?
The answer is "yes" / "no" / "maybe" / "maybe not".
The C programming language specifies the following: the smallest addressable unit, known by char and also called "byte", is exactly CHAR_BIT bits wide, where CHAR_BIT is at least 8.
So, one byte in C is not necessarily an octet, i.e. 8 bits. In the past one of the first platforms to run C code (and Unix) had 4-byte int - but in total int had 36 bits, because CHAR_BIT was 9!
int is supposed to be the natural integer size for the platform that has range of at least -32767 ... 32767. You can get the size of int in the platform bytes with sizeof(int); when you multiply this value by CHAR_BIT you will know how wide it is in bits.
While 36-bit machines are mostly dead, there are still platforms with non-8-bit bytes. Just yesterday there was a question about a Texas Instruments MCU with 16-bit bytes, that has a C99, C11-compliant compiler.
On TMS320C28x it seems that char, short and int are all 16 bits wide, and hence one byte. long int is 2 bytes and long long int is 4 bytes. The beauty of C is that one can still write an efficient program for a platform like this, and even do it in a portable manner!
Mostly it depends on the platform you are using .It depends from compiler to compiler.Nowadays in most of compilers int is of 4 bytes.
If you want to check what your compiler is using you can use sizeof(int).
main()
{
printf("%d",sizeof(int));
printf("%d",sizeof(short));
printf("%d",sizeof(long));
}
The only thing c compiler promise is that size of short must be equal or less than int and size of long must be equal or more than int.So if size of int is 4 ,then size of short may be 2 or 4 but not larger than that.Same is true for long and int. It also says that size of short and long can not be same.
This depends on implementation, but usually on x86 and other popular architectures like ARM ints take 4 bytes. You can always check at compile time using sizeof(int) or whatever other type you want to check.
If you want to make sure you use a type of a specific size, use the types in <stdint.h>
#include <stdio.h>
int main(void) {
printf("size of int: %d", (int)sizeof(int));
return 0;
}
This returns 4, but it's probably machine dependant.
Is the size of C “int” 2 bytes or 4 bytes?
Does an Integer variable in C occupy 2 bytes or 4 bytes?
C allows "bytes" to be something other than 8 bits per "byte".
CHAR_BIT number of bits for smallest object that is not a bit-field (byte) C11dr §5.2.4.2.1 1
A value of something than 8 is increasingly uncommon. For maximum portability, use CHAR_BIT rather than 8. The size of an int in bits in C is sizeof(int) * CHAR_BIT.
#include <limits.h>
printf("(int) Bit size %zu\n", sizeof(int) * CHAR_BIT);
What are the factors that it depends on?
The int bit size is commonly 32 or 16 bits. C specified minimum ranges:
minimum value for an object of type int INT_MIN -32767
maximum value for an object of type int INT_MAX +32767
C11dr §5.2.4.2.1 1
The minimum range for int forces the bit size to be at least 16 - even if the processor was "8-bit". A size like 64 bits is seen in specialized processors. Other values like 18, 24, 36, etc. have occurred on historic platforms or are at least theoretically possible. Modern coding rarely worries about non-power-of-2 int bit sizes.
The computer's processor and architecture drive the int bit size selection.
Yet even with 64-bit processors, the compiler's int size may be 32-bit for compatibility reasons as large code bases depend on int being 32-bit (or 32/16).
This is a good source for answering this question.
But this question is a kind of a always truth answere "Yes. Both."
It depends on your architecture. If you're going to work on a 16-bit machine or less, it can't be 4 byte (=32 bit). If you're working on a 32-bit or better machine, its length is 32-bit.
To figure out, get you program ready to output something readable and use the "sizeof" function. That returns the size in bytes of your declared datatype. But be carfull using this with arrays.
If you're declaring int t[12]; it will return 12*4 byte. To get the length of this array, just use sizeof(t)/sizeof(t[0]).
If you are going to build up a function, that should calculate the size of a send array, remember that if
typedef int array[12];
int function(array t){
int size_of_t = sizeof(t)/sizeof(t[0]);
return size_of_t;
}
void main(){
array t = {1,1,1}; //remember: t= [1,1,1,0,...,0]
int a = function(t); //remember: sending t is just a pointer and equal to int* t
print(a); // output will be 1, since t will be interpreted as an int itselve.
}
So this won't even return something different. If you define an array and try to get the length afterwards, use sizeof. If you send an array to a function, remember the send value is just a pointer on the first element. But in case one, you always knows, what size your array has. Case two can be figured out by defining two functions and miss some performance. Define function(array t) and define function2(array t, int size_of_t). Call "function(t)" measure the length by some copy-work and send the result to function2, where you can do whatever you want on variable array-sizes.

Type to use to represent a byte in ANSI (C89/90) C?

Is there a standards-complaint method to represent a byte in ANSI (C89/90) C? I know that, most often, a char happens to be a byte, but my understanding is that this is not guaranteed to be the case. Also, there is stdint.h in the C99 standard, but what was used before C99?
I'm curious about both 8 bits specifically, and a "byte" (sizeof(x) == 1).
char is always a byte , but it's not always an octet. A byte is the smallest addressable unit of memory (in most definitions), an octet is 8-bit unit of memory.
That is, sizeof(char) is always 1 for all implementations, but CHAR_BIT macro in limits.h defines the size of a byte for a platform and it is not always 8 bit. There are platforms with 16-bit and 32-bit bytes, hence char will take up more bits, but it is still a byte. Since required range for char is at least -127 to 127 (or 0 to 255), it will be at least 8 bit on all platforms.
ISO/IEC 9899:TC3
6.5.3.4 The sizeof operator
...
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. [...]
When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. [...]
Emphasis mine.
You can always represent a byte (if you mean 8bits) in a unsigned char. It's always at least 8 bits in size, all bits making up the value, so a 8 bit value will always fit into it.
If you want exactly 8 bits, i also think you'll have to use platform dependent ways. POSIX systems seem to be required to support int8_t. That means that on POSIX systems, char (and thus a byte) is always 8 bits.
In ANSI C89/ISO C90 sizeof(char) == 1. However, it is not always the case that 1 byte is 8 bits. If you wish to count the number of bits in 1 byte (and you don't have access to limits.h), I suggest the following:
unsigned int bitnum(void) {
unsigned char c = ~0u; /* Thank you Jonathan. */
unsigned int v;
for(v = 0u; c; ++v)
c &= c - 1u;
return(v);
}
Here we use Kernighan's method to count the number of bits set in c. To better understand the code above (or see others like it), I refer you to "Bit Twiddling Hacks".
Before C99? Platform-dependent code.
But why do you care? Just use stdint.h.
In every implementation of C I have used (from old UNIX to embedded compilers written by hardware engineers to big-vendor compilers) char has always been 8-bit.
You can find pretty reliable macros and typedefs in boost.
I notice that some answered have re-defined the word byte to mean something other than 8 bits.
A byte is 8 bits, however in some c implementations char is 16 bits (2 bytes) or 8 bits (1 byte). The people that are calling a byte 'smallest addressable unit of memory' or some such garbage have lost grasp of the meaning of byte (8 bits).
The reason that some implementations of C have 16 bit chars (2 bytes) and some have 8 bit chars (1 byte), and there is no standard type called 'byte', is due to laziness.
So, we should use int_8

Resources