What is this C syntax?


I have no idea what to call it, so I have no idea how to search for it.
unsigned int odd : 1;
Edit:
To elaborate, it comes from this snippet:
struct bitField {
unsigned int odd : 1;
unsigned int padding: 15; // to round out to 16 bits
};
I gather this involves bits, but I'm still not all the way understanding.

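They are bitfields. odd and padding are packed together into a single storage unit: odd takes 1 bit and padding the other 15, for 16 bits in total. On many compilers odd ends up in the lowest bit and padding in the 15 bits above it, but the exact placement is implementation-defined.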

It's a bitfield - Check the C FAQ.

It's:
1 bit of "odd" (e.g. 1)
15 bits of "padding" (e.g. 0000000000000001)
and (potentially) whatever other bits round out the unsigned int. In modern 32-bit platforms where this is 32 bits, you'll see another 16 0s in memory (but not in the struct). (In this case sizeof returns 4)
Bitfields can save memory but potentially add instructions to computations. In some cases compilers may ignore your bitfield settings. You can't make any assumptions about how the compiler will choose to actually lay out your bit field, and it can depend on the endianness of your platform.
The main thing I use bitfields for is when I know I will be doing a lot of copying of the data, and not necessarily a lot of computation on or reference of the specific fields in the bit field.

Related

Best way to assign 32 bit value to 64 bit variable and guarantee top 32 bits are 0 in C

I have some C code that has variables which are either ints, or cast to int for a period of time for easy use (what we care about is the bit value). Int will always be 32 bit in this case. At one point some of them are assigned to a 64 bit variable, some implicitly and some explicitly:
long long 64bitfoo = 32bitbar;
long long 64bitfoo = (long long)32bitbar;
This has not been a problem in the past, but recently I ran into a case where after this conversion the top 32 bits of the 64 bit variable are not 0. It seems that some specific version of events can more or less populate the top bits with garbage (or just choose a previously used memory location and not clear it out correctly). This won't do, so I'm looking at solutions.
I can obviously do something like this:
long long 64bitfoo = 32bitbar;
64bitfoo &= ~0xFFFFFFFF00000000;
to clear out the top bits, and this should work for what I need, but I feel like there are better options. So far this has only shown up on values that use the implicit casting, so I'm curious if there is a difference between implicit and explicit casting that would allow explicit casting to handle this itself? (Unfortunately I currently can't just add the explicit casting and do a test; the conditions to trigger this are complex and not easily replicated, so code changes need to be pretty firm and not guesses.)
I'm sure there might be other options as well, doing something instead of just using = to set the value, a different way to clear the top 32 bits that is better, or some way of setting the initial 64 bit value to guarantee the top bits stay clear if only the bottom bits are set (the 64 bit variable sometimes gets other 64 bit variables assigned to it, so it can't have the top bits forced to 0 at all times). Wasn't finding a lot when searching, this doesn't seem to be something that comes up much.
edit: I forgot to mention that there are instances where it being signed doesn't seem like the problem. One example is the initial value was 0xF8452370, then the long long value was shown as -558965697074093200, which is 0xF83E27A8F8452370. So the bottom 32 bits are the same, but the top 32 bits are not just 1's, but a scattering of 1's and 0's. As far as I understand, there's no reason signed vs unsigned would do this (all 1's sure), but I could definitely be mistaken.
Also, the 64 bit variable I think needs to be signed, as at other instances it takes in values that need to be either negative or positive (actual integers) vs in these instances where it just needs to keep track of the bit values. It is a very multi-use variable and I do not have the ability to make it not multi-use.
edit2: It's very possible I am asking the wrong question here, and I'm trying to keep an eye on that. But I am working within restrictions, so the actual problem might be something else, and I might just be stuck adding a bandaid for now. The quick rundown is this:
There is a 64 bit variable that is long long (or __int64 on certain systems, but in the instances I am running into it should always be long long). I can not change what it is, or make it unsigned.
I have a function returning a 32 bit memory address. I need to assign that 32 bit memory address (not as a pointer, but as the actual value of the memory location) to this 64 bit variable.
In these cases I need the top 32 bits of the 64 bit variable to be 0, and the bottom 32 bits to be the same as the original value. Sometimes they are not 0, but they aren't always 1.
Because I can't change the 64 bit variable to unsigned I think my best option, with what I have, is to manually clear the top 32 bits, and am looking for the best way to do that.

You're running into sign extension -- casting a negative signed value to a larger type will "extend" the sign bit of the original value to all the upper bits of the new type, so that the numeric value is preserved. For instance, (int8_t) 0xFC = -4 converts to (int16_t) 0xFFFC = -4. The extra bits aren't "garbage"; they have a very specific purpose and meaning.
If you want to avoid this, cast through an unsigned type. For example:
long long sixtyfourbits = (unsigned int) thirtytwobits;
As a side point, I'd advise that you use the <stdint.h> integer types throughout your code if you care about their size -- for instance, use int64_t instead of long long, and uint32_t instead of unsigned int. The names will more clearly indicate your intent, and there are some platforms which use different sizes for standard C types. (For instance, AVR microcontrollers use a 16-bit int.)

what we care about is the bit value
Then you should stay away from signed types and always use unsigned.
When a signed (or unsigned) type is converted to a bigger size of the same type, the value is preserved, i.e. 19 becomes 19 and -19 becomes -19.
But signed types don't always preserve the binary pattern when going from a smaller type to a bigger type, whereas unsigned types do: unsigned values are always extended by adding zeros in front.
For 2's complement (the most common representation of signed types), all negative values will be sign extended, which simply means that ones are added in front instead of zeros.
SIGNED:
8 bit: -3 -> FD
16 bit: -3 -> FFFD
32 bit: -3 -> FFFFFFFD
64 bit: -3 -> FFFFFFFFFFFFFFFD
UNSIGNED:
8 bit: 253 -> FD
16 bit: 253 -> 00FD
32 bit: 253 -> 000000FD
64 bit: 253 -> 00000000000000FD
It seems that some specific version of events can more or less populate the top bits with garbage
No, either the new extra bits will be all zeros or they will be all ones.
If that isn't the case, your system doesn't comply with the C standard.

A union is the natural way to do this in C. (The field order below assumes a little-endian platform such as x86.)
typedef union {
struct {
int low; // low 32 bits
int high; // high 32 bits
} e; // 32bits mimic x86 CPU eax
__int64 r; // 64bits mimic x86 CPU rax
} Union64;
Union64 data;
data.r = 0x1122334455667788;
Then data.e.high will be 0x11223344
and data.e.low will be 0x55667788
Vice versa:
data.e.high = 0xaabbccdd;
data.e.low = 0x99eeff00;
Then data.r will be 0xaabbccdd99eeff00
In your case
data.r = 0; // guarantee data.e.high is cleared
data.e.low = 32bitbar; // say 0x11223344
Then data.r will be 0x0000000011223344;
This is exactly what union is for.

Bitfields bigger than a long long?

Is it possible to declare a bitfield of very large numbers e.g.
struct binfield{
uber_int field : 991735910442856976773698036458045320070701875088740942522886681;
}wordlist;
Just to clarify, I'm not trying to represent that number in 256 bits; that's how many bits I want to use. Or maybe there aren't that many bits in my computer?

C does not support numeric data types of arbitrary size. You can only use those integer sizes which are provided by the compiler, and when you want your code to be portable, you had better stick to the minimum guaranteed sizes for the standardized types: char (8 bits), short (16 bits), long (32 bits), and long long (64 bits).
But what you can do instead is create a char[]. A char is always at least 8 bit (and is not more than 8 bit either except on some very exotic platforms). So you can use an array of char to store as many bit-values as you can afford memory. However, when you want to use a char array as a bitfield you will need some boilerplate code to access the correct byte.
For example, to get the value of bit n of a char array, use
(bitfield[n/8] >> (n%8)) & 0x1

Swapping an integer with a short using a generic function

Assume I have this generic function that swaps two variables:
void swap(void *v1, void *v2, int size){
char buffer[size];
memcpy(buffer, v1, size);
memcpy(v1, v2, size);
memcpy(v2, buffer, size);
}
It works fine, but I was wondering in what cases this might break. One case that comes to mind is when we have two different data types and the size specified is not enough to capture the bigger data. for example:
int x = 4444;
short y = 5;
swap(&x, &y, sizeof(short));
I'd expect that when I run this it would give an incorrect result, because memcpy would work with only 2 bytes (rather than 4) and part of the data would be lost or changed when dealing with x.
Surprisingly though, when I run it, it gives the correct answer on both my Windows 7 and Ubuntu operating systems. I know that Ubuntu and Windows differ in endianness but apparently that doesn't affect any of the two systems.
I want to know why the generic function works fine in this case.

To understand this fully you have to understand the C standard and the specifics of your machine and compiler. Starting with the C standard, here are some relevant snippets [the standard I'm using is WG14/N1256], summarized a little:
The object representation for a signed integer consists of value bits, padding bits, and a sign bit. [section 6.2.6.2.2].
These bits are stored in a contiguous sequence of bytes. [section 6.2.6.1].
If there are N value bits, they represent powers of two from 2^0 to 2^{N-1}. [section 6.2.6.2].
The sign bit can have one of three meanings, one of which is that it has value -2^N (two's complement) [section 6.2.6.2.2].
When you copy bytes from a short to an int, you're copying the value bits, padding bits and the sign bit of the short to bits of the int, but not necessarily preserving the meaning of the bits. Somewhat surprisingly, the standard allows this except it doesn't guarantee that the int you get will be valid if your target implementation has so-called "trap representations" and you're unlucky enough to generate one.
In practice, you've found on your machine and your compiler:
a short is represented by 2 bytes of 8 bits each.
The sign bit is bit 7 of the second byte
The value bits in ascending order of value are bits 0-7 of byte 0, and bits 0-6 of byte 1.
There's no padding bits
an int is represented by 4 bytes of 8 bits each.
The sign bit is bit 7 of the fourth byte
The value bits in ascending order of value are bits 0-7 of byte 0, 0-7 of byte 1, 0-7 of byte 2, and 0-6 of byte 3.
There's no padding bits
You would also find out that both representations use two's complement.
In pictures (where SS is the sign bit, and the numbers N correspond to a bit that has value 2^N):
short:
07-06-05-04-03-02-01-00 | SS-14-13-12-11-10-09-08
int:
07-06-05-04-03-02-01-00 | 15-14-13-12-11-10-09-08 | 23-22-21-20-19-18-17-16 | SS-30-29-28-27-26-25-24
You can see from this that if you copy the bytes of a short to the first two bytes of a zero int, you'll get the same value if the sign bit is zero (that is, the number is positive) because the value bits correspond exactly. As a corollary, you can also predict you'll get a different value if you start with a negative-valued short since the sign bit of the short has value -2^15 but the corresponding bit in the int has value 2^15.
The representation you've found on your machine is often summarized as "two's complement, little-endian", but the C standard provides a lot more flexibility in representations than that description suggests (even allowing a byte to have more than 8 bits), which is why portable code usually avoids relying on bit/byte representations of integral types.

As has already been pointed out in the comments, the systems you are using are typically little-endian (least significant byte at the lowest address). Given that, the memcpy copies the short into the lowest-addressed bytes of the int, which are also its least significant ones.
You might enjoy looking at Bit Twiddling Hacks for 'generic' ways to do swap operations.

Is the size of C "int" 2 bytes or 4 bytes?

Does an Integer variable in C occupy 2 bytes or 4 bytes? What are the factors that it depends on?
Most of the textbooks say integer variables occupy 2 bytes.
But when I run a program printing the successive addresses of an array of integers it shows the difference of 4.

I know it's equal to sizeof(int). The size of an int is really compiler dependent. Back in the day, when processors were 16 bit, an int was 2 bytes. Nowadays, it's most often 4 bytes on 32-bit as well as 64-bit systems.
Still, using sizeof(int) is the best way to get the size of an integer for the specific system the program is executed on.
EDIT: Fixed wrong statement that int is 8 bytes on most 64-bit systems. For example, it is 4 bytes on 64-bit GCC.

This is one of the points in C that can be confusing at first, but the C standard only specifies a minimum range for integer types that is guaranteed to be supported. int is guaranteed to be able to hold -32767 to 32767, which requires 16 bits. In that case, int is 2 bytes. However, implementations are free to go beyond that minimum, and many modern compilers make int 32-bit (which also means 4 bytes pretty ubiquitously).
The reason your book says 2 bytes is most probably because it's old. At one time, this was the norm. In general, you should always use the sizeof operator if you need to find out how many bytes it is on the platform you're using.
To address this, C99 added new types where you can explicitly ask for a certain sized integer, for example int16_t or int32_t. Prior to that, there was no universal way to get an integer of a specific width (although most platforms provided similar types on a per-platform basis).

There's no specific answer. It depends on the platform. It is implementation-defined. It can be 2, 4 or something else.
The idea behind int was that it was supposed to match the natural "word" size on the given platform: 16 bit on 16-bit platforms, 32 bit on 32-bit platforms, 64 bit on 64-bit platforms, you get the idea. However, for backward compatibility purposes some compilers prefer to stick to 32-bit int even on 64-bit platforms.
The time of 2-byte int is long gone though (16-bit platforms?) unless you are using some embedded platform with 16-bit word size. Your textbooks are probably very old.

The answer to this question depends on which platform you are using.
But irrespective of platform, you can reliably assume at least the following widths and ranges:
[8-bit] signed char: -127 to 127
[8-bit] unsigned char: 0 to 255
[16-bit] signed short: -32767 to 32767
[16-bit] unsigned short: 0 to 65535
[32-bit] signed long: -2147483647 to 2147483647
[32-bit] unsigned long: 0 to 4294967295
[64-bit] signed long long: -9223372036854775807 to 9223372036854775807
[64-bit] unsigned long long: 0 to 18446744073709551615

C99 N1256 standard draft
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
The size of int and all other integer types are implementation defined, C99 only specifies:
minimum size guarantees
relative sizes between the types
5.2.4.2.1 "Sizes of integer types <limits.h>" gives the minimum sizes:
1 [...] Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown [...]
UCHAR_MAX 255 // 2^8 − 1
USHRT_MAX 65535 // 2^16 − 1
UINT_MAX 65535 // 2^16 − 1
ULONG_MAX 4294967295 // 2^32 − 1
ULLONG_MAX 18446744073709551615 // 2^64 − 1
6.2.5 "Types" then says:
8 For any two integer types with the same signedness and different integer conversion rank
(see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a
subrange of the values of the other type.
and 6.3.1.1 "Boolean, characters, and integers" determines the relative conversion ranks:
1 Every integer type has an integer conversion rank defined as follows:
The rank of long long int shall be greater than the rank of long int, which
shall be greater than the rank of int, which shall be greater than the rank of short
int, which shall be greater than the rank of signed char.
The rank of any unsigned integer type shall equal the rank of the corresponding
signed integer type, if any.
For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has
greater rank than T3, then T1 has greater rank than T3

Does an Integer variable in C occupy 2 bytes or 4 bytes?
That depends on the platform you're using, as well as how your compiler is configured. The only authoritative answer is to use the sizeof operator to see how big an integer is in your specific situation.
What are the factors that it depends on?
Range might be best considered, rather than size. Both will vary in practice, though it's much more fool-proof to choose variable types by range than size as we shall see. It's also important to note that the standard encourages us to consider choosing our integer types based on range rather than size, but for now let's ignore the standard practice, and let our curiosity explore sizeof, bytes and CHAR_BIT, and integer representation... let's burrow down the rabbit hole and see it for ourselves...
sizeof, bytes and CHAR_BIT
The following statement, taken from the C standard (linked to above), describes this in words that I don't think can be improved upon.
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand.
Assuming a clear understanding will lead us to a discussion about bytes. It's commonly assumed that a byte is eight bits, when in fact CHAR_BIT tells you how many bits are in a byte. That's just another one of those nuances which isn't considered when talking about the common two (or four) byte integers.
Let's wrap things up so far:
sizeof => size in bytes, and
CHAR_BIT => number of bits in byte
Thus, depending on your system, sizeof (unsigned int) could be any value greater than zero (not just 2 or 4): if CHAR_BIT is 16, then a single (sixteen-bit) byte has enough bits in it to represent the sixteen-bit integer described by the standards (quoted below). That's not necessarily useful information, is it? Let's delve deeper...
Integer representation
The C standard specifies the minimum precision/range for all standard integer types (and CHAR_BIT, too, fwiw) in section 5.2.4.2.1. From this, we can derive a minimum for how many bits are required to store the value, but we may as well just choose our variables based on ranges. Nonetheless, a huge part of the detail required for this answer resides there. For example, the following shows that the standard unsigned int requires (at least) sixteen bits of storage:
UINT_MAX 65535 // 2¹⁶ - 1
Thus we can see that unsigned int requires (at least) 16 bits, which is where you get the two bytes (assuming CHAR_BIT is 8)... and later, when that limit increased to 2³² - 1 on common implementations, people were stating 4 bytes instead. This explains the phenomenon you've observed:
Most of the textbooks say integer variables occupy 2 bytes. But when I run a program printing the successive addresses of an array of integers it shows the difference of 4.
You're using an ancient textbook and compiler which is teaching you non-portable C; the author who wrote your textbook might not even be aware of CHAR_BIT. You should upgrade your textbook (and compiler), and strive to remember that I.T. is an ever-evolving field that you need to stay ahead of to compete... Enough about that, though; let's see what other non-portable secrets those underlying integer bytes store...
Value bits are what the common misconceptions appear to be counting. The above example uses an unsigned integer type which typically contains only value bits, so it's easy to miss the devil in the detail.
Sign bits... In the above example I quoted UINT_MAX as being the upper limit for unsigned int because it's a trivial example to extract the value 16 from the comment. For signed types, in order to distinguish between positive and negative values (that's the sign), we need to also include the sign bit.
INT_MIN -32767 // -(2¹⁵ - 1)
INT_MAX +32767 // 2¹⁵ - 1
Padding bits... While it's not common to encounter computers that have padding bits in integers, the C standard allows that to happen; some machines implement larger integer types by combining two smaller (signed) integer values together... and when you combine signed integers, you get a wasted sign bit. That wasted bit is considered padding in C. Other examples of padding bits might include parity bits and trap bits.
As you can see, the standard seems to encourage considering ranges like INT_MIN..INT_MAX and other minimum/maximum values from the standard when choosing integer types, and discourages relying upon sizes as there are other subtle factors likely to be forgotten such as CHAR_BIT and padding bits which might affect the value of sizeof (int) (i.e. the common misconceptions of two-byte and four-byte integers neglects these details).

The only guarantees are that char must be at least 8 bits wide, short and int must be at least 16 bits wide, and long must be at least 32 bits wide, and that sizeof (char) <= sizeof (short) <= sizeof (int) <= sizeof (long) (same is true for the unsigned versions of those types).
int may be anywhere from 16 to 64 bits wide depending on the platform.

Is the size of C “int” 2 bytes or 4 bytes?
The answer is "yes" / "no" / "maybe" / "maybe not".
The C programming language specifies the following: the smallest addressable unit, known by char and also called "byte", is exactly CHAR_BIT bits wide, where CHAR_BIT is at least 8.
So, one byte in C is not necessarily an octet, i.e. 8 bits. In the past one of the first platforms to run C code (and Unix) had 4-byte int - but in total int had 36 bits, because CHAR_BIT was 9!
int is supposed to be the natural integer size for the platform that has range of at least -32767 ... 32767. You can get the size of int in the platform bytes with sizeof(int); when you multiply this value by CHAR_BIT you will know how wide it is in bits.
While 36-bit machines are mostly dead, there are still platforms with non-8-bit bytes. Just yesterday there was a question about a Texas Instruments MCU with 16-bit bytes, that has a C99, C11-compliant compiler.
On TMS320C28x it seems that char, short and int are all 16 bits wide, and hence one byte. long int is 2 bytes and long long int is 4 bytes. The beauty of C is that one can still write an efficient program for a platform like this, and even do it in a portable manner!

Mostly it depends on the platform you are using. It varies from compiler to compiler. Nowadays, with most compilers, int is 4 bytes.
If you want to check what your compiler is using you can use sizeof(int).
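#include <stdio.h>
int main(void)
{
/* sizeof yields a size_t, so print it with %zu rather than %d */
printf("%zu\n", sizeof(int));
printf("%zu\n", sizeof(short));
printf("%zu\n", sizeof(long));
return 0;
}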
The only thing the C compiler promises is that the size of short must be equal to or less than that of int, and the size of long must be equal to or more than that of int. So if the size of int is 4, then the size of short may be 2 or 4 but not larger than that. The same is true for long and int. Contrary to a common belief, the standard does not require short and long to have different sizes.

This depends on implementation, but usually on x86 and other popular architectures like ARM ints take 4 bytes. You can always check at compile time using sizeof(int) or whatever other type you want to check.
If you want to make sure you use a type of a specific size, use the types in <stdint.h>

#include <stdio.h>
int main(void) {
printf("size of int: %d", (int)sizeof(int));
return 0;
}
This prints 4 here, but the result is machine dependent.

Is the size of C “int” 2 bytes or 4 bytes?
Does an Integer variable in C occupy 2 bytes or 4 bytes?
C allows "bytes" to be something other than 8 bits per "byte".
CHAR_BIT number of bits for smallest object that is not a bit-field (byte) C11dr §5.2.4.2.1 1
A value of something other than 8 is increasingly uncommon. For maximum portability, use CHAR_BIT rather than 8. The size of an int in bits in C is sizeof(int) * CHAR_BIT.
#include <limits.h>
printf("(int) Bit size %zu\n", sizeof(int) * CHAR_BIT);
What are the factors that it depends on?
The int bit size is commonly 32 or 16 bits. C specified minimum ranges:
minimum value for an object of type int INT_MIN -32767
maximum value for an object of type int INT_MAX +32767
C11dr §5.2.4.2.1 1
The minimum range for int forces the bit size to be at least 16 - even if the processor was "8-bit". A size like 64 bits is seen in specialized processors. Other values like 18, 24, 36, etc. have occurred on historic platforms or are at least theoretically possible. Modern coding rarely worries about non-power-of-2 int bit sizes.
The computer's processor and architecture drive the int bit size selection.
Yet even with 64-bit processors, the compiler's int size may be 32-bit for compatibility reasons as large code bases depend on int being 32-bit (or 32/16).

This is a good source for answering this question.
But this question has a kind of always-true answer: "Yes. Both."
It depends on your architecture. If you're going to work on a 16-bit machine or less, int is typically 2 bytes (16 bits) rather than 4. If you're working on a 32-bit or better machine, its length is usually 32 bits.
To figure it out, get your program to output something readable and use the sizeof operator. It yields the size in bytes of your declared data type. But be careful using this with arrays.
If you declare int t[12]; sizeof(t) will yield 12*4 bytes (with a 4-byte int). To get the length of this array, just use sizeof(t)/sizeof(t[0]).
If you write a function that should calculate the size of an array passed to it, remember what happens here:
#include <stdio.h>
typedef int array[12];
int function(array t){ /* t is really an int* here */
int size_of_t = sizeof(t)/sizeof(t[0]); /* sizeof(int*)/sizeof(int), not 12 */
return size_of_t;
}
int main(void){
array t = {1,1,1}; /* remember: t = [1,1,1,0,...,0] */
int a = function(t); /* remember: passing t just passes a pointer, equal to int* t */
printf("%d\n", a); /* typically prints 1 (32-bit pointers) or 2 (64-bit pointers), not 12 */
return 0;
}
So this will not return the array length, no matter how the array was declared. If you define an array and need its length in the same scope, use sizeof. If you pass an array to a function, remember that the passed value is just a pointer to the first element. In the first case you always know what size your array has. The second case is handled by passing the length along: define function2(array t, int size_of_t), compute the length with sizeof where the array is declared, and pass it in, so that function2 can work on variable array sizes.

Padding bits in unsigned integers and bitwise operations in C89

I have a lot of code that performs bitwise operations on unsigned integers. I wrote my code with the assumption that those operations were on integers of fixed width without any padding bits. For example an array of 32-bit unsigned integers with all 32 bits available in each integer.
I'm looking to make my code more portable and I'm focused on making sure I'm C89 compliant (in this case). One of the issues that I've come across is possible padded integers. Take this extreme example, taken from the GMP manual:
However on Cray vector systems it may be noted that short and int are always stored in 8 bytes (and with sizeof indicating that) but use only 32 or 46 bits. The nails feature can account for this, by passing for instance 8*sizeof(int)-INT_BIT.
I've also read about this type of padding in other places. I actually read of a post on SO last night (forgive me, I don't have the link and I'm going to cite something similar from memory) where if you have, say, a double with 60 usable bits the other 4 could be used for padding and those padding bits could serve some internal purpose so they cannot be modified.
So let's say for example my code is compiled on a platform where an unsigned int type is sized at 4 bytes, each byte being 8 bits, however the most significant 2 bits are padding bits. Would UINT_MAX in that case be 0x3FFFFFFF (1073741823)?
#include <stdio.h>
#include <stdlib.h>
/* padding bits represented by underscores */
int main( int argc, char **argv )
{
unsigned int a = 0x2AAAAAAA; /* __101010101010101010101010101010 */
unsigned int b = 0x15555555; /* __010101010101010101010101010101 */
unsigned int c = a ^ b; /* ?? __111111111111111111111111111111 */
unsigned int d = c << 5; /* ?? __111111111111111111111111100000 */
unsigned int e = d >> 5; /* ?? __000001111111111111111111111111 */
printf( "a: %X\nb: %X\nc: %X\nd: %X\ne: %X\n", a, b, c, d, e );
return 0;
}
Is it safe to XOR two integers with padding bits?
Wouldn't I XOR whatever the padding bits are?
I can't find this behavior covered in C89.
Furthermore is the c variable guaranteed to be 0x3FFFFFFF or if for example the two padding bits were both on in a or b would c be 0xFFFFFFFF?
Same question with d and e. Am I manipulating the padding bits by shifting?
I would expect to see this below, assuming 32 bits with the 2 most significant bits used for padding, but I want to know if something like this is guaranteed:
a: 2AAAAAAA
b: 15555555
c: 3FFFFFFF
d: 3FFFFFE0
e: 01FFFFFF
Also are padding bits always the most significant bits or could they be the least significant bits?
EDIT 12/19/2010 5PM EST: Christoph has answered my question. Thanks!
I had also asked (above) whether padding bits are always the most significant bits. This is cited in the rationale for the C99 standard, and the answer is no. I am playing it safe and assuming the same for C89. Here is specifically what the C99 rationale says for §6.2.6.2 (Representation of Integer Types):
Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
Footnotes 44 and 45 mention that parity bits might be padding bits. The committee does not know of any machines with user-accessible parity bits within an integer. Therefore, the committee is not aware of any machines that treat parity bits as padding bits.
EDIT 12/28/2010 3PM EST: I found an interesting discussion on comp.lang.c from a few months ago.
Bitwise Operator Effects on Padding Bits (VelocityReviews reader)
Bitwise Operator Effects on Padding Bits (Google Groups alternate link)
One point made by Dietmar which I found interesting:
Let's note that padding bits are not necessary for the existence of trap representations; combinations of value bits which do not represent a value of the object type would also do.

Bitwise operations (like arithmetic operations) operate on values and ignore padding. The implementation may or may not modify padding bits (or use them internally, eg as parity bits), but portable C code will never be able to detect this. Any value (including UINT_MAX) will not include the padding.
Where integer padding might lead to problems is if you use things like sizeof (int) * CHAR_BIT and then try to use shifts to access all these bits. If you want to be portable, either use only (unsigned) char or fixed-sized integers (a C99 addition), or determine the number of value bits programmatically. This can be done at compile time with the preprocessor by comparing UINT_MAX against powers of 2, or at runtime by using bit operations.
edit:
C90 does not mention integer padding at all, but as far as I can tell, 'invisible' preceding or trailing integer padding bits shouldn't violate the standard (I didn't go through all relevant sections to make sure this is really the case, though); there probably are problems with mixed padding and value bits as mentioned in the C99 rationale, because otherwise the standard would not have needed to be changed.
As to the meaning of user-accessible: Padding bits are accessible insofar as you can always get at any bit of foo (including padding) by using bit-operations on ((unsigned char *)&foo)[…]. Be careful when modifying the padding bits, though: the result won't change the value of the integer, but might create a trap representation nevertheless. In case of C90, this is implicitly unspecified (as in not mentioned at all), in case of C99, it's implementation-defined.
This was not what the rationale quotation was about, though: the cited architecture represents 32-bit integers via two 16-bit integers. In case of unsigned types, the resulting integer has 32 value bits and a precision of 32; in case of signed integers, it only has a width of 31 (30 value bits plus the sign bit) and a precision of 30: one of the sign bits of the 16-bit integers is used as the sign bit of the 32-bit integer, the other one is ignored, thus creating a padding bit surrounded by value bits. Now, if you access a 32-bit signed integer as an unsigned integer (which is explicitly allowed and does not violate the C99 aliasing rules), the padding bit becomes a (user-accessible) value bit.
