XChangeProperty uses long for 32-bits - c

The pid variable that is passed to XChangeProperty() is not a long.
The libX11 code deferences the variable as a long and on a 64-bit sparc
this must be aligned on a 8-byte boundary.
Because it is an int, it gets aligned on a 4-byte boundary, causing a
bus error.
pid_t pid = getpid();
XChangeProperty( display, wm_window, net_wm_pid, cardinal, 32,
PropModeReplace,
(const unsigned char*) &pid, 1 );
Its only necessary casts to 'long' or check for max value?
From XGetWindowProperty(3) manual page
format
Specifies whether the data should be viewed as a list of
8-bit, 16-bit, or 32-bit quantities. Possible values are
8,
16, and 32. This information allows the X server to cor‐
rectly perform byte-swap operations as necessary. If the
format is 16-bit or 32-bit, you must explicitly cast your
data pointer to an (unsigned char *) in the call to
XChange‐
Property.
pid_t pid = getpid ();
if (pid <= 0xFFFFFFFFU) {
unsigned long xpid = pid;
XChangeProperty( display, wm_window, net_wm_pid, cardinal, 32,
PropModeReplace,
(const unsigned char*) &xpid, 1 );
}

You have to pass a long to XChangeProperty for 32 bit properties you're setting.
So you need to do
unsigned long xpid = pid;
If your pid_t is 32 bits or less, there's no need to check if it's <= 0xFFFFFFFFU, it will always be true.
If you want your code to be portable to systems that have pid_t values > 32 bits - (though I don't know of any system that has) then you need that check since the _NET_WM_PID property is defined to be 32 bits.
The underlying issue here is that libX11 will dereference the property as a long, but only extract 32 bit from it, so from what I can tell _NET_WM_PID can't be used on a system if the pid_t value is > 32 bits.
Now, the reason for all this is that X11 was made in a time that where long was 32 bits on all systems it needed to run on, and has been kept that way to not break its API and ABI, I'll quote from this post
You have to separate the C API from the underlying X11 objects. The
objects are 32-bit (or 16 or 8), and will always remain so. These
have to be mapped to the language (C) somehow. The types picked
matched up nicely for a few decades, but unfortunately not anymore.
So rather than backpeddling and saying "we didn't really mean long,
we meant whatever-type-is-closest-to-32-bit-right-now", they kept
long. Anything else would probably have meant a very painful
transition period.
That means that everything in the X11 API that deals with 32-bit
objects will use 64-bit variables with half the space wasted.

Related

C pread gives different results

I have two systems. One is Ubuntu 14.04 64bit on a Intel CPU, the other one is Ubuntu 14.04 for ARM on a CubieTruck.
The Intel system has a data file stored on a ext4 formatted HDD. The CubieTruck has the same file on a NTFS HDD, which is mounted with NTFS-3G.
I currently have a problem with pread() on those systems. I read a bunch of bytes from a file, and print out the first 64 byte from this chunk. Later, these bytes are used to calculate some hash using Shabal.
While the data printed on the CubieTruck matches exactly what I see on a Windows system when opening the file with a Hex-editor, the output on the 64bit Ubuntu is different. It looks like it is filled with "FFFFFF", but also different in general. Even more strange is, that while output on the CubieTruck always stays the same, it changes on the 64bit Ubuntu system after a while (I haven't seen a pattern when that happens, I just check from time to time).
But the most annoying thing is, that the x64 system seems to calculate correctly, while the ARM system is wrong.
I have no idea why pread delivers different results for the same file under those systems, but I hope someone can shed some light into it.
edit, the code:
int main(int argc, char **argv) {
unsigned int readsize = 16384 * 32 * 2;
char *cache = (char*) malloc(readsize);
int fh = open("/home/user/somefile", O_RDONLY);
if (fh < 0) {
printf("can't open file");
exit(-1);
}
int bytes = 0, b;
do {
b = pread(fh, &cache[bytes], readsize - bytes, bytes);
bytes += b;
} while(bytes < readsize && b > 0);
int i = 0;
for (i=0; i < 64; i++) {
printf("%02X", cache[i]);
}
close(fh);
free(cache);
return 0;
}
both systems are opening the exact same file.
result on x64:
FFFFFF94FFFFFFF16D25FFFFFFC0FFFFFFA3367D010BFFFFFFEF1E12FFFFFF841CFFFFFFBE4C26FFFFFF92FFFFFF80FFFFFF86FFFFFFA822FFFFFF8A26FFFFFF906CFFFFFFAD05FFFFFFE7FFFFFFB124FFFFFFA8FFFFFFF77B16FFFFFFEAFFFFFFACFFFFFF9DFFFFFF9EFFFFFF81FFFFFFC7FFFFFF92FFFFFFCDFFFFFFB0FFFFFFE86270FFFFFFF974FFFFFFA8420C45FFFFFFFC04FFFFFFF9103F2E3A47FFFFFF990F
result on ARM:
94F16D25C0A3367D010BEF1E12841CBE4C26928086A8228A26906CAD05E7B124A8F77B16EAAC9D9E81C792CDB0E86270F974A8420C45FC04F9103F2E3A47990F
You can see, on x64, the result is filled with "FFFFFF", and it appears, that this is somehow needed later on. But I don't get why it's different on my systems.
One of the fun aspects of C is the amount of implementation-defined behaviour - in this case, whether char is signed or not.
The %x format specifier takes an unsigned int argument, so in the ARM case is the conversion is straightforward - char is unsigned so just gets zero-extended to unsigned int. However for x86 where it's signed, the converion can go one of two ways:
sign-extend the char to a signed int, then cast it to unsigned int
first cast to unsigned char, then zero-extend to unisgned int
It appears the char->int part of the conversion takes precedence over the signed->unsigned part* so you get the former (note how the bytes without the top bit set are unambiguous and print the same on both implementations). I imagine your calculation does a similar conversion somewhere expecting signedness, hence why it breaks on ARM.
In short, if you're dealing with char-sized values rather than characters, always specify signed char or unsigned char as appropriate, never bare char.
* I suppose I could dig out the standard to check if that's actually specified, but at this point it's merely a trivial detail

When to use bit-fields in C

On the question 'why do we need to use bit-fields?', searching on Google I found that bit fields are used for flags.
Now I am curious,
Is it the only way bit-fields are used practically?
Do we need to use bit fields to save space?
A way of defining bit field from the book:
struct {
unsigned int is_keyword : 1;
unsigned int is_extern : 1;
unsigned int is_static : 1;
} flags;
Why do we use int?
How much space is occupied?
I am confused why we are using int, but not short or something smaller than an int.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
A quite good resource is Bit Fields in C.
The basic reason is to reduce the used size. For example, if you write:
struct {
unsigned int is_keyword;
unsigned int is_extern;
unsigned int is_static;
} flags;
You will use at least 3 * sizeof(unsigned int) or 12 bytes to represent three small flags, that should only need three bits.
So if you write:
struct {
unsigned int is_keyword : 1;
unsigned int is_extern : 1;
unsigned int is_static : 1;
} flags;
This uses up the same space as one unsigned int, so 4 bytes. You can throw 32 one-bit fields into the struct before it needs more space.
This is sort of equivalent to the classical home brew bit field:
#define IS_KEYWORD 0x01
#define IS_EXTERN 0x02
#define IS_STATIC 0x04
unsigned int flags;
But the bit field syntax is cleaner. Compare:
if (flags.is_keyword)
against:
if (flags & IS_KEYWORD)
And it is obviously less error-prone.
Now I am curious, [are flags] the only way bitfields are used practically?
No, flags are not the only way bitfields are used. They can also be used to store values larger than one bit, although flags are more common. For instance:
typedef enum {
NORTH = 0,
EAST = 1,
SOUTH = 2,
WEST = 3
} directionValues;
struct {
unsigned int alice_dir : 2;
unsigned int bob_dir : 2;
} directions;
Do we need to use bitfields to save space?
Bitfields do save space. They also allow an easier way to set values that aren't byte-aligned. Rather than bit-shifting and using bitwise operations, we can use the same syntax as setting fields in a struct. This improves readability. With a bitfield, you could write
directions.alice_dir = WEST;
directions.bob_dir = SOUTH;
However, to store multiple independent values in the space of one int (or other type) without bitfields, you would need to write something like:
#define ALICE_OFFSET 0
#define BOB_OFFSET 2
directions &= ~(3<<ALICE_OFFSET); // clear Alice's bits
directions |= WEST<<ALICE_OFFSET; // set Alice's bits to WEST
directions &= ~(3<<BOB_OFFSET); // clear Bob's bits
directions |= SOUTH<<BOB_OFFSET; // set Bob's bits to SOUTH
The improved readability of bitfields is arguably more important than saving a few bytes here and there.
Why do we use int? How much space is occupied?
The space of an entire int is occupied. We use int because in many cases, it doesn't really matter. If, for a single value, you use 4 bytes instead of 1 or 2, your user probably won't notice. For some platforms, size does matter more, and you can use other data types which take up less space (char, short, uint8_t, etc.).
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No, that is not correct. The entire unsigned int will exist, even if you're only using 8 of its bits.
Another place where bitfields are common are hardware registers. If you have a 32 bit register where each bit has a certain meaning, you can elegantly describe it with a bitfield.
Such a bitfield is inherently platform-specific. Portability does not matter in this case.
We use bit fields mostly (though not exclusively) for flag structures - bytes or words (or possibly larger things) in which we try to pack tiny (often 2-state) pieces of (often related) information.
In these scenarios, bit fields are used because they correctly model the problem we're solving: what we're dealing with is not really an 8-bit (or 16-bit or 24-bit or 32-bit) number, but rather a collection of 8 (or 16 or 24 or 32) related, but distinct pieces of information.
The problems we solve using bit fields are problems where "packing" the information tightly has measurable benefits and/or "unpacking" the information doesn't have a penalty. For example, if you're exposing 1 byte through 8 pins and the bits from each pin go through their own bus that's already printed on the board so that it leads exactly where it's supposed to, then a bit field is ideal. The benefit in "packing" the data is that it can be sent in one go (which is useful if the frequency of the bus is limited and our operation relies on frequency of its execution), and the penalty of "unpacking" the data is non-existent (or existent but worth it).
On the other hand, we don't use bit fields for booleans in other cases like normal program flow control, because of the way computer architectures usually work. Most common CPUs don't like fetching one bit from memory - they like to fetch bytes or integers. They also don't like to process bits - their instructions often operate on larger things like integers, words, memory addresses, etc.
So, when you try to operate on bits, it's up to you or the compiler (depending on what language you're writing in) to write out additional operations that perform bit masking and strip the structure of everything but the information you actually want to operate on. If there are no benefits in "packing" the information (and in most cases, there aren't), then using bit fields for booleans would only introduce overhead and noise in your code.
To answer the original question »When to use bit-fields in C?« … according to the book "Write Portable Code" by Brian Hook (ISBN 1-59327-056-9, I read the German edition ISBN 3-937514-19-8) and to personal experience:
Never use the bitfield idiom of the C language, but do it by yourself.
A lot of implementation details are compiler-specific, especially in combination with unions and things are not guaranteed over different compilers and different endianness. If there's only a tiny chance your code has to be portable and will be compiled for different architectures and/or with different compilers, don't use it.
We had this case when porting code from a little-endian microcontroller with some proprietary compiler to another big-endian microcontroller with GCC, and it was not fun. :-/
This is how I have used flags (host byte order ;-) ) since then:
# define SOME_FLAG (1 << 0)
# define SOME_OTHER_FLAG (1 << 1)
# define AND_ANOTHER_FLAG (1 << 2)
/* test flag */
if ( someint & SOME_FLAG ) {
/* do this */
}
/* set flag */
someint |= SOME_FLAG;
/* clear flag */
someint &= ~SOME_FLAG;
No need for a union with the int type and some bitfield struct then. If you read lots of embedded code those test, set, and clear patterns will become common, and you spot them easily in your code.
Why do we need to use bit-fields?
When you want to store some data which can be stored in less than one byte, those kind of data can be coupled in a structure using bit fields.
In the embedded word, when one 32 bit world of any register has different meaning for different word then you can also use bit fields to make them more readable.
I found that bit fields are used for flags. Now I am curious, is it the only way bit-fields are used practically?
No, this not the only way. You can use it in other ways too.
Do we need to use bit fields to save space?
Yes.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No. Memory only can be occupied in multiple of bytes.
Bit fields can be used for saving memory space (but using bit fields for this purpose is rare). It is used where there is a memory constraint, e.g., while programming in embedded systems.
But this should be used only if extremely required because we cannot have the address of a bit field, so address operator & cannot be used with them.
A good usage would be to implement a chunk to translate to—and from—Base64 or any unaligned data structure.
struct {
unsigned int e1:6;
unsigned int e2:6;
unsigned int e3:6;
unsigned int e4:6;
} base64enc; // I don't know if declaring a 4-byte array will have the same effect.
struct {
unsigned char d1;
unsigned char d2;
unsigned char d3;
} base64dec;
union base64chunk {
struct base64enc enc;
struct base64dec dec;
};
base64chunk b64c;
// You can assign three characters to b64c.enc, and get four 0-63 codes from b64dec instantly.
This example is a bit naive, since Base64 must also consider null-termination (i.e. a string which has not a length l so that l % 3 is 0). But works as a sample of accessing unaligned data structures.
Another example: Using this feature to break a TCP packet header into its components (or other network protocol packet header you want to discuss), although it is a more advanced and less end-user example. In general: this is useful regarding PC internals, SO, drivers, an encoding systems.
Another example: analyzing a float number.
struct _FP32 {
unsigned int sign:1;
unsigned int exponent:8;
unsigned int mantissa:23;
}
union FP32_t {
_FP32 parts;
float number;
}
(Disclaimer: Don't know the file name / type name where this is applied, but in C this is declared in a header; Don't know how can this be done for 64-bit floating-point numbers since the mantissa must have 52 bits and—in a 32 bit target—ints have 32 bits).
Conclusion: As the concept and these examples show, this is a rarely used feature because it's mostly for internal purposes, and not for day-by-day software.
To answer the parts of the question no one else answered:
Ints, not Shorts
The reason to use ints rather than shorts, etc. is that in most cases no space will be saved by doing so.
Modern computers have a 32 or 64 bit architecture and that 32 or 64 bits will be needed even if you use a smaller storage type such as a short.
The smaller types are only useful for saving memory if you can pack them together (for example a short array may use less memory than an int array as the shorts can be packed together tighter in the array). For most cases, when using bitfields, this is not the case.
Other uses
Bitfields are most commonly used for flags, but there are other things they are used for. For example, one way to represent a chess board used in a lot of chess algorithms is to use a 64 bit integer to represent the board (8*8 pixels) and set flags in that integer to give the position of all the white pawns. Another integer shows all the black pawns, etc.
You can use them to expand the number of unsigned types that wrap. Ordinary you would have only powers of 8,16,32,64... , but you can have every power with bit-fields.
struct a
{
unsigned int b : 3 ;
} ;
struct a w = { 0 } ;
while( 1 )
{
printf("%u\n" , w.b++ ) ;
getchar() ;
}
To utilize the memory space, we can use bit fields.
As far as I know, in real-world programming, if we require, we can use Booleans instead of declaring it as integers and then making bit field.
If they are also values we use often, not only do we save space, we can also gain performance since we do not need to pollute the caches.
However, caching is also the danger in using bit fields since concurrent reads and writes to different bits will cause a data race and updates to completely separate bits might overwrite new values with old values...
Bitfields are much more compact and that is an advantage.
But don't forget packed structures are slower than normal structures. They are also more difficult to construct since the programmer must define the number of bits to use for each field. This is a disadvantage.
Why do we use int? How much space is occupied?
One answer to this question that I haven't seen mentioned in any of the other answers, is that the C standard guarantees support for int. Specifically:
A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation defined type.
It is common for compilers to allow additional bit-field types, but not required. If you're really concerned about portability, int is the best choice.
Nowadays, microcontrollers (MCUs) have peripherals, such as I/O ports, ADCs, DACs, onboard the chip along with the processor.
Before MCUs became available with the needed peripherals, we would access some of our hardware by connecting to the buffered address and data buses of the microprocessor. A pointer would be set to the memory address of the device and if the device saw its address along with the R/W signal and maybe a chip select, it would be accessed.
Oftentimes we would want to access individual or small groups of bits on the device.
In our project, we used this to extract a page table entry and page directory entry from a given memory address:
union VADDRESS {
struct {
ULONG64 BlockOffset : 16;
ULONG64 PteIndex : 14;
ULONG64 PdeIndex : 14;
ULONG64 ReservedMBZ : (64 - (16 + 14 + 14));
};
ULONG64 AsULONG64;
};
Now suppose, we have an address:
union VADDRESS tempAddress;
tempAddress.AsULONG64 = 0x1234567887654321;
Now we can access PTE and PDE from this address:
cout << tempAddress.PteIndex;

Why is size_t better?

The title is actually a bit misleading, but I wanted to keep it short. I've read about why I should use size_t and I often found statements like this:
size_t is guaranteed to be able to express the maximum size of any object, including any array
I don't really understand what that means. Is there some kind of cap on how much memory you can allocate at once and size_t is guaranteed to be large enough to count every byte in that memory block?
Follow-up question:
What determines how much memory can be allocated?
Let's say the biggest object your compiler/platform can have is 4 gb. size_t then is 32 bit. Now let's say you recompile your program on a 64 bit platform able to support objects of size 2^43 - 1. size_t will be at least 43 bit long (but normally it will be 64 bit at this point). The point is that you only have to recompile the program. You don't have to change all your ints to long (if int is 32 bit and long is 64 bit) or from int32_t to int64_t.
(if you are asking yourself why 43 bit, let's say that Windows Server 2008 R2 64bit doesn't support objects of size 2^63 nor objects of size 2^62... It supports 8 TB of addressable space... So 43 bit!)
Many programs written for Windows considered a pointer to be as much big as a DWORD (a 32 bit unsigned integer). These programs can't be recompiled on 64 bit without rewriting large swats of code. Had they used DWORD_PTR (an unsigned value guaranteed to be as much big as necessary to contain a pointer) they wouldn't have had this problem.
The size_t "point" is the similar. but different!
size_t isn't guaranteed to be able to contain a pointer!!
(the DWORD_PTR of Microsoft Windows is)
This, in general, is illegal:
void *p = ...
size_t p2 = (size_t)p;
For example, on the old DOS "platform", the maximum size of an object was 64k, so size_t needed to be 16 bit BUT a far pointer needed to be at least 20 bit, because the 8086 had a memory space of 1 mb (in the end a far pointer was 16 + 16 bit, because the memory of an 8086 was segmented)
Basically it means that size_t, is guaranteed to be large enough to index any array and get the size of any data type.
It is preferred over using just int, because the size of int and other integer types can be smaller than what can be indexed. For example int is usually 32-bits long which is not enough to index large arrays on 64-bit machines. (This is actually a very common problem when porting programs to 64-bit.)
That is exactly the reason.
The maximum size of any object in a given programming language is determined by a combination of the OS, the CPU architecture and the compiler/linker in use.
size_t is defined to be big enough to hold the size value of the largest possible object.
This usually means that size_t is typedef'ed to be the same as the largest int type available.
So on a 32 bit environment it would typically be 4 bytes and in a 64 bit system 8 bytes.
size_t is defined for the platform that you are compiling for. Hence it can represent the maximum for that platform.
size_t is the return of the sizeof operator (see 7.17 c99) therefore it must describe the largest possible object the system can represent.
Have a look at
http://en.wikipedia.org/wiki/Size_t

int v/s. long in C

On my system, I get:
sizeof ( int ) = 4
sizeof ( long ) = 4
When I checked with a C program, both int & long overflowed to the negative after:
a = 2147483647;
a++;
If both can represent the same range of numbers, why would I ever use the long keyword?
int has a minimum range of -32767 to 32767, whereas long has a minimum range of -2147483647 to 2147483647.
If you are writing portable code that may have to compile on different C implementations, then you should use long if you need that range. If you're only writing non-portable code for one specific implementation, then you're right - it doesn't matter.
Because sizeof(int)==sizeof(long) isn't always true. int normaly represents the fastest size with at least 2*8 Bit. long on the other hand is at least 4*8 Bit.
C defines a number of integer types and specifies the relation of their sizes. Basically, what it says is that sizeof(long long) >= sizeof(long) >= sizeof(int) >= sizeof(short) >= sizeof(char), and that sizeof(char) == 1.
But the actual sizes are not defined, and depend on the architecture you are running on. On a 32-bit PC, int and long are typically four bytes and long long is 8 bytes. But on a 64-bit system, long is typically 8 bytes, and thus different from int.
There is also a type called uintptr_t (and intptr_t) that is guaranteed to have the same size as data pointers.
The important thing to remember is to not assume that you can, for example, store pointer values in a long or an int. Being portable is probably more important than you think, and it is likely that you will want to compile your code on a 64-bit system in the near future.
I think it's more of a compiler issue nowadays, since computers has gone much faster and demands more numbers, as was the case before.
On different platform or with a different compiler, the int and long may be different.
If you don't plan to port your code to anything else or use a different machine, then pick the one you want, it won't make a difference.
It depends on the compiler, and you might want to check this out: What does the C++ standard state the size of int, long type to be?
The size of built-in data types is variable depending on the C implementation, but they all have minimum ranges. Nowadays, int is typically 4 bytes long (32-bits) because most OS are 32-bit. Note that char will always be 1 bytes.
The size of a data type depends upon the compiler. Different compilers have diffrent size of int and other data types.
So if you make a code which is going to run on diffrent machine you should use long or it is depend on the range of the value tha t ur variable may have.

does 8-bit processor have to face endianness problem?

If I have a int32 type integer in the 8-bit processor's memory, say, 8051, how could I identify the endianess of that integer? Is it compiler specific? I think this is important when sending multybyte data through serial lines etc.
With an 8 bit microcontroller that has no native support for wider integers, the endianness of integers stored in memory is indeed up to the compiler writer.
The SDCC compiler, which is widely used on 8051, stores integers in little-endian format (the user guide for that compiler claims that it is more efficient on that architecture, due to the presence of an instruction for incrementing a data pointer but not one for decrementing).
If the processor has any operations that act on multi-byte values, or has an multi-byte registers, it has the possibility to have an endian-ness.
http://69.41.174.64/forum/printable.phtml?id=14233&thread=14207 suggests that the 8051 mixes different endian-ness in different places.
The endianness is specific to the CPU architecture. Since a compiler needs to target a particular CPU, the compiler would have knowledge of the endianness as well. So if you need to send data over a serial connection, network, etc you may wish to use build-in functions to put data in network byte order - especially if your code needs to support multiple architectures.
For more information, see: http://www.gnu.org/s/libc/manual/html_node/Byte-Order.html
It's not just up to the compiler - '51 has some native 16-bit registers (DPTR, PC in standard, ADC_IN, DAC_OUT and such in variants) of given endianness which the compiler has to obey - but outside of that, the compiler is free to use any endianness it prefers or one you choose in project configuration...
An integer does not have endianness in it. You can't determine just from looking at the bytes whether it's big or little endian. You just have to know: For example if your 8 bit processor is little endian and you're receiving a message that you know to be big endian (because, for example, the field bus system defines big endian), you have to convert values of more than 8 bits. You'll need to either hard-code that or to have some definition on the system on which bytes to swap.
Note that swapping bytes is the easy thing. You may also have to swap bits in bit fields, since the order of bits in bit fields is compiler-specific. Again, you basically have to know this at build time.
unsigned long int x = 1;
unsigned char *px = (unsigned char *) &x;
*px == 0 ? "big endian" : "little endian"
If x is assigned the value 1 then the value 1 will be in the least significant byte.
If we then cast x to be a pointer to bytes, the pointer will point to the lowest memory location of x. If that memory location is 0 it is big endian, otherwise it is little endian.
#include <stdio.h>
union foo {
int as_int;
char as_bytes[sizeof(int)];
};
int main() {
union foo data;
int i;
for (i = 0; i < sizeof(int); ++i) {
data.as_bytes[i] = 1 + i;
}
printf ("%0x\n", data.as_int);
return 0;
}
Interpreting the output is up to you.

Resources