Is it a bad practice to use uint64_t in this context? - c

I've been playing with C sockets recently and managed to exchange files between a client and a server. However, I stumbled upon this problem: when sending the file size between my Mac (64-bit) and a Raspberry Pi (32-bit), it fails, since size_t is different between the two. I solved it by switching to uint64_t.
I'm wondering: is it bad practice to use it in place of size_t, which appears in the prototypes of fread(), fwrite(), read(), write(), and in stat's st_size?
Is uint64_t going to be slower on the raspberry pi?

This is not only good practice but ultimately a necessity. You can't exchange data between computers of different architectures without defining the format and size of your data and coming up with portable ways to interpret it. The fixed-width types are designed for exactly this purpose.
Will it be slower to use a uint64_t than a uint32_t on a 32-bit platform? Probably, yes. Noticeably? Doubt it. But you can measure it and find out.
Don't forget to account for differences in endianness, too.
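As a rough sketch of what that looks like in practice (the helper names here are made up, not from the answer above): pack the size into a fixed 8-byte, big-endian buffer before sending, and decode it on the other side. This settles both the width and the byte order independently of either machine.

#include <stdint.h>

static void put_u64_be(unsigned char out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (56 - 8 * i));   /* most significant byte first */
}

static uint64_t get_u64_be(const unsigned char in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];                          /* rebuild from big-endian bytes */
    return v;
}

/* Usage sketch: put_u64_be(buf, (uint64_t)file_size) then write(sock, buf, 8) on the
   sender; read(sock, buf, 8) then get_u64_be(buf) on the receiver (the return values
   of read()/write() still need checking). */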

Related

Define size of long and time_t to 4 bytes

I want to synchronize two Raspberry Pis with a C program. It works fine if the program only runs on the Pis, but for development I want to use my PC (where it's also easier to debug). However, I send the timespec struct directly as binary over the wire. A Raspberry Pi uses 4 bytes each for long and time_t, while my PC uses 8 bytes each, so the two sides don't match up.
Is it possible to set long and time_t to 4 bytes each, only for this C program?
I know that the sizes of long, short, etc. are defined by the system.
Important: I only want to define it once in the program, not transform it to uintXX_t or int each time.
In programming, it is not uncommon to need to treat network transmissions as separate from in-memory handling; in fact, it is pretty much the norm. So converting the data to a network format with a well-defined byte order and size is strongly recommended, and it will help with the abstractions in your interfaces.
You might also consider transforming it to plain text, if this is not a time-critical piece of data exchange. It makes debugging a lot easier.
C is probably not the best tool for the job here: it is much too low-level to provide the kind of near-automatic data serialization you get in JavaScript, Python, or similarly more abstract languages.
You cannot assume the definition of timespec will be identical on different platforms. For one thing, the sizes of its fields differ between 32-bit and 64-bit architectures, and you can have endianness problems too.
When you want to exchange data structures between heterogeneous platforms, you need to define your own protocol with unambiguous data types and a clear endianness convention.
One solution would be to send the numbers as ASCII. Not terribly efficient, but if it's just a couple of values, who cares?
Another would be to create an exchange structure with (u)intXX_t fields.
You can assume a standard Raspberry Pi kernel will be little-endian like your PC, but since you're writing a small exchange protocol anyway, you might as well add a couple of htonl()/ntohl() calls for good measure.
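A minimal sketch of such an exchange structure (the struct and function names are invented for illustration), assuming the seconds fit in 64 bits and using htonl()/ntohl() for the byte order:

#include <stdint.h>
#include <time.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

/* On the wire: 64-bit seconds split into two 32-bit halves, plus 32-bit
   nanoseconds, all big-endian. Three uint32_t members, so the struct
   normally has no padding and can be sent as-is. */
struct wire_timespec {
    uint32_t sec_hi;
    uint32_t sec_lo;
    uint32_t nsec;
};

static void pack_timespec(struct wire_timespec *w, const struct timespec *ts)
{
    uint64_t sec = (uint64_t)ts->tv_sec;
    w->sec_hi = htonl((uint32_t)(sec >> 32));
    w->sec_lo = htonl((uint32_t)sec);
    w->nsec   = htonl((uint32_t)ts->tv_nsec);
}

static void unpack_timespec(struct timespec *ts, const struct wire_timespec *w)
{
    uint64_t sec = ((uint64_t)ntohl(w->sec_hi) << 32) | ntohl(w->sec_lo);
    ts->tv_sec  = (time_t)sec;   /* truncates if the receiver's time_t is 32-bit */
    ts->tv_nsec = (long)ntohl(w->nsec);
}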

File and networking portability among different byte sizes

In C, the fread function is like this:
size_t fread(void *buf, size_t size, size_t nmemb, FILE *stream);
Usually char * arrays are used as buf. People usually assume that char = 8 bits. But what if that isn't true? What happens when files written on systems with 8-bit bytes are read on systems with 10-bit bytes? Is there any single standard for the portability of files and network streams between systems with bytes of different sizes? And most importantly, how does one write portable code in this regard?
With regard to network communications, the physical access protocols (like Ethernet) define how many bits go into a "unit of information", and it is up to the implementation to map this to an appropriate type. So for network communications there is no problem with supporting weird architectures.
For file access, stuff gets more interesting if you want to support weird architectures, because there are no standards to refer to and even the method of putting the files on the system may influence how you can access them.
Fortunately, the only systems currently in use that don't have 8-bit bytes are DSPs and similar small embedded systems, which don't support a filesystem at all, so the issue is essentially moot.
Systems with byte sizes other than 8 bits are pretty rare these days. But such machines do exist, and files are not guaranteed to be portable to them.
If uberportability is required, then you will have to have some sort of encoding in your file that copes with char != 8 bits.
Do you have something in mind where this may have to run on a DEC-10, a really old IBM mainframe, a DSP, or some such, or are you just asking out of curiosity? If the latter, I would just ignore the case. Machines that don't have 8-bit characters are pretty special, and you will most likely have other problems than bits-per-char in getting your "files" onto such a system in the first place, since you probably can't plug in a USB stick or transfer them with FTP (although the latter is perhaps the most likely to work).
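If you do decide to ignore the case, it is still cheap to make the assumption explicit so the code fails loudly instead of silently mis-reading data. A minimal sketch, assuming a C11 compiler:

#include <limits.h>

/* Refuse to build on platforms whose bytes are not 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");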

Data type ranges differing with operating systems

8-bit, 16-bit, 32-bit, and 64-bit operating systems have different data ranges for integer, float, and double values. Is it the compiler or the processor that makes the difference (8-bit, 16-bit, 32-bit, 64-bit)? And if, over a network, 16-bit integer data from one system is transferred to a 32-bit system, or vice versa, will the data be correctly represented in memory? Please help me to understand.
Ultimately, it is up to the compiler. The compiler is free to choose any data types it likes*, even if it has to emulate their behaviour with software routines. Of course, typically, for efficiency it will try to replicate the native types of the underlying hardware.
As to your second question, yes, of course, if you transfer the raw representation from one architecture to another, it may be interpreted incorrectly (endianness is another issue). That is why functions like ntohs() are used.
* Well, not literally anything it likes. The C standard places some constraints, such as that an int must be at least as large as a short.
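As a quick way to see what a given implementation chose, here is a small sketch (not part of the original answer); the numbers it prints will differ between, say, a 32-bit ARM build and a 64-bit x86-64 build:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* Sizes the implementation picked for the basic integer types. */
    printf("short: %zu, int: %zu, long: %zu, long long: %zu bytes\n",
           sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
    /* Corresponding ranges from <limits.h>. */
    printf("INT_MAX = %d, LONG_MAX = %ld\n", INT_MAX, LONG_MAX);
    return 0;
}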
The compiler (more properly the "implementation") is free to choose the sizes, subject to the limits in the C standard. The set of sizes offered by C for its various types depends in part on the hardware it runs on; i.e. the compiler makes the choice, but (except in cases like Java, where data types are explicitly independent of the underlying hardware) it is strongly influenced by what the hardware offers.
It depends not just on the compiler and operating system; it is also dictated by the architecture (the processor, at least).
When passing data between possibly different architectures, use the fixed-size data types, e.g. uint64_t and uint32_t, instead of int, short, etc.
But the size of integers is not the only concern when communicating between computers with different architectures; there is a byte-order issue too (look up big-endian and little-endian).
The size of a given type depends on the CPU and the conventions on the operating system.
If you want an integer of a specific size, use the stdint.h header. It defines int8_t, int16_t, int32_t, int64_t, some others, and their unsigned equivalents.
For communications between different computers, the protocol should define the sizes and byte order to use.
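As a sketch of what "the protocol should define the sizes and byte order" looks like in code (the message format here is invented for illustration):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs(), ntohl() */

/* Hypothetical protocol: a 2-byte type followed by a 4-byte length,
   both big-endian, regardless of what int or long are on either side. */
struct msg_header {
    uint16_t type;
    uint32_t length;
};

static void parse_header(struct msg_header *h, const unsigned char buf[6])
{
    uint16_t type;
    uint32_t length;
    /* memcpy from the raw bytes avoids alignment and struct-padding issues. */
    memcpy(&type, buf, sizeof type);
    memcpy(&length, buf + 2, sizeof length);
    h->type   = ntohs(type);
    h->length = ntohl(length);
}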
On a network, the protocol has to define which data sizes you use. For endianness, it is highly recommended to use big-endian values (network byte order).
If it weren't for the APIs, a compiler would be free to set its short, int, and long however it wants. But API calls are often tied to these types; e.g. the open() function returns an int, whose size has to be what the OS expects.
In other words, the types are effectively part of the ABI definition.

Endian-dependent code in real applications?

I know the following C code is endian-dependent:
short s_endian = 0x4142;
char c_endian = *(char *)&s_endian;
On a big-endian machine, c_endian will be 'A'(0x41); while on a little-endian machine, it will be 'B'(0x42).
But this code seems kind of ugly. So is there endian-dependent code in real applications? Or have you come across an application that needed a lot of changes when porting to a target with different endianness?
Thanks.
Pretty much any code that deals with saving integers with more than 8 bits in binary format, or sends such integers over the network. For one extremely common example, many of the fields in the TCP header fall into this category.
Networking code is endian-dependent (data should always be transferred across the network in big-endian, i.e. network byte order, even on a little-endian machine), hence the need for functions like htons(), htonl(), ntohs(), and ntohl() in <arpa/inet.h>, which allow easy conversion from host to network byte order and back.
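For a concrete illustration (a minimal sketch, not from the answer above), the port number in a sockaddr_in has to be stored in network byte order, which is exactly what htons() is for:

#include <arpa/inet.h>    /* htons(), inet_pton() */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <sys/socket.h>   /* AF_INET */

/* Build an IPv4 address for connect()/bind(); the endianness-aware lines
   are the htons() and inet_pton() calls. */
static struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);            /* host -> network byte order */
    inet_pton(AF_INET, ip, &addr.sin_addr);   /* text IP -> binary (error check omitted) */
    return addr;
}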
I once collected data using a specialized DAQ card on a PC, and tried to analyze the file on a PowerPC mac. Turns out the "file format" the thing used was a raw memory dump...
Little endian on x86, big endian on Power PC. You figure it out.
The short answer is yes. Anything that reads/writes raw binary to a file or socket needs to keep track of the endianness of the data.
For example, the IP protocol requires big-endian representation.
When manipulating the internal representation of floating-point numbers, you could access the parts (or the full value) using an integer type. For example:
union float_u
{
    float f;
    unsigned short v[2];   /* assumes 32-bit float and 16-bit short */
};

int get_sign(float f)
{
    union float_u u;
    u.f = f;
    return (u.v[0] & 0x8000) != 0; // Endian-dependent: assumes the sign byte lands in v[0]
}
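For contrast, a byte-order-independent version is possible by copying the float's bits into a fixed-width integer (a sketch, assuming float is a 32-bit IEEE 754 type stored with the same endianness as integers):

#include <stdint.h>
#include <string.h>

int get_sign_portable(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to reinterpret the bytes */
    return (bits >> 31) & 1;          /* the sign is bit 31 of the value, on either endianness */
}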
If your program sends data to another system (either over a serial or network link, or by saving it to a file for something else to read) or reads data from another system, then you can have endianness issues.
I don't know that static analysis would be able to detect such constructs, but having your programmers follow a coding standard, where structure elements and variables are marked up to indicate their endianness, could help.
For example, if all network data structures had _be appended to the names of their multi-byte members, you could look for instances where you assign a non-suffixed (host-byte-order) variable, or even a literal value (like 0x1234), to one of those members.
It would be great if we could capture endianness in our data types -- uint32_be and uint32_le to go with uint32_t. Then the compiler could disallow assignments or operations between the two, and the signature for htobe32 would be uint32_be htobe32(uint32_t n);.
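You can approximate this in plain C by wrapping the wire-order value in a single-member struct, so that accidentally mixing host-order and network-order values fails to compile (a sketch; the type and function names are invented):

#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

/* A distinct type for big-endian 32-bit values: you cannot assign a plain
   uint32_t to it (or vice versa) without going through the converters. */
typedef struct { uint32_t raw; } uint32_be;

static uint32_be to_be32(uint32_t host)  { return (uint32_be){ htonl(host) }; }
static uint32_t  from_be32(uint32_be be) { return ntohl(be.raw); }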

When to worry about endianness?

I have seen countless references about endianness and what it means, and I have no problem with that...
However, my coding project is a simple game to run on Linux and Windows, on standard "gamer" hardware.
Do I need to worry about endianness in this case? When should I need to worry about it?
My code is simple C and SDL+GL, the only complex data are basic media files (png+wav+xm) and the game data is mostly strings, integer booleans (for flags and such) and static-sized arrays. So far no user has had issues, so I am wondering if adding checks is necessary (will be done later, but there are more urgent issues IMO).
The times when you need to worry about endianness:
you are sending binary data between machines or processes (using a network or a file). If the machines may have different byte orders, or the protocol used specifies a particular byte order (which it should), you'll need to deal with endianness.
you have code that accesses memory through pointers of different types (say, you access an unsigned int variable through a char *).
If you do these things you're dealing with byte order whether you know it or not - it might be that you're dealing with it by assuming it's one way or the other, which may work fine as long as your code doesn't have to deal with a different platform.
In a similar vein, you generally need to deal with alignment issues in those same cases and for similar reasons. Once again, you might be dealing with it by doing nothing and having everything work fine because you don't have to cross platform boundaries (which may come back to bite you down the road if that does become a requirement).
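A sketch of how the second case above (reading memory or a buffer through a different type) is usually handled portably: instead of casting buf + offset to uint32_t * (which risks both an unaligned access and a strict-aliasing violation), combine the bytes explicitly in whatever byte order the file or protocol specifies. This invented example reads a little-endian 32-bit field:

#include <stdint.h>

/* Read a 32-bit little-endian value from an arbitrary (possibly unaligned)
   position in a byte buffer, without caring about the host's own byte order. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}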
If you mean a PC by "standard gamer hardware", then you don't have to worry about endianness as it will always be little endian on x86/x64. But if you want to port the project to other architectures, then you should design it endianness-independently.
Whenever you receive/transmit data over a network, remember to convert to/from network and host byte order. The C functions htons(), htonl(), etc., or the equivalents in your language, should be used here.
Whenever you read multi-byte values (like UTF-16 characters or 32-bit ints) from a file, since that file might have originated on a system with different endianness. If the file is UTF-16 or UTF-32 it probably has a BOM (byte-order mark); otherwise, the file format has to specify the endianness in some way.
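The BOM check itself is tiny; a sketch for UTF-16, assuming the first two bytes of the file have already been read into b:

/* Returns 1 for big-endian UTF-16, 0 for little-endian, -1 if there is no BOM
   (in which case the file format has to say which it is). */
static int utf16_byte_order(const unsigned char b[2])
{
    if (b[0] == 0xFE && b[1] == 0xFF) return 1;   /* U+FEFF stored big-endian */
    if (b[0] == 0xFF && b[1] == 0xFE) return 0;   /* U+FEFF stored little-endian */
    return -1;
}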
You only need to worry about it if your game needs to run on different hardware architectures. If you are positive that it will always run on Intel hardware then you can forget about it. If it will run on Linux, though, many people use architectures other than Intel, and you may end up having to think about it.
Are you distributing your game in source code form?
If you are distributing your game as a binary only, then you know exactly which processor families your game will run on. Also, the media files: are they user-generated (possibly via a level editor), or are they really only meant to be supplied by yourself?
If this is a truly closed environment (you distribute binaries and the game assets are not intended to be customized), then you know your own risks with endianness and I personally wouldn't fool with it.
However, if you are distributing source and/or hoping people will customize their game, then you have a potential concern. That said, with most of the desktop/laptop computers around these days moving to x86, I would think this is a diminishing concern.
The problem shows up with networking, in how the data is sent, and when you are doing bit fiddling across different processors, since different processors may store the data differently in memory.
I believe PowerPC has the opposite endianness of Intel boards. You might be able to have a routine that sets the endianness depending on the architecture? I'm not sure if you can actually tell what the hardware architecture is in code... maybe someone smarter than me knows the answer to that question.
Now, in reference to your statement about "standard" gamer hardware: consumer off-the-shelf solutions are what almost any standard gamer is using, so you're almost certain to get the same endianness across the board. I'm sure someone will disagree with me, but that's my $.02.
Ha... I just noticed there is a related question that covers the suggestion I had above:
Find endianness through a C program
