Different types - different endianness?

Different types - different endianness? - c

I'm working on rather old code atm, and this code tests the endianness of types like short, int, long and long long separately.
Are there systems "still in use" that actually have different endianness for different types (due to different sizes of these types)? The only example that I know of is the PDP-11, where the two 16 bit halves of 32 bit values are stored in "big endian order" whereas the two 8 bit halves of each of these 16 bit are stored in "little endian order".
Due to undefined behavior in the mentioned tests I probably need to rewrite parts of this and want to know if it's worth the effort to keep that complexity. I know that (and how) I can write code that's independent of the system endianness, but this would be a lot of changes that I currently don't have the time for.

Big endian machines are still in use, in digital signal processors (DSP) where TI provides numerous examples, and in general purpose processors where the Motorola 68000 is an example. Notably, in some DSP and RISC processors (c.f. ARM and Power), endianess is configurable and sometimes at multiple levels.
Here is an example by TI that combines big-endian and little-endian processors for particular functionality, "OMAP910 Device"
The history of endianess in general purpose processors is described in the following IEEE article, Endianess in personal computers
Reasons for using a DSP or ARM in a design include that the device may be optimized for a particular functionality, more cost effective, require less supporting circuity, or use less power compared to a general purpose processor. The OMAP910 demonstrates endianess for an intended functionality.
Code developed to run on platforms with different endianess, is often conditionalized for the endianess of the platform and where configurable and relevant, the rule is generally to explicitly set or detect the endianess.

Related

Migrating from big to little endian: How to predetermine problematic code?

I am about to migrate a small project of C code (30+ kSLOC) from a 32-bit big to a 32-bit little endian platform. I would like to check ante festum, how much work this will be, so I would like to spot code that relies on the original endianess.
I am looking for an as comprehensive as possible collection of C code idioms, which are depending on big endian. Do not bother with the effort needed to detect the use of such idioms in real code, I have some code analysis tool support available.

Some things to look out for:
Fishy pointer casts and fishy type conversions between integer types of different sizes. These may also be latent alignment or strict aliasing bugs.
Serialization/de-serialization code, where data is read from/written to byte arrays.
Data communication interfaces without serialization/de-serialization code. That is: CPU just happened to have same endianess as the network, which is common for big endian systems in particular. Ethernet, CAN, UART and so on.
Structs with bit-fields.
Union type punning.

What was with the historical typedef soup for integers in C programs?

This is a possibly inane question whose answer I should probably know.
Fifteen years ago or so, a lot of C code I'd look at had tons of integer typedefs in platform-specific #ifdefs. It seemed every program or library I looked at had their own, mutually incompatible typedef soup. I didn't know a whole lot about programming at the time and it seemed like a bizarre bunch of hoops to jump through just to tell the compiler what kind of integer you wanted to use.
I've put together a story in my mind to explain what those typedefs were about, but I don't actually know whether it's true. My guess is basically that when C was first developed and standardized, it wasn't realized how important it was to be able to platform-independently get an integer type of a certain size, and thus all the original C integer types may be of different sizes on different platforms. Thus everyone trying to write portable C code had to do it themselves.
Is this correct? If so, how were programmers expected to use the C integer types? I mean, in a low level language with a lot of bit twiddling, isn't it important to be able to say "this is a 32 bit integer"? And since the language was standardized in 1989, surely there was some thought that people would be trying to write portable code?

When C began computers were less homogenous and a lot less connected than today. It was seen as more important for portability that the int types be the natural size(s) for the computer. Asking for an exactly 32-bit integer type on a 36-bit system is probably going to result in inefficient code.
And then along came pervasive networking where you are working with specific on-the-wire size fields. Now interoperability looks a whole lot different. And the 'octet' becomes the de facto quanta of data types.
Now you need ints of exact multiples of 8-bits, so now you get typedef soup and then eventually the standard catches up and we have standard names for them and the soup is not as needed.

C's earlier success was due to it flexibility to adapt to nearly all existing variant architectures #John Hascall with:
1) native integer sizes of 8, 16, 18, 24, 32, 36, etc. bits,
2) variant signed integer models: 2's complement, 1's complement, signed integer and
3) various endian, big, little and others.
As coding developed, algorithms and interchange of data pushed for greater uniformity and so the need for types that met 1 & 2 above across platforms. Coders rolled their own like typedef int int32 inside a #if .... The many variations of that created the soup as noted by OP.
C99 introduced (u)int_leastN_t, (u)int_fastN_t, (u)intmax_t to make portable yet somewhat of minimum bit-width-ness types. These types are required for N = 8,16,32,64.
Also introduced are semi-optional types (see below **) like (u)intN_t which has the additional attributes of they must be 2's complement and no padding. It is these popular types that are so widely desired and used to thin out the integer soup.
how were programmers expected to use the C integer types?
By writing flexible code that did not strongly rely on bit width. Is is fairly easy to code strtol() using only LONG_MIN, LONG_MAX without regard to bit-width/endian/integer encoding.
Yet many coding tasks oblige precise width types and 2's complement for easy high performance coding. It is better in that case to forego portability to 36-bit machines and 32-bit sign-magnitudes ones and stick with 2N wide (2's complement for signed) integers. Various CRC & crypto algorithms and file formats come to mind. Thus the need for fixed-width types and a specified (C99) way to do it.
Today there are still gotchas that still need to be managed. Example: The usual promotions int/unsigned lose some control as those types may be 16, 32 or 64.
**
These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall deﬁne the corresponding typedef names. C11 7.20.1.1 Exact-width integer types 3

I remember that period and I'm guilty of doing the same!
One issue was the size of int, it could be the same as short, or long or in between. For example, if you were working with binary file formats, it was imperative that everything align. Byte ordering complicated things as well. Many developer went the lazy route and just did fwrite of whatever, instead of picking numbers apart byte-by-byte. When the machines upgraded to longer word lengths, all hell broke loose. So typedef was an easy hack to fix that.
If performance was an issue, as it often was back then, int was guaranteed to be the machine's fastest natural size, but if you needed 32 bits, and int was shorter than that, you were in danger of rollover.
In the C language, sizeof() is not supposed to be resolved at the preprocessor stage, which made things complicated because you couldn't do #if sizeof(int) == 4 for example.
Personally, some of the rationale was also just working from an assembler language mindset and not being willing to abstract out the notion of what short, int and long are for. Back then, assembler was used in C quite frequently.
Nowadays, there are plenty of non-binary file formats, JSON, XML, etc. where it doesn't matter what the binary representation is. As well, many popular platforms have settled on a 32-bit int or longer, which is usually enough for most purposes, so there's less of an issue with rollover.

C is a product of the early 1970s, when the computing ecosystem was very different. Instead of millions of computers all talking to each other over an extended network, you had maybe a hundred thousand systems worldwide, each running a few monolithic apps, with almost no communication between systems. You couldn't assume that any two architectures had the same word sizes, or represented signed integers in the same way. The market was still small enough that there wasn't any percieved need to standardize, computers didn't talk to each other (much), and nobody though much about portability.
If so, how were programmers expected to use the C integer types?
If you wanted to write maximally portable code, then you didn't assume anything beyond what the Standard guaranteed. In the case of int, that meant you didn't assume that it could represent anything outside of the range [-32767,32767], nor did you assume that it would be represented in 2's complement, nor did you assume that it was a specific width (it could be wider than 16 bits, yet still only represent a 16 bit range if it contained any padding bits).
If you didn't care about portability, or you were doing things that were inherently non-portable (which bit twiddling usually is), then you used whatever type(s) met your requirements.
I did mostly high-level applications programming, so I was less worried about representation than I was about range. Even so, I occasionally needed to dip down into binary representations, and it always bit me in the ass. I remember writing some code in the early '90s that had to run on classic MacOS, Windows 3.1, and Solaris. I created a bunch of enumeration constants for 32-bit masks, which worked fine on the Mac and Unix boxes, but failed to compile on the Windows box because on Windows an int was only 16 bits wide.

C was designed as a language that could be ported to as wide a range of machines as possible, rather than as a language that would allow most kinds of programs to be run without modification on such a range of machines. For most practical purposes, C's types were:
An 8-bit type if one is available, or else the smallest type that's at least 8 bits.
A 16-bit type, if one is available, or else the smallest type that's at least 16 bits.
A 32-bit type, if one is available, or else some type that's at least 32 bits.
A type which will be 32 bits if systems can handle such things as efficiently as 16-bit types, or 16 bits otherwise.
If code needed 8, 16, or 32-bit types and would be unlikely to be usable on machines which did not support them, there wasn't any particular problem with such code regarding char, short, and long as 8, 16, and 32 bits, respectively. The only systems that didn't map those names to those types would be those which couldn't support those types and wouldn't be able to usefully handle code that required them. Such systems would be limited to writing code which had been written to be compatible with the types that they use.
I think C could perhaps best be viewed as a recipe for converting system specifications into language dialects. A system which uses 36-bit memory won't really be able to efficiently process the same language dialect as a system that use octet-based memory, but a programmer who learns one dialect would be able to learn another merely by learning what integer representations the latter one uses. It's much more useful to tell a programmer who needs to write code for a 36-bit system, "This machine is just like the other machines except char is 9 bits, short is 18 bits, and long is 36 bits", than to say "You have to use assembly language because other languages would all require integer types this system can't process efficiently".

Not all machines have the same native word size. While you might be tempted to think a smaller variable size will be more efficient, it just ain't so. In fact, using a variable that is the same size as the native word size of the CPU is much, much faster for arithmetic, logical and bit manipulation operations.
But what, exactly, is the "native word size"? Almost always, this means the register size of the CPU, which is the same as the Arithmetic Logic Unit (ALU) can work with.
In embedded environments, there are still such things as 8 and 16 bit CPUs (are there still 4-bit PIC controllers?). There are mountains of 32-bit processors out there still. So the concept of "native word size" is alive and well for C developers.
With 64-bit processors, there is often good support for 32-bit operands. In practice, using 32-bit integers and floating point values can often be faster than the full word size.
Also, there are trade-offs between native word alignment and overall memory consumption when laying out C structures.
But the two common usage patterns remain: size agnostic code for improved speed (int, short, long), or fixed size (int32_t, int16_t, int64_t) for correctness or interoperability where needed.

Can bit-level operations ever be "fast" in software?

Let me clarify the soft-sounding title straight away. This is actually something that has been nagging me for quite a while now, despite feeling like a pretty basic question.
Many languages give a faulty impression of efficiency by letting the developer play with bits, such as thebool.h C header which, as I understand it, is essentially just an int with a wrapper around it. Essentially, the byte seems to be the absolute lowest atomic unit of computation in C - bool x = 0 is not faster/more memory efficient than int x = 0.
What I'm wondering is then, what do we do when we want to implement an algorithm that is inherently tied to loading and manipulating single bits, such as decoding binary codes, unweighted graph connectivity problems and many others? In other words, is the atomicity of the byte an inherent property of modern CPUs or could we theoretically rival the efficiency of an ASIC just by using machine code?
EDIT: Pretty surprised by the downvotes, but I suppose people just didn't understand what I was asking. I think a really good, canonical example is traversing a binary tree (or any other sequential list of yes/no questions really). What I was wondering is if modern cpu architectures are fundamentally poorly equipped to do this (as compared to an ASIC/FPGA, that is), or if this is an artifact of some abstraction layer (language/kernel/etc). Mark's answer was good though (although I'd love a reference to the mentioned architecture extension)

No you can't rival the efficiency of an ASIC. An ASIC means you can replicate parallel bit streams as much as you have budget for on the chip. You just cut and paste your HDL until you fill your die space. A CPU only has a limited number of cores.
I'm guessing that you think that bit operations like z = (x|(1<<y)>>4 are slow and yes, all that bit shifting is extra overhead. But that is just accessing the bits. The bit operations (OR, AND, etc) are all as fast as you can get on modern CPU, i.e. 1 cycle throughput.
The 8051 architecture has a way of accessing individual bits directly, without using byte registers, but if you are worried about speed, you wouldn't consider a 8051.

By convention, a byte is the smallest addressable piece of memory in a computer. The number of bits that a byte has can differ from one system to another.
In the case of x86, there are instructions to move bytes from memory to a register and back, and instructions to manipulate values in registers. I can't speak to other architectures, but they most likely work in a similar way.
So anytime you need to manipulate some number of bits you need to do so a byte (or word, i.e. multiple bytes) at a time.

I also don't know why this question got so many downvotes, the question:
In other words, is the atomicity of the byte an inherent property of modern CPUs or could we theoretically rival the efficiency of an ASIC just by using machine code?
seems reasonable to me. It's certainly not a bad compared to many questions on stackoverflow.
The answer is: no CPUs can't match the efficiency of an ASIC.
However, the reason is not because CPUs are manipulating bytes instead of bits. Instead it's because most of the work that CPUs do to process an instruction is involved with loading it from memory, decoding it, tracking dependencies, etc., rather than performing the actual arithmetic operations on bits or bytes that the instruction directs the CPU to perform.
A good explanation of this is shown in the following presentation from the 2014 LLVM developers meeting. The presentation shows how OpenCL can be used to generate custom FPGA hardware. Slides 12 to 28 show a nice pictorial example of overhead associated with a CPU algorithm and how custom hardware can remove much of this overhead.

Is a 32 bit integer on a 16 bit machine possible?

Just wondering if it possible. If yes, are there other ways besides compiler emulation layer?
Thanks

It's processor-dependent. Some processors have special instructions to manipulate register pairs (e.g. the 8-bit AVR instruction set has instructions for 16-bit register pairs). On processors without such native support, the compiler usually emits instructions that work with pairs of registers at a time (this is what is usually done to support 64-bit numbers on 32-bit processors, for example).

Yes, it is possible. Look at the Z80 from the 70s as an example of a 8-bit processor that can manipulate 16-bit values.
Make sure you know what "16-bit processor" means because I have found a lot of people have a misconception about it. Does it mean the opcode size, because some processors have variable width operations? Does it mean the addressing size? Does it mean the smallest/largest value it can natively manipulate?
And as far as at compile-time, sure. Check out arbitrary large number libraries (aka "big nums").

Word and Double Word integers in C

I am trying to implement a simple, moderately efficient bignum library in C. I would like to store digits using the full register size of the system it's compiled on (presumably 32 or 64-bit ints). My understanding is that I can accomplish this using intptr_t. Is this correct? Is there a more semantically appropriate type, i.e. something like intword_t?
I also know that with GCC I can easily do overflow detection on a 32-bit machine by upcasting both arguments to 64-bit ints, which will occupy two registers and take advantage of instructions like IA31 ADC (add with carry). Can I do something similar on a 64-bit machine? Is there a 128-bit type I can upcast to which will compile to use these instructions if they're available? Better yet, is there a standard type that represents twice the register size (like intdoubleptr_t) so this could be done in a machine independent fashion?
Thanks!

Any reason not to use size_t? size_t is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, and is probably more portable than using WORD_SIZE (I think WORD_SIZE is gcc-specific, no?)
I am not aware of any 128-bit value on 64-bit systems, could be wrong here but haven't come across that type in the kernel or regular user apps.

I'd strongly recommend using the C99 <stdint.h> header. It declares int32_t, int64_t, uint32_t, and uint64_t, which look like what you really want to use.
EDIT: As Alok points out, int_fast32_t, int_fast64_t, etc. are probably what you want to use. The number of bits you specify should be the minimum you need for the math to work, i.e. for the calculation to not "roll over".
The optimization comes from the fact that the CPU doesn't have to waste cycles realigning data, padding the leading bits on a read, and doing a read-modify-write on a write. Truth is, a lot of processors (such as recent x86s) have hardware in the CPU that optimizes these access pretty well (at least the padding and read-modify-write parts), since they're so common and usually only involve transfers between the processor and cache.
So the only thing left for you to do is make sure the accesses are aligned: take sizeof(int_fast32_t) or whatever and use it to make sure your buffer pointers are aligned to that.
Truth is, this may not amount to that much improvement (due to the hardware optimizing transfers at runtime anyway), so writing something and timing it may be the only way to be sure. Also, if you're really crazy about performance, you may need to look at SSE or AltiVec or whatever vectorization tech your processor has, since that will outperform anything you can write that is portable when doing vectored math.