Is it possible to create custom-width integers in C?

The C standard and C compilers provide fixed-width integer types, such as uint8_t, int16_t, etc.
Is there a way of defining a 128-bit integer in C that would be usable in code with the same semantics as the existing fixed-width integers?

You'll need something like GMP:
http://gmplib.org/
You won't get the "exact" semantics.
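For example, a minimal sketch with GMP's integer type (this assumes a platform where unsigned long is 64 bits wide and that you link with -lgmp):
#include <stdio.h>
#include <gmp.h>
int main(void)
{
    mpz_t a, b, product;
    /* assumes unsigned long is 64 bits; otherwise use mpz_set_str or mpz_import instead */
    mpz_init_set_ui(a, 0xFFFFFFFFFFFFFFFFUL);
    mpz_init_set_ui(b, 0xFFFFFFFFFFFFFFFFUL);
    mpz_init(product);
    mpz_mul(product, a, b);          /* result needs 128 bits; GMP grows storage as needed */
    gmp_printf("%Zx\n", product);    /* prints fffffffffffffffe0000000000000001 */
    mpz_clears(a, b, product, NULL);
    return 0;
}
The tradeoff the answer alludes to: you get the arithmetic, but through function calls rather than the built-in operators and implicit conversions of the real fixed-width types.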

Assuming you're programming a 64-bit machine, you could define a uint128_t type as a struct of two uint64_ts and implement just the arithmetic operators you need by hand. You'd have to manually handle carrying bits from the low 64 bits to the high or vice versa when an operation overflows in one direction or the other (for example, adding 1 to a low half of 0xFFFFFFFFFFFFFFFF must carry into the high half). That would be much lighter-weight than using an entire library for arbitrary-size bignums like GMP. It'd be a bit tricky to write but by no means impossible; it's pretty much how the built-in BASICs on 8-bit computers back in the day handled math with numbers larger than 255, for example.
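A minimal sketch of that approach (the names my_uint128 and my_uint128_add are made up for illustration), showing how the carry falls out of wrapping unsigned addition:
#include <stdint.h>
typedef struct {
    uint64_t lo;   /* low 64 bits */
    uint64_t hi;   /* high 64 bits */
} my_uint128;
static my_uint128 my_uint128_add(my_uint128 a, my_uint128 b)
{
    my_uint128 r;
    r.lo = a.lo + b.lo;
    /* unsigned addition wraps on overflow, so a carry occurred iff the sum is smaller than an operand */
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}
Multiplication and division take more work (schoolbook long arithmetic on 64-bit "digits"), but the same idea carries through.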

Related

What was with the historical typedef soup for integers in C programs?

This is a possibly inane question whose answer I should probably know.
Fifteen years ago or so, a lot of C code I'd look at had tons of integer typedefs in platform-specific #ifdefs. It seemed every program or library I looked at had their own, mutually incompatible typedef soup. I didn't know a whole lot about programming at the time and it seemed like a bizarre bunch of hoops to jump through just to tell the compiler what kind of integer you wanted to use.
I've put together a story in my mind to explain what those typedefs were about, but I don't actually know whether it's true. My guess is basically that when C was first developed and standardized, it wasn't realized how important it was to be able to platform-independently get an integer type of a certain size, and thus all the original C integer types may be of different sizes on different platforms. Thus everyone trying to write portable C code had to do it themselves.
Is this correct? If so, how were programmers expected to use the C integer types? I mean, in a low level language with a lot of bit twiddling, isn't it important to be able to say "this is a 32 bit integer"? And since the language was standardized in 1989, surely there was some thought that people would be trying to write portable code?
When C began, computers were less homogeneous and a lot less connected than today. It was seen as more important for portability that the int types be the natural size(s) for the computer. Asking for an exactly 32-bit integer type on a 36-bit system is probably going to result in inefficient code.
And then along came pervasive networking, where you are working with fields of specific on-the-wire sizes. Now interoperability looks a whole lot different, and the 'octet' becomes the de facto quantum of data types.
Now you need ints that are exact multiples of 8 bits, so you get typedef soup; eventually the standard catches up, we get standard names for these types, and the soup is not as needed.
C's early success was due to its flexibility in adapting to nearly all of the architecture variants then in existence (as @John Hascall notes), with:
1) native integer sizes of 8, 16, 18, 24, 32, 36, etc. bits,
2) variant signed-integer models: 2's complement, 1's complement, and sign-magnitude, and
3) various endiannesses: big, little, and others.
As coding developed, algorithms and interchange of data pushed for greater uniformity, and so the need grew for types that met 1 & 2 above across platforms. Coders rolled their own, like typedef int int32 inside a #if ...; the many variations of that created the soup noted by the OP.
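A typical fragment of that soup looked something like the following (the platform macro names are only illustrative; every project picked its own):
/* pre-C99 roll-your-own fixed-width types */
#if defined(__alpha) || defined(_LP64)
typedef int             int32;
typedef unsigned int    uint32;
#else
typedef long            int32;
typedef unsigned long   uint32;
#endif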
C99 introduced (u)int_leastN_t, (u)int_fastN_t, and (u)intmax_t to provide portable types with at least a given minimum bit width. These types are required for N = 8, 16, 32, 64.
Also introduced are the semi-optional types (see the note below **) like (u)intN_t, which have the additional requirements that they be 2's complement and have no padding bits. It is these popular types that are so widely desired and used to thin out the integer soup.
how were programmers expected to use the C integer types?
By writing flexible code that did not strongly rely on bit width. It is fairly easy to code strtol() using only LONG_MIN and LONG_MAX, without regard to bit width, endianness, or integer encoding.
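For instance, the overflow guard at the heart of such a conversion needs only LONG_MAX (a simplified sketch; the function name is invented, and sign handling and base selection are omitted):
#include <limits.h>
/* accumulate decimal digits, clamping at LONG_MAX, without assuming
   anything about the width or encoding of long */
long accumulate_decimal(const char *s)
{
    long result = 0;
    while (*s >= '0' && *s <= '9') {
        int digit = *s++ - '0';
        if (result > (LONG_MAX - digit) / 10)   /* the next step would overflow */
            return LONG_MAX;
        result = result * 10 + digit;
    }
    return result;
}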
Yet many coding tasks oblige precise-width types and 2's complement for easy, high-performance coding. In that case it is better to forgo portability to 36-bit machines and 32-bit sign-magnitude ones and stick with power-of-2-width (2's complement for signed) integers. Various CRC and crypto algorithms and file formats come to mind. Thus the need for fixed-width types and a specified (C99) way to declare them.
Today there are still gotchas to manage. Example: the usual promotions to int/unsigned lose some control, as those types may be 16, 32, or 64 bits wide.
**
These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two's complement representation, it shall define the corresponding typedef names. (C11 §7.20.1.1 Exact-width integer types, paragraph 3)
I remember that period and I'm guilty of doing the same!
One issue was the size of int: it could be the same as short, or long, or in between. For example, if you were working with binary file formats, it was imperative that everything align. Byte ordering complicated things as well. Many developers went the lazy route and just did fwrite of whatever, instead of picking numbers apart byte-by-byte. When machines upgraded to longer word lengths, all hell broke loose. So typedef was an easy hack to fix that.
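Picking the numbers apart byte-by-byte meant fixing both the field width and the byte order in the file format itself, something like this sketch (the little-endian order and the function name are just choices for illustration):
#include <stdio.h>
/* write the low 32 bits of v in a fixed little-endian byte order,
   regardless of the host's int size or endianness */
void write_u32_le(unsigned long v, FILE *f)
{
    putc((int)( v        & 0xFF), f);
    putc((int)((v >>  8) & 0xFF), f);
    putc((int)((v >> 16) & 0xFF), f);
    putc((int)((v >> 24) & 0xFF), f);
}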
If performance was an issue, as it often was back then, int was meant to be the machine's natural, fastest size, but if you needed 32 bits and int was shorter than that, you were in danger of rollover.
In the C language, sizeof is not evaluated by the preprocessor, which made things complicated because you couldn't write #if sizeof(int) == 4, for example.
Personally, some of the rationale was also just working from an assembler language mindset and not being willing to abstract out the notion of what short, int and long are for. Back then, assembler was used in C quite frequently.
Nowadays, there are plenty of non-binary file formats, JSON, XML, etc. where it doesn't matter what the binary representation is. As well, many popular platforms have settled on a 32-bit int or longer, which is usually enough for most purposes, so there's less of an issue with rollover.
C is a product of the early 1970s, when the computing ecosystem was very different. Instead of millions of computers all talking to each other over an extended network, you had maybe a hundred thousand systems worldwide, each running a few monolithic apps, with almost no communication between systems. You couldn't assume that any two architectures had the same word sizes, or represented signed integers in the same way. The market was still small enough that there wasn't any perceived need to standardize, computers didn't talk to each other (much), and nobody thought much about portability.
If so, how were programmers expected to use the C integer types?
If you wanted to write maximally portable code, then you didn't assume anything beyond what the Standard guaranteed. In the case of int, that meant you didn't assume that it could represent anything outside of the range [-32767,32767], nor did you assume that it would be represented in 2's complement, nor did you assume that it was a specific width (it could be wider than 16 bits, yet still only represent a 16 bit range if it contained any padding bits).
If you didn't care about portability, or you were doing things that were inherently non-portable (which bit twiddling usually is), then you used whatever type(s) met your requirements.
I did mostly high-level applications programming, so I was less worried about representation than I was about range. Even so, I occasionally needed to dip down into binary representations, and it always bit me in the ass. I remember writing some code in the early '90s that had to run on classic MacOS, Windows 3.1, and Solaris. I created a bunch of enumeration constants for 32-bit masks, which worked fine on the Mac and Unix boxes, but failed to compile on the Windows box because on Windows an int was only 16 bits wide.
C was designed as a language that could be ported to as wide a range of machines as possible, rather than as a language that would allow most kinds of programs to be run without modification on such a range of machines. For most practical purposes, C's types were:
An 8-bit type if one is available, or else the smallest type that's at least 8 bits.
A 16-bit type, if one is available, or else the smallest type that's at least 16 bits.
A 32-bit type, if one is available, or else some type that's at least 32 bits.
A type which will be 32 bits if systems can handle such things as efficiently as 16-bit types, or 16 bits otherwise.
If code needed 8, 16, or 32-bit types and would be unlikely to be usable on machines which did not support them, there wasn't any particular problem with such code regarding char, short, and long as 8, 16, and 32 bits, respectively. The only systems that didn't map those names to those types would be those which couldn't support those types and wouldn't be able to usefully handle code that required them. Such systems would be limited to writing code which had been written to be compatible with the types that they use.
I think C could perhaps best be viewed as a recipe for converting system specifications into language dialects. A system which uses 36-bit memory won't really be able to efficiently process the same language dialect as a system that uses octet-based memory, but a programmer who learns one dialect would be able to learn another merely by learning what integer representations the latter one uses. It's much more useful to tell a programmer who needs to write code for a 36-bit system, "This machine is just like the other machines except char is 9 bits, short is 18 bits, and long is 36 bits", than to say "You have to use assembly language because other languages would all require integer types this system can't process efficiently".
Not all machines have the same native word size. While you might be tempted to think a smaller variable size will be more efficient, it just ain't so. In fact, using a variable that is the same size as the native word size of the CPU is much, much faster for arithmetic, logical and bit manipulation operations.
But what, exactly, is the "native word size"? Almost always, this means the register size of the CPU, which is the same size that the Arithmetic Logic Unit (ALU) works with.
In embedded environments, there are still such things as 8 and 16 bit CPUs (are there still 4-bit PIC controllers?). There are mountains of 32-bit processors out there still. So the concept of "native word size" is alive and well for C developers.
With 64-bit processors, there is often good support for 32-bit operands; in practice, using 32-bit integer and floating-point values can be faster than using the full word size.
Also, there are trade-offs between native word alignment and overall memory consumption when laying out C structures.
But the two common usage patterns remain: size agnostic code for improved speed (int, short, long), or fixed size (int32_t, int16_t, int64_t) for correctness or interoperability where needed.

Is there a general way in C to make a number greater than 64 bits or smaller than 8 bits?

First of all, I am aware that what I am trying to do might be outside the C standard.
I'd like to know if it is possible to make a uint4_t/int4_t or uint128_t/int128_t type in C.
I know I could do this using bitshifts and complex functions, but can I do it without those?
You can use bitfields within a structure to get fields narrower than a uint8_t, but the base data type they're stored in will not be any smaller.
struct SmallInt
{
    unsigned int a : 4;
};
will give you a structure with a member called a that is 4 bits wide.
Individual storage units (bytes) are no less than CHAR_BIT bits wide¹; even if you create a struct with a single 4-bit bitfield, the associated object will always take up a full storage unit.
There are multiple precision libraries such as GMP that allow you to work with values that can't fit into 32 or 64 bits. Might want to check them out.
¹ 8 bits minimum, but may be wider.
In practice, if you want very wide numbers (something standard C11 does not specify), you probably want to use some external arbitrary-precision arithmetic library (a.k.a. bignums). I recommend using GMPlib.
In some cases, for tiny ranges of numbers, you might use bitfields inside a struct to get tiny integers. Practically speaking, they can be costly (the compiler will emit shift and bitmask instructions to deal with them).
See also this answer mentioning __int128_t as an extension in some compilers.

Looking for Ansi C89 arbitrary precision math library

I wrote an ANSI C compiler for a friend's custom 16-bit stack-based CPU several years ago, but I never got around to implementing all the data types. Now I would like to finish the job, so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle the 16-bit integer data types since they are native to the CPU, and therefore I have all the math routines (i.e. +, -, *, /, %) done for them. However, since his CPU does not handle floating point, I have to implement floats/doubles myself. I also have to implement the 8-bit and 32-bit data types (both integer and floats/doubles). I'm pretty sure this has been done and redone many times, and since I'm not particularly looking forward to reinventing the wheel, I would appreciate it if someone would point me at a library that can help me out.
Now I was looking at GMP but it seems to be overkill (library must be absolutely huge, not sure my custom compiler would be able to handle it) and it takes numbers in the form of strings which would be wasteful for obvious reasons. For example :
mpz_set_str(x, "7612058254738945", 10);
mpz_set_str(y, "9263591128439081", 10);
mpz_mul(result, x, y);
This seems simple enough, I like the api... but I would rather pass in an array rather than a string. For example, if I wanted to multiply two 32-bit longs together I would like to be able to pass it two arrays of size two where each array contains two 16-bit values that actually represent a 32-bit long and have the library place the output into an output array. If I needed floating point then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's divide the answer.
8-bit arithmetic
This one is very easy. In fact, C already covers this under the term "integer promotion". This means that if you have 8-bit data and you want to do an operation on it, you simply zero-extend it (or sign-extend it, if it is signed and negative) to 16 bits. Then you proceed with the normal 16-bit operation.
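In C source the promotion happens automatically; a tiny sketch, assuming a 16-bit int as on the target CPU:
/* the unsigned char operands are promoted to int (16 bits on this CPU)
   before the addition; the cast back discards everything above the low 8 bits */
unsigned char add8(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}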
32-bit arithmetic
Note: as far as the standard is concerned, you don't really need to have 32-bit integers.
This could be a bit tricky, but it is still not worth using a library for. For each operation, you would need to look at how you learned to do it in elementary school in base 10, and then do the same in base 2^16 with 2-digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base-10 math (and hence the algorithms), you would need to implement them in the assembly of your CPU.
This basically means loading the most significant 16 bits into one register and the least significant 16 bits into another. Then follow the algorithm for each operation and perform it. You would most likely need help from the overflow and other flags.
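Expressed in C rather than assembly (the type and function names here are invented), the addition step of that schoolbook algorithm looks like this; the other operations follow the same base-2^16 digit-by-digit pattern:
/* a 32-bit value represented as two base-2^16 "digits" */
typedef struct {
    unsigned short lo;
    unsigned short hi;
} u32;
u32 u32_add(u32 a, u32 b)
{
    u32 r;
    r.lo = (unsigned short)(a.lo + b.lo);
    /* the low digit wrapped around iff its result is smaller than an operand: that is the carry */
    r.hi = (unsigned short)(a.hi + b.hi + (r.lo < a.lo ? 1 : 0));
    return r;
}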
Floating point arithmetic
Note: as far as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software-emulated floating point. You may find this gcc wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.

Type with fixed length > 64 in C

I'm currently writing some cryptographic code in C that has to deal with numbers greater than what uint64_t is able to hold. (It's for doing HDCP authentication, and one of the cipher values is 84 bits)
What would be the best way to do this? One of the variables I need to store is 84 bits long. Should I take one uint64_t for the low 64 bits and a uint32_t for the high 20 bits? This seems like a hack to me, but I'm not sure if there's really a better solution, especially for storing in a structure.
The ideal alternative would be declaring a custom data type, like uint64_t but 84 bits long, that behaves the same way. Is this even possible? I'm not sure if libc can handle variables with bit widths that are not multiples of 8, but an 88-bit type could work for that, although I'm not even sure how feasible declaring a custom bit-width data type is.
Edit: I've checked for uint128_t, but that doesn't seem to exist in clang's C99 mode. I'll be doing standard arithmetic and bit operations on this, the standard shebang associated with crypto code.
7 years ago my cryptography professor introduced us to MIRACL open source library, which contained a very fast implementation of cryptographic functions and the facilities to work on big precision numbers. It's been a long time since I used it, but it seems to be still in good shape. Maybe it's not ideal if your problem is simply to represent exactly 84-bit values and nothing else, but it might be a more general solution.
OK, I just installed a 32-bit Clang 3.1, and this works:
#include <stdio.h>
int main(void)
{
    unsigned __int128 i;
    printf("%zu\n", sizeof(i));
}
(And prints "16".)
If you need more than 64 bits, and there is no integer type with more than 64 bits in the target environment, then you must use something else. You will have to use two variables, an array, or perhaps a struct with several members. You will also have to create the operations yourself, such as a plus function if you want to add numbers.
There is no built-in support in C to easily and automatically create N-bit integers.
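A minimal sketch of that for the 84-bit case in the question (the names uint84 and uint84_add are invented; the high member keeps only 20 significant bits, so results are reduced modulo 2^84):
#include <stdint.h>
typedef struct {
    uint64_t lo;   /* low 64 bits */
    uint32_t hi;   /* high 20 bits; the upper 12 bits of this member stay zero */
} uint84;
#define UINT84_HI_MASK 0xFFFFFu   /* twenty one-bits */
static uint84 uint84_add(uint84 a, uint84 b)
{
    uint84 r;
    r.lo = a.lo + b.lo;
    /* carry into the high part when the 64-bit addition wrapped */
    r.hi = (a.hi + b.hi + (r.lo < a.lo)) & UINT84_HI_MASK;
    return r;
}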
What about the GNU Multiple Precision (GMP) library? It claims arbitrary precision:
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.

8 bit enum, in C

I have to store instructions, commands that I will be receiving via serial.
The commands will be 8 bits long.
I need to preserve transparency between a command's name and its value, so as to avoid having to translate an 8-bit number received over serial into some other type.
I'd like to use Enumerations to deal with them in my code.
The trouble is that, on this platform, an enumeration corresponds to a 16-bit integer.
The platform is an AVR ATmega169V microcontroller, on the Butterfly demo board.
It is an 8-bit system with some limited support for 16-bit operations.
It is not a fast system and has about 1 KB of RAM.
It doesn't have any luxuries like file I/O or an operating system.
So any suggestions as to what type I should be using to store 8-bit commands?
There has got to be something better than a massive header of #defines.
gcc's -fshort-enums might be useful:
Allocate to an "enum" type only as many bytes as it needs for the declared range of possible values. Specifically, the "enum" type will be equivalent to the smallest integer type which has enough room.
In fact, here's a page with a lot of relevant information. I hope you come across many GCC switches you never knew existed. ;)
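A quick way to see the flag's effect (the command values are placeholders):
#include <stdio.h>
enum command { CMD_RESET = 0x01, CMD_STATUS = 0x02, CMD_DATA = 0x7F };
int main(void)
{
    /* built with gcc -fshort-enums this typically prints 1; without the flag it is
       usually 2 on AVR (16-bit int) or 4 on a typical 32-bit host */
    printf("%u\n", (unsigned)sizeof(enum command));
    return 0;
}
Keep in mind that -fshort-enums changes the ABI, so every translation unit that shares such an enum must be compiled with the same setting.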
You are trying to solve a problem that does not exist.
Your question is tagged C. In the C language, enum types in value contexts are fully compatible with integral types and behave just like other integral types. When used in expressions, they are subjected to exactly the same integral promotions as other integral types. Once you take that into account, you should realize that if you want to store values described by enumeration constants in an 8-bit integral type, all you have to do is choose a suitable generic 8-bit integral type (say, int8_t) and use it instead of the enum type. You'll lose absolutely nothing by storing your enum constant values in an object of type int8_t (as opposed to an object explicitly declared with the enum type).
The issue you describe would exist in C++, where enum types are separated much farther from other integral types. In C++ using an integral type in place of enum type for the purpose of saving memory is more difficult (although possible). But not in C, where it requires no additional effort whatsoever.
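In other words, something along these lines costs nothing in C (the command names and values are placeholders):
#include <stdint.h>
enum serial_cmd { CMD_PING = 0x01, CMD_READ = 0x02, CMD_WRITE = 0x03 };
uint8_t stored_cmd;                 /* one byte of storage, holds an enum constant's value */
void handle_byte(uint8_t received)  /* raw byte straight off the serial line */
{
    stored_cmd = received;
    switch (stored_cmd) {           /* compares directly against the enum constants */
    case CMD_PING:
        /* ... */
        break;
    case CMD_READ:
        /* ... */
        break;
    default:
        break;
    }
}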
I don't see why an enum wouldn't work. Comparisons to, and assignments from, an enum should all work fine with the default widening. Just be careful that your 8 bit values are signed correctly (I would think you would want unsigned extension).
You will get 16-bit comparisons this way; I hope that won't be a performance problem (it shouldn't be, especially if your processor is 16-bit, as it sounds like it is).
Microsoft's C compiler allows you to do something like this, but it's an extension (it's standard in C++0x):
enum Foo : unsigned char {
    blah = 0,
    blargh = 1
};
Since you tagged GCC, I'm not entirely sure if the same thing is possible, but GCC might have an extension in gnu99 mode or something for it. Give it a whirl.
I'd recommend staying with enum in any case, for the following reasons:
This solution lets you map command values directly to what your serial protocol expects.
If you really are on a 16-bit architecture, there is not much advantage in moving to an 8-bit type. Think about aspects other than the single byte of memory saved.
Some compilers I have used give an enum the smallest size that can hold its values (enums that fit in a byte used only a byte, then 16 bits, then 32).
In general, you should not care about the real width of the type. Only if you really need compact storage should you use compiler flags such as -fshort-enums on the GNU compiler, and I don't recommend them unless you really need them.
As a last option, you can keep the enum as the in-code representation of the commands and convert to/from a byte with two simple operations when storing or restoring a command value in memory (and encapsulate this in one place). These operations are simple enough to inline, so you really use only one byte for storage while still working with an enum defined however you like in the rest of the code.
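That last option might be encapsulated like this (a sketch; cmd_t and the function names are invented, and static inline assumes C99 or a GNU compiler):
#include <stdint.h>
typedef enum { CMD_PING = 0x01, CMD_READ = 0x02, CMD_WRITE = 0x03 } cmd_t;
/* the only two places in the program that care about the storage width */
static inline uint8_t cmd_to_byte(cmd_t c)   { return (uint8_t)c; }
static inline cmd_t   byte_to_cmd(uint8_t b) { return (cmd_t)b; }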
An answer relevant to the ARC compiler:
(Quoted from DesignWare MetaWare C/C++ Programmer’s Guide for ARC; section 11.2.9.2)
Size of Enumerations
The size of an enum type depends on the status of the toggle *Long_enums*.
■ If toggle *Long_enums* is off, the enum type maps to the smallest of one, two, or four bytes, such that all values can be represented.
■ If toggle *Long_enums* is on, an enum maps to four bytes (matching the AT&T Portable C Compiler convention).
