Type casting with pointers - c

I have been programming in standard C on several microcontrollers for about 30 years.
Now I have a question:
Somewhere I have a variable of type long. The value of this variable never exceeds ±0x7fff.
Choosing "long" has other benefits: because it's a 32-bit machine, 32-bit arithmetic is faster than 16- or 8-bit arithmetic (no sign extension needed, which speeds up my fast control algorithm by up to 10%...).
On the other hand, I have a legacy function that requires this variable as a short.
At the moment I'm copying this variable from the "long" into a "short" working variable.
To speed my code up further, I want to get rid of this useless copy.
As modern CPUs are quite good with pointer operations, I would like to access the long variable through a local pointer
short * p;
During initialization of the module
p = (short *) pointer_to_long_variable
so that I can access this variable simply with
do_some_calculations with *p
I think this idea should work on my little-endian machine, but is it safe even on big-endian machines?
Sorry for my bad English, I'm not a native speaker :)

You do not need it at all: passing the long to a function that takes a short converts the value implicitly. As a general rule, avoid pointer punning at any price. If you need to reinterpret the bytes of one type as another, use a union or memcpy (do not worry: memcpy is well known to the compiler, so typically no actual call is emitted).
short legacy(short y)
{
return y+1;
}
long bar(long x)
{
return legacy(x);
}
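To make the endianness concern from the question concrete, here is a minimal sketch that contrasts a value conversion with a byte-level reinterpretation. The helper names as_short and first_bytes_of are made up for illustration, and the second one uses memcpy rather than a pointer cast, in line with the advice above.

```c
#include <string.h>

/* Hypothetical helper: converts the value. The result does not depend on
   byte order as long as the value fits into a short. */
static short as_short(long v)
{
    return (short)v;
}

/* Hypothetical helper: copies the first sizeof(short) bytes of the long's
   storage. Which bytes those are depends on the machine's byte order. */
static short first_bytes_of(long v)
{
    short s;
    memcpy(&s, &v, sizeof s);
    return s;
}
```

Assuming a 16-bit short and a wider long, first_bytes_of(1234) gives 1234 on a little-endian machine (low-order bytes come first) and 0 on a big-endian machine (high-order bytes come first). as_short(1234) gives 1234 on both, which is why the plain conversion is the portable choice.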

Related

Comparison uint8_t vs uint16_t while declaring a counter

Assuming to have a counter which counts from 0 to 100, is there an advantage of declaring the counter variable as uint16_t instead of uint8_t.
Obviously if I use uint8_t I could save some space. On a processor with natural wordsize of 16 bits access times would be the same for both I guess. I couldn't think why I would use a uint16_t if uint8_t can cover the range.
Using a wider type than necessary can allow the compiler to avoid having to mask the higher bits.
Suppose you were working on a 16 bit architecture, then using uint16_t could be more efficient, however if you used uint16_t instead of uint8_t on a 32 bit architecture then you would still have the mask instructions but just masking a different number of bits.
The most efficient type to use in a cross-platform portable way is just plain int or unsigned int, which will always be the correct type to avoid the need for masking instructions, and will always be able to hold numbers up to 100.
If you are in a MISRA or similar regulated environment that forbids the use of native types, then the correct standard-compliant type to use is uint_fast8_t. This guarantees to be the fastest unsigned integer type that has at least 8 bits.
However, all of this is nonsense really. Your primary goal in writing code should be to make it readable, not to make it as fast as possible. Penny-pinching instructions like this makes code convoluted and more likely to have bugs. Also because it is harder to read, the bugs are less likely to be found during code review.
You should only try to optimize like this once the code is finished and you have tested it and found the particular part which is the bottleneck. Masking a loop counter is very unlikely to be the bottleneck in any real code.
Obviously if I use uint8_t I could save some space.
Actually, that's not necessarily obvious! A loop index variable is likely to end up in a register, and if it does there's no memory to be saved. Also, since the definition of the C language says that much arithmetic takes place using type int, it's possible that using a variable smaller than int might actually end up costing you space in terms of extra code emitted by the compiler to convert back and forth between int and your smaller variable. So while it could save you some space, it's not at all guaranteed that it will — and, in any case, the actual savings are going to be almost imperceptibly small in the grand scheme of things.
If you have an array of some number of integers in the range 0-100, using uint8_t is a fine idea if you want to save space. For an individual variable, on the other hand, the arguments are pretty different.
In general, I'd say that there are two reasons not to use type uint8_t (or, equivalently, char or unsigned char) as a loop index:
1. It's not going to save much data space (if at all), and it might cost code size and/or speed.
2. If the loop runs over exactly 256 elements (yours didn't, but I'm speaking more generally here), you may have introduced a bug (which you'll discover soon enough): your loop may run forever.
The interviewer was probably expecting #1 as an answer. It's not a guaranteed answer — under plenty of circumstances, using the smaller type won't cost you anything, and evidently there are microprocessors where it can actually save something — but as a general rule, I agree that using an 8-bit type as a loop index is, well, silly. And whether or not you agree, it's certainly an issue to be aware of, so I think it's a fair interview question.
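The second point can be sketched as follows. This is only an illustration: the function name and the guard parameter are invented here, and the guard exists solely so that the demonstration terminates.

```c
#include <stdint.h>

/* Hypothetical demo: counts how often the loop body runs with an 8-bit
   index. 'guard' stops a loop that would otherwise never end. */
static unsigned long run_with_u8_index(unsigned limit, unsigned long guard)
{
    unsigned long iterations = 0;

    /* i can only hold 0..255, so "i < 256" is always true:
       after 255 the increment wraps i back to 0. */
    for (uint8_t i = 0; i < limit; i++) {
        if (++iterations >= guard) {
            break;
        }
    }
    return iterations;
}
```

With a limit of 100 the body runs 100 times and the loop ends normally. With a limit of 256 the loop only stops because the guard is hit.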
See also this question, which discusses the same sorts of issues.
The interview question doesn't make much sense from a platform-generic point of view. If we look at code such as this:
for(uint8_t i=0; i<n; i++)
array[i] = x;
Then the expression i<n will get carried out on type int or larger because of implicit promotion. Though the compiler may optimize it to use a smaller type if it doesn't affect the result.
As for array[i], the compiler is likely to use a type corresponding to whatever address size the system is using.
What the interviewer was fishing for is likely that uint32_t on a 32 bitter tends to generate faster code in some situations. For those cases you can use uint_fast8_t, but more likely the compiler will perform the optimization regardless.
The only optimization uint8_t blocks the compiler from doing, is to allocate a larger variable than 8 bits on the stack. It doesn't however block the compiler from optimizing out the variable entirely and using a register instead. Such as for example storing it in an index register with the same width as the address bus.
Example with gcc x86_64: https://godbolt.org/z/vYscf3KW9. The disassembly is pretty painful to read, but the compiler just picked CPU registers to store anything regardless of the type of i, giving identical machine code between uint8_t and uint16_t. I would have been surprised if it didn't.
On a processor with natural wordsize of 16 bits access times would be the same for both I guess.
Yes, this is true for all mainstream 16 bitters. Some might even manage faster code if given 8 bits instead of 16. Some exotic systems like DSPs exist, but in the case of, let's say, a 1 byte = 16 bits DSP, the compiler doesn't even provide you with uint8_t to begin with, since it is an optional type. One generally doesn't bother with portability to wildly exotic systems, since doing so is a waste of everyone's time and money.
The correct answer: it is senseless to do manual optimization without a specific system in mind. uint8_t is perfectly fine to use for generic, portable code.

When should I just use "int" versus more sign-specific or size-specific types?

I have a little VM for a programming language implemented in C. It supports being compiled under both 32-bit and 64-bit architectures as well as both C and C++.
I'm trying to make it compile cleanly with as many warnings enabled as possible. When I turn on CLANG_WARN_IMPLICIT_SIGN_CONVERSION, I get a cascade of new warnings.
I'd like to have a good strategy for when to use int versus either explicitly unsigned types, and/or explicitly sized ones. So far, I'm having trouble deciding what that strategy should be.
It's certainly true that mixing them—using mostly int for things like local variables and parameters and using narrower types for fields in structs—causes lots of implicit conversion problems.
I do like using more specifically sized types for struct fields because I like the idea of explicitly controlling memory usage for objects in the heap. Also, for hash tables, I rely on unsigned overflow when hashing, so it's nice if the hash table's size is stored as uint32_t.
But, if I try to use more specific types everywhere, I find myself in a maze of twisty casts everywhere.
What do other C projects do?
Just using int everywhere may seem tempting, since it minimizes the need for casting, but there are several potential pitfalls you should be aware of:
An int might be shorter than you expect. Even though, on most desktop platforms, an int is typically 32 bits, the C standard only guarantees a minimum length of 16 bits. Could your code ever need numbers larger than 2^15−1 = 32,767, even for temporary values? If so, don't use an int. (You may want to use a long instead; a long is guaranteed to be at least 32 bits.)
Even a long might not always be long enough. In particular, there is no guarantee that the length of an array (or of a string, which is a char array) fits in a long. Use size_t (or ptrdiff_t, if you need a signed difference) for those.
In particular, a size_t is defined to be large enough to hold any valid array index, whereas an int or even a long might not be. Thus, for example, when iterating over an array, your loop counter (and its initial / final values) should generally be a size_t, at least unless you know for sure that the array is short enough for a smaller type to work. (But be careful when iterating backwards: size_t is unsigned, so for(size_t i = n-1; i >= 0; i--) is an infinite loop! Using i != SIZE_MAX or i != (size_t) -1 should work, though; or use a do/while loop, but beware of the case n == 0!)
An int is signed. In particular, this means that int overflow is undefined behavior. If there's ever any risk that your values might legitimately overflow, don't use an int; use an unsigned int (or an unsigned long, or uintNN_t) instead.
Sometimes, you just need a fixed bit length. If you're interfacing with an ABI, or reading / writing a file format, that requires integers of a specific length, then that's the length you need to use. (Of course, in such situations, you may also need to worry about things like endianness, and so may sometimes have to resort to manually packing data byte-by-byte anyway.)
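As an illustration of the backwards-iteration caveat for size_t mentioned above, here is one common way to write such a loop. The helper name reverse_copy is invented for this sketch, and this idiom is just one of several possibilities.

```c
#include <stddef.h>

/* Hypothetical helper: copies src[0..n-1] into dst in reverse order. */
static void reverse_copy(int *dst, const int *src, size_t n)
{
    /* "i-- > 0" compares first and decrements afterwards, so the body
       sees n-1, n-2, ..., 0 and the loop also copes with n == 0. */
    for (size_t i = n; i-- > 0; ) {
        dst[n - 1 - i] = src[i];
    }
}
```

The loop never evaluates i >= 0, which would always be true for an unsigned type.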
All that said, there are also reasons to avoid using the fixed-length types all the time: not only is int32_t awkward to type all the time, but forcing the compiler to always use 32-bit integers is not always optimal, particularly on platforms where the native int size might be, say, 64 bits. You could use, say, C99 int_fast32_t, but that's even more awkward to type.
Thus, here are my personal suggestions for maximum safety and portability:
Define your own integer types for casual use in a common header file, something like this:
#include <limits.h>
typedef int i16;
typedef unsigned int u16;
#if UINT_MAX >= 4294967295U
typedef int i32;
typedef unsigned int u32;
#else
typedef long i32;
typedef unsigned long u32;
#endif
Use these types for anything where the exact size of the type doesn't matter, as long as they're big enough. The type names I've suggested are both short and self-documenting, so they should be easy to use in casts where needed, and minimize the risk of errors due to using a too-narrow type.
Conveniently, the u32 and u16 types defined as above are guaranteed to be at least as wide as unsigned int, and thus can be used safely without having to worry about them being promoted to int and causing undefined overflow behavior.
Use size_t for all array sizes and indexing, but be careful when casting between it and any other integer types. Optionally, if you don't like to type so many underscores, typedef a more convenient alias for it too.
For calculations that assume overflow at a specific number of bits, either use uintNN_t, or just use u16 / u32 as defined above and explicit bitmasking with &. If you choose to use uintNN_t, make sure to protect yourself against unexpected promotion to int; one way to do that is with a macro like:
#define u(x) (0U + (x))
which should let you safely write e.g.:
uint32_t a = foo(), b = bar();
uint32_t c = u(a) * u(b); /* this is always unsigned multiply */
For external ABIs that require a specific integer length, again define a specific type, e.g.:
typedef int32_t fooint32; /* foo ABI needs 32-bit ints */
Again, this type name is self-documenting, with regard to both its size and its purpose.
If the ABI might actually require, say, 16- or 64-bit ints instead, depending on the platform and/or compile-time options, you can change the type definition to match (and rename the type to just fooint) — but then you really do need to be careful whenever you cast anything to or from that type, because it might overflow unexpectedly.
If your code has its own structures or file formats that require specific bitlengths, consider defining custom types for those too, exactly as if it was an external ABI. Or you could just use uintNN_t instead, but you'll lose a little bit of self-documentation that way.
For all these types, don't forget to also define the corresponding _MIN and _MAX constants for easy bounds checking. This might sound like a lot of work, but it's really just a couple of lines in a single header file.
Finally, remember to be careful with integer math, especially overflows.
For example, keep in mind that the difference of two n-bit signed integers may not fit in an n-bit int. (It will fit into an n-bit unsigned int, if you know it's non-negative; but remember that you need to cast the inputs to an unsigned type before taking their difference to avoid undefined behavior!)
Similarly, to find the average of two integers (e.g. for a binary search), don't use avg = (lo + hi) / 2, but rather e.g. avg = lo + (hi + 0U - lo) / 2; the former will break if the sum overflows.
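Two of the points above, sketched under assumptions: the helper names mul16 and midpoint are invented for illustration, and midpoint uses size_t indices with lo <= hi rather than the int variant shown in the text.

```c
#include <stddef.h>
#include <stdint.h>

#define u(x) (0U + (x))   /* forces unsigned arithmetic, as described above */

/* Hypothetical helper: low 16 bits of a 16x16-bit product. Without u(),
   both operands would be promoted to int where int is 32 bits wide, and
   65535 * 65535 would overflow int. */
static uint16_t mul16(uint16_t a, uint16_t b)
{
    return (uint16_t)(u(a) * u(b));
}

/* Hypothetical helper: midpoint of two indices, assuming lo <= hi.
   The difference always fits, whereas lo + hi might wrap around. */
static size_t midpoint(size_t lo, size_t hi)
{
    return lo + (hi - lo) / 2;
}
```

For example, mul16(65535, 65535) is 1 (the low 16 bits of 0xFFFE0001), and midpoint(SIZE_MAX - 2, SIZE_MAX) is SIZE_MAX - 1, where (lo + hi) / 2 would have wrapped.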
You seem to know what you are doing, judging from the linked source code, which I took a glance at.
You said it yourself - using "specific" types makes you have more casts. That's not an optimal route to take anyway. Use int as much as you can, for things that do not mandate a more specialized type.
The beauty of int is that it is abstracted over the types you speak of. It is optimal in all cases where you need not expose the construct to a system unaware of int. It is your own tool for abstracting the platform for your program(s). It may also yield you speed, size and alignment advantage, depending.
In all other cases, e.g. where you want to deliberately stay close to machine specifications, int can and sometimes should be abandoned. Typical cases include network protocols where the data goes on the wire, and interoperability facilities - bridges of sorts between C and other languages, kernel assembly routines accessing C structures. But don't forget that sometimes you would want to in fact use int even in these cases, as it follows platforms own "native" or preferred word size, and you might want to rely on that very property.
With platform types like uint32_t, a kernel might want to use these (although it may not have to) in its data structures if these are accessed from both C and assembler, as the latter doesn't typically know what int is supposed to be.
To sum up, use int as much as possible and resort to moving from more abstract types to "machine" types (bytes/octets, words, etc) in any situation which may require so.
As to size_t and other "usage-suggestive" types - as long as syntax follows semantics inherent to the type - say, using size_t for well, size values of all kinds - I would not contest. But I would not liberally apply it to anything just because it is guaranteed to be the largest type (regardless if it is actually true). That's an underwater stone you don't want to be stepping on later. Code has to be self-explanatory to the degree possible, I would say - having a size_t where none is naturally expected, would raise eyebrows, for a good reason. Use size_t for sizes. Use offset_t for offsets. Use [u]intN_t for octets, words, and such things. And so on.
This is about applying semantics inherent in a particular C type, to your source code, and about the implications on the running program.
Also, as others have illustrated, don't shy away from typedef, as it gives you the power to efficiently define your own types, an abstraction facility I personally value. A good program source code may not even expose a single int, nevertheless relying on int aliased behind a multitude of purpose-defined types. I am not going to cover typedef here, the other answers hopefully will.
Keep large numbers that are used to access members of arrays, or control buffers as size_t.
For an example of a project that makes use of size_t, refer to GNU's dd.c, line 155.
Here are a few things I do. Not sure they're for everyone but they work for me.
Never use int or unsigned int directly. There always seems to be a more appropriately named type for the job.
If a variable needs to be a specific width (e.g. for a hardware register or to match a protocol) use a width-specific type (e.g. uint32_t).
For array iterators, where I want to access array elements 0 thru n, this should also be unsigned (no reason to access any index less than 0) and I use one of the fast types (e.g. uint_fast16_t), selecting the type based on the minimum size required to access all array elements. For example, if I have a for loop that will iterate through 24 elements max, I'll use uint_fast8_t and let the compiler (or stdint.h, depending how pedantic we want to get) decide which is the fastest type for that operation.
Always use unsigned variables unless there is a specific reason for them to be signed.
If your unsigned variables and signed variables need to play together, use explicit casts and be aware of the consequences. (Luckily this will be minimized if you avoid using signed variables except where absolutely necessary.)
If you disagree with any of those or have recommended alternatives please let me know in the comments! That's the life of a software developer... we keep learning or we become irrelevant.
Always.
Unless you have specific reasons for using a more specific type, for example because you're on a 16-bit platform and need integers greater than 32767, or you need to ensure proper byte order and signedness for data exchange over a network or in a file (and unless you're resource constrained, consider transferring data in "plain text," meaning ASCII or UTF-8 if you prefer).
My experience has shown that "just use 'int'" is a good maxim to live by and makes it possible to turn out working, easily maintained, correct code quickly every time. But your specific situation may differ, so take this advice with a bit of well-deserved scrutiny.
Most of the time, using int is not ideal. The main reason is that int is signed, and signed arithmetic can cause UB; signed integers can also be negative, which is something you don't need for most integers. Prefer unsigned integers. Secondly, data types reflect meaning and are a (very limited) way to document the range of values a variable may have. If you use int, you imply that you expect this variable to sometimes hold negative values, and that its values probably do not always fit into 8 bits but always fit into INT_MAX, which can be as low as 32767. Do not assume an int is 32 bits.
Always, think about the possible values of a variable and choose the type accordingly. I use the following rules:
Use unsigned integers except when you need to be able to handle negative numbers.
If you want to index an array from the start, use size_t except when there are good reasons not to. Almost never use int for it: an int can be too small, and there is a high chance of creating a UB bug that isn't found during testing because you never tested arrays large enough.
The same goes for array sizes and sizes of other objects: prefer size_t.
If you need to index an array with a negative index, which you may need for image processing, prefer ptrdiff_t. But be aware that ptrdiff_t can be too small, although that is rare.
If you have arrays that never exceed a certain size, you may use uint_fastN_t, uintN_t, or uint_leastN_t types. This can make a lot of sense especially on a 8 bit microcontroller.
Sometimes, unsigned int can be used instead of uint_fast16_t, similarly int for int_fast16_t.
To handle the value of a single byte (or character, but this is not a real character because UTF-8 and Unicode sometimes use more than one code point per character), use int. An int can store -1 if you need an indicator for an error or an unset value, and a character literal is of type int. (This is true for C; for C++ you may use a different strategy.) There is the extremely rare possibility that a machine has sizeof(int)==1 && CHAR_MIN==0, where a byte cannot be handled with an int, but I have never seen such a machine.
It can make sense to define your own types for different purposes.
Use explicit cast where casts are needed. This way the code is well defined and has the least amount of unexpected behaviour.
After a certain size, a project needs a list/enum of the native integer data types. You can use macros with the _Generic expression from C11, which only need to handle bool, signed char, short, int, long, long long and their unsigned counterparts to get the underlying native type from a typedefed one. This way your parsers and similar parts only need to handle 11 integer types and not 56 standard integer types (if I counted correctly), plus a bunch of other non-standard types.
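A minimal sketch of such a _Generic macro, assuming a C11 compiler. The macro name NATIVE_TYPE_NAME and the choice of returning a string are made up for illustration; a real project might map to an enum instead.

```c
/* Hypothetical macro: maps an expression to the name of its underlying
   native integer type. Requires C11 for _Generic. */
#define NATIVE_TYPE_NAME(x) _Generic((x),            \
    _Bool:              "bool",                      \
    signed char:        "signed char",               \
    unsigned char:      "unsigned char",             \
    short:              "short",                     \
    unsigned short:     "unsigned short",            \
    int:                "int",                       \
    unsigned int:       "unsigned int",              \
    long:               "long",                      \
    unsigned long:      "unsigned long",             \
    long long:          "long long",                 \
    unsigned long long: "unsigned long long",        \
    default:            "other")
```

A typedef such as size_t or uint16_t then resolves to whichever of the 11 native types it aliases on the platform at hand. Plain char is a distinct type in C and falls through to "other" in this sketch.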

What is the point behind unions in C?

I'm going through O'Reilly's Practical C Programming book, and having read the K&R book on the C programming language, and I am really having trouble grasping the concept behind unions.
They take the size of the largest data type that makes them up...and the most recently assigned one overwrites the rest...but why not just use / free memory as needed?
The book mentions that it's used in communication, where you need to set flags of the same size; and on a googled website, that it can eliminate odd-sized memory chunks...but is it of any use in a modern, non-embedded memory space?
Is there something crafty you can do with it and CPU registers? Is it simply a hold over from an earlier era of programming? Or does it, like the infamous goto, still have some powerful use (possibly in tight memory spaces) that makes it worth keeping around?
Well, you almost answered your question: Memory.
Back in the day, memory was scarce, and even saving a few kilobytes was useful.
But even today there are scenarios where unions would be useful. For example, if you'd like to implement some kind of variant datatype. The best way to do this is using a union.
This doesn't sound like much, but let's just assume you want to use a variable either storing a 4 character string (like an ID) or a 4 byte number (which could be some hash or indeed just a number).
If you use a classic struct, this would be 8 bytes long (at least; if you're unlucky there are padding bytes as well). Using a union it's only 4 bytes. So you're saving 50% memory, which isn't a lot for one instance, but imagine having a million of these.
While you can achieve similar things by casting or subclassing, a union is still the easiest way to do this.
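The size difference can be checked directly. The type names below are invented for this sketch, and it assumes 8-bit bytes and a C11 compiler (for _Static_assert).

```c
#include <stdint.h>

/* Hypothetical record types: a 4-character ID and/or a 32-bit number. */
struct id_and_number {      /* stores both members side by side */
    char     id[4];
    uint32_t number;
};

union id_or_number {        /* stores one member at a time */
    char     id[4];
    uint32_t number;
};

/* Checked at compile time: the union is as large as its largest member,
   while the struct needs room for both members. */
_Static_assert(sizeof(union id_or_number) == sizeof(uint32_t),
               "union members overlap");
_Static_assert(sizeof(struct id_and_number) >= 2 * sizeof(uint32_t),
               "struct holds both members");
```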
One use of unions is having two variables occupy the same space, and a second variable in the struct decide what data type you want to read it as.
e.g. you could have a boolean 'isDouble', and a union 'doubleOrLong' which has both a double and a long. If isDouble == true interpret the union as a double else interpret it as a long.
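A minimal sketch of that isDouble / doubleOrLong layout. The struct name and the helper functions are assumptions added for illustration.

```c
#include <stdbool.h>

/* Hypothetical variant type: the flag records which union member is valid. */
struct number {
    bool isDouble;
    union {
        double d;
        long   l;
    } doubleOrLong;
};

static struct number from_double(double d)
{
    struct number n;
    n.isDouble = true;
    n.doubleOrLong.d = d;
    return n;
}

static struct number from_long(long l)
{
    struct number n;
    n.isDouble = false;
    n.doubleOrLong.l = l;
    return n;
}

/* Reads whichever member the flag says is currently stored. */
static double as_double(const struct number *n)
{
    return n->isDouble ? n->doubleOrLong.d : (double)n->doubleOrLong.l;
}
```

The flag is what makes the union safe to read: code always checks isDouble before touching either member.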
Another use of unions is accessing data types in different representations. For instance, if you know how a double is laid out in memory, you could put a double in a union, access it as a different data type like a long, directly access its bits, its mantissa, its sign, its exponent, whatever, and do some direct manipulation with it.
You don't really need this nowadays since memory is so cheap, but in embedded systems it has its uses.
The Windows API makes use of unions quite a lot. LARGE_INTEGER is an example of such a usage. Basically, if the compiler supports 64-bit integers, use the QuadPart member; otherwise, set the low DWORD and the high DWORD manually.
It's not really a hold over, as the C language was created in 1972, when memory was a real concern.
You could make the argument that in modern, non-embedded space, you might not want to use C as a programming language to begin with. If you've chosen C as your language choice for implementation, you're looking to harness the benefits of C: it's efficient, close-to-metal, which results in tight, fast binaries.
As such, when choosing to use C, you'd still want to take advantage of its benefits, which include memory-space efficiency. For that, the union works very well: it gives you some degree of type safety while keeping the memory footprint as small as possible.
One place where I have seen it used is in the Doom 3/idTech 4 Fast Inverse Square Root implementation.
For those unfamiliar with this algorithm, it essentially requires treating a floating point number as an integer. The old Quake (and earlier) version of the code does this by the following:
float y = 2.0f;
// treat the bits of y as an integer
long i = * ( long * ) &y;
// do some stuff with i
// treat the bits of i as a float
y = * ( float * ) &i;
original source on GitHub
This code takes the address of a floating point number y, casts it to a pointer to a long (i.e., a 32-bit integer in Quake days), and dereferences it into i. Then it does some incredibly bizarre bit-twiddling stuff, and the reverse.
There are two disadvantages of doing it this way. One is that the convoluted address-of, cast, dereference process forces the value of y to be read from memory, rather than from a register [1], and ditto on the way back. On Quake-era computers, however, floating point and integer registers were completely separate so you pretty much had to push to memory and back to deal with this restriction.
The second is that, at least in C++, doing such casting is deeply frowned upon, even when doing what amounts to voodoo such as this function does. I'm sure there are more compelling arguments, however I'm not sure what they are :)
So, in Doom 3, id included the following bit in their new implementation (which uses a different set of bit twiddling, but a similar idea):
union _flint {
dword i;
float f;
};
...
union _flint seed;
seed.i = /* look up some tables to get this */;
double r = seed.f; // <- access the bits of seed.i as a floating point number
original source on GitHub
Theoretically, on an SSE2 machine, this can be accessed through a single register; I'm not sure in practice whether any compiler would do this. It's still somewhat cleaner code in my opinion than the casting games in the earlier Quake version.
1 - ignoring "sufficiently advanced compiler" arguments
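For comparison, the same reinterpretation can also be written with memcpy, which avoids both the pointer cast and the union. This is not what either id codebase does; it is a sketch with invented helper names, and it assumes float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: returns the bit pattern of f as an integer. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical helper: builds a float from a bit pattern. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On an IEEE 754 platform, float_to_bits(1.0f) is 0x3F800000, and converting back yields the original value.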

size_t vs. uintptr_t

The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type. I've read on some sites that I found on the Googles that this is legal and/or should always work:
void *v = malloc(10);
size_t s = (size_t) v;
So then in C99, the standard introduced the intptr_t and uintptr_t types, which are signed and unsigned types guaranteed to be able to hold pointers:
uintptr_t p = (uintptr_t) v;
So what is the difference between using size_t and uintptr_t? Both are unsigned, and both should be able to hold any pointer type, so they seem functionally identical. Is there any real compelling reason to use uintptr_t (or better yet, a void *) rather than a size_t, other than clarity? In an opaque structure, where the field will be handled only by internal functions, is there any reason not to do this?
By the same token, ptrdiff_t has been a signed type capable of holding pointer differences, and therefore capable of holding most any pointer, so how is it distinct from intptr_t?
Aren't all of these types basically serving trivially different versions of the same function? If not, why? What can't I do with one of them that I can't do with another? If so, why did C99 add two essentially superfluous types to the language?
I'm willing to disregard function pointers, as they don't apply to the current problem, but feel free to mention them, as I have a sneaking suspicion they will be central to the "correct" answer.
size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type
Not necessarily! Hark back to the days of segmented 16-bit architectures for example: an array might be limited to a single segment (so a 16-bit size_t would do) BUT you could have multiple segments (so a 32-bit intptr_t type would be needed to pick the segment as well as the offset within it). I know these things sound weird in these days of uniformly addressable unsegmented architectures, but the standard MUST cater for a wider variety than "what's normal in 2009", you know!-)
Regarding your statement:
"The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type."
This is actually a fallacy (a misconception resulting from incorrect reasoning)(a). You may think the latter follows from the former but that's not actually the case.
Pointers and array indexes are not the same thing. It's quite plausible to envisage a conforming implementation that limits arrays to 65536 elements but allows pointers to address any value into a massive 128-bit address space.
C99 states that the upper limit of a size_t variable is defined by SIZE_MAX and this can be as low as 65535 (see C99 TR3, 7.18.3, unchanged in C11). Pointers would be fairly limited if they were restricted to this range in modern systems.
In practice, you'll probably find that your assumption holds, but that's not because the standard guarantees it. Because it actually doesn't guarantee it.
(a) This is not some form of personal attack by the way, just stating why your statements are erroneous in the context of critical thinking. For example, the following reasoning is also invalid:
All puppies are cute. This thing is cute. Therefore this thing must be a puppy.
The cuteness or otherwise of puppies has no bearing here; all I'm stating is that the two facts do not lead to the conclusion, because the first two sentences allow for the existence of cute things that are not puppies.
This is similar to your first statement not necessarily mandating the second.
I'll let all the other answers stand for themselves regarding the reasoning with segment limitations, exotic architectures, and so on.
Isn't the simple difference in names reason enough to use the proper type for the proper thing?
If you're storing a size, use size_t. If you're storing a pointer, use intptr_t. A person reading your code will instantly know that "aha, this is a size of something, probably in bytes", and "oh, here's a pointer value being stored as an integer, for some reason".
Otherwise, you could just use unsigned long (or, in these here modern times, unsigned long long) for everything. Size is not everything, type names carry meaning which is useful since it helps describe the program.
It's possible that the size of the largest array is smaller than a pointer. Think of segmented architectures - pointers may be 32-bits, but a single segment may be able to address only 64KB (for example the old real-mode 8086 architecture).
While these aren't commonly in use in desktop machines anymore, the C standard is intended to support even small, specialized architectures. There are still embedded systems being developed with 8 or 16 bit CPUs for example.
I would imagine (and this goes for all type names) that it better conveys your intentions in code.
For example, even though unsigned short and wchar_t are the same size on Windows (I think), using wchar_t instead of unsigned short shows the intention that you will use it to store a wide character, rather than just some arbitrary number.
Looking both backwards and forwards, and recalling that various oddball architectures were scattered about the landscape, I'm pretty sure they were trying to wrap all existing systems and also provide for all possible future systems.
So sure, the way things settled out, we have not needed that many types so far.
But even in LP64, a rather common paradigm, we needed size_t and ssize_t for the system call interface. One can imagine a more constrained legacy or future system, where using a full 64-bit type is expensive and they might want to punt on I/O ops larger than 4GB but still have 64-bit pointers.
I think you have to wonder: what might have been developed, what might come in the future. (Perhaps 128-bit distributed-system internet-wide pointers, but no more than 64 bits in a system call, or perhaps even a "legacy" 32-bit limit. :-) Imagine that legacy systems might get new C compilers...
Also, look at what existed around then. Besides the zillion 286 real-mode memory models, how about the CDC 60-bit word / 18-bit pointer mainframes? How about the Cray series? Never mind normal ILP64, LP64, LLP64. (I always thought Microsoft was pretentious with LLP64, it should have been P64.) I can certainly imagine a committee trying to cover all bases...
size_t vs. uintptr_t
In addition to other good answers:
size_t is defined in <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <uchar.h>, <wchar.h>. It is at least 16-bit.
uintptr_t is defined in <stdint.h>. It is optional. A compliant library might not define it, likely because there is no integer type wide enough to round-trip void * to uintptr_t and back to void *.
Both are unsigned integer types.
Note: the optional companion intptr_t is a signed integer type.
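Because uintptr_t is optional, portable code can test for it before using it. The sketch below relies on the rule that UINTPTR_MAX is defined only when the type itself is provided; the function name roundtrip_ok is made up for this example.

```c
#include <stdint.h>

/* Returns 1 if p survives void * -> uintptr_t -> void *, 0 if it does not,
   and -1 if this implementation does not provide uintptr_t at all. */
int roundtrip_ok(void *p)
{
#ifdef UINTPTR_MAX
    uintptr_t bits = (uintptr_t)p;   /* pointer stored as an integer */
    void *back = (void *)bits;       /* and converted back again */
    return back == p;
#else
    (void)p;                         /* no suitable integer type here */
    return -1;
#endif
}
```

On mainstream hosted compilers I would expect the first branch to be compiled, but that is an assumption about the platform, not a guarantee from the standard.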
int main(){
int a[4]={0,1,5,3};
int a0 = a[0];
int a1 = *(a+1);
int a2 = *(2+a);
int a3 = 3[a];
return a2;
}
Implying that intptr_t must always substitute for size_t and vice versa.

Smart typedefs

I've always used typedef in embedded programming to avoid common mistakes:
int8_t - 8 bit signed integer
int16_t - 16 bit signed integer
int32_t - 32 bit signed integer
uint8_t - 8 bit unsigned integer
uint16_t - 16 bit unsigned integer
uint32_t - 32 bit unsigned integer
The recent Embedded Muse (issue 177, not on the website yet) introduced me to the idea that it's useful to have some performance-specific typedefs. This standard suggests having typedefs that indicate you want the fastest type that has a minimum size.
For instance, one might declare a variable using int_fast16_t, but it would actually be implemented as an int32_t on a 32 bit processor, or int64_t on a 64 bit processor as those would be the fastest types of at least 16 bits on those platforms. On an 8 bit processor it would be int16_t to meet the minimum size requirement.
Having never seen this usage before I wanted to know
Have you seen this in any projects, embedded or otherwise?
Any possible reasons to avoid this sort of optimization in typedefs?
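For anyone who wants to see what their own toolchain picks, a small sketch along these lines might help. The two helper names are invented for this example, and the numbers are implementation-specific, so no particular result for the fast type should be assumed.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Storage width in bits of the exact-width type: 16 wherever int16_t exists. */
size_t exact16_bits(void)
{
    return sizeof(int16_t) * CHAR_BIT;
}

/* Storage width in bits of the "fast" type: at least 16, possibly more,
   depending on what the implementation considers fastest. */
size_t fast16_bits(void)
{
    return sizeof(int_fast16_t) * CHAR_BIT;
}
```

Printing both values with printf("%zu %zu\n", exact16_bits(), fast16_bits()) on each target shows whether the fast variant is actually wider there.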
"For instance, one might declare a variable using int_fast16_t, but it would actually be implemented as an int32_t on a 32 bit processor, or int64_t on a 64 bit processor as those would be the fastest types of at least 16 bits on those platforms"
That's what int is for, isn't it? Are you likely to encounter an 8-bit CPU any time soon, where that wouldn't suffice?
How many unique datatypes are you able to remember?
Does it provide so much additional benefit that it's worth effectively doubling the number of types to consider whenever I create a simple integer variable?
I'm having a hard time even imagining the possibility that it might be used consistently.
Someone is going to write a function which returns an int_fast16_t, and then someone else is going to come along and store that variable into an int16_t.
Which means that in the obscure case where the fast variants are actually beneficial, it may change the behavior of your code. It may even cause compiler errors or warnings.
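One way to guard against that kind of silent narrowing is to check the range before storing. This is only a sketch; narrow_to_int16 is a hypothetical helper, not something from stdint.h.

```c
#include <stdint.h>

/* Hypothetical helper: stores v in *out and returns 1 if v is representable
   as int16_t; returns 0 and leaves *out untouched otherwise.
   On platforms where int_fast16_t is itself 16 bits wide the check can
   never fail, so the behavior only differs where the fast type is wider. */
int narrow_to_int16(int_fast16_t v, int16_t *out)
{
    if (v < INT16_MIN || v > INT16_MAX)
        return 0;
    *out = (int16_t)v;
    return 1;
}
```

Without such a check, converting an out-of-range value to int16_t gives an implementation-defined result, which is exactly the platform-dependent behavior change described above.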
Check out stdint.h from C99.
The main reason I would avoid this typedef is that it allows the type to lie to the user. Take int16_t vs int_fast16_t. Both type names encode the size of the value into the name. This is not an uncommon practice in C/C++. I personally use the size specific typedefs to avoid confusion for myself and other people reading my code. Much of our code has to run on both 32 and 64 bit platforms and many people don't know the various sizing rules between the platforms. Types like int32_t eliminate the ambiguity.
If I had not read the 4th paragraph of your question and instead just saw the type name, I would have assumed it was some scenario specific way of having a fast 16 bit value. And I obviously would have been wrong :(. For me it would violate the "don't surprise people" rule of programming.
Perhaps if it had another distinguishing verb, letter, acronym in the name it would be less likely to confuse users. Maybe int_fast16min_t ?
When I look at int_fast16_t without knowing the native width of the CPU the code will run on, things can get complicated, for example with the ~ operator.
int_fast16_t i = 10;
int16_t j = 10;
if (~i != ~j) {
// scary !!!
}
Somehow, I would like to willfully use 32 bit or 64 bit based on the native width of the processor.
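As far as I can tell, the width dependence is easiest to observe with the unsigned variants, once the complemented value is stored back into a variable. The sketch below uses two made-up helpers to show this; only the low 16 bits of the fast result are the same on every platform.

```c
#include <stdint.h>

/* Complement kept in the exact-width type: the result always has 16 bits. */
uint16_t not_exact(uint16_t x)
{
    return (uint16_t)~x;
}

/* Complement kept in the fast type: the result has as many bits as the
   platform chose for uint_fast16_t, so the upper bits may be set too. */
uint_fast16_t not_fast(uint_fast16_t x)
{
    return (uint_fast16_t)~x;
}
```

Here not_exact(10) is 0xFFF5 everywhere, while not_fast(10) is 0xFFF5 only where uint_fast16_t is 16 bits wide; on wider implementations the extra high bits are set as well, so masking with 0xFFFF is needed before comparing the two.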
I'm actually not much of a fan of this sort of thing.
I've seen this done many times (in fact, we even have these typedefs at my current place of employment)... For the most part, I doubt their true usefulness... It strikes me as change for change's sake... (and yes, I know the sizes of some of the built-ins can vary)...
I commonly use size_t; it happens to be the fastest address size, a habit I picked up in embedded work. It never caused any issues or confusion in embedded circles, but it did start causing me problems when I began working on 64-bit systems.

Resources