Combining a pointer and an integer in a union - c

I have defined a union as follows:
union {
uintptr_t refcount;
struct slab_header *page;
} u;
The page pointer is guaranteed to be aligned on a page boundary (most probably 4096), and is never going to be NULL. This implies that the lowest possible address is going to be 4096.
refcount will be within 0 .. 4095.
Upon the creation of the enclosing struct, I can either have u.refcount = 0 or u.page = mmap(...).
The code around this union is going to be something like this:
if (u.refcount < 4096) {
/* work with refcount, possibly increment it */
} else {
/* work with page, possibly dereference it */
}
Is this always guaranteed to work on a fully POSIX-compliant implementation? Is it ever possible that uintptr_t and struct slab_header * have different representations, so that, for example, when u.page == 8192, u.refcount < 4096 yields true?

I don't think that it's "always guaranteed to work", because:
uintptr_t is optional (7.18.1.4).
A void * can be converted to uintptr_t and back (7.18.1.4). It's not guaranteed that the same holds for struct slab_header *: a void * has the same representation and alignment requirements as a pointer to a character type, but pointers to structures needn't have the same representation or alignment (6.2.5 27).
Even if that were not the case, nothing guarantees sizeof(uintptr_t) == sizeof(void *); it could obviously be larger and still satisfy the requirement of being convertible to void * in the typical case of homogeneous pointers.
Finally, even if they have the same size and are convertible, it's possible the representation of the pointer values differs in a strange way from that of unsigned integers. The representation of unsigned integers is relatively constrained (6.2.6.2 1), but no such constraints exist on pointers.
Therefore, I'd conclude the best way would be to have a common initial member that tells the state.
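A minimal sketch of what that could look like, with an explicit tag instead of the magic < 4096 test (the enclosing struct and helper names are made up for illustration):
#include <stdint.h>

struct slab_header;   /* as in the question */

struct object {
    enum { HOLDS_REFCOUNT, HOLDS_PAGE } kind;   /* explicit state, no magic ranges */
    union {
        uintptr_t refcount;
        struct slab_header *page;
    } u;
};

/* Readers branch on the tag instead of comparing u.refcount against 4096. */
static int holds_refcount(const struct object *o) {
    return o->kind == HOLDS_REFCOUNT;
}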

I'm going to answer a different question -- "is this a good idea?" The most significant worry I see with your code is aliasing issues. I would be unsurprised (in fact, I would mildly expect it) if it were possible to write a piece of code that had the effect of:
Write something to u.refcount
Write something to u.page
Read u.refcount
Discover the value you read is the same as what you first wrote
You may scoff, but I've seen this happen -- and if you don't appreciate the danger, it will take a very long time to debug.
You may be safe with this particular union; I'll leave it to someone with more experience with this sort of thing (and/or a copy of the C standard handy) to make that judgment. But I want to emphasize that this is something important to worry about!!!
There is another cost. Using the union in combination with a "magic" test to discover what is stored in it -- especially one using system-specific facts -- is going to make your code harder to read, debug, and maintain. You can take steps to mitigate that fact, but it will always be an issue.
And, of course, the cost of your code simply not working when someone tries to use it on a weird machine.
The right solution is probably to structure your code in a way so that the only code that cares about the data layout is a couple tiny inlined access routines, so that you can easily swap how to store things. Organize your code to use a compile-time flag to choose which one!
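For instance, a rough sketch of that structure (the OBJ_PACKED_UNION flag and the accessor names are invented here; the packed branch relies on the very trick from the question, the fallback does not):
#include <stdint.h>

struct slab_header;

#if OBJ_PACKED_UNION   /* compile-time flag: use the packed pointer/integer trick */
struct object {
    union { uintptr_t refcount; struct slab_header *page; } u;
};
static inline int obj_has_page(const struct object *o) { return o->u.refcount >= 4096; }
#else                  /* portable fallback: spend one extra member on an explicit tag */
struct object {
    int has_page;
    union { uintptr_t refcount; struct slab_header *page; } u;
};
static inline int obj_has_page(const struct object *o) { return o->has_page; }
#endif

/* The rest of the code only ever calls obj_has_page(), never peeks into u directly. */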
Now that I've said all that, the question I really want to ask (and want to get you in the habit of considering): "is it worth it?" You're making sacrifices in your time, code readability, ease of programming, and portability. What are you gaining?
A lot of people forget to ask this question, and they write complex, buggy code when simple, easy-to-write-and-read code is just as good.
If you've discovered this change is going to double the performance of a time-intensive routine, then it's probably worth dealing with this hack. But if you're investigating this simply because it's a clever trick to save 8 bytes, then you should consider it only as an intellectual exercise.

I think that <stdint.h> and the C99 standard guarantee that uintptr_t and void * can be converted back and forth without loss, though not that they have the same size.
Being page aligned for a pointer is an implementation detail.

There shouldn't be any problem with that code. But just in case, you can add a check when your program initializes:
if (sizeof(uintptr_t) != sizeof(struct slab_header *)) {
    fprintf(stderr, "uintptr_t and struct slab_header * differ in size\n");
    abort();
}
But something seems off (or not clear): do you want the "refcount" to be 0 whenever you have a page, and non-zero when the page is NULL?

Is this always guaranteed to work?
No. You can't read a member of a union different from the last one written. And optimizers do take that into account, so this isn't a purely theoretical problem.
Is it ever possible that uintptr_t and struct slab_header * have different representations,
so that, for example, when u.page == 8192, u.refcount < 4096 yields true?
It is also theoretically possible. I can't think of a current implementation where it occurs.

Related

Comparison uint8_t vs uint16_t while declaring a counter

Assuming I have a counter which counts from 0 to 100, is there an advantage to declaring the counter variable as uint16_t instead of uint8_t?
Obviously if I use uint8_t I could save some space. On a processor with a natural word size of 16 bits, access times would be the same for both, I guess. I couldn't think why I would use a uint16_t if uint8_t can cover the range.
Using a wider type than necessary can allow the compiler to avoid having to mask the higher bits.
Suppose you were working on a 16 bit architecture; then using uint16_t could be more efficient. However, if you used uint16_t instead of uint8_t on a 32 bit architecture, you would still have the mask instructions, just masking a different number of bits.
The most efficient type to use in a cross-platform portable way is just plain int or unsigned int, which will always be the correct type to avoid the need for masking instructions, and will always be able to hold numbers up to 100.
If you are in a MISRA or similar regulated environment that forbids the use of native types, then the correct standard-compliant type to use is uint_fast8_t. This guarantees to be the fastest unsigned integer type that has at least 8 bits.
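For example, a trivial sketch (the function and N are made up for illustration):
#include <stdint.h>

#define N 100

void clear_buffer(uint8_t buf[N])
{
    /* uint_fast8_t: at least 8 bits, but whatever width the target handles fastest */
    for (uint_fast8_t i = 0; i < N; i++)
        buf[i] = 0;
}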
However, all of this is nonsense really. Your primary goal in writing code should be to make it readable, not to make it as fast as possible. Penny-pinching instructions like this makes code convoluted and more likely to have bugs. Also because it is harder to read, the bugs are less likely to be found during code review.
You should only try to optimize like this once the code is finished and you have tested it and found the particular part which is the bottleneck. Masking a loop counter is very unlikely to be the bottleneck in any real code.
Obviously if I use uint8_t I could save some space.
Actually, that's not necessarily obvious! A loop index variable is likely to end up in a register, and if it does there's no memory to be saved. Also, since the definition of the C language says that much arithmetic takes place using type int, it's possible that using a variable smaller than int might actually end up costing you space in terms of extra code emitted by the compiler to convert back and forth between int and your smaller variable. So while it could save you some space, it's not at all guaranteed that it will — and, in any case, the actual savings are going to be almost imperceptibly small in the grand scheme of things.
If you have an array of some number of integers in the range 0-100, using uint8_t is a fine idea if you want to save space. For an individual variable, on the other hand, the arguments are pretty different.
In general, I'd say that there are two reasons not to use type uint8_t (or, equivalently, char or unsigned char) as a loop index:
It's not going to save much data space (if at all), and it might cost code size and/or speed.
If the loop runs over exactly 256 elements (yours didn't, but I'm speaking more generally here), you may have introduced a bug (which you'll discover soon enough): your loop may run forever.
The interviewer was probably expecting #1 as an answer. It's not a guaranteed answer — under plenty of circumstances, using the smaller type won't cost you anything, and evidently there are microprocessors where it can actually save something — but as a general rule, I agree that using an 8-bit type as a loop index is, well, silly. And whether or not you agree, it's certainly an issue to be aware of, so I think it's a fair interview question.
See also this question, which discusses the same sorts of issues.
The interview question doesn't make much sense from a platform-generic point of view. If we look at code such as this:
for (uint8_t i = 0; i < n; i++)
    array[i] = x;
Then the expression i<n will get carried out on type int or larger because of implicit promotion. Though the compiler may optimize it to use a smaller type if it doesn't affect the result.
As for array[i], the compiler is likely to use a type corresponding to whatever address size the system is using.
What the interviewer was fishing for is likely that uint32_t on a 32-bitter tends to generate faster code in some situations. For those cases you can use uint_fast8_t, but more likely the compiler will perform the optimization no matter what.
The only optimization uint8_t blocks the compiler from doing is allocating a variable larger than 8 bits on the stack. It doesn't, however, block the compiler from optimizing the variable out entirely and using a register instead, such as storing it in an index register with the same width as the address bus.
Example with gcc x86_64: https://godbolt.org/z/vYscf3KW9. The disassembly is pretty painful to read, but the compiler just picked CPU registers to store anything regardless of the type of i, giving identical machine code between uint8_t and uint16_t. I would have been surprised if it didn't.
On a processor with a natural word size of 16 bits, access times would be the same for both, I guess.
Yes, this is true for all mainstream 16-bitters. Some might even manage faster code if given 8 bits instead of 16. Some exotic systems like DSPs exist, but in the case of, say, a DSP where 1 byte = 16 bits, the compiler doesn't even provide you with uint8_t to begin with - it is an optional type. One generally doesn't bother with portability to wildly exotic systems, since doing so is a waste of everyone's time and money.
The correct answer: it is senseless to do manual optimization without a specific system in mind. uint8_t is perfectly fine to use for generic, portable code.

What is the point behind unions in C?

I'm going through O'Reilly's Practical C Programming book, and having read the K&R book on the C programming language, I am really having trouble grasping the concept behind unions.
They take the size of the largest data type that makes them up...and the most recently assigned one overwrites the rest...but why not just use / free memory as needed?
The book mentions that it's used in communication, where you need to set flags of the same size; and on a googled website, that it can eliminate odd-sized memory chunks...but is it of any use in a modern, non-embedded memory space?
Is there something crafty you can do with it and CPU registers? Is it simply a hold over from an earlier era of programming? Or does it, like the infamous goto, still have some powerful use (possibly in tight memory spaces) that makes it worth keeping around?
Well, you almost answered your question: Memory.
Back in the day, memory was rather scarce, and even saving a few kbytes was useful.
But even today there are scenarios where unions are useful. For example, if you'd like to implement some kind of variant datatype, the best way to do this is using a union.
This doesn't sound like much, but let's just assume you want to use a variable either storing a 4 character string (like an ID) or a 4 byte number (which could be some hash or indeed just a number).
If you use a classic struct, this would be 8 bytes long (at least; if you're unlucky there are padding bytes as well). Using a union it's only 4 bytes. So you're saving 50% memory, which isn't a lot for one instance, but imagine having a million of these.
While you can achieve similar things by casting or subclassing, a union is still the easiest way to do this.
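That example could look roughly like this (a sketch; the member names are made up):
#include <stdint.h>

/* Either a 4-character ID or a 4-byte number, overlapping in the same 4 bytes. */
union id_or_hash {
    char     id[4];     /* e.g. "ab12", not necessarily NUL-terminated */
    uint32_t hash;      /* or a 32-bit hash / plain number */
};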
One use of unions is having two variables occupy the same space, with a second variable in the enclosing struct deciding which data type you want to read it as.
e.g. you could have a boolean 'isDouble' and a union 'doubleOrLong' which has both a double and a long. If isDouble == true, interpret the union as a double; else interpret it as a long.
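In code, that idea looks roughly like this (member names follow the wording above; the helper function is only illustrative):
#include <stdbool.h>

struct number {
    bool isDouble;                /* decides how to read the union below */
    union {
        double d;
        long   l;
    } doubleOrLong;
};

double read_as_double(const struct number *n)
{
    /* read the member that was last written, based on the flag */
    return n->isDouble ? n->doubleOrLong.d : (double)n->doubleOrLong.l;
}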
Another use of unions is accessing data types in different representations. For instance, if you know how a double is laid out in memory, you could put a double in a union, access it as a different data type like a long, directly access its bits, its mantissa, its sign, its exponent, whatever, and do some direct manipulation with it.
You don't really need this nowadays since memory is so cheap, but in embedded systems it has its uses.
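As a sketch of that kind of bit inspection (assuming 64-bit IEEE-754 doubles and an implementation where reading the other union member behaves as expected, which C99 TC3 and later allow):
#include <stdint.h>
#include <stdio.h>

union double_bits {
    double   d;
    uint64_t u;
};

int main(void)
{
    union double_bits b = { .d = -1.5 };
    uint64_t sign     =  b.u >> 63;                    /* 1 bit   */
    uint64_t exponent = (b.u >> 52) & 0x7FF;           /* 11 bits */
    uint64_t mantissa =  b.u & 0xFFFFFFFFFFFFFULL;     /* 52 bits */
    printf("sign=%llu exponent=%llu mantissa=%llx\n",
           (unsigned long long)sign,
           (unsigned long long)exponent,
           (unsigned long long)mantissa);
    return 0;
}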
The Windows API makes use of unions quite a lot. LARGE_INTEGER is an example of such a usage. Basically, if the compiler supports 64-bit integers, use the QuadPart member; otherwise, set the low DWORD and the high DWORD manually.
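For reference, its declaration is roughly the following (simplified; on a real build DWORD, LONG and LONGLONG come from the Windows headers rather than the stand-in typedefs sketched here):
#include <stdint.h>

typedef uint32_t DWORD;     /* stand-ins for the Windows typedefs */
typedef int32_t  LONG;
typedef int64_t  LONGLONG;

typedef union _LARGE_INTEGER {
    struct {
        DWORD LowPart;
        LONG  HighPart;
    } u;
    LONGLONG QuadPart;
} LARGE_INTEGER;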
It's not really a holdover, as the C language was created in 1972, when memory was a real concern.
You could make the argument that in a modern, non-embedded space, you might not want to use C as a programming language to begin with. If you've chosen C for your implementation, you're looking to harness the benefits of C: it's efficient and close to the metal, which results in tight, fast binaries.
As such, when choosing to use C, you'd still want to take advantage of its benefits, which include memory-space efficiency. For that, the union works very well, allowing you some degree of type safety while enforcing the smallest memory footprint available.
One place where I have seen it used is in the Doom 3/idTech 4 Fast Inverse Square Root implementation.
For those unfamiliar with this algorithm, it essentially requires treating a floating point number as an integer. The old Quake (and earlier) version of the code does this by the following:
float y = 2.0f;
// treat the bits of y as an integer
long i = * ( long * ) &y;
// do some stuff with i
// treat the bits of i as a float
y = * ( float * ) &i;
original source on GitHub
This code takes the address of a floating point number y, casts it to a pointer to a long (i.e., a 32 bit integer in Quake days), and dereferences it into i. Then it does some incredibly bizarre bit-twiddling stuff, and the reverse.
There are two disadvantages of doing it this way. One is that the convoluted address-of, cast, dereference process forces the value of y to be read from memory, rather than from a register1, and ditto on the way back. On Quake-era computers, however, floating point and integer registers were completely separate so you pretty much had to push to memory and back to deal with this restriction.
The second is that, at least in C++, doing such casting is deeply frowned upon, even when doing what amounts to voodoo such as this function does. I'm sure there are more compelling arguments, however I'm not sure what they are :)
So, in Doom 3, id included the following bit in their new implementation (which uses a different set of bit twiddling, but a similar idea):
union _flint {
dword i;
float f;
};
...
union _flint seed;
seed.i = /* look up some tables to get this */;
double r = seed.f; // <- access the bits of seed.i as a floating point number
original source on GitHub
Theoretically, on an SSE2 machine, this can be accessed through a single register; I'm not sure in practice whether any compiler would do this. It's still somewhat cleaner code in my opinion than the casting games in the earlier Quake version.
1 - ignoring "sufficiently advanced compiler" arguments

Reasons to use (or not) stdint

I already know that stdint is used when you need specific variable sizes for portability between platforms. I don't really have such an issue for now, but what are the cons and pros of using it besides the already mentioned fact above?
Looking for this on stackoverflow and others sites, I found 2 links that treats about the theme:
codealias.info - this one talks about the portability of the stdint.
stackoverflow - this one is more specific about uint8_t.
These two links are great, especially if one is looking to know more about the main reason for this header - portability. But for me, what I like most about it is that I think uint8_t is cleaner than unsigned char (for storing an RGB channel value, for example), int32_t looks more meaningful than simply int, etc.
So, my question is, exactly what are the cons and pros of using stdint besides the portability? Should I use it just in some specific parts of my code, or everywhere? If everywhere, how can I use functions like atoi(), strtok(), etc. with it?
Thanks!
Pros
Using well-defined types makes the code far easier and safer to port, as you won't get any surprises when for example one machine interprets int as 16-bit and another as 32-bit. With stdint.h, what you type is what you get.
Using int etc also makes it hard to detect dangerous type promotions.
Another advantage is that by using int8_t instead of char, you know that you always get a signed 8 bit variable. char can be signed or unsigned, it is implementation-defined behavior and varies between compilers. Therefore, the default char is plain dangerous to use in code that should be portable.
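A tiny illustration of the difference (the exact value of c is implementation-defined, which is the point):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    char   c = (char)0x80;  /* typically -128 where char is signed, 128 where it is unsigned */
    int8_t s = -128;        /* always signed, always 8 bits */
    printf("%d %d\n", (int)c, (int)s);
    return 0;
}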
If you want to give the compiler a hint that a variable should be optimized, you can use the uint_fastN_t types, which tell the compiler to use the fastest possible integer type that is at least N bits wide. Most of the time this doesn't matter; the compiler is smart enough to make optimizations on type sizes no matter what you have typed in. Between sequence points, the compiler can implicitly use a different type than the one specified, as long as it doesn't affect the result.
Cons
None.
Reference: MISRA-C:2004 rule 6.3: "typedefs that indicate size and signedness shall be used in place of the basic types".
EDIT : Removed incorrect example.
The only reason to use uint8_t rather than unsigned char (aside from aesthetic preference) is if you want to document that your program requires char to be exactly 8 bits. uint8_t exists if and only if CHAR_BIT==8, per the requirements of the C standard.
The rest of the intX_t and uintX_t types are useful in the following situations:
reading/writing disk/network (but then you also have to use endian conversion functions)
when you want unsigned wraparound behavior at an exact cutoff (but this can be done more portably with the & operator; see the sketch after this list).
when you're controlling the exact layout of a struct because you need to ensure no padding exists (e.g. for memcmp or hashing purposes).
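As a sketch of the wraparound point from the list above (the cutoff and function names are arbitrary):
#include <stdint.h>

/* Wrap a sequence counter at exactly 2^16, two equivalent ways. */
uint16_t next_seq_exact(uint16_t seq)
{
    return (uint16_t)(seq + 1);     /* uint16_t wraps at exactly 65536 */
}

unsigned next_seq_masked(unsigned seq)
{
    return (seq + 1) & 0xFFFFu;     /* same cutoff via masking, for any underlying width */
}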
On the other hand, the uint_least8_t, etc. types are useful anywhere that you want to avoid using wastefully large or slow types but need to ensure that you can store values of a certain magnitude. For example, while long long is at least 64 bits, it might be 128-bit on some machines, and using it when what you need is just a type that can store 64 bit numbers would be very wasteful on such machines. int_least64_t solves the problem.
I would avoid using the [u]int_fastX_t types entirely since they've sometimes changed on a given machine (breaking the ABI) and since the definitions are usually wrong. For instance, on x86_64, the 64-bit integer type is considered the "fast" one for 16-, 32-, and 64-bit values, but while addition, subtraction, and multiplication are exactly the same speed whether you use 32-bit or 64-bit values, division is almost surely slower with larger-than-necessary types, and even if they were the same speed, you're using twice the memory for no benefit.
Finally, note that the arguments some answers have made about the inefficiency of using int32_t for a counter when it's not the native integer size are technically mostly correct, but it's irrelevant to correct code. Unless you're counting some small number of things where the maximum count is under your control, or some external (not in your program's memory) thing where the count might be astronomical, the correct type for a count is almost always size_t. This is why all the standard C functions use size_t for counts. Don't consider using anything else unless you have a very good reason.
Cons
The primary reason the C language does not specify the size of int or long, etc. is computational efficiency. Each architecture has a natural, most-efficient size, and the designers specifically empowered and intended the compiler implementor to use the natural native data size for speed and code-size efficiency.
In years past, communication with other machines was not a primary concern—most programs were local to the machine—so the predictability of each data type's size was of little concern.
Insisting that a particular architecture use a particular size int to count with is a really bad idea, even though it would seem to make other things easier.
In a way, thanks to XML and its brethren, data type size again is no longer much of a concern. Shipping machine-specific binary structures from machine to machine is again the exception rather than the rule.
I use stdint types for one reason only, when the data I hold in memory shall go on disk/network/descriptor in binary form. You only have to fight the little-endian/big-endian issue but that's relatively easy to overcome.
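For example, a sketch of the sort of helper that settles the byte-order question when writing a fixed-width value to disk or the network (the function name is just illustrative):
#include <stdint.h>

/* Store a 32-bit value in big-endian order, independent of host endianness. */
static void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}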
The obvious reason not to use stdint is when the code is size-independent, in maths terms everything that works over the rational integers. It would produce ugly code duplicates if you provided a uint*_t version of, say, qsort() for every expansion of *.
I use my own types in that case, derived from size_t when I'm lazy or the largest supported unsigned integer on the platform when I'm not.
Edit, because I ran into this issue earlier:
I think it's noteworthy that at least uint8_t, uint32_t and uint64_t are broken in Solaris 2.5.1.
So for maximum portability I still suggest avoiding stdint.h (at least for the next few years).

Are there well-known "profiles" of the C standard?

I write C code that makes certain assumptions about the implementation, such as:
char is 8 bits.
signed integral types are two's complement.
>> on signed integers sign-extends.
integer division rounds negative quotients towards zero.
double is IEEE-754 doubles and can be type-punned to and from uint64_t with the expected result.
comparisons involving NaN always evaluate to false.
a null pointer is all zero bits.
all data pointers have the same representation, and can be converted to size_t and back again without information loss.
pointer arithmetic on char* is the same as ordinary arithmetic on size_t.
function pointers can be cast to void* and back again without information loss.
Now, all of these are things that the C standard doesn't guarantee, so strictly speaking my code is non-portable. However, they happen to be true on the architectures and ABIs I'm currently targeting, and after careful consideration I've decided that the risk they will fail to hold on some architecture that I'll need to target in the future is acceptably low compared to the pragmatic benefits I derive from making the assumptions now.
The question is: how do I best document this decision? Many of my assumptions are made by practically everyone (non-octet chars? or sign-magnitude integers? on a future, commercially successful, architecture?). Others are more arguable -- the most risky probably being the one about function pointers. But if I just list everything I assume beyond what the standard gives me, the reader's eyes are just going to glaze over, and he may not notice the ones that actually matter.
So, is there some well-known set of assumptions about being a "somewhat orthodox" architecture that I can incorporate by reference, and then only document explicitly where I go beyond even that? (Effectively such a "profile" would define a new language that is a superset of C, but it might not acknowledge that in so many words -- and it may not be a pragmatically useful way to think of it either).
Clarification: I'm looking for a shorthand way to document my choices, not for a way to test automatically whether a given compiler matches my expectations. The latter is obviously useful too, but does not solve everything. For example, if a business partner contacts us saying, "we're making a device based on Google's new G2015 chip; will your software run on it?" -- then it would be nice to be able to answer "we haven't worked with that arch yet, but it shouldn't be a problem if it has a C compiler that satisfies such-and-such".
Clarify even more since somebody has voted to close as "not constructive": I'm not looking for discussion here, just for pointers to actual, existing, formal documents that can simplify my documentation by being incorporated by reference.
I would introduce a STATIC_ASSERT macro and put all your assumptions in such asserts.
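A minimal sketch of that idea, covering a few of the assumptions from the question (the macro name and the pre-C11 negative-array-size fallback are just one common way to do it; the size checks are proxies for the punning assumptions):
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define STATIC_ASSERT(cond, name) _Static_assert(cond, #name)
#else
#define STATIC_ASSERT(cond, name) typedef char static_assert_##name[(cond) ? 1 : -1]
#endif

STATIC_ASSERT(CHAR_BIT == 8, char_is_8_bits);
STATIC_ASSERT((-1 & 3) == 3, twos_complement);
STATIC_ASSERT(sizeof(double) == sizeof(uint64_t), double_punnable_to_uint64);
STATIC_ASSERT(sizeof(void *) == sizeof(size_t), data_pointer_fits_size_t);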
Unfortunately, not only is there a lack of standards for a dialect of C that combines the extensions which emerged as de facto standards during the 1990s (two's-complement, universally-ranked pointers, etc.), but compiler trends are moving in the opposite direction. Given the following requirements for a function:
/* Accept int parameters x, y, z:
 * Return 0 if x-y is computable as "int" and is less than z
 * Return 1 if x-y is computable as "int" and is not less than z
 * Return 0 or 1 if x-y is not computable */
The vast majority of compilers in the 1990s would have allowed:
int diffCompare(int x, int y, int z)
{ return (x-y) >= z; }
On some platforms, in cases where the difference x-y was not computable as int, it would be faster to compute a "wrapped" two's-complement value of x-y and compare that, while on others it would be faster to perform the calculation using a type larger than int and compare that. By the late 1990s, however, nearly every C compiler would implement the above code using whichever of those approaches was more efficient on its hardware platform.
Since 2010, however, compiler writers seem to have taken the attitude that if computations overflow, compilers shouldn't perform the calculations in whatever fashion is normal for their platform and let what happens happen, nor should they recognizably trap (which would break some code, but could prevent certain kinds of errant program behavior); instead, they should treat overflows as an excuse to negate the laws of time and causality. Consequently, even if a programmer would have been perfectly happy with any behavior a 1990s compiler would have produced, the programmer must replace the code with something like:
{ return ((long)x-y) >= z; }
which would greatly reduce efficiency on many platforms, or
{ return x+(INT_MAX+1U)-y >= z+(INT_MAX+1U); }
which requires specifying a bunch of calculations the programmer doesn't actually want in the hopes that the optimizer will omit them (using signed comparison to make them unnecessary), and would reduce efficiency on a number of platforms (especially DSPs) where the form using (long) would have been more efficient.
It would be helpful if there were standard profiles which would allow programmers to avoid the need for nasty horrible kludges like the above using INT_MAX+1U, but if trends continue they will become more and more necessary.
Most compiler documentation includes a section that describes the specific behavior of implementation-dependent features. Can you point to that section of the gcc or msvc docs to describe your assumptions?
You can write a header file "document.h" where you collect all your assumptions.
Then, in every file where you know non-standard assumptions are made, you can #include that file.
Perhaps "document.h" would not have real sentences at all, but only commented text and some macros.
// [T] DOCUMENT.H
//
#ifndef DOCUMENT_H
#define DOCUMENT_H
// [S] 1. Basic assumptions.
//
// If this file is included in a compilation unit it means that
// the following assumptions are made:
// [1] A char has 8 bits.
// [#]
#define MY_CHARBITSIZE 8
// [2] IEEE 754 doubles are adopted for type: double.
// ........
// [S] 2. Detailed information
//
#endif
The tags in brackets: [T] [S] [#] [1] [2] stand for:
* [T]: Document Title
* [S]: Section
* [#]: Print the following (non-commented) lines as a code-block.
* [1], [2]: Numbered items of a list.
Now, the idea here is to use the file "document.h" in a different way:
To parse the file in order to convert the comments in "document.h" to some printable document, or some basic HTML.
Thus, the tags [T] [S] [#] etc. are intended to be interpreted by a parser that converts each comment into an HTML line of text (for example), and generates <h1></h1>, <b></b> (or whatever you want) when a tag appears.
If you keep the parser as a simple and small program, this can give you a short hand to handle this kind of documentation.

size_t vs. uintptr_t

The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type. I've read on some sites that I found on the Googles that this is legal and/or should always work:
void *v = malloc(10);
size_t s = (size_t) v;
So then in C99, the standard introduced the intptr_t and uintptr_t types, which are signed and unsigned types guaranteed to be able to hold pointers:
uintptr_t p = (uintptr_t) v;
So what is the difference between using size_t and uintptr_t? Both are unsigned, and both should be able to hold any pointer type, so they seem functionally identical. Is there any real compelling reason to use uintptr_t (or better yet, a void *) rather than a size_t, other than clarity? In an opaque structure, where the field will be handled only by internal functions, is there any reason not to do this?
By the same token, ptrdiff_t has been a signed type capable of holding pointer differences, and therefore capable of holding most any pointer, so how is it distinct from intptr_t?
Aren't all of these types basically serving trivially different versions of the same function? If not, why? What can't I do with one of them that I can't do with another? If so, why did C99 add two essentially superfluous types to the language?
I'm willing to disregard function pointers, as they don't apply to the current problem, but feel free to mention them, as I have a sneaking suspicion they will be central to the "correct" answer.
size_t is a type that can hold any array index. This means that,
logically, size_t should be able to
hold any pointer type
Not necessarily! Hark back to the days of segmented 16-bit architectures for example: an array might be limited to a single segment (so a 16-bit size_t would do) BUT you could have multiple segments (so a 32-bit intptr_t type would be needed to pick the segment as well as the offset within it). I know these things sound weird in these days of uniformly addressable unsegmented architectures, but the standard MUST cater for a wider variety than "what's normal in 2009", you know!-)
Regarding your statement:
"The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type."
This is actually a fallacy (a misconception resulting from incorrect reasoning)(a). You may think the latter follows from the former but that's not actually the case.
Pointers and array indexes are not the same thing. It's quite plausible to envisage a conforming implementation that limits arrays to 65536 elements but allows pointers to address any value into a massive 128-bit address space.
C99 states that the upper limit of a size_t variable is defined by SIZE_MAX and this can be as low as 65535 (see C99 TR3, 7.18.3, unchanged in C11). Pointers would be fairly limited if they were restricted to this range in modern systems.
In practice, you'll probably find that your assumption holds, but that's not because the standard guarantees it. Because it actually doesn't guarantee it.
(a) This is not some form of personal attack by the way, just stating why your statements are erroneous in the context of critical thinking. For example, the following reasoning is also invalid:
All puppies are cute. This thing is cute. Therefore this thing must be a puppy.
The cuteness or otherwise of puppies has no bearing here; all I'm stating is that the two facts do not lead to the conclusion, because the first two sentences allow for the existence of cute things that are not puppies.
This is similar to your first statement not necessarily mandating the second.
I'll let all the other answers stand for themselves regarding the reasoning with segment limitations, exotic architectures, and so on.
Isn't the simple difference in names reason enough to use the proper type for the proper thing?
If you're storing a size, use size_t. If you're storing a pointer, use intptr_t. A person reading your code will instantly know that "aha, this is a size of something, probably in bytes", and "oh, here's a pointer value being stored as an integer, for some reason".
Otherwise, you could just use unsigned long (or, in these here modern times, unsigned long long) for everything. Size is not everything, type names carry meaning which is useful since it helps describe the program.
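A small example of the pointer case, which is the one round trip the standard actually blesses (when uintptr_t exists):
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    void *v = malloc(10);
    uintptr_t p = (uintptr_t)v;   /* store the pointer as an integer */
    void *back = (void *)p;       /* guaranteed to compare equal to v */
    assert(back == v);
    free(back);
    return 0;
}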
It's possible that the size of the largest array is smaller than a pointer. Think of segmented architectures - pointers may be 32-bits, but a single segment may be able to address only 64KB (for example the old real-mode 8086 architecture).
While these aren't commonly in use in desktop machines anymore, the C standard is intended to support even small, specialized architectures. There are still embedded systems being developed with 8 or 16 bit CPUs for example.
I would imagine (and this goes for all type names) that it better conveys your intentions in code.
For example, even though unsigned short and wchar_t are the same size on Windows (I think), using wchar_t instead of unsigned short shows the intention that you will use it to store a wide character, rather than just some arbitrary number.
Looking both backwards and forwards, and recalling that various oddball architectures were scattered about the landscape, I'm pretty sure they were trying to wrap all existing systems and also provide for all possible future systems.
So sure, the way things settled out, we have so far needed not so many types.
But even in LP64, a rather common paradigm, we needed size_t and ssize_t for the system call interface. One can imagine a more constrained legacy or future system, where using a full 64-bit type is expensive and they might want to punt on I/O ops larger than 4GB but still have 64-bit pointers.
I think you have to wonder: what might have been developed, what might come in the future. (Perhaps 128-bit distributed-system internet-wide pointers, but no more than 64 bits in a system call, or perhaps even a "legacy" 32-bit limit. :-) Imagine that legacy systems might get new C compilers...
Also, look at what existed around then. Besides the zillion 286 real-mode memory models, how about the CDC 60-bit word / 18-bit pointer mainframes? How about the Cray series? Never mind normal ILP64, LP64, LLP64. (I always thought Microsoft was pretentious with LLP64; it should have been P64.) I can certainly imagine a committee trying to cover all bases...
size_t vs. uintptr_t
In addition to other good answers:
size_t is defined in <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <uchar.h>, <wchar.h>. It is at least 16-bit.
uintptr_t is defined in <stdint.h>. It is optional. A compliant library might not define it, likely because there is no integer type wide enough to round-trip void * to uintptr_t and back to void *.
Both are unsigned integer types.
Note: the optional companion intptr_t is a signed integer type.
int main(void) {
    int a[4] = {0, 1, 5, 3};
    int a0 = a[0];      /* ordinary indexing */
    int a1 = *(a + 1);  /* a[i] is defined as *(a + i) */
    int a2 = *(2 + a);  /* addition commutes, so 2 + a works too */
    int a3 = 3[a];      /* ...which is why i[a] is also legal */
    return a2;
}
Implying that intptr_t must always substitute for size_t and vice versa.
