Can a C primitive be wider than a void*? - c

I know that void* can hold any pointer but I wonder also if with casting it can hold anything besides an array or a struct, i.e., primitives and unions. Is this a guarantee with C (and if so which version) or just implementation-specific or not even a thing?

I wonder also if with casting it can hold anything besides an array or a struct, i.e., primitives and unions. Is this a guarantee with C [...]?
No, it is not. As a counterexample, pointers are typically 32 bits wide in C implementations for 32-bit CPUs,* but C requires type long long int to be at least 64 bits wide.** C places no upper bound on the size of any arithmetic type and no (explicit) lower bound on the size of any pointer type, so in principle, even short might be wider than void *.
*For example, GCC uses the pointer representation specified by the target application binary interface (ABI). On x86 Linux, that means the x86 System V ABI, which defines the size of an address as 4 bytes.
** Per paragraph 5.2.4.2.1/1 of the current C language standard, the maximum value representable by an object of type long long int must be at least 9223372036854775807, which is 2^63 - 1. If you include the sign bit, too, then it follows that a long long int requires at least 64 bits.

Related

Size of Pointer Variables

Considering pch, pshort, pdouble declared as pointers to char, short int, double respectively, what would be the arrangement if the three variables were arranged according to their size?
The facetious answer is you don't know. char, short, and double could all be the same size, and char*, short*, and double* could all be different sizes!
sizeof(char) is 1 by the standard. You can't have anything smaller than that, so it makes sense to put char first.
But short int could be the same size as a long int: the standard only specifies minimum ranges. And either could be larger than a double.
Normally a double weighs in at 64 bit, and a short 16 or 32 bits.
The parsimonious answer is char, short, double.
As for pointers, the standard allows sizeof(char*), sizeof(short*), and sizeof(double*) to all differ.
Pointers to different types may have different sizes, although on most modern platforms they are all the same size (32 bits on x86, 64 bits on x86_64).
The requirements [1] are:
Pointers to char and pointers to void have the same size and representation;
Pointers to struct types all have the same size and representation;
Pointers to union types all have the same size and representation;
Pointers to qualified and unqualified versions of compatible types have the same size and representation (i.e., sizeof (int*) == sizeof (const int *))
Pointers to all other types may have different sizes.
[1] C 2011 Online Draft, section 6.2.5, para 28.
Pointer variables are usually 64 bits on machines that can do 64-bit arithmetic, because that allows the machine to address more than 2^32 bytes (4 gigabytes) of RAM. On machines that can't handle 64-bit arithmetic, long pointers must be used: these are actually two pointers put together, treating each maximum-size block of RAM as an element of an array. Consequently, long pointers are slower than regular pointers.
EDIT
Bethsheba reminded me, in a comment below that the pointer will usually not be 64 bits when in a register, but will usually be padded to 64 bits in memory for performance reasons (so, treat it as 64 bits).

Fixed-sized pointer type in C99

I want to create a type to store pointers. The type should be compatible with C99 and have a fixed-width of 64 bits. I came up with several alternatives but they all seem flawed:
Using uint64_t is incorrect since conversions between pointers and integers are implementation-defined [C99 standard, 6.3.2.3].
uintptr_t also appears to be out of the picture, since the width of this type is not fixed and the type is optional anyway [7.18.1.4].
Using a struct such as
struct {
#ifdef __LP64__
void* ptr;
#else
// if big endian the following two fields need to be flipped
void* ptr;
uint32_t padding;
#endif
} fixed_ptr_type;
does not work either because the size of a pointer is not fixed even within the same implementation
Is there any C99-compatible definition of the type I'm looking for?
Object pointers
The best type to store object pointers is void *. Any object pointer can be converted to void * and back again.
Function pointers
A void * cannot necessarily store a function pointer. However, any function pointer can be converted to another type of function pointer, so you could store them in some arbitrary type (such as void (*)(void)).
Padding
I have no idea why you would need your pointer type to have a predetermined size, but you could pad them by using a union and hope that the result is not too large:
union fixed_ptr_type {
void *p;
char c[64/CHAR_BIT];
};
assert (CHAR_BIT * sizeof (union fixed_ptr_type) == 64);
I don't understand your objection to using the void * with padding. All objects of the same type have the same size. If a different object pointer type has a different size, that doesn't matter, because you convert it to void * to store it in your super-pointer.
Regarding uintptr_t: If it is not supported, then chances are that it's because there is actually no way of doing this on the particular platform.
So you could use uintptr_t. To add in the fixed-width requirement, you could cast to uintptr_t then to uint64_t (if you're happy with knowing you'll have to change your code when someone puts out a system that has pointers greater than 64bits!)
You cannot portably store pointer values in a 64-bit type. It's perfectly legal for an implementation to use 128-bit pointers.
If you don't mind losing portability to systems with pointers bigger than 64 bits, you can probably get away with using uint64_t. Conversions from pointer types to uint64_t are not guaranteed to work correctly without losing information, but they will almost certainly do so on any reasonable systems where pointers are no wider than 64 bits.
If an implementation has no 64-bit unsigned integer type without padding bits, then it will not define uint64_t at all (for example, a system with 9-bit bytes would not be able to implement uint64_t). There's a type uint_least64_t that's guaranteed, as the name implies, to be at least 64 bits wide; it will be exactly 64 bits on most systems, and wider than 64 bits only on systems where uint64_t doesn't exist.
uintptr_t is guaranteed to hold a converted void* value without loss of information, but it's not guaranteed to exist -- and if it doesn't exist, then no integer type can hold a converted void* value without loss of information. A conforming implementation needn't necessarily have any integer type that can hold a pointer value without loss of information.
Function pointers are another matter. Conversion from a function pointer to void*, or to any integer type, has undefined behavior (because the standard doesn't say what the behavior should be).
There simply is no 100% portable way to do what you're trying to do. You'll just have to settle for 99.9% portability. If you're not concerned with function pointers, I'd suggest using uint64_t (perhaps defining your own typedef to make it clear what you're doing) and add a compile-time or run-time check to confirm that sizeof (void*) <= sizeof (uint64_t). That should cover every existing implementation that I've ever heard of.
It might be helpful to know what your actual goal is. Why do you want to store pointers in no more or less than 64 bits? What problem does this solve that storing them in void* objects doesn't solve?
Incidentally, the __LP64__ macro that you mention in your question is non-standard.

Is sizeof(int) guaranteed to equal sizeof(void*)

Is the size of the datatype "int" always equal to the size of a pointer in the C language?
I'm just curious.
Not at all, there is no guarantee that sizeof(int) == sizeof(void*). And on Linux/AMD64 sizeof(int) is 4 bytes, and sizeof(void*) is 8 bytes (same as sizeof(long) on that platform).
Recent C standards (C99 and later) define a standard header <stdint.h> which may define, among others, an integer type intptr_t. The type is optional, but where it exists, any valid void* can be converted to it and back, and the result compares equal to the original pointer. Its size is not required to equal the size of a pointer.
I think that the standard does not guarantee that all pointers have the same size; in particular, pointers to functions can be "bigger" than data pointers (I cannot name a platform where it is true). I believe that the recent POSIX standard requires that function pointers be convertible to void* and back (e.g. for dlsym(3)).
See also this C reference and the n1570 draft C11 standard (or better)
PS. In 2021 I cannot name a common platform with sizeof(long) != sizeof(void*). But in the previous century the old Intel 286 could have been such a platform.
No. for example, in most 64bit systems, int is 4 bytes, and void* is 8.
It is not guaranteed.
And for example, in most 64-bit systems both sizes are usually different.
Even sizeof (int *) is not guaranteed to be equal to sizeof (void *).
The only guarantee for void * size is
sizeof (void *) == sizeof (char *)
== sizeof (signed char *) == sizeof (unsigned char *)
No. Some (mostly older, VAX-era) code assumes this, but it's definitely not required, and assuming it is not portable. There are real implementations where the two differ (e.g., some current 64-bit environments use a 64-bit pointer and 32-bit int).
The C language gives no guarantees of anything when it comes to integer or pointer sizes.
The size of int is typically the same as the data bus width, but not necessarily. The size of a pointer is typically the same as the address bus width, but not necessarily.
Many compilers use non-standard extensions like the far keyword, to access data beyond the width of the default pointer type.
In addition to 64-bit systems, there are also plenty of microcontroller/microprocessor architectures where the size of int and the size of a pointer are different. Windows 3.1 and DOS are other examples.
There's no guarantee of any relation between the sizes of these two types, nor that either can be faithfully represented in the other via round-trip casts. It's all implementation-defined.
With that said, in the real world, unless you're dealing with really obscure legacy 16-bit systems or odd DSPs or such, sizeof(int) is going to be less than or equal to sizeof(void *), and you can faithfully convert int values to void * to pass them to interfaces (like pthread_create) that take a generic void * argument to avoid wasteful allocation and freeing of memory to store a single int. In particular, if you're using POSIX or Windows interfaces already, this is definitely a safe real-world assumption to make.
You should never assume void * can be faithfully represented in int (i.e. casting a pointer to int and back). This does not work on any popular real-world 64-bit systems, and the percentage of systems it works on is sure to plummet in the near future.
No. Pointer types do not have to be the same size or representation as integer types. Here are a few relevant sections from the C language standard (online draft available here):
6.2.5 Types
...
27 A pointer to void shall have the same representation and alignment requirements as a
pointer to a character type.39) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements
as each other. All pointers to union types shall have the same representation and
alignment requirements as each other. Pointers to other types need not have the same
representation or alignment requirements.
...
39) The same representation and alignment requirements are meant to imply interchangeability as
arguments to functions, return values from functions, and members of unions.
...
6.3.2.3 Pointers
...
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.56)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
...
56) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
No, it doesn't have to be, but it's usually the case that sizeof(long) == sizeof(void*).

Why size_t when int would suffice for the size of an array?

The C standard guarantees that an int is able to store every possible array size. At least, that's what I understand from reading §6.5.2.1, subsection 1 (Array subscripting constraints):
One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall
have integer type, and the result has type ‘‘type’’.
Since we shall use ints as array subscripts, why are we supposed to use size_t to determine the size of an array?
Why does strlen() return size_t when int would suffice?
The term "integer type" doesn't mean int - for example, char, and short are integer types.
Just because you can use an int to subscript an array doesn't necessarily mean that it can reach all possible array elements.
More specifically about size_t vs. int, one example would be platforms where int might be a 16-bit type and size_t might be a 32-bit type (or the more common 32-bit int vs 64 bit size_t difference on today's 64-bit platforms).
integer type is not necessarily an "int". "long long" is an integer type too, as is "size_t".
Arrays can be larger than 2GB. This property is quite handy for those who write memory hungry programs, e.g DBMS with big buffer pools, application servers with big memory caches etc. Arrays bigger than 2GB/4GB is the whole point of 64 bit computing :)
size_t for strlen(), at least sounds compatible with how C standard handles arrays, whether it makes practical sense or not, or whether somebody have seen strings that large, is another question.
Firstly, what you quoted from the standard does not make any references to type int specifically. And no, int is not guaranteed to be sufficient to store the size of any object (including arrays) in C.
Secondly, the C language does not really have "array subscripting" as a separate mechanism. Array subscripting is implemented through pointer arithmetic. And the integer type naturally associated with pointer arithmetic is ptrdiff_t (the type of the difference between two pointers). Not size_t, not int, but ptrdiff_t. It is a signed type, BTW, meaning that the value can be negative.
Thirdly, the purpose of size_t is to store the size of any object in the program (i.e. to store the result of sizeof). It is not immediately intended to be used as an array index. It just happens to work as an array index since it is guaranteed that it is always large enough to index any array. However, from an abstract point of view, "array" is a specific kind of "container" and there are other kinds of containers out there (lists-based ones, tree-based ones and so on). In generic case size_t is not sufficient to store the size of any container, which in generic case makes it a questionable choice for array indexing as well. (strlen, on the other hand, is a function that works with arrays specifically, which makes size_t appropriate there.)
When the C Standard was written, it was common for machines to have a 16-bit "int" type, and be incapable of handling any single object larger than 65535
bytes, but nonetheless be capable of handling objects larger than 32767 bytes.
Since arithmetic on an unsigned int would be large enough to handle the largest
size of such objects, but arithmetic on signed int would not, size_t was defined
to be unsigned so as to accommodate such objects without having to use "long"
computations.
On machines where the maximum allowable object size is between INT_MAX and
UINT_MAX, the difference between pointers to the start and end of such an
object may be too large to fit in "int". While the Standard doesn't impose
any requirements for how implementations should handle that, a common approach
is to define integer and pointer wrap-around behavior such that if S and E
are pointers to the start and end of a char[49152], then even though E-S
would exceed INT_MAX, it will yield a value which, when added to S, will
yield E.
Nowadays, there's seldom any real advantage to the fact that size_t is an
unsigned type (since code which needs objects larger than 2GB would often
need to use 64-bit pointers for other reasons) and it causes many kinds of
comparisons involving object sizes to behave counter-intuitively, but the
fact that sizeof expressions yield an unsigned type is sufficiently well
entrenched that it's unlikely ever to change.
size_t is a typedef for an unsigned integer type (such as unsigned int or unsigned long).
In some 64bit platforms, int can be 32bit, while size_t can be 64bit.
It is used as a more standard way for size.

Byte precision pointer arithmetic in C when sizeof(char) != 1

How can one portably perform pointer arithmetic with single byte precision?
Keep in mind that:
char is not 1 byte on all platforms
sizeof(void) == 1 is only available as an extension in GCC
While some platforms may have alignment restrictions on pointer dereferencing, arithmetic may still require a finer granularity than the size of the smallest fundamental POD type
Your assumption is flawed - sizeof(char) is defined to be 1 everywhere.
From the C99 standard (TC3), in section 6.5.3.4 ("The sizeof operator"):
(paragraph 2)
The sizeof operator yields the size
(in bytes) of its operand, which may
be an expression or the
parenthesized name of a type.
(paragraph 3)
When applied to an operand that has
type char, unsigned char, or signed
char, (or a qualified version
thereof) the result is 1.
When these are taken together, it becomes clear that in C, whatever size a char is, that size is a "byte" (even if that's more than 8 bits, on some given platform).
A char is therefore the smallest addressable type. If you need to address in units smaller than a char, your only choice is to read a char at a time and use bitwise operators to mask out the parts of the char that you want.
sizeof(char) always returns 1, in both C and C++. A char is always one byte long.
According to the standard char is the smallest addressable chunk of data. You just can't address with greater precision - you would need to do packing/unpacking manually.
sizeof(char) is guaranteed to be 1 by the C standard. Even if char uses 9 bits or more.
So you can do:
type *pt;
unsigned char *pc = (unsigned char *)pt;
And use pc for arithmetic. Converting pc back to type * with a cast is undefined behavior by the C standard, though, if the resulting pointer is not correctly aligned for type.
If char is more than 8-bits wide, you can't do byte-precision pointer arithmetic in portable (ANSI/ISO) C. Here, by byte, I mean 8 bits. This is because the fundamental type itself is bigger than 8 bits.
Cast the pointer to a uintptr_t. This will be an unsigned integer that is the size of a pointer. Now do your arithmetic on it, then cast the result back to a pointer of the type you want to dereference.
(Note that intptr_t is signed, which is usually NOT what you want! It's safer to stick to uintptr_t unless you have a good reason not to!)
I don't understand what you are trying to say with sizeof(void) being 1 in GCC. While type char might theoretically consist of more than 1 underlying machine byte, in C language sizeof(char) is 1 and always exactly 1. In other words, from the point of view of C language, char is always 1 "byte" (C-byte, not machine byte). Once you understand that, you'd also understand that sizeof(void) being 1 in GCC does not help you in any way. In GCC the pointer arithmetic on void * pointers works in exactly the same way as pointer arithmetic on char * pointers, which means that if on some platform char * doesn't work for you, then void * won't work for you either.
If on some platform char objects consist of multiple machine bytes, the only way to access smaller units of memory than a full char object would be to use bitwise operations to "extract" and "modify" the required portions of a complete char object. C language offers no way to directly address anything smaller than char. Once again char is always a C-byte.
The C99 standard defines uint8_t, which is exactly 8 bits wide on implementations that provide it. If the compiler doesn't support this type, you could define it using a typedef. Of course you would need a different definition, depending on the platform and/or compiler. Bundle everything in a header file and use it everywhere.

Resources