C Pointer Sizes [duplicate]

Based on a basic test of running a simple C++ program on an ordinary desktop PC, is it plausible to suppose that the size of a pointer of any type (including pointers to functions) equals the target architecture's bit width?
For example: in 32 bits architectures -> 4 bytes and in 64 bits architectures -> 8 bytes.
However, I remember reading that this is not true in general!
So I was wondering what such circumstances would be, for:
equality of the size of pointers to one data type with the size of pointers to other data types;
equality of the size of pointers to data types with the size of pointers to functions;
equality of pointer sizes with the target architecture's bit width.

No, it is not reasonable to assume. Making this assumption can cause bugs.
The sizes of pointers (and of integer types) in C or C++ are ultimately determined by the C or C++ implementation. Normal C or C++ implementations are heavily influenced by the architectures and the operating systems they target, but they may choose the sizes of their types for reasons other than execution speed, such as goals of supporting lower memory use (smaller pointers mean less memory used in programs with lots of pointers), supporting code that was not written to be fully portable to any type sizes, or supporting easier use of big integers.
I have seen a compiler targeted for a 64-bit system but providing 32-bit pointers, for the purpose of building programs with smaller memory use. (It had been observed that the sizes of pointers were a considerable factor in memory consumption, due to the use of many structures with many connections and references using pointers.) Source code written with the assumption that the pointer size equalled the 64-bit register size would break.

Is it reasonable to assume that in general the sizes of pointers of any type (including pointers to functions) are equal to the target architecture's bit width?
It depends. If you're aiming for a quick estimate of memory consumption, it can be good enough. But not if your program's correctness depends on it.
(including pointers to functions)
But here is one important remark. Although most pointers will have the same size, function pointers may differ. It is not guaranteed that a void* will be able to hold a function pointer. At least, this is true for C. I don't know about C++.
So I was wondering what would be such circumstances, if any?
There can be tons of reasons why it differs. If your program's correctness depends on this size, it is NEVER OK to make such an assumption. Check it instead. It shouldn't be hard at all.
You can use this macro to check such things at compile time in C:
#include <assert.h>
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
When compiling on a platform where the assertion fails (for example, a typical 64-bit system), this gives an error message:
$ gcc main.c
In file included from main.c:1:
main.c:2:1: error: static assertion failed: "Pointers are assumed to be exactly 4 bytes"
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
^~~~~~~~~~~~~
If you're using C++, you can skip #include <assert.h> because static_assert is a keyword in C++. (And you can use the keyword _Static_assert in C, but it looks ugly, so use the include and the macro instead.)
Since these two lines are so extremely easy to include in your code, there's NO excuse not to do so if your program would not work correctly with the wrong pointer size.
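If your code additionally assumes that data pointers and function pointers have the same size (which, as noted above, is not guaranteed), a similar hypothetical check makes that assumption explicit; this is only a sketch, adjust the condition to whatever your code actually relies on:
#include <assert.h>

/* Hypothetical check: adjust to whatever your code actually relies on. */
static_assert(sizeof(void *) == sizeof(void (*)(void)),
              "Code assumes data and function pointers have the same size");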

Is it reasonable to assume that in general the sizes of pointers of any type (including pointers to functions) are equal to the target architecture's bit width?
It might be reasonable, but it isn't reliably correct. So I guess the answer is "no, except when you already know the answer is yes (and aren't worried about portability)".
Potentially:
systems can have different register sizes, and use different underlying widths for data and addressing: it's not apparent what "target architecture bits" even means for such a system, so you have to choose a specific ABI (and once you've done that you know the answer, for that ABI).
systems may support different pointer models, such as the old near, far and huge pointers; in that case you need to know what mode your code is being compiled in (and then you know the answer, for that mode)
systems may support different pointer sizes, such as the X32 ABI already mentioned, or either of the other popular 64-bit data models described here
Finally, there's no obvious benefit to this assumption, since you can just use sizeof(T) directly for whatever T you're interested in.
If you want to convert between integers and pointers, use intptr_t. If you want to store integers and pointers in the same space, just use a union.
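For illustration, here is a minimal sketch of both suggestions (the union tag and variable names are my own, and intptr_t is technically optional in the standard, though practically universal):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 42;

    /* Query the size you actually care about instead of assuming it. */
    printf("sizeof(int *) = %zu\n", sizeof(int *));

    /* Round-trip a pointer through an integer: use intptr_t/uintptr_t. */
    intptr_t n = (intptr_t)&x;
    int *p = (int *)n;

    /* Store either an integer or a pointer in the same space: use a union. */
    union slot { intptr_t i; void *ptr; } s;
    s.ptr = &x;

    printf("%d %d\n", *p, *(int *)s.ptr);
    return 0;
}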

The target architecture's "bits" refers to the register size. For example, the Intel 8051 is 8-bit and operates on 8-bit registers, but its (external) RAM and (external) ROM are accessed with 16-bit addresses.

For correctness, you cannot assume anything. You have to check and be prepared to deal with weird situations.
As a general rule of thumb, it is a reasonable default assumption.
It's not universally true though. See the X32 ABI, for example, which uses 32bit pointers on 64bit architectures to save a bit of memory and cache footprint. Same for the ILP32 ABI on AArch64.
So, for guesstimating memory use, you can use your assumption and it will often be right.

Is it reasonable to assume that in general the sizes of pointers of any type (including pointers to functions) are equal to the target architecture's bit width?
If you look at all types of CPUs (including microcontrollers) currently being produced, I would say no.
Extreme counterexamples would be architectures where two different pointer sizes are used in the same program:
x86, 16-bit
In MS-DOS and 16-bit Windows, a "normal" program used both 16- and 32-bit pointers.
x86, 32-bit segmented
There were only a few, lesser-known operating systems using this memory model.
Programs typically used both 32- and 48-bit pointers.
STM8A
This modern automotive 8-bit CPU uses 16- and 24-bit pointers. Both in the same program, of course.
AVR tiny series
RAM is addressed using 8-bit pointers, Flash is addressed using 16-bit pointers.
(However, AVR tiny cannot be programmed with C++, as far as I know.)

It's not correct; for example, DOS pointers (16-bit) can be far (seg+ofs).
However, for the usual targets (Windows, OSX, Linux, Android, iOS) it is correct, because they all use the flat programming model, which relies on paging.
In theory, you can also have systems which use only the lower 32 bits when in x64. An example is a Windows executable linked without LARGEADDRESSAWARE. However, this is to help the programmer avoid bugs when switching to x64: the pointers are truncated to 32 bits, but they are still 64-bit.
In x64 operating systems this assumption is always true, because flat mode is the only valid one. Long mode in the CPU forces GDT entries to be 64-bit flat.
Others have also mentioned the x32 ABI; I believe it is based on the same paging technology, forcing all pointers to be mapped to the lower 4 GB. However, this must rest on the same principle as in Windows: in x64 you can only have flat mode.
In 32-bit protected mode you could have pointers up to 48 bits (segmented mode). You can also have call gates. But no operating system uses that mode.

Historically, on microcomputers and microcontrollers, pointers were often wider than general-purpose registers so that the CPU could address enough memory and still fit within the transistor budget. Most 8-bit CPUs (such as the 8080, Z80 or 6502) had 16-bit addresses.
Today, a mismatch is more likely to be because an app doesn’t need multiple gigabytes of data, so saving four bytes of memory on every pointer is a win.
C and C++ provide the separate types size_t and uintptr_t (and POSIX adds off_t), representing the largest possible object size (which might be smaller than the size of a pointer if the memory model is not flat), an integral type wide enough to hold a pointer, and a file offset (often wider than the largest object allowed in memory), respectively. A size_t (unsigned) or ptrdiff_t (signed) is the most portable way to get the native word size. Additionally, POSIX guarantees that the system compiler has some flag that means a long can hold any of these, but you cannot always assume so.
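As a rough sketch (off_t comes from a POSIX header rather than the C standard, and uintptr_t is optional in the standard though practically universal), you can print these sizes and round-trip a pointer yourself:
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>   /* off_t (POSIX) */

int main(void)
{
    int x = 0;

    printf("sizeof(size_t)    = %zu\n", sizeof(size_t));
    printf("sizeof(uintptr_t) = %zu\n", sizeof(uintptr_t));
    printf("sizeof(off_t)     = %zu\n", sizeof(off_t));
    printf("sizeof(void *)    = %zu\n", sizeof(void *));

    /* uintptr_t is defined to hold any object pointer and convert back losslessly. */
    uintptr_t u = (uintptr_t)&x;
    int *p = (int *)u;
    *p = 1;
    return 0;
}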

Generally, pointers will be 2 bytes on a 16-bit system, 3 on a 24-bit system, 4 on a 32-bit system, and 8 on a 64-bit system. It depends on the ABI and the C implementation. AMD has long and legacy modes, and there are differences between AMD64 and Intel64 for assembly language programmers, but these are hidden for higher-level languages.
Any problems with C/C++ code are likely to be due to poor programming practices and ignoring compiler warnings. See: "20 issues of porting C++ code to the 64-bit platform".
See also: "Can pointers be of different sizes?" and LRiO's answer:
... you are asking about C++ and its compliant implementations, not some specific physical machine. I'd have to quote the entire standard in order to prove it, but the simple fact is that it makes no guarantees on the result of sizeof(T*) for any T, and (as a corollary) no guarantees that sizeof(T1*) == sizeof(T2*) for any T1 and T2.
Note: this point is addressed by JeremyP, citing C99 section 6.3.2.3, subsection 8:
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
In GCC you can avoid incorrect assumptions by using built-in functions: "Object Size Checking Built-in Functions":
Built-in Function: size_t __builtin_object_size (const void * ptr, int type)
is a built-in construct that returns a constant number of bytes from ptr to the end of the object ptr pointer points to (if known at compile time). To determine the sizes of dynamically allocated objects the function relies on the allocation functions called to obtain the storage to be declared with the alloc_size attribute (see Common Function Attributes). __builtin_object_size never evaluates its arguments for side effects. If there are any side effects in them, it returns (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3. If there are multiple objects ptr can point to and all of them are known at compile time, the returned number is the maximum of remaining byte counts in those objects if type & 2 is 0 and minimum if nonzero. If it is not possible to determine which objects ptr points to at compile time, __builtin_object_size should return (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3.
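As a small usage sketch (GCC/Clang only; compile with optimization, e.g. -O2, so the compiler can track the enclosing object):
#include <stdio.h>

int main(void)
{
    char buf[16];

    /* Type 0: bytes remaining from the pointer to the end of the enclosing
       object, or (size_t)-1 when that is not known at compile time. */
    printf("%zu\n", __builtin_object_size(buf, 0));      /* expected: 16 */
    printf("%zu\n", __builtin_object_size(buf + 4, 0));  /* expected: 12 */
    return 0;
}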

Related

objects of which types can always be allocated at page boundary?

On Linux, I am interested to know whether an object of some type can always be allocated at a page boundary. For which C types is this always guaranteed? Please point me to the standard/documentation from which the answer follows.
Clarifying code:
#include <sys/mman.h>
#include <string.h>
typedef char TYPE;

int main(void)
{
    TYPE val = 0;
    void *mem;

    mem = mmap(NULL, sizeof(val), PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;

    memcpy(mem, &val, sizeof(val));
    /* is *(TYPE*)mem guaranteed to be the same as val ? */
    return 0;
}
The answer for char is yes, guaranteed.
For which other types is the answer yes, guaranteed?
Object alignment requirements are determined by the hardware architecture (processor being used), because it is the processor that may not be able to load data from unaligned addresses. (In some cases, the kernel can provide unaligned address support by trapping the processor interrupt generated by an access attempt via an unaligned address, and emulate the instruction. This is slow, however.)
The hardware architectures supported by the Linux kernel are listed at https://www.kernel.org/doc/html/latest/arch.html.
We can summarize these by saying that there is no hardware architecture supported by Linux that requires more than 128 bytes of alignment for any native data type supported by the processor, and page sizes are a multiple of 512 bytes (for historical reasons), so we can categorically say that on Linux, all data primitives can be accessed at page-aligned addresses.
(In Linux, you can use sysconf(_SC_PAGESIZE) to obtain the page size. Note that if huge pages are supported, they are larger, a multiple of this value.)
The above covers all C data types defined by the GNU C standard library and its extensions, because they only define scalar types, structures with elements aligned at natural boundaries (not packed), and vectorized types (using GCC/Clang vector extensions designed for SIMD architectures).
(You can define a packed structure that has to be allocated at a non-aligned address if you are really evil, using GCC type or variable attributes.)
If we look at the C standard, the type max_align_t (provided in <stddef.h> by the compiler; it is available even in freestanding environments where the standard C library is not available) has the maximum alignment needed for any object (except for evilly constructed packed structures mentioned above).
This means that _Alignof (max_align_t) tells you the maximum alignment required for the types the C standard defines. (Do remember to tell your C compiler the code uses features provided by the C11 standard or later, e.g. -std=c11 or -std=gnu2x.)
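A minimal check of that value on your own system (assumes a C11 compiler):
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    /* The largest alignment the standard C types require; "extended"
       (e.g. SIMD) alignments may exceed this. */
    printf("_Alignof(max_align_t) = %zu\n", _Alignof(max_align_t));
    return 0;
}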
However, certain architectures with SIMD (single instruction, multiple data), for which GCC and other C compilers have added vector extensions (for example implementing the Intel <immintrin.h> MMX/SSE/AVX intrinsics), may require larger alignment for vector registers, up to the size of the vector registers themselves (currently up to 512 bits, i.e. 64 bytes). On x86-64 (the 64-bit Intel and AMD architecture currently in use) there are separate instructions for unaligned and aligned accesses, with unaligned accesses possibly slower than aligned accesses, depending on the exact processor. So, _Alignof(max_align_t) does not apply to these vectorized types, which use vector extensions to the C standard. The C standard itself refers to such types as requiring "extended alignment".
In Linux, all types passed to the kernel, including pointers, must retain their information when cast to long, because the Linux kernel syscall interface passes syscall arguments as an array of up to six longs. See include/asm-generic/syscall.h:syscall_get_arguments() in the Linux kernel. (While this function is implemented for each hardware architecture separately, every implementation has the same signature, ie. passes the syscall arguments as long.)
The C standard does not define any relationship between the pointer address, and the value of a pointer when converted to a sufficiently large integer type. This is because there historically were architectures where this relationship was complicated (for example, on 8086 'far' pointers were 32-bit, where the actual address was 16*high16bits + low16bits). In Linux, however, the relationship is expected to be 1:1. This can be seen in things like /proc pseudo-filesystem, where pointers (say, in /proc/self/maps) are displayed in hexadecimal. See lib/vsprintf.c:pointer_string(), which is used to convert pointers to strings for userspace ABIs: it casts the pointer to unsigned long int, and prints the number value.
This means that when pointer ptr is N-byte aligned, in Linux, (unsigned long)ptr % N == 0.
While the C standard leaves signed integer overflow for each implementation to define, the Linux kernel expects and uses the GCC behaviour: signed integers use two's complement, and wrap around analogously to unsigned integers. This means that casts between long and unsigned long do not affect the storage representation and lose no information; the types only differ in whether the value they represent is considered signed or unsigned. Thus, any of the logic above wrt. long equally applies to unsigned long, and vice versa.
Finally, you can find variants paraphrasing the statement "on all currently supported architectures, ints are assumed to be 32 bits, longs the size of a pointer, and long long 64 bits" in both the kernel sources and on the Linux Kernel Mailing List (LKML). See e.g. Christoph Hellwig in 2003. A patch (to the documentation on adding syscalls) explicitly mentioning that Linux currently only supports the ILP32 and LP64 models was submitted in April 2021 with positive reactions, but hasn't been applied yet.
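A small Linux-only sketch tying the page-size and 1:1 pointer-mapping points together (it relies on POSIX mmap/sysconf, not on anything in the C standard):
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;

    /* On Linux the pointer-to-integer mapping is 1:1, so page alignment can be
       checked with a plain modulo on the converted value. */
    printf("page size = %ld, page-aligned = %d\n",
           page, (unsigned long)mem % (unsigned long)page == 0);

    munmap(mem, (size_t)page);
    return 0;
}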

sizeof Pointer differs for data type on same architecture

I have been going through some posts and noticed that pointers can be different sizes according to sizeof depending on the architecture the code is compiled for and running on. Seems reasonable enough to me (ie: 4-byte pointers on 32-bit architectures, 8-byte on 64-bit, makes total sense).
One thing that surprises me is that the size of a pointer can differ based on the data type it points to. I would have assumed that, on a 32-bit architecture, all pointers would be 4 bytes in size, but it turns out that function pointers can be a different size (i.e. larger than what I would have expected). Why is this, in the C programming language? I found an article that explains this for C++, and how the program may have to cope with virtual functions, but this doesn't seem to apply in pure C. Also, it seems the use of "far" and "near" pointers is no longer necessary, so I don't see those entering the equation.
So, in C, what justification, standard, or documentation describes why not all pointers are the same size on the same architecture?
Thanks!
The C standard lays down the law on what's required:
All data pointers can be converted to void* and back without loss of information.
All struct-pointers have the same representation+alignment and can thus be converted to each other.
All union-pointers have the same representation+alignment and can thus be converted to each other.
All character pointers and void pointers have the same representation+alignment.
All pointers to qualified and unqualified compatible types shall have the same representation+alignment. (For example unsigned / signed versions of the same type are compatible)
All function pointers have the same representation+alignment and can be converted to any other function pointer type and back again.
Nothing more is required.
The committee arrived at these guarantees by examining all current implementations and machines and codifying as many guarantees as they could.
On architectures where pointers are naturally word pointers instead of character pointers, you get data pointers of different sizes.
On architectures with different-sized code and data spaces (many microprocessors), or where additional info is needed to properly invoke functions (like Itanium, though they often hide that behind a data pointer), you get code pointers of a different size from data pointers.
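A minimal sketch of the round trips that these guarantees do cover (data pointers through void*, function pointers through another function pointer type):
#include <stdio.h>

static int add(int a, int b) { return a + b; }

int main(void)
{
    /* Data pointer: round trip through void* is guaranteed. */
    int x = 42;
    void *vp = &x;
    int *ip = vp;

    /* Function pointer: round trip through another *function* pointer type is
       guaranteed; conversion to void* is not. */
    void (*generic)(void) = (void (*)(void))add;
    int (*back)(int, int) = (int (*)(int, int))generic;

    printf("%d %d\n", *ip, back(2, 3));
    return 0;
}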
So, in C, what justification, standard, or documentation describes why not all pointers are the same size on the same architecture?
C11 : 6.2.5 p(28):
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
6.3.2.3 Pointers p(8):
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
This clarifies that pointers to data and pointers to functions need not be the same size.
One additional point:
Q: So, is it safe to say that, while I don't have to explicitly use the far/near keywords when defining a pointer, this is handled automatically "under the hood" by the compiler?
A: http://www.unix.com/programming/45002-far-pointer.html
It's a historical anachronism from segmented architectures such as the 8086.
Back in the days of yore there was the 8080; this was an 8-bit processor with a 16-bit address bus, hence 16-bit pointers.
Along came the 8086. In order to support some level of backward compatibility it adopted a segmented architecture which let you use either 16-bit, 20-bit or 32-bit pointers depending on the day of the week, where a pointer was a combination of a 16-bit segment register and a 16-bit near offset. This led to the rise of the tiny, small, medium, large and huge memory models with near, far and huge pointers.
Other architectures such as the 68000 did not adopt this scheme and had what is called a flat memory model.
With the 80386 and true 32-bit mode, all pointers are 32-bit, but ironically are now really near pointers, just 32 bits wide; the operating system hides the segments from you.
I compiled this on three different platforms; the char * pointer was identical to the function pointer in every case:
CODE:
#include <stdio.h>

int main(int argc, char *argv[]) {
    char *cptr = NULL;
    void (*fnptr)(void) = NULL;
    printf("sizeof cptr=%ld, sizeof fnptr=%ld\n",
           (long)sizeof cptr, (long)sizeof fnptr);
    return 0;
}
RESULTS:
Platform            char ptr   fn ptr
--------            --------   ------
Win8/MSVS 2013          4         4
Debian7/i686/GCC        4         4
Centos/amd64/GCC        8         8
Some architectures support multiple kinds of address spaces. While nothing in the Standard would require that implementations provide access to all address spaces supported by the underlying platform, and indeed the Standard offers no guidance as to how such support should be provided, the ability to support multiple address spaces may make it possible for a programmer who is aware of them to write code that works much better than would otherwise be possible.
On some platforms, one address space will contain all the others, but accessing things in that address space will be slower (sometimes by 2x or more) than accessing things which are known to be in a particular part of it. On other platforms, there won't be any "master" address space, so different kinds of pointers will be needed to access things in different spaces.
I disagree with the claim that the existence of multiple address spaces should be viewed as a relic. On a number of ARM processors, it would be possible for a program to have up to 1K-4K (depending upon the exact chip) of globals which could be accessed twice as quickly as--and with less code than--"normal" global variables. I don't know of any ARM compilers that would exploit that, but there's no reason a compiler for the ARM couldn't do so.

What memory address spaces are there?

What forms of memory address spaces have been used?
Today, a large flat virtual address space is common. Historically, more complicated address spaces have been used, such as a pair of a base address and an offset, a pair of a segment number and an offset, a word address plus some index for a byte or other sub-object, and so on.
From time to time, various answers and comments assert that C (or C++) pointers are essentially integers. That is an incorrect model for C (or C++), since the variety of address spaces is undoubtedly the cause of some of the C (or C++) rules about pointer operations. For example, not defining pointer arithmetic beyond an array simplifies support for pointers in a base and offset model. Limits on pointer conversion simplify support for address-plus-extra-data models.
That recurring assertion motivates this question. I am looking for information about the variety of address spaces to illustrate that a C pointer is not necessarily a simple integer and that the C restrictions on pointer operations are sensible given the wide variety of machines to be supported.
Useful information may include:
Examples of computer architectures with various address spaces and descriptions of those spaces.
Examples of various address spaces still in use in machines currently being manufactured.
References to documentation or explanation, especially URLs.
Elaboration on how address spaces motivate C pointer rules.
This is a broad question, so I am open to suggestions on managing it. I would be happy to see collaborative editing on a single generally inclusive answer. However, that may fail to award reputation as deserved. I suggest up-voting multiple useful contributions.
Just about anything you can imagine has probably been used. The first major division is between byte addressing (all modern architectures) and word addressing (pre-IBM 360/PDP-11, but I think modern Unisys mainframes are still word addressed). In word addressing, char* and void* would often be bigger than an int*; even if they were not bigger, the "byte selector" would be in the high-order bits, which were required to be 0, or would be ignored for anything other than bytes. (On a PDP-10, for example, if p was a char*, (int)p < (int)(p+1) would often be false, even though int and char* had the same size.)
Among byte-addressed machines, the major variants are segmented and non-segmented architectures. Both are still widespread today, although in the case of Intel 32-bit (a segmented architecture with 48-bit addresses), some of the more widely used OSs (Windows and Linux) artificially restrict user processes to a single segment, simulating flat addressing.
Although I've no recent experience, I would expect even more variety in embedded processors. In particular, in the past, it was frequent for embedded processors to use a Harvard architecture, where code and data were in independent address spaces (so that a function pointer and a data pointer, cast to a large enough integral type, could compare equal).
I would say you are asking the wrong question, except as historical curiosity.
Even if your system happens to use a flat address space -- indeed, even if every system from now until the end of time uses a flat address space -- you still cannot treat pointers as integers.
The C and C++ standards leave all sorts of pointer arithmetic "undefined". That can impact you right now, on any system, because compilers will assume you avoid undefined behavior and optimize accordingly.
For a concrete example, three months ago a very interesting bug turned up in Valgrind:
https://sourceforge.net/p/valgrind/mailman/message/29730736/
(Click "View entire thread", then search for "undefined behavior".)
Basically, Valgrind was using less-than and greater-than on pointers to try to determine if an automatic variable was within a certain range. Because comparisons between pointers into different aggregates are "undefined", Clang simply optimized away all of the comparisons to return a constant true (or false; I forget).
This bug itself spawned an interesting StackOverflow question.
So while the original pointer arithmetic definitions may have catered to real machines, and that might be interesting for its own sake, it is actually irrelevant to programming today. What is relevant today is that you simply cannot assume that pointers behave like integers, period, regardless of the system you happen to be using. "Undefined behavior" does not mean "something funny happens"; it means the compiler can assume you do not engage in it. When you do, you introduce a contradiction into the compiler's reasoning; and from a contradiction, anything follows... It only depends on how smart your compiler is.
And they get smarter all the time.
There are various forms of bank-switched memory.
I worked on an embedded system that had 128 KB of total memory: 64 KB of RAM and 64 KB of EPROM. Pointers were only 16-bit, so a pointer into the RAM could have the same value as a pointer into the EPROM, even though they referred to different memory locations.
The compiler kept track of the type of the pointer so that it could generate the instruction(s) to select the correct bank before dereferencing a pointer.
You could argue that this was like segment + offset, and at the hardware level, it essentially was. But the segment (or more correctly, the bank) was implicit from the pointer's type and not stored as the value of a pointer. If you inspected a pointer in the debugger, you'd just see a 16-bit value. To know whether it was an offset into the RAM or the ROM, you had to know the type.
For example, Foo * could only be in RAM and const Bar * could only be in ROM. If you had to copy a Bar into RAM, the copy would actually be a different type. (It wasn't as simple as const/non-const:
Everything in ROM was const, but not all consts were in ROM.)
This was all in C, and I know we used non-standard extensions to make this work. I suspect a 100% compliant C compiler probably couldn't cope with this.
From a C programmer's perspective, there are three main kinds of implementation to worry about:
Those which target machines with a linear memory model, and which are designed and/or configured to be usable as a "high-level assembler"--something the authors of the Standard have expressly said they did not wish to preclude. Most implementations behave in this way when optimizations are disabled.
Those which are usable as "high-level assemblers" for machines with unusual memory architectures.
Those whose design and/or configuration make them suitable only for tasks that do not involve low-level programming, including clang and gcc when optimizations are enabled.
Memory-management code targeting the first type of implementation will often be compatible with all implementations of that type whose targets use the same representations for pointers and integers. Memory-management code for the second type of implementation will often need to be specifically tailored for the particular hardware architecture. Platforms that don't use linear addressing are sufficiently rare, and sufficiently varied, that unless one needs to write or maintain code for some particular piece of unusual hardware (e.g. because it drives an expensive piece of industrial equipment for which more modern controllers aren't available) knowledge of any particular architecture isn't likely to be of much use.
Implementations of the third type should be used only for programs that don't need to do any memory-management or systems-programming tasks. Because the Standard doesn't require that all implementations be capable of supporting such tasks, some compiler writers--even when targeting linear-address machines--make no attempt to support any of the useful semantics thereof. Even some principles like "an equality comparison between two valid pointers will--at worst--either yield 0 or 1 chosen in possibly-unspecified fashion" don't apply to such implementations.

Does the size of pointers vary in C? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
Can the Size of Pointers Vary Depending on what’s Pointed To?
Are there are any platforms where pointers to different types have different sizes?
Is it possible that the size of a pointer to a float in C differs from the size of a pointer to an int? Having tried it out, I get the same result for all kinds of pointers.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    printf("sizeof(int*): %zu\n", sizeof(int*));
    printf("sizeof(float*): %zu\n", sizeof(float*));
    printf("sizeof(void*): %zu\n", sizeof(void*));
    return 0;
}
Which outputs here (OSX 10.6 64bit)
sizeof(int*): 8
sizeof(float*): 8
sizeof(void*): 8
Can I assume that pointers of different types have the same size (on one arch of course)?
Pointers are not always the same size on the same arch.
You can read more on the concept of "near", "far" and "huge" pointers, just as an example of a case where pointer sizes differ...
http://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer_sizes
In days of old, using e.g. Borland C compilers on the DOS platform, there were a total of (I think) five memory models, which could even be mixed to some extent. Essentially, you had a choice of small or large pointers to data, and small or large pointers to code, and a "tiny" model where code and data had a common address space of (if I remember correctly) 64K.
It was possible to specify "huge" pointers within a program that was otherwise built in the "tiny" model. So in the worst case it was possible to have different sized pointers to the same data type in the same program!
I think the standard doesn't even forbid this, so theoretically an obscure C compiler could do this even today. But there are doubtless experts who will be able to confirm or correct this.
Pointers to data must always be convertible to void*, so nowadays they are generally realized as types of the same width.
This statement is not true for function pointers, which may have a different width. For that reason, in C99 casting function pointers to void* is undefined behavior.
As I understand it there is nothing in the C standard which guarantees that pointers to different types must be the same size, so in theory an int * and a float * on the same platform could be different sizes without breaking any rules.
There is a requirement that char * and void * have the same representation and alignment requirements, and there are various other similar requirements for different subsets of pointer types but there's nothing that encompasses everything.
In practice you're unlikely to run into any implementation that uses different-sized pointers unless you head into some fairly obscure places.
Yes. It's uncommon, but this would certainly happen on systems that are not byte-addressable, e.g. a 16-bit system with 64 Kwords = 128 KB of memory. On such systems, you can still have 16-bit int pointers. But a char pointer to an 8-bit char would need an extra bit to indicate the high/low byte within the word, and thus you'd have 17-bit (in practice, 32-bit) char pointers.
This might sound exotic, but many DSPs spend 99.x% of their time executing specialized numerical code. A sound DSP can be a bit simpler if all it has to deal with is 16-bit data, leaving the occasional 8-bit math to be emulated by the compiler.
I was going to write a reply saying that C99 has various pointer conversion requirements that more or less ensure that pointers to data have to be all the same size. However, on reading them carefully, I realised that C99 is specifically designed to allow pointers to be of different sizes for different types.
For instance, on an architecture where integers are 4 bytes and must be 4-byte aligned, an int pointer could be two bits smaller than a char or void pointer. Provided the cast actually does the shift in both directions, you're fine with C99. It helpfully says that the result of casting a char pointer to an incorrectly aligned int pointer is undefined.
See the C99 standard. Section 6.3.2.3
Yes, the size of a pointer is platform dependent. More specifically, the size of a pointer depends on the target processor architecture and the "bit-ness" you compile for.
As a rule of thumb, on a 64-bit machine a pointer is usually 64 bits, and on a 32-bit machine usually 32 bits. There are exceptions, however.
Since a pointer is just a memory address, it's always the same size regardless of what the memory it points to contains. So pointers to a float, a char or an int are all the same size.
Can I assume that pointers of different types have the same size (on one arch of course)?
For the platforms with flat memory model (== all popular/modern platforms) pointer size would be the same.
For the platforms with segmented memory model, for efficiency, often there are platform-specific pointer types of different sizes. (E.g. far pointers in the DOS, since 8086 CPU used segmented memory model.) But this is platform specific and non-standard.
You should probably also keep in mind that in C++ the size of an ordinary pointer might differ from the size of a pointer to a member function. Pointers to member functions have to carry extra information to work properly with polymorphism (e.g. virtual calls). This is probably the only exception I'm aware of that is still relevant (since I doubt the segmented memory model will ever make it back).
There are platforms where function pointers are a different size than other pointers.
I've never seen more variation than this. All other pointers must be at most sizeof(void*) since the standard requires that they can be cast to void* without loss of information.
A pointer is a memory address, and hence should be the same size on a specific machine: a 32-bit machine => 4 bytes, a 64-bit machine => 8 bytes.
Hence, irrespective of the data type of the thing the pointer is pointing to, the size of a pointer on a specific machine would be the same (since the space required to store a memory address would be the same).
Assumption: I'm talking about near pointers to data values, the kind you declared in your question.

What is the maximum size of buffers memcpy/memset etc. can handle?

What is the maximum size of buffers memcpy and other functions can handle? Is this implementation dependent? Is this restricted by the size(size_t) passed in as an argument?
This is entirely implementation dependent.
This depends on the hardware as much as anything, but also on the age of the compiler. For anyone with a reasonably modern compiler (meaning anything based on a standard from the early 90's or later), the size argument is a size_t. This can reasonably be the largest 16 bit unsigned, the largest 32 bit unsigned, or the largest 64 bit unsigned, depending on the memory model the compiler compiles to. In this case, you just have to find out what size a size_t is in your implementation. However, for very old compilers (that is, before ANSI-C and perhaps for some early versions of ANSI C), all bets are off.
On the standards side, looking at cygwin and Solaris 7, for example, the size argument is a size_t. Looking at an embedded system that I have available, the size argument is an unsigned (meaning 16-bit unsigned). (The compiler for this embedded system was written in the 80's.) I found a web reference to some ANSI C where the size parameter is an int.
You may want to see this article on size_t as well as the follow-up article about a mis-feature of some early GCC versions where size_t was erroneously signed.
In summary, for almost everyone, size_t will be the correct reference to use. For those few using embedded systems or legacy systems with very old compilers, however, you need to check your man page.
Functions normally use a size_t to pass a size as parameter. I say normally because fgets() uses an int parameter, which in my opinion is a flaw in the C standard.
size_t is defined as a type which can contain the size (in bytes) of any object you could access. Generally it's a typedef of unsigned int or unsigned long.
That's why the values returned by the sizeof operator are of type size_t.
So 2 ** (sizeof(size_t) * CHAR_BIT) gives you the maximum amount of memory that your program could handle, but it's certainly not the most precise bound.
(CHAR_BIT is defined in limits.h and yields the number of bits contained in a char).
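For completeness, a tiny sketch printing those quantities (SIZE_MAX from <stdint.h> is exactly 2 ** (sizeof(size_t) * CHAR_BIT) - 1):
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof(size_t) = %zu, CHAR_BIT = %d\n", sizeof(size_t), CHAR_BIT);
    printf("SIZE_MAX       = %zu\n", (size_t)SIZE_MAX);
    return 0;
}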
They take a size_t argument, so it's platform dependent.
Implementation dependent, but you can look in the header (.h) file that you need to include before you can use memcpy. The declaration will tell you (look for size_t or other).
And then you ask what size_t is, well, that's the implementation dependent part.
Right, you cannot copy areas that are greater than 2^(sizeof(size_t)*8) bytes. But that is nothing to worry about, because you cannot allocate more space either, since malloc also takes the size as a size_t parameter.
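A tiny sketch of that point (compilers may warn about the oversized request; on any realistic system the allocation simply fails and malloc returns NULL):
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* SIZE_MAX bytes can never actually be allocated, so there is never a
       buffer larger than size_t can describe to pass to memcpy(). */
    void *p = malloc(SIZE_MAX);
    printf("malloc(SIZE_MAX) returned %p\n", p);
    free(p);
    return 0;
}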
There is also an issue related to what size_t can represent versus what your platform will allow a process to actually address.
Even with virtual memory on a 64-bit platform, you are unlikely to be able to call memcpy() with sizes of more than a few TB or so this week, and even then that is a pretty hot machine... it is hard to imagine what a machine with a fully populated 64-bit address space would even look like.
Never mind the embedded systems with only a few KB of total writable memory, where it can't make sense to attempt to memcpy() more information than the RAM holds, regardless of the definition of size_t. Think about what would happen to the stack holding the return address from that call if you did.
Or systems where the virtual address space seen by a process is smaller than the physical memory installed. This is actually the case with a Win32 process running on a Win64 platform, for example. (I first encountered this under the time sharing OS TSX-11 running on a PDP-11 with 4MB of physical memory, and 64KB virtual address in each process. 4MB of RAM was a lot of memory then, and the IBM PC didn't exist yet.)
