Sizeof pointer for 16 bit and 32 bit - c

I was just curious to know what would the sizeof pointer return for a 16 bit and a 32 bit system
printf("%zu", sizeof(int16 *));
printf("%zu", sizeof(int32 *));
Thank you.

Short answer: On a 32-bit Intel 386 you will likely see both return 4, while on a 16-bit 8086 target you will most likely see either 2 or 4, depending on the memory model you selected.
The details
First, standard C does not mandate anything particular about pointers, only that they need to be able to "point to" the given variable, and that pointer arithmetic needs to work within the data area of the given variable. Even a C interpreter with some exotic representation of pointers is possible, and given this flexibility pointers truly might be of any size depending on what you target.
Usually, however, compilers do represent pointers as memory addresses, which makes several operations that the C standard leaves undefined "usually work". How the compiler represents a pointer depends on the target architecture: compiler writers choose representations that are useful, efficient, or both.
One example of a useful representation is the generic pointer on a Harvard-architecture microcontroller. Generic pointers let you address both code memory and data RAM. On an 8-bit micro they might be encoded as one type byte plus 2 address bytes, which implies that whenever you dereference such a pointer, more complex code has to be emitted to load the contents from the proper place.
That leads to a good example of an efficient representation: why not have specific pointers instead? One that points to code memory, another that points to data memory? Just 2 bytes each (assuming a 16-bit address space, as is usual for 8-bit micros such as the 8051), and no need to select by type.
But then you have multiple types of pointers (again the 8051: you will likely have at least one additional type of pointer pointing into its internal RAM too...). The programmer then needs to think about which particular pointer type to use.
And of course the sizes also differ. On this hypothetical compiler targeting the 8051, you would have a generic pointer type of 3 bytes, an external data memory pointer type of 2 bytes, a code memory pointer of 2 bytes, and an internal RAM pointer type of 1 byte.
Also note that these are types of pointers, and not the types of data they point to (function pointers are a little off here as the fact a pointer is a function pointer implies that it is of a different type than data pointers while not having any specific syntax difference except that the data type it points to is a function type).
Back to your 16-bit machine, assuming it is an 8086:
If you use a memory model where the compiler assumes you have a single data segment, you will likely get 2-byte data pointers unless you specifically declare one near or far. Otherwise you will get 4-byte pointers by default. A 2-byte pointer is usually just the 16-bit offset, while a 4-byte pointer is a segment:offset pair. You can always apply a near or far specifier to explicitly make your pointers one type or the other.
(How do near pointers work in a program that also uses far pointers? Simply, there is a default data segment generated by the compiler, and all near pointers refer to locations within it. The compiler may keep the ds segment register loaded with the default data segment permanently, or at least most of the time, so accessing data through near pointers can be faster.)

The size of a pointer depends on the architecture. More precisely, it depends on the size of the addresses used in that architecture, which reflects the width of the bus used to access memory.
For example, on a 32-bit architecture the size of an address is 4 bytes:
sizeof (void *) == 4 bytes.
On 64-bit architectures, addresses are 8 bytes:
sizeof (void *) == 8 bytes.
Note that on such platforms all pointers have the same size, independently of the type. So if you run your code, the size of an int16 pointer and the size of an int32 pointer will be the same.
However, the size of a pointer on a 16-bit system is usually 2 bytes. Such systems typically have very little memory, and 2 bytes are enough to address all of it. To be precise, a 16-bit pointer can address at most 2^16 = 65,536 bytes (64 KiB), which is tiny compared to the memory of a present-day computer.

Related

Do all objects sit in the same address space in C?

I am trying to work out if the C standard require that all addresses are in the same address space. If I have two objects of different type
double d;
int i;
I cannot do pointer arithmetic on their addresses, because they are pointers of different types. However, the standard says that I can point character type pointers there and will get the address of the first byte in the objects.
char *dp = (char *)&d;
char *ip = (char *)&i;
and with those I can do pointer arithmetic, and for example figure out how far apart they are in memory, (dp - ip). That is, of course, if doubles and ints sit in the same memory. They always do on the platforms I know, but is it guaranteed by the standard? Or is pointer arithmetic only allowed if my char pointers actually point at something with the same type?
Pointer arithmetic is only defined when the pointers have the same type and they point within the same object. More specifically, the standard says:
3 For subtraction, one of the following shall hold:
both operands have arithmetic type;
both operands are pointers to qualified or unqualified versions of compatible complete object types; or
the left operand is a pointer to a complete object type and the right operand has integer type.
and:
9 When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
(For the purposes of interpreting the above, a single object is treated as an array with one element.)
Casting the pointer types to char * addresses the constraint in clause 3, but a pointer to d is not pointing to an element of i. So you can't subtract them.
Due to factors like ASLR, and the lack of specificity in the C specification regarding how variables are actually positioned in memory, you really can't trust the difference of two pointers to two different objects to represent anything.
Are things allocated on the stack in a top-down manner? Usually, sure, it's a long-standing convention, but it is not required to be that way. They could be heap allocated, or strewn about randomly. That's unlikely, but allowed.
In any protected mode operating system you are not seeing real memory addresses, they're user-space addresses that might look and feel very real, but they're remapped by the CPU to their actual location in memory, or perhaps not even, as that memory could have been swapped out to disk, compressed, or other more mysterious and confusing things that are all hidden away by the kernel and CPU.
While you can take the difference of two locations within a given allocation, as in through malloc or calloc, the difference between two arbitrary allocations or objects is really not meaningful. Not only does the kernel add an abstraction layer, it will deliberately scramble the allocations it gives you through Address Space Layout Randomization as a measure to make your allocations more unpredictable.
Why? To make it harder to weaponize a buffer overflow bug.
So if you're curious about the position of variables in memory, that's great, have a look, explore, but don't presume that the strategy used by your compiler, operating system, or CPU won't change in the future in some dramatic way.
On any modern 64-bit CPU and operating system there's a huge amount of address space to work with, like 18,446,744,073,709,551,616 possible bytes, and while large chunks of this are walled off and reserved, there's still a nearly inexhaustible amount of space left. That's also multiplied by the fact that each process has its own address space, so there's actually a lot more than that in theory to work with.
Fun fact: Before 64-bit CPUs took hold there were unusual 36-bit memory schemes where a 32-bit operating system and CPU could address more than 4GB of memory, but each individual process could only "see" 4GB since it uses 32-bit pointers.
Memory allocated for malloc may be used for any object with a fundamental alignment requirement, which includes all the “built in” types (e.g., including special types a compiler might provide as an extension), per C 2018 7.22.3 1, and therefore all such objects must share the address space used by malloc.
Further, any types of objects can be put into a structure or union together and therefore must share an address space.

size of pointers in c language

What is meant by the size of a pointer? Shouldn't the size of a pointer depend on the type? Most sources say the size of a pointer is 4 or 8 bytes. I need some clarity on this claim.
By the size of a pointer I mean the number of bits (or bytes) necessary to hold that pointer in memory, or to send its value across some channel; this is (possibly) different from the size of the object the pointer points to.
Given that, the claim that pointer sizes are commonly 32 or 64 bits (4 or 8 bytes) is fairly accurate, in the sense that the most talked-about systems (computers, smartphones and tablets) have pointers of that size.
But there are other systems around, smaller like DOS-based PCs or microcontrollers for embedded systems, where a pointer can be 16 bits wide or even less, and bigger systems with bus width of, say, 128 bits.
I worked in the past with the Intel 8051 CPU, which had pointers 8 bits wide, 16 bits wide, and 24 bits wide. Of course they were not freely mixable... That CPU was indeed quite strange, having about 3-4 different (and little) areas of memory; a "specialized" pointer could point only in its special area, while the 24 bit wide one could point to any area because in the upper byte there was a "selector".
Another matter is the size of the object the pointer points to. On normal computers it is a byte, but on certain systems it is impossible to address bytes at odd addresses in this way, so pointer arithmetic gets complicated. The 8051 (I like it!) even had pointers pointing to bits! So the size of the pointed-to object was actually an eighth of a byte, and incrementing the pointer by one might or might not address a different memory location than before.
Data is stored in memory. That memory has an address. Pointers hold the memory address for where the data starts.
Specifically, pointers usually hold the address of the "first byte" of data where the type resides (note that technically, the first byte might contain the last bits of data, depending on endianness).
i.e., if a long double is 128bit (16 bytes), the pointer value will point to the first byte and the pointer type will indicate the numbers of bytes that should be read.
Should you "cast" the long double pointer in the example to an int * (an int pointer), only sizeof(int) bytes would be read - but the value, the address of the first byte, will remain the same.
Hence, the pointer value is oblivious to the size of the data, the pointer only needs to be large enough to contain the address of the first byte. For this reason, usually pointers have the same length which is derived from a computer's "address space".
It is very similar to a catalog card in a library. Just like a "book address" in a library depends on the size of the library, the pointer value (the memory address) depends on the size of the computer's "address space", not the size of the type.
On most 32-bit and 64-bit CPUs, the address space is limited to either 32 or 64 bits. However, some systems have special address spaces for special pointers (such as function pointers)... this is mostly obsolete. It was more common when CPUs were smaller than 32 bits and the "address space" was limited.
Note that values in the address space (pointers) can point to any location on the hardware (usually a byte in memory, but sometimes a register or a piece of hardware)... this is why the OS (kernel), leveraging some hardware support, will usually expose a "virtual" address space per process, shielding the hardware and other processes from a misbehaving process.
P.S.
I loved the answer given by @linuxfansaysReinstateMonica ... However, I found that I wanted to clarify some of the information in that answer. You should really read it. This answer is mostly a clarification of theirs.

sizeof Pointer differs for data type on same architecture

I have been going through some posts and noticed that pointers can be different sizes according to sizeof depending on the architecture the code is compiled for and running on. Seems reasonable enough to me (ie: 4-byte pointers on 32-bit architectures, 8-byte on 64-bit, makes total sense).
One thing that surprises me is that the size of a pointer can differ based on the data type it points to. I would have assumed that, on a 32-bit architecture, all pointers would be 4 bytes in size, but it turns out that function pointers can be a different size (i.e. larger than what I would have expected). Why is this, in the C programming language? I found an article that explains this for C++, and how the program may have to cope with virtual functions, but this doesn't seem to apply in pure C. Also, it seems the use of "far" and "near" pointers is no longer necessary, so I don't see those entering the equation.
So, in C, what justification, standard, or documentation describes why not all pointers are the same size on the same architecture?
Thanks!
The C standard lays down the law on what's required:
All data pointers can be converted to void* and back without loss of information.
All struct-pointers have the same representation+alignment and can thus be converted to each other.
All union-pointers have the same representation+alignment and can thus be converted to each other.
All character pointers and void pointers have the same representation+alignment.
All pointers to qualified and unqualified compatible types shall have the same representation+alignment. (For example unsigned / signed versions of the same type are compatible)
Any function pointer can be converted to any other function pointer type and back again without loss of information.
Nothing more is required.
The committee arrived at these guarantees by examining all current implementations and machines and codifying as many guarantees as they could.
On architectures where pointers are naturally word pointers instead of character pointers, you get data pointers of different sizes.
On architectures with differently sized code and data spaces (many microcontrollers), or where additional information is needed to invoke functions properly (like Itanium, though implementations often hide that behind a data pointer), you get code pointers of a different size from data pointers.
So, in C, what justification, standard, or documentation describes why not all pointers are the same size on the same architecture?
C11 : 6.2.5 p(28):
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
6.3.2.3 Pointers p(8):
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
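Taken together, these clauses mean that pointers to data and pointers to functions need not be the same size: function pointers are only guaranteed to convert to other function pointer types, and are not covered by the representation requirements above.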
One additional point:
Q: So, is it safe to say that, while I don't have to explicitly use the far/near keywords when defining a pointer, this is handled automatically "under the hood" by the compiler?
A: http://www.unix.com/programming/45002-far-pointer.html
It's a historical anachronism from segmented architectures such as the
8086.
Back in the days of yore there was the 8080, an 8-bit
processor with a 16-bit address bus, hence 16-bit pointers.
Along came the 8086. In order to support some level of backward
compatibility it adopted a segmented architecture which let you use
either 16-bit, 20-bit or 32-bit pointers depending on the day of the
week, where a pointer was a combination of a 16-bit segment register and
a 16-bit near offset. This led to the rise of tiny, small, medium,
large and huge memory models with near, far and huge pointers.
Other architectures such as the 68000 did not adopt this scheme and had
what is called a flat memory model.
With the 80386 and true 32-bit mode, all pointers are 32 bits wide, but
ironically they are now really near pointers that happen to be 32 bits wide; the operating
system hides the segments from you.
I compiled this on three different platforms; the char * pointer was identical to the function pointer in every case:
CODE:
#include <stdio.h>
int main (int argc, char *argv[]) {
    char * cptr = NULL;
    void (*fnptr)() = NULL;
    /* sizeof yields a size_t, which is printed with %zu */
    printf ("sizeof cptr=%zu, sizeof fnptr=%zu\n",
        sizeof (cptr), sizeof (fnptr));
    return 0;
}
RESULTS:
                    char ptr   fn ptr
                    --------   ------
Win8/MSVS 2013      4          4
Debian7/i686/GCC    4          4
Centos/amd64/GCC    8          8
Some architectures support multiple kinds of address spaces. While nothing in the Standard would require that implementations provide access to all address spaces supported by the underlying platform, and indeed the Standard offers no guidance as to how such support should be provided, the ability to support multiple address spaces may make it possible for a programmer who is aware of them to write code that works much better than would otherwise be possible.
On some platforms, one address space will contain all the others, but accessing things in that address space will be slower (sometimes by 2x or more) than accessing things which are known to be in a particular part of it. On other platforms, there won't be any "master" address space, so different kinds of pointers will be needed to access things in different spaces.
I disagree with the claim that the existence of multiple address spaces should be viewed as a relic. On a number of ARM processors, it would be possible for a program to have up to 1K-4K (depending upon the exact chip) of globals which could be accessed twice as quickly as--and with less code than--"normal" global variables. I don't know of any ARM compilers that would exploit that, but there's no reason a compiler for the ARM couldn't do so.

Size of pointers and if that size is dependent of the architecture

Well, sorry for the question; it is more of a general-knowledge one (I haven't found precise answers).
If I have something like
char * Field
or
void * Field
or
double pointers
Is the size of the pointer the same? (As far as I remember from college it was 4 bytes, but ...)
Does the size of the pointer depend on the architecture of the CPU?
If I point to a data structure, the size of the pointer itself is the same, isn't it?
Assume the examples are in C (I would be prone to believe that it will be the same for other languages that do not handle pointers directly)
Is the size of the pointer the same? (As far as I remember from college it was 4 bytes, but ...)
Not necessarily the same and not necessarily 4 bytes: Are all data pointers the same size in one platform for all data types?
Does the size of the pointer depend on the architecture of the CPU?
It varies from architecture to architecture. Even on the same hardware it can vary from operating system to operating system (e.g. 32-bit vs 64-bit).
If I point to a data structure, the size of the pointer itself is the same, isn't it?
Again, not necessarily: Are all data pointers the same size in one platform for all data types?
On most systems the size of all pointers is the same, but C doesn't guarantee that. It only promises that void* is wide enough to hold every pointer type (except pointers to functions). And yes, it depends on the CPU. (On 64-bit systems, a pointer is usually 8 bytes.)
A 32-bit system usually has pointers of 4 bytes, and a 64-bit machine usually has pointers of 8 bytes.
The keyword here is, of course, "usually": it is entirely possible that the device you may be using is based on a Harvard architecture (or some other bus scheme), which has separate memories for data and code.
Hence there are separate buses with different widths, so it is possible that the size of object pointers (int*, double*, long int* etc.) is 8 bits but the size of function pointers is 16 bits, on the very same architecture.
1) On mainstream flat-memory platforms, the size of a pointer is the same for all pointer types.
2) Generally on a 32 bit architecture it will be 4 bytes, and on a 64 bit architecture it will be 8 bytes.
3) The size of the pointer will be the same no matter what you point it to.

Does the size of pointers vary in C? [duplicate]

Possible Duplicates:
Can the Size of Pointers Vary Depending on what’s Pointed To?
Are there are any platforms where pointers to different types have different sizes?
Is it possible that the size of a pointer to a float in c differs from a pointer to int? Having tried it out, I get the same result for all kinds of pointers.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    /* sizeof yields a size_t, which is printed with %zu */
    printf("sizeof(int*): %zu\n", sizeof(int*));
    printf("sizeof(float*): %zu\n", sizeof(float*));
    printf("sizeof(void*): %zu\n", sizeof(void*));
    return 0;
}
Which outputs here (OSX 10.6 64bit)
sizeof(int*): 8
sizeof(float*): 8
sizeof(void*): 8
Can I assume that pointers of different types have the same size (on one arch of course)?
Pointers are not always the same size on the same arch.
You can read more on the concept of "near", "far" and "huge" pointers, just as an example of a case where pointer sizes differ...
http://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer_sizes
In days of old, using e.g. Borland C compilers on the DOS platform, there were a total of (I think) 5 memory models which could even be mixed to some extent. Essentially, you had a choice of small or large pointers to data, and small or large pointers to code, and a "tiny" model where code and data had a common address space of (If I remember correctly) 64K.
It was possible to specify "huge" pointers within a program that was otherwise built in the "tiny" model. So in the worst case it was possible to have different sized pointers to the same data type in the same program!
I think the standard doesn't even forbid this, so theoretically an obscure C compiler could do this even today. But there are doubtless experts who will be able to confirm or correct this.
Pointers to data must always be compatible with void* so generally they would be nowadays realized as types of the same width.
This statement is not true for function pointers, they may have different width. For that reason in C99 casting function pointers to void* is undefined behavior.
As I understand it there is nothing in the C standard which guarantees that pointers to different types must be the same size, so in theory an int * and a float * on the same platform could be different sizes without breaking any rules.
There is a requirement that char * and void * have the same representation and alignment requirements, and there are various other similar requirements for different subsets of pointer types but there's nothing that encompasses everything.
In practise you're unlikely to run into any implementation that uses different sized pointers unless you head into some fairly obscure places.
Yes. It's uncommon, but this would certainly happen on systems that are not byte-addressable. E.g. a 16 bit system with 64 Kword = 128KB of memory. On such systems, you can still have 16 bits int pointers. But a char pointer to an 8 bit char would need an extra bit to indicate highbyte/lowbyte within the word, and thus you'd have 17/32 bits char pointers.
This might sound exotic, but many DSPs spend 99.x% of their time executing specialized numerical code. A sound DSP can be a bit simpler if all it has to deal with is 16-bit data, leaving the occasional 8-bit math to be emulated by the compiler.
I was going to write a reply saying that C99 has various pointer conversion requirements that more or less ensure that pointers to data have to be all the same size. However, on reading them carefully, I realised that C99 is specifically designed to allow pointers to be of different sizes for different types.
For instance on an architecture where the integers are 4 bytes and must be 4 byte aligned an int pointer could be two bits smaller than a char or void pointer. Provided the cast actually does the shift in both directions, you're fine with C99. It helpfully says that the result of casting a char pointer to an incorrectly aligned int pointer is undefined.
See the C99 standard. Section 6.3.2.3
Yes, the size of a pointer is platform dependent. More specifically, the size of a pointer depends on the target processor architecture and the "bit-ness" you compile for.
As a rule of thumb, on a 64bit machine a pointer is usually 64bits, on a 32bit machine usually 32 bits. There are exceptions however.
Since a pointer is just a memory address, it's always the same size regardless of what the memory it points to contains. So pointers to a float, a char or an int are all the same size.
Can I assume that pointers of different types have the same size (on one arch of course)?
For the platforms with flat memory model (== all popular/modern platforms) pointer size would be the same.
For the platforms with segmented memory model, for efficiency, often there are platform-specific pointer types of different sizes. (E.g. far pointers in the DOS, since 8086 CPU used segmented memory model.) But this is platform specific and non-standard.
You should probably keep in mind that in C++ the size of a normal pointer might differ from the size of a pointer to a virtual method. Pointers to virtual methods have to carry extra information in order to work properly with polymorphism. This is probably the only exception I'm aware of that is still relevant (since I doubt that the segmented memory model will ever make a comeback).
There are platforms where function pointers are a different size than other pointers.
I've never seen more variation than this. All other pointers must be at most sizeof(void*) since the standard requires that they can be cast to void* without loss of information.
Pointer is a memory address - and hence should be the same on a specific machine. 32 bit machine => 4Bytes, 64 bit => 8 Bytes.
Hence irrespective of the datatype of the thing that the pointer is pointing to, the size of a pointer on a specific machine would be the same (since the space required to store a memory address would be the same.)
Assumption: I'm talking about near pointers to data values, the kind you declared in your question.

Resources