Is there any way the size of the pointer can be changed from 2 bytes? - c

Can we anyhow change the size of the pointer from 2 bytes so it can occupy more than 2 bytes?

Sure, compile for a 32 (or 64) bit platform :-)
The size of pointers is platform-specific; it would be 2 bytes only on 16-bit platforms (which have not been widely used for more than a decade; nowadays all mainstream desktop, laptop, and server platforms are at least 32 bits).

If your pointer size is 2 bytes, that means you're running on a 16-bit system.
The only way to increase the pointer size is to use a 32-bit or 64-bit system instead (which would mean any desktop or laptop computer built in the last 15 years or so).
If you're running on some embedded device that uses 16 bits, your only option would be to switch to another device that uses 32 bits (or just live with your pointers being 16-bit).

When a processor is said to be "X-bit" (where X is 16, 32, 64, etc.), that X generally refers to the size of the memory address register. Thus a 16-bit system has a memory address register of 2 bytes.
You cannot cast a 4-byte address to anything smaller because it would lose part of where it's pointing to. (A 2-byte memory address register can only point to 2^16 bytes = 64 KB of memory, whereas a 4-byte register can point to 2^32 bytes = 4 GB of memory.)
You can always "step up" (i.e., run a 32-bit software application on a 64-bit computer) because there's no loss in pointer range. But you can never step down, which is why 64-bit programs don't run on 32-bit systems.

Think of a pointer as a number, only instead of an actual value used for computation, it's the number of a 'slot' in the memory map of the system.
A pointer must be able to represent the highest position of the memory map. That is, it must have at least the number of bytes required to represent the number of the highest position.
In a 16-bit system, the highest possible position is 0xFFFF (a 16-bit number with all the bits set to 1). A pointer must also have 16 bits, so it can reach that number.
Generalizing, in an X-bit system, a pointer will have X bits.
You can store a pointer in a larger variable, the same way you can store the number 1 in a char, an int, or an unsigned long long; but there's little point to that. Just as a shorter pointer can't reach the highest memory position, a longer pointer could point to things that can't actually exist in memory, so why have it?
Also, you'd have to 'trick' the compiler for that. If you use pointer notation in your code, the compiler will always use the correct number of bytes for it. You can instruct the compiler to compile for another platform, though.

Related

Why does the size of the char* data type correspond to the computer's word size (4 bytes or 8 bytes), while a char gets only 1 byte?

As far as I know, the pointer data types (char *, etc.) get the word size of the system. If I think of memory as a grid, one field is the size of one word (so 4 bytes for a 32-bit system and 8 bytes for a 64-bit system). So the idea is to give the pointer exactly one field because that's convenient (better performance?). But then I wonder why a simple char gets only one byte. That would be 1/4 of a field. Why is that? And what happens to the remaining 3 bytes of the box?
The correct way to convert between pointers and integers in C is via the type intptr_t (declared in <stdint.h>). This is the best way to store a pointer in an integer.
Your question is tied to computer hardware. The C language influenced the hardware design.
There is a distinction between the data path and control. The data path is the hard-wired part of the hardware, and it contains buses of N wires. There are buses for addresses and buses for data, and they do not always have the same number of wires. The C language makes a pointer to an object big enough to cover all the possible addresses on the target's address buses (in some architectures code is accessed over different buses, and there the size of a pointer to a function may differ). For practical reasons, the control part provides instructions that access data in different sizes, depending on the need. If you need to work with small integers, there is no reason to access them 4 bytes at a time. They can be packed more compactly.
But yes, there are C compilers that compile a char in 4 bytes (I have never seen any, but they exist).
the pointer data types (char *, etc.) get the word size of the system
Not exactly: they usually have the size of the address space, i.e., enough bits to address any data in RAM. Note, however, that this can be more bits than the word size (the size of a typical CPU register), as was the case in some older systems: the 8088, 8086, 80186 and 80286 had 16-bit registers but an address space ranging from 20 to 24 bits, requiring a pair of words to express an address. These systems actually had various compilation modes where pointers could be 16-bit or 32-bit depending on the amount of memory the program could use.
But then I wonder why a simple char gets only one byte?
A byte is, by definition, the smallest item of memory that can be addressed directly. The C language maps this to the char type. On most systems, this is an octet comprising 8 bits, which happens to be the smallest possible size for a char. The address space is expressed in this unit, even if the data bus is wider. For example, 64-bit Intel processors typically have a 128-bit data bus, but addresses are still expressed in units of 8 bits. Some specific CPUs such as DSPs (digital signal processors) may not have this capability and can only address 16-bit or even 32-bit words. On these systems, a byte can be 16 or even 32 bits wide, and the C compiler either uses this width for the char type or emulates a smaller char type in software.
what happens to the remaining 3 bytes of the box?
Nothing special, the box is a pack of 2, 4, 8 or more bytes, each of which can be addressed directly and independently, either as the hardware allows it or through software emulation.

Why 2 raised to 32 power results in a number in bytes instead of bits?

I have just resumed studying C programming. Right now I'm studying memory storage capacity and the difference between a bit and a byte. I came across this definition.
There is a calculation for a 32-bit system. I'm very confused, because in this calculation 2^32 = 4294967296 bytes, which means about 4 gigabytes. My question is: why does 2 raised to the 32nd power result in a number of bytes instead of bits?
Thanks for helping me.
Because the memory is byte-addressable (that is, each byte has its own address).
There are two ways to look at this:
A 32-bit integer can hold one of 2^32 different values. Thus, a uint32_t can represent the values from 0 to 4294967295.
A 32-bit address can represent 2^32 different addresses. And as Scott said, on a byte-addressable system, that means 2^32 different bytes can be addressed. Thus, a process with 32-bit pointers can address up to 4 GiB of virtual memory. Or, a microprocessor with a 32-bit address bus can address up to 4 GiB of RAM.
That description is really superficial and misses a lot of important considerations, especially as to how memory is defined and accessed.
Fundamentally an N-bit value has 2^N possible states, so a 16-bit value has 65,536 possible states. Additionally, memory is accessed as bytes, or 8-bit values. This was not always the case: older machines had different "word" sizes, anywhere from 4 to 36 bits per word, occasionally more, but over time the 8-bit word, or "byte", became the dominant form.
In every case a memory "address" contains one "word" or, on more modern machines, "byte". Memory is measured in these units, like "kilowords" or "gigabytes", for reasons of simplicity even though the individual memory chips themselves are specified in terms of bits. For example, a 1 gigabyte memory module often has 8 gigabit chips on it. These chips are read at the same time, the resulting data combined to produce a single byte of memory.
By that article's wobbly definition this means a 16-bit CPU can only address 64KB of memory, which is wrong. DOS systems from the 1980s used two 16-bit values to represent a memory address, a segment and an offset, and could address 1MB using an effective 20-bit pointer on the 8086 (and up to 16MB with 24-bit addresses on the 80286 in protected mode). This isn't the only way in which the raw pointer size and total addressable memory can differ.
Some 32-bit systems also had an alternate 36-bit memory model that allowed addressing up to 64GB of memory, though an individual process was limited to a 4GB slice of the available memory.
In other words, for systems with a single pointer to a memory address and where the smallest memory unit is a byte, the maximum addressable memory is 2^N bytes.
Thankfully, since 64-bit systems are now commonplace and a computer with > 64GB of memory is not even exotic or unusual, addressing systems are a lot simpler now than when having to work around pointer-size limitations.
We say that memory is byte-addressable: you can think of a byte as the smallest unit of memory, so you read bytes, not individual bits. The reason might be that the smallest data type is 1 byte; even the boolean type in C/C++ typically occupies 1 byte.

size of pointers in c language

What is meant by the size of a pointer? Shouldn't the size of a pointer depend on the type? Most of the sources say the size of a pointer is 4 or 8 bytes. I need some clarity on this claim.
By the size of a pointer I mean the number of bits (or bytes) necessary to hold that pointer in memory, or to send its value across some channel, and this is (possibly) different from the size of the object the pointer points to.
So the claim that pointer sizes are commonly 32 or 64 bits (4 or 8 bytes) is fairly accurate, in the sense that the most talked-about systems (computers, smartphones and tablets) have pointers of that size.
But there are other systems around, smaller like DOS-based PCs or microcontrollers for embedded systems, where a pointer can be 16 bits wide or even less, and bigger systems with bus width of, say, 128 bits.
I worked in the past with the Intel 8051 CPU, which had pointers 8 bits wide, 16 bits wide, and 24 bits wide. Of course they were not freely mixable... That CPU was indeed quite strange, having about 3-4 different (and little) areas of memory; a "specialized" pointer could point only in its special area, while the 24 bit wide one could point to any area because in the upper byte there was a "selector".
Another matter is the size of the object the pointer points to. On normal computers it is a byte, but sometimes, on certain systems, it is impossible to address bytes at odd addresses in this way, so pointer arithmetic gets complicated. The 8051 (I like it!) even had pointers pointing to bits! So the size of the pointed-to object was actually an eighth of a byte, and incrementing the pointer by one might or might not address a different memory location than before.
Data is stored in memory. That memory has an address. Pointers hold the memory address for where the data starts.
Specifically, pointers usually hold the address of the "first byte" of data where the type resides (note that technically, the first byte might contain the last bits of data, depending on endianness).
For example, if a long double is 128 bits (16 bytes), the pointer value will point to the first byte and the pointer type will indicate the number of bytes that should be read.
Should you "cast" the long double pointer in the example to an int * (an int pointer), only sizeof(int) bytes would be read - but the value, the address of the first byte, will remain the same.
Hence, the pointer value is oblivious to the size of the data, the pointer only needs to be large enough to contain the address of the first byte. For this reason, usually pointers have the same length which is derived from a computer's "address space".
It is very similar to a catalog card in a library. Just like a "book address" in a library depends on the size of the library, the pointer value (the memory address) depends on the size of the computer's "address space", not the size of the type.
On most 32-bit and 64-bit CPUs, the address space is limited to either 32 or 64 bits. However, some systems have special address spaces for special pointers (such as function pointers)... this is mostly obsolete. It was more in use when CPUs were smaller than 32 bits and the "address space" was limited.
Note that values in the address space (pointers) can point to any location on the hardware (usually a byte in memory, but sometimes a register or a piece of hardware)... this is why the OS (kernel), leveraging some hardware support, will usually expose a "virtual" address space per process, shielding the hardware and other processes from a misbehaving process.
P.S.
I loved the answer given by #linuxfansaysReinstateMonica ... However, I found that I wanted to clarify some of the information in that answer. You should really read it. This answer is mostly a clarification for their answer.

Size of pointers to pointers in memory

Just a quick question:
on a 32 bit machine, is a pointer to a pointer (**p) going to be 4 bytes?
The logic is that pointers are merely memory addresses. The memory address of any stored entity in a machine with 32-bit addresses is almost certainly 4 bytes. Therefore the memory address of a stored pointer is 4 bytes. Therefore a pointer to a pointer is 4 bytes. None of this is promised by the ISO C standard. It's just the way that nearly all implementations turn out.
Yes, it will be 4 bytes, but it's not guaranteed.
Correct. Pointers usually have a fixed size. On a 32-bit machine they are usually 32 bits (= 4 bytes).
Typically yes, on 32-bit machines addresses will be 4 bytes.
Best bet if you don't want to make assumptions is to run the old sizeof(p).
Others have already mentioned that it's most certainly 32 bits or 4 8-bit bytes.
However, depending on the hardware and the compiler it may be less or more than that.
If your machine can address its memory only as 32-bit units at 32-bit boundaries, you will need a bigger pointer to address and access the 8-bit portions (chars/bytes) of every 32-bit memory cell. If the compiler here decides not to have pointers of different sizes, all pointers (including pointers to pointers) become 34 or more bits long.
Likewise, if the program is very small and can fit into 64KB, the compiler may be able to reduce all pointers to 16 bits.

C why would a pointer be larger than an integer

I am playing around with sizeof() in GCC on a linux machine right now and I found something very surprising.
printf("\nsize of int = %zu\n", sizeof(int));
printf("\nsize of int* = %zu\n", sizeof(int *));
yields
size of int = 4
size of int* = 8
I thought the size of a pointer to an integer would be much smaller than the actual integer itself!
I am researching embedded software right now, and I was under the impression that passing by reference is more efficient (in terms of power) than passing by value.
Could someone please clarify why it is more efficient to pass by reference than by value if the size of the pointer is larger than the actual value.
Thanks!
An int can be any size the compiler writer likes; the only rules (in standard C) are: a) int isn't smaller than a short or bigger than a long, and b) int has at least 16 bits.
It's not uncommon on a 64-bit platform to keep int at 32 bits for compatibility.
Passing by reference is more efficient than passing by value when the value to be passed is larger than the size of the reference. It makes a lot of sense to pass by reference if what you are passing is a large struct/object. It also makes sense to pass a reference if you want to make persistent modifications to your value.
Passing by reference is more efficient because no data (other than the pointer) needs to be copied. This means that this is only more efficient when passing classes with many fields or structs or any other data that is larger than a pointer on the system used.
In the case you mentioned it could indeed be more efficient to not use a pointer because the actual value is smaller than a pointer to it (at least on the machine you were using).
Bear in mind that on a 32-bit machine a pointer has 4 bytes (4*8 = 32 bits), while on the 64-bit machine you were apparently using the pointer has 8 bytes (8*8 = 64 bits).
On even older 16-bit machines pointers only require 2 bytes; maybe there are some embedded systems still using this architecture, but I don't know about this...
In C, a pointer, any pointer, is just a memory address. You're on a 64-bit machine, and at the hardware level, memory addresses are referred to with 64-bit values. This is why 64-bit machines can use much more memory than 32-bit machines.
A pointer to an integer can point at a single integer, but the same pointer can also point at ten, twenty, one hundred or one million integers.
Obviously passing a single 8 byte pointer in lieu of a single 4 byte integer is not a win; but passing a single 8 byte pointer in lieu of one million 4 byte integers certainly is.
One thing has nothing to do with the other. One is an address, a pointer to something; it doesn't matter whether that is a char or short or int or structure. The other is a language-specific thing called an int, which the compiler for that system, that version of the compiler, and perhaps its command-line options happen to define as some size.
It appears that you are running on a 64-bit system, so all your pointers/addresses are going to be 64 bits. What they point to is a separate discussion: longs are probably going to be 64 bits as well, sometimes ints too; shorts are probably still 16 bits, but that's not a hard-and-fast rule; and chars are hopefully 8 bits, but that's also not a hard-and-fast rule.
Where this can get even worse is cross-compiling. While I was using llvm-gcc, before clang was as solid as it is now, with a 64-bit host the bytecode was all generated based on the 64-bit host, so 64-bit integers, 64-bit pointers, etc. Then, when you ran the backend for the ARM target, it had to use compiler library calls for all of this 64-bit work. The variables hardly needed to be shorts, much less ints, but were ints. The -m32 switch was broken: you still got 64-bit integers because of the host, not the ultimate target. gcc directly doesn't appear to have this problem, and clang+llvm doesn't currently have this problem either.
The short answer is that the language defines some data types (char, short, int, long, etc.) and those data types have a compiler-implementation-defined size, and an address is just another implementation-defined data type. It is like asking why a short is not the same number of bytes as a long. Because they are different data types: one is a short, one is a long. One is an address, the other is a variable: two different things.
http://developers.sun.com/solaris/articles/ILP32toLP64Issues.html
When converting 32-bit programs to 64-bit programs, only long types
and pointer types change in size from 32 bits to 64 bits; integers of
type int stay at 32 bits in size.
In 64-bit executables, pointers are 64 bits. Long ints are also 64 bits, but ints are only 32 bits.
In 32-bit executables, pointers, ints and long ints are all 32 bits. 32-bit executables also support 64-bit "long long" ints.
A pointer must be able to reference all of memory. If the (virtual) memory is larger than 4 GiB or so, then the pointer must be more than 32 bits.
