How are pointers stored in memory? - c

I'm a little confused about this.
On my system, if I do this:
printf("%d", sizeof(int*));
this will just yield 4. Now, the same happens for sizeof(int). Conclusion: if both integers and pointers are 4 bytes, a pointer can safely be "converted" to an int
(i.e. the address it holds could be stored in an int). However, if I do this:
int* x;
printf("%p", x);
The returned hex address is far beyond the int scope, and thus any attempt to store the value in an int fails obviously.
How is this possible? If the pointer takes 4 bytes of memory, how can it store more than 2^32?
EDIT:
As suggested by a few users, I'm posting the code and the output:
#include <stdio.h>

int main()
{
    printf("%d\n", sizeof(int));
    printf("%d\n", sizeof(int*));
    int *x;
    printf("%d\n", sizeof(x));
    printf("%p\n", x);
}
The output:
4
4
4
0xb7778000

C11, 6.3.2.3, paragraphs 5 and 6:
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
So the conversions are allowed, but the result is implementation-defined (or the behavior is undefined if the result cannot be represented in the integer type). (The "previously specified" refers to NULL.)
As for your claim that the printed pointer value is larger than what 4 bytes of data can represent: this is not true, as 0xb7778000 is within the range of a 32-bit integral type.
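If you really do need to round-trip a pointer through an integer, the portable tool is uintptr_t from <stdint.h> (an optional type, but one that virtually every hosted implementation provides). A minimal sketch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int value = 42;
    int *p = &value;

    /* The guarantee (C11 7.20.1.4) is phrased for void*: a valid
       void* converted to uintptr_t and back compares equal to the
       original pointer. */
    uintptr_t n = (uintptr_t)(void *)p;
    int *q = (int *)(void *)n;

    printf("%d\n", *q); /* prints 42 */
    return 0;
}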

The returned hex address is far beyond the int scope, and thus any attempt to store the value in an int fails obviously.
4
4
4
0xb7778000
And 0xb7778000 is a 32-bit value, so an object of 4 bytes can hold it.

No, they cannot be "safely" converted. Certainly they use the same amount of storage space, but there is no guarantee that both types interpret the same bit pattern in the same way.
As for the second question (and one question per question, please), there is no guaranteed size for int, or for a pointer. An int is roughly the optimal size for data transfer on the bus (also known as a word). It can differ between platforms, but it must be at least as large as a short or char. This is why the standard defines INT_MAX but not a fixed value for it.
A pointer is roughly as many bits wide as needed to address a memory location. The original IBM PC had an 8-bit data bus but a 20-bit physical address space (formed by some fancy shifting of 16-bit segment registers) to extend its memory range past its bus width.
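To see the actual sizes on your own platform, print them with the correct specifiers. A minimal sketch (%zu is the right specifier for a sizeof result, and the limit macro lives in <limits.h>):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* All of these are implementation-defined; a common 64-bit
       platform prints 4, 8 and 2147483647. */
    printf("sizeof(int)   = %zu\n", sizeof(int));
    printf("sizeof(int *) = %zu\n", sizeof(int *));
    printf("INT_MAX       = %d\n", INT_MAX);
    return 0;
}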

Related

what happens when we type cast from lower datatype to higher datatype

Does the accessible memory actually change, or does the cast just tell the compiler to treat the variable as the mentioned type?
Example:
#include <stdio.h>

int main()
{
    char a;
    a = 123456789;
    printf("ans is %d\n", (int)a);
}
Compiler warning: overflow in implicit constant conversion, a = 123456789.
Output:
ans is 21
Here I know why it's causing overflow. But I want to know how memory is accessed when an overflow occurs.
This is kind of simple: since char holds only a single byte, only a single byte of 123456789 will be copied to a. Exactly how depends on whether char is signed or unsigned (which one it is, is implementation-defined). For the exact details see e.g. this integer conversion reference.
What typically happens (I haven't seen any compiler do otherwise) is that the last byte of the value is copied, unmodified, into a.
For 123456789, if you view the hexadecimal representation of the value, it is 0x75bcd15. Here you can easily see that the last byte is 0x15, which is 21 in decimal.
The cast to int when you print the value actually does nothing that wouldn't happen anyway: when calling variadic functions like printf, values of types smaller than int are promoted to int. Your printf call is exactly equivalent to
printf("ans is %d\n",a);

unexpected byte order after casting pointer-to-char into pointer-to-int

unsigned char tab[4] = {0, 0, 0, 14};
If I print as individual bytes...
printf("tab[1] : %u\n", tab[0]); // output: 0
printf("tab[2] : %u\n", tab[1]); // output: 0
printf("tab[3] : %u\n", tab[2]); // output: 0
printf("tab[4] : %u\n", tab[3]); // output: 14
If I print as an integer...
unsigned int fourbyte;
fourbyte = *((unsigned int *)tab);
printf("fourbyte : %u\n", fourbyte); // output: 234881024
My output in binary is : 00001110 00000000 00000000 00000000, which is the data I wanted but in this order tab[3] tab[2] tab[1] tab[0].
Any explanation of why the unsigned int appears to take the last byte as its most significant byte, instead of the first?
The correct answer here is that you should not have expected any relationship, order or otherwise. Except for unions, the C standard does not define a linear address space in which objects of different types can overlap. On many architecture/compiler-tool-chain combinations these coincidences do occur from time to time, but you should never rely on them. The fact that casting a pointer to a suitable scalar type yields a number comparable to others of the same type in no way implies that the number is any particular memory address.
So:
int* p;
int z = 3;
int* pz = &z;
size_t cookie = (size_t)pz;
p = (int*)cookie;
printf("%d", *p); // Prints 3.
Works because the standard says it must work when cookie is derived from the same type of pointer that it is being converted to. Converting to any other type is undefined behavior. Pointers do not represent memory, they reference 'storage' in the abstract. They are merely references to objects or NULL, and the standard defines how pointers to the same object must behave and how they can be converted to scalar values and back again.
Given:
char array[5] = "five";
The standard says that &(array[0]) < &(array[1]) and that (&(array[0]) + 1) == &(array[1]), but it is silent on how the elements of array are ordered in memory. The compiler writers are free to use whatever machine code and memory layout they deem appropriate for the target architecture.
In the case of unions, which provides for some overlap of objects in storage, the standard only says that each of its fields must be suitably aligned for their types, but just about everything else about them is implementation defined. The key clause is 6.2.6.1 p7:
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
The gist of all of this is that the C standard defines an abstract machine. The compiler generates an architecture specific simulation of that machine based on your code. You cannot understand the C abstract machine through simple empirical means because implementation details bleed into your data set. You must limit your observations to those that are relevant to the abstraction. Therefore, avoid undefined behavior and be very aware of implementation defined behaviors.
Your example code is running on a computer that is Little-Endian. This term means that the "first byte" of an integer contains the least significant bits. By contrast, a Big-Endian computer stores the most significant bits in the first byte.
Edited to add: the way that you've demonstrated this is decidedly unsafe, as it relies upon undefined behavior to get "direct access" to the memory. There is a safer demonstration here
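A minimal sketch of such a safer check, using memcpy to inspect the object representation (reading the bytes this way is well-defined, unlike dereferencing a type-punned pointer):

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int value = 14;
    unsigned char bytes[sizeof value];

    /* Copy the object representation out, byte by byte. */
    memcpy(bytes, &value, sizeof value);

    for (size_t i = 0; i < sizeof value; i++)
        printf("byte %zu: %u\n", i, bytes[i]);

    /* Assuming a 4-byte int: a little-endian machine prints
       14 0 0 0, a big-endian machine prints 0 0 0 14. */
    return 0;
}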

C Pointer Arithmetic for Unusual Architectures

I'm trying to get a better understanding of the C standard. In particular I am interested in how pointer arithmetic might work in an implementation for an unusual machine architecture.
Suppose I have a processor with 64 bit wide registers that is connected to RAM where each address corresponds to a cell 8 bits wide. An implementation for C for this machine defines CHAR_BIT to be equal to 8. Suppose I compile and execute the following lines of code:
char *pointer = 0;
pointer = pointer + 1;
After execution, pointer is equal to 1. This gives one the impression that in general data of type char corresponds to the smallest addressable unit of memory on the machine.
Now suppose I have a processor with 12 bit wide registers that is connected to RAM where each address corresponds to a cell 4 bits wide. An implementation of C for this machine defines CHAR_BIT to be equal to 12. Suppose the same lines of code are compiled and executed for this machine. Would pointer be equal to 3?
More generally, when you increment a pointer to a char, is the address equal to CHAR_BIT divided by the width of a memory cell on the machine?
Would pointer be equal to 3?
Well, the standard doesn't say how pointers are implemented. The standard tells what is to happen when you use a pointer in a specific way but not what the value of a pointer shall be.
All we know is that adding 1 to a char pointer will make it point at the next char object, wherever that is. It says nothing about the pointer's value.
So when you say that
pointer = pointer + 1;
will make the pointer equal 1, that's wrong. The standard doesn't say anything about that.
On most systems a char is 8 bits and pointers are (virtual) memory addresses referencing 8-bit-addressable memory locations. On such systems, incrementing a char pointer will increase the pointer value (i.e. the memory address) by 1. However, on unusual architectures there is no way to tell.
But if you have a system where each memory address references 4 bits and a char is 12 bits, it seems a good guess that ++pointer will increase the pointer by three.
Pointers are incremented by the size of the datatype they point to, but that size is not guaranteed to be the minimal width of the data.
For memory alignment purposes, the size of a type (and hence the stride of a pointer increment) often extends past the minimum width to the next word-aligned boundary.
So, in general, you cannot assume this pointer to be equal to 3. It very well may be 3, 4, or some larger number.
Here is an example.
struct char_three {
    char a;
    char b;
    char c;
};

struct char_three *my_pointer = 0;
my_pointer++;
/* I'd be shocked if my_pointer was now 3 */
Memory alignment is machine-specific. One cannot generalize about it, except that most machines define a WORD as the unit that the bus fetches in a single aligned access. Some machines can specify addresses that don't align with bus fetches; in such a case, selecting two bytes that span the alignment boundary may result in loading two WORDs.
Most systems don't accept WORD loads on non-aligned boundaries without complaining. This means that a bit of boilerplate assembly is applied to translate the fetch to the preceding WORD boundary, if maximum density is desired.
Most compilers prefer speed to maximum density of data, so they align their structured data to take advantage of WORD boundaries, avoiding the extra calculations. This means that in many cases, data that is not carefully aligned may contain "holes" of bytes that are not used.
If you are interested in the details behind the above summary, you can read up on Data Structure Alignment, which discusses alignment and, as a consequence, padding.
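A minimal sketch of such a padding "hole", using offsetof from <stddef.h> (the exact numbers are implementation-defined; the ones in the comments are typical):

#include <stdio.h>
#include <stddef.h>

struct mixed {
    char c;     /* 1 byte */
    int  i;     /* typically 4 bytes, aligned to a 4-byte boundary */
};

int main(void)
{
    /* On common platforms this prints 8 and 4, not 5 and 1:
       three unused padding bytes follow 'c' so that 'i' is aligned. */
    printf("sizeof(struct mixed)     = %zu\n", sizeof(struct mixed));
    printf("offsetof(struct mixed,i) = %zu\n", offsetof(struct mixed, i));
    return 0;
}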
char *pointer = 0;
After execution, pointer is equal to 1
Not necessarily.
This special case gives you a null pointer, since 0 is a null pointer constant. Strictly speaking, such a pointer is not supposed to point at a valid object. If you look at the actual address stored in the pointer, it could be anything.
Null pointers aside, the C language expects you to do pointer arithmetic by first pointing at an array. Or, in the case of char, you can also point at a chunk of generic data such as a struct. Everything else, like your example, is undefined behavior.
An implementation of C for this machine defines CHAR_BIT to be equal to 12
The C standard defines char to be exactly one byte, so your example is a bit odd and self-contradictory. Pointer arithmetic will always advance the pointer to the next object in the array. The standard doesn't really speak of the representation of addresses at all, but in your fictional example the sensible choice would be to increase the address by 12 bits, i.e. three 4-bit cells, because that's the size of a char.
Fictional computers are quite meaningless to discuss, even from a learning point of view. I'd advise focusing on real-world computers instead.
When you increment a pointer to a char, is the address equal to CHAR_BIT divided by the width of a memory cell on the machine?
On a "conventional" machine -- indeed on the vast majority of machines where C runs -- CHAR_BIT simply is the width of a memory cell on the machine, so the answer to the question is vacuously "yes" (since CHAR_BIT / CHAR_BIT is 1.).
A machine with memory cells smaller than CHAR_BIT would be very, very strange -- arguably incompatible with C's definition.
C's definition says that:
sizeof(char) is exactly 1.
CHAR_BIT, the number of bits in a char, is at least 8. That is, as far as C is concerned, a byte may not be smaller than 8 bits. (It may be larger, and this is a surprise to many people, but it does not concern us here.)
There is a strong suggestion (if not an explicit requirement) that char (or "byte") is the machine's "minimum addressable unit" or some such.
So for a machine that can address 4 bits at a time, we would have to pick unnatural values for sizeof(char) and CHAR_BIT (which would otherwise probably want to be 2 and 4, respectively), and we would have to ignore the suggestion that type char is the machine's minimum addressable unit.
C does not mandate the internal representation (the bit pattern) of a pointer. The closest a portable C program can get to doing anything with the internal representation of a pointer value is to print it out using %p -- and that's explicitly defined to be implementation-defined.
So I think the only way to implement C on a "4 bit" machine would involve having the code
char a[10];
char *p = a;
p++;
generate instructions which actually incremented the address behind p by 2.
It would then be an interesting question whether %p should print the actual, raw pointer value, or the value divided by 2.
It would also be lots of fun to watch the ensuing fireworks as too-clever programmers on such a machine used type punning techniques to get their hands on the internal value of pointers so that they could increment them by actually 1 -- not the 2 that "proper" additions of 1 would always generate -- such that they could amaze their friends by accessing the odd nybble of a byte, or confound the regulars on SO by asking questions about it. "I just incremented a char pointer by 1. Why is %p showing a value that's 2 greater?"
Seems like the confusion in this question comes from the fact that the word "byte" in the C standard doesn't have its typical definition (8 bits). Specifically, the word "byte" in the C standard means a collection of bits, where the number of bits is specified by the implementation-defined constant CHAR_BIT. Furthermore, a "byte" as defined by the C standard is the smallest addressable object that a C program can access.
This leaves open the question as to whether there is a one-to-one correspondence between the C definition of "addressable", and the hardware's definition of "addressable". In other words, is it possible that the hardware can address objects that are smaller than a "byte"? If (as in the OP) a "byte" occupies 3 addresses, then that implies that "byte" accesses have an alignment restriction. Which is to say that 3 and 6 are valid "byte" addresses, but 4 and 5 are not. This is prohibited by section 6.2.8 which discusses the alignment of objects.
Which means that the architecture proposed by the OP is not supported by the C specification. In particular, an implementation may not have pointers that point to 4-bit objects when CHAR_BIT is equal to 12.
Here are the relevant sections from the C standard:
§3.6 The definition of "byte" as used in the standard
[A byte is an] addressable unit of data storage large enough to hold
any member of the basic character set of the execution environment.
NOTE 1 It is possible to express the address of each individual byte
of an object uniquely.
NOTE 2 A byte is composed of a contiguous sequence of bits, the number
of which is implementation-defined. The least significant bit is
called the low-order bit; the most significant bit is called the
high-order bit.
§5.2.4.2.1 describes CHAR_BIT as the
number of bits for smallest object that is not a bit-field (byte)
§6.2.6.1 restricts all objects that are larger than a char to be a multiple of CHAR_BIT bits:
[...]
Except for bit-fields, objects are composed of contiguous sequences of
one or more bytes, the number, order, and encoding of which are either
explicitly specified or implementation-defined.
[...] Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes.
§6.2.8 restricts the alignment of objects
Complete object types have alignment requirements which place
restrictions on the addresses at which objects of that type may be
allocated. An alignment is an implementation-defined integer value
representing the number of bytes between successive addresses at which
a given object can be allocated.
Valid alignments include only those values returned by an _Alignof
expression for fundamental types, plus an additional
implementation-defined set of values, which may be empty. Every
valid alignment value shall be a nonnegative integral power of two.
§6.5.3.4 specifies the sizeof a char, and hence a "byte"
When sizeof is applied to an operand that has type char, unsigned
char, or signed char, (or a qualified version thereof) the result is
1.
The following code fragment demonstrates an invariant of C pointer arithmetic -- no matter what CHAR_BIT is, no matter what the hardware least addressable unit is, and no matter what the actual bit representation of pointers is,
#include <assert.h>

typedef int T;   /* stands in for any object type T whatsoever */

int main(void)
{
    T x[2];
    assert(&x[1] - &x[0] == 1); // must be true
}
And since sizeof(char) == 1 by definition, this also means that
#include <assert.h>

typedef int T;   /* again, any object type */

int main(void)
{
    T x[2];
    char *p = (char *)&x[0];
    char *q = (char *)&x[1];
    assert(q - p == sizeof(T)); // must be true
}
However, if you convert to integers before performing the subtraction, the invariant evaporates:
#include <assert.h>
#include <inttypes.h>

typedef int T;   /* any object type */

int main(void)
{
    T x[2];
    uintptr_t p = (uintptr_t)&x[0];
    uintptr_t q = (uintptr_t)&x[1];
    assert(q - p == sizeof(T)); // implementation-defined whether true
}
because the transformation performed by converting a pointer to an integer of the same size, or vice versa, is implementation-defined. I think it's required to be bijective, but I could be wrong about that, and it is definitely not required to preserve any of the above invariants.

issue in double pointer address addition

I ran into an issue with pointers in some open-source code, which I have tried to reproduce in the small snippet below.
#include <stdio.h>

int main()
{
    int **a = 0x0;
    printf("a = %d Add = %d\n", a, a + 75);
    return 1;
}
I expected to get 75 (0x4B), but this code gives 300 on 32-bit machines and 600 on 64-bit machines.
Output:
a = 0 Add = 600
The idea behind it is to access the 75th position in a hash table.
So it should be
printf ("a = %d Add = %d\n", a, sizeof (a)+75);
But I can't figure out why the output is 300 or 600. Could anyone please point it out?
I got as far as noticing what looks like an internal left shift, since:
75 is 1001011 in binary
600 is 1001011000 in binary (75 shifted left by 3, i.e. multiplied by 8)
Solutions are appreciated. Thanks in advance.
Pointer arithmetic is always done using the size of what is pointed to. In your case a is a pointer to a pointer to int, so the unit size is sizeof(int*): 4 * 75 = 300 with 4-byte pointers, 8 * 75 = 600 with 8-byte pointers.
More precisely, a + 75 adds the byte offset sizeof(*a) * 75 (note the dereferencing of a) to the pointer. What happens is that you are effectively computing &a[75], i.e. you're getting a pointer to the 75th element.
On a slightly related note, when you print pointers with printf you should be using the format "%p", and casting the pointers to void *. See e.g. this printf (and family) reference.
As for the different sizes on 32- and 64-bit systems, that's to be expected. A pointer on a 32-bit system is typically 32 bits, while on a 64-bit system it's 64 bits.
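A minimal sketch of the scaling (the array name and size are just for illustration):

#include <stdio.h>

int main(void)
{
    int *table[100] = {0};  /* an array of int*, standing in for the hash table */
    int **base = &table[0];
    int **elem = base + 75; /* arithmetic is scaled by sizeof(int *) */

    /* The byte distance is 75 * sizeof(int *): 300 with 4-byte
       pointers, 600 with 8-byte pointers. */
    printf("sizeof(int *) = %zu\n", sizeof(int *));
    printf("byte offset   = %td\n", (char *)elem - (char *)base);
    return 0;
}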
The program behaviour is undefined:
The format specifier %d is not valid for pointer types: use %p instead.
Pointer arithmetic is only valid within and one past the last element for arrays, or one past the address of the scalar for scalars. You can't read a + 75.
First of all, use %p for printing pointers and %zu for a sizeof result.
That said, check the type of a: it is int **, so its size is the size of a pointer, and that depends on the platform / compiler.
Pointer arithmetic honors the data type: the pointer is always incremented in units of the size of the type it points to.

strange behavior of scanf for short int

the code is as follows:
#include <stdio.h>

int main()
{
    int m = 123;
    int n = 1234;
    short int a;
    a = ~0;
    if ((a >> 5) != a) {
        printf("Logical Shift\n");
        m = 0;
    }
    else {
        printf("Arithmetic Shift\n");
        m = 1;
    }
    scanf("%d", &a);
    printf("%d\n", m);
}
after the line scanf("%d",&a); the value of m becomes 0.
I know it may be caused by the scanf: a's type is short, while the input's type is int. But how can this affect the value of m?
Thanks a lot!
The most likely reason for m being 0 in your snippet is that you assign m that value in the body of your if-statement; but since the code contains undefined behavior, no one can say so for sure.
The likely story about passing a short* when scanf expects an int*
Assuming sizeof(short) == 2 and sizeof(int) == 4.
When entering your main function the stack on which the variables reside would normally look something like the below:
_
|short int (a) : scanf will try to read an int (4 bytes).
|_ 2 bytes : This part of memory will most
|int (n) : likely be overwritten
| :..
|
|_ 4 bytes
|int (m)
|
|
|_ 4 bytes
When you read a %d (i.e. an int) into the variable a, that shouldn't affect variable m, though n will most likely have parts of it overwritten.
Undefined Behavior
Though it's all a guessing game since you are invoking what we normally refer to as "undefined behavior" when using your scanf statement.
Everything the standard doesn't guarantee is UB, and the result could be anything. Maybe you will write data to another segment that is part of a different variable, or maybe you will make the universe implode.
Nobody can guarantee that we will live to see another day when UB is present.
How to read a short int using scanf
Use %hd, and be sure to pass it a short*. We've had enough of UB for one night!
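A minimal sketch:

#include <stdio.h>

int main(void)
{
    short a;

    /* %hd makes scanf store a short, so exactly sizeof(short)
       bytes are written and no neighbouring memory is touched. */
    if (scanf("%hd", &a) == 1)
        printf("a = %hd\n", a);
    return 0;
}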
Assuming that int and short are four- and two-byte integers, respectively, on your platform (which is a likely assumption, but not guaranteed by the standard), you're asking scanf to read in an integer and store it in four bytes: the two bytes of a, and whatever two bytes follow it in memory. (Well, technically this is undefined behavior, and no specific behavior is guaranteed; but that's what it's likely to do.) Apparently your compiler is using the two bytes after a as the first two bytes of m. Which is a bit surprising; I certainly wouldn't expect a and m to be adjacent, and it rather implies that your compiler isn't aligning shorts and ints to the beginning of four-byte blocks, but it is perfectly legal.
You can see better what's going on if you add
printf("&a: %08X\n&m: %08X\n", (int)&a, (int)&m);
which will show you where a and m are stored, relative to each other. (Just as a test, I mean. You wouldn't want that in "real" code.)
You are correct: %d expects, and writes, an int. If you enter a value less than 65536, it fits entirely within the two bytes of the short, and the two extra bytes written beyond it are 0. I tried reading into a short and printing it back; I entered 65536123 and got 123, which makes perfect sense: 65536123 == 1000 * 65536 + 123, so only the low 16 bits (the 123) are visible through the two bytes of the short. This behavior is dangerous, because the other two bytes written by scanf end up in a "variable next door" to the short, which is very, very bad. I hope this convinces you not to do it.
P.S. To read a short with scanf, declare a temporary int variable, read the value into it using scanf, and then assign it (with a cast) to your short.
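A minimal sketch of that approach:

#include <stdio.h>

int main(void)
{
    short a;
    int tmp;

    /* Read into an int (matching %d exactly), then narrow.
       The conversion is implementation-defined if the value
       doesn't fit in a short. */
    if (scanf("%d", &tmp) == 1) {
        a = (short)tmp;
        printf("a = %hd\n", a);
    }
    return 0;
}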
You're invoking Undefined Behavior when passing a pointer to a non-int to scanf's %d.
Likely, the compiler introduces padding bytes for alignment purposes and the values get stored in the padding bytes and not the "useful" bytes.
However, the compiler is free to do anything, from raising a segfault / access violation to invoking nasal demons.
If you had actually used variable n, then it would probably have been the one that got clobbered, rather than m. Since you didn't use n, the compiler optimized it away, and that means that it was m that got clobbered by scanf() writing 4 bytes (because it was told that it got a pointer to a (4-byte) integer) instead of the 2 bytes. This depends on quite a lot of details of your hardware, such as endian-ness and alignment (if int had to be aligned on a 4-byte boundary, you wouldn't see the problem; I guess you are on an Intel machine rather than, say, PowerPC or SPARC).
Don't fib to your compiler, even accidentally. It will get its own back.
