What address does a pointer store? - c

I am currently learning about pointers in C.
I learned that a pointer is a variable that stores the address of another variable.
So I wrote something like this:
#include <stdio.h>
int main()
{
    int x = 10;
    int *ptr;
    ptr = &x;
    printf("%d" ,ptr);
}
The above printed the address as an integer value.
My question is this: the pointer variable ptr stores the address of a variable of type int.
On my PC, an int takes 4 bytes, which is 32 bits. As per my understanding each bit has a separate memory address.
So what address will the pointer point to? Will it point to the first bit's memory address or something else? Please let me know.
Please correct me if my understanding is wrong.

A memory address is the location of one byte. A 32-bit value occupies four consecutive byte addresses and is normally placed on a four-byte boundary.
So, with four-byte alignment, address 0x0000 is the first 32-bit memory location. Address 0x0004 is the next four-byte boundary or, in other words, the next 32-bit location.
Then that just leaves the issue of big-endian and little-endian.

On systems like x86, each individual byte has its own address. For multi-byte objects like ints or doubles, the address of the object is the address of its first byte. On a little-endian system like x86, the first byte is the least-significant byte, while on a big-endian system like Power the first byte is the most significant byte:
int x = 0x01234567;
   A      A+1    A+2    A+3     big-endian
+------+------+------+------+
| 0x01 | 0x23 | 0x45 | 0x67 |
+------+------+------+------+
  A+3    A+2    A+1     A       little-endian
Most architectures have alignment restrictions such that multi-byte entities must have an address that is a multiple of 2 or 4. This is why struct types may have "padding" bytes between members.
Addressing and byte ordering are a function of the underlying architecture, not the C language. There are some word-addressed systems where each individual byte does not have its own address, so pointers to smaller types like char may need to include an offset into the word. Representation of pointer types can vary.
Unless you’re working on bare metal, the address values you’re working with are virtual addresses, not physical.

As per my understanding each bit has a separate memory address.
No, every byte has a memory address. You'll have to use an additional offset to get individual bits. You cannot make a pointer point at a single bit.
So what address will the pointer point to? Will it point to the first bit's memory address or something else? Please let me know.
Nope. It points to the object as a whole, and its value is the address of the object's first (lowest-addressed) byte. Which byte of the value that is, most or least significant, depends on endianness.
Furthermore, when compiled with -Wall -Wextra this gives a warning.
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int *’
You're using the wrong format specifier for printing a pointer. %d is for int. Your pointer, however, has the type int* which is not int. If you want to print the address, use this instead:
printf("%p", (void*)ptr);

The minimum addressable unit is the byte. Regardless of how many bytes an object of type int occupies, a pointer to such an object points to the first byte of the extent of memory occupied by the object.
From the C Standard
3.5
1 bit
unit of data storage in the execution environment large enough to hold
an object that may have one of two values
2 NOTE It need not be possible to express the address of each
individual bit of an object.
3.6
1 byte
addressable unit of data storage large enough to hold any member of
the basic character set of the execution environment
Note that this call
printf("%d" ,ptr);
invokes undefined behavior.
If you want to output the value of a pointer you should use the conversion specifier p. For example
printf("%p\n" , ( void * )ptr);

Related

C programming pointer, bytes and memory allocation have 8 question

I am trying to understand bytes and pointers and how they are stored. Can anyone explain or answer some of my questions? Thank you.
int num = 513; // allocates 4 bytes of memory by initializing
//[01][02][00][00] <-- (bytes are shown in little-endian order)
char * ptr = &num; //char is (one byte)
↓
//[01][02][00][00]
// the pointer always starts from byte [0] of the allocated memory,
// so ptr[0] is in this case = [01]
// (printed with %02x, as in printf("the byte %02x\n",ptr[0]); if the value is
// a single digit such as 1, a leading zero is added, so it prints as 01)
int * ptr = &num; //now creating a pointer with the type of int (four bytes)
↓ ↓ ↓ ↓
//[01][02][00][00]
how can i access the first byte of this int pointer? [question01]
is there a way to see the bites inside the of the first byte([01])? [question02]
where does the pointer save the address? does it have to allocate a memory space in the ram to save the address such as 0x233828ff21 and if so this(0x233828ff21) address requires a lot of bytes? [question03]
where does this int pointer stores it's type length (4bytes)? [question05]
what happens if i declare a type with longer byte memory allocation such as long long * ptr = &num; [01][02][00][00][00][00][00][00]
since i am pointing a long long to a 4 byte int, can those 4 last already been allocated by another program and in use? can i read it? [question06]
binary are only 0 and 1 and whether one of those(0 or 1) is called a bite? [question07]
one byte is 8 bits right? why am i getting 16 bits 0000000000000001 when converting the number 1 in this website (https://www.rapidtables.com/convert/number/decimal-to-binary.html) shouldn't it be 8? [question08]
Note: char * ptr = &num; should really be unsigned char * ptr = (unsigned char *)&num; to avoid compiler warnings and to ensure that the bytes are treated as unsigned values.
how can i access the first byte of this int pointer? [question01]
If you really want to access the first byte of a pointer, you can use:
unsigned char *ptr2 = (unsigned char *)&ptr;
then ptr2[0] is the first byte of the pointer ptr.
is there a way to see the bites inside the of the first byte([01])? [question02]
I assume you mean the bits inside the first byte. Bits are not directly addressable, so you need an expression (usually with bit-wise operators) to get the value of each bit. For example, (ptr[m] >> n) & 1 will be the value of the nth bit of the mth byte of an object (where ptr is an unsigned char * pointing to the start of the object).
where does the pointer save the address? does it have to allocate a memory space in the ram to save the address such as 0x233828ff21 and if so this(0x233828ff21) address requires a lot of bytes? [question03]
Addresses are stored in pointer variables in the same way as numbers are stored in variables of numeric type. At the CPU instruction level, there is no difference between a stored pointer value and a stored integer value, other than the width.
The most typical sizes of pointer types are 8 bytes or 4 bytes, depending on the target architecture of the compiler.
(There is no question04.)
where does this int pointer stores it's type length (4bytes)? [question05]
It doesn't store the length of the type, but the compiler knows that a TYPE * points to an object that is sizeof(TYPE) bytes long.
what happens if i declare a type with longer byte memory allocation such as long long * ptr = &num; [01][02][00][00][00][00][00][00] since i am pointing a long long to a 4 byte int, can those 4 last already been allocated by another program and in use? can i read it? [question06]
If the pointer is not correctly aligned for the referenced type (long long) then the behavior is undefined. Otherwise it can be converted back to the original pointer type int *. In any case, accessing *ptr will result in undefined behavior (unless long long is the same width as int, which is not typical).
binary are only 0 and 1 and whether one of those(0 or 1) is called a bite? [question07]
It is called a bit. There is also a type called _Bool. Expressions of type _Bool always have the value 0 or 1.
one byte is 8 bits right? why am i getting 16 bits 0000000000000001 when converting the number 1 in this website (https://www.rapidtables.com/convert/number/decimal-to-binary.html) shouldn't it be 8? [question08]
Who cares what some random web-site displays?
What C calls a "byte" is any type where sizeof(type) is 1, including char, signed char and unsigned char. It is at least 8 bits wide, but is wider than 8 bits on some exotic systems.
A pointer of character type (char *, signed char * or unsigned char *) can be used to access the individual bytes within any object, but that might not be true for pointers of other size 1 types, and is certainly not true for pointer to _Bool (_Bool *)!
• how can i access the first byte of this int pointer? [question01]
Generally, it is preferable to use unsigned char rather than char to access arbitrary bytes, so let’s do that.
After unsigned char *ptr = (unsigned char *) &num;, ptr is a pointer to unsigned char, and you could access the first byte of the int with *ptr or ptr[0], as in printf("The first byte, in hexadecimal, is 0x%02hhx.\n", *ptr);.
If instead you have int *ptr = &num;, there is no direct way to access the first byte. ptr here is a pointer to an int, and, to access an individual byte, you need a pointer to an unsigned char or other single-byte type. You could convert ptr to a pointer to unsigned char, as with (unsigned char *) ptr, and then you can access the individual byte with * (unsigned char *) ptr.
• is there a way to see the bites inside the of the first byte([01])? [question02]
Before C23, the C standard did not provide a printf conversion for displaying the individual bits of a byte (C23 adds %b). Commonly programmers print the values in hexadecimal, as above, and read the bits from the hexadecimal digits. You can also write your own routine to write binary output from a byte.
• where does the pointer save the address? does it have to allocate a memory space in the ram to save the address such as 0x233828ff21 and if so this(0x233828ff21) address requires a lot of bytes? [question03]
A pointer is a variable like your other int and char variables. It has space of its own in memory where its value is stored. (This model of variables having memory is used to specify the behavior of C programs. When a program is optimized by a compiler, it may change this.)
In current systems, pointers are commonly 32 or 64 bits (four or eight 8-bit bytes), depending on the target architecture. You can find out which with printf("The size of a 'char *' is %zu bytes.\n", sizeof (char *));. (The C standard allows pointers of different types to be different sizes, but that is rare in modern C implementations.)
• where does this int pointer stores it's type length (4bytes)? [question05]
The compiler knows the sizes of pointers. The pointer itself does not store the length of the thing it is pointing to. The compiler simply generates appropriate code when you use the pointer. If you use *ptr to get the value that a pointer points to, the compiler will generate a load-byte instruction if the type of ptr is char *, and it will generate a load-four-byte instruction if the type of ptr is int * (and int is four bytes in your C implementation).
• what happens if i declare a type with longer byte memory allocation such as long long * ptr = &num; [01][02][00][00][00][00][00][00] since i am pointing a long long to a 4 byte int, can those 4 last already been allocated by another program and in use? can i read it? [question06]
When long long is an eight-byte integer, and you have a long long *ptr that is pointing to a four-byte integer, the C standard does not define the behavior when you attempt to use *ptr.
In general-purpose multi-user operating systems, the memory after the int cannot be allocated by another program (unless this program and the other program have both arranged to share memory). Each process is given its own virtual address space, and their memory is kept separate.
Using this long long *ptr in your program may access memory beyond that of the int. This can cause various types of bugs in your program, including corrupting data and alignment errors.
• binary are only 0 and 1 and whether one of those(0 or 1) is called a bite? [question07]
One binary digit is a “bit”. Multiple binary digits are “bits”.
The smallest group of bits that a particular computer operates on as a unit is a “byte”. The size of a byte can vary; early computers had bytes of different sizes. Modern computers almost all use eight-bit bytes.
If your program includes the header <limits.h>, it defines a macro named CHAR_BIT that provides the number of bits in a byte. It is eight in almost all modern C implementations.
• one byte is 8 bits right? why am i getting 16 bits 0000000000000001 when converting the number 1 in this website (https://www.rapidtables.com/convert/number/decimal-to-binary.html) shouldn't it be 8? [question08]
The web site is not merely converting to one byte.
It seems to show at least 16 bits, choosing the least of 16, 32, or 64 bits that the value fits in as a signed integer type.

How do bytes and addresses correlate in C?

I have seen that there are many questions related to this topic, but I could not infer an answer, so I decided to ask my first question here on Stack Overflow. My question is about bytes and addresses: does each address actually represent one byte? That is, if I initialize an int, does it take 4 addresses, ranging from e.g. 0x55555555d156 to 0x55555555d159? What confuses me is that a pointer holds a single address, right?
Let's say the pointer holds the address e.g. 0x55555555d156. If I were to dereference that address I would get the value of that int, right? What about the other 3 addresses, if I dereference them? I could not manage to find that out by writing a C program.
if I were to dereference that address I would get the value of that int, right?
Yes.
what about the other 3 addresses, if I dereference them?
If you have int *p = &some_integer;, then *(int *)((char *)p + 1) (dereferencing p "shifted" by one byte) would attempt to read 4 bytes from that new address and interpret them as an integer. Whether your program has permission to read that last byte that's right next to some_integer in memory, is another story: if it doesn't, you'll get a segmentation fault or other memory access issues.
Or you may get no errors and read garbage data.
Example
#include <stdio.h>
int main(void) {
    int my_int = 0x12345678;
    int *ptr = &my_int;
    printf("%x\n", *ptr);
    printf("%x\n", *(int *)((char *)ptr + 1));
}
Output:
~/test $ clang so.c && ./a.out
12345678
80123456
^^
|-- This "random" byte was read as part of
--- the "new" int shifted by 1 byte
Different microprocessors have different addressable units of memory. Most, including the x86 series and ARM, are addressable in units of one byte. So, for example, a 32-bit int will be stored in four consecutive memory addresses as you say (LSB first, unless the ARM is set to "Big Endian" mode).
Other processors, like PIC, may have one address point to a 16-bit memory word.
Your C code should probably not make assumptions either way, unless you're sure what the code will be run on.
You can't "dereference an address" -- you can dereference a pointer. A pointer value is an address, but the pointer also has a type. The result of the dereference depends on the pointer's type.
The result of dereferencing a pointer is NOT the value stored at the memory location being pointed to. It is an expression that designates an object. This is known as an lvalue in C.
If all of this is unclear; first check that you understand what is happening in the code:
int x = 0;
x = 5;
In the second line, the use of x does not retrieve the value 0. The expression x is an lvalue which means that it designates a region of memory consisting of several bytes. Each byte has its own address. If you output &x you will likely see the same result as if you output the address of the first byte of x (although this is not a Standard requirement), but the types are different.
Whether or not the stored value is retrieved when an lvalue expression appears in the code, depends on the context of the expression. For example if it appears on the left-hand side of the assignment operator, the value is not retrieved.
Once you have understood x = 5; , then *p = 5; behaves identically; the meaning of *p is exactly as if the label x existed for the memory region that p points to.

Does pointer only store memory address?

I am learning C programming language these days. I have a question about pointer.
The textbook said pointer stores memory address, and using printf("%p",pointer) we can show where this pointer points in our memory.
But every pointer also has a type, like int *pointer, long *p and so on. int *pointer means "pointer is a pointer to int".
My question is: if we write
int *p,i;
p=&i;
*p=99;
if the pointer only contains the address information, how could the programme know how many digits should be used for storing integer 99? Because an integer could be 16 bits int or 32 bits long.
So I was wondering if an int pointer in memory not only stores address information, but also stores the type information?
Because an integer could be 16 bits int or 32 bits long.
An integer could, but an int could not. Regardless of how large that is exactly in your environment, its size is set in stone (within that environment) and doesn't change at run time. An int * only points to an int, not to a long. Note that if there were any such problems, they would affect int x; equally.
So a pointer really only stores the memory address. Information about the size of the pointee is in the type (just as a non-pointer variable's type tells the compiler how large that variable is).
Make sure you don't confuse what the hardware does with what the compiler does. A pointer is the address of a memory location as far as the hardware is concerned. What that memory location contains, and how long that data is, does not matter. A pointer does not store the data either. It holds the address of a location and that's all. In assembly, this would be similar to using a register that points to a memory location.
The compiler is what tracks the type of data contained at that pointed to location. It is the compiler's job to save you from making type errors. This is where some people complain that you can shoot yourself in the foot with C. It's possible to have a pointer point to a data location where that data can be most anything and any length.
So I was wondering if an int pointer in memory not only stores address information, but also stores the type information?
No, a pointer is a memory address (or in some cases, a value analogous to a memory address). A pointer doesn't contain the data -- it points to the data. In your example, the data is stored in another location (given the name i), and p contains the address of i.
how could the programme know how many digits should be used for storing integer 99?
All the bits (binary digits) of the value type are used to store the stored value. Any given type (like int) has a fixed size. int can have different sizes depending on the system being used, but the size is always determined when the code is compiled. That is, the type int could be 16 bits, 32 bits, 64 bits, or some other size, but the compiler will always use a single size for compiling an entire program.
The type information is retained in the source code and is used by the compiler to perform type checking and to generate appropriate code. The type is implicit in the generated code rather than explicitly stored as data.

How does pointer type casting work in c

I understand that in c, a pointer points to a memory address. In the following code
char *cp;
someType *up; // Assuming that someType is a union of size 16 bytes
cp = sbrk(nu * sizeof(someType));
// Here is what confuses me
up = (someType *)cp;
According to this link http://en.wikibooks.org/wiki/C_Programming/Pointers_and_arrays
If cp points to an address of 0x1234, so 0x1234 should be the beginning of the newly allocated memory, right?
So when "cp" is casted to a pointer to someType and assigned to "up", it actually says "up" is a pointer that points to 0x1234, assumes in 32-bits system, each memory address takes 4 bytes, a someType object will use 4 memory address to store its value, so the addresses 0x1234, 0x1235, 0x1236, 0x1237 collectively store a someType object, is this correct?
Thanks in advance.
Pointer type casting doesn't do anything at the machine level. The value in memory is still just a plain old value in memory.
At the machine level, that value is used as an address. In other words, it is often used as the value passed to operations that require memory locations, such as the assembly "load" operation.
At the language level, extra constructs are added to the machine operations, and one of those "extras" is data types. The compiler is responsible for checking if type-rule violations exist, but at run time there is no such constraint.
As a result, the cast does nothing at run time, but at compile time it directs the compiler to not emit an error as the type of the pointer's value will now be considered a compatible type to the variable holding the address.
Even on a 32-bit system, a memory address typically refers to a single byte, not four. So your Header (the 16-byte someType in the question's code) will cover the 16 memory addresses from 0x1234 through 0x1243.
To answer your questions:
cp points to the beginning of the newly allocated memory.
However, up will contain the same address as cp. Note it is bad form to try and assign a pointer from one type to a different type without "casting" it first - just to tell the compiler you really meant to do such a dangerous thing.
Typically assigning between types can cause all kinds of problems - as a result compilers will throw warnings if you do this without casting. The casting, however, doesn't actually have any effect on the program code itself - it is more a method of the programmer telling the compiler they really meant what they did.
By the way, the pointer up is allocated on the stack in your function. Assigning a value to up actually places the value in the memory location on the stack. So by assigning cp to up you are merely copying the value of the pointer in cp to the location in the stack assigned to up. There is nothing placed into the memory allocated by the sbrk() call.
First of all, remember that a pointer is an abstraction of a memory address; do not assume a one-to-one correspondence between a pointer value and a physical location in RAM. Pointers also have type semantics associated with them; adding 1 to a pointer value advances it to point to the next object of the pointed-to type:
char *cp;
int *ip;
struct huge *hp;
cp = cp + 1; // advances cp to the address of the next char (1 byte)
ip = ip + 1; // advances ip to the address of the next int (anywhere
// from 2 to 8 bytes depending on the architecture)
hp = hp + 1; // advances hp to the address of the next struct huge (however
// many bytes struct huge takes up)
This is exactly how array indexing works; the expression a[i] is treated as *(a + i); you offset i elements of whatever size from the base address, not i bytes.
As far as the cast is concerned...
A pointer to char is a different, incompatible type than a pointer to Header. They may have different sizes and representations depending on the underlying architecture. Because of this, the language won't allow an implicit conversion of one to the other through a simple assignment; you must use a cast to convert the rhs value to the type expected by the lhs.
If cp points to a address of 0x1234, so 0x1234 should be the beginning of the newly allocated memory, right?
Right.
So when cp is casted to a pointer to Header and assigned to up, it actually says up is a pointer that points to 0x1234, assumes in 32-bits system, each memory address takes 4 bytes, a Header object will use 4 memory address to store its value, so the addresses 0x1234, 0x1235, 0x1236, 0x1237 collectively store a Header object, is this correct?
Wrong. C is strictly byte addressed. The Header object will occupy addresses 0x1234 through 0x1234 + sizeof(Header) - 1, inclusive, whatever sizeof(Header) is (perhaps 16 in this case). To confuse the issue, adding 1 to any pointer increments its numeric value by the sizeof whatever it points to, so it is the case that up + 1 points beyond the end of allocated memory. However cp + 1 points to the second byte in the representation of the header object, whatever that is. (sizeof(char) is 1 by definition.)
This has nothing to do with your question, but I must warn you that calling sbrk with a nonzero argument can cause your program to crash at some indefinite point after the next call to malloc, realloc or free, and that nearly all standard library functions are allowed to call those functions "under the hood". If you want to make a large allocation directly from the operating system, use mmap.

Memory allocated without allocation using malloc, how?

Below is a simple code snippet:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int *p;
    p=(int*)malloc(sizeof(int)); // allocate memory for 1 int
    printf("P=%p\tQ=%p",(void*)p,(void*)(p+2));
}
In one sample run, it gave me the output as below:
P=0x8210008 Q=0x8210010
The starting address of P is 0x8210008, the next byte is 0x8210009, the next byte is 0x821000A, and the next byte is 0x821000B. So the 4 bytes for the int end there.
We haven't allocated more memory using malloc.
Then how does p+2 lead us to 0x8210010, which is 8 bytes after P (0x8210008)?
Because it's treating it as an integer-element offset from the pointer. You have allocated an array for a single integer. When you ask for p+2 it's the same as &p[2]. If you want two bytes from the beginning, you need to cast it to char* first:
char *highWordAddr = (char*)p + 2;
First, the fact that you have printed an address does not imply that memory is allocated at that address. You have simply added numbers and produced other numbers.
Second, the reason that the number you got by adding two was eight greater than the base address, instead of two greater, is that when you add integers to pointers in C, the arithmetic is done in terms of pointed-to elements, not in terms of bytes in memory (unless the pointed-to elements are bytes). Suppose you have an array of int, say int x[8], and you have a pointer to x[3]. Adding two to that pointer produces a pointer to x[5], not a pointer to two bytes beyond the start of x[3]. It is important to remember that C is an abstraction, and the C standard specifies what happens inside that abstraction. Inside the C abstraction, pointer arithmetic works on numbers of elements, not on raw memory addresses. The C implementation (the compiler and the tools that turn C code into program execution) is required to perform whatever operations on raw memory addresses are required to implement the abstraction specified by the C standard. Typically, that means the compiler multiplies an integer by the size of an element when adding it to a pointer. So two is multiplied by four (on a machine where an int is four bytes), and the eight that results is added to the base address.
Third, you cannot rely on this behavior. The C standard defines pointer arithmetic only for pointers that point to objects inside arrays, including one fictitious object at the end of the array. Additionally, pointers to individual objects act like arrays of one element. So, if you have a pointer p that points to an int, you are allowed to calculate p+0 or p+1, because they point to the only object in the array (p+0) and the fictitious object one beyond the last element in the array (p+1). You are not allowed to calculate p-1 or p+2, because these are outside the array. Note that this is not a matter of dereferencing the pointer (attempting to read or write memory at the calculated address): Even merely calculating the address results in behavior that is not defined by the C standard: Your program could crash, it could give you “correct” results, or it could delete all files in your account, and all of those behaviors would be conforming to the C standard.
It is unlikely that merely calculating an out-of-bounds address would produce such weird behavior. However, the standard permits it because some computer processors have unusual address schemes that require more work than simple arithmetic. Perhaps the second-most common address scheme after the flat address space is a base address and offset scheme. In such a scheme, the high 16 bits of a four-byte pointer might contain a base address, and the low 16 bits might contain an offset. For a given base address b and offset o, the corresponding virtual address might be 4096*b+o. (Such a scheme is capable of addressing only about 2^28 bytes, and many different values of base and offset can refer to the same address. For example, base 0 and offset 4096 refer to the same address as base 1 and offset 0.) With a base-and-offset scheme, the compiler might implement pointer arithmetic by adding only to the offset and ignoring the base. (Such a C implementation can support arrays only up to 65536 bytes, the extent addressable by the offset alone.) In such an implementation, if you have pointer-to-int p with an encoding of 0x0000fffc (base 0, offset 65532), and int is four bytes, then p+2 will have the value 0x00000004, not the value that is eight greater (0x00010004).
That is an example where pointer arithmetic produces values that you would not expect from a flat-address machine. It is harder to imagine an implementation where pointer arithmetic that is not valid according to the C standard would produce a crash. However, consider an implementation in which memory must be manually swapped by a process, because the processor does not have the hardware to support virtual memory. In such an implementation, pointers might contain addresses of structures in memory that describe disk locations and other information used to manage the memory swapping. In such an implementation, doing pointer arithmetic might require reading the structures in memory, and so doing invalid pointer arithmetic might involve reading invalid addresses.
C is happy to let you do whatever pointer arithmetic you like. Just because p+2 looks like any other address doesn't mean it's valid. In fact, in this case, it's not.
Be very careful any time you see pointer arithmetic that you're not going outside your allocated bounds.
This is called pointer arithmetic. http://www.learncpp.com/cpp-tutorial/68-pointers-arrays-and-pointer-arithmetic/

Resources