pointer typecasting - c

int main()
{
int *p,*q;
p=(int *)1000;
q=(int *)2000;
printf("%d:%d:%d",q,p,(q-p));
}
output
2000:1000:250
1.I cannot understand p=(int *)1000; line, does this mean that p is pointing to 1000 address location? what if I do *p=22 does this value is stored at 1000 address and overwrite the existing value? If it overwrites the value, what if another program is working with 1000 address space?
how q-p=250?
EDIT: I tried printf("%u:%u:%u",q,p,(q-p)); the output is the same
int main()
{
int *p;
int i=5;
p=&i;
printf("%u:%d",p,i);
return 0;
}
the output
3214158860:5
does this mean the addresses used by compiler are integers? there is no difference between normal integers and address integers?

does this mean that p is pointing to 1000 address location?
Yes.
what if I do *p=22
It's invoking undefined behavior - your program will most likely crash with a segfault.
Note that in modern OSes, addresses are virtual - you can't overwrite an other process' adress space like this, but you can attempt writing to an invalid memory location in your own process' address space.
how q-p=250?
Because pointer arithmetic works like this (in order to be compatible with array indexing). The difference of two pointers is the difference of their value divided by sizeof(*ptr). Similarly, adding n to a pointer ptr of type T results in a numeric value ptr + n * sizeof(T).
Read this on pointers.
does this mean the addresses used by compiler are integers?
That "used by compiler" part is not even necessary. Addresses are integers, it's just an abstraction in C that we have nice pointers to ease our life. If you were coding in assembly, you would just treat them as unsigned integers.
By the way, writing
printf("%u:%d", p, i);
is also undefined behavior - the %u format specifier expects an unsigned int, and not a pointer. To print a pointer, use %p:
printf("%p:%d", (void *)p, i);

Yes, with *p=22 you write to 1000 address.
q-p is 250 because size of int is 4 so it's 2000-1000/4=250

The meaning of p = (int *) 1000 is implementation-defined. But yes, in a typical implementation it will make p to point to address 1000.
Doing *p = 22 afterwards will indeed attempt to store 22 at address 1000. However, in general case this attempt will lead to undefined behavior, since you are not allowed to just write data to arbitrary memory locations. You have to allocate memory in one way or another in order to be able to use it. In your example you didn't make any effort to allocate anything at address 1000. This means that most likely your program will simply crash, because it attempted to write data to a memory region that was not properly allocated. (Additionally, on many platforms in order to access data through pointers these pointers must point to properly aligned locations.)
Even if you somehow succeed succeed in writing your 22 at address 1000, it does not mean that it will in any way affect "other programs". On some old platforms it would (like DOS, fro one example). But modern platforms implement independent virtual memory for each running program (process). This means that each running process has its own separate address 1000 and it cannot see the other program's address 1000.

Yes, p is pointing to virtual address 1000. If you use *p = 22;, you are likely to get a segmentation fault; quite often, the whole first 1024 bytes are invalid for reading or writing. It can't affect another program assuming you have virtual memory; each program has its own virtual address space.
The value of q - p is the number of units of sizeof(*p) or sizeof(*q) or sizeof(int) between the two addresses.

Casting arbitrary integers to pointers is undefined behavior. Anything can happen including nothing, a segmentation fault or silently overwriting other processes' memory (unlikely in the modern virtual memory models).
But we used to use absolute addresses like this back in the real mode DOS days to access interrupt tables and BIOS variables :)
About q-p == 250, it's the result of semantics of pointer arithmetic. Apparently sizeof int is 4 in your system. So when you add 1 to an int pointer it actually gets incremented by 4 so it points to the next int not the next byte. This behavior helps with array access.

does this mean that p is pointing to 1000 address location?
yes. But this 1000 address may belong to some other processes address.In this case, You illegally accessing the memory of another process's address space. This may results in segmentation fault.

Related

Dereferencing pointer to arbitrary address gives Segmentation fault

I have written a simple C code for pointers. As per my understanding, Pointer is a variable which holds the address of another variable.
Eg :
int x = 25; // address - 1024
int *ptr = &x;
printf("%d", *ptr); // *ptr will give value at address of x i.e, 25 at 1024 address.
However when I try below code I'm getting segmentation fault
#include "stdio.h"
int main()
{
int *ptr = 25;
printf("%d", *ptr);
return 0;
}
What's wrong in this? Why can't a pointer variable return the value at address 25? Shouldn't I be able to read the bytes at that address?
Unless you're running on an embedded system with specific known memory locations, you can't assign an arbitrary value to a pointer an expect to be able to dereference it successfully.
Section 6.5.3.2p4 of the C standard states the following regarding the indirection operator *:
The unary
* operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If
the operand has type "pointer to type", the result has
type "type". If an invalid value has been assigned to
the pointer, the behavior of the unary
* operator is undefined.
As mentioned in the passage above, the C standard only allows for pointers to point to known objects or to dynamically allocated memory (or NULL), not arbitrary memory locations. Some implementations may allow that in specific situations, but not in general.
Although the behavior of your program is undefined according to the C standard, your code is actually correct in the sense that it is doing exactly what you intend. It is attempting to read from memory address 25 and print the value at that address.
However, in most modern operating systems, such as Windows and Linux, programs use virtual memory and not physical memory. Therefore, you are most likely attempting to access a virtual memory address that is not mapped to a physical memory address. Accessing an unmapped memory location is illegal and causes a segmentation fault.
Since the memory address 0 (which is written in C as NULL) is normally reserved to specify an invalid memory address, most modern operating systems never map the first few kilobytes of virtual memory addresses to physical memory. That way, a segmentation fault will occur when an invalid NULL pointer is dereferenced (which is good, because it makes it easier to detect bugs).
For this reason, you can be reasonably certain that also the address 25 (which is very close to address 0) is never mapped to physical memory and will therefore cause a segmentation fault if you attempt to access that address.
However, most other addresses in your program's virtual memory address space will most likely have the same problem. Since the operating system tries to save physical memory if possible, it will not map more virtual memory address space to physical memory than necessary. Therefore, trying to guess valid memory addresses will fail, most of the time.
If you want to explore the virtual address space of your process to find memory addresses that you can read without a segmentation fault occuring, you can use the appropriate API supplied by your operating system. On Windows, you can use the function VirtualQuery. On Linux, you can read the pseudo-filesystem /proc/self/maps. The ISO C standard itself does not provide any way of determining the layout of your virtual memory address space, as this is operating system specific.
If you want to explore the virtual memory address layout of other running processes, then you can use the VirtualQueryEx function on Windows and read /proc/[pid]/maps on Linux. However, since other processes have a separate virtual memory address space, you can't access their memory directly, but must use the ReadProcessMemory and WriteProcessMemory functions on Windows and use /proc/[pid]/mem on Linux.
Disclaimer: Of course, I don't recommend messing around with the memory of other processes, unless you know exactly what you are doing.
However, as a programmer, you normally don't want to explore the virtual memory address space. Instead, you normally work with memory that has been assigned to your program by the operating system. If you want the operating system to give you some memory to play around with, which you are allowed to read from and write to at will (i.e. without segmentation faults), then you can just declare a large array of chars (bytes) as a global variable, for example char buffer[1024];. Be careful with declaring larger arrays as local variables, as this may cause a stack overflow. Alternatively, you can ask the operating system for dynamically allocated memory, for example using the malloc function.
You should consider all warnings that the compiler issues.
This statement
int *ptr = 25;
is incorrect. You are trying to assign an integer to a pointer as an address of memory. Thus in this statement
printf("%d", *ptr);
there is an attempt to access memory at address 25 that does not belong to your program.
What you mean is the following
#include "stdio.h"
int main( void )
{
int x = 25;
int *ptr = &x;
printf("%d", *ptr);
return 0;
}
Or
#include "stdio.h"
#include <stdlib.h>
int main( void )
{
int *ptr = malloc( sizeof( int ) );
*ptr = 25;
printf("%d", *ptr);
free( ptr );
return 0;
}

The maximum memory location my C stack pointer can points to during initialization

Consider the following code in a linux machine with 32 bit OS:
void foo(int *pointer){
int *buf;
int *buf1 = pointer;
....
}
What is the maximum memory address buf and buf1 can point to using the above declaration (OS allocates the address)? E.g., can it point to address 2^32-200?
The reason I asked is that I may do pointer arithmetic on these buffers and I am concern that this pointer arithmetic can wrap around. E.g., assume the len is smaller than the size of buf and buf1. Assume some_pointer points to the end of the buffer.
unsigned char len = 255;
if(buf + len > some_pointer)
//do something
if(buf1 + len > some_pointer)
//do something
The standard says that
For two elements of an array, the address of the element with the lower subscript will always compare less to the address of the object with the higher subscript.
Comparing any two elements that are not part of the same aggregate (array or struct) is undefined behavior.
So if buf + len and some_pointer point to elments in the same array as buf (or one past the array), you don't have to worry about wrap arround. If one of them doesn't, you have undefined behavior anyway.
You shouldn't ever rely on the addresses provided by the allocator falling within a specific range. Even if you could show that on a particular Linux setup, malloc can only generate addresses between X and Y, there is no guarantee--it could change with any future update. The only guarantee from malloc is that successful allocations won't start at NULL (address 0 in code, for Linux and most other typical platforms).
Yes, for a 32 bit or 64 bit OS. Whether there's anything usable there, or if you'll get an access violation trying to dereference the pointer, is up to the compiler and OS.
The OS can map pages of physical memory anywhere in the address space. The addresses you see don’t correspond to physical RAM chips at all. The OS might, for example, have virtual memory or copy-on-write pages.

Doubts about pointer and memory access

i am just started learning pointers in c. I have following few doubts. If i find the answers for the below questions. It Will be really useful for me to understand the concept of pointers in c. Thanks in advance.
i)
char *cptr;
int value = 2345;
cptr = (char *)value;
whats the use of (char *) and what it mean in the above code snippet.
ii)
char *cptr;
int value = 2345;
cptr = value;
This also compiles without any error .then whats the difference between i & ii code snippet
iii) &value is returning address of the variable. Is it a virtual memory address in RAM? Suppose another c program running in parallel, will that program can have same memory address as &value. Will each process can have duplicate memory address same as in other process and it is independent of each other?
iv)
#define MY_REGISTER (*(volatile unsigned char*)0x1234)
void main()
{
MY_REGISTER=12;
printf("value in the address tamil is %d",(MY_REGISTER));
}
The above snippet compiled successfully. But it outputs segmentation fault error. I don't know what's the mistake I am doing. I want to know how to access the value of random address, using pointers. Is there any way? Will program have the address 0x1234 for real?
v) printf("value at the address %d",*(236632));//consider the address 236632 available in
//stack
why does the above printf statement showing error?
That's a type cast, it tells the compiler to treat one type as some other (possibly unrelated) type. As for the result, see point 2 below.
That makes cptr point to the address 2345.
Modern operating systems isolate the processes. The address of one variable in one process is not valid in another process, even if started with the same program. In fact, the second process may have a completely different memory map due to Address Space Layout Randomisation (ASLR).
It's because you try to write to address 0x1234 which might be a valid address on some systems, but not on most, and almost never on a PC running e.g. Windows or Linux.
i)
(char *) means, that you cast the data stored in value to a pointer ptr, which points to a char. Which means, that ptr points to the memory location 2345. In your code snipet ptr is undefined though. I guess there is more in that program.
ii)
The difference is, that you now write to cptr, which is (as you defined) a pointer pointing to a char. There is not much of a difference as in i) except, that you write to a different variable, and that you use a implicit cast, which gets resolved by the compiler. Again, cptr points now to the location 2345 and expects there to be a char
iii)
Yes you can say it is a virtual address. Also segmentation plays some parts in this game, but at your stage you don't need to worry about it at all. The OS will resolve that for you and makes sure, that you only overwrite variables in the memory space dedicated to your program. So if you run a program twice at the same time, and you print a pointer, it is most likely the same value, but they won't point at the same value in memory.
iv)
Didn't see the write instruction at first. You can't just write anywhere into memory, as you could overwrite another program's value.
v)
Similar issue as above. You cannot just dereference any number you want to, you first need to cast it to a pointer, otherwise neither the compiler, your OS nor your CPU will have a clue, to what exactely it is pointing to
Hope I could help you, but I recommend, that you dive again in some books about pointers in C.
i.) Type cast, you cast the integer to a char
ii.) You point to the address of 2345.
iii.) Refer to answer from Joachim Pileborg. ^ ASLR
iv.) You can't directly write into an address without knowing if there's already something in / if it even exists.
v.) Because you're actually using a pointer to print a normal integer out, which should throw the error C2100: illegal indirection.
You may think pointers like numbers on mailboxes. When you set a value to a pointer, e.g cptr = 2345 is like you move in front of mailbox 2345. That's ok, no actual interaction with the memory, hence no crash. When you state something like *cptr, this refers to the actual "content of the mailbox". Setting a value for *cptr is like trying to put something in the mailbox in front of you (memory location). If you don't know who it belongs to (how the application uses that memory), it's probably a bad idea. You could use "malloc" to initialize a pointer / allocate memory, and "free" to cleanup after you finish the job.

Why do I get a segmentation fault if I print the contents of this memory location

Suppose I do the following
int *p = 1;
printf("%d", *p);
I get a segfault. Now, as I understand it, this memory location 1 is in the address space of my program. Why should there be a problem in reading this memory location ?
Your pointer doesn't point to anything valid. All you do is assign the value 1 to a pointer-to-int, but 1 isn't a valid memory location.
The only valid way to obtain an pointer value is either take the address-of a variable or call an allocation function:
int a;
int * p1 = &a; // OK
int * p2 = malloc(sizeof(int)); // also OK
*p1 = 2;
*p2 = 3;
As for "why there should be a problem": As far as the language is concerned, if you dereference an invalid pointer, you have undefined behaviour, so anything can happen -- this is really the only sensible way to specify the language if you don't want to introduce any arbitrary restrictions, and C is all about being easy-to-implement.
Practically, modern operating systems will usually have a clever virtual memory manager that needs to request memory when and as needed, and if the memory at address 1 isn't on a committed page yet, you'll actually get an error from the OS. If you try a pointer value near some actual address, you might not get an error (until you overstep the page boundary, perhaps).
To answer this properly will really depend upon a number of factors. However, it is quite likely that the address 0x00000001 (page 0) is not mapped and/or read protected.
Typically, addresses in the page 0 range are disallowed from a user application for one reason or another. Even in kernel space (depending upon the processor), addresses in page 0 are often protected and require that the page be both mapped and access enabled.
EDIT:
Another possible reason is that it could be segfaulting is that integer access is not aligned.
Your operating system won't let you access memory locations that don't belong to your program. It would work on kernel mode though...
No, address 1 is not in the address space of your program. You are trying to access a segment you do not own and receive segmentation fault.
You're trying to print 1, not the memory location. 1 isn't a valid memory location. To print the actual memory location do:
printf("%p", p);
Note the %p as icktoofay pointed out.

C pointers and the physical address

I'm just starting C. I have read about pointers in various books/tutorials and I understand the basics. But one thing I haven't seen explained is what are the numbers.
For example:
int main(){
int anumber = 10;
int *apointer;
apointer = &anumber;
printf("%i", &apointer);
}
may return a number like 4231168. What does this number represent? Is it some storage designation in the RAM?
Lots of PC programmer replies as always. Here is a reply from a generic programming point-of-view.
You will be quite interested in the actual numerical value of the address when doing any form of hardware-related programming. For example, you can access hardware registers in a computer in the following way:
#define MY_REGISTER (*(volatile unsigned char*)0x1234)
This code assumes you know that there is a specific hardware register located at address 0x1234. All addresses in a computer are by tradition/for convenience expressed in hexadecimal format.
In this example, the address is 16 bits long, meaning that the address bus on the computer used is 16-bits wide. Every memory cell in your computer has an address. So on a 16-bit address bus you could have a maximum of 2^16 = 65536 addressable memory cells.
On a PC for example, the address would typically be 32 bits long, giving you 4.29 billion addressable memory cells, ie 4.29 Gigabyte.
To explain that macro in detail:
0x1234 is the address of the register / memory location.
We need to access this memory location through a pointer, so therefore we typecast the integer constant 0x1234 into an unsigned char pointer = a pointer to a byte.
This assumes that the register we are interested in is 1 byte large. Had it been two bytes large, we would perhaps have used unsigned short instead.
Hardware registers may update themselves at any time (their contents are "volatile"), so the program can't be allowed to make any assumptions/optimizations of what's stored inside them. The program has to read the value from the register at every single time the register is used in the code. To enforce this behavior, we use the volatile keyword.
Finally, we want to access the register just as if it was a plain variable. Therefore the * is added, to take the contents of the pointer.
Now the specific memory location can be accessed by the program:
MY_REGISTER = 1;
unsigned char var = MY_REGISTER;
For example, code like this is used everywhere in embedded applications.
(But as already mentioned in other replies, you can't do things like this in modern PCs, since they are using something called virtual addressing, giving you a slap on the fingers should you attempt it.)
It's the address or location of the memory to which the pointer refers. However, it's best if you regard this as an opaque quantity - you are never interested in the actual value of the pointer, only that to which it refers.
How the address then relates to physical memory is a service that the system provides and actually varies across systems.
That's a virtual address of anumber variable. Every program has its own memory space and that memory space is mapped to the physical memory. The mapping id done by the processor and the service data used for that is maintained by the operating system. So your program never knows where it is in the physical memory.
It's the address of the memory1 location where your variable is stored. You shouldn't care about the exact value, you should just know that different variables have different addresses, that "contiguous memory" (e.g. arrays) has contiguous addresses, ...
By the way, to print the address stored in a pointer you should use the %p specifier in printf.
Notice that I did not say "RAM", because in most modern OSes the "memory" your process sees is virtual memory, i.e. an abstraction of the actual RAM managed by the OS.
A lot of people told you, that the numeric value of a pointer will designate its address. This is one way how implementations can do it, but it is very important, what the C standard has to say about pointers:
The nil pointer has always numeric value 0 when operated on in the C programming language. However the actual memory containing the pointer may have any value, as long as this special, architecture dependent value is consistently treated nil, and the implementation takes care that this value is seen as 0 by C source code. This is important to know, since 0 pointers may appear as a different value on certain architectures when inspected with a low level memory debugger.
There's no requirement whatsoever that the values of the pointer are in any way related to actual addresses. They may be as well abstract identifiers, resolved by a LUT or similar.
If a pointer addresses an array, the rules of pointer arithmetic must hold, i.e. int array[128]; int a, b; a = (int)&array[120]; b = (int)&array[100]; a - b == 20 ; array + (a-b) == &array[20]; &array[120] == (int*)a
Pointer arithmetic between pointers to different objects is undefined and causes undefined behaviour.
The mapping pointer to integer must be reversible, i.e. if a number corresponds to a valid pointer, the conversion to this pointer must be valid. However (pointer) arithmetic on the numerical representation of pointers to different objects is undefined.
Yes, exactly that - it's the address of the apointer data in memory. Local variable such as anumber and apointer will be allocated in your program's stack, so it will refer to an address in the main() function's frame in the stack.
If you had allocated the memory with malloc() instead it would refer to a position in your program's heap space. If it was a fixed string it may refer to a location in your program's data or rodata (read-only data) segments instead.
in this case &apointer represent the address in RAM memory of the pointer variable apointer
apointer is the "address" of the variable anumber. In theory, it could be the actual physical place in RAM where the value of anumber is stored, but in reality (on most OS'es) it's likely to be a place in virtual memory. The result is the same though.
It's a memory address, most likely to the current location in your program's stack. Contrary to David's comment, there are times when you'll calculate pointer offsets, but this is only if you have some kind of array that you are processing.
It's the address of the pointer.
"anumber" takes up some space in RAM, the data at this spot contains the number 10.
"apointer" also takes up some space in RAM, the data at this spot contains the location of "anumber" in RAM.
So, say you have 32 bytes of ram, addresses 0..31
At e.g. position 16 you have 4 bytes, the "anumber" value 10
At e.g. position 20 you have 4 bytes, the "apointer" value 16, "anumber"'s position in RAM.
What you print is 20, apointer's position in RAM.
Note that this isn't really directly in RAM, it's in virtual address space which is mapped to RAM. For the purpose of understanding pointers you can completely ignore virtual address space.
it is not the address of the variable anumber that is printed but it is the address of the pointer which gets printed.look carefully.had it been just "apointer",then we would have seen the address of the anumber variable.

Resources