I have an assembler/C question. I just read about segment prefixes, for example ds:varX and so on. The prefix matters for the calculation of the logical address. I also read that the default is "ds", that "ss" is used as soon as you use the ebp register to calculate an address, and that "cs" is the default for code. That all makes sense.
Now I have the following in c:
int x; // some static var in ds
void test(int *p){
...
*p =5;
}
... main(){
test(&x);
//now x is 5
}
If you now think about the implementation of the test function... you get the pointer to x on the stack. If you want to dereference the pointer, you first get the pointer value (the address of x) from the stack and save it in eax, for example. Then you can dereference eax to change the value of x. But how does the C compiler know whether the given pointer (address) references memory on the stack (for example, if I call test from another function and push the address of a local variable as the parameter for test) or the data segment? How is the full logical address calculated? The function cannot know which segment the given address offset relates to..?!
In the general case, on a segmented platform you can't just read the pointer value "into eax" as you suggest. On a segmented platform the pointer would generally hold both the segment value and the offset value, meaning that reading such a pointer would imply initializing at least two registers - segment and offset - not just one eax.
But in specific cases it depends on the so-called memory model. Compilers on segmented platforms supported several memory models.
For starters, for obvious reasons it does not matter which segment register you use as long as the segment register holds the correct value. For example, if DS and ES registers hold the same value inside, then DS:<offset> will point to the same location in memory as ES:<offset>.
In the so-called "tiny" memory model, for one example, all segment registers held the same value, i.e. everything - code, data, stack - would fit in one segment (which is why it was called "tiny"). In this memory model each pointer was just an offset in this segment and, of course, it simply didn't matter which segment register to use with that offset.
In "larger" memory models you could have separate segments for code (CS), stack (SS) and data (DS). But on such memory models pointer object would normally hold both the offset and segment part of the address inside of it. In your example pointer p would actually be a two-part object, holding both segment value and offset value at the same time. In order to dereference such pointer the compiler would generate the code that would read both segment and offset values from p and use both of them. For example, the segment value would be read into ES register, while the offset value would be read into si register. The code would then access ES:[di] in order to read *p value.
There were also "intermediate" memory models, when code would be stored in one segment (CS), while data and stack would both be stored in another segment, so DS and SS would hold the same value. On that platform, obviously, there was no need to differentiate between DS and SS.
In the largest memory models you could have multiple data segments. In this case it is rather obvious that proper data addressing in segmented mode is not really a matter of choosing the proper segment register (as you seem to believe), but rather a matter of taking pretty much any segment register and initializing it with the correct value before performing the access.
What AndreyT described is what happened in the DOS days. These days, modern operating systems use the so-called flat memory model (or rather something very similar), in which all (protected-mode) segments are set up so that they can all access the whole address space (i.e. they have a base of 0 and a limit equal to the whole address space).
On a machine with a segmented memory model, the C implementation must do one of the following things to be conformant:
Store the full address (with segment) in each pointer, OR
Ensure that all stack addresses that will be used for variables whose addresses are taken can be accessed via the data segment, either at the same relative address or via some magic offset the compiler can apply when taking the address of local variables, OR
Not use the stack for local variables whose addresses are taken, and perform a hidden malloc/free on every function entry/return (with special handling for longjmp!).
Perhaps there are other ways of doing it, but these are the only ones I can think of. Segmented memory models were really pretty disagreeable with C, and they were abandoned for good reason.
Segmentation is a legacy artifact of the Intel 16-bit 8086 processor. In reality, you probably operate in virtual memory, where everything is just a linear address. Compile with the -S flag and look at the resulting assembly.
Since you move the address to eax before dereferencing it, it defaults to the ds segment. However, as Nikolai mentioned, in user level code the segments probably all point to the same address.
Under x86, direct usage of the stack uses the stack segment, but indirect usage treats it as a data segment. You can see this if you disassemble a pointer dereference and a write through a pointer to a stack location. Under x86, cs, ss and ds are treated pretty much the same (at least in non-kernel modes) due to linear addressing. The Intel reference manuals also have a section on segment addressing.
Related
I was playing with C, and I just discovered that a and &a yield the same result, that is, the address of the first element of the array. Browsing the topics here, I discovered they are only formatted in a different way. So my question is: where is this address stored?
This is an interesting question! The answer will depend on the specifics of the hardware you're working with and what C compiler you have.
From the perspective of the C language, each object has an address, but there's no specific prescribed mechanism that accounts for how that address would actually be stored or accessed. That's left up to the compiler to decide.
Let's imagine that you've declared your array as a local variable, and then write something like array[137], which accesses the element at index 137 of the array. How does the generated program know how to find your array? On most systems, the CPU has a dedicated register called the stack pointer that keeps track of the position of the memory used for all the local variables of the current function. As the compiler translates your C code into an actual executable file, it maintains an internal table mapping each local variable to some offset away from where the stack pointer points. For example, it might say something like "because 64 bytes are already used up for other local variables in this function, I'm going to place array 64 bytes past where the stack pointer points." Then, whenever you reference array, the compiler generates machine instructions of the form "look 64 bytes past the stack pointer to find the array."
Now, imagine you write code like this:
printf("%p\n", array); // Print address of array
How does the compiler generate code for this? Well, internally, it knows that array is 64 bytes past the stack pointer, so it might generate code of the form "take the value of the stack pointer, add 64 to it, then pass that as an argument to printf."
So in that sense, the answer to your question could be something like "the hardware stores a single pointer called the stack pointer, and the generated code is written in a way that takes that stack pointer and then adds some value to it to get to the point in memory where the array lives."
Of course, there are a bunch of caveats here. For example, some systems have both a stack pointer and a frame pointer. Interpreters use a totally different strategy and maintain internal data structures tracking where everything is. And if the array is stored at global scope, there's a different mechanism used altogether.
Hope this helps!
It isn't stored anywhere - it's computed as necessary.
Unless it is the operand of the sizeof, _Alignof, or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T", and the value of the expression is the address of the first element of the array.
When you declare an array like
T a[N]; // for any non-function type T
what you get in memory is
+---+
| | a[0]
+---+
| | a[1]
+---+
...
+---+
| | a[N-1]
+---+
That's it. No storage is materialized for any pointer. Instead, whenever you use a in any expression, the compiler will compute the address of a[0] and use that instead.
Consider this C code:
int x;
void foo(void)
{
int y;
int *p = malloc(sizeof *p);
...
}
When implementing this program, a C compiler will need to generate instructions that access the int objects named x and y and the int object allocated by the malloc. How does it tell those instructions where the objects are?
Each processor architecture has some way of referring to data in memory. This includes:
The machine instruction includes some bits that identify a processor register. The address in memory is in that processor register.
The machine instruction includes some bits that specify an address.
The machine instruction includes some bits that specify a processor register and some bits that specify an offset or displacement.
So, the compiler has a way of giving an address to the processor. It still needs to know that address. How does it do that?
One way is the compiler could decide exactly where everything in memory is going to go. It could decide it is going to put all the program’s instructions at addresses 0 to 10,000, and it is going to put data at 10,000 and on, and that x will go at address 12300. Then it could write an instruction to fetch x from address 12300. This is called absolute addressing, and it is rarely used anymore because it is inflexible.
Another option is that the compiler can let the program loader decide where to put the data. When the software that loads the program into memory is running, it will read the executable, see how much space is needed for instructions, how much is needed for data that is initialized to zero, how much space is needed for data with initial values listed in the executable file, how much space is needed for data that does not need to be initialized, how much space is requested for the stack, and so on. Then the loader will decide where to put all of these things. As it does so, it will set some processor registers, or some tables in memory, to contain the addresses where things go.
In this case, the compiler may know that x goes at displacement 2300 from the start of the “zero-initialized data” section, and that the loader sets register r12 to contain the base address of that section. Then, when the compiler wants to access x, it will generate an instruction that says “Use register r12 plus the displacement 2300.” This is largely the method used today, although there are many embellishments involving linking multiple object modules together, leaving a placeholder in the object module for the name x that the linker or loader fills in with the actual displacement as they do their work, and other features.
In the case of y, we have another problem. There can be two or more instances of y existing at once. The function foo might call itself, which causes there to be a y for the first call and a different y for the second call. Or foo might call another function that calls foo. To deal with this, most C implementations use a stack. One register in the processor is chosen to be a stack pointer. The loader allocates a large amount of space and sets the stack pointer register to point to the “top” of the space (usually the high-address end, but this is arbitrary). When a function is called, the stack pointer is adjusted according to how much space the new function needs for its local data. When the function executes, it puts all of its local data in memory locations determined by the value of the stack pointer when the function started executing.
In this model, the compiler knows that the y for the current function call is at a particular offset relative to the current stack pointer, so it can access y using instructions with addresses such as “the contents of the stack pointer plus 84 bytes.” (This can be done with a stack pointer alone, but often we also have a frame pointer, which is a copy of the stack pointer at the moment the function was called. This provides a firmer base address for working with local data, one that might not change as much as the stack pointer does.)
In either of these models, the compiler deals with the address of an array the same way it deals with the address of a single int: It knows where the object is stored, relative to some base address for its data segment or stack frame, and it generates the same sorts of instruction addressing forms.
Beyond that, when you access an array, such as a[i], or possibly a multidimensional array, a[i][j][k], the compiler has to do more calculations. To do this, the compiler takes the starting address of the array and does the arithmetic necessary to add the offsets for each of the subscripts. Many processors have instructions that help with these calculations—a processor may have an addressing form that says "Take a base address from one register, add a fixed offset, and add the contents of another register multiplied by a fixed size." This will help access arrays of one dimension. For multiple dimensions, the compiler has to write extra instructions to do some of the calculations.
If, instead of using an array element, like a[i], you take its address, as with &a[i], the compiler handles it similarly. It will get a base address from some register (the base address for the data segment or the current stack pointer or frame pointer), add the offset to where a is in that segment, and then add the offset required for i elements. All of the knowledge of where a[i] is is built into the instructions the compiler writes, plus the registers that help manage the program’s memory layout.
Yet one more point of view, a TL;DR answer if you will: When the compiler produces the binary, it stores the address everywhere where it is needed in the generated machine code.
The address may be just plain number in the machine code, or it may be a calculation of some sort, such as "stack frame base address register + a fixed offset number", but in either case it is duplicated everywhere in the machine code where it is needed.
In other words, it is not stored in any one location. More technically, &some_array is not an lvalue, and trying to take the address of it, &(&some_array), will produce a compiler error.
This actually applies to all variables; arrays are not special in any way here. The address of a variable can be used in the machine code directly (and if the compiler actually generates code which does store the address somewhere, you have no way to know that from C code; you have to look at the assembly code).
The one thing special about arrays, which seems to be the source of your confusion, is that some_array is basically a more convenient syntax for &(some_array[0]), while &some_array means something else entirely.
Another way to look at it:
The address of the first element doesn't have to be stored anywhere.
An array is a chunk of memory. It has an address simply because it exists somewhere in memory. That address may or may not have to be stored somewhere depending on a lot of things that others have already mentioned.
Asking where the address of the array has to be stored is like asking where reality stores the location of your car. The location doesn't have to be stored - your car is located where your car happens to be - it's a property of existing. Sure, you can make a note that you parked your car in row 97, spot 114 of some huge lot, but you don't have to. And your car will be wherever it is regardless of your note-taking.
I am writing a program in C (32 bit) where I output a string (15 to 40 characters long). I have elected to use pointers and calloc instead of a formal array declaration. My program functions totally fine so this isn't a question about logic or function, I am simply curious about what's "going on under the hood" of my C code.
My understanding: When I use calloc I am allocating a section of memory in units of bytes. Variables are stored in registers of size 32 bits (or 4 bytes). In my program, I write characters using my pointer (i.e. *ptr = '!';) and then I increment the pointer (ptr++;) to move to the next register.
My question: If registers are 32-bits and I am writing only 8-bits to that register, are the remaining 24-bits unused? If not, then are the pointers I'm using pointing to some kind of 8-bit sub-register allocation, pointing to 8-bit sections of registers?
Register usage -- and, technically, even the existence of registers at all -- is a characteristic of the C implementation and the hardware on which it runs. There is therefore no definitive answer to your question at its level of generality. This is for the most part true of any question about "what's going on under the hood".
Speaking in terms of typical implementations for commodity hardware, though,
My understanding: When I use calloc I am allocating a section of memory in units of bytes.
A reasonable characterization.
Variables are stored in registers of size 32 bits (or 4 bytes).
No. Values are stored in registers. Implementations generally provide storage for the values of variables in regular memory, though those values may be copied into registers for computation.
Under some implementation-specific circumstances, certain variables might not have an associated memory location, their values instead being maintained only in registers. Generally speaking, however, this is never the case for variables or allocated space that is, was, or ever could be referenced by a pointer.
In my program, I write characters using my pointer (i.e. *ptr = '!';) and then I increment the pointer (ptr++;) to move to the next register.
No, absolutely not. Incrementing the pointer causes it to point to the next element of your dynamic storage, measured in units of the size of the pointed-to type. This has nothing to do with registers. Writing to the pointed-to object probably involves register use (because that's how CPUs work), but ultimately the character written ends up in regular memory.
My question: If registers are 32-bits and I am writing only 8-bits to that register, are the remaining 24-bits unused?
As I already explained, this question is based on a misconception. The target of your write is not a register. In any case, there are no gaps in memory between the elements you are writing.
It is conceivable that under some circumstances, a clever compiler might optimize your code to minimize writes to memory by collecting bytes in a register and performing writes in chunks of that size. Whether it can or will do so depends on the implementation and the options in effect.
If not, then are the pointers I'm using pointing to some kind of 8-bit sub-register allocation, pointing to 8-bit sections of registers?
Your pointers are (logically) pointing to main memory, which is (logically) addressable in byte-sized units. They are not pointing to registers.
Nope, there's no register involved; in general, registers are a scarce resource.
What actually happens is that you are writing the values into the memory locations pointed to by the returned pointer. Pointers and pointer arithmetic respect the data type, so the returned pointer, cast to the proper type, takes care of the access.
I write characters using my pointer (i.e. *ptr = '!';) and then I increment the pointer (ptr++;) to move to the next register.
Not exactly; you are talking about the memory location pointed to by the pointer ptr. If ptr is defined as char *, then ptr++ is the same as ptr = ptr + 1, which increases ptr by the size of the pointed-to data type, char. So, after the expression, ptr points to the next element in memory.
Those pointers are not certain to be stored in registers; normally they will just be stored on the stack.
This is an outcome of compiler optimizations.
In some compilers you can use the register keyword to request that a variable be kept in a register (it is only a hint, though).
Also, there is no "next" register: registers do not have addresses. The register file is a special hardware unit integrated into the CPU, and registers are usually named by a certain set of bits in the instruction encoding.
I advise you to use your compiler or a disassembly tool to see exactly how it looks in assembly.
You can suggest in C that a variable goes into a register (and most compilers will treat this as a hint), but where the variable goes depends on what kind of variable it is. Local variables will go on the stack, memory allocation functions will put data on the heap and give you the address, and constants and string literals will go into the read-only data segment.
As Sourav pointed out, you are using the term register incorrectly. There are hardware registers, and there is the register keyword in C, but neither has much to do with pointers.
The typical size of an aligned memory access is 16/32/64 bits depending on your architecture. You are thinking that you increase your pointer by that block size. This is not correct.
Depending on what type of pointer you have, your step size on incrementing differs. It is always the size of the corresponding data type in bytes.
A char * is increased by 1 byte when you apply ++, while a long long * is increased by 8.
As arrays can decay to pointers on some occasions, the mechanics are quite similar.
What you are thinking of is what happens if you declare two chars (or a char and an int in a struct): their addresses may differ by a multiple of the alignment size, and the rest of the memory is "wasted" as padding.
But as you allocated the memory, it is yours to control; you can pack it just like an array.
There seems to be confusion about what a register is. A register is a storage location within the processor. Registers have different functions. However, programmers are generally concerned with GENERAL REGISTERS and the Process Status Register.
General Registers are scratch locations for performing computations. On some systems all operations are performed in registers. Thus, if you want to add two values, you have to load both into registers, then add them. Most non-RISC systems these days allow operations to take place directly to memory.
My understanding: When I use calloc I am allocating a section of memory in units of bytes. Variables are stored in registers of size 32 bits (or 4 bytes). In my program, I write characters using my pointer (i.e. *ptr = '!';) and then I increment the pointer (ptr++;) to move to the next register.
Your compiler may assign variables to exist in registers, rather than memory. However, any time you dereference a pointer (e.g. *ptr) you have to access memory.
If you call
char *ptr = calloc (...)
The variable ptr may (or may not) be placed in a register. It's all up to your compiler. The value returned by calloc is the location of memory, not registers.
What you should do to learn this is to generate assembly language code from your compiler. Most compilers have such an option and they typically interleave your C code with the generated assembly code.
If you do:
In my program, I write characters using my pointer (i.e. *ptr = '!';) and then I increment the pointer (ptr++;) to move to the next register.
Your generated code might look like (assuming ptr is mapped to R0):
MOVB '!', (R0)+
Which on several systems, moves the value '!' to the address pointed to by R0, then increments R0 by one.
My question: If registers are 32-bits and I am writing only 8-bits to that register, are the remaining 24-bits unused? If not, then are the pointers I'm using pointing to some kind of 8-bit sub-register allocation, pointing to 8-bit sections of registers?
In your case, you are not reading and writing bytes to registers. However, many systems do subdivide registers (on x86, for example, AL and AH are 8-bit sections of EAX).
I am writing a user space C program to read the hard disk.
I need to convert an assembler instruction to C program code. How can this be done?
mov eax, [rsi+0x0C]
Here eax can be any variable. However, rsi is the base address register with value 0xc1617000. This value does not change.
You can assign values to pointers in C. Try this:
uint8_t *rsi = (uint8_t*)(uintptr_t) 0xc1617000; // The uintptr_t cast isn't really needed, but might help portability.
uint32_t value = *(uint32_t *)(rsi + 0x0C);
A shorter version, of course is:
uint32_t value = *(uint32_t *)0xc161700C;
Basically you interpret that constant as a pointer to uint32_t, and then dereference it.
Following http://www.cs.virginia.edu/~evans/cs216/guides/x86.html:
mov eax, [rsi+0x0C]
means
move the 4-byte word at the address rsi+0x0C into the EAX register
that's what this line of assembler means; you say
Here eax can be any variable
Typically, EAX is the return value of some function, but I'll not go into this.
Since this is trivial:
int variable = *((unsigned int*) 0xc161700C);
notice that it's totally up to your compiler whether it actually copies over that value -- in many cases, the compiler will do so only when the value of variable is actually used. If you ask for the address of variable, you might either get a new address, or actually 0xc161700C.
Since this is basic C, I'm not so confident I want to let you play with my hard drive ;) notice that for programs running in unprivileged (non-kernel mode), access to physical memory addresses is impossible in general.
EDIT
On Linux the program is crashing when accessing the location, maybe because it's outside the bounds of the process memory. Any idea how to access memory outside the bounds of the process memory?
As I said here and in the comments:
If your code is running as a program (in userland), you can never access raw physical memory addresses. Your process sees its own memory with physical memory being mapped there in pages -- there's no possibility to access raw physical memory without the help of kernel mode. That is the beauty of memory mapping as done on any modern CPU: programs can't fiddle directly with hardware.
Under Linux, things might be relatively easy: open or mmap /dev/mem as root and access the right position in that file -- it's an emulation of direct access to memory as accessible by the operating system.
However, what you're doing is hazardous, and Linux usually already supports as much AHCI as it should -- are you sure you're using a Linux kernel from the last ten years?
Take a simple program like this:
int main(void)
{
char p;
char *q;
q = &p;
return 0;
}
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime? If at runtime, is there some table of variables or something where it looks these things up? Does the OS keep track of them and it just asks the OS?
My question may not even make sense in the context of the correct explanation, so feel free to set me straight.
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime?
This is an implementation detail of the compiler. Different compilers can choose different techniques depending on the kind of operating system they are generating code for and the whims of the compiler writer.
Let me describe for you how this is typically done on a modern operating system like Windows.
When the process starts up, the operating system gives the process a virtual address space, of, let's say 2GB. Of that 2GB, a 1MB section of it is set aside as "the stack" for the main thread. The stack is a region of memory where everything "below" the current stack pointer is "in use", and everything in that 1MB section "above" it is "free". How the operating system chooses which 1MB chunk of virtual address space is the stack is an implementation detail of Windows.
(Aside: whether the free space is at the "top" or "bottom" of the stack, whether the "valid" space grows "up" or "down" is also an implementation detail. Different operating systems on different chips do it differently. Let's suppose the stack grows from high addresses to low addresses.)
The operating system ensures that when main is invoked, the register ESP contains the address of the dividing line between the valid and free portions of the stack.
(Aside: again, whether the ESP is the address of the first valid point or the first free point is an implementation detail.)
The compiler generates code for main that moves the stack pointer by, let's say, five bytes, subtracting from it if the stack grows "down". It moves it by five because it needs one byte for p and four for q. So the stack pointer changes; there are now five more "valid" bytes and five fewer "free" bytes.
Let's say that q is the memory that is now in ESP through ESP+3 and p is the memory now in ESP+4. To assign the address of p to q, the compiler generates code that copies the four byte value ESP+4 into the locations ESP through ESP+3.
(Aside: Note that it is highly likely that the compiler lays out the stack so that everything that has its address taken is on an ESP+offset value that is divisible by four. Some chips have requirements that addresses be divisible by pointer size. Again, this is an implementation detail.)
If you do not understand the difference between an address used as a value and an address used as a storage location, figure that out. Without understanding that key difference you will not be successful in C.
That's one way it could work but like I said, different compilers can choose to do it differently as they see fit.
The compiler cannot know the full address of p at compile-time because a function can be called multiple times by different callers, and p can have different values.
Of course, the compiler has to know how to calculate the address of p at run-time, not only for the address-of operator, but simply in order to generate code that works with the p variable. On a regular architecture, local variables like p are allocated on the stack, i.e. in a position with fixed offset relative to the address of the current stack frame.
Thus, the line q = &p simply stores into q (another local variable allocated on the stack) the address p has in the current stack frame.
Note that in general, what the compiler does or doesn't know is implementation-dependent. For example, an optimizing compiler might very well optimize away your entire main after analyzing that its actions have no observable effect. The above is written under the assumption of a mainstream architecture and compiler, and a non-static function (other than main) that may be invoked by multiple callers.
This is actually an extraordinarily difficult question to answer in full generality because it's massively complicated by virtual memory, address space layout randomization and relocation.
The short answer is that the compiler basically deals in terms of offsets from some “base”, which is decided by the runtime loader when you execute your program. Your variables, p and q, will appear very close to the “bottom” of the stack (although the stack base is usually very high in VM and it grows “down”).
The address of a local variable cannot be completely calculated at compile time. Local variables are typically allocated on the stack. When called, each function allocates a stack frame - a single contiguous block of memory in which it stores all its local variables. The physical location of the stack frame in memory cannot be predicted at compile time; it only becomes known at run-time. The beginning of each stack frame is typically kept at run-time in a dedicated processor register, like ebp on the Intel platform.
Meanwhile, the internal memory layout of a stack frame is pre-determined by the compiler at compile-time, i.e. it is the compiler who decides how local variables will be laid out inside the stack frame. This means that the compiler knows the local offset of each local variable inside the stack frame.
Put this all together and we get that the exact absolute address of a local variable is the sum of the address of the stack frame itself (the run-time component) and the offset of this variable inside that frame (the compile-time component).
This is basically exactly what the compiled code for
q = &p;
will do. It will take the current value of the stack frame register, add some compile-time constant to it (the offset of p) and store the result in q.
In any function, the function arguments and the local variables are allocated on the stack, just past the caller's return address, which is pushed at the point where the caller invokes the current function. How these variables get allocated on the stack and then deallocated when returning from the function is decided by the compiler at compile time.
For e.g. in this case, p (4 bytes on a 32-bit architecture) could be allocated first on the stack, followed by q (another 4 bytes). The code assigns the address of p to q; the address of p is then just a small compile-time constant added to or subtracted from the current value of the stack pointer. Well, something like that - it depends on how the stack pointer is updated and whether the stack grows upwards or downwards.
How the return value is passed back to the calling function depends on the calling convention; on x86, small values are typically passed in the eax register, not on the stack. So when return is executed, the generated code deallocates p and q, places zero into eax, then jumps back to the caller's saved return address. Of course, in this case it is the main function, so it is more complicated in that returning causes the C runtime to ask the OS to terminate the process. But in other cases, it just goes back to the calling function.
In ANSI C (C89), all local variables must be declared at the top of a block, and compilers typically allocate them on the stack once when entering the function and deallocate them when returning. In C++ and in C99 and later, this becomes more complicated because local variables can also be declared in the middle of blocks (like if-else or while statement blocks). Conceptually, such a variable's lifetime begins when entering the block and ends when leaving it, although compilers usually still reserve space for the whole frame at function entry.
In all cases, the address of a local variable is a fixed offset added to or subtracted from the stack pointer (as calculated by the compiler, relative to the containing frame), and the size of the variable is determined by its type.
However, static local variables and global variables are different in C. These are allocated at fixed locations in memory, so they have a fixed address (or a fixed offset relative to the module's load address), which is calculated by the linker.
Yet a third variety is memory allocated on the heap using malloc/new and free/delete. I think this discussion would be too lengthy if we include that as well.
That said, my description is only for a typical hardware architecture and OS. All of these are also dependent on a wide variety of things, as mentioned by Emmet.
p is a variable with automatic storage duration. It lives only as long as the function it is declared in. Every time that function is called, memory for p is taken from the stack; therefore its address can change from call to call and is not known until runtime.
What does a pointer look like in assembly? I know that an instruction like 'mov' for, let's say, a PIC is converted into a sequence of bits, and these bits activate the circuits that do the job. But a pointer - how does it map to assembly and then end up controlling circuits?
Is a single pointer transformed into several assembly instructions, and if so, what do they look like?
A pointer is nothing but an unsigned integer indicating a position in memory (in the virtual address space, to be precise).
The instruction
mov eax,[ebp]
moves the value stored at the memory location whose address is held in ebp into eax. Here ebp acts as a pointer.
Coming back to your question: code is also data, stored somewhere in memory, and the address of that memory is a pointer. So by using [] to dereference a pointer we can fetch an instruction (as done in the statement above), and the CPU can then decode and execute it.
In fact, on 32-bit x86 machines the eip register stores a pointer to the memory location of the instruction currently being executed.
A key feature of assembly is that a pointer can be written as a label, so that the assembler can translate it into an address encoded in the instruction's opcode.
Actually, any pointer is composed of one or more unsigned integers. On 8-bit PIC16 MCUs you must first select the appropriate memory bank, after which you can access the memory; the reason is that the address field in the opcode is only 7 bits wide when you use direct memory addressing. You can also use indirect memory addressing; in that case you must use the FSR register, which is composed of two 8-bit registers, FSRL and FSRH. After setting a pointer in FSR, you can read the byte at that address through the INDF register.
Pointers are memory addresses (or offsets within a segment). They look like unsigned integers with a width appropriate to the addressable size of the memory space.