How to post-modify a C pointer?

In modern processors it is possible to load a register from memory and then post-modify the indexing pointer by a desired value. For example, in our embedded processor, this will be done by:
ldr r0, [r1], +12
which means - load the value pointed to by r1 into r0 and then increment r1 by 12:
r0 = [r1]
r1 = r1 + 12
In the C language, using pointer arithmetic, one can read a value through a pointer and then advance the pointer by 1:
char i, *p, a[3]={10, 20, 30};
p = &(a[0]);
i = *p++;
// now i==10 and p==&(a[1]).
I am looking for a way to dereference a pointer while post-modifying it by an offset other than 1. Is this possible in C, so it maps nicely to the similar asm instruction?
Note that:
i = *p+=2;
increases the value in a[0] without modifying the pointer, and:
i = *(p+=2);
pre-modifies the pointer, so in this case i==30.
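To make the difference between these spellings concrete, here is a small sketch that runs each variant on the array from the question and reports what ends up in i, how far p moved, and what a[0] holds afterwards. The struct and function names are made up for illustration.

```c
#include <stddef.h>

/* Result of one experiment on the array {10, 20, 30}. */
struct outcome {
    char value;        /* what ended up in i */
    ptrdiff_t advance; /* p - a after the statement */
    char first;        /* a[0] after the statement */
};

/* i = *p += 2;  parses as i = (*p += 2): a[0] changes, p stays put. */
struct outcome compound_on_pointee(void)
{
    char a[3] = {10, 20, 30}, *p = a, i;
    i = *p += 2;
    return (struct outcome){ i, p - a, a[0] };
}

/* i = *(p += 2);  moves p first, then reads through the new pointer. */
struct outcome pre_modify(void)
{
    char a[3] = {10, 20, 30}, *p = a, i;
    i = *(p += 2);
    return (struct outcome){ i, p - a, a[0] };
}

/* Two plain statements: read through the old pointer, then move it. */
struct outcome load_then_advance(void)
{
    char a[3] = {10, 20, 30}, *p = a, i;
    i = *p;
    p += 2;
    return (struct outcome){ i, p - a, a[0] };
}
```

Only the last variant has the post-modify semantics of the ldr instruction: it reads a[0] and leaves p two elements further on.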

Yes this is possible.
You shouldn't be doing weird pointer math to make it happen.
Not only is it about optimization settings, your GCC back-end needs to tell GCC that it has such a feature (i.e. when GCC itself is being compiled). Based on this knowledge, GCC automatically combines the relevant sequence into a single instruction.
i.e. if your back-end is written right, even something like:
a = *ptr;
ptr += SOME_CONST;
should become a single post-modify instruction.
How to correctly set this up when writing a back-end? (ask your friendly neighbourhood GCC back-end developer to do it for you):
If your GCC back-end is called foo:
In the GCC source tree, the back-end description and hooks will be located at gcc/config/foo/.
Among the files there (which get compiled along with GCC), there is usually a header foo.h which contains a lot of #defines describing machine features.
GCC expects that a back-end which supports post-increment define the macro HAVE_POST_INCREMENT to evaluate to true, and if it supports post-modify, then define the macro HAVE_POST_MODIFY_DISP to true. (post-increment => ptr++, post-modify => ptr += CONST). Maybe there are a few other things to be handled as well.
Assuming that your processor's back-end has got this right, let's move on to what happens when you compile your code containing said post-modify sequence:
There is a specific GCC optimization pass that goes through instruction pairs that fall into this category and combines them. The source for that pass is here, and has a rather clear description of what GCC will do and how to get it to do it.
But this, in the end, is not in your control as a GCC user. It is in the control of the developer who wrote your GCC back-end. All you should be doing, like the most upvoted comment says, is:
a = *ptr;
ptr += SOME_CONST;
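As a slightly larger illustration of the same two-statement idiom, here is a minimal sketch of a strided sum. The function name and signature are made up for this example. Whether the load and the increment are actually fused into one post-modify instruction depends on the back-end and the optimization level, so inspecting the generated assembly (for example with gcc -O2 -S) is the only way to be sure.

```c
#include <stddef.h>

/* Sum `count` elements spaced `stride` elements apart.
 * Assumption: the buffer holds at least count * stride elements, so the
 * final increment leaves p no further than one past the end. */
long sum_strided(const int *p, size_t count, size_t stride)
{
    long total = 0;
    while (count-- > 0) {
        total += *p;   /* load through the current pointer ... */
        p += stride;   /* ... then post-modify the pointer     */
    }
    return total;
}
```

The source stays plain C with no pointer tricks, which leaves the combining of the two statements to the compiler.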

You can do it this way, but don't do it:
i = *((p += 2) - 2);
(not exactly post-modify)

The closest I can think of:
#define POST_INDEX_ASSIGN(lhs, ptr, index) (lhs = *(ptr), (ptr) += (index))
POST_INDEX_ASSIGN(i, p, 2);

i = *p;
p = (void *)((unsigned char *)p + 12);
where i is any kind of type and p is a pointer to that type.
If you don't add the typecast, the pointer increment will be done in steps with size == sizeof(*p), which would make the code completely different from the posted assembler.
For example, had p been an int* on a 32-bit system, the pointer would have been incremented 4*12 bytes without the typecast.
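The byte-offset version can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name is hypothetical, and the caller must pick a byte count that keeps the pointer correctly aligned and within (or one past) the same array.

```c
#include <stddef.h>

/* Read the int that *pp points to, then advance *pp by nbytes BYTES
 * (not elements), mirroring the byte offset of the assembly instruction.
 * Assumption: nbytes keeps the pointer aligned for int and inside, or one
 * past, the same array; otherwise the behaviour is undefined. */
static inline int load_post_modify_bytes(const int **pp, size_t nbytes)
{
    int value = **pp;
    *pp = (const int *)((const unsigned char *)*pp + nbytes);
    return value;
}
```

With a 4-byte int, passing 12 moves the pointer by three elements, the same step that p += 3 would take on an int pointer.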


How to directly access a memory address without using a pointer?

Below is my code.
Please note that I am not running this code on a normal machine, but on an architecture simulator (gem5).
#include <stdio.h>
int main()
{
int *p;
int x;
p = &x;
p[0] = 3;
// *(0xffff) = 6;
return 0;
}
If I uncomment the line, I get a compiler error (which is expected) "indirection requires pointer operand".
Since I am not running it on an actual machine, but on a simulator, I have control over how the hardware behaves, and the address space.
I want to store value 6 in address 0xffff. One way to do this is:
int *p;
p = (int *)0xffff;
p[0] = 6;
Please note that this will not result in a segmentation fault for me, because I am running it on a simulator, and I control the address space.
But this is inefficient: every access has to read the variable 'p' to get 0xffff and only then store 6. Even if 'p' is declared register, accessing it takes 1 cycle, which is costly for me in the long run.
Since I know 'p' will always have 0xffff, can't I write something like
*(0xffff) = 3;
How do I allow the compiler to generate this code?
Any pointers will be of help.
I don't know if you can do it without a pointer at all, but if you don't want to declare a pointer on a separate line, you could do this:
*(int*)(0xffff) = 3;
If you are concerned with the extra cycles, you can use some inline assembly. Typically, the MOV instruction (MOV r/m32, imm32) allows you to move the immediate value into the register, or into the address pointed to by the register. If you wish to bypass the use of the register completely, you could use the C6 opcode with an absolute address (from this post). Note that C6 is the byte-sized form, MOV r/m8, imm8; the 32-bit form, MOV r/m32, imm32, is opcode C7.
c6 04 25 0xffff 3
opcode modr/m sib address immediate
Assuming you are using an x86 architecture, you can include that instruction in inline assembly in your source file.
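Before reaching for inline assembly, the cast-and-dereference form can be wrapped in a helper. The sketch below uses hypothetical names (store_at, SIM_REG_ADDR) and adds volatile so that the compiler does not drop or merge the store. With a constant address and optimization enabled, compilers typically emit a store to that address without keeping a pointer variable in memory, but that is something to confirm in the generated assembly rather than assume.

```c
#include <stdint.h>

/* Store `value` at a numeric address. The volatile qualifier tells the
 * compiler that the access has side effects and must be performed. */
static inline void store_at(uintptr_t address, int value)
{
    *(volatile int *)address = value;
}

/* Hypothetical name for the simulator address used in the question. */
#define SIM_REG_ADDR ((uintptr_t)0xffff)
```

On the simulator, the store from the question would then be written as store_at(SIM_REG_ADDR, 6);. On an ordinary hosted system that address is usually not mapped, so the call is only meaningful where the address space is under your control.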

How to dereference zero address with GCC? [duplicate]

Suppose I need to write to address zero (e.g. I've mmapped something there and want to access it, for whatever reason including curiosity), and the address is known at compile time. Here are some variants I could think of to obtain the pointer; one of them works and the other three don't:
#include <stdint.h>
void testNullPointer()
{
// Obviously UB
unsigned* p=0;
*p=0;
}
void testAddressZero()
{
// doesn't work for zero, GCC detects it as NULL
uintptr_t x=0;
unsigned* p=(unsigned*)x;
*p=0;
}
void testTrickyAddressZero()
{
// works, but the resulting assembly is not as terse as it could be
unsigned* p;
asm("xor %0,%0\n":"=r"(p));
*p=0;
}
void testVolatileAddressZero()
{
// p is updated, but the code doesn't actually work
unsigned*volatile p=0;
*p=0; // because this doesn't dereference p! // EDIT: pointee should also be volatile, then this will work
}
I compile this with
gcc test.c -masm=intel -O3 -c -o test.o
and then objdump -d test.o -M intel --no-show-raw-insn gives me (alignment bytes are skipped here):
00000000 <testNullPointer>:
0: mov DWORD PTR ds:0x0,0x0
a: ud2a
00000010 <testAddressZero>:
10: mov DWORD PTR ds:0x0,0x0
1a: ud2a
00000020 <testTrickyAddressZero>:
20: xor eax,eax
22: mov DWORD PTR [eax],0x0
28: ret
00000030 <testVolatileAddressZero>:
30: sub esp,0x10
33: mov DWORD PTR [esp+0xc],0x0
3b: mov eax,DWORD PTR [esp+0xc]
3f: add esp,0x10
42: ret
Here the testNullPointer obviously has UB since it dereferences what is null pointer by definition.
The principle of testAddressZero would give the expected code for any other than 0 address, e.g. 1, but for zero GCC appears to detect that address zero corresponds to null pointer, so also generates UD2.
The asm way of getting the zero address certainly inhibits the compiler's checks, but the price is that one has to write different assembly code for each architecture, even though the principle of testAddressZero would have worked everywhere (i.e. the same flat memory model on each arch) were it not for UD2 and similar traps. Also, the code is not as terse as in the above two variants.
The way of volatile pointer would seem to be the best, but the code generated here appears to not dereference the address for some reason, so it's also broken.
The question now: if I'm targeting GCC, how can I seamlessly access zero address without any traps or other consequences of UB, and without the need to write in assembly?
As a workaround you can use the GCC option -fno-delete-null-pointer-checks, which stops the compiler from assuming that a null pointer can never be safely dereferenced.
This option exists to turn off an optimization, but it can also be used in specific cases like this one.
I would put the pointer into a global variable:
const uintptr_t zero = 0;
unsigned* zeroAddress= (unsigned *)zero;
void testZeroAddressPointer()
{
*zeroAddress=0;
}
Provided you expose the address beyond the scope of optimization (so the compiler can't figure out you don't set it somewhere else), that should do the trick, albeit slightly less efficiently.
Edit: make this code independent of implicit zero to null conversion.
The 0 address is the C99 NULL pointer (actually the "implementation" of the null pointer, which you can often write as 0....) on all the architectures I know about.
The null pointer has a very specific status in hosted C99: when a pointer can be (or was) dereferenced, it is guaranteed (by the language specification) to not be NULL (otherwise, it is undefined behavior).
Hence, the GCC compiler has the right to optimize (and actually will optimize)
int *p = something();
int x = *p;
/// the compiler is permitted to skip the following
/// because p has been dereferenced so cannot be NULL
if (p == NULL) { doit(); return; };
In your case, you might want to compile for the freestanding subset of the C99 standard. So compile with gcc -ffreestanding (beware, this option can bring some infelicities).
BTW, you might declare some extern char strange[] __attribute__((weak)); (perhaps even add asm("0") ...) and have some assembler or linker trick to make that strange have a 0 address. The compiler would not know that such a strange symbol is in fact at the 0 address...
My strong suggestion is to avoid dereferencing the 0 address.... See this. If you really need to dereference address 0, be prepared to suffer.... (so code some asm, lower the optimization, etc...).
(If you have mmap-ed the first page, just avoid using its first byte at address 0; that is often not a big deal.)
(IIRC, you are touching a grey area of GCC optimizations - and perhaps even of the C99 language specification, and you certainly want the free standing flavor of C; notice that -O3 optimization for free standing C is not well tested in the GCC compiler and might have residual bugs....)
You could consider changing the GCC compiler so that the null pointer has the numerical address 42. That would take some work.

Performance difference when accessing using pointer and double pointer

Is there any performance difference when we access a memory location by using a pointer and double pointer?
If so, which one is faster ?
There is no simple answer to it, as the answer might depend on the actual machine. If I remember correctly, some legacy machines (such as the PDP11) offered a 'double pointer' access in a single instruction.
However, this is not the situation today. Accessing memory is not as simple as it looks and requires a lot of work, due to virtual memory. For this reason, my guess is that a double dereference should in fact be slower on most modern machines: more work has to be done to translate two virtual addresses to physical addresses and fetch from them. But that's just an educated guess.
Note however, that the compiler might optimize 'redundant' accesses for you already.
To the best of my knowledge, however, there is no machine that has faster 'double access' than 'single access', so we can say that single access is not worse than double access.
As a side note, I believe that in real-life programs the difference is negligible (compared to anything else done in the program), and unless this happens in a very performance-sensitive loop, just do whatever is more readable.
Assuming you are talking about something like
int a = 10;
int *aptr = &a;
int **aptrptr = &aptr;
Then the cost of
*aptr = 20;
Is one dereference. The address pointed to by aptr must first be retrieved and then the address can be stored to.
The cost of
**aptrptr = 30;
Is two dereferences. The address stored in aptrptr must first be retrieved. Then the address stored at that address must be retrieved. Then this address can be stored to.
Is this what you were asking?
Therefore, to conclude, using a single pointer is faster if that suits your needs.
Note, that if you access a pointer or double pointer in a loop, for example,
while(some condition)
*aptr = something;
or
while(some condition)
**aptrptr = something;
The compiler will likely optimize so that the dereferencing is only done once at the start of the loop, so the cost is only 1 extra address fetch rather than N, where N is the number of times the loop executes.
EDIT:
(1) As Amit correctly points out the "how" of pointer access is not explicitly a C thing... it does depend on the underlying architecture. If your machine supports a double dereference as a single instruction then there might not be a big difference. He is using the index deferred addressing mode of the PDP11 as an example. You might find out that such an instruction still chews up more cycles... consult the hardware documentation and look at the optimization that your C compiler is able to apply for your specific architecture.
The PDP11 architecture dates from around the 1970s. As far as I know, most RISC architectures don't have such a double dereference and will probably need to do two fetches (if someone knows a modern architecture that can do this, please post!).
Therefore, to conclude, using a single pointer is probably faster in general, with the caveat that specific architectures may handle this better than others, and compiler optimizations, as I discussed, could make the difference negligible... to be sure you just have to profile your code and read up about your architecture :)
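If you would rather not depend on the compiler hoisting the extra fetch out of the loop, you can do it by hand. The following is only a sketch with a made-up function name: it reads the inner pointer once into a local variable and then uses single dereferences inside the loop.

```c
#include <stddef.h>

/* Write `value` to n ints reached through a pointer to pointer, fetching
 * the inner pointer once instead of on every iteration. */
void fill_through(int **pp, size_t n, int value)
{
    int *inner = *pp;          /* the one extra address fetch */
    for (size_t i = 0; i < n; i++)
        inner[i] = value;      /* single dereference per store */
}
```

An optimizing compiler will often produce the same code for the naive loop, so profiling, as suggested above, is the way to tell whether this makes any difference on a given architecture.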
Let's see it in this way:
int var = 5;
int *ptr_to_var = &var;
int **ptr_to_ptr = &ptr_to_var;
When the variable var is accessed, you need to
1. get the address of the variable
2. fetch its value from that address.
In the case of the pointer ptr_to_var, you need to
1. get the address of the pointer variable
2. fetch its value from that address (i.e., the address of the variable var)
3. fetch the value at the address pointed to.
In the third case, the pointer to pointer ptr_to_ptr, you need to
1. get the address of the pointer to pointer variable
2. fetch its value from that address (i.e., the address of the pointer ptr_to_var)
3. again fetch the value at the address fetched in the second step (i.e., the address of the variable var)
4. fetch the value at the address pointed to.
So we can say that access via a pointer to pointer is slower than access via a pointer, which in turn is slower than accessing a normal variable.
I got curious and set up the following scenario:
int v = 0;
int *pv = &v;
int **ppv = &pv;
I tried dereferencing the pointers and took a look at the disassembly, which showed the following:
int x;
x = *pv;
00B33C5B mov eax,dword ptr [pv]
00B33C5E mov ecx,dword ptr [eax]
00B33C60 mov dword ptr [x],ecx
x = **ppv;
00B33C63 mov eax,dword ptr [ppv]
00B33C66 mov ecx,dword ptr [eax]
00B33C68 mov edx,dword ptr [ecx]
00B33C6A mov dword ptr [x],edx
You can see that there is an additional mov instruction for dereferencing there so my best guess is: double dereferencing is inevitably slower.

Can I make a pointer to the code, and pass to the next instruction?

Like this link http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Labels-as-Values.html
I can get the memory address of a label, so if I declare a label, take its address, and add to that address, will I get to the next instruction? An illustration:
int main () {
void *ptr;
label:
instruction 1;
instruction 2;
ptr = &&label;
// So if I do this...
ptr = ptr + 1;
// will I get instruction 2, correct?
}
Thanks for all answers.
No, I don't think so.
First off, you seem to take the address of a label, which doesn't work in standard C. A label is interpreted by the compiler, and standard C gives you no way to obtain an address for it (GCC's && operator is an extension).
Second, every statement in C/C++ (in fact any language) can be translated to many machine language instructions, so instruction 1 could be translated to 3, 5, 10 or even more machine instructions.
Third, your pointer points to void. Standard C does not define how to increment a void pointer. Normally when you increment a pointer, it adds the size of the data type you are pointing to to the address. So incrementing a long pointer will add sizeof(long) bytes (typically 4 or 8); incrementing a char pointer will add 1 byte. In this case you have a void pointer, which has no element size, and thus cannot be incremented.
Fourth, I don't think that all instructions in x86 machine language are represented by the same number of bytes. So you cannot expect from adding something to a pointer that it gets to the next instruction. You might also end up in the middle of the next instruction.
You can't perform arithmetic on a void*, and the compiler wouldn't know what to add to the pointer to have it point to the next 'instruction' anyway - there is no 1 to 1 correspondence between C statement and the machine code emitted by the compiler. Even for CPUs which have a 'regular' instruction set where instructions are the same size (as opposed to something like the x86 where instructions have a variable number of bytes), a single C statement may result in several CPU instructions (or maybe only one - who knows?).
Expanding on an example in the GCC docs, you might be able to get by with something like the following, but it requires a label for each statement you want to target:
void *statements[] = { &&statement1, &&statement2 };
void** ptr;
statement1:
instruction 1;
statement2:
instruction 2;
ptr = statements;
// goto **ptr; // <== this will jump to 'instruction 1'
// goto **(ptr+1); // <== this will jump to 'instruction 2'
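Here is a compilable sketch of the same idea. It relies on the GCC labels-as-values extension (also accepted by Clang), so it is not standard C. The function name and the arithmetic in the steps are invented purely to make the control flow observable.

```c
#include <assert.h>

/* Jump into a sequence of statements at a chosen entry point, using one
 * label per statement and a table of label addresses (GCC extension). */
int run_from(int first_step)
{
    static void *const steps[] = { &&step0, &&step1, &&step2 };
    int trace = 0;

    assert(first_step >= 0 && first_step < 3);
    goto *steps[first_step];   /* computed goto */

step0:
    trace += 1;
step1:
    trace += 10;
step2:
    trace += 100;
    return trace;
}
```

Indexing the table by one moves to the next labelled statement, which is as close as this approach gets to "the next instruction": the step is from label to label, not from machine instruction to machine instruction.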
Note that the &&label syntax is described under C Extensions section in GCC docs. It's not C, it's GCC.
Plus, void* does not allow pointer arithmetic - it's a catch-all sort of type in C for pointing at anything. The assumption is that the compiler does not know size of the object it points to (but the programmer should :).
Even more, instruction sizes are widely different on different architectures - four bytes on SPARC, but variable length on x86, for example.
I.e. it doesn't work in C. You will have to use inline assembler for this sort of thing.
No, because you can't increment void *.
void fcn() { printf("hello, world\n"); }
int main()
{
void (*pt2Function)() = fcn;
pt2Function(); // calls fcn();
// error C2171: '++' : illegal on operands of type 'void (__cdecl *)(void)'
// ++pt2Function;
return 0;
}
This is VC++, but I suspect gcc is similar.
Edited to add
Just for fun, I tried this—it crashed:
int nGlobal = 0;
__declspec(naked) void fcn()
{
// nop is 1-byte instruction that does nothing
_asm { nop }
++nGlobal;
_asm { ret }
}
int main()
{
void (*pt2Function)() = fcn;
// this works, incrementing nGlobal:
pt2Function();
printf("nGlobal: %d", nGlobal);
char *p = (char *) pt2Function;
++p; // point past the NOP?
pt2Function = (void (*)()) p;
// but this crashes...
pt2Function();
printf("nGlobal: %d", nGlobal);
return 0;
}
It crashed because this line doesn't do what I thought it did:
void (*pt2Function)() = fcn;
I thought it would take the address of the first instruction of fcn(), and put it in pt2Function. That way my ++p would make it point to the second instruction (nop is one byte long).
It doesn't. It puts the address of a jmp instruction (found in a big jump table) into pt2Function. When you increment it by one byte, it points to a meaningless location in the jump table.
I assume this is implementation-specific.
I would say "probably not". The value of the pointer will be right, because the compiler knows, but I doubt that the + 1 will know the length of instructions.
Let us suppose there's a way to get the address of a label (that is not an extension of a specific compiler). Then the problem would really be the "next instruction" idea: it can be very hard to know which is the next instruction. It depends on the processor, and on processors like x86 you have to decode an instruction to know its length; not fully, of course, but it is still a fairly complex job... On notable RISC architectures, instruction length is a lot simpler and getting the next instruction could be as easy as incrementing the address by 4. But there's no general way to do it at runtime. At compile time it could be easier, but to allow it in a C-coherent way, C would need a type "instruction", so that "instruction *" could be a pointer to an instruction, and incrementing such a pointer would correctly point to the next instruction, provided the code is known at compile time (so such a pointer couldn't really point to everything a pointer can point to in general). At compile time the compiler could implement this feature easily by adding another "label" just beyond the generated instruction pointed to by the "first" "label". But it would be cheating...
Moreover, let us suppose you get the address of a C label, or C function, or whatever. If you skip the first instruction, likely you won't be able to "use" that address to execute the code (less the first instruction), since without that single instruction the code may become buggy... unless you know for sure you can skip that single instruction and obtain what you want, but you can't be sure... unless you take a look at the code (which can be different from compiler to compiler), and then all the point of doing such a thing from C disappears.
So, briefly, the answer is no, you can't compute the pointer to the next instruction; and if you somehow do, the fact that you're pointing to code becomes meaningless, since you can't jump to that address and be sure of the final behaviour.

Printf the current address in C program

Imagine I have the following simple C program:
int main() {
int a=5, b= 6, c;
c = a +b;
return 0;
}
Now, I would like to know the address of the expression c=a+b, that is, the program address where this addition is carried out. Is there any possibility that I could use printf?
Something along the line:
int main() {
int a=5, b= 6, c;
printf("Address of printf instruction in memory: %x", current_address_pointer_or_something);
c = a +b;
return 0;
}
I know how I could find the address out by using gdb and then info line file.c:line. However, I would like to know if I could also do that directly with printf.
In gcc, you can take the address of a label using the && operator. So you could do this:
int main()
{
int a=5, b= 6, c;
sum:
c = a+b;
printf("Address of sum label in memory: %p", &&sum);
return 0;
}
The result of &&sum is the target of the jump instruction that would be emitted if you did a goto sum. So, while it's true that there's no one-to-one address-to-line mapping in C/C++, you can still say "get me a pointer to this code."
Visual C++ has the _ReturnAddress intrinsic (declared in <intrin.h>), which can be used to get some info here.
For instance:
__declspec(noinline) void PrintCurrentAddress()
{
printf("%p", _ReturnAddress());
}
Which will give you an address close to the expression you're looking at. In the event of some optimizations, like tail folding, this will not be reliable.
Tested in Visual Studio 2008:
int addr;
__asm
{
call _here
_here: pop eax
; eax now holds the PC.
mov [addr], eax
}
printf("%x\n", addr);
Credit to this question.
Here's a sketch of an alternative approach:
Assume that you haven't stripped debug symbols, and in particular you have the line number to address table that a source-level symbolic debugger needs in order to implement things like single step by source line, set a break point at a source line, and so forth.
Most tool chains use reasonably well documented debug data formats, and there are often helper libraries that implement most of the details.
Given that and some help from the preprocessor macro __LINE__ which evaluates to the current line number, it should be possible to write a function which looks up the address of any source line.
Advantages are that no assembly is required, portability can be achieved by calling on platform-specific debug information libraries, and it isn't necessary to directly manipulate the stack or use tricks that break the CPU pipeline.
A big disadvantage is that it will be slower than any approach based on directly reading the program counter.
For x86:
int test()
{
__asm {
mov eax, [esp]
}
}
__declspec(noinline) int main() // or whatever noinline feature your compiler has
{
int a = 5;
int aftertest;
aftertest = test()+3; // aftertest = disasms to 89 45 F8 mov dword ptr [a],eax.
printf("%i", a+9);
printf("%x", test());
return 0;
}
I don't know the details, but there should be a way to make a call to a function that can then crawl the return stack for the address of the caller, and then copy and print that out.
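With GCC or Clang, one way to do what the previous paragraph describes is the __builtin_return_address builtin. The sketch below is compiler-specific rather than standard C, and the function name is made up. It returns the address the function will return to, which lies just after the call instruction in the caller.

```c
#include <stddef.h>

/* GCC/Clang only: return the address this function will return to, i.e. a
 * code address just after the call site. noinline keeps it a real call. */
__attribute__((noinline)) void *address_after_call(void)
{
    return __builtin_extract_return_addr(__builtin_return_address(0));
}
```

A call such as printf("%p\n", address_after_call()); then prints a code address near the calling statement. As with the other approaches, optimization can move code around, so the address may not correspond exactly to the source line where the call appears.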
Using gcc on i386 or x86-64:
#include <stdio.h>
#define ADDRESS_HERE() ({ void *p; __asm__("1: mov $1b, %0" : "=r" (p)); p; })
int main(void) {
printf("%p\n", ADDRESS_HERE());
return 0;
}
Note that due to the presence of compiler optimizations, the apparent position of the expression might not correspond to its position in the original source.
The advantage of using this method over the &&foo label method is it doesn't change the control-flow graph of the function. It also doesn't break the return predictor unit like the approaches using call :)
On the other hand, it's very much architecture-dependent... and because it doesn't perturb the CFG there's no guarantee that jumping to the address in question would make any sense at all.
If the compiler is any good, this addition happens in registers and is never stored in memory, at least not in the way you are thinking. Actually, a good compiler will see that your program does nothing: manipulating values within a function without ever sending them anywhere outside the function can result in no code at all.
If you were to:
c = a+b;
printf("%u\n",c);
Then a good compiler will also never store the value of c in memory; it will stay in registers, although it depends on the processor as well. If, for example, compilers for that processor use the stack to pass arguments to functions, then the value of c will be computed in registers (a good compiler will see that c is always 11 and just assign it) and put on the stack when it is passed to printf. Naturally, the printf function may well need temporary storage in memory due to its complexity (it can't fit everything it needs to do in registers).
Where I am heading is that there is no answer to your question. It is heavily dependent on the processor, compiler, etc. There is no generic answer. I have to wonder what the root of the question is; if you were hoping to probe with a debugger, then this is not the question to ask.
Bottom line: disassemble your program and look at it. For that compile, on that day, with those settings, you will be able to see where the compiler has placed intermediate values. Even if the compiler assigns a memory location for the variable, that doesn't mean the program will ever store the variable in that location. It depends on optimizations.
