Consider the following code:
#include <string.h>

bool isFoo(const char* bar) {
    return !strcmp(bar, "some_long_complicated_name");
}
Here, the string literal "some_long_complicated_name" is passed directly to strcmp. Does this mean that every time isFoo is called, that many bytes for the string literal are allocated in its stack frame? If that were the case, wouldn't this:
const char FOO_NAME[] = "some_long_complicated_name";
bool isFoo(const char* bar) {
    return !strcmp(bar, FOO_NAME);
}
be more efficient?
No, string literals are not inefficient. They are usually placed in a read-only data section of your compiled binary, since their size is known at compile time and they can't be modified at run time.
The expensive part of strings (in terms of run-time performance) is memory allocation. Neither version of isFoo allocates any memory, so I'd expect it to be quite hard to measure a performance difference between the two. FOO_NAME technically occupies some bytes somewhere, but it is likely to be optimized away by the compiler.
Here are both versions on Compiler Explorer. The assembly with -O3 is not identical, but to be honest, I can't draw any further conclusions from the differences.
String literals are not allocated at run time. They are simply stored in the compiled binary and accessed via a pointer. So no, there is no difference in speed between the two approaches.
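Here is a small sketch of why this works. It is written in C rather than the question's C++ for brevity, and the function names are hypothetical. A string literal has static storage duration, so a function may safely return a pointer to it. That pointer would dangle if the bytes lived in the function's stack frame. Every evaluation of the same literal also yields the same array.

```c
#include <stdbool.h>
#include <string.h>

/* The literal has static storage duration: its bytes live in the binary's
   read-only data, not in this function's stack frame, so returning a
   pointer to it is safe. */
static const char *foo_name(void) {
    return "some_long_complicated_name";
}

/* Same comparison as isFoo, going through the returned pointer. */
static bool is_foo_name(const char *bar) {
    return strcmp(bar, foo_name()) == 0;
}
```

Each call to foo_name() returns the same address, and nothing is copied per call. Whether two *separate* identical literals share storage is unspecified. That is why only repeated evaluations of one literal are compared here.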
There is absolutely no change in the compiled code. Both versions result in exactly the same instructions!
If you compile both versions into a single executable like this:
#include <string.h>

bool isFoo(const char* bar) {
    return !strcmp(bar, "some_long_complicated_name");
}
const char FOO_NAME[] = "some_long_complicated_name";
bool isFoo2(const char* bar) {
    return !strcmp(bar, FOO_NAME);
}
int main()
{
    isFoo( "nnn" );
    isFoo2( "nnn" );
}
You can investigate the binary:
0000000000401156 <isFoo(char const*)>:
401156: 55 push %rbp
401157: 48 89 e5 mov %rsp,%rbp
40115a: 48 83 ec 10 sub $0x10,%rsp
40115e: 48 89 7d f8 mov %rdi,-0x8(%rbp)
401162: 48 8b 45 f8 mov -0x8(%rbp),%rax
401166: be c0 20 40 00 mov $0x4020c0,%esi
40116b: 48 89 c7 mov %rax,%rdi
40116e: e8 cd fe ff ff callq 401040 <strcmp@plt>
401173: 85 c0 test %eax,%eax
401175: 0f 94 c0 sete %al
401178: c9 leaveq
401179: c3 retq
000000000040117a <isFoo2(char const*)>:
40117a: 55 push %rbp
40117b: 48 89 e5 mov %rsp,%rbp
40117e: 48 83 ec 10 sub $0x10,%rsp
401182: 48 89 7d f8 mov %rdi,-0x8(%rbp)
401186: 48 8b 45 f8 mov -0x8(%rbp),%rax
40118a: be e0 20 40 00 mov $0x4020e0,%esi
40118f: 48 89 c7 mov %rax,%rdi
401192: e8 a9 fe ff ff callq 401040 <strcmp@plt>
401197: 85 c0 test %eax,%eax
401199: 0f 94 c0 sete %al
40119c: c9 leaveq
40119d: c3 retq
And here is where the strings are located:
4020c0 736f6d65 5f6c6f6e 675f636f 6d706c69 some_long_compli
4020d0 63617465 645f6e61 6d650000 00000000 cated_name......
4020e0 736f6d65 5f6c6f6e 675f636f 6d706c69 some_long_compli
4020f0 63617465 645f6e61 6d65006e 6e6e00 cated_name.nnn.
You can also see the "nnn" string here!
The output was generated with:
objdump -s -S go | c++filt > x
Note: you have to compile with -O0, because otherwise the compiler is smart enough to do all of this at compile time. With -O2, none of the strings can be seen anymore, and the results of all the calls are already baked into the binary. It's good to see how much a compiler can do at compile time!
So there is exactly NO difference: exactly the same machine code. And with standard optimization, no code is generated for the string comparison at all, because it is already done at compile time!
To make sure the result of the comparison is actually used somewhere, I modified main like this:
int main()
{
    volatile bool x;
    x = isFoo( "nnn" );
    x = isFoo2( "nnn" );
}
The resulting binary:
0000000000401060 <main>:
}
int main()
{
volatile bool x;
x = isFoo( "nnn" );
401060: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
x = isFoo2( "nnn" );
}
401065: 31 c0 xor %eax,%eax
x = isFoo2( "nnn" );
401067: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
}
40106c: c3 retq
As you can see, the result of the comparison is already present in the compiled code. No string is compared at run time anymore.
For all questions about speed and memory usage: measure! As this example shows, the results differ from most of the assumptions made in other answers. If speed or memory footprint really matters, look at what the compiler generates. It is often much better than you think!
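In the spirit of "measure!", here is a minimal timing sketch in C. The question's code is C++, but the comparison is the same. The time_calls helper is hypothetical, and clock() measures processor time rather than wall time. With a constant input, an optimizing compiler may still fold the comparisons away, so check the disassembly before trusting the numbers.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

static const char FOO_NAME[] = "some_long_complicated_name";

static bool isFoo(const char *bar)  { return !strcmp(bar, "some_long_complicated_name"); }
static bool isFoo2(const char *bar) { return !strcmp(bar, FOO_NAME); }

/* Calls f n times on input and returns the elapsed processor time in
   seconds. Storing into a volatile keeps each result "used", but the
   optimizer may still evaluate the comparison itself at compile time. */
static double time_calls(bool (*f)(const char *), const char *input, long n) {
    volatile bool sink = false;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        sink = f(input);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_calls once for each variant with a large n at the optimization level you care about, and compare the two numbers side by side with the generated code.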
Related
I wrote a very simple program in C and am trying to understand the function-calling process.
#include <stdio.h>
void Oh(unsigned x) {
    printf("%u\n", x);
}
int main(int argc, char const *argv[])
{
    Oh(0x67611c8c);
    return 0;
}
And its assembly code seems to be
0000000100000f20 <_Oh>:
100000f20: 55 push %rbp
100000f21: 48 89 e5 mov %rsp,%rbp
100000f24: 48 83 ec 10 sub $0x10,%rsp
100000f28: 48 8d 05 6b 00 00 00 lea 0x6b(%rip),%rax # 100000f9a <_printf$stub+0x20>
100000f2f: 89 7d fc mov %edi,-0x4(%rbp)
100000f32: 8b 75 fc mov -0x4(%rbp),%esi
100000f35: 48 89 c7 mov %rax,%rdi
100000f38: b0 00 mov $0x0,%al
100000f3a: e8 3b 00 00 00 callq 100000f7a <_printf$stub>
100000f3f: 89 45 f8 mov %eax,-0x8(%rbp)
100000f42: 48 83 c4 10 add $0x10,%rsp
100000f46: 5d pop %rbp
100000f47: c3 retq
100000f48: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
100000f4f: 00
0000000100000f50 <_main>:
100000f50: 55 push %rbp
100000f51: 48 89 e5 mov %rsp,%rbp
100000f54: 48 83 ec 10 sub $0x10,%rsp
100000f58: b8 8c 1c 61 67 mov $0x67611c8c,%eax
100000f5d: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
100000f64: 89 7d f8 mov %edi,-0x8(%rbp)
100000f67: 48 89 75 f0 mov %rsi,-0x10(%rbp)
100000f6b: 89 c7 mov %eax,%edi
100000f6d: e8 ae ff ff ff callq 100000f20 <_Oh>
100000f72: 31 c0 xor %eax,%eax
100000f74: 48 83 c4 10 add $0x10,%rsp
100000f78: 5d pop %rbp
100000f79: c3 retq
Well, I don't quite understand the argument-passing process. Since only one parameter is passed to the Oh function, I can understand this:
100000f58: b8 8c 1c 61 67 mov $0x67611c8c,%eax
So what does the code below do? Why rbp? Isn't it abandoned in x86-64 assembly? If this is x86-style assembly, how can I generate x86-64-style assembly using clang? If it is x86, it doesn't matter; could anyone explain the code below line by line for me?
100000f5d: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
100000f64: 89 7d f8 mov %edi,-0x8(%rbp)
100000f67: 48 89 75 f0 mov %rsi,-0x10(%rbp)
100000f6b: 89 c7 mov %eax,%edi
100000f6d: e8 ae ff ff ff callq 100000f20 <_Oh>
You might get cleaner code if you turned optimizations on, or you might not. But here's what this code does.
The %rbp register is being used as a frame pointer, that is, a pointer to the top of the stack as it was when the function was entered. Its previous value is saved on the stack, %rbp is loaded from %rsp, and the saved value is restored at the end. Far from being abandoned in x86_64, %rbp was introduced there as the 64-bit extension of the 32-bit %ebp.
After this value is saved, the program allocates sixteen bytes off the stack by subtracting from the stack pointer.
Then there is a very inefficient series of copies. It passes the first argument of Oh() as the second argument of printf(), and the constant address of the format string (computed relative to the instruction pointer) as the first argument of printf(). Remember that, in this calling convention, the first argument is passed in %rdi (or %edi for 32-bit operands) and the second in %rsi. This could have been simplified to two instructions. The mov $0x0,%al sets %al to an upper bound on the number of vector registers used for arguments, which the System V ABI requires before calling a variadic function such as printf().
After calling printf(), the program (needlessly) saves the return value on the stack, restores the stack and frame pointers, and returns.
In main(), there's similar code to set up the stack frame. Then the program saves argc and argv (needlessly), and then it moves the constant argument to Oh into the first-argument register, %edi, by way of %eax. This could have been optimized into a single instruction. It then calls Oh(). On return, it sets its return value to 0, cleans up the stack, and returns.
The code you’re asking about does the following: stores the constant 32-bit value 0 on the stack, saves the 32-bit value argc on the stack, saves the 64-bit pointer argv on the stack (the first and second arguments to main()), and sets the first argument of the function it is about to call to %eax, which it had previously loaded with a constant. This is all unnecessary for this program, but would have been necessary had it needed to use argc and argv after the call, when those registers would have been clobbered. There’s no good reason it used two steps to load the constant instead of one.
As Jester mentions, you still have frame pointers on (to aid debugging), so stepping through main:
0000000100000f50 <_main>:
First we enter a new stack frame: we save the old base pointer and point %rbp at the current top of the stack. Also, in x86_64 the stack has to be kept aligned to a 16-byte boundary (hence moving the stack pointer by 0x10).
100000f50: push %rbp
100000f51: mov %rsp,%rbp
100000f54: sub $0x10,%rsp
As you mention, x86_64 passes parameters in registers, so load the parameter into the register:
100000f58: mov $0x67611c8c,%eax
??? Help needed
100000f5d: movl $0x0,-0x4(%rbp)
From here: "Registers RBP, RBX, and R12-R15 are callee-save registers", so if we want to preserve any other registers, we have to do it ourselves ....
100000f64: mov %edi,-0x8(%rbp)
100000f67: mov %rsi,-0x10(%rbp)
Not really sure why we didn't just load this into %edi, where it needs to be for the call, in the first place, but we'd better move it there now.
100000f6b: mov %eax,%edi
Call the function:
100000f6d: callq 100000f20 <_Oh>
This sets the return value (returned in %eax) to 0. xor is a shorter instruction than loading 0 with mov, so it is a common optimization:
100000f72: xor %eax,%eax
Clean up that stack frame we added earlier (not really sure why we saved those registers on it when we didn't use them)
100000f74: add $0x10,%rsp
100000f78: pop %rbp
100000f79: retq
Assembly newbie here... I wrote the following simple C program:
void fun(int x, int* y)
{
    char arr[4];
    int* sp;
    sp = y;
}
int main()
{
    int i = 4;
    fun(i, &i);
    return 0;
}
I compiled it with gcc and ran objdump with -S, but the assembly output is confusing me:
000000000040055d <fun>:
void fun(int x, int* y)
{
40055d: 55 push %rbp
40055e: 48 89 e5 mov %rsp,%rbp
400561: 48 83 ec 30 sub $0x30,%rsp
400565: 89 7d dc mov %edi,-0x24(%rbp)
400568: 48 89 75 d0 mov %rsi,-0x30(%rbp)
40056c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
400573: 00 00
400575: 48 89 45 f8 mov %rax,-0x8(%rbp)
400579: 31 c0 xor %eax,%eax
char arr[4];
int* sp;
sp = y;
40057b: 48 8b 45 d0 mov -0x30(%rbp),%rax
40057f: 48 89 45 e8 mov %rax,-0x18(%rbp)
}
400583: 48 8b 45 f8 mov -0x8(%rbp),%rax
400587: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
40058e: 00 00
400590: 74 05 je 400597 <fun+0x3a>
400592: e8 a9 fe ff ff callq 400440 <__stack_chk_fail@plt>
400597: c9 leaveq
400598: c3 retq
0000000000400599 <main>:
int main()
{
400599: 55 push %rbp
40059a: 48 89 e5 mov %rsp,%rbp
40059d: 48 83 ec 10 sub $0x10,%rsp
int i = 4;
4005a1: c7 45 fc 04 00 00 00 movl $0x4,-0x4(%rbp)
fun(i, &i);
4005a8: 8b 45 fc mov -0x4(%rbp),%eax
4005ab: 48 8d 55 fc lea -0x4(%rbp),%rdx
4005af: 48 89 d6 mov %rdx,%rsi
4005b2: 89 c7 mov %eax,%edi
4005b4: e8 a4 ff ff ff callq 40055d <fun>
return 0;
4005b9: b8 00 00 00 00 mov $0x0,%eax
}
4005be: c9 leaveq
4005bf: c3 retq
First, in the line:
400561: 48 83 ec 30 sub $0x30,%rsp
Why is the stack pointer decremented so much in the call to 'fun' (48 bytes)? I assume it has to do with alignment issues, but I cannot visualize why it would need so much space; I only count 12 bytes for local variables (assuming 8-byte pointers).
Second, I thought that in x86_64, the arguments to a function are either stored in specific registers, or if there are a lot of them, just 'above' (with a downward growing stack) the base pointer, %rbp. Like in the picture at http://en.wikipedia.org/wiki/Call_stack#Structure except 'upside-down'.
But the lines:
400565: 89 7d dc mov %edi,-0x24(%rbp)
400568: 48 89 75 d0 mov %rsi,-0x30(%rbp)
suggest to me that they are being stored way down from the base of the stack (%rsi and %edi are where main put the arguments right before calling 'fun', and 0x30 down from %rbp is exactly where the stack pointer is pointing...). And when I try to do something with them, like assigning their values to local variables, it grabs them from those locations near the top of the stack:
sp = y;
40057b: 48 8b 45 d0 mov -0x30(%rbp),%rax
40057f: 48 89 45 e8 mov %rax,-0x18(%rbp)
... what is going on here?! I would expect them to grab the arguments from either the registers they were stored in, or just above the base pointer, where I thought they are 'supposed to be', according to every basic tutorial I read. Every answer and post I found on here related to stack frame questions confirms my understanding of what stack frames "should" look like, so why is my Assembly output so darn weird?
Because that stuff is a hideously simplified version of what really goes on. It's like wondering why Newtonian mechanics doesn't model the movement of the planets down to the millimeter. Compilers need stack space for all sorts of things. For example, saving callee-saved registers.
Also, debug-mode compilations contain all sorts of debugging and checking machinery. The compiler also outputs code that checks your program at run time. An example is the stack-protector check that calls __stack_chk_fail, which is enabled by -fstack-protector and turned on by default by many Linux distributions.
There are only two ways to understand the output of a given compiler. The first is to implement the compiler, or be otherwise very familiar with the implementation. The second is to accept that whatever you understand is a gross simplification. Pick one.
Because you're compiling without optimization, the compiler does lots of extra stuff to possibly make things easier to debug, which uses lots of extra space:
it does not attempt to compress the stack frame to reuse memory for anything, or get rid of any unused things.
it redundantly copies the arguments into the stack frame (which requires still more memory).
it copies a 'canary' onto the stack to guard against stack-smashing buffer overflows (even though they can't happen in this code).
Try turning on optimization, and you'll see more real code.
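The canary idea can be sketched in plain C. This is only a simulation: the guard value and helper names are hypothetical. The real check is generated by the compiler; it is the mov %fs:0x28 / xor %fs:0x28 / __stack_chk_fail sequence in fun. Keeping the buffer and the guard in one array lets the "overflow" stay defined behaviour in C.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 4          /* like `char arr[4]` in fun() */
#define CANARY 0xC0FFEE11u  /* hypothetical; gcc uses a random value from %fs:0x28 */

/* Simulated frame: a 4-byte buffer followed by a 4-byte guard word. */
typedef struct {
    unsigned char bytes[BUF_SIZE + sizeof(uint32_t)];
} frame_t;

/* "Prologue": clear the buffer and store the guard right after it. */
static void frame_enter(frame_t *f) {
    uint32_t c = CANARY;
    memset(f->bytes, 0, BUF_SIZE);
    memcpy(f->bytes + BUF_SIZE, &c, sizeof c);
}

/* Unchecked copy into the buffer: n may exceed BUF_SIZE (overwriting the
   guard) but must not exceed sizeof f->bytes. */
static void frame_copy(frame_t *f, const unsigned char *src, size_t n) {
    memcpy(f->bytes, src, n);
}

/* "Epilogue": the real code calls __stack_chk_fail on a mismatch;
   here we only report whether the guard survived. */
static bool frame_leave_ok(const frame_t *f) {
    uint32_t c;
    memcpy(&c, f->bytes + BUF_SIZE, sizeof c);
    return c == CANARY;
}
```

Copying 4 bytes leaves the guard intact. Copying 8 bytes overwrites it, which is exactly the situation the compiler-generated check detects.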
This is 64-bit code. 0x30 bytes of stack space correspond to six 8-byte slots on the stack. You have what appears to be:
2 slots for function arguments (which happen also to be passed in registers)
2 slots for local variables
1 slot for saving the AX register
1 slot looks like a stack guard, probably related to DEBUG mode.
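The slot arithmetic can be checked in a few lines. This sketch assumes 8-byte slots and a frame size rounded up to a multiple of 16, since the x86-64 System V ABI keeps %rsp 16-byte aligned at call sites. The helper names are hypothetical.

```c
#include <stddef.h>

/* Round n up to a multiple of align; align must be a power of two. */
static size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* How many 8-byte stack slots a frame of frame_bytes bytes holds. */
static size_t slots_of(size_t frame_bytes) {
    return frame_bytes / 8;
}
```

For fun(), the lowest slot used is -0x30(%rbp), where y is spilled, so the frame needs 0x30 bytes. That size is already a multiple of 16, and it holds six slots.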
Best thing is to experiment rather than ask questions. Try compiling in different modes (DEBUG, optimisation, etc), and with different numbers and types of arguments and variables. Sometimes asking other people is just too easy -- you learn better by doing your own experiments.
I wrote the following program:
#include <stdio.h>
int main()
{
    int i = 0;
    for (; i < 4; i++)
    {
        printf("%i",i);
    }
    return 0;
}
I compiled it using gcc test.c -o test.o, then disassembled it using objdump -d -Mintel test.o. The assembly code I got (at least the relevant part) is the following:
0804840c <main>:
804840c: 55 push ebp
804840d: 89 e5 mov ebp,esp
804840f: 83 e4 f0 and esp,0xfffffff0
8048412: 83 ec 20 sub esp,0x20
8048415: c7 44 24 1c 00 00 00 mov DWORD PTR [esp+0x1c],0x0
804841c: 00
804841d: eb 19 jmp 8048438 <main+0x2c>
804841f: 8b 44 24 1c mov eax,DWORD PTR [esp+0x1c]
8048423: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
8048427: c7 04 24 e8 84 04 08 mov DWORD PTR [esp],0x80484e8
804842e: e8 bd fe ff ff call 80482f0 <printf@plt>
8048433: 83 44 24 1c 01 add DWORD PTR [esp+0x1c],0x1
8048438: 83 7c 24 1c 03 cmp DWORD PTR [esp+0x1c],0x3
804843d: 7e e0 jle 804841f <main+0x13>
804843f: b8 00 00 00 00 mov eax,0x0
8048444: c9 leave
8048445: c3 ret
I noticed that, although my compare operation was i < 4, the assembly code is (after disassembly) i <= 3. Why does that happen? Why would it use JLE instead of JL?
Loops that count upwards, and have constant limit, are very common. The compiler has two options to implement the check for loop termination - JLE and JL. While the two ways seem absolutely equivalent, consider the following.
As you can see in the disassembly listing, the constant (3 in your case) is encoded in 1 byte. That short form of CMP takes a sign-extended 8-bit immediate, so it covers -128 to 127. Suppose your loop counted to 128 instead of 4. Comparing against 128 would not fit, and the compiler would have to use a "larger" encoding, whereas comparing against 127 with JLE still fits. So JLE provides a marginal improvement in code density (which is ultimately good for performance because of caching).
It uses JLE because it shifted the constant by one.
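Here is a sketch of the encoding argument. It assumes the sign-extended 8-bit immediate form of CMP, which accepts -128 to 127, and the helper names are hypothetical. With that range, the limit where JLE actually pays off is 128. `i < 128` needs the constant 128, while the equivalent `i <= 127` still fits in one byte.

```c
#include <stdbool.h>

/* x86 `cmp r/m32, imm8` sign-extends its immediate, so it covers -128..127. */
static bool fits_imm8(long v) {
    return v >= -128 && v <= 127;
}

/* Constant compared against for `i < limit`:
   with JL it is limit itself, with JLE it is limit - 1. */
static long cmp_const_jl(long limit)  { return limit; }
static long cmp_const_jle(long limit) { return limit - 1; }
```

Both forms fit for a limit of 4. At 128 only the JLE form fits, and at 256 neither does.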
if (x < 4) {
    // runs when x is 3, 2, 1, 0, -1, ..., INT_MIN
}
is logically equivalent to
if (x <= 3) {
    // runs when x is 3, 2, 1, 0, -1, ..., INT_MIN
}
Why the compiler chose one internal representation over another is often a matter of optimization, but it is really hard to know whether optimization was the true driver. In any case, functional equivalents like this are the reason why mapping assembly back to source isn't 100% accurate. There are many ways to write a condition that has the same effect on the same inputs.
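The equivalence can be checked by brute force, which also shows where it stops holding. This is a small sketch with hypothetical helper names. For `int`, `x < 4` and `x <= 3` agree everywhere, but for floating-point values the rewrite is invalid.

```c
#include <stdbool.h>
#include <limits.h>

/* For integers, `x < 4` and `x <= 3` accept exactly the same values. */
static bool lt4(int x) { return x < 4; }
static bool le3(int x) { return x <= 3; }

/* Brute-force check over [lo, hi]; the long long counter avoids
   overflow when hi is INT_MAX. */
static bool equivalent_on(int lo, int hi) {
    for (long long x = lo; x <= hi; x++)
        if (lt4((int)x) != le3((int)x))
            return false;
    return true;
}

/* The rewrite is only valid for integers: 3.5 < 4 but not 3.5 <= 3. */
static bool lt4_double(double x) { return x < 4; }
static bool le3_double(double x) { return x <= 3; }
```

This is why a compiler can freely pick either form for an integer loop counter but must keep the original comparison for floating-point types.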
I'm doing some experimenting with x86-64 assembly. Having compiled this dummy function:
long utilfunc(long, long, long);  /* defined elsewhere */

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}
With gcc -O0 -g, I was surprised to find the following at the beginning of the function's assembly:
0000000000400520 <myfunc>:
400520: 55 push rbp
400521: 48 89 e5 mov rbp,rsp
400524: 48 83 ec 50 sub rsp,0x50
400528: 48 89 7d d8 mov QWORD PTR [rbp-0x28],rdi
40052c: 48 89 75 d0 mov QWORD PTR [rbp-0x30],rsi
400530: 48 89 55 c8 mov QWORD PTR [rbp-0x38],rdx
400534: 48 89 4d c0 mov QWORD PTR [rbp-0x40],rcx
400538: 4c 89 45 b8 mov QWORD PTR [rbp-0x48],r8
40053c: 4c 89 4d b0 mov QWORD PTR [rbp-0x50],r9
400540: 48 8b 45 d8 mov rax,QWORD PTR [rbp-0x28]
400544: 48 0f af 45 d0 imul rax,QWORD PTR [rbp-0x30]
400549: 48 0f af 45 c8 imul rax,QWORD PTR [rbp-0x38]
40054e: 48 0f af 45 c0 imul rax,QWORD PTR [rbp-0x40]
400553: 48 0f af 45 b8 imul rax,QWORD PTR [rbp-0x48]
400558: 48 0f af 45 b0 imul rax,QWORD PTR [rbp-0x50]
40055d: 48 0f af 45 10 imul rax,QWORD PTR [rbp+0x10]
400562: 48 0f af 45 18 imul rax,QWORD PTR [rbp+0x18]
gcc very strangely spills all argument registers onto the stack and then takes them from memory for further operations.
This only happens on -O0 (with -O1 there are no problems), but still, why? This looks like an anti-optimization to me - why would gcc do that?
I am by no means a GCC internals expert, but I'll give it a shot. Unfortunately, most of the information on GCC's register allocation and spilling seems to be out of date (referencing files like local-alloc.c that don't exist anymore).
I'm looking at the source code of gcc-4.5-20110825.
In GNU C Compiler Internals it is mentioned that the initial function code is generated by expand_function_start in gcc/function.c. There we find the following for handling parameters:
/* Initialize rtx for parameters and local variables.
   In some cases this requires emitting insns.  */
assign_parms (subr);
In assign_parms, the code that decides where each argument is stored is the following:
if (assign_parm_setup_block_p (&data))
  assign_parm_setup_block (&all, parm, &data);
else if (data.passed_pointer || use_register_for_decl (parm))
  assign_parm_setup_reg (&all, parm, &data);
else
  assign_parm_setup_stack (&all, parm, &data);
assign_parm_setup_block_p handles aggregate data types and is not applicable in this case. Since the data is not passed as a pointer, GCC checks use_register_for_decl.
Here the relevant part is:
if (optimize)
  return true;

if (!DECL_REGISTER (decl))
  return false;
DECL_REGISTER tests whether the variable was declared with the register keyword. And now we have our answer: Most parameters live on the stack when optimizations are not enabled, and are then handled by assign_parm_setup_stack. The route taken through the source code before it ends up spilling the value is slightly more complicated for pointer arguments, but can be traced in the same file if you're curious.
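Going by the use_register_for_decl excerpt, a parameter declared with the register keyword takes the assign_parm_setup_reg path even at -O0. Here is a sketch to try with gcc -O0 -S, comparing the two functions. I have not checked every GCC version, so treat the expected difference in the generated code as an assumption. The function names are hypothetical.

```c
/* Plain parameters: at -O0 GCC gives each one a stack slot. */
long sum_plain(long a, long b, long c) {
    return a + b + c;
}

/* `register` parameters: DECL_REGISTER is true, so per the excerpt they
   may stay in registers at -O0. Taking their address is now an error. */
long sum_register(register long a, register long b, register long c) {
    return a + b + c;
}
```

Both functions compute the same result. Only the -O0 code generation for the parameters is expected to differ.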
Why does GCC spill all arguments and local variables with optimizations disabled? To help debugging. Consider this simple function:
1 extern int bar(int);
2 int foo(int a) {
3 int b = bar(a | 1);
4 b += 42;
5 return b;
6 }
Compiled with gcc -O1 -c this generates the following on my machine:
0: 48 83 ec 08 sub $0x8,%rsp
4: 83 cf 01 or $0x1,%edi
7: e8 00 00 00 00 callq c <foo+0xc>
c: 83 c0 2a add $0x2a,%eax
f: 48 83 c4 08 add $0x8,%rsp
13: c3 retq
Which is fine except if you break on line 5 and try to print the value of a, you get
(gdb) print a
$1 = <value optimized out>
That's because a is not used after the call to bar, so the register holding it is simply overwritten.
A couple of reasons:
In the general case, an argument to a function has to be treated like a local variable, because it could be stored to or have its address taken within the function. Therefore, it is simplest to just allocate a stack slot for every argument.
Debug information becomes much simpler to emit with stack locations: the argument's value is always at some specific location, instead of moving around between registers and memory.
When you're looking at -O0 code in general, consider that the compiler's top priorities are reducing compile-time as much as possible and generating high-quality debugging information.
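The first reason above can be seen in a few lines of C. Here a parameter has its address taken, so the compiler must give it a memory location no matter which register it arrived in. At -O0, GCC simply does this for every parameter up front. Both function names are hypothetical.

```c
/* Hypothetical helper that modifies a long through a pointer. */
static void bump(long *p) {
    *p += 1;
}

/* `a` arrives in %rdi, but &a needs a memory address, so `a` must be
   stored somewhere (typically the stack) before bump is called. */
static long add_one_via_address(long a) {
    bump(&a);
    return a;
}
```

With optimization enabled, the compiler can often see through this, for example by inlining bump. Without optimization, the stack slot is always there.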
Can a C compiler assume that two different extern globals cannot be aliased to the same address?
In my case, I have a situation like this:
extern int array_of_int[], array_end;
void some_func(void)
{
    int *t;
    for (t = &array_of_int[0]; t != &array_end; t++)
    {
        ...
The resulting binary compiled with optimization on does not test the t != &array_end condition before entering the loop. The compiler's optimization is that the loop must execute at least once since t cannot immediately equal &array_end at the outset.
Of course we found this the hard way. Apparently, some assembler hackery with linker sections resulted in a case where the two externs are the same address.
Thanks for any advice!
In short, yes, it's free to make that assumption. There is nothing special about extern variables. Two variables may not be aliases of each other. (If the answer was any different, think about the chaos that would ensue. extern int a, b could alias each other, which would make the semantics of any code using those variables completely insane!)
In fact, you are relying on undefined behaviour here, full stop. It is not valid to compare addresses of unrelated variables in this way.
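Here is a minimal sketch of the assumption the optimizer relies on; the variable names are hypothetical. Two distinct objects never share an address, so the comparison below may be folded to a constant. The same reasoning lets the compiler skip the first test of &array_of_int[0] against &array_end.

```c
#include <stdbool.h>

int first_object;   /* hypothetical names */
int second_object;

/* Distinct objects never share an address, so an optimizer may fold
   this function to `return true;`. */
static bool distinct_addresses(void) {
    return &first_object != &second_object;
}
```

Strictly, C11 6.5.9 does allow a pointer one past the end of an array to compare equal to the address of an object that happens to follow it in memory. That exception concerns later iterations, not the initial `&array_of_int[0]` test.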
The C99 standard says in 6.2.2 "Linkages of identifiers":
An identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage. (Footnote 21)
...
Footnote 21: There is no linkage between different identifiers.
So unfortunately, this somewhat common assembly-language trick (which I've used...) isn't well-defined. You'd be better off having your assembly module define array_end as an actual pointer that the asm code loads with the address of the end of the array. That way the C code can be well-defined, since the array_end pointer would be a separate object.
Here's what I think the fixed code looks like:
#include <stdio.h>
extern int array_of_int[];
extern int *array_end;
int main()
{
    int *t;
    for (t = &array_of_int[0]; t != array_end; t++)
    {
        printf("%i\n", *t);
    }
    return 0;
}
In another compilation unit:
int array_of_int[] = { 1, 2, 3, 4 }; // an empty { } initializer (zero-length array) is only a GCC extension
int *array_end = array_of_int + (sizeof(array_of_int)/sizeof(array_of_int[0]));
It compiles into this (-O3, gcc 4.4.5 i686)
080483f0 <main>:
80483f0: 55 push %ebp
80483f1: 89 e5 mov %esp,%ebp
80483f3: 83 e4 f0 and $0xfffffff0,%esp
80483f6: 53 push %ebx
80483f7: 83 ec 1c sub $0x1c,%esp
80483fa: 81 3d 24 a0 04 08 14 cmpl $0x804a014,0x804a024
8048401: a0 04 08
8048404: 74 2f je 8048435 <main+0x45>
8048406: bb 14 a0 04 08 mov $0x804a014,%ebx
804840b: 90 nop
804840c: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
8048410: 8b 03 mov (%ebx),%eax
8048412: 83 c3 04 add $0x4,%ebx
8048415: c7 44 24 04 00 85 04 movl $0x8048500,0x4(%esp)
804841c: 08
804841d: c7 04 24 01 00 00 00 movl $0x1,(%esp)
8048424: 89 44 24 08 mov %eax,0x8(%esp)
8048428: e8 d7 fe ff ff call 8048304 <__printf_chk@plt>
804842d: 39 1d 24 a0 04 08 cmp %ebx,0x804a024
8048433: 75 db jne 8048410 <main+0x20>
8048435: 83 c4 1c add $0x1c,%esp
8048438: 31 c0 xor %eax,%eax
804843a: 5b pop %ebx
804843b: 89 ec mov %ebp,%esp
804843d: 5d pop %ebp
804843e: c3 ret
804843f: 90 nop
It's very simple if we do it in ARM code: we have an attribute for it.
#include <stdio.h>
int oldname = 1;
extern int newname __attribute__((alias("oldname"))); // declaration
void foo(void)
{
    printf("newname = %d\n", newname); // prints 1
}
Only the extern declaration is needed here.
Importing it in other files is seamless.
For an assembly file, you can use the IMPORT directive and you have the alias there. :)