The following is sample code for the GCC variable attribute extension:
#include<stdio.h>
int main(void){
int sam __attribute__((unused))= 10;
int p = sam+1;
printf("\n%d" , p);
}
I generated the assembly code for the above program using:
gcc -S sample.c
The .s file doesn't contain the variable sam, whereas the output of the program is "11", which is correct.
So does the compiler neglect completely the unused variable and not output it in the executable? If so why is the output of program correct? Can anyone explain the working of unused and used variable attributes in gcc?
Thanks
So does the compiler neglect completely the unused variable
It depends on what you mean by "neglect". The compiler optimizes the code but doesn't completely ignore the variable; otherwise it couldn't calculate the result.
not output it in the executable?
Yes.
If so why is the output of program correct?
Because this is what a compiler does: it generates a program whose output is as described by the programming language. Source code does not map 1:1 to assembly; it is a description of the behavior of a program. As long as the observable behavior of the compiled program matches what the source code specifies, the compiler can generate any assembly instructions it wants.
You may want to research the terms side effect and the as-if rule in the context of C programming language.
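A minimal sketch of the as-if rule, assuming GCC or Clang; the function names are made up for illustration. Both functions return 11, but only accesses to a volatile object count as observable behavior, so only the second function is obliged to keep its store and load in the generated assembly.

```c
/* Hypothetical names, for illustration only. */

/* Nothing observable depends on 'sam' as an object, so an optimizing
   compiler may fold the whole body to "return 11". */
int plain_local(void)
{
    int sam = 10;
    int p = sam + 1;
    return p;
}

/* Accesses to a volatile object are part of the observable behavior,
   so the store of 10 and the load of 'sam' must actually be performed. */
int volatile_local(void)
{
    volatile int sam = 10;
    int p = sam + 1;
    return p;
}
```

Comparing the gcc -O2 -S output for the two functions should make the difference visible.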
Can anyone explain the working of unused and used variable attributes in gcc.
There is no better explanation than the one in the GCC documentation on Variable Attributes:
unused
This attribute, attached to a variable, means that the variable is meant to be possibly unused. GCC does not produce a warning for this variable.
used
This attribute, attached to a variable with static storage, means that the variable must be emitted even if it appears that the variable is not referenced.
When applied to a static data member of a C++ class template, the attribute also means that the member is instantiated if the class itself is instantiated.
The unused attribute silences the -Wunused-* warnings for that variable.
The used attribute is meant for variables (it also works on functions) so that the variable is emitted into the assembly code even if it is not used anywhere.
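A short sketch of both attributes side by side, assuming GCC (or Clang, which accepts the same syntax); the names build_tag, add_one and scratch are invented for this example.

```c
/* Hypothetical names, for illustration only. */

/* Never referenced in this file. Without 'used', an optimizing GCC may drop
   an unreferenced static variable; with it, the variable is still emitted. */
static const char build_tag[] __attribute__((used)) = "example-build-tag";

/* 'scratch' is never read. 'unused' only suppresses the -Wunused-variable
   warning; it does not ask the compiler to keep or remove anything. */
int add_one(int value)
{
    int scratch __attribute__((unused)) = 0;
    return value + 1;
}
```

Compiling with gcc -Wall -O2 -S should produce no warning for scratch and should still show build_tag in the .s file.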
Try compiling with no optimisation:
gcc -O0 sample.c -S -masm=intel
Generated assembly code:
.file "sample.c"
.intel_syntax noprefix
.text
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "\12%d\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
push rbp
.seh_pushreg rbp
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 48
.seh_stackalloc 48
.seh_endprologue
call __main
mov DWORD PTR -4[rbp], 10
mov eax, DWORD PTR -4[rbp]
add eax, 1
mov DWORD PTR -8[rbp], eax
mov eax, DWORD PTR -8[rbp]
mov edx, eax
lea rcx, .LC0[rip]
call printf
mov eax, 0
add rsp, 48
pop rbp
ret
.seh_endproc
.ident "GCC: (GNU) 10.2.0"
.def printf; .scl 2; .type 32; .endef
Can the following code:
float f = sinf(0.5f);
be optimized to the following code (actually pseudocode to give an idea):
float f = 0x1.eaee88p-2f;
feraiseexcept(FE_INEXACT);
The same principle would apply to the other C standard library functions. As I understand it, since the "whole language is based on the (rather unhelpful) concept of an 'abstract machine'" (link), an "implementation is free to do anything in the ways of optimizations as long as side effects and 'the observable behavior' are respected" (the same link).
Some compilers will optimize it to a simple constant. This is because the compiler knows how functions from the standard library work; it knows that sinf does not have side effects.
#include <math.h>
#include <stdio.h>
int main(void)
{
    float f = sinf(0.5f);
    printf("%f\n", f);
}
.LC1:
.string "%f\n"
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC1
mov eax, 1
movsd xmm0, QWORD PTR .LC0[rip]
call printf
xor eax, eax
add rsp, 8
ret
.LC0:
.long -2147483648
.long 1071558376
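The two .long values at .LC0 are the halves of one 64-bit double: printf receives a double, so the folded float constant is stored already promoted. A small sketch, assuming IEEE-754 doubles and matching integer and floating-point byte order (as on x86), with a hypothetical helper name, that reassembles the value:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuild a double from the two 32-bit words of a
   ".long low / .long high" pair as laid out in the listing. */
double double_from_longs(int32_t low_word, int32_t high_word)
{
    uint64_t bits = ((uint64_t)(uint32_t)high_word << 32) | (uint32_t)low_word;
    double value;
    memcpy(&value, &bits, sizeof value);  /* reinterpret the bits, no conversion */
    return value;
}
```

Under those assumptions, double_from_longs(-2147483648, 1071558376) compares equal to 0x1.eaee88p-2, the constant from the question, so no call to sinf remains at run time.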
Let's say I have the following assembly program:
.globl _start
_start:
mov $1, %eax
int $0x80
And I assemble/link it with:
$ as file.s
$ ld a.out -o a
This will run fine, and return the status code of 0 to Linux. However, when I remove the line .globl _start I get the following warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078
What does 0000000000400078 mean? And also, if ld expects the _start symbol on entry, why is it even necessary to declare .globl _start ?
However, when I remove the line .globl _start ...
The .globl line means that the name _start is "visible" outside the file file.s. If you remove that line, the name _start is only for use inside the file file.s and in a larger program (containing multiple files) you could even use the name _start in multiple files.
(This is similar to static variables in C/C++: If you generate assembler code from C or C++, the difference between real global variables and static variables is that there is a .globl line for the global variables and no .globl line for static variables. And if you are familiar with C, you know that static variables cannot be used in other files.)
The linker (ld) is also not able to use the name _start if it can be used inside the file only.
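The C side of that analogy can be sketched as follows; the variable and function names are invented for this example. Running gcc -S on it should show a .globl line for exported_counter and sum_counters but none for private_counter.

```c
/* Hypothetical names, for illustration only. */

/* External linkage: the compiler emits a .globl line for this name,
   so other object files can refer to it. */
int exported_counter = 1;

/* Internal linkage: no .globl line, so the name is private to this file
   and another file may define its own private_counter. */
static int private_counter = 2;

int sum_counters(void)
{
    return exported_counter + private_counter;
}
```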
What does 0000000000400078 mean?
0x400078 is the address of the first byte of your program. If no symbol named _start is found, ld assumes that the program starts at that first byte.
... why is it even necessary to declare .globl _start?
It is not guaranteed that _start is located at the first byte of your program.
Counterexample:
.globl _start
write_stdout:
mov $4, %eax
mov $1, %ebx
int $0x80
ret
exit:
mov $1, %eax
mov $0, %ebx
int $0x80
jmp exit
_start:
mov $text, %ecx
mov $(textend-text), %edx
call write_stdout
mov $text2, %ecx
mov $(textend2-text2), %edx
call write_stdout
call exit
text:
.ascii "Hello\n"
textend:
text2:
.ascii "World\n"
textend2:
If you remove the .globl line, ld will not be able to find the _start label and will assume that your program starts at the first byte, which is the write_stdout label!
... and if you have multiple .s files in a larger program (or even a combination of .s, .c and .cc), you have no control over which code is located at the first byte of your program!
I'm attempting to get through a book on X86 that was written using examples from Visual C++ and Visual Studio. I'm trying to convert the examples for use with gcc. After a number of problems, I finally wound up with code that would at least compile, but now I'm getting segfaults. Here's the code:
assembly.s:
.intel_syntax noprefix
.section .text
.globl CalcSum
.type CalcSum, #function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum:
// Initialize a stack frame pointer
pushq rbp
mov ebp,esp
// Load the argument values
mov eax,[ebp+8]
mov ecx,[ebp+12]
mov edx,[ebp+16]
// Calculate the sum
add eax, ecx
add eax, edx
// Restore the caller's stack frame pointer
popq rbp
ret
test.c:
#include <stdio.h>
extern int CalcSum(int a, int b, int c);
int main() {
int sum = CalcSum(5,6,7);
printf(" result: %d\n",sum);
return 0;
}
I'm using gcc -o execute test.c assembly.s to compile. If I change all the 32 bit instructions to 64 bit (i.e. ebp to rbp) it will run but give completely random output. Could anyone point out what I'm doing wrong here? Thanks!
As hinted in the comments, it's a matter of calling convention. 32-bit C functions follow the cdecl calling convention on both Windows and Linux, which passes arguments on the stack. On 64-bit Linux you have to use the System V AMD64 ABI, which passes the first integer arguments in rdi, rsi and rdx. The 64-bit Windows calling convention is different and uses rcx, rdx and r8. Calling functions of the operating system may involve further specifics.
32-bit C (GCC):
.intel_syntax noprefix
.section .text
.globl CalcSum
.type CalcSum, #function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum: // with underscore in Windows: _CalcSum
// Initialize a stack frame pointer
push ebp
mov ebp,esp
// Load the argument values
mov eax,[ebp+8]
mov ecx,[ebp+12]
mov edx,[ebp+16]
// Calculate the sum
add eax, ecx
add eax, edx
// Restore the caller's stack frame pointer
pop ebp
ret
64-bit Linux (GCC):
.intel_syntax noprefix
.section .text
.globl CalcSum
.type CalcSum, #function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum:
// Load the argument values
mov rax, rdi
add rax, rsi
add rax, rdx
ret
64-bit Windows (MingW-GCC):
.intel_syntax noprefix
.section .text
.globl CalcSum
// .type CalcSum, #function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum:
// Load the argument values
mov rax, rcx
add rax, rdx
add rax, r8
ret
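One way to check which registers a given target expects, offered here as a suggestion rather than as part of the answer above: write the function in C and inspect what the compiler generates for it.

```c
/* C reference for CalcSum. Compiling this with "gcc -S -O1" shows where the
   target's calling convention places a, b and c, and where the result goes. */
int CalcSum(int a, int b, int c)
{
    return a + b + c;
}
```

The hand-written assembly can then mirror the argument and return registers seen in that output.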
How to specify Win32 as output when invoking GCC using MinGW on Windows.
Below I've posted my source code. My objective is to interface assembly with C code and produce an executable.
I start assembling the add.asm to Win32 using the following NASM command:
nasm -f win32 add.asm
Then it should be possible to invoke GCC using both C and object files?
gcc -o add add.obj call_asm.c
However, this results in a linker error:
C:\Users\nze\AppData\Local\Temp\cckUvRyC.o:call_asm.c:(.text+0x1e): undefined reference to `add'
collect2.exe: error: ld returned 1 exit status
If I instead compile to ELF using
nasm -f elf add.asm
the command (this time using the ELF file add.o)
gcc -o add add.o call_asm.c
works perfectly.
How can I tell GCC that my object files are in Win32 format, so that it should compile call_asm.c to Win32 before linking? (I guess this is the core of the problem, please comment whether I'm correct).
call_asm.c:
#include <stdio.h>
extern int add(int a, int b);
int main()
{
printf("%d", add(7, 6));
}
add.asm:
BITS 32
global _add
_add:
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ebx, [ebp+12]
add eax, ebx
mov esp, ebp
pop ebp
ret
The problem isn't what you assume it is. GCC is generating "win32" format (more commonly known as PE/COFF) object files. The problem is that your assembly code doesn't define a section, and this results in NASM not defining the symbol _add in the generated object file.
If you add a SECTION directive your code links and runs without error:
BITS 32
SECTION .text
global _add
_add:
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov edx, [ebp+12] ; edx is caller-saved under cdecl; ebx would have to be preserved
add eax, edx
mov esp, ebp
pop ebp
ret
Telling NASM to generate an ELF object file changes its behaviour, for whatever reason, and causes it to define the _add symbol in the ELF object file.
Just add this before the label:
.globl _add
To get that symbol to export in a .DLL you should add this at the end of the file:
.section .drectve
.ascii " -export:\"add\""
Note that the leading underscore is left out.
I'm trying to call a simple piece of assembly (as a test for something more complex later); however, when I try to run the program it crashes ("This program has stopped responding").
main.c:
#include <stdio.h>
#include <stdlib.h>
extern int bar(int param);
int main()
{
int i=8;
i = bar(i);
printf("Hello world! - %i\n",i);
return 0;
}
bar.S
.file "bar.S"
.text
.align 8
.global bar
bar:
add %rdi,1000;
mov %rax,%rdi;
ret;
I'm concerned that it might be something to do with the way my compiler is configured (I'm more used to the hand holding of Visual Studio than dealing with a real environment).
You are using AT&T syntax assembly but you are apparently not familiar with it. The simple solution would be to put .intel_syntax noprefix at the top of bar.S so you can write the code in Intel syntax.
AT&T syntax uses reversed operand order and a different effective address format, among other things. You got a crash because add %rdi, 1000 means add [1000], rdi in Intel syntax, that is, add the content of rdi to memory location 1000, which is not a valid address. Presumably you wanted add $1000, %rdi. To return the value you need to swap the operands of mov %rax, %rdi too.
This code is incorrect:
add %rdi,1000;
mov %rax,%rdi;
Remember that in AT&T syntax the operand order is source, destination. Also, immediate values should be prefixed by a $. So the code should be:
add $1000,%rdi
mov %rdi,%rax
I removed the semicolons since they're not necessary.
Also, since you seem to be compiling for Windows you should be following Microsoft's 64-bit calling convention, not the System V one. So the argument will be in rcx, not in rdi.
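As an aside, the corrected add can also be sketched as GCC extended inline assembly, which lets the compiler pick the register and so sidesteps the calling-convention question. This assumes GCC or Clang on x86 with the default AT&T dialect (not -masm=intel); other targets take the plain C fallback.

```c
/* Sketch: the corrected add written as GCC extended inline assembly.
   AT&T operand order: source ($1000, an immediate) first, destination last. */
int bar(int param)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl $1000, %0"   /* param += 1000 */
            : "+r"(param)      /* read-write register operand */
            :
            : "cc");           /* the add modifies the flags */
    return param;
#else
    return param + 1000;       /* portable fallback on other targets */
#endif
}
```

Called from the main.c above with i = 8, this returns 1008.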
Start with this:
int bar ( int param )
{
return(param);
}
Compile it separately and link it with main, and see what main is doing and passing; note that main is using edi, not rdi.
Now disassemble the function above.
0000000000000000 <bar>:
0: 89 f8 mov %edi,%eax
2: c3 retq
It uses edi and eax as well. Also note that this is AT&T syntax, not Intel, so it is backwards: the destination is on the right instead of the left.
So make different flavors of our own:
.global bark
bark:
mov %edi,%eax
addl $1000,%eax
retq
.global barf
barf:
addl $1000,%edi
mov %edi,%eax
retq
.global bar
bar:
add $1000,%edi
mov %edi,%eax
retq
Assemble and link with main instead of the C version, and run:
./main
Hello world! - 1008
Basically, whatever compiler you are using, get it to generate similar/simple code which will follow its convention, then mimic that.
Note that I am using gcc, which is not necessarily the same as what you are running, but the process is the same.