I am using GCC in 32-bit mode on a Windows 7 machine under cygwin. I have the following function:
unsigned f1(unsigned x, unsigned y)
{
return x*y;
}
I want the code to do an unsigned multiply and as such I would expect it to generate the mul instruction, not the imul instruction. I compile the program
with the following command:
gcc -m32 -S t4.c
The generated assembly code is:
.file "t4.c"
.text
.globl _f1
.def _f1; .scl 2; .type 32; .endef
_f1:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
imull 12(%ebp), %eax
popl %ebp
ret
.ident "GCC: (GNU) 4.8.2"
I believe that the generated code has the wrong multiply instruction in it but I find it hard to believe that GCC has such a simple bug. Please comment.
The compiler relies on the "as-if" rule: No standard conforming program can detect a difference between what this program does and what the program should do, since the lowest 32 bits of the result are the same for both instructions.
Related
I was putting together a C riddle for a couple of my friends when a friend drew my attention to the fact that the following snippet (which happens to be part of the riddle I'd been writing) ran differently when compiled and run on OSX
#include <stdio.h>
#include <string.h>
int main()
{
int a = 10;
volatile int b = 20;
volatile int c = 30;
int data[3];
memcpy(&data, &a, sizeof(data));
printf("%d %d %d\n", data[0], data[1], data[2]);
}
What you'd expect the output to be is 10 20 30, which happens to be the case under Linux, but when the code is built under OSX you'd get 10 followed by two random numbers. After some debugging and looking at the compiler-generated assembly I came to the conclusion that this is due to how the stack is built. I am by no means an assembly expert, but the assembly code generated on Linux seems pretty straightforward to understand while the one generated on OSX threw me off a little. Perhaps I could use some help from here.
This is the code that was generated on Linux:
.file "code.c"
.section .text.unlikely,"ax",#progbits
.LCOLDB0:
.section .text.startup,"ax",#progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB23:
.cfi_startproc
movl $10, -12(%rsp)
xorl %eax, %eax
movl $20, -8(%rsp)
movl $30, -4(%rsp)
ret
.cfi_endproc
.LFE23:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
.section .note.GNU-stack,"",#progbits
And this is the code that was generated on OSX:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 12
.globl _main
.p2align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $20, -8(%rbp)
movl $30, -4(%rbp)
leaq L_.str(%rip), %rdi
movl $10, %esi
xorl %eax, %eax
callq _printf
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## #.str
.asciz "%d %d %d\n"
.subsections_via_symbols
I'm really only interested in two questions here.
Why is this happening?
Are there any get-arounds to this issue?
I know this is not a practical way to utilize the stack as I'm a professional C developer, which is really the only reason I found this problem interesting to invest some of my time into.
Accessing memory past the end of a declared variable is undefined behaviour - there is no guarantee as to what will happen when you try to do that. Because of how the compiler generated the assembly under Linux, you happened to get the 3 variables directly in a row on the stack, however that behaviour is just a coincidence - the compiler could legally add extra data in between the variables on the stack or really do anything - the result is not defined by the language standard. So in answer to your first question, it's happening because what you're doing is not part of the language by design. In answer to your second, there's no way to reliably get the same result from multiple compilers because the compilers are not programmed to reliably reproduce undefined behaviour.
undefined behavior. You don't expect to copy 10, 20 ,30. You hope not to seg-fault.
There is nothing to guarantee that a,b, and c are sequential memory addresses, which is your naive assumption. On Linux, the compiler happened to make them sequential. You can't even rely on gcc always doing that.
You already know that the behavior is undefined. A good reason for the behavior to be different on OS/X and Linux is these systems use a different compiler, that generates different code:
When you run gcc in Linux, you invoke the installed version the Gnu C compiler.
When you run gcc in your version of OS/X, you most likely invoke the installed version of clang.
Try gcc --version on both systems and amaze your friends.
I'm reading some C code embedded with a few assembly code. I understand that __asm__ is a statement to run assembly code, but what does __asm__ do in the following code? According to the output (i.e., r = 16), it seems that __asm__ does not effect the variable r. Isn't it?
#include <stdio.h>
static void foo()
{
static volatile unsigned int r __asm__ ("0x0019");
r |= 1 << 4;
printf("foo: %u\n", r);
}
Platform: Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn) on OSX Yosemite
Strictly speaking, your "asm" snippet simply loads a constant (0x0019).
Here's a 32-bit example:
#include <stdio.h>
static void foo()
{
static volatile unsigned int r __asm__ ("0x0019");
static volatile unsigned int s __asm__ ("0x1122");
static volatile unsigned int t = 0x3344;
printf("foo: %u %u %u\n", r, s, t);
}
gcc -O0 -S x.c
cat x.c
.file "x.c"
.data
.align 4
.type t.1781, #object
.size t.1781, 4
t.1781:
.long 13124 # Note: 13124 decimal == 0x3344 hex
.local 0x1122
.comm 0x1122,4,4
.local 0x0019
.comm 0x0019,4,4
.section .rodata
.LC0:
.string "foo: %u %u %u\n"
.text
.type foo, #function
foo:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl t.1781, %eax
movl 0x1122, %edx
movl 0x0019, %ecx
movl %eax, 12(%esp)
movl %edx, 8(%esp)
movl %ecx, 4(%esp)
movl $.LC0, (%esp)
call printf
leave
ret
PS:
The "asm" syntax is applicable to all gcc-based compilers.
PPS:
I absolutely encourage you to experiment with assembly anywhere you please: embedded systems, Ubuntu, Mac OSX - whatever pleases you.
Here is an excellent book. It's about Linux, but it's also very largely applicable to your OSX:
Programming from the Ground Up, Jonathan Bartlett
Also:
https://www.hackerschool.com/blog/7-understanding-c-by-learning-assembly
http://fabiensanglard.net/macosxassembly/
PPS:
x86 assembly syntax comes in two variants: "Intel" and "ATT" syntax. Gcc uses ATT. The ATT syntax is also applicable for any other architecture supported by GCC (MIPS, PPC, etc etc). I encourage you to start off with ATT syntax ("gcc/gas"), rather than Intel ("nasm").
I am going to write my first "hello world" bootloader program.I found an article on CodeProject website.Here is link of it.
http://www.codeproject.com/Articles/664165/Writing-a-boot-loader-in-Assembly-and-C-Part
Up-to assembly level programming it was going well, but when I wrote program using c,same as given in this article, I faced a runtime error.
Code written in my .c file is as below.
__asm__(".code16\n");
__asm__("jmpl $0x0000,$main\n");
void printstring(const char* pstr)
{
while(*pstr)
{
__asm__ __volatile__("int $0x10": :"a"(0x0e00|*pstr),"b"(0x0007));
++pstr;
}
}
void main()
{
printstring("Akatsuki9");
}
I created image file floppy.img and checking output using bochs.
It was displaying something like this
Booting from floppy...
S
It should be Akatsuki9. I don't know where did I mistake? Can any one help me to find why am I facing this runtime error?
Brief Answer: The problem is with gcc (in fact, this specific application of generated code) and not with the C program itself. It's hidden in the assembly code.
Long Answer: A longer (more elaborate) explanation with specific details of the problem:
(It would be helpful to have the assembly code. It can be obtained using the -S switch of gcc or use the one that I got from gcc; I've attached it at the end). If you don't already know about opcode-prefixing, c-parameter passing in assembly, etc. then have a look at the following background information section. Looking at the assembly source, it's evident that it's 32bit code. gcc with '.code16' produces 16bit code for 32bit-mode processor (using operand-size prefixes). When this same exact code is run in real (i.e. 16bit) mode, it is treated as 32bit code. This is not an issue (80386 and later processors can execute it as such, previous processors just ignore the operand-size prefix). The problem occurs because gcc calculates offsets based on 32bit-mode of (processor) operation, which is not true (by default) when executing boot-code.
Some background information (experienced assembly language programmers should skip this):
1. Operand-size prefix: In x86, prefix bytes (0x66, 0x67, etc.) are used to obtain variants of an instruction. 0x66 is the operand-size prefix to obtain instruction for non-default operand size; gas uses this technique to produce code for '.code16'. For example, in real (i.e. 16bit) mode, 89 D8 corresponds to movw %bx,%ax while 66 89 D8 corresponds to movl %ebx,%eax. This relationship gets reversed in 32bit mode.
2. parameter passing in C: Parameters are passed on stack and accessed through the EBP register.
3. Call instruction: Call is a branching operation with the next instruction's address saved on stack (for resuming). near Call saves only the IP (when in 16bit mode) or EIP ( when in 32bit mode). far Call saves the CS (code-segment register) along with IP/EIP.
4. Push operation: Saves the value on stack. The size of object is subtracted from ESP.
Exact problem
We start at the
movl %esp, %ebp in main: {{ %ebp is set equal to %esp }}
pushl $.LC0 subtracts 4 from Stack Pointer {{ .LC0 addresses the char* "Akatsuki9"; it is getting saved on stack (to be accessed by printstring function) }}
call printstring subtracts 2 from Stack Pointer (16bit Mode; IP is 2bytes)
pushl %ebp in printstring: {{ 4 is subtracted from %esp }}
movl %esp, %ebp {{ %ebp and %esp are currently at 2+4(=6) bytes from the char *pstr }}
pushl %ebx changes %esp but not %ebp
movl 8(%ebp), %edx {{ Accessing 'pstr' at %ebp+8 ??? }}
Accessing 'pstr' at %ebp+8 instead of %ebp+6 (gcc had calculated an offset of 8, assuming 32bit EIP); the program has just obtained an invalid pointer and it's going to cause problem when the program dereferences it later: movsbl (%edx), %eax.
Fix
As of now I don't know of a good fix for this that will work with gcc. For writing boot-sector code, a native 16bit code-generator, I think, is more effective (size-limit & other quirks as explained above). If you insist on using gcc which currently only generates code for 32bit mode, the fix would be to avoid passing function parameters. For more information, refer to the gcc and gas manuals. And please let me know if there is a workaround or some option that works with gcc.
EDIT
I have found a fix for the program to make it work for the desired purpose while still using gcc. Kinda hackish & clearly not-recommended. Why post then? Well, sort of proof of concept. Here it is: (just replace your printstring function with this one)
void printstring(const char* pstr)
{
const char *hackPtr = *(const char**)((char *)&pstr-2);
while(*hackPtr)
{
__asm__ __volatile__("int $0x10": :"a"(0x0e00|*hackPtr),"b"(0x0007));
++hackPtr;
}
}
I invite #Akatsuki and others (interested) to verify that it works. From my above answer and the added C-pointer arithmetic, you can see why it should.
My Assembly-Source file
.file "bootl.c"
#APP
.code16
jmpl $0x0000,$main
#NO_APP
.text
.globl printstring
.type printstring, #function
printstring:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
pushl %ebx
.cfi_offset 3, -12
movl 8(%ebp), %edx
movl $7, %ebx
.L2:
movsbl (%edx), %eax
testb %al, %al
je .L6
orb $14, %ah
#APP
# 8 "bootl.c" 1
int $0x10
# 0 "" 2
#NO_APP
incl %edx
jmp .L2
.L6:
popl %ebx
.cfi_restore 3
popl %ebp
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size printstring, .-printstring
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "Akatsuki9"
.section .text.startup,"ax",#progbits
.globl main
.type main, #function
main:
.LFB1:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
pushl $.LC0
call printstring
popl %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",#progbits
I have the same problem, and found a solution that may work for you. It works on the emulators (I tested on bochs and qemu), but can't make it work on real hardware.
Solution
One thing is to use gcc-4.9.2, and to change the code generation to .code16gcc.
Thus, your code becomes:
__asm__(".code16gcc\n");
__asm__("jmpl $0x0000,$main\n");
void printstring(const char* pstr)
{
while(*pstr)
{
__asm__ __volatile__("int $0x10": :"a"(0x0e00|*pstr),"b"(0x0007));
++pstr;
}
}
void main()
{
printstring("Akatsuki9");
}
and to compile it use the -m16 flag on gcc, in my case I tried
gcc -c -m16 file.c
Note that you can change the architecture according to your needs, by setting -march. Or if you want to keep the flags of the tutorial
gcc -c -g -Os -march=i386 -ffreestanding -Wall -Werror -m16 file.c
tl;dr
Set .code16gcc instead of .code16, and use -m16 with gcc-4.9.2.
This question already has answers here:
Is the gcc insane optimisation level (-O3) not insane enough?
(2 answers)
Closed 9 years ago.
When compiling a simple function that does not even alter the ebp register GCC still saves the value at the start of the function and then restores the same value at the end:
#add.c
int add( int a, int b )
{
return ( a + b );
}
gcc -c -S -m32 -O3 add.c -o add.S
#add.S
.file "add.c"
.text
.p2align 4,,15
.globl add
.type add, #function
add:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
addl 8(%ebp), %eax
popl %ebp
ret
.size add, .-add
.ident "GCC: (GNU) 4.4.6"
.section .note.GNU-stack,"",#progbits
It would seem like a simple optimisation to leave ebp untouched, calculate offsets relative to esp and save 3 instructions.
Why does GCC not do this?
Thanks,
Andrew
Tools such as debuggers and stack walkers used to expect code to have a prologue that constructed a frame pointer, and couldn't understand code that didn't have it. Over time, the restriction has been removed.
The compiler itself has no difficulty generating code without a frame pointer, and you can ask for it to be removed with -fomit-frame-pointer. I believe that recent versions of gcc (~4.8) and gcc on x86-64 omit the frame pointer by default.
I wrote this simple C program:
int main() {
int i;
int count = 0;
for(i = 0; i < 2000000000; i++){
count = count + 1;
}
}
I wanted to see how the gcc compiler optimizes this loop (clearly add 1 2000000000 times should be "add 2000000000 one time"). So:
gcc test.c and then time on a.out gives:
real 0m7.717s
user 0m7.710s
sys 0m0.000s
$ gcc -O2 test.c and then time ona.out` gives:
real 0m0.003s
user 0m0.000s
sys 0m0.000s
Then I disassembled both with gcc -S. First one seems quite clear:
.file "test.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movl $0, -8(%rbp)
movl $0, -4(%rbp)
jmp .L2
.L3:
addl $1, -8(%rbp)
addl $1, -4(%rbp)
.L2:
cmpl $1999999999, -4(%rbp)
jle .L3
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",#progbits
L3 adds, L2 compare -4(%rbp) with 1999999999 and loops to L3 if i < 2000000000.
Now the optimized one:
.file "test.c"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
rep
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",#progbits
I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like
addl $2000000000, -8(%rbp)
I even tried with gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no more clear that the previous one.
Can someone briefly explain:
The gcc -S -O2 output.
If the loop is optimized as I expected (one sum instead of many sums)?
The compiler is even smarter than that. :)
In fact, it realizes that you aren't using the result of the loop. So it took out the entire loop completely!
This is called Dead Code Elimination.
A better test is to print the result:
#include <stdio.h>
int main(void) {
int i; int count = 0;
for(i = 0; i < 2000000000; i++){
count = count + 1;
}
// Print result to prevent Dead Code Elimination
printf("%d\n", count);
}
EDIT : I've added the required #include <stdio.h>; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.
I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:
EDIT : I had the wrong assembly output. Here's the correct one.
; 57 : int main(){
$LN8:
sub rsp, 40 ; 00000028H
; 58 :
; 59 :
; 60 : int i; int count = 0;
; 61 : for(i = 0; i < 2000000000; i++){
; 62 : count = count + 1;
; 63 : }
; 64 :
; 65 : // Print result to prevent Dead Code Elimination
; 66 : printf("%d\n",count);
lea rcx, OFFSET FLAT:??_C#_03PMGGPEJJ#?$CFd?6?$AA#
mov edx, 2000000000 ; 77359400H
call QWORD PTR __imp_printf
; 67 :
; 68 :
; 69 :
; 70 :
; 71 : return 0;
xor eax, eax
; 72 : }
add rsp, 40 ; 00000028H
ret 0
So yes, Visual Studio does this optimization. I'd assume GCC probably does too.
And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):
.file "test.c"
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "%d\n"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $2000000000, 8(%esp)
movl $.LC0, 4(%esp)
movl $1, (%esp)
call __printf_chk
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",#progbits
Compilers have a few tools at their disposal to make code more efficient or more "efficient":
If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acted upon volatile values, those values must still be read but the results of the read may be ignored). If the results of the computations that fed it weren't used, the code that performs those can be omitted as well. If such omission makes the code for both paths on a conditional branch identical, the condition may be regarded as unused and omitted. This will have no effect on the behaviors (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L would call "Critical Undefined Behaviors".
If the compiler determines that the machine code that computes a value can only produce results in a certain range, it may omit any conditional tests whose outcome could be predicted on that basis. As above, this will not affect behaviors other than execution time unless code invokes "Critical Undefined Behaviors".
If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard would allow the compiler to omit any code which would only be relevant when such inputs are received, even if the natural behavior of the execution platform given such inputs would have been benign and the compiler's rewrite would make it dangerous.
Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.