Examining code generated by the Visual Studio C++ compiler, part 1 [duplicate]

Possible Duplicate:
Why is such complex code emitted for dividing a signed integer by a power of two?
Background
I'm just learning x86 asm by examining the binary code generated by the compiler.
Code compiled using the C++ compiler in Visual Studio 2010 beta 2.
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.21003.01 for 80x86
C code (sandbox.c)
int mainCRTStartup()
{
int x=5;int y=1024;
while(x) { x--; y/=2; }
return x+y;
}
Compile it using the Visual Studio Command Prompt
cl /c /O2 /Oy- /MD sandbox.c
link /NODEFAULTLIB /MANIFEST:NO /SUBSYSTEM:CONSOLE sandbox.obj
Disasm sandbox.exe in OllyDbg
The following starts from the entry point.
00401000 >/$ B9 05000000 MOV ECX,5
00401005 |. B8 00040000 MOV EAX,400
0040100A |. 8D9B 00000000 LEA EBX,DWORD PTR DS:[EBX]
00401010 |> 99 /CDQ
00401011 |. 2BC2 |SUB EAX,EDX
00401013 |. D1F8 |SAR EAX,1
00401015 |. 49 |DEC ECX
00401016 |.^75 F8 \JNZ SHORT sandbox.00401010
00401018 \. C3 RETN
Examination
MOV ECX, 5 int x=5;
MOV EAX, 400 int y=1024;
LEA ... // no idea what LEA does here. seems like ebx=ebx. elaborate please.
// in fact, NOPing it does nothing to the original procedure and the values.
CDQ // sign-extends EAX into EDX:EAX, which here gives edx = 0. no idea why.
SUB EAX, EDX // eax=eax-edx, here: eax=eax-0. no idea, pretty redundant.
SAR EAX,1 // okay, y/= 2
DEC ECX // okay, x--, sets the zero flag when reaches 0.
JNZ ... // okay, jump back to CDQ if the zero flag is not set.
This part bothers me:
0040100A |. 8D9B 00000000 LEA EBX,DWORD PTR DS:[EBX]
00401010 |> 99 /CDQ
00401011 |. 2BC2 |SUB EAX,EDX
You can nop it all and the values of EAX and ECX will remain the same at the end. So, what's the point of these instructions?

The whole thing
00401010 |> 99 /CDQ
00401011 |. 2BC2 |SUB EAX,EDX
00401013 |. D1F8 |SAR EAX,1
stands for the y /= 2. You see, a standalone SAR would not perform the signed integer division the way the compiler authors intended. The C++98 standard recommends that signed integer division round the result towards 0, while SAR alone rounds towards negative infinity. (In C++98 rounding towards negative infinity is also permissible; the choice is left to the implementation. C99 and C++11 later made truncation towards zero mandatory.) In order to implement rounding towards 0 for negative operands, the above trick is used. If you use an unsigned type instead of a signed one, the compiler will generate just a single shift instruction, since the issue with negative division does not arise.
The trick is pretty simple: for negative y the sign extension places a pattern of 11111...1 in EDX, which is -1 in 2's complement representation. The following SUB then effectively adds 1 to EAX if the original y value was negative. If the original y was positive (or 0), EDX holds 0 after the sign extension and EAX remains unchanged.
In other words, when you write y /= 2 with signed y, the compiler generates the code that does something more like the following
y = (y < 0 ? y + 1 : y) >> 1;
or, better
y = (y + (y < 0)) >> 1;
Note that the C++98 standard does not require the result of the division to be rounded towards zero, so the compiler has the right to emit just a single shift even for signed types. However, compilers normally follow the recommendation to round towards zero (or offer an option to control the behaviour).
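To see the difference in C, here is a small sketch of mine (not code from the question; note that right-shifting a negative int is implementation-defined in C, though MSVC and most common compilers implement it as an arithmetic shift):
#include <stdio.h>

int main(void)
{
    int y = -1023;
    printf("%d\n", y / 2);              /* -511: signed division truncates towards zero      */
    printf("%d\n", y >> 1);             /* typically -512: an arithmetic shift rounds down   */
    printf("%d\n", (y + (y < 0)) >> 1); /* -511: the shift plus the CDQ/SUB-style adjustment */
    return 0;
}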
P.S. I don't know for sure what the purpose of that LEA instruction is. It is indeed a no-op. However, I suspect that this might be just a placeholder instruction inserted into the code for further patching. If I remember correctly, the MS compiler has an option that forces the insertion of placeholder instructions at the beginning and at the end of each function. In the future this instruction can be overwritten by the patcher with a CALL or JMP instruction that will execute the patch code. This specific LEA was chosen just because it produces a no-op placeholder instruction of the correct length. Of course, it could be something completely different.

The lea ebx,[ebx] is just a NOP. Its purpose is to align the beginning of the loop in memory, which can make the loop faster. As you can see, the body of the loop starts at address 0x00401010, which is divisible by 16, thanks to this instruction.
The CDQ and SUB EAX,EDX operations make sure that the division will round a negative number towards zero - otherwise SAR would round it down, giving incorrect results for negative numbers.

The reason that the compiler emits this:
LEA EBX,DWORD PTR DS:[EBX]
instead of the semantically equivalent:
NOP
NOP
NOP
NOP
NOP
NOP
...is that it's faster for the processor to execute one 6-byte instruction than six 1-byte instructions. That's all.

This doesn't really answer the question, but is a helpful hint. Instead of mucking around with OllyDbg, you can make Visual Studio generate the asm file for you, which has the added bonus that it can include the original source code as comments. This isn't a big deal for your current small project, but as your project grows, you may end up spending a fair amount of time figuring out which assembly code matches which source code.
From the command line, you want the /FAs and /Fa options (MSDN).
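For example, adding /FAs to the compile command from the question (other flags unchanged) writes a sandbox.asm listing, with the C source interleaved as comments, next to the object file:
cl /c /O2 /Oy- /MD /FAs sandbox.c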
Here's part of the output for your example code (I compiled debug code, so the .asm is longer, but you can do the same thing for your optimized code):
_wmain PROC ; COMDAT
; 8 : {
push ebp
mov ebp, esp
sub esp, 216 ; 000000d8H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-216]
mov ecx, 54 ; 00000036H
mov eax, -858993460 ; ccccccccH
rep stosd
; 9 : int x=5; int y=1024;
mov DWORD PTR _x$[ebp], 5
mov DWORD PTR _y$[ebp], 1024 ; 00000400H
$LN2@wmain:
; 10 : while(x) { x--; y/=2; }
cmp DWORD PTR _x$[ebp], 0
je SHORT $LN1@wmain
mov eax, DWORD PTR _x$[ebp]
sub eax, 1
mov DWORD PTR _x$[ebp], eax
mov eax, DWORD PTR _y$[ebp]
cdq
sub eax, edx
sar eax, 1
mov DWORD PTR _y$[ebp], eax
jmp SHORT $LN2@wmain
$LN1@wmain:
; 11 : return x+y;
mov eax, DWORD PTR _x$[ebp]
add eax, DWORD PTR _y$[ebp]
; 12 : }
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
_wmain ENDP
Hope that helps!

Related

Which of these is more efficient to choose a function?

So let's say I have 2 functions to choose from based on whether a number is even or odd. I came up with this:
(void (*[])()){f1, f2}[n%2]();
Is it more or less efficient than simply:
n%2 ? f2() : f1();
Profile it; most likely too small to measure.
Assuming a naive compiler, the second will be a lot shorter to execute. But the first is written terribly; it could be squashed down to
((n & 1) ? f2 : f1)();
Now it's pretty much a toss-up. The first generates something like
test al, 1
jnz +3
call f1
jmp +1
call f2
and the second something like
test al, 1
jnz +3
lea rcx, [f1]
jmp +1
lea rcx, [f2]
call rcx
but a good optimizer could flatten that down to
lea rcx, [f1]
lea rdx, [f2]
test al, 1
cmovnz rcx, rdx
call rcx
While all this is true, the initial statement applies: it's most likely too small to measure.
An additional question in the comments asked about "easier to expand"; well, yes. After a surprisingly small number of functions, the array lookup becomes faster. I'm skeptical and would not write the array inline, but somebody could come along and prove me wrong. If there are more than two I would write
static void (*dispatch_table[])() = {f1 ,f2};
dispatch_table[n % (sizeof(dispatch_table) / sizeof(dispatch_table[0]))]();
The divisor of this mod expression is a compile-time constant, which lets the compiler optimize the % into something more performant; it's written this way so that adding more entries to the array doesn't require changing the second argument of %.
As is being pointed out in the comments, I didn't handle negative n. Most RNG sources don't generate negative numbers.
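For completeness, here is a hedged sketch of mine (function names are placeholders) showing one way to make the two-entry dispatch safe for negative n: masking the low bit of the unsigned conversion gives the same 0/1 parity index for any int value, since the conversion is modulo a power of two.
#include <stdio.h>

static void evenfunc(void) { puts("even"); }
static void oddfunc(void)  { puts("odd");  }

static void (*dispatch_table[])(void) = { evenfunc, oddfunc };

static void dispatch(int n)
{
    /* (unsigned)n & 1u is 0 for even n and 1 for odd n, negative or not */
    dispatch_table[(unsigned)n & 1u]();
}

int main(void)
{
    dispatch(-3);   /* prints "odd"  */
    dispatch(4);    /* prints "even" */
    return 0;
}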
Godbolt to the rescue (64-bit Clang 11.0 with -O3 optimizations): https://godbolt.org/z/MWjPnn
First implementation:
void impl1(int n)
{
(void (*[])()){evenfunc ,oddfunc}[n%2]();
}
mov qword ptr [rsp - 16], offset evenfunc()
mov qword ptr [rsp - 8], offset oddfunc()
mov eax, edi
shr eax, 31
add eax, edi
and eax, -2
sub edi, eax
movsxd rax, edi
jmp qword ptr [rsp + 8*rax - 16] # TAILCALL
Second implementation:
void impl2(int n)
{
n%2 ? oddfunc() : evenfunc();
}
test dil, 1
jne .LBB1_1
jmp evenfunc() # TAILCALL
.LBB1_1:
jmp oddfunc() # TAILCALL

Assembler Intel x86 Loop n times with user input

I'm learning x86 assembly and I'm trying to write a program that reads a number n (2 digits) from user input and iterates n times.
I've tried many ways, but I get either an infinite loop or a segmentation fault.
input:
push msgInputQty
call printf
add esp, 4
push quantity
call gets
add esp, 4
mov ecx, 2
mov eax, 0
mov ebx, 0
mov edi, 0
mov dl, 10
transform:
mul dl
mov ebx, 0
mov bl, byte[quantity+edi]
sub bl, 30h
add eax, ebx
inc edi
loop transform
mov ecx, eax
printNTimes:
push msgDig
call printf
add esp, 4
loop printNTimes
I'd like to save this number in ecx and loop n times.
Your ecx register is being blown away by the call to printf.
ecx is a volatile (call-clobbered) register in the common 32-bit calling conventions, and it's likely that your loop counter is being corrupted by whatever printf leaves in there.
To begin with, I would follow Raymond's advice in the comment attached to your original question and attach a debugger to witness this behaviour for yourself.
As for a solution, you can try preserving ecx and restoring it after the call to see the difference:
; for example
mov edi,ecx
call printf
mov ecx,edi
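Another hedged option (assuming the usual cdecl printf, where the caller pops its own arguments) is to spill ecx to the stack around the call:
; alternative: save the counter on the stack
push ecx ; save loop counter
push msgDig
call printf
add esp, 4 ; remove the printf argument
pop ecx ; restore loop counter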
There may be more issues here (hard to know for sure since your code is incomplete, but things like stack adjustments that don't appear to serve any purpose are interesting) - but that is a good place to start.
Peter has left a comment under my answer to point out that you could remove the issue and optimize my solution by not using ecx for your loop at all and instead counting down manually, so your code becomes:
mov edi, eax
printNTimes:
push msgDig
call printf
add esp, 4
dec edi
jnz printNTimes

Why does this 16-bit DOS example from a book crash when I call it from C (compiled with Visual Studio)?

My OS is Windows 7 64-bit.
Here is my code.
first.c :
#include <stdio.h>
extern long second(int, int);
void main()
{
int val1, val2;
long result;
scanf("%d %d", &val1, &val2);
result = second(val1, val2);
printf("%ld", result);
}
second.asm :
.model small
.code
public _second
_second proc near
push bp
mov bp,sp
mov ax,[bp+4]
mov bx,[bp+6]
add ax,bx
pop bp
ret
_second endp
end
It compiles OK, but the line "mov ax,[bp+4]" fails with the error "0xC0000005: Access violation reading location 0x00000004."
What's wrong?
You're assembling code in 16-bit mode and linking it into a 32-bit program which is executed in 32-bit mode. The machine code that makes up your second function ends up getting interpreted differently than you expected. This is the code that is actually executed:
_second:
00407800: 55 push ebp
00407801: 8B EC mov ebp,esp
00407803: 8B 46 04 mov eax,dword ptr [esi+4]
00407806: 8B 5E 06 mov ebx,dword ptr [esi+6]
00407809: 03 C3 add eax,ebx
0040780B: 5D pop ebp
0040780C: C3 ret
Instead of using 16-bit registers the code uses 32-bit registers. Instead of using the BP register as a base when addressing the arguments on the stack, it uses ESI. Since ESI is not initialized to anything in the function, it holds whatever value it happened to have before the call (e.g. 0). Wherever that points isn't a valid readable address, so accessing it causes a crash.
Your problem is that you've taken assembly code meant to be used with a 16-bit compiler for a 16-bit operating system (e.g. MS-DOS) and used it with a 32-bit compiler for Windows. You can't blindly cut & paste code examples like that. Here's a 32-bit version of your assembly code:
.MODEL FLAT
.CODE
PUBLIC _second
_second PROC
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov edx, [ebp+12]
add eax, edx
pop ebp
ret
_second ENDP
END
The .MODEL FLAT directive tells the assembler you're generating 32-bit code. I've changed the code to use 32-bit registers, and adjusted the frame pointer (EBP) relative offsets to reflect the fact that stack slots in 32-bit mode are 4 bytes long. I also changed the code to use EDX instead of EBX, because in the 32-bit C calling convention the EBX register needs to be preserved by the function, while EDX (like BX in the 16-bit C calling convention) doesn't.
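For reference, here is the 32-bit cdecl stack layout those offsets assume, right after the push ebp / mov ebp, esp prologue (a sketch of mine, not compiler output):
; [ebp]    saved EBP
; [ebp+4]  return address pushed by CALL
; [ebp+8]  first argument  (val1)
; [ebp+12] second argument (val2)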
SP and BP are probably 0 in this specific case. Note however that SP and BP are the lowest 16-bit quarters of RSP and RBP respectively, so the stack pointer isn't really 0.
Another solution for passing parameters from .c to .asm is to use the "fastcall" convention, which lets you pass two parameters in the registers CX and DX (actually ECX and EDX, but you are using 16-bit registers in your code). Next is a short example tested in VS 2013; it sends two ints (2, 5) to the asm function and the function returns the sum of those values (7):
first.cpp
#include "stdafx.h"
extern "C" int __fastcall second(int,int); // ◄■■ KEYWORDS "C" AND __FASTCALL.
int _tmain(int argc, _TCHAR* argv[])
{
short int result = second(2,5); // ◄■■ "RESULT" = 7.
return 0;
}
second.asm
.model small
.code
public @second@8 ◄■■ NOTICE THE @ AND THE 8.
@second@8 proc near ◄■■ NOTICE THE @ AND THE 8.
mov ax,cx ◄■■ AX = 2.
add ax,dx ◄■■ AX + 5 (RETURN VALUE).
ret
@second@8 endp ◄■■ NOTICE THE @ AND THE 8.
end

Did I translate the very short C code correctly into assembler?

I'm currently learning assembly x86 and I have made a little task for myself.
The C code:
if (a == 4711) { a = a + 2; } else
{ a = a - 2; }
Assembler Code (eax is a register, cmp is compare, jne is jump if not equal and jmp is jump if equal):
mov eax, a
cmp eax, 4711
jmp equal
equal: add eax, 2
jne unequal
unequal: sub eax, 2
I think a little more efficient than that would be:
mov eax, a
cmp eax, 4711
jne unequal
add eax, 2
unequal: sub eax, 2
Edit:
mov eax, a
cmp eax, 4711
jne unequal
equal: add eax, 2
jmp continue
unequal: sub eax, 2
continue: ...
Did I translate it correctly?
Nope.
In the first case, your jne unequal does nothing since control would go there anyway. You need to jump to after that.
In your second case, if the comparison is true, you both add and subtract 2, doing nothing.
You also don't store the result back where the original value was, you just leave it in eax.
Your edit is correct except for one thing.
mov eax, a
moves the address of "a" into eax, not the contents/value
This short snippet is done with NASM on Ubuntu 16.04 elf64
section .text
global _start
_start:
mov eax, a
cmp eax, 4711
jnz unequal
add eax, 2
jmp Done
unequal:
sub eax, 2
Done: mov [a], eax
section .rodata
a dd 180308
It disassembles to:
00400080 B89C004000 mov eax,0x40009c
00400085 3D67120000 cmp eax,0x1267
0040008A 7505 jnz 0x400091
0040008C 83C002 add eax,byte +0x2
0040008F EB03 jmp short 0x400094
00400091 83E802 sub eax,byte +0x2
00400094 8904259C004000 mov [0x40009c],eax
Variable "a" lives here
0040009C 54C00200
Note that at 400080 the address of a is moved into EAX, but at Done (400094) whatever is in EAX is moved into that address. Notice too that the value at "a" is stored in reverse byte order (little endian); usually in code you'd see it as 0x2c054.
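For completeness, a hedged note of mine in NASM syntax: to operate on the value stored at "a" rather than on its address, the first instruction would use brackets to dereference the label:
mov eax, [a] ; eax = 180308, the value stored at a
cmp eax, 4711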
Let's get back to your first code:
mov eax, a
cmp eax, 4711
jmp equal
equal: add eax, 2
jne unequal
unequal: sub eax, 2
Let's pretend the first instruction loads eax with the value of "a" (it actually would in TASM/MASM, but I'd rather stick to the explicit and accurate [a]; it's easier to read in the source and also works in NASM).
The second instruction is cmp, which subtracts 4711 from eax, throws the result away (it is not stored anywhere), and only affects the flags register. If "a" was 4711, the result of the subtraction is zero, so ZF=1. Otherwise ZF=0. (For other flags affected by CMP see some documentation.)
So on line 3 eax still contains the value of "a", and the flags register contains the result of cmp eax,4711. And you do jmp. This is an unconditional jump, taken no matter what, so you continue directly to the instruction at the "equal" address, which is add eax,2. => You add 2 to "a" in every case.
Also, the add itself affects the flags, so for "a" == -2 you end up with ZF=1, otherwise ZF=0!
Then comes the first conditional jump, branching the code based on the current flags register content. The jne is an abbreviation of "jump if not equal", and "equal" in this context means the zero flag is set (ZF=1).
So when "a" was -2, ZF is 1 ("is equal") ahead of jne, thus jne will NOT jump to the "unequal" address, but will continue to the next instruction (which is at the "unequal" address anyway, so the jne is meaningless).
For "a" different from -2, ZF will be 0 ("is not equal"), so jne will jump to the provided label, continuing with the instruction at address "unequal".
So you have to navigate the CPU away from instructions you don't want to execute.
xor eax,eax ; sets eax to 0, and ZF=1
jz label_1 ; ZF is 1, so jump is executed, CPU goes to "label_1"
inc eax ; this instruction is then skipped and not executed
label_1:
; eax being still 0, and ZF being still set ON
; whatever instruction is here, CPU will execute it after the "jz"
A slightly modified example to show the case when the condition is false:
xor eax,eax ; sets eax to 0, and CF=0, ZF=1, ...
jc label_1 ; CF is 0, so "jump carry" is NOT executed
inc eax ; this instruction is executed after "jc"
label_1:
; here eax is 1
; CF is still 0 (not affected by INC)
; but ZF is 0 (affected by INC)
Summary: you should have a pretty good idea of which instructions affect which flags, and in what way. When unsure, keep the CMP + Jcc pair together (so as not to accidentally clobber the flag results from cmp). Jcc stands for any "conditional jump" instruction. When the condition is met, the jump to the provided label is executed. Otherwise the Jcc instruction is ignored, and execution continues with the instruction right after it.
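As a hedged illustration of why the pairing matters (my example, not taken from the code above), an instruction placed between CMP and Jcc can silently overwrite the flags:
cmp eax, 4711
add eax, 2 ; ADD rewrites ZF, SF, CF, OF...
jne unequal ; ...so this now tests the result of ADD, not of CMP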
BTW, I personally would write that C code:
if (a == 4711) { a = a + 2; } else
{ a = a - 2; }
as:
cmp [a],DWORD 4711
mov eax,2
je a_is_4711
neg eax ; -2 for non 4711 value
a_is_4711:
add [a],eax

What's the point of LEA EAX, [EAX]?

LEA EAX, [EAX]
I encountered this instruction in a binary compiled with the Microsoft C compiler. It clearly can't change the value of EAX. Then why is it there?
It is a NOP.
The following are typically used as NOPs. They all do the same thing, but they result in machine code of different lengths. Depending on the alignment requirement, one of them is chosen:
xchg eax, eax = 90
mov eax, eax = 89 C0
lea eax, [eax + 0x00] = 8D 40 00
From this article:
This trick is used by the MSVC++ compiler to emit NOP instructions of different lengths (for padding before jump targets). For example, MSVC++ generates the following code if it needs 4-byte and 6-byte padding:
8d6424 00 lea esp,[esp+00] ; 4-byte padding
8d9b 00000000 lea ebx,[ebx+00000000] ; 6-byte padding
The first line is marked as "npad 4" in assembly listings generated by the compiler, and the second as "npad 6". The registers (ebx, esp) can be chosen from the rarely used ones to avoid false dependencies in the code.
So this is just a kind of NOP, appearing right before targets of jmp instructions in order to align them.
Interestingly, you can identify the compiler from the characteristic nature of such instructions.
LEA EAX, [EAX]
It indeed doesn't change the value of EAX. As far as I understand, it's functionally identical to:
MOV EAX, EAX
Did you see it in optimized code, or unoptimized code?
