So I am trying to translate the following assignment from C to inline assembly
resp = (0x1F)&(letter >> (3 - numB));
Assume that the variables are declared as follows:
unsigned char resp;
unsigned char letter;
int numB;
So I have tried the following:
_asm {
mov ebx, 01fh
movzx edx, letter
mov cl,3
sub cl, numB // Line 5
shr edx, cl
and ebx, edx
mov resp, ebx
}
or the following
_asm {
mov ebx, 01fh
movzx edx, letter
mov ecx,3
sub ecx, numB
mov cl, ecx // Line 5
shr edx, cl
and ebx, edx
mov resp, ebx
}
In both cases I get an operand size error on Line 5.
How can I achieve the right shift?
The E*X registers are 32 bits, while the *L registers are 8 bits. Similarly, on Windows, the int type is 32 bits wide, while the char type is 8 bits wide. You cannot arbitrarily mix these sizes within a single instruction.
So, in your first piece of code:
sub cl, numB // Line 5
this is wrong because the cl register stores an 8-bit value, whereas the numB variable is of type int, which stores a 32-bit value. You cannot subtract a 32-bit value from an 8-bit value; both operands to the SUB instruction must be the same size.
Similarly, in your second piece of code:
mov cl, ecx // Line 5
you are trying to move the 32-bit value in ECX into the 8-bit CL register. That can't happen without some kind of truncation, so you have to indicate it explicitly. The MOV instruction requires that both of its operands have the same size.
(MOVZX and MOVSX are obvious exceptions to this rule that the operand types must match for a single instruction. These instructions zero-extend or sign-extend, respectively, a smaller value so that it can be stored into a larger-sized register.)
However, in this case, you don't even need the MOV instruction. Remember that CL is just the lower 8 bits of the full 32-bit ECX register. Therefore, setting ECX also implicitly sets CL. If you only need the lower 8 bits, you can just use CL in a subsequent instruction. Thus, your code becomes:
mov ebx, 01fh ; move constant into 32-bit EBX
movzx edx, BYTE PTR letter ; zero-extended move of 8-bit variable into 32-bit EDX
mov ecx, 3 ; move constant into ECX
sub ecx, DWORD PTR numB ; subtract 32-bit variable from ECX
shr edx, cl ; shift EDX right by the lower 8 bits of ECX
and ebx, edx ; bitwise AND of EDX and EBX, leaving result in EBX
mov BYTE PTR resp, bl ; move lower 8 bits of EBX into 8-bit variable
For the same operand-size matching issue discussed above, I've also had to change the final MOV instruction. You cannot move the value stored in a 32-bit register directly into an 8-bit variable. You have to move a single byte of it: BL holds the lower 8 bits of EBX (bits 0 to 7) and BH holds the next 8 bits (bits 8 to 15). Both are 8 bits wide and therefore match the size of resp. In the above code, I assumed that you want only the lower 8 bits, so I've used BL.
Also note that I've used the BYTE PTR and DWORD PTR specifications. These are not strictly necessary in MASM (or Visual Studio's inline assembler), since it can deduce the sizes of the types from the types of the variables. However, I think it increases readability, and is generally a recommended practice. DWORD means 32 bit; it is the same size as int and a 32-bit register (E*X). WORD means 16 bit; it is the same size as short and a 16-bit register (*X). BYTE means 8 bits; it is the same size as char and an 8-bit register (*L or *H).
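As a cross-check for the assembly above, the same computation can be sketched in C. This is only a sketch: the function name shift_and_mask is made up for illustration, and it assumes 0 <= numB <= 3 so that the shift count is never negative (a negative count is undefined in C, whereas SHR simply uses the low 5 bits of CL).

```c
#include <stdint.h>

/* Hypothetical helper mirroring resp = 0x1F & (letter >> (3 - numB)).
   Assumption: 0 <= numB <= 3, so the shift count stays in range. */
unsigned char shift_and_mask(unsigned char letter, int numB)
{
    uint32_t edx = letter;                 /* movzx edx, BYTE PTR letter */
    uint32_t ecx = (uint32_t)(3 - numB);   /* mov ecx, 3 / sub ecx, numB */
    edx >>= (ecx & 31);                    /* shr edx, cl                */
    return (unsigned char)(edx & 0x1F);    /* and ebx, edx / mov resp, bl */
}
```

For example, shift_and_mask(0xA5, 3) shifts by 0 and keeps the low five bits, giving 0x05.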
I am trying to read a binary file's content as signed ints, but fread keeps interpreting them as unsigned.
I am reading the file using the fopen function from within x86 assembly code:
mov rdi, file ;;File is entered as input before
mov rsi, mode ;;mode db "rb",0
sub rsp, 8
call fopen
add rsp, 8
mov [handle], rax ;I save the file handler in a variable
Said file, contains numbers stored in binary, one next to the other. If you read the file you see: ..2
Using a hex editor, we see that those numbers are:
Hex    Signed 8-bit value (base 10)
CA     -54
90     -112
32     50
Later I want to read byte by byte each value stored:
mov rdi, number
mov sil, 1 ;;Each number is 1 byte in size
mov rdx, 1
mov rcx, [handle]
sub rsp, 8
call fread
add rsp, 8
cmp rax, 0
jle EOF
That part is successful and I manage to save CA in the number variable. However, the debugger tells me that the number variable's value is: 202
Which is CA's UNsigned value. So I tried using sscanf to turn it into its signed counterpart:
mov rdi, number
mov rsi, signed ;; signed db "%hhi",0 (As I understand it, this represents the 8 bit signed format)
mov rdx, numberInt
sub rsp, 8
call sscanf
add rsp, 8
Yet, this does not work, seen in how the RAX register returns a 0 value and the number still evaluates to 202.
Example in C
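#include <stdio.h>

int main(void) {
    char ch;   /* plain char is signed on typical x86-64 compilers */
    int i;
    fread(&ch, 1, 1, stdin);
    i = ch;    /* converting to int sign-extends the byte */
    printf("%d\n", i);
    return 0;
}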
and generated code:
sub rsp, 24
mov rcx, QWORD PTR stdin[rip]
mov edx, 1
mov esi, 1
lea rdi, [rsp+15]
call fread
movsx esi, BYTE PTR [rsp+15] ; <-- sign-extends the byte that fread stored into a 32-bit int
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call printf
xor eax, eax
add rsp, 24
ret
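The point of the movsx above is that fread only copies raw bytes; whether a byte is signed depends on how it is widened afterwards. A minimal sketch of that reinterpretation in C follows. The helper name to_signed8 is hypothetical, and the arithmetic form is used here so the result does not depend on implementation-defined narrowing conversions.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a raw byte as a two's-complement
   signed 8-bit value, which is what movsx does in hardware. */
int to_signed8(uint8_t raw)
{
    /* Bytes 0x80..0xFF represent raw - 256; the rest are unchanged. */
    return raw >= 128 ? (int)raw - 256 : (int)raw;
}
```

With the bytes from the question, to_signed8(0xCA) gives -54 while the raw byte itself is 202.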
int main(){
unsigned long a = 5;
int b = -6;
long c = a + b;
return 0;
}
I wanted to follow the rules explained in this link and confirm my understanding
of how the compiler emits code for a + b:
https://en.cppreference.com/w/c/language/conversion
1- b is first converted to an unsigned long:
If the unsigned type has conversion rank greater than or equal to the rank of the signed type, then the operand with the signed type is implicitly converted to the unsigned type.
So the compiler essentially does this:
unsigned long implicit_conversion_of_b = (unsigned long) b;
2- The above implicit conversion itself is covered with this rule under Integer conversions:
if the target type is unsigned, the value 2^b, where b is the number of bits in the target type, is repeatedly subtracted or added to the source value until the result fits in the target type.
3- We finally end up with these 64-bit values in a register before addition takes place:
a = 0x5
b = 0xfffffffffffffffa
Is the above a correct mapping to the rules?
Edit:
4- The final result is an unsigned long, which needs to be converted to the long needed by c using this rule:
otherwise, if the target type is signed, the behavior is implementation-defined (which may include raising a signal)
Is the above a correct mapping to the rules?
Yes.
I'm certainly no assembly expert, but it's interesting to see what's happening here. I'm not sure which optimizations you have in mind; the listing below was compiled with -O0, and -O2 or -O3 output obviously looks a lot different:
main:
// setup stack
push rbp
mov rbp, rsp
// move 5 into 8-byte offset from the stack frame.
// rbp is the stack frame pointer, offset of 8 shows long is 8
// bytes on this architecture
mov QWORD PTR [rbp-8], 5
// move -6 to 12 bytes offset from rbp (or 4 byte offset from
// last value). This tells you int is 4 bytes on this architecture.
mov DWORD PTR [rbp-12], -6
// move the int into eax register. This is a 32-bit general
// purpose register
mov eax, DWORD PTR [rbp-12]
// movsx is a "move with sign extension" instruction. rdx is a
// 64-bit register, so this is your conversion from 32 to 64
// bits, preserving the sign
movsx rdx, eax
// moves 5 to the 64-bit rax register
mov rax, QWORD PTR [rbp-8]
// performs the 64-bit add
add rax, rdx
// store the sum in c at [rbp-24]; nothing reads it afterwards,
// so set the return value to 0, clean up and return from the function
mov QWORD PTR [rbp-24], rax
mov eax, 0
pop rbp
ret
This assembly was generated with gcc 11.2 on x86-64.
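The conversion steps discussed above can also be checked directly in C. The sketch below uses made-up function names (add_mixed, sum_wraps_to_max) and relies only on the rule that converting a negative int to unsigned long adds ULONG_MAX + 1, so it holds whatever the width of unsigned long is.

```c
#include <limits.h>

/* Hypothetical helper: b is implicitly converted to unsigned long
   before the addition, exactly as in a + b from the question. */
unsigned long add_mixed(unsigned long a, int b)
{
    return a + b;
}

/* Returns 1 when 5 + (unsigned long)-6 wraps around to ULONG_MAX. */
int sum_wraps_to_max(void)
{
    return add_mixed(5UL, -6) == ULONG_MAX;
}
```

The unsigned sum is ULONG_MAX (all bits set). Converting that to long is the implementation-defined step from point 4; on typical two's-complement compilers it yields -1.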
; test.asm
segment .bss
extern _a, _b, _x, _y
segment .text
global _compute
_compute:
mov ax,[_a]
mov dx,[_x]
imul dx
mov dword[_y],eax
mov ebx,[_b]
add dword[_y],ebx
ret
There is no problem when calculating positive numbers, but when calculating negative numbers, the results look strange, such as "65544" and "65542." How shall I do it?
The 16-bit form of one-operand imul leaves the product in dx:ax, not in eax, even in 32-bit mode. It matches the behavior from 16-bit 8086 which had no 32-bit registers.
So you could do
mov ax, [_a]
mov dx, [_x]
imul dx
mov [_y], ax
mov [_y+2], dx
But in 32-bit mode it may be nicer to sign-extend your 16 bit operands to 32 bits, and then do a non-widening 32-bit multiply (two-operand imul). 16-bit instructions are awkward because they need operand-size prefixes, and because they leave the high 16 bits of the 32 bit registers intact, potentially leading to performance problems because of the dependency. So another choice is something like
movsx eax, word [_a]
movsx edx, word [_x]
imul eax, edx
mov [_y], eax
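To see what the two halves of the product look like, here is a small C sketch. It assumes, based on the operand sizes in the question's code, that _a and _x are 16-bit and _b and _y are 32-bit; the type halves16 and the functions imul16 and compute are names invented for this illustration.

```c
#include <stdint.h>

/* Hypothetical pair standing in for the dx:ax register pair. */
typedef struct {
    uint16_t dx;   /* high 16 bits of the product */
    uint16_t ax;   /* low 16 bits of the product  */
} halves16;

/* Models the 16-bit one-operand imul: 16 x 16 -> 32 bits in dx:ax. */
halves16 imul16(int16_t a, int16_t x)
{
    int32_t product = (int32_t)a * (int32_t)x;
    uint32_t bits = (uint32_t)product;      /* well-defined wraparound */
    halves16 h;
    h.ax = (uint16_t)(bits & 0xFFFFu);
    h.dx = (uint16_t)(bits >> 16);
    return h;
}

/* Models the movsx + two-operand imul version: y = a * x + b.
   Assumption: the final sum fits in 32 bits. */
int32_t compute(int16_t a, int16_t x, int32_t b)
{
    return (int32_t)a * (int32_t)x + b;
}
```

For -2 * 4 the halves are dx = 0xFFFF and ax = 0xFFF8. Storing only ax, or reading eax whose upper half was never written, loses the sign, which is why negative products came out wrong.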
I don't understand what the problem is: the result is right, but something in my code is wrong and I can't find it.
1. This is the x86 code I have to convert to C:
%include "io.inc"
SECTION .data
mask DD 0xffff, 0xff00ff, 0xf0f0f0f, 0x33333333, 0x55555555
SECTION .text
GLOBAL CMAIN
CMAIN:
GET_UDEC 4, EAX
MOV EBX, mask
ADD EBX, 16
MOV ECX, 1
.L:
MOV ESI, DWORD [EBX]
MOV EDI, ESI
NOT EDI
MOV EDX, EAX
AND EAX, ESI
AND EDX, EDI
SHL EAX, CL
SHR EDX, CL
OR EAX, EDX
SHL ECX, 1
SUB EBX, 4
CMP EBX, mask - 4
JNE .L
PRINT_UDEC 4, EAX
NEWLINE
XOR EAX, EAX
RET
2. My converted C code. When I input 0 it outputs the right answer, but something in my code is wrong and I don't understand what it is:
#include "stdio.h"
int main(void)
{
int mask [5] = {0xffff, 0xff00ff, 0xf0f0f0f, 0x33333333, 0x55555555};
int eax;
int esi;
int ebx;
int edi;
int edx;
char cl = 0;
scanf("%d",&eax);
ebx = mask[4];
ebx = ebx + 16;
int ecx = 1;
L:
esi = ebx;
edi = esi;
edi = !edi;
edx = eax;
eax = eax && esi;
edx = edx && edi;
eax = eax << cl;
edx = edx >> cl ;
eax = eax || edx;
ecx = ecx << 1;
ebx = ebx - 4;
if(ebx == mask[1]) //mask - 4
{
goto L;
}
printf("%d",eax);
return 0;
}
Assembly AND is C bitwise &, not logical &&. (Same for OR). So you want eax &= esi.
(Using &= "compound assignment" makes the C even look like x86-style 2-operand asm so I'd recommend that.)
NOT is also bitwise flip-all-the-bits, not booleanize to 0/1. In C that's edi = ~edi;
Read the manual for x86 instructions like https://www.felixcloutier.com/x86/not, and for C operators like ~ and ! to check that they are / aren't what you want. https://en.cppreference.com/w/c/language/expressions https://en.cppreference.com/w/c/language/operator_arithmetic
You should be single-stepping your C and your asm in a debugger so you notice the first divergence, and know which instruction / C statement to fix. Don't just run the whole thing and look at one number for the result! Debuggers are massively useful for asm; don't waste your time without one.
CL is the low byte of ECX, not a separate C variable. You could use a union between uint32_t and uint8_t in C, or just use eax <<= ecx&31; since you don't have anything that writes CL separately from ECX. (x86 shifts mask their count; that C statement could compile to shl eax, cl. https://www.felixcloutier.com/x86/sal:sar:shl:shr). The low 5 bits of ECX are also the low 5 bits of CL.
SHR is a logical right shift, not arithmetic, so you need to be using unsigned not int at least for the >>. But really just use it for everything.
You're handling EBX completely wrong; it's a pointer.
MOV EBX, mask
ADD EBX, 16
This is like unsigned int *ebx = mask+4;
The size of a dword is 4 bytes, but C pointer math scales by the type size, so +1 is a whole element, not 1 byte. So 16 bytes is 4 dwords = 4 unsigned int elements.
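The scaling can be checked with a short C sketch. The helper byte_offset is hypothetical, and uint32_t is used so that each element is exactly 4 bytes like a dword.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t mask[5] = {0xffff, 0xff00ff, 0xf0f0f0f, 0x33333333, 0x55555555};

/* Hypothetical helper: distance in bytes from mask to mask + n.
   Assumption: n <= 5, so the pointer stays within (or one past) the array. */
ptrdiff_t byte_offset(size_t n)
{
    uint32_t *p = mask + n;             /* C scales n by sizeof(uint32_t) */
    return (char *)p - (char *)mask;    /* measure the same step in bytes */
}
```

byte_offset(4) is 16, matching ADD EBX, 16 in the asm.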
MOV ESI, DWORD [EBX]
That's a load using EBX as an address. This should be easy to see if you single-step the asm in a debugger: It's not just copying the value.
CMP EBX, mask - 4
JNE .L
This is NASM syntax; it's comparing against the address of the dword before the start of the array. It's effectively the bottom of a fairly normal do{}while loop. (Why are loops always compiled into "do...while" style (tail jump)?)
do { // .L
...
} while(ebx != &mask[-1]); // cmp/jne
It's looping backwards from the end of the mask array, stopping when the pointer goes past the start (to one element before the first).
Equivalently, the compare could be ebx != mask - 1. I wrote it with unary & (address-of) cancelling out the [] to make it clear that it's the address of what would be one element before the array.
Note that it's jumping on not equal; you had your if()goto backwards, jumping only on equality. This is a loop.
unsigned mask[] should be static because it's in section .data, not on the stack. And not const, because again it's in .data, not .rodata (Linux) or .rdata (Windows).
This one doesn't affect the logic, only that detail of decompiling.
There may be other bugs; I didn't try to check everything.
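Putting the points above together, one possible C translation of the loop is sketched below. It is not the only way to write it: the function name run_asm_loop is made up, the I/O macros are left out, and an array index replaces the raw pointer so that the C code never has to form the address one element before the array.

```c
#include <stdint.h>

static uint32_t mask[5] = {0xffff, 0xff00ff, 0xf0f0f0f, 0x33333333, 0x55555555};

/* Hypothetical translation of the .L loop; eax is the value read
   by GET_UDEC and the return value is what PRINT_UDEC would print. */
uint32_t run_asm_loop(uint32_t eax)
{
    int i = 4;                      /* MOV EBX, mask / ADD EBX, 16 */
    uint32_t ecx = 1;               /* MOV ECX, 1                  */
    do {
        uint32_t esi = mask[i];     /* MOV ESI, DWORD [EBX]        */
        uint32_t edi = ~esi;        /* MOV EDI, ESI / NOT EDI      */
        uint32_t edx = eax;         /* MOV EDX, EAX                */
        eax &= esi;                 /* AND EAX, ESI                */
        edx &= edi;                 /* AND EDX, EDI                */
        eax <<= (ecx & 31);         /* SHL EAX, CL                 */
        edx >>= (ecx & 31);         /* SHR EDX, CL (logical)       */
        eax |= edx;                 /* OR EAX, EDX                 */
        ecx <<= 1;                  /* SHL ECX, 1                  */
        i--;                        /* SUB EBX, 4                  */
    } while (i != -1);              /* CMP EBX, mask - 4 / JNE .L  */
    return eax;
}
```

Tracing it by hand, an input of 1 comes out as 0x80000000 and 0x80000000 comes out as 1, so the loop swaps progressively larger bit groups until the whole 32-bit value is reversed.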
if(ebx != mask[1]) //mask - 4
{
goto L;
}
//JNE IMPLIES a !=
I am trying to write a calculator in C based on the Linux command dc. The structure of the program is not so important; all you need to know is that I get two numbers and want to divide them when the user types /. I therefore send these two numbers to an assembly function that performs the division (see code below), but this works for positive numbers only.
When typing 999 3 / it returns 333, which is correct, but when typing -999 3 / I get the strange number 1431655432, and when typing two negative numbers like -999 -3 / I get 0 every time, for any two negative numbers.
The code in assembly is:
section .text
global _div
_div:
push rbp ; Save caller state
mov rbp, rsp
mov rax, rdi ; Copy function args to registers: leftmost...
mov rbx, rsi ; Next argument...
cqo
idiv rbx ; divide 2 arguments
mov [rbp-8], rax
pop rbp ; Restore caller state
Your comments say you are passing integers to _div. If you are using int, those are 32-bit values:
extern int _div (int a, int b);
When passed to the function, a will be in the bottom 32 bits of RDI and b will be in the bottom 32 bits of RSI. The upper 32 bits of the argument registers can be garbage; they are often zero, but that doesn't have to be the case.
If you use a 64-bit register as a divisor with IDIV then the division is RDX:RAX / 64-bit divisor (in your case RBX). The problem here is that you are using the full 64-bit registers to do 32-bit division. If we assume for argument's sake that the upper bits of RDI and RSI were originally 0, then for -999 / 3 RDI would be 0x00000000FFFFFC19 (copied to RAX) and RSI would be 0x0000000000000003 (copied to RBX). CQO sign-extends RAX into RDX. The uppermost bit of RAX is zero, so RDX would be zero. The division would look like:
0x000000000000000000000000FFFFFC19 / 0x0000000000000003 = 0x55555408
0x55555408 happens to be 1431655432 (decimal), which is the result you were seeing. One fix for this is to use 32-bit registers for the division. To sign-extend EAX (the lower 32 bits of RAX) into EDX you can use CDQ instead of CQO. You can then divide EDX:EAX by EBX. This should get you the 32-bit signed division you are looking for. The code would look like:
cdq
idiv ebx ; divide 2 arguments EDX:EAX by EBX
Be aware that RBX, RBP, and R12 to R15 all need to be preserved by your function if you modify them (they are non-volatile, callee-saved registers in the System V AMD64 ABI). If you modify RBX you need to make sure you save and restore it like you do with RBP. A better alternative is to use one of the volatile registers like RCX instead of RBX.
You don't need the intermediate register to place the divisor into. You could have used RSI (or ESI in the fixed version) directly instead of moving it to a register like RBX.
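The arithmetic behind the strange results can be reproduced in C. This is a sketch under the same assumption as above (the upper halves of RDI and RSI happen to be zero); the function names div_zero_extended and div_32bit are invented here, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Models the buggy code: the 32-bit arguments end up zero-extended
   in 64-bit registers and are divided as 64-bit values. */
int64_t div_zero_extended(int32_t a, int32_t b)
{
    int64_t rax = (int64_t)(uint32_t)a;   /* upper 32 bits are zero */
    int64_t rbx = (int64_t)(uint32_t)b;
    return rax / rbx;                     /* cqo / idiv rbx */
}

/* Models the fix: a plain 32-bit signed division (cdq / idiv ebx).
   Assumption: not INT32_MIN / -1, which overflows. */
int32_t div_32bit(int32_t a, int32_t b)
{
    return a / b;
}
```

div_zero_extended(-999, 3) gives 1431655432 and div_zero_extended(-999, -3) gives 0, matching the symptoms in the question, while div_32bit(-999, 3) gives -333.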
Your issue has to do with how arguments are passed to _div.
Assuming your _div's prototype is:
int64_t _div(int32_t, int32_t);
Then the arguments are passed in edi and esi (i.e., as 32-bit signed integers), and the upper halves of the registers rdi and rsi are undefined.
Sign extension is needed when assigning edi and esi to rax and rbx for performing a 64-bit signed division (for performing a 64-bit unsigned division zero extension would be needed instead).
That is, instead of:
mov rax, rdi
mov rbx, rsi
use the instruction movsxd (the form of movsx that sign-extends a 32-bit source to 64 bits; NASM requires this mnemonic for it) on edi and esi:
movsxd rax, edi
movsxd rbx, esi
Using true 64-bit operands for the 64-bit division
The previous approach consists of performing a 64-bit division on "fake" 64-bit operands (i.e., sign-extended 32-bit operands). Mixing 64-bit instructions with "32-bit operands" is usually not a very good idea because it may result in worse performance and larger code size.
A better approach would be to simply change the C prototype of your _div function to accept actual 64-bit arguments, i.e.:
int64_t _div(int64_t, int64_t);
This way, the arguments will be passed in rdi and rsi (i.e., already as 64-bit signed integers) and a 64-bit division will be performed on true 64-bit integers.
Using a 32-bit division instead
You may also want to consider using the 32-bit idiv if it suits your needs, since it performs faster than a 64-bit division and the resulting code size is smaller (no REX prefix):
...
mov eax, edi
mov ebx, esi
cdq
idiv ebx
...
_div's prototype would be:
int32_t _div(int32_t, int32_t);