This program must search for all occurrences of string 2 in string 1.
It works fine with all the strings i have tried except with
s1="Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia"
s2="Cia"
in this case the correct result would be: 0 5 31 39 54
instead, it prints 0 5 39.
I don't understand why, the operation seems the same as
s1="Sette scettici sceicchi sciocchi con la sciatica a Shanghai"
s2="icchi"
with which the program works correctly.
I can't find the error!
The code:
#include <stdio.h>
void main()
{
#define MAX_LEN 100
// Input
char s1[] = "Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia";
unsigned int lengthS1 = sizeof(s1) - 1;
char s2[] = "Cia";
unsigned int lengthS2 = sizeof(s2) - 1;
// Output
unsigned int positions[MAX_LEN];
unsigned int positionsLen;
// Blocco assembler
__asm
{
MOV ECX, 0
MOV EAX, 0
DEC lenghtS1
DEC lengthS2
MOV EBX, lengthS1
CMP EBX, 0
JZ fine
MOV positionsLen, 0
XOR EBX, EBX
XOR EDX, EDX
uno: CMP ECX, lengthS1
JG fine
CMP EAX, lengthS2
JNG restart
XOR EAX, EAX
restart : MOV BH, s1[ECX]
CMP BH, s2[EAX]
JE due
JNE tre
due : XOR EBX, EBX
CMP EAX, 0
JNE duedue
MOV positions[EDX * 4], ECX
INC ECX
INC EAX
JMP uno
duedue : CMP EAX, lengthS2
JNE duetre
INC ECX
INC EDX
INC positionsLen
XOR EAX, EAX
JMP uno
duetre : INC EAX
INC ECX
JMP uno
tre : XOR EBX, EBX
XOR EAX, EAX
INC ECX
JMP uno
fine:
}
// Stampa su video
{
unsigned int i;
for (i = 0; i < positionsLen; i++)
printf("Sottostringa in posizione=%d\n", positions[i]);
}
}
please,help.
The trickier programming gets, the more systematic and thoughtful your approach should be. If you programmed x86 assembly for a decade, you will be able to skip a few of the steps I line out below. But especially if you are a beginner, you are well advised to not expect from yourself, that you can just hack in assembly with confidence and without safety nets.
The code below is just a best guess (I did not compile or run or debug the C-code). It is there, to give the idea.
Make a plan for your implementation
So you will have 2 nested loops, comparing the characters and then collecting matches.
Implement the "assembly" in low level C, which already resembles the end product.
C is nearly an assembly language itself...
Write yourself tests, debug and analyze your "pseudo assembly" C-version.
Translate the C lines step by step by assembly lines, "promoting" the c-lines to comments.
This is my first shot at doing that - the initial c-version, which might or might not work. But it is still faster and easier to write (with the assembly code in mind). And easier to debug and step through. Once this works, it is time to "translate".
#include <stdint.h>
#include <stddef.h>
#include <string.h>
size_t substring_positions(const char *s, const char* sub_string, size_t* positions, size_t positions_capacity) {
size_t positions_index = 0;
size_t i = 0;
size_t j = 0;
size_t i_max = strlen(s) - strlen(sub_string);
size_t j_max = strlen(sub_string) - 1;
loop0:
if (i > i_max)
goto end;
j = 0;
loop1:
if (j == j_max)
goto match;
if (s[i+j] == sub_string[j])
goto go_on;
i++;
goto loop0;
go_on:
j++;
goto loop1;
match:
positions[positions_index] = i;
positions_index++;
if (positions_index < positions_capacity)
goto loop0;
goto end;
end:
return positions_index;
}
As you can see, I did not use "higher level language features" for this function (does C even have such things?! :)). And now, you can start to "assemble". If RAX is supposed to hold your i variable, you could replace size_t i = 0; with XOR RAX,RAX. And so on.
With that approach, other people even have a chance to read the assembly code and with the comments (the former c-code), you state the intent of your instructions.
Related
I don't understand what is the problem because the result is right, but there is something wrong in it and i don't get it.
1.This is the x86 code I have to convert to C:
%include "io.inc"
SECTION .data
mask DD 0xffff, 0xff00ff, 0xf0f0f0f, 0x33333333, 0x55555555
SECTION .text
GLOBAL CMAIN
CMAIN:
GET_UDEC 4, EAX
MOV EBX, mask
ADD EBX, 16
MOV ECX, 1
.L:
MOV ESI, DWORD [EBX]
MOV EDI, ESI
NOT EDI
MOV EDX, EAX
AND EAX, ESI
AND EDX, EDI
SHL EAX, CL
SHR EDX, CL
OR EAX, EDX
SHL ECX, 1
SUB EBX, 4
CMP EBX, mask - 4
JNE .L
PRINT_UDEC 4, EAX
NEWLINE
XOR EAX, EAX
RET
2.My converted C code, when I input 0 it output me the right answer but there is something false in my code I don't understand what is:
#include "stdio.h"
int main(void)
{
int mask [5] = {0xffff, 0xff00ff, 0xf0f0f0f, 0x33333333, 0x55555555};
int eax;
int esi;
int ebx;
int edi;
int edx;
char cl = 0;
scanf("%d",&eax);
ebx = mask[4];
ebx = ebx + 16;
int ecx = 1;
L:
esi = ebx;
edi = esi;
edi = !edi;
edx = eax;
eax = eax && esi;
edx = edx && edi;
eax = eax << cl;
edx = edx >> cl ;
eax = eax || edx;
ecx = ecx << 1;
ebx = ebx - 4;
if(ebx == mask[1]) //mask - 4
{
goto L;
}
printf("%d",eax);
return 0;
}
Assembly AND is C bitwise &, not logical &&. (Same for OR). So you want eax &= esi.
(Using &= "compound assignment" makes the C even look like x86-style 2-operand asm so I'd recommend that.)
NOT is also bitwise flip-all-the-bits, not booleanize to 0/1. In C that's edi = ~edi;
Read the manual for x86 instructions like https://www.felixcloutier.com/x86/not, and for C operators like ~ and ! to check that they are / aren't what you want. https://en.cppreference.com/w/c/language/expressions https://en.cppreference.com/w/c/language/operator_arithmetic
You should be single-stepping your C and your asm in a debugger so you notice the first divergence, and know which instruction / C statement to fix. Don't just run the whole thing and look at one number for the result! Debuggers are massively useful for asm; don't waste your time without one.
CL is the low byte of ECX, not a separate C variable. You could use a union between uint32_t and uint8_t in C, or just use eax <<= ecx&31; since you don't have anything that writes CL separately from ECX. (x86 shifts mask their count; that C statement could compile to shl eax, cl. https://www.felixcloutier.com/x86/sal:sar:shl:shr). The low 5 bits of ECX are also the low 5 bits of CL.
SHR is a logical right shift, not arithmetic, so you need to be using unsigned not int at least for the >>. But really just use it for everything.
You're handling EBX completely wrong; it's a pointer.
MOV EBX, mask
ADD EBX, 16
This is like unsigned int *ebx = mask+4;
The size of a dword is 4 bytes, but C pointer math scales by the type size, so +1 is a whole element, not 1 byte. So 16 bytes is 4 dwords = 4 unsigned int elements.
MOV ESI, DWORD [EBX]
That's a load using EBX as an address. This should be easy to see if you single-step the asm in a debugger: It's not just copying the value.
CMP EBX, mask - 4
JNE .L
This is NASM syntax; it's comparing against the address of the dword before the start of the array. It's effectively the bottom of a fairly normal do{}while loop. (Why are loops always compiled into "do...while" style (tail jump)?)
do { // .L
...
} while(ebx != &mask[-1]); // cmp/jne
It's looping from the end of the mask array, stopping when the pointer goes past the end.
Equivalently, the compare could be ebx !-= mask - 1. I wrote it with unary & (address-of) cancelling out the [] to make it clear that it's the address of what would be one element before the array.
Note that it's jumping on not equal; you had your if()goto backwards, jumping only on equality. This is a loop.
unsigned mask[] should be static because it's in section .data, not on the stack. And not const, because again it's in .data not .rodata (Linux) or .rdata (Windows))
This one doesn't affect the logic, only that detail of decompiling.
There may be other bugs; I didn't try to check everything.
if(ebx != mask[1]) //mask - 4
{
goto L;
}
//JNE IMPLIES a !=
I'm using this function to convert an 8-bit binary number represented as a boolean array to an integer. Is it efficient? I'm using it in an embedded system. It performs ok but I'm interested in some opinions or suggestions for improvement (or replacement) if there are any.
uint8_t b2i( bool *bs ){
uint8_t ret = 0;
ret = bs[7] ? 1 : 0;
ret += bs[6] ? 2 : 0;
ret += bs[5] ? 4 : 0;
ret += bs[4] ? 8 : 0;
ret += bs[3] ? 16 : 0;
ret += bs[2] ? 32 : 0;
ret += bs[1] ? 64 : 0;
ret += bs[0] ? 128 : 0;
return ret;
}
It is not possible to say without a specific system in mind. Disassemble the code and see what you got. Benchmark your code on a specific system. This is the key to understanding manual optimization.
Generally, there are lots of considerations. The CPU's data word size, instruction set, compiler optimizer performance, branch prediction (if any), data cache (if any) etc etc.
To make the code perform optimally regardless of data word size, you can change uint8_t to uint_fast8_t. That is unless you need exactly 8 bits, then leave it as uint8_t.
Cache use may or may not be more efficient if given an up-counting loop. At any rate, loop unrolling is an old kind of manual optimization that we shouldn't use in modern programming - the compiler is more capable of making that call than the programmer.
The worst problem with the code is the numerous branches. These might cause a bottleneck.
Your code results in the following x86 machine code gcc -O2:
b2i:
cmp BYTE PTR [rdi+6], 0
movzx eax, BYTE PTR [rdi+7]
je .L2
add eax, 2
.L2:
cmp BYTE PTR [rdi+5], 0
je .L3
add eax, 4
.L3:
cmp BYTE PTR [rdi+4], 0
je .L4
add eax, 8
.L4:
cmp BYTE PTR [rdi+3], 0
je .L5
add eax, 16
.L5:
cmp BYTE PTR [rdi+2], 0
je .L6
add eax, 32
.L6:
cmp BYTE PTR [rdi+1], 0
je .L7
add eax, 64
.L7:
lea edx, [rax-128]
cmp BYTE PTR [rdi], 0
cmovne eax, edx
ret
Whole lot of potentially inefficient branching. We can make the code both faster and more readable by using a loop:
uint8_t b2i (const bool bs[8])
{
uint8_t result = 0;
for(size_t i=0; i<8; i++)
{
result |= bs[8-1-i] << i;
}
return result;
}
(ideally the bool array should be arranged from LSB first but that would change the meaning of the code compared to the original)
Which gives this machine code instead:
b2i:
lea rsi, [rdi-8]
mov rax, rdi
xor r8d, r8d
.L2:
movzx edx, BYTE PTR [rax+7]
mov ecx, edi
sub ecx, eax
sub rax, 1
sal edx, cl
or r8d, edx
cmp rax, rsi
jne .L2
mov eax, r8d
ret
More instructions but less branching. It will likely perform better than your code on x86 and other high end CPUs with branch prediction and instruction cache. But worse than your code on a 8 bit microcontroller where only the total number of instructions count.
You can also do this with a loop and bit shifts to reduce code repetition:
int b2i(bool *bs) {
int ret = 0;
for (int i = 0; i < 8; i++) {
ret = ret << 1;
ret += bs[i];
}
return ret;
}
I have been tasked with doing some assembly work. All was going good, until I had to convert a program from using int to floats. I'm probably missing something simple in my attempts, but does anyone have a suggestion? I'll provide the int version which works.
#include <stdio.h>
int n;
int i;
int arr[50];
int output;
int main(void)
{
scanf("%d", &n);
for (i = 0; i < n; i++)
{
scanf("%d", &arr[i]);
}
__asm
{
jmp start
switching:
mov eax, ebx
jmp looping
looping:
mov ebx, arr[ecx*4]
inc ecx
cmp ebx, eax
jg switching
cmp ecx, n
jl looping
ret
start:
mov ecx, 0
mov eax, 0
call looping
mov output, eax
}
printf("%d", output);
scanf("%d", &n);
}
You will need to rewrite most of your solution. If you are new to floats and want to use x87, here is a good guide to read.
To test floats you will need something like this conditional jump:
fld <float to compare>
fcom <maximum value>
fnstsw ax
test ah,$1
jnz <notbigger>
I have been working on a program that will do a bubble sort for n integers. I have hit a wall, as I do not know to refresh the array once my assembler operation are done. Any suggestions would be great.
#include <stdio.h>
#include <stdlib.h>
int n;
int *input;
int output;
int i;
int main(void)
{
scanf("%d", &n);
input = (int *)malloc(sizeof(n));
for (i = 0; i < n; i++)
{
scanf("%d", &input[i]);
}
__asm
{
mov ebx, input
mov esi, n
outer_loop:
dec esi
jz end_outer
mov edi, n
inner_loop:
dec edi
jz outer_loop
compare:
mov al, [ebx + edi - 1]
mov dl, [ebx + edi]
cmp al, dl
jnl inner_loop
swap:
mov [ebx + edi], al
mov [ ebx + edi - 1], dl
jmp inner_loop
end_outer:
}
for (i = 0; i < n; i++)
{
printf("%d\n", input[i]);
}
scanf("%d", &output);
}
There's nothing to "refresh". Your code runs. ebx contains input and that's that. (Hint: Your C code also gets transformed into assembly. Looking at what your compiler generates through a disassembler might give you some insight.)
That said I see some problems:
input = (int *)malloc(sizeof(n));
This allocation is not big enough and your program will crash. You want to allocate sizeof(int) * n. You should also check the allocation for errors.
mov al, [ebx + edi - 1]
mov dl, [ebx + edi]
cmp al, dl
Kind of verbose. You should be able to do register-to-memory comparisons. (eg. cmp al, byte [ebx + edi])
Not to mention it's a complete waste of time to implement bubble sort in assembly. Rephrase: Learning assembly is great, but it would be a bad idea to use this in anything that matters. One of the most important things about knowing assembly is knowing when you don't need to use it. You'd probably find very often that what your compiler generates is good enough. Let's also not forget that a good algorithm in C will beat a bad algorithm in assembly, such as bubble sort.
#Giorgio also raises a good point in the comments. Your assembly is comparing and sorting bytes. You want to be doing things like this:
mov eax, [ebx + edi - 4] ; assumes edi is a byte offset, see next comment
mov edx, [ebx + edi]
And instead of dec edi etc., you want to do:
sub edi, 4
Your swap would also have to be re-done to use 32-bit quantities.
This is of course assuming int is 32 bits, which may not be the case. If you're using (non-standard) inline assembly it's probably fair that you're doing this - it means you're already targeting a particular compiler. (Based on the syntax I'd say VC++) Nitpickers might say you should use int32_t instead of int.
Note I'm not sure if this is the only problem, I haven't looked at your code too thoroughly.
I will also give it a try.
#include <stdio.h>
#include <stdlib.h>
int n;
int *input;
int output;
int i;
int s;
int main(void)
{
s = sizeof(int);
scanf("%d", &n);
input = (int *)malloc(sizeof(n));
for (i = 0; i < n; i++)
{
scanf("%d", &input[i]);
}
__asm
{
mov ecx, s
mov ebx, input
mov esi, n
mul esi, ecx
outer_loop:
sub esi, ecx
jz end_outer
mov edi, esi
inner_loop:
sub edi, ecx
jz outer_loop
compare:
mov edx, [ebx + edi]
sub edi, ecx
mov eax, [ebx + edi]
add edi, ecx
cmp eax, edx
jnl inner_loop
swap:
mov [ebx + edi], eax
sub edi, ecx
mov [ebx + edi], edx
add edi, ecx
jmp inner_loop
end_outer:
}
for (i = 0; i < n; i++)
{
printf("%d\n", input[i]);
}
scanf("%d", &output);
}
I used variable s to hold the size of an integer. To my knowledge it is not allowed to use an indirection like
mov eax, [ebx + edi + ecx]
therefore I had to add separate add and sub. It is not very nice, does anyone see a better solution?
You seem to intend to allocate and input an array of n int values. (Although the memory size in your malloc is incorrect, as has already been noted).
But then you proceed to sort your array as an array of n bytes. Why are you sorting bytes instead of sorting ints?
Even if your sorting algorithm is implemented correctly (as byte-sorting implementation), the end result will look totally meaningless, since you are printing your array as an array of ints in the end.
First make up your mind what is that you are trying to work with: ints or bytes (chars) and then act accordingly and consistently.
I need to translate what is commented within the method, to assembler. I have a roughly idea, but can't.
Anyone can help me please? Is for an Intel x32 architecture:
int
secuencia ( int n, EXPRESION * * o )
{
int a, i;
//--- Translate from here ...
for ( i = 0; i < n; i++ ){
a = evaluarExpresion( *o );
o++;
}
return a ;
//--- ... until here.
}
Translated code must be within __asm as:
__asm {
translated code
}
Thank you,
FINAL UPDATE:
This is the final version, working and commented, thanks to all for your help :)
int
secuencia ( int n, EXPRESION * * o )
{
int a = 0, i;
__asm
{
mov dword ptr [i],0 ; int i = 0
jmp salto1
ciclo1:
mov eax,dword ptr [i]
add eax,1 ; increment in 1 the value of i
mov dword ptr [i],eax ; i++
salto1:
mov eax,dword ptr [i]
cmp eax,dword ptr [n] ; Compare i and n
jge final ; If is greater goes to 'final'
mov eax,dword ptr [o]
mov ecx,dword ptr [eax] ; Recover * o (its value)
push ecx ; Make push of * o (At the stack, its value)
call evaluarExpresion ; call evaluarExpresion( * o )
add esp,4 ; Recover memory from the stack (4KB corresponding to the * o pointer)
mov dword ptr [a],eax ; Save the result of evaluarExpresion as the value of a
mov eax,dword ptr [o] ; extract the pointer to o
add eax,4 ; increment the pointer by a factor of 4 (next of the actual pointed by *o)
mov dword ptr [o],eax ; o++
jmp ciclo1 ; repeat
final: ; for's final
mov eax,dword ptr [a] ; return a - it save the return value at the eax registry (by convention this is where the result must be stored)
}
}
Essentially in assembly languages, strictly speaking there isn't a notion of a loop the same way there would be in a higher level language. It's all implemented with jumps (eg. as a "goto"...)
That said, x86 has some instructions with the assumption that you'll be writing "loops", implicitly using the register ECX as a loop counter.
Some examples:
mov ecx, 5 ; ecx = 5
.label:
; Loop body code goes here
; ECX will start out as 5, then 4, then 3, then 1...
loop .label ; if (--ecx) goto .label;
Or:
jecxz .loop_end ; if (!ecx) goto .loop_end;
.loop_start:
; Loop body goes here
loop .loop_start ; if (--ecx) goto .loop_start;
.loop_end:
And, if you don't like this loop instruction thing counting backwards... You can write something like:
xor ecx, ecx ; ecx = 0
.loop_start:
cmp ecx, 5 ; do (ecx-5) discarding result, then set FLAGS
jz .loop_end ; if (ecx-5) was zero (eg. ecx == 5), jump to .loop_end
; Loop body goes here.
inc ecx ; ecx++
jmp .loop_start
.loop_end:
This would be closer to the typical for (int i=0; i<5; ++i) { }
Note that
for (init; cond; advance) {
...
}
is essentially syntactic sugar for
init;
while(cond) {
...
advance;
}
which should be easy enough to translate into assembly language if you've been paying any attention in class.
Use gcc to generate the assembly code
gcc -S -c sample.c
man gcc is your friend
For that you would probably use the loop instruction that decrements the ecx (often called, extended counter) at each loop and goes out when ecx reaches zero.But why use inline asm for it anyway? I'm pretty sure something as simple as that will be optimized correctly by the compiler...
(We say x86 architecture, because it's based on 80x86 computers, but it's an "ok" mistake =p)