I have to come up with an ASM code (for emu8086) that will find the minimum and maximum value in an array of any given size. In the sample code, my instructor provides (what appears to be) a data segment that contains an array named LIST. He claims that he will replace this list with other lists of different sizes, and our code must be able to handle it.
Here's the sample code below. I've highlighted the parts that I've added, just to show you that I've done my best to solve this problem:
; You may customize this and other start-up templates;
; The location of this template is c:\emu8086\inc\0_com_template.txt
org 100h
data segment
LIST DB 05H, 31H, 34H, 30H, 38H, 37H
MINIMUM DB ?
MAXIMUM DB ?
AVARAGE DB ?
**SIZE=$-OFFSET LIST**
ends
stack segment **;**
DW 128 DUP(0) **; I have NO CLUE what this is supposed to do**
ends **;**
code segment
start proc far
; set segment registers:
MOV AX,DATA **;**
MOV DS,AX **;I'm not sure what the point of this is, especially since I'm supposed to be the programmer, not my teacher.**
MOV ES,AX **;**
; add your code here
**;the number of elements in LIST is SIZE (I think)
MOV CX,SIZE ;a loop counter, I think
;find the minimum value in LIST and store it into MINIMUM
;begin loop
AGAIN1:
LEA SI,LIST
LEA DI,MINIMUM
MOV AL,[SI]
CMP AL,[SI+1]
If carry flag=1:{I got no idea}
LOOP AGAIN1
;find the maximum value in LIST and store it into MAXIMUM
;Something similar to the other loop, but this time I gotta find the max.
AGAIN2:
LEA SI,LIST
LEA DI,MINIMUM
MOV AL,[SI]
CMP AL,[SI-1] ;???
LOOP AGAIN2
**
; exit to operating system.
MOV AX,4C00H
INT 21H
start endp
ends
end start ; set entry point and stop the assembler.
ret
I'm not positive, but I think you want to move the SIZE variable immediately after the LIST variable:
data segment
LIST DB 05H, 31H, 34H, 30H, 38H, 37H
SIZE=$-OFFSET LIST
MINIMUM DB ?
MAXIMUM DB ?
AVARAGE DB ?
ends
What it does is give you the number of bytes between the current address ($) and the beginning of the LIST variable - thus giving you the size (in bytes) of the list variable itself. Because the LIST is an array of bytes, SIZE will be the actual length of the array. If LIST was an array of WORDS, you'd have to divide SIZE by two. If your teacher wrote that code then perhaps you should leave it alone.
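For comparison, here is a rough C analogue of the same size calculation. It is only a sketch: the names list_bytes, list_words and element_count are made up for illustration, and sizeof plays the role of $-OFFSET LIST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for LIST, once as bytes (DB) and once as words (DW). */
const uint8_t  list_bytes[] = {0x05, 0x31, 0x34, 0x30, 0x38, 0x37};
const uint16_t list_words[] = {0x05, 0x31, 0x34, 0x30, 0x38, 0x37};

/* Turns a size in bytes into an element count.
   For a byte array this is a no-op; for a word array it divides by two. */
size_t element_count(size_t total_bytes, size_t element_size)
{
    assert(element_size != 0 && total_bytes % element_size == 0);
    return total_bytes / element_size;
}
```

Calling element_count(sizeof list_words, sizeof list_words[0]) divides the byte size by the element size, which is the "divide SIZE by two" step for an array of words.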
I'm not entirely clear on why your teacher made a stack segment, I can't think of any reason to use it, but perhaps it will become clear in a future assignment. For now, you probably should know that DUP is shorthand for duplicate. This line of code:
DW 128 DUP(0)
Is allocating 128 WORDS of memory initialized to 0.
The following lines of code:
MOV AX,DATA
MOV DS,AX
MOV ES,AX
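Are setting up your segment registers so that you can loop through the LIST. A segment register can't be loaded with an immediate value, which is why the address goes through AX first. All you need to know is that, at this point, DS and ES point to the beginning of the data segment and therefore the beginning of your LIST.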
As for the rest... it looks like you have an endless loop. What you need to do is this:
1. Set SI to point to the beginning of the LIST
2. Set CX to be the length of the LIST, you've done that
3. Copy the byte at [SI] to AL (LIST is an array of bytes, so use AL rather than AX)
4. Compare AL to the memory variable MINIMUM (initialize MINIMUM to the first byte of LIST before the loop starts)
5. If AL is smaller, copy it to MINIMUM
6. Increment SI and decrement CX
7. If CX = 0 (check the ZERO flag) exit the loop, otherwise, go back to #3
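The steps above can be sketched in C to check the logic before translating it to assembly. This is only an illustration: find_min_max is a made-up name, the bytes are assumed to be unsigned, and both results start from the first element so nothing is compared against an uninitialized variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scans len unsigned bytes once, tracking the smallest and largest value seen.
   list plays the role of SI, len the role of CX. */
void find_min_max(const uint8_t *list, size_t len,
                  uint8_t *min_out, uint8_t *max_out)
{
    assert(len > 0);               /* an empty list has no minimum or maximum */
    uint8_t min = list[0];         /* start from the first element */
    uint8_t max = list[0];
    for (size_t i = 1; i < len; i++) {
        if (list[i] < min) min = list[i];   /* unsigned "below" comparison */
        if (list[i] > max) max = list[i];   /* unsigned "above" comparison */
    }
    *min_out = min;
    *max_out = max;
}
```

Both values come out of a single pass, so the assembly version can do the same with one loop instead of two.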
Related
I have a function that receives a number from 0 to 10 as an input in R0. Then I need to place the multiplication table from 1 to 10 into an array in the data segment and place the address of the result array in R1.
I have a loop to make the arithmetic operation and have the array setup however I have no idea how to place the values in the array.
My original idea is that each time the loop runs, it calculates one iteration and stores it in the array, and so on.
myArray db 1000 dup (0)
.code
MOV R0,#8 ;user input
MOV R11, #9 ;reference to stop loop when it reaches 10th iteration
loop
ADD R10, R10, #1 ;functions as counter
ADD R1,R0,R1 ;add the input number to the running total and store it in r1
CMP R11,R10 ;subtracts counter from 9
BMI finish ;if negative flag is set it ends the loop
B loop ;if negative flag is zero it continues
finish
end
Any help is much appreciated
Your code is on the right track but it needs some fixing.
To specifically answer your question about load and store, you need to reserve space in memory, make a pointer, and load and store to the location the pointer is pointing to. The pointer can be specified by a register, like R0.
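As a rough illustration of "reserve space, make a pointer, store through the pointer", here is a C sketch. The names fill_times_table and TABLE_ROWS are hypothetical; the pointer p stands in for an address register, and the store through *p corresponds to a store instruction followed by advancing the address.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_ROWS 10

/* Fills table[0..9] with n*1 .. n*10 using repeated addition,
   then returns the address of the table (what the question wants in R1). */
uint32_t *fill_times_table(uint32_t n, uint32_t table[TABLE_ROWS])
{
    assert(table != 0);
    uint32_t *p = table;        /* pointer to the next free slot */
    uint32_t product = 0;       /* running total */
    for (int i = 0; i < TABLE_ROWS; i++) {
        product += n;           /* n, 2n, 3n, ... */
        *p = product;           /* store the value where the pointer points */
        p++;                    /* advance the pointer by one element */
    }
    return table;
}
```

The key point is that the running total and the address live in different variables (registers), so the loop can store one and advance the other on every pass.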
Here is a play list of YT vids that covers all the things you need to make a loop (from memory allocation, to doing load store and looping). At the very least you can watch the code sections, load-store instructions, and looping and branch instructions videos.
Good luck!
I'm using Visual Studio 2012, editing the .asm file inside a windows32 solution.
This is the pseudo code that needs to be changed into assembly:
Declare a 32-bit integer array A[10] in memory
repeat
Prompt for and input user's array length L
until 0 <= L <= 10
for i := 0 to (L-1)
Prompt for and input A[i]
end for
while (First character of prompted input string for searching = 'Y' or 'y')
Prompt for and input value V to be searched
found := FALSE
for i := 0 to (L-1)
if V = A[i] then
found := TRUE
break
end if
end for
if (found) then
Display message that value was found at position i
else
Display message that value was not found
end if
end while
I can manage the inputs, loops, and jumps well enough, but the things tripping me up are making the array have a length that the user inputs, and running through the array to compare the values. What would help me most is if someone would write and explain a segment of code to help me understand the parts I'm not getting. I've tried searching for it online, but nearly everything I've come across uses a different assembler, making it hard to dissect.
My code thus far is:
.586
.MODEL FLAT
INCLUDE io.h ; header file for input/output
.STACK 4096
.DATA
lengthA DWORD ?
promptL BYTE "Please enter a length of the array between 0 and 10: ", 0
string BYTE 40 DUP (?)
A DWORD 0 DUP (?)
.CODE
_MainProc PROC
Reread: input promptL, string, 40 ; read ASCII characters
atod string ; convert to integer
;while (promptL < 0 or > 10)
cmp eax, 0
jl Reread
cmp eax, 10
jg Reread
;end while
mov lengthA, eax ; store in memory
_MainProc ENDP
END ; end of source code
After checking that the user input is within range I just hit a brick wall and am not sure how to set up the array A to have the specified length or even if I declared A correctly.
You can't modify an assembly-time constant at runtime. DUP is an assembly-time directive: it reserves memory by emitting the "init" value into the resulting machine code as many times as you ask for. That produces a fixed executable whose array is limited to the size used during assembly.
As your maximum L is 10, you can reserve array for the maximum possible size:
A DWORD 10 DUP (?)
And then later in the code you will have to fetch the [lengthA] to know how many elements are used (to make it look as dynamically resized array and to not process remaining unused part of that reserve after address A).
Other option is to reserve memory dynamically at runtime, either by calling OS service for allocation of heap memory, or reserving the array on stack. But both options are considerably more advanced than using the fixed-size memory like above.
I don't see any request for dynamic memory allocation in your pseudo code, so the solution with a fixed-size array looks OK to me. If you are already struggling with this, you are probably not ready to program your own dynamic memory manager/allocator.
EDIT: Actually the pseudo code does specify you should reserve fixed size memory, I somehow overlooked it at first read:
Declare a 32-bit integer array A[10] in memory
So here is an example of the input loop from the pseudo code, using [lengthA]:
; for i := 0 to (L-1)
xor edi,edi ; index i
input_loop:
cmp edi,[lengthA]
jge input_loop_ends ; if (i >= L), end for
; TODO: Prompt for and input A[i]
; eax = input value, preserve edi (!)
mov [A + 4*edi],eax ; store value into A[i]
; end for
inc edi ; ++i
jmp input_loop
input_loop_ends:
; memory at address A contains L many DWORD values from user
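The same idea (reserve the maximum, track the used length separately) can be sketched in C, including the search loop from the pseudo code. This is an illustrative sketch only: CAPACITY, set_length, store_at and find_value are made-up names, and the input and prompt parts are left out.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY 10

int32_t A[CAPACITY];      /* like A DWORD 10 DUP (?): space for the maximum */
int32_t lengthA = 0;      /* how many of those slots are actually in use */

/* Accepts L only if 0 <= L <= 10, mirroring the repeat/until check. */
int set_length(int32_t L)
{
    if (L < 0 || L > CAPACITY) return 0;
    lengthA = L;
    return 1;
}

/* Stores value into A[i]; only indexes below lengthA are valid. */
void store_at(int32_t i, int32_t value)
{
    assert(i >= 0 && i < lengthA);
    A[i] = value;
}

/* Linear search over the used part only; returns the position or -1. */
int32_t find_value(int32_t v)
{
    for (int32_t i = 0; i < lengthA; i++) {
        if (A[i] == v) return i;      /* found: break out with the index */
    }
    return -1;                        /* not found */
}
```

The unused tail of A is simply never read, which is what makes the fixed reservation behave like a resizable array.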
Could someone help me speed up my Delphi function, which finds a value in a byte array without using binary search?
I call this function thousands of times, is it possible to optimize it with assembly?
Thank you so much.
function IndexOf(const List: TArray< Byte >; const Value: byte): integer;
var
I: integer;
begin
for I := Low( List ) to High( List ) do begin
if List[ I ] = Value then
Exit ( I );
end;
Result := -1;
end;
The length of the array is about 15 items.
Well, let's think. At first, please edit this line:
For I := Low( List ) to High( List ) do
(you forgot 'do' at the end). When we compile it without optimization, here is the assembly code for this loop:
Unit1.pas.29: If List [I] = Value then
005C5E7A 8B45FC mov eax,[ebp-$04]
005C5E7D 8B55F0 mov edx,[ebp-$10]
005C5E80 8A0410 mov al,[eax+edx]
005C5E83 3A45FB cmp al,[ebp-$05]
005C5E86 7508 jnz $005c5e90
Unit1.pas.30: Exit (I);
005C5E88 8B45F0 mov eax,[ebp-$10]
005C5E8B 8945F4 mov [ebp-$0c],eax
005C5E8E EB0F jmp $005c5e9f
005C5E90 FF45F0 inc dword ptr [ebp-$10]
Unit1.pas.28: For I := Low (List) to High (List) do
005C5E93 FF4DEC dec dword ptr [ebp-$14]
005C5E96 75E2 jnz $005c5e7a
This code is far from optimal: the local variable i really is a local variable, that is, it is stored in RAM, on the stack (you can see it from the [ebp-$10] addresses; ebp is the frame pointer, which addresses the function's stack frame).
So at each new iteration we load the address of the array into the eax register (mov eax, [ebp-$04]),
then we load i from the stack into the edx register (mov edx, [ebp-$10]),
then at last we load List[i] into the al register, which is the lower byte of eax (mov al, [eax+edx]),
after which we compare it with the argument 'Value', taken again from memory, not from a register!
This implementation is extremely slow.
But let's turn optimization on at last! It's done in Project options -> compiling -> code generation. Let's look at new code:
Unit1.pas.29: If List [I] = Value then
005C5E5A 3A1408 cmp dl,[eax+ecx]
005C5E5D 7504 jnz $005c5e63
Unit1.pas.30: Exit (I);
005C5E5F 8BC1 mov eax,ecx
005C5E61 5E pop esi
005C5E62 C3 ret
005C5E63 41 inc ecx
Unit1.pas.28: For I := Low (List) to High (List) do
005C5E64 4E dec esi
005C5E65 75F3 jnz $005c5e5a
Now there are just 5 instructions which get repeated over and over.
Value is stored inside dl register (lower byte of edx register),
address of 0-th element of array is stored in eax register,
i is stored in ecx register.
So the line 'if List[i] = Value' converts into just 1 assembly line:
005C5E5A 3A1408 cmp dl,[eax+ecx]
The next line is a conditional jump; the 3 lines after that are executed just once or never (only if the condition is true); and at last there is the increment of i and the
decrement of the loop counter (it's easier to compare it with zero than with anything else).
So, there is little we can do that the Delphi compiler with its optimizer didn't already do!
If it's permitted by your program, you can try to reverse direction of search, from last element to first:
For I := High( List ) downto Low( List ) do
this way compiler will be happy to compare i with zero to indicate that we checked everything (this operation is free: when we decrement i and got zero, CPU zero flag turns on!)
But in such implementation behaviour may be different: if you have several entries = Value, you'll get not the first one, but the last one!
Another very easy thing is to declare this IndexOf function as inline: this way you'll probably have no function call here: this code will be inserted at each place where you call it. Function calls are rather slow things.
There are also some crazy methods described in Knuth for searching a simple array as fast as possible. He introduces a 'dummy' last element of the array which equals your 'Value'; that way you don't have to check boundaries (the search will always find something before going out of range), so there is just 1 condition inside the loop instead of 2. Another method is 'unrolling' the loop: you write out 2 or 3 or more iterations inside the loop body, so there are fewer jumps per check. This has even more downsides: it is beneficial only for rather large arrays, and may make things even slower for arrays with 1 or 2 elements.
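To make the 'dummy last element' trick concrete, here is a small sketch in C rather than Delphi. The function name index_of_sentinel is made up, and the sketch assumes the caller owns a buffer with one spare byte after the data, which the function overwrites.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel search: buf must have room for len + 1 bytes.
   buf[len] is overwritten with value, so the loop needs no bounds check. */
ptrdiff_t index_of_sentinel(uint8_t *buf, size_t len, uint8_t value)
{
    assert(buf != NULL);
    buf[len] = value;                 /* plant the sentinel past the data */
    size_t i = 0;
    while (buf[i] != value) {         /* single condition per iteration */
        i++;
    }
    return i < len ? (ptrdiff_t)i : -1;   /* hit on the sentinel means "not found" */
}
```

The price is the spare slot and the write into the buffer, so this does not fit a const array like the one in the question without changing how the data is stored.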
As others said: the biggest improvement would be to understand what kind of data you store: does it change frequently or stays the same for long time, do you look for random elements or there are some 'leaders' which gets the most attention. Must these elements be in the same order as you put them or it's allowed to rearrange them as you wish? Then you can choose data structure accordingly. If you look for some 1 or 2 same entries all the time and they can be rearranged, a simple 'Move-to-front' method would be great: you don't just return index but first move element to first place, so it will be found very quickly the next time.
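A minimal sketch of the 'Move-to-front' idea, again in C for illustration. find_move_to_front is a hypothetical name, and the sketch assumes the order of the elements does not matter to the rest of the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Linear search that moves a found element to index 0, so that frequently
   requested values are found after very few comparisons next time.
   Returns the index where the value was found (before moving), or -1. */
ptrdiff_t find_move_to_front(uint8_t *list, size_t len, uint8_t value)
{
    assert(list != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (list[i] == value) {
            memmove(list + 1, list, i);   /* shift the i earlier elements right */
            list[0] = value;              /* put the hit at the front */
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

Note that the returned index is no longer stable between calls, so this only suits callers that need to know whether the value is present, not where it originally was.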
If your arrays are long, you can use the x86 built in string scan REP SCAS.
It is coded in microcode and has a moderate start-up time, but it is
heavily optimized in the CPU and runs fast given long enough data structures (>= 100 bytes).
In fact on a modern CPU it frequently outperforms very clever RISC code.
If your arrays are short, then no amount of optimization of this routine will help, because then your problem is in code not shown in the question, so there is no answer I can give you.
See: http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Internal_Data_Formats_(Delphi)
function IndexOf({$ifndef RunInSeperateThread} const {$endif} List: TArray<byte>; const Value: byte): integer;
//Lock the array if you run this in a separate thread.
{$ifdef CPUX64}
asm
//RCX = List
//DL = byte.
mov r8,[rcx-8] //3 - get the length ASAP.
push rdi //0 - hidden in mov r,m
mov eax,edx //0 - rename
mov rdi,rcx //0 - rename
mov rcx,r8 //0 - rename
mov rdx,r8 //0 - remember the length
//8 cycles setup
repne scasb //2n - repeat until byte found.
pop rdi //1
neg rcx //0
lea rax,[rdx+rcx] //1 result = length - bytes left (index + 1 if found, Length if not).
end;
{$ENDIF}
{$ifdef CPUX86}
asm
//EAX = List
//DL = byte.
push edi
mov edi,eax
mov ecx,[eax-4] //get the length
mov eax,edx
mov edx,ecx //remember the length
repne scasb //repeat until byte found.
pop edi
neg ecx
lea eax,[edx+ecx] //result = length - bytes left (index + 1 if found, Length if not).
end;
{$ENDIF}
Timings
On my laptop, using an array of 1KB with the target byte at the end, this gives the following timings (lowest time over 100.0000 runs):
Code | CPU cycles
| Len=1024 | Len=16
-------------------------------+----------+---------
Your code optimizations off | 5775 | 146
Your code optimizations on | 4540 | 93
X86 my code | 2726 | 60
X64 my code | 2733 | 69
The speed-up is OK (ish), but hardly worth the effort.
If your arrays are short, then this code will not help you, and you'll have to resort to other options to optimize your code.
Speed up possible when using binary search
Binary search is an O(log n) operation, vs O(n) for naive search.
Using the same array (which must be kept sorted), this will find your data in log2(1024) * CPU cycles per search step = 10 * 20 = roughly 200 cycles. That is a 10+ times speed-up over my optimized code.
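For reference, a binary search over bytes might look like the following C sketch. It assumes the array is sorted in ascending order, which the original question does not guarantee; binary_search_bytes is a made-up name, and the index it returns is a position in the sorted array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary search over an ascending array of bytes.
   Returns the index of the first element equal to value, or -1. */
ptrdiff_t binary_search_bytes(const uint8_t *sorted, size_t len, uint8_t value)
{
    assert(sorted != NULL || len == 0);
    size_t lo = 0, hi = len;                 /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < value) {
            lo = mid + 1;                    /* answer is right of mid */
        } else {
            hi = mid;                        /* answer is at mid or left of it */
        }
    }
    return (lo < len && sorted[lo] == value) ? (ptrdiff_t)lo : -1;
}
```

For 1024 elements the loop runs about 10 times, which is where the log2(1024) factor comes from.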
Firstly, if this question is inappropriate because I am not providing any code or not doing any thinking on my own, I apologize, and I will delete this question.
For an assignment, we are required to create an array of nodes to simulate a linked list. Each node has an integer value and a pointer to the next node in the list. Here is my .DATA section
.DATA
linked_list DWORD 5 DUP (?) ;We are allowed to assume the linked list will have 5 items
linked_node STRUCT
value BYTE ?
next BYTE ?
linked_node ENDS
I am unsure if I am defining my STRUCT correctly, as I am unsure of what the type of next should be. Also, I am confused as how to approach this problem. To insert a node into linked_list I should be able to write mov [esi+TYPE linked_list*ecx], correct? Of course, I'd need to inc ecx every time. What I'm confused about is how to do mov linked_node.next, "pointer to next node" Is there some sort of operator that would allow me to set the pointer to the next index in the array equal to a linked_node.next ? Or am I thinking about this incorrectly? Any help would be appreciated!
Think about your design in terms of a language you are familiar with. Preferably C, because pointers and values in C are concepts that map directly to asm.
Let's say you want to keep track of your linked list by storing a pointer to the head element.
#include <stdint.h> // for int8_t
struct node {
int8_t next; // array index. More commonly, you'd use struct node *next;
// negative values for .next are a sentinel, like a NULL pointer, marking the end of the list
int8_t val;
};
struct node storage[5]; // .next field indexes into this array
uint8_t free_position = 0; // when you need a new node, take index = free_position++;
int8_t head = -1; // start with an empty list
There are tricks to reduce corner cases, like having the list head be a full node, rather than just a reference (pointer or index). You can treat it as a first element, instead of having to check for the empty-list case everywhere.
Anyway, given a node reference int8_t p (where p is the standard variable name for a pointer to a list node, in linked list code), the next node is storage[storage[p].next]. The next node's val is storage[storage[p].next].val.
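To make the index-based design concrete, here is a self-contained C sketch built on the declarations above. The helper names push_front, next_index and list_length, and the CAPACITY constant, are assumptions made for illustration; they are not part of the assignment.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY 5

struct node {
    int8_t next;   /* index of the next node, negative marks the end */
    int8_t val;
};

struct node storage[CAPACITY];
uint8_t free_position = 0;   /* next unused slot in storage */
int8_t head = -1;            /* empty list */

/* Takes the next free slot, links it in front of the list.
   Returns the new node's index, or -1 if storage is full. */
int8_t push_front(int8_t val)
{
    if (free_position >= CAPACITY) return -1;
    int8_t idx = (int8_t)free_position++;
    storage[idx].val = val;
    storage[idx].next = head;   /* old head becomes the second node */
    head = idx;
    return idx;
}

/* Follows one link: the index of the node after p (negative if none). */
int8_t next_index(int8_t p)
{
    assert(p >= 0 && p < CAPACITY);
    return storage[p].next;
}

/* Walks the list from head until the negative sentinel. */
int list_length(void)
{
    int count = 0;
    for (int8_t p = head; p >= 0; p = storage[p].next) count++;
    return count;
}
```

Every "pointer" here is just a small integer, so following a link is an indexed load from storage, which is the pattern the assembly version uses.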
Let's see what this looks like in asm. The NASM manual talks about how its macro system can help you make code using global structs more readable, but I haven't done any macro stuff for this. You might define macros for NEXT and VAL or something, with 0 and 1, so you can say [storage + rdx*2 + NEXT]. Or even a macro that takes an argument, so you could say [NEXT(rdx*2)]. If you're not careful, you could end up with code that's more confusing to read, though.
section .bss
storage: resw 5 ;; reserve 5 words of zero-initialized space
free_position: resb 1 ;; uint8_t free_position = 0; (.bss is zero-initialized, so reserve the byte instead of using db)
section .data
head: db -1 ;; int8_t head = -1;
section .text
; p is stored in rdx. It's an integer index into storage
; We'll access storage directly, without loading it into a register.
; (normally you'd have it in a reg, since it would be space you got from malloc/realloc)
; lea rsi, [rel storage] ;; If you want RIP-relative addressing.
;; There is no [RIP+offset + scale*index] addressing mode, because global arrays are for tiny / toy programs.
test edx, edx
js .err_empty_list ;; check for p=empty list (sign-bit means negative)
movsx eax, byte [storage + 2*rdx] ;; load p.next into eax, with sign-extension
test eax, eax
js .err_empty_list ;; check that there is a next element
movsx eax, byte [storage + 2*rax + 1] ;; load storage[p.next].val, sign extended into eax
;; The final +1 in the effective address is because the val byte is 2nd.
;; you could have used a 3rd register if you wanted to keep p.next around for future use
ret ;; or not, if this is just the middle of some larger function
.err_empty_list: ; .symbol is a local symbol, doesn't have to be unique for the whole file
ud2 ; TODO: report an error instead of running an invalid instruction
Notice that we get away with a shorter instruction encoding by sign-extending into a 32bit reg, not to the full 64bit rax. If the value is negative, we aren't going to use rax as part of an address. We're just using movsx as a way to overwrite the whole register (writing eax also zeroes the upper 32 bits of rax), because mov al, [storage + 2*rdx] would leave the upper 56 bits of rax with the old contents.
Another way to do this would be to movzx eax, byte [...] / test al, al, because the 8-bit test is just as fast to encode and execute as a 32bit test instruction. Also, movzx as a load has one cycle lower latency than movsx, on AMD Bulldozer-family CPUs (although they both still take an integer execution unit, unlike Intel where movsx/zx is handled entirely by a load port).
Either way, movsx or movzx is a good way to load 8-bit data, because you avoid problems with reading the full reg after writing a partial reg, and/or a false-dependency (on the previous contents of the upper bits of the reg, even if you know you already zeroed it, the CPU hardware still has to track it). Except if you know you're not optimizing for Intel pre-Haswell, then you don't have to worry about partial-register writes. Haswell does dual-bookkeeping or something to avoid extra uops to merge the partial value with the old full value when reading. AMD CPUs, P4, and Silvermont don't track partial-regs separately from the full-reg, so all you have to worry about is the false dependency.
Also note that you can load the next and val packed together, like
.search_loop:
movzx eax, word [storage + rdx*2] ; next in al, val in ah
test ah, ah
jz .found_a_zero_val
movzx edx, al ; use .next for the next iteration
test al, al
jns .search_loop
;; if we get here, we didn't find a zero val
ret
.found_a_zero_val:
;; do something with the element referred to by `rdx`
Notice how we have to use movzx anyway, because all the registers in an effective address have to be the same size. (So word [storage + al*2] doesn't work.)
This is probably more useful going the other way, to store both fields of a node with a single store, like mov [storage + rdx*2], ax or something, after getting next into al, and val into ah, probably from separate sources. (This is a case where you might want to use a regular byte load, instead of a movzx, if you don't already have it in another register). This isn't a big deal: don't make your code hard to read or more complex just to avoid doing two byte-stores. At least, not until you find out that store-port uops are the bottleneck in some loop.
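The single-store idea can be sketched in C as packing both fields into one 16-bit value. This is only an illustration: pack_node, unpack_next, unpack_val and store_node are hypothetical names, and the layout (next in the low byte, val in the high byte) assumes a little-endian machine such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs a node into one 16-bit word: next in the low byte (like al),
   val in the high byte (like ah). */
uint16_t pack_node(int8_t next, int8_t val)
{
    return (uint16_t)((uint8_t)next | ((uint16_t)(uint8_t)val << 8));
}

/* Converting a byte above 127 back to int8_t is implementation-defined
   before C23; mainstream compilers wrap to the two's-complement value. */
int8_t unpack_next(uint16_t w) { return (int8_t)(uint8_t)(w & 0xFF); }
int8_t unpack_val(uint16_t w)  { return (int8_t)(uint8_t)(w >> 8); }

/* Writes both fields of node number index with a single 16-bit store. */
void store_node(uint16_t *words, size_t capacity, size_t index,
                int8_t next, int8_t val)
{
    assert(index < capacity);
    words[index] = pack_node(next, val);
}
```

As the text says, this is an optimization to reach for only when the two separate byte stores have been shown to matter.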
Using an index into an array, instead of a pointer, can save a lot of space, esp. on 64bit systems where pointers take 8 bytes. If you don't need to free individual nodes (i.e. data structure only ever grows, or is deleted all at once when it is deleted), then an allocator for new nodes is trivial: just keep sticking them at the end of the array, and realloc(3). Or use a c++ std::vector.
With those building blocks, you should be all set to implement the usual linked list algos. Just store bytes with mov [storage + rdx*2], al or whatever.
If you need ideas on how to implement linked lists with clean algos that handle all the special-cases with as few branches as possible, have a look at this Codereview question. It's for Java, but my answer is very C-style. The other answers have some nice tricks, too, some of which I borrowed for my answer. (e.g. using a dummy node avoids branching to handle the insertion-as-a-new-head special case).
Here's a small snippet of assembly code (TASM) where I simply try to increment the value at the current index of the array. The idea is that the "freq" array will store a number (DWord size) that represents how many times that ASCII character was seen in the file. To keep the code short, "b" stores the current byte being read.
Declared in data segment
freq DD 256 DUP (0)
b DB ?
___________
Assume b contains current byte
mov bl, b
sub bh, bh
add bx, bx
inc freq[bx]
I receive this error at compilation time at the line containing "inc freq[bx]": ERROR Argument to operation or instruction has illegal size.
Any insight is greatly appreciated.
There is no inc that can increment a dword in 16 bit mode. You will have to synthesize it from add/adc, such as:
add word ptr freq[bx], 1
adc word ptr freq[bx + 2], 0
The word ptr size override is needed because freq is declared with DD; alternatively, change your array definition to freq DW 512 DUP (0).
Also note that you have to scale the index by 4, not 2.
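To show what the add/adc pair is doing, here is a C sketch that increments a 32-bit counter using only 16-bit arithmetic. The struct counter32 and the function names are made up for illustration; the low word is stored first, matching how a DD is laid out in memory on x86.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SYMBOLS 256

/* A 32-bit counter split into two 16-bit halves, low word first. */
struct counter32 {
    uint16_t lo;
    uint16_t hi;
};

/* Increments the counter the way add/adc does it on a 16-bit CPU. */
void increment_counter(struct counter32 *c)
{
    assert(c != 0);
    uint16_t old_lo = c->lo;
    c->lo = (uint16_t)(c->lo + 1);          /* add word ptr [..], 1 */
    uint16_t carry = (c->lo < old_lo);      /* 1 only if the low word wrapped */
    c->hi = (uint16_t)(c->hi + carry);      /* adc word ptr [.. + 2], 0 */
}

/* Counts one occurrence of byte b; indexing scales by the element size (4). */
void count_byte(struct counter32 freq[NUM_SYMBOLS], uint8_t b)
{
    increment_counter(&freq[b]);
}

/* Reassembles the full 32-bit value from the two halves. */
uint32_t counter_value(const struct counter32 *c)
{
    return ((uint32_t)c->hi << 16) | c->lo;
}
```

In C the compiler multiplies the index by sizeof(struct counter32), which is 4 here; in the assembly version that scaling has to be done by hand.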